The challenge

Extract conservation knowledge from PDF documents

Much of Australia's knowledge about threatened species listed under the Environment Protection and Biodiversity Conservation (EPBC) Act 1999 exists only in the official ministerial advice documents that designate the conservation status of these species.

The challenge was to extract conservation knowledge from PDF documents and store it in a reusable database. The database needed to be general-purpose, that is, independent of any specific downstream data analysis.

Given the complexity of this task (involving NLP, AI, user experience and software engineering), this project focused on a feasibility study to see which state-of-the-art components would best serve this problem. It was conducted in collaboration with the Department of Agriculture, Water and the Environment (DAWE).

Our response

Develop a platform

We approached this task empirically. We identified the candidate components that could contribute to an overall solution and, where possible, used best practice from research and engineering methodology to gauge their performance.

We developed a general platform, tested with DAWE documents, that performs knowledge extraction from PDFs.

Leadbeater's Possum - Photograph: Zoos Victoria

The results

Preliminary platform for knowledge extraction from PDF

We developed a preliminary platform for knowledge extraction from PDFs (as demonstrated on DAWE data) and a proof-of-concept prototype for the Department. We have also identified which components of the data processing pipeline worked best in this context.

There is a technical report for this pipeline, and the work was presented at the 2020 Australian Public Sector Innovation Month seminar series.

In future work, we will identify how to transform the extracted knowledge into actionable data for DAWE workflows. This will require an in-depth understanding of users' tasks and processes.

We are in discussion with DAWE about a follow-on project. The scope includes using this prototype to unify conservation knowledge across state, territory and Commonwealth jurisdictions, and developing a task-specific interface to make use of the extracted knowledge (such as an AI assistant for authoring conservation documents).
