The challenge

Provide convenient and secure access to data

As the key collection point for personal, economic and geospatial information, governments are uniquely positioned to generate valuable data and to benefit from the insights drawn from it as an input to policy decision-making. The challenge was to provide convenient and secure access to data combined from different content silos, while guaranteeing the privacy of people’s information and allowing custodians adequate control of their data assets.

Our response

Privacy by design approach

Pivotal to the L-LEEDS system was a privacy-by-design approach, which led to a distributed, federated model that provides security and control of data assets. Each data silo remains entirely within its government department, and access to the data can be turned off like a tap if and when required. To realise this vision, Data61 drew on pre-existing technologies and developed new techniques to complete a proof-of-concept system.
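
The "tap" idea can be illustrated with a minimal sketch. This is not the L-LEEDS implementation, which the text does not describe; the `Silo` class, the `enabled` flag and `federated_query` are hypothetical names chosen for illustration. The sketch assumes each department evaluates queries inside its own silo and that a custodian can revoke access at any time.

```python
from dataclasses import dataclass


class AccessRevoked(Exception):
    """Raised when a custodian has turned off access to their silo."""


@dataclass
class Silo:
    """Hypothetical stand-in for one department's database."""
    name: str
    rows: list
    enabled: bool = True  # the custodian's "tap"

    def query(self, predicate):
        if not self.enabled:
            raise AccessRevoked(f"{self.name} has revoked access")
        # Filtering happens inside the silo; only matching rows leave it.
        return [row for row in self.rows if predicate(row)]


def federated_query(silos, predicate):
    """Ask every silo in turn; fail if any custodian has closed the tap."""
    return {silo.name: silo.query(predicate) for silo in silos}


tax = Silo("tax", [{"id": 1, "income": 50_000}, {"id": 2, "income": 90_000}])
edu = Silo("education", [{"id": 1, "degree": "BSc"}])

result = federated_query([tax, edu], lambda row: row["id"] == 1)

edu.enabled = False  # the custodian turns the tap off
try:
    federated_query([tax, edu], lambda row: row["id"] == 1)
    revoked = False
except AccessRevoked:
    revoked = True
```

In this sketch the data never has to be copied to a central store, so closing one silo immediately stops every query that depends on it.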

Data silos are joined using incremental and persistent anonymous linkage: links are updated when the data changes and do not need to be regenerated each time a user accesses the databases. Access to the data is provided through SQL database queries, direct access, or a re-identification-resistant perturbation. Finally, the system provides a convenient workflow that enables analysts to identify their data requirements, apply to the appropriate custodians for access, receive approval and access the resulting combined data.
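
A minimal sketch of what incremental, persistent anonymous linkage might look like follows. The text does not say which linkage technique L-LEEDS uses, so exact-match keyed hashing (HMAC-SHA256) is an assumption swapped in for illustration; the key, the `LinkTable` class and the identifier fields are all hypothetical. Production linkage systems typically need to tolerate spelling variation, which exact hashing does not.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical shared linkage key


def link_key(name: str, dob: str) -> str:
    """Keyed hash of normalised identifiers; the raw values are not stored."""
    message = f"{name.strip().lower()}|{dob}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


class LinkTable:
    """Persistent map from anonymous link key to each silo's local record id."""

    def __init__(self):
        self.links = {}  # link_key -> {silo_name: record_id}

    def upsert(self, silo: str, record_id: int, name: str, dob: str) -> str:
        """Incremental update: touch only the link for the changed record."""
        key = link_key(name, dob)
        self.links.setdefault(key, {})[silo] = record_id
        return key

    def linked(self, silo_a: str, silo_b: str):
        """Pairs of record ids for people present in both silos."""
        return [
            (ids[silo_a], ids[silo_b])
            for ids in self.links.values()
            if silo_a in ids and silo_b in ids
        ]


table = LinkTable()
table.upsert("tax", 101, "Ada Example", "1990-01-01")
table.upsert("education", 7, " ada example ", "1990-01-01")
table.upsert("tax", 102, "Bob Sample", "1985-05-05")

pairs_before = table.linked("tax", "education")

# A new education record arrives: only its own link is added.
table.upsert("education", 8, "Bob Sample", "1985-05-05")
pairs_after = table.linked("tax", "education")
```

The link table persists between queries, so an analyst's access reads existing links rather than recomputing them, and a data change costs one `upsert` rather than a full rebuild.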

The results

Cloud deployed proof-of-concept system

L-LEEDS is a cloud-deployed proof-of-concept system that uses a synthetic L-LEED dataset generated using correlations between data attributes across the Australian Taxation Office (ATO) and the Department of Education and Training (DET) databases. It gives an approved analyst access to the data via a workflow user interface (UI) and through SQL queries, Protari, or the application programming interface (API), which allows queries to be analysed in a Jupyter notebook. Approval of analyst requests is provided by government department data curators using the L-LEEDS workflow system.
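
As a rough illustration of the kind of analysis this enables, the sketch below joins two tiny synthetic tables on an anonymous link key and returns a perturbed count. Everything in it is assumed: the table and column names, the rows, and the uniform integer noise. It does not reproduce the L-LEEDS schema or API, it is not Protari's perturbation algorithm, and it offers no formal privacy guarantee.

```python
import random
import sqlite3

# In-memory stand-ins for two silos already joined by an anonymous link key.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tax (link_key TEXT PRIMARY KEY, income INTEGER);
    CREATE TABLE education (link_key TEXT PRIMARY KEY, qualification TEXT);
    INSERT INTO tax VALUES ('k1', 52000), ('k2', 87000), ('k3', 61000);
    INSERT INTO education VALUES
        ('k1', 'Bachelor'), ('k2', 'Bachelor'), ('k3', 'Diploma');
    """
)


def true_count(qualification: str) -> int:
    """Exact number of linked people holding the given qualification."""
    row = conn.execute(
        """
        SELECT COUNT(*) FROM tax
        JOIN education USING (link_key)
        WHERE education.qualification = ?
        """,
        (qualification,),
    ).fetchone()
    return row[0]


def perturbed_count(qualification: str, max_noise: int = 2, seed=None) -> int:
    """Exact count plus bounded random integer noise, floored at zero."""
    rng = random.Random(seed)
    noisy = true_count(qualification) + rng.randint(-max_noise, max_noise)
    return max(noisy, 0)


exact = true_count("Bachelor")
noisy = perturbed_count("Bachelor", seed=1)
```

An analyst working in a notebook would see only the perturbed figure, which makes it harder to single out an individual from small counts.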

Testing showed that while the performance of the proof-of-concept system is sufficient for datasets of up to approximately 1 million people, it does not yet scale as required to large datasets (e.g. data on 10 million people over a period of 18 years), because of the computational and data transfer requirements of returning and joining such large datasets from remote databases.

Further work on L-LEEDS would include scaling to very large datasets and expanding the system to include real or synthetic datasets from multiple government departments.
