The challenge

Automating a manual process

Historically, transformations have been performed using correspondence tables based on population counts. These correspondences have been created manually by the ABS and distributed as Excel spreadsheets. Users were responsible for applying them to their data to perform transformations. Where a correspondence is not available, the user needs to develop one themselves or apply to the ABS to have a custom one created. These manual processes are time-consuming for both the ABS and the analyst, and do not provide a means of assessing the quality of the transformed data.
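To make the manual step concrete, here is a minimal sketch of applying a population-based correspondence to a set of regional counts. It assumes each correspondence row holds a source region, a target region and the share of the source region's population that falls in the target region. The region codes, the figures and the function name `apply_correspondence` are illustrative only and are not taken from any ABS product.

```python
from collections import defaultdict


def apply_correspondence(values, correspondence):
    """Redistribute values from source regions to target regions.

    values: mapping of source region -> value (e.g. a count).
    correspondence: iterable of (source, target, ratio) rows, where ratio is
        the assumed share of the source region's population in the target.
    """
    result = defaultdict(float)
    for source, target, ratio in correspondence:
        if source in values:
            # Each source value is split across targets in proportion to ratio.
            result[target] += values[source] * ratio
    return dict(result)


# Hypothetical correspondence: region A sits wholly inside X,
# while region B is split between X and Y.
correspondence = [
    ("A", "X", 1.0),
    ("B", "X", 0.25),
    ("B", "Y", 0.75),
]

counts = {"A": 100.0, "B": 200.0}
transformed = apply_correspondence(counts, correspondence)
print(transformed)
```

Because the ratios for each source region sum to one, the total across regions is preserved. The result carries no indication of how reliable the split is for a given dataset, which is the quality gap noted above.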

Our response

Application Programming Interface

YDYR provides a web Application Programming Interface (API) that performs these transformations. Users submit their data to the system specifying the input and desired output geographies, and the system performs the transformation and returns the results including estimates of the quality of the transformation. Rather than using a static correspondence, YDYR uses machine learning to build a custom transformation for the user’s data against ancillary data that resides within the system.

The API also allows new geographies and ancillary data to be added. As well as exposing new, improved methods, YDYR implements the traditional population-based correspondence methods currently provided by the ABS.

The results

Improved quality

YDYR can improve the quality of transformations of data between geographies and provide information about their quality. As an automated API, it has the potential to remove much of the manual effort required to create correspondences and apply them to data. It can be accessed both graphically (via a web application) and programmatically, allowing it to be integrated into other automated workflows and ‘wrapped’ in extension modules for programming libraries.

YDYR also has a web interface for users to interact directly with the API without needing to know anything about the API, making these methods available to a non-technical audience who may have traditionally struggled with performing this work.

The machine learning phase of a transformation can be computationally expensive, and finding cost-effective ways of hosting YDYR for mass consumption will be challenging. Versioning and maintaining the ancillary data and geographies hosted by the system is also a challenge, as is determining how best to provide them to a mass audience securely.

In future, we will focus more on providing transformations for other data domains (e.g. agricultural and environmental data).

