The challenge
Quantify risk in aggregated data
The initial challenge put to Data61 in an earlier project was to quantify risk in aggregated data. A risk metric was devised and applied to an actual use case within APRA. APRA publishes a number of aggregated reports derived from data provided by individual contributors (companies, funds, banks, etc.). However, some of those aggregated values are assessed as carrying a risk of revealing information about individual contributors, and are therefore suppressed. Some related cell values also need to be suppressed, because they could otherwise be used to reconstruct the suppressed cells. The challenge is to understand the residual risk in suppressed cells and whether that risk is controlled.
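To illustrate why related cells also need suppressing, here is a minimal sketch, assuming a single published row whose total is also published. The function name and the figures are hypothetical and are not taken from any APRA publication.

```python
def reconstruct_single_suppressed(cells, total):
    """Fill in a lone suppressed cell (None) by subtracting known cells from the total.

    Returns the completed row, or None when subtraction alone cannot
    recover the suppressed values (zero or several cells are suppressed).
    """
    missing = [i for i, value in enumerate(cells) if value is None]
    if len(missing) != 1:
        return None
    known_sum = sum(value for value in cells if value is not None)
    filled = list(cells)
    filled[missing[0]] = total - known_sum
    return filled


# One suppressed cell: the published total gives it away.
print(reconstruct_single_suppressed([120, None, 45], 200))    # [120, 35, 45]

# A second, related cell is suppressed as well: subtraction no longer works.
print(reconstruct_single_suppressed([120, None, None], 200))  # None
```

In the second call the two suppressed cells are only known to sum to 80, which is the kind of residual, partial information the challenge asks to quantify.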
Our response
Produce software
The response was to produce software that measures the risk of every suppressed cell being reconstructed or partially reconstructed from within the APRA published data itself. This problem required considering the relationships between values in publications, capturing those relationships, attempting to reconstruct the suppressed values, and then analysing the risk of reconstruction.
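The idea of partial reconstruction can be sketched as follows. This is not the delivered POC software, whose method and risk metric are not described here. It is a simple illustration that assumes every relationship is an additive constraint (a published total equals the sum of its member cells) and that all cell values are non-negative. The names `propagate_bounds`, `known`, `suppressed` and `constraints` are hypothetical.

```python
import math


def propagate_bounds(known, suppressed, constraints, max_rounds=100):
    """Narrow the feasible interval of each suppressed cell.

    known:       dict of cell name -> published value
    suppressed:  iterable of suppressed cell names (assumed non-negative)
    constraints: list of (published_total, [member cell names])
    Returns a dict of suppressed cell name -> (low, high).
    """
    bounds = {name: [value, value] for name, value in known.items()}
    bounds.update({name: [0, math.inf] for name in suppressed})

    for _ in range(max_rounds):
        changed = False
        for total, members in constraints:
            for target in members:
                others = [m for m in members if m != target]
                # The target is the total minus the others, so the others'
                # upper bounds give its lower bound and vice versa.
                new_low = total - sum(bounds[m][1] for m in others)
                new_high = total - sum(bounds[m][0] for m in others)
                if new_low > bounds[target][0]:
                    bounds[target][0] = new_low
                    changed = True
                if new_high < bounds[target][1]:
                    bounds[target][1] = new_high
                    changed = True
        if not changed:
            break
    return {name: tuple(bounds[name]) for name in suppressed}


# A 2x2 table with all four interior cells suppressed,
# but row totals (100, 50) and column totals (90, 60) published.
constraints = [
    (100, ["a", "b"]),  # row 1
    (50, ["c", "d"]),   # row 2
    (90, ["a", "c"]),   # column 1
    (60, ["b", "d"]),   # column 2
]
result = propagate_bounds({}, ["a", "b", "c", "d"], constraints)
print(result)
# {'a': (40, 90), 'b': (10, 60), 'c': (0, 50), 'd': (0, 50)}
```

Even with every interior cell suppressed, the published totals show that cell `a` is at least 40. A narrow interval suggests a higher reconstruction risk, and an interval of zero width means the cell is fully reconstructed. Propagation of this kind yields valid bounds, though not necessarily the tightest ones; a linear programming formulation would be one way to tighten them.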
Proof of concept (POC) software that is suitable for one specific APRA publication has been produced and delivered. The software uses an APRA publication specification to construct most of the relationships in the published data. A number of relationships in the data were not directly captured in that document, but rather in the contributor input specifications. It was also discovered that some relationships are not captured anywhere in APRA workflows and had to be generated manually.
The results
POC software has been delivered to APRA for trial
The POC software has been delivered to APRA for trial. However, due to COVID and difficulties in running third-party software within the APRA IT environment, APRA has only recently begun testing the software and assessing its functionality.
APRA is currently assessing the proof of concept software to evaluate its usefulness in identifying risks. It is also reviewing its current workflows and how publications are generated, and will consider whether the technology might play a role in future workflow designs.