Balancing public good with privacy and confidentiality
Many organisations have large databases of private or confidential information. They want to allow analysis of their data to support better decisions, and to allow the results of that analysis to be published for the public good and the benefit of individuals.
These organisations, or data custodians, may include:
- health services such as hospitals, Medicare and state government health departments
- social welfare agencies such as Centrelink and the Department of Families, Housing, Community Services and Indigenous Affairs
- financial institutions such as banks or tax departments
- national statistical services such as the Australian Bureau of Statistics.
While data use agreements, access controls, and other technological approaches provide privacy protection during access by researchers, privacy issues can remain when analysis outputs appear in the academic literature. Such outputs can sometimes be “re-engineered” to reveal private or sensitive information about individuals.
We have identified the need for broadly applicable and easily understandable guidelines for anonymising the outputs of statistical analyses, in addition to other privacy protections.
Guidelines including anonymity tests and treatments
We reviewed solutions and approaches being applied around the world. Online data centres (also called virtual data centres or data enclaves) are an increasingly popular choice for making confidential data available for research. Such centres provide good confidentiality protection during access by trusted researchers.
However, many of these online data centres still rely on manual checking of outputs by an expert. This solution can be expensive and inefficient, and is becoming increasingly difficult to sustain as the number of datasets and researchers grows.
Our solution has been to develop guidelines designed for use by researchers to anonymise their own analysis outputs. The guidelines take the form of a set of tests and a checklist that can be applied by researchers who are not necessarily expert in statistics or statistical disclosure control. They are intended to be refined further and eventually automated.
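To give a sense of what an automated anonymity test might look like, the sketch below checks a table of counts for small cells, a rule commonly used in statistical disclosure control. It is an illustration only and is not taken from the guidelines: the function names, the threshold of 5 and the example table are all assumptions.

```python
from typing import Dict, List

# Hypothetical threshold, chosen for illustration only.
# It is NOT a value taken from the guidelines.
MIN_CELL_COUNT = 5


def small_cells(table: Dict[str, int], threshold: int = MIN_CELL_COUNT) -> List[str]:
    """Return the labels of non-empty cells whose count is below the threshold.

    Empty cells (count 0) are not flagged here; whether zeros should also be
    treated as risky is a policy choice this sketch leaves open.
    """
    return [label for label, count in table.items() if 0 < count < threshold]


def passes_small_cell_test(table: Dict[str, int], threshold: int = MIN_CELL_COUNT) -> bool:
    """True if no non-empty cell falls below the threshold."""
    return not small_cells(table, threshold)


# Made-up counts for a frequency table a researcher might want to publish.
counts = {"age 20-39": 112, "age 40-59": 87, "age 60-79": 3}

flagged = small_cells(counts)
if flagged:
    print("Treat before release (e.g. suppress or merge categories):", flagged)
else:
    print("No small cells found.")
```

A researcher could run a check of this kind on each table before requesting release, and apply a treatment such as merging categories to any flagged cells.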
Opening up access to data
Use of the guidelines could help to open up access to data that is currently unavailable for research and policy analysis because there are too few experts to anonymise analysis outputs manually.
The guidelines were developed in collaboration with the Sax Institute, for use in the Secure Unified Research Environment (SURE).