Anonlink is a suite of technologies that allows two organisations to carry out private record linkage: finding matching records of entities across their respective datasets without disclosing personally identifiable information (PII).

Organisations wishing to link matching entities across disparate datasets use the Anonlink client to create linking codes, then send those codes to a central service that performs entity resolution and returns a linkage. The central service receives only encrypted data, so no PII ever crosses organisational boundaries. Linkages are of similar accuracy to those generated by non-private probabilistic linking, and Anonlink generates them quickly and efficiently, currently scaling to millions of records.

Anonlink offers a valuable new approach to creating linked data assets. It conceals identifying information and protects the privacy of the individuals or entities represented in the data, allowing organisations to fuse data and unlock insights from it whilst remaining compliant.

Features and benefits

Link records without sharing personal information

Anonlink allows two parties to create indexes linking records across separate datasets, whilst sharing only encrypted information externally. Each party transforms personally identifiable information, such as names, addresses and phone numbers, into an anonymous linking code before uploading it to an entity service that generates the linkage.
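To make the idea of an anonymous linking code concrete, here is a minimal sketch of one common construction: a Bloom filter built from keyed hashes of character bigrams. It is an illustration only, not the CLKHash implementation. The filter length, the number of hash positions per bigram, the function names and the use of HMAC-SHA256 are all assumptions made for this example.

```python
import hashlib
import hmac

FILTER_BITS = 1024    # assumed filter length, chosen for illustration
HASHES_PER_TOKEN = 4  # assumed number of bit positions set per bigram


def bigrams(value: str) -> list[str]:
    """Split a normalised field value into overlapping two-character tokens."""
    padded = f" {value.strip().lower()} "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def encode_record(fields: list[str], secret: bytes) -> int:
    """Return a Bloom filter for one record, represented as an integer bitmask.

    Both parties must use the same secret so that identical bigrams set
    identical bits; without the secret the bit positions cannot be recomputed.
    """
    bits = 0
    for index, field in enumerate(fields):
        for token in bigrams(field):
            for k in range(HASHES_PER_TOKEN):
                # Hypothetical message layout: field index, hash index, token.
                message = f"{index}:{k}:{token}".encode("utf-8")
                digest = hmac.new(secret, message, hashlib.sha256).digest()
                position = int.from_bytes(digest[:8], "big") % FILTER_BITS
                bits |= 1 << position
    return bits
```

In this sketch, each party would call `encode_record(["Jane", "Smith"], secret)` on its own records with a secret agreed between the two data holders, and upload only the resulting bitmasks.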

Fuzzy matching

Anonlink uses a probabilistic matching technique, calculating similarity across pairs of records from two datasets, and can return either raw similarity scores or a proposed mapping based on best-match above a user-defined similarity threshold. This approach means that record linkage can still be carried out in the presence of errors in the personally identifiable information, and that users have some control over the trade-off between precision and recall when matching.
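The sketch below illustrates threshold-based matching on toy data. It assumes encodings are integer bitmasks, scores each pair with the Sørensen-Dice coefficient, and proposes a one-to-one mapping by greedily accepting the highest-scoring pairs at or above the threshold. The similarity measure, the greedy strategy and the function names are assumptions for illustration, not a description of Anonlink's solver.

```python
def dice(a: int, b: int) -> float:
    """Sørensen-Dice coefficient of two bitmask encodings."""
    total = bin(a).count("1") + bin(b).count("1")
    if total == 0:
        return 0.0
    return 2 * bin(a & b).count("1") / total


def propose_mapping(left: list[int], right: list[int],
                    threshold: float) -> dict[int, int]:
    """Greedily map left indices to right indices, best scores first."""
    scored = [
        (dice(a, b), i, j)
        for i, a in enumerate(left)
        for j, b in enumerate(right)
    ]
    # Keep only candidate pairs at or above the user-defined threshold.
    scored = [s for s in scored if s[0] >= threshold]
    # Highest similarity first; ties broken by index for determinism.
    scored.sort(key=lambda s: (-s[0], s[1], s[2]))

    mapping: dict[int, int] = {}
    used_right: set[int] = set()
    for _score, i, j in scored:
        if i not in mapping and j not in used_right:
            mapping[i] = j
            used_right.add(j)
    return mapping
```

Raising the threshold keeps only the most similar pairs (higher precision, lower recall), while lowering it admits more tentative matches (higher recall, lower precision).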

Easily integrated and automated

Anonlink is highly modular. It consists of separate libraries for generating hashes and calculating similarity scores, plus an entity service from which clients can request mappings. The encoding and entity services communicate with each other via well-documented interfaces (REST APIs) that accept only limited parameters. As a result, individual components can easily be swapped out for customised versions or independently developed alternatives, and every step in the linkage process lends itself to automation.

Scalable and cloud-ready

Anonlink's entity service is designed for deployment to a cluster environment: it scales horizontally across multiple nodes and handles node failure without data loss. The entity service has been tested with 35 million entities, and a benchmark linkage of one million by one million records completes in under five minutes. Linking even larger datasets is simply a matter of adding resources to the cluster, and performance scales close to linearly with resources.

Figure: Bar chart of the time taken to match 100,000 records for a given number of workers. The time decreases slightly less than linearly as more computing resources are added.
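One reason pairwise linkage can scale with added workers is that the full comparison grid splits into independent blocks. The sketch below shows such a split. It is an assumption about how a workload of this kind can be divided, not a description of the entity service's internals, and the function name and chunk size are hypothetical.

```python
from itertools import product
from typing import Iterator

Range = tuple[int, int]


def comparison_chunks(n_left: int, n_right: int,
                      chunk_size: int) -> Iterator[tuple[Range, Range]]:
    """Yield (left_range, right_range) blocks that together cover every pair.

    Each range is a half-open (start, stop) interval of record indices, so a
    worker can score one block without coordinating with the others.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    left_ranges = [(s, min(s + chunk_size, n_left))
                   for s in range(0, n_left, chunk_size)]
    right_ranges = [(s, min(s + chunk_size, n_right))
                    for s in range(0, n_right, chunk_size)]
    yield from product(left_ranges, right_ranges)
```

Because the blocks share no state, doubling the number of workers roughly halves the time spent on comparisons. Fixed costs such as task distribution and result merging remain, which is consistent with slightly sub-linear scaling.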

Open source and cross-platform

Anonlink is open source under the Apache 2.0 license. The libraries for cryptographic linkage key hashing and client requests (CLKHash), anonymous linkage (Anonlink), and matching (Entity Service) are already publicly available on GitHub.

The CLKHash client encoding library is supported on Windows, macOS and Linux, meaning that in whatever environment your data resides, you can encode and submit it to the Anonlink entity resolution service to generate a linkage. The Anonlink library and Entity Service run on Linux, leveraging Docker for containerisation.
