NEWS 26 July 2019

Anonymised data may not protect privacy


UK – The methods companies and governments currently use to anonymise data are not enough to protect privacy, according to research from the University of Louvain (UCLouvain) and Imperial College London.


Individuals are at risk of being re-identified even after data has been anonymised by stripping out identifying characteristics such as names and email addresses, the study found.

The researchers developed a machine learning model to estimate the likelihood that a person can be correctly re-identified from a given set of characteristics.

Using this model, the study found that 99.98% of Americans would be correctly re-identified in any anonymised dataset using 15 demographic attributes including age, gender and marital status.

Dr Luc Rocher of UCLouvain, one of the report authors, said: “While there might be a lot of people who are in their thirties, male, and living in New York City, far fewer of them were also born on 5 January, are driving a red sports car, and live with two kids (both girls) and one dog.”
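The intuition in that quote can be illustrated with a small counting exercise. The sketch below is not the researchers' model; it is a minimal illustration, using invented toy records and a hypothetical helper named `unique_fraction`, of how the share of people who are unique in a dataset rises as more attributes are combined.

```python
from collections import Counter

def unique_fraction(records, attributes):
    """Share of records whose combination of `attributes` appears exactly once."""
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)

# Hypothetical toy records, invented for illustration only.
people = [
    {"age_band": "30s", "gender": "M", "city": "New York", "birthday": "5 Jan"},
    {"age_band": "30s", "gender": "M", "city": "New York", "birthday": "12 Mar"},
    {"age_band": "30s", "gender": "M", "city": "New York", "birthday": "12 Mar"},
    {"age_band": "40s", "gender": "F", "city": "Boston", "birthday": "5 Jan"},
]

# With three coarse attributes, only one of the four people stands out.
coarse = unique_fraction(people, ["age_band", "gender", "city"])
# Adding a birthday makes two of the four people unique.
fine = unique_fraction(people, ["age_band", "gender", "city", "birthday"])
print(coarse, fine)  # 0.25 0.5
```

In this toy data, adding a single extra attribute doubles the share of people who can be singled out. The study's own estimate rests on a statistical model rather than simple counting of this kind.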

The findings challenge the standards for data anonymisation set by GDPR and other laws such as the California Consumer Privacy Act, as the principles of data protection do not apply once personal data has been de-identified.

Senior author Dr Yves-Alexandre de Montjoye, of Imperial’s Department of Computing and Data Science Institute, said: “This is pretty standard information for companies to ask for. Although they are bound by GDPR guidelines, they’re free to sell the data to anyone once it’s anonymised. Our research shows just how easily – and how accurately – individuals can be traced once this happens.

“Companies and governments have downplayed the risk of re-identification by arguing that the datasets they sell are always incomplete. Our findings contradict this and demonstrate that an attacker could easily and accurately estimate the likelihood that the record they found belongs to the person they are looking for.”

The researchers have also launched a publicly available online tool to highlight the issue and allow people to see which pieces of information could be used to re-identify them. 

The paper is published in Nature Communications.