Anonymisation in the Panopticon, or Being Naked in the Cyber Agora

Can we ever live without the fear that what we share online will affect us negatively offline?

Data Never Sleeps 8.0 – the amount of data generated every minute in 2020

What Luciano Floridi coined the ‘onlife’ [1] has become truer than ever in the past year. Our self-conception, our mutual interactions, the way we conceive reality and how we engage with it have all been heavily shaped by technology. It has become impossible to separate our lives offline from our lives online: everything is done either exclusively through technology or is undoubtedly facilitated by it. In this sense, we are all living ‘onlives’ and generating constant data, to the point of having our actions and thoughts fully documented in the cyber agora.

In 2018, Forbes magazine claimed that “Every day, we create 2.5 quintillion bytes of data – so much that 90 per cent of the data in the world today has been created in the last two years alone”. [2] IBM, meanwhile, predicts that by 2025 the world will store 250 zettabytes (250 trillion gigabytes) of data. [3]
Not that we would know how to count to that…

One of the tools we have created to protect ourselves and our data from misuse and abuse is the GDPR. However, it can be argued that the GDPR’s scope has become too broad [4]: its attempt to be all-encompassing is backfiring. Whilst well intended, this broadness creates confusion, and the Regulation will end up applying to everything, since all data will at some point be personal (or can at least be argued to be). The result will be a system overload and, ultimately, the failure of the GDPR: in such a scenario, the GDPR would be largely ignored and its enforcers unable to close the floodgates it opened.

“It knows too much.” – Barron’s Cartoon, Kaamran Hafeez

As technology advances, so do the means of (re-)identification; anonymisation must therefore be irreversible. The term ‘information’, while seemingly uncontroversial, is the exact problem: personal data can be anything, regardless of the data’s nature or content. Whether something constitutes personal data should, in theory, turn on whether it can be used for the purpose of influencing individuals.

Case law relating to this notion follows the approach of the Article 29 Working Party (WP29).

The first data protection case in which the Court of Justice discussed the meaning of ‘personal data’ was Lindqvist. [5] It remains the defining case for interpreting the scope of the household exemption. The case concerned a catechist, Mrs Lindqvist, who set up a website containing information about herself and 18 of her colleagues, including names, hobbies, telephone numbers and even personal injuries. All of this was done without the consent of the people concerned.

The case was decided under the EU Data Protection Directive 95/46/EC, and the questions referred for a preliminary ruling included, among others, (a) whether the mention of a person in the manner described above falls within its scope and (b) whether such information, found on a private home page accessible to anyone with the address, could fall within the exception under Article 3(2) of the Directive. Under the GDPR, ‘household activity’ is addressed in Recital 18 as non-commercial activity, together with a non-exhaustive description of what it ‘could include’. This creates uncertainty as to the scope of application.

“Remember when, on the Internet, nobody knew who you were?” – New Yorker Cartoon, Kaamran Hafeez

The Court held that both the scope and nature of the processing fall within the limits of the Directive and that the activities of Mrs Lindqvist could not be considered exclusively personal or household activity. Addressing the same exemption under the Directive, the Court in Ryneš [6] considered a CCTV system which, despite being attached to a single household, nevertheless monitored a public space. For that reason, the activity could not be considered purely “personal or household”.

In addressing the scope of the household exemption, it is important to note how strict the courts’ approach has been: the exemption is extremely narrow, a reading reinforced by the fact that no claim has yet successfully fallen within it.

In contrast, in Breyer, the Court used a very broad definition in relation to the identifiability criterion – ‘identification measures reasonably likely to be taken’. [7]

“Cloud Data in the West” – Chris Slane

Breyer concerned the dynamic IP addresses of visitors to the websites of the German Federal institutions and whether such dynamic addresses were personal data.

The Court addressed the notion through analysing “whether the possibility to combine a dynamic IP address with the additional data held by the Internet service provider constitutes a means likely reasonably to be used to identify the data subject.”

Since website providers were found to have means reasonably likely to be used to identify website visitors through third parties (namely, internet service providers), dynamic IP addresses were held to be personal data.

The case reaffirmed the WP29’s broad reading of “all the means likely reasonably to be used either by the controller or by any other person”. Here, the Court explicitly stated that it is not necessary “that all the information enabling the identification must be in the hands of one person”. The Court thus followed the absolute approach, qualified only by its ruling that a legal ban on identification would render the means of identification not reasonably likely to be used.

It has also been suggested that a functional anonymisation approach would be the most favourable way of preserving the possibility of anonymous data. [8] This approach focuses on the relationship between the data and the environment within which the data exists. Anonymisation has become an important part of the data-sharing toolkit, both ethically and as a procedure that adds business value. In essence, it is an algorithm that takes a privacy-breaching dataset as input and produces a dataset from which individuals cannot be identified.
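
To make the idea concrete, here is a minimal sketch of such a procedure in its most naive form, assuming a small tabular dataset with hypothetical records and column names: the direct identifier is suppressed outright, while quasi-identifiers are coarsened.

```python
# A naive anonymisation pass: suppress direct identifiers and
# generalise quasi-identifiers. Records and column names are
# hypothetical, chosen purely for illustration.

records = [
    {"name": "Alice", "postcode": "SW1A 1AA", "age": 34, "diagnosis": "flu"},
    {"name": "Bob", "postcode": "NW1 4RY", "age": 47, "diagnosis": "asthma"},
]

def anonymise(rows):
    out = []
    for row in rows:
        decade = (row["age"] // 10) * 10
        out.append({
            "name": "*",                             # direct identifier: suppressed
            "postcode": row["postcode"].split()[0],  # generalised to outward code
            "age": f"{decade}-{decade + 9}",         # generalised to a decade band
            "diagnosis": row["diagnosis"],           # payload attribute kept as-is
        })
    return out

print(anonymise(records))
```

The output looks anonymous, but, as the next paragraphs show, nothing in the data alone guarantees that it is.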

‘Accidental Data Release’ – Chris Slane

However, several criticisms and market failures, such as the release of 2013 journey data for New York City cabs, show that a naive application of anonymisation can lead to the re-identification of personal data, defined as any information relating to an identified or identifiable natural person. It is argued that this risk has grown in recent years, given both the evolution of technology and the increasing financial rewards for attacking systems.
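
The taxi release is instructive. The medallion and licence numbers had been ‘anonymised’ with an unsalted MD5 hash, but because the space of valid medallion numbers is tiny, every hash could be reversed by exhaustive search. The sketch below reproduces the attack on a simplified, hypothetical medallion format.

```python
import hashlib
import string

# Reversing unsalted hashes over a small keyspace, in the spirit of the
# New York taxi release. The medallion format used here (digit, letter,
# digit, digit, e.g. "5X55") is a simplified stand-in; the real formats
# were only slightly larger.

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode()).hexdigest()

# Precompute the hash of every possible medallion:
# 10 * 26 * 10 * 10 = 26,000 candidates, built in well under a second.
table = {
    md5_hex(f"{d1}{c}{d2}{d3}"): f"{d1}{c}{d2}{d3}"
    for d1 in string.digits
    for c in string.ascii_uppercase
    for d2 in string.digits
    for d3 in string.digits
}

# Any "anonymised" value in the published data now reverses with a
# single dictionary lookup.
published_hash = md5_hex("5X55")  # stand-in for a value from the release
print(table[published_hash])      # -> 5X55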

Due to these market failures, a new approach to anonymisation was developed – the functional approach. According to it, one cannot tell from the data alone whether a dataset is anonymous.

“I changed my privacy settings” – Matt Percival

Certain additional issues must be taken into account, e.g. the motivation of an adversary wishing to attack anonymised data in order to re-identify somebody, the potential consequences of disclosure, and how a disclosure might happen without malicious intent.

The importance of taking into account the ‘data environment’ has been highlighted, as it is “the set of all possible data that might be linked to a given dataset”. This consists of four elements: other data, data users, governance processes and infrastructure. 

Therefore, the notion of functional anonymisation best ties together the ideas of disclosure risk and the data environment. By applying it, we reduce the risk of re-identification to an acceptably low level through controls on both the data and its environment. The result is a practical framework that delivers the desired benefits without compromising the concept of information privacy.
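
As a toy illustration of risk being relative to the environment, the sketch below checks the smallest equivalence class over the quasi-identifiers (the standard k-anonymity measure) against a threshold that tightens as the environment becomes more open. The environment labels, thresholds and records are all hypothetical.

```python
from collections import Counter

# Toy functional-anonymisation check: the acceptable re-identification
# risk depends on the data environment, not on the data alone.
# Environment labels and k-thresholds are hypothetical.
K_REQUIRED = {"secure_enclave": 3, "trusted_researchers": 5, "open_web": 20}

def min_equivalence_class(rows, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values())

def safe_to_release(rows, quasi_identifiers, environment):
    return min_equivalence_class(rows, quasi_identifiers) >= K_REQUIRED[environment]

rows = [
    {"postcode": "SW1A", "age": "30-39"},
    {"postcode": "SW1A", "age": "30-39"},
    {"postcode": "SW1A", "age": "30-39"},
    {"postcode": "NW1", "age": "40-49"},
    {"postcode": "NW1", "age": "40-49"},
    {"postcode": "NW1", "age": "40-49"},
]

# The same dataset can be anonymous in one environment and not in another.
print(safe_to_release(rows, ["postcode", "age"], "secure_enclave"))  # True
print(safe_to_release(rows, ["postcode", "age"], "open_web"))        # False
```

Real controls go beyond a single k-threshold (governance, contracts and infrastructure all enter the judgement), but the shape of the decision is the same: same data, different environment, different answer.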

 

Footnotes

[1] Luciano Floridi (2015) The Onlife Manifesto, Springer, Cham, DOI https://doi.org/10.1007/978-3-319-04093-6

[2] https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/?sh=1bfe9edc60ba

[3] https://www.ibm.com/blogs/services/2020/05/28/how-to-manage-complexity-and-realize-the-value-of-big-data/

[4] Nadezhda Purtova (2018) The law of everything. Broad concept of personal data and future of EU data protection law, Law, Innovation and Technology, 10:1, 40-81, DOI: 10.1080/17579961.2018.1452176

[5] Case C-101/01 Lindqvist ECLI:EU:C:2003:596

[6] Case C‑212/13 Ryneš ECLI:EU:C:2014:2428

[7] Case C-582/14 Breyer ECLI:EU:C:2016:779

[8] Mark Elliot, Kieron O’Hara, Charles Raab, Christine M. O’Keefe, Elaine Mackey, Chris Dibben, Heather Gowans, Kingsley Purdam, Karen McCullagh, Functional anonymisation: Personal data and the data environment, Computer Law & Security Review, Volume 34, Issue 2, 2018, Pages 204-221, ISSN 0267-3649, https://doi.org/10.1016/j.clsr.2018.02.001.