De-anonymising public data
Jinfo Blog

9th February 2010

By Anne Jordan

“De-anonymise” does not yet appear in the Oxford English Dictionary, but it is a word the editors of that venerable authority on the English language may want to review for addition. I recently came across the word in an article about how criminals may misuse public data to discover personal-level data, potentially limiting the roll-out of recent government data initiatives.

Criminal activity on the web is nothing new. Last weekend the website of Tata Consultancy Services, one of India’s largest software and services companies, was hacked and a “For Sale” message posted. Google and Twitter have also suffered breaches in the past. Whether carried out as a joke, for political ends or for criminal purposes, these events can be embarrassing and potentially commercially damaging.

A recent article has reported another risk for website owners to guard against, particularly those hosting public data sets, such as the UK’s local and national data initiatives, the Greater London Authority’s Datastore and the UK government’s data.gov.uk. These have launched since the New Year and were welcomed in LiveWire postings by myself and Michele Bate at http://digbig.com/5bbbmn and http://digbig.com/5bbbmq.

The article in The Guardian (http://digbig.com/5bbbnr) looks at how statistical “de-anonymisation” techniques might limit the roll-out of such public data initiatives. Computer scientists in the US have discovered ways to “re-identify” the names of people included in supposedly anonymous datasets. The example cited involves a movie rental company, but there are more serious implications. The discovery that lists can be “de-anonymised” needs to be included in the debate about how information is released and where to draw the line.

Dr Ian Brown, of the Oxford Internet Institute, believes the discovery raises concerns about initiatives such as data.gov.uk. He says: “They are looking at releasing crime reports down to street level. You have to think about how people might be able to link that back to individuals.”
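
For readers who want to see what “linking back to individuals” can look like in practice, here is a minimal Python sketch. It is not the method used by the US researchers mentioned in the article. It is a simple illustration of the general idea, assuming an attacker can match shared fields between a dataset with names removed and a second, public list that carries names. Every record, field name and function below is hypothetical and invented for illustration.

```python
# Hypothetical "anonymised" release: names removed, but other details kept.
anonymous_records = [
    {"postcode": "ZZ1 1AA", "birth_year": 1970, "gender": "F", "incident": "burglary"},
    {"postcode": "ZZ1 1AA", "birth_year": 1985, "gender": "M", "incident": "vehicle theft"},
    {"postcode": "ZZ2 2BB", "birth_year": 1970, "gender": "F", "incident": "criminal damage"},
]

# Hypothetical public list (for example, a directory) that does carry names.
public_records = [
    {"name": "Person A", "postcode": "ZZ1 1AA", "birth_year": 1970, "gender": "F"},
    {"name": "Person B", "postcode": "ZZ1 1AA", "birth_year": 1985, "gender": "M"},
    {"name": "Person C", "postcode": "ZZ2 2BB", "birth_year": 1970, "gender": "F"},
    {"name": "Person D", "postcode": "ZZ2 2BB", "birth_year": 1970, "gender": "F"},
]

# Fields that appear in both datasets and can act as a joint "fingerprint".
QUASI_IDENTIFIERS = ("postcode", "birth_year", "gender")


def key(record):
    """Build the combination of shared fields used to match records."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymous, public):
    """Return (name, incident) pairs where exactly one named person matches."""
    names_by_key = {}
    for person in public:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = []
    for record in anonymous:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # a unique match means the record is re-identified
            matches.append((candidates[0], record["incident"]))
    return matches


matches = reidentify(anonymous_records, public_records)
for name, incident in matches:
    print(f"{name} is linked to: {incident}")
```

In this made-up data the first two records each match exactly one named person, so they are re-identified. The third matches two people and stays ambiguous. The fewer people who share the same combination of details, the easier the link becomes, which is why releasing data at a very fine level, such as individual streets, raises the concern Dr Brown describes.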