We leave an increasingly detailed digital footprint. Ric Parkin worries who can see it.
What have you been doing in the last week? Month? Year? Obviously you’ll have some vague idea, but not all that detailed.
Let’s try just today. Waking up to the phone alarm, you stream the morning news via iPlayer, before checking your email and any missed calls. A commute to work might involve using your Oyster card, or perhaps a trip on a motorway through roadworks with average speed cameras. Work is pretty hectic, but you need to look up some things on Stack Overflow and look at an RSA algorithm on GitHub, and you manage to make time at lunch to transfer some money between accounts and book a doctor’s appointment via the practice website. At least your Amazon delivery arrived today, although you had to sign for it. After work, you text a few friends and meet up in a local pub you found after searching on TripAdvisor (after phoning your partner to check it’s okay of course!), which you pay for with a credit card. Once you get home, you check Facebook and your Twitter account, both of which you’ve updated during the day, and upload some photos of the sunset you saw this evening, automatically tagged with the time and location. A quick read of some news and politics websites, and it’s time for bed.
This is a pretty normal scenario for our lives, day in, day out. And most of it will go via an electronic network at some point.
Now imagine that someone could see every single one of those transactions. Not necessarily the detail, but just the basic fact that something happened between you and the other end. I think you could get a pretty good idea of the shape of someone’s life: what they do, where they go, who their friends are, what their opinions are.
This could be seen as a touch paranoid, but I have some good reasons to think this is feasible – one of my early jobs involved working on a system to build and visualise networks of information for law enforcement purposes, and a major tool was importing phone call logs to make connections between people. Another part implemented the legal data trail, recording the justification for asking for such data and obtaining it from the telecoms companies. This was a long time ago though, and since then internet connections, mobile phone data (including mast and Wi-Fi connections), unsecured web searches (and even secured ones are often logged by the engine), email data, and webcam traffic can all be looked at. See Figure 1 for examples of various visualisation techniques that are used to make sense of these sorts of networks.
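To give a feel for how little is needed, here is a minimal sketch of turning call metadata into a network of connections. It is not the system described above: the record layout, the names, and the functions `build_graph` and `within_hops` are all invented for illustration. It only uses who called whom, never what was said.

```python
from collections import Counter, defaultdict

# Hypothetical call records: (caller, callee, duration in seconds).
CALLS = [
    ("alice", "bob", 120),
    ("bob", "alice", 45),
    ("alice", "carol", 300),
    ("carol", "dave", 60),
    ("eve", "dave", 30),
]


def build_graph(calls):
    """Return an undirected contact graph and the number of calls per pair."""
    neighbours = defaultdict(set)
    pair_counts = Counter()
    for caller, callee, _duration in calls:
        neighbours[caller].add(callee)
        neighbours[callee].add(caller)
        # frozenset ignores direction, so a->b and b->a count as the same link.
        pair_counts[frozenset((caller, callee))] += 1
    return neighbours, pair_counts


def within_hops(neighbours, start, max_hops):
    """Breadth-first search mapping each reachable person to their hop count."""
    distance = {start: 0}
    frontier = [start]
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for person in frontier:
            for contact in neighbours.get(person, ()):
                if contact not in distance:
                    distance[contact] = hop
                    next_frontier.append(contact)
        frontier = next_frontier
    return distance


neighbours, pair_counts = build_graph(CALLS)
print(within_hops(neighbours, "alice", 2))
```

With these made-up records, dave ends up two hops from alice despite never having called her, simply because both have spoken to carol.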
And it has been looked at. Not just with a warrant, as in my naive youth, but through wholesale collection. As we’ve found out via the Snowden leaks, mass collection of data is now routine [NSA, GCHQ]. Sure, most of the ‘uninteresting’ stuff will get thrown away pretty soon as it costs too much to look at, and most people aren’t of much interest, but it is still collected, and collecting it is seen as a normal thing to do. And who knows what is of interest now, or what will be in the future? Even old-fashioned intelligence gathering can be remarkably intrusive [Cambridge].
But what of the oft-repeated argument that you have nothing to fear if you’ve nothing to hide? Well, that depends, doesn’t it? Think of investigative journalists, online legal spats such as Simon Singh and the British Chiropractic Association, climate scientists getting hacked for political purposes, people stalked by new lovers or abused by old, adopted children tracked down by their unstable relatives, and phone hacking scandals. And what if you knew someone who was of interest? Your life has just become of interest itself. And let’s not forget when the rules are ignored – there are many known instances of the system being abused by those on the inside.
In the past you could still be tracked if someone really wanted to, but it took time, effort, people, and money. Now it is much, much easier and cheaper, and we now know that the security services have been doing it for quite a while. The lack of oversight of these activities, and even complicity in them, by the people who are meant to make sure these efforts don’t go too far is troubling. Many people are on the committees because of their expertise, but they have that expertise because they have close connections to the organisations they are meant to oversee! And they often say they cannot tell us why they’ve agreed to such behaviour because the evidence of why it is useful is itself secret. But how do we trust them? Some level of access is needed, but we must balance that against the right to privacy and the potential for abuse.
What other sorts of data security issues can we think of?
Another potential worry is the NHS care.data database. In theory, this is a great idea [Goldacre] – we already get a lot of data from looking at the huge numbers of hospital visits, and good analysis can give us valuable insights into the efficacy of procedures, and identify potentially dangerous hospitals by seeing which operations have a higher than expected mortality or problem rate. By combining this with data from GP visits, we can gain even more insights, especially about how interventions work, and detect rare side effects. But the rollout has been badly handled, with poor communication and, worse, a poor understanding of people’s concerns that, even with anonymisation, it can still be easy to identify people and their medical history. Then there were the other purposes to which this data could be put, including potentially being sold to insurance companies – while having better actuarial data could be useful, being able to tailor insurance to the individual surely defeats the purpose of insurance, which is the pooling of risk. Thankfully, we are assured, this will never happen, and yet somehow it does [Sold]. This should give us pause for thought [Goldacre2], and has already led to a welcome improvement in transparency [Publish].
At least you’re not broadcasting when you’re out of your home. Yet. With the upcoming rollout of smart meters [SmartMeters] we are potentially doing just that. While the idea of more accurate metering and billing is attractive, not to mention pooling data to enable the energy grid to be better managed, there is a danger that the times you are in or out can be inferred. For example, if the meter uploads every half-hour then it is easy to work out people’s work hours, and even if you choose to update only occasionally, long holidays can be spotted [Frequency]. How can this sort of issue be avoided, while keeping the benefits? One idea would be to have the meters upload data that can be traced to you only when absolutely required in order to calculate the bill. At other times lots of detailed data could be sent, but tagged only with broad categories and not anything identifiable, although even then you can often work out quite a lot if you can afford the time and effort. This leads to a principle similar to an important rule in cryptography: make the cost of finding out more than it’s worth. (Talking of cryptography, it seems that even it may not be enough to be secure, as adding backdoors can weaken it for everybody [Backdoor].)
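The half-hourly worry is easy to demonstrate. The following is a rough sketch only: the readings, the 0.08 kWh threshold, and the function names are invented for illustration, and a real household would be noisier. It picks out stretches of near-baseline usage from one day of half-hourly data, then shows the single figure a supplier would actually need for billing.

```python
# Hypothetical half-hourly readings in kWh for one day (48 slots from 00:00):
# overnight, a morning peak, baseline while out, an evening peak, bedtime.
READINGS = [0.1] * 13 + [0.6] * 4 + [0.05] * 19 + [0.8] * 10 + [0.1] * 2


def slot_to_time(slot):
    """Convert a half-hour slot index to an HH:MM string."""
    minutes = slot * 30
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def quiet_periods(readings, threshold):
    """Return (start, end) times of consecutive runs of slots below threshold."""
    periods = []
    run_start = None
    for slot, kwh in enumerate(readings):
        if kwh < threshold:
            if run_start is None:
                run_start = slot
        elif run_start is not None:
            periods.append((slot_to_time(run_start), slot_to_time(slot)))
            run_start = None
    if run_start is not None:
        # The run continued to the end of the day.
        periods.append((slot_to_time(run_start), slot_to_time(len(readings))))
    return periods


def billing_total(readings):
    """The one number needed to calculate the bill for the period."""
    return sum(readings)


print(quiet_periods(READINGS, threshold=0.08))  # [('08:30', '18:00')]
print(round(billing_total(READINGS), 2))        # 12.85
```

The detailed series gives away the working day at a glance, while the total used for billing says nothing about when anyone was at home.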
These sorts of situations are becoming more and more frequent, and most people do not have the time, knowledge, or tools to adequately evaluate and choose what happens to their data, if they even have a choice. Openness and transparency in data collection and storage are important, as are strong checks and balances, with an independent watchdog able to investigate abuses and issue strong punishments. But even if we manage to build a good regime in our own country, it is only part of the solution, as data is often stored in the cloud, and the datacentres could be in jurisdictions with completely different laws where the hosting company is required to give security agencies access, or the data could be collected at an intervening point. International rules and co-operation are required, with all the problems that implies. But at least some people are thinking about it [Berners-Lee].