Privacy By Design – The Secret Inside the Internet
by Peter
Shush – I’m going to tell you a secret. It’s one that no one has paid attention to for over a decade now. And like most secrets, it’s been hiding in plain sight. The story starts years ago, but we can skip most of that, and begin in June of 1999, when the HTTP/1.1 standard for the Internet (RFC 2616) was finalized.
The Internet is essentially the one ring that binds us all. We can’t remember life before the Internet, and we certainly can’t imagine life without it. So what secrets does it hold that haven’t been exposed already?
Well, let’s start with the abstract and see if there’s anything obvious…
The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol, which can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.
Nothing appears to jump out at me, but wait, maybe there is something? Notice in the second sentence that the protocol can be used for many tasks beyond hypertext. That’s interesting – it’s indicating that it’s an “extensible” protocol, i.e. we can add things to it. Which of course raises the question: “what can we add, and how can we add it?”
Well, the text says that we can use its request methods (not sure what those are), its error codes (doesn’t sound too exciting) and its headers – now that sounds like it could be interesting. A header is “data”, so that means I can add new data to the protocol that touches everything on the planet.
Interesting – but we’d better check to see if there are any gotchas.
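To make that concrete, here is a minimal Python sketch of what an extra header looks like inside an HTTP/1.1 request. The header name X-Example-Context, its value, and the helper build_request are all made up for illustration; nothing in RFC 2616 defines them.

```python
def build_request(host, path, extra_headers):
    """Build the text of a simple HTTP/1.1 GET request.

    extra_headers is a dict of header names to values. These are
    hypothetical extension headers, not anything RFC 2616 defines.
    """
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in extra_headers.items():
        # Each header is just one more "Name: value" line in the request.
        lines.append(f"{name}: {value}")
    # A blank line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


# X-Example-Context is an invented header name, used only as an example.
request_text = build_request(
    "www.example.com", "/", {"X-Example-Context": "device=phone"}
)
print(request_text)
```

The point is only that a header is one extra line of text in the request, so adding data this way does not change the rest of the message.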
Oh dear – this document goes on forever and as I get down close to Section 12.1 I spy what could be trouble for the header idea.
Section 12.1 talks about Server-driven Negotiation. (Fancy talk for a Web server deciding which version of a response to send back, based on what the browser’s request tells it.)
This section states…
If the selection of the best representation for a response is made by an algorithm located at the server, it is called server-driven negotiation. Selection is based on the available representations of the response (the dimensions over which it can vary; e.g. language, content-coding, etc.) and the contents of particular header fields in the request message or on other information pertaining to the request (such as the network address of the client).
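As a rough illustration of that passage, here is a small Python sketch of a server picking a language based on the Accept-Language request header. The function names and the sample greetings are my own assumptions, and the parsing is deliberately simplified compared with what a real server does.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language value, most preferred first."""
    prefs = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat an unreadable q-value as "not wanted"
        prefs.append((tag.strip().lower(), quality))
    # Highest quality first; ties keep the order the browser sent them in.
    prefs.sort(key=lambda pref: pref[1], reverse=True)
    return [tag for tag, quality in prefs if quality > 0]


def choose_representation(available, accept_language, default):
    """Pick the key of the best available representation for this request."""
    for tag in parse_accept_language(accept_language):
        if tag in available:
            return tag
        base = tag.split("-")[0]  # fall back from "en-gb" to "en"
        if base in available:
            return base
    return default


# Hypothetical representations of the same page, keyed by language.
available = {"en": "Hello", "fr": "Bonjour", "de": "Hallo"}
chosen = choose_representation(available, "da, fr;q=0.8, en;q=0.5", default="en")
print(chosen, available[chosen])
```

The server only knows what the request headers tell it, which is exactly where the next part of the section finds trouble.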
Section 12.1 then goes on to state…
Server-driven negotiation has disadvantages:
- It is impossible for the server to accurately determine what might be “best” for any given user, since that would require complete knowledge of both the capabilities of the user agent and the intended use for the response (e.g., does the user want to view it on screen or print it on paper?).
- Having the user agent describe its capabilities in every request can be both very inefficient (given that only a small percentage of responses have multiple representations) and a potential violation of the user’s privacy.
- It complicates the implementation of an origin server and the algorithms for generating responses to a request.
Crikey (as they say down under), this does not look too promising. It appears that if we add header data we can violate people’s privacy and slow down the Web server. Worst of all, the first item above says that the server can never know enough to determine what would be best for the user.
So, Dear Reader, it appears that what I first thought was a secret is not really a secret at all – or is it?
What if…
- You sent enough data so that the server could accurately determine what might be best for any given user without slowing it down
- You encrypted the user’s private data
- You made it very simple for the origin server to generate a response
Well, then Section 12.1’s disadvantages wouldn’t be disadvantages anymore, and in fact could become “advantages”. And you would have just discovered a way to extend the Internet’s core protocol, HTTP, so that it supported more data.
And that, Dear Reader, is the Privacy By Design secret inside the Internet. The very protocol that binds us can accept new data that allows it to become “contextually aware” – in essence making it smarter about who we are, what device we’re using and where we are. It also allows us to encrypt that data to ensure privacy, and to send it in such a way that the servers of today can easily handle those extra 1,000 bytes or so of data.
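Here is a minimal Python sketch of the idea, under some loud assumptions. The header name X-Example-Context, the context fields, and the functions seal_context and open_context are all invented for illustration. The one-time pad is only a stand-in so the sketch runs with the standard library alone; a real deployment would use an authenticated cipher from a vetted library, and how the server obtains the key is not addressed here.

```python
import base64
import json
import secrets

HEADER_NAME = "X-Example-Context"  # hypothetical header name
SIZE_BUDGET = 1000                 # the "1,000 bytes or so" mentioned above


def seal_context(context):
    """Encrypt a context dict and return (header_value, key).

    Assumption: a one-time pad stands in for a real cipher. The key is
    random, as long as the message, and must never be reused.
    """
    plaintext = json.dumps(context, sort_keys=True).encode("utf-8")
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    # Base64 keeps the encrypted bytes safe to carry in a text header.
    return base64.urlsafe_b64encode(ciphertext).decode("ascii"), key


def open_context(header_value, key):
    """Reverse seal_context on the receiving side."""
    ciphertext = base64.urlsafe_b64decode(header_value)
    plaintext = bytes(c ^ k for c, k in zip(ciphertext, key))
    return json.loads(plaintext.decode("utf-8"))


# Hypothetical context: who we are, what device we're using, where we are.
context = {"device": "phone", "locale": "en-AU", "location": "Sydney"}
value, key = seal_context(context)

header_line = f"{HEADER_NAME}: {value}\r\n"
header_size = len(header_line.encode("ascii"))
print(header_size, "bytes; within budget:", header_size <= SIZE_BUDGET)
print(open_context(value, key) == context)
```

For a small context like this one the header stays far below the budget, so the extra data is one short, unreadable line in each request.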
In my next blog I’m going to discuss how you can use this secret to take a contextual approach to online privacy.