Data and Inference: They'll Drink Our Milkshakes

In the privacy and data debate over the last two years, the movie There Will Be Blood has often come to mind. Loosely based on Upton Sinclair's novel Oil!, it chronicles the discovery, technology development, land grab and extraction of oil in California during the early twentieth century through the story of a colorful oil tycoon, Daniel Plainview, played by Daniel Day-Lewis.

The idea of major interests taking advantage of a resource whose power is not yet known, whose extraction is a technical mystery to most and whose value depends on proper refinement and application is not new. Work by a team at Carnegie Mellon, presented yesterday at the annual Black Hat cybersecurity conference in Las Vegas, unveiled a modern-day parallel to the early oil industry in the United States: a talk on facial recognition technology and its power to yield sensitive inferential data about users.

Companies are jumping in quickly to lead the pack in the discovery of inferential data. The acquisition of companies specializing in facial recognition is being led by the same familiar faces we have been seeing on Capitol Hill in the thick of the privacy and regulation conversation: Apple, Google and Facebook. They are planning for the very real future in which facial recognition technology is used in ways we once thought belonged in a science fiction flick. These major technology players know that inferential data is critical to their futures and are locating the necessary extraction tools. The "land grab" in this modern case, access to our profiles, pages and accounts, will be easy: we already provide this data for free. What we did not know is that the non-obvious data that can be inferred from this information is the kind of sensitive data we probably never intended to divulge publicly online, like a Social Security number.

At the end of There Will Be Blood, there is a climactic scene where Daniel Plainview explains to his foil, an evangelical preacher, how oil can be extracted from underneath land without having to be physically on the land itself. The preacher in the film, no angel himself, believes a tract of land he wishes to sell has a large monetary value due to the vast reserves of oil it sits on top of. It is a valid assumption, but it is very quickly and harshly destroyed by Plainview when he exclaims, "I drink your milkshake," a reference to testimony from congressional hearings during the Teapot Dome scandal in 1924. Plainview had already extracted the valuable resource from underneath the land without the preacher even knowing. The preacher had assumed that the value of the resource, through its availability, was dependent on his granting of access to it.

In our modern data grab, the shock of our sensitive data being discoverable without our knowing it is disquieting for many reasons. Some arguments in the personal data management conversation have hinged upon users being able to operate in the market and decide how their data can be used, creating a system of incentives for those who want to share data by potentially making it an economic choice. Other discussions have merely centered on users having the simple choice to share data.

That a single photo can be the starting point for discovering sensitive data is frankly scary. We assume that we are managing our digital "domain" by using the appropriate privacy settings and sharing this data only with certain people. The data that could be produced inferentially from the seemingly silly and inconsequential details we do provide on these profiles will likely surprise most users. The lack of seriousness with which some users and critics treat the privacy risks inherent in social network use needs to be revisited.

The warning from this facial recognition work: protect your milkshake.