Curly Fries and Big Data: An Appetizing Path Forward for Responsible Data Use

This post was inspired by a TED Talk.

In a TEDx MidAtlantic talk, computer scientist Jennifer Golbeck explains how companies use the digital information they obtain about us to make inferences that can be highly predictive.

Sometimes these inferences are logical but rather creepy, as in the now-infamous example of Target, which uses a prediction algorithm that analyzes subtle shopping clues to determine which of its customers are pregnant. Target was pilloried in the media after it sent pregnancy-related coupons to a pregnant girl, revealing the pregnancy to her father, who had not yet heard the news from his daughter.

Other examples may be less creepy but also less intuitive, such as the discovery that having "liked" curly fries on Facebook is a strong predictor of high intelligence.

Before heading off to gorge on fries, readers should understand that this odd correlation is likely due to some smarty pants with a big network of Facebook friends "liking" a page about curly fries. Her friends, who were likely to be smart as well, saw the "like" and were influenced to do the same, which in turn influenced their friends, and so on.

The curly fries case shows how Big Data can be remarkably accurate at describing who does what, yet completely off base if a correlation is read as causation. For example, if Mensa, the high-IQ society, wanted a quick way to reach smart people with an ad inviting them to join, targeting those who have liked curly fries on social media would probably be far more effective than showing a generic ad to all Facebook members. But feeding your kids curly fries will do nothing to improve their chances of getting into Harvard.

Ms. Golbeck voices a concern that a company could assemble Facebook likes and other web browsing data and sell it to other companies for use in making employment decisions. Luckily, we do have laws that strictly regulate the kinds of information that can be used in background checks on employees, as well as the use of data for health insurance, credit, housing, and many other areas. Still, there are gaps in protection that have policymakers and privacy advocates scrambling for legal solutions that can better protect civil rights in this age of Big Data.

In the wake of its Big Data report, issued in early May, the Administration is collecting formal public comments this month and next on how the proposed Consumer Privacy Bill of Rights can protect online users while continuing to allow beneficial, innovative uses of data.

But as Ms. Golbeck makes clear, legislation is likely to be only a small part of the solution.

At the Future of Privacy Forum, we have been strong advocates for a range of solutions that can advance the transparency and control needed to put users back in charge of their digital data.

By urging companies to give users more access to their own data and to be more transparent about the purposes of the algorithms they use, we can bring more scrutiny to bear on what companies are doing behind the scenes.

Furthermore, by allowing social media posts to be expunged over time, companies can ensure that the data we put up on social media sites isn't around forever to be used against us. My blog posts are intended to be permanent, but other digital footprints, such as log files recording my web browsing history, or my tweets, could be treated as ephemeral and allowed to fade over time. The rise of ephemeral messaging tools such as Snapchat, Frankly and many others shows that a growing number of us want the ability, even if imperfect, to signal that some of our actions are trivial and shouldn't be kept for posterity.

The timely and relevant question, then, is whether companies and scientists should even be studying this "leave-behind" data in the first place.

Let's return to the curly fries example, where an odd correlation was most likely explained by "homophily," the tendency of people to associate with others who are similar to them. Other correlations could point researchers to clues that are genuinely meaningful. Do the people who develop a certain disease share some history that could be scientifically significant? Can we predict famines, earthquakes or other disasters by finding early, seemingly meaningless clues in data that turn out to be actionable under scrutiny? Can we ferret out hidden discrimination that is preventing qualified people from advancing? These are just a few examples of the benefits that responsible use of Big Data could bring.

Big Data could lead to the greatest advances society has seen in generations. Or it could take us down a path of poor decisions and increased discrimination. Eating curly fries (unfortunately!) won't make us smart enough to make the right decisions, but collaboration among technologists, policymakers, and businesses could. Bringing the right stakeholders together, guided by the right principles, is the foundation for the next steps in the Big Data arena.

We want to know what you think. Join the discussion by posting a comment below or tweeting #TEDWeekends. Interested in blogging for a future edition of TED Weekends? Email us at