Redefining Big Data

If we're going to nominate a buzzword for 2015, big data is already a major contender for the title. It seems to be all the rage these days -- the promises are huge -- yet very few businesses are leveraging big data well. The challenge that most business leaders face with big data isn't the numbers or even the analysis; at its core, the math is really not that complicated. The failure lies in the underlying approach to big data. Perhaps the main reason "big data" has been so misconstrued is semantics.

When we think about big data, we think numbers, not words. We build data warehouses to collect these numbers and then we buy tools. Oh, the tools! We need tools to "dissect the data." And then we use these tools to build charts and reports. On a good week, these reports get highlighted and circulated around the office. On a bad week, they get buried in a pile of emails. Either way, none of it ever seems to matter. We are great at aggregating data and then professionals at ignoring it.

There are several definitions of the word big. With big data, we most often define it as large. Digital media was built on these large data metrics: "We can track it." We started tracking every parameter we could -- clicks, page views, time on site -- to "prove" what worked, but the problem was that we never proved anything; we were simply able to show what happened. So when social media emerged and business leaders wanted our metrics, our only rebuttal was, "You don't understand..." Today, we can put sensors on anything and are collecting data at an accelerating rate, yet the same problem remains: we're accumulating data, but we aren't gaining any knowledge.

In order for us to shift our thinking on big data, to make it truly useful, we must take another definition of big: significant. Large data produces charts and graphs; significant data tells a story that can transform how we do business. The best part is that finding significance in data does not require a large quantity of data; it just requires the right data. In order to collect significant data, we must start by knowing the question we want to answer with it. Specific questions. What are we trying to learn?

Now some will reject this notion. These are the individuals who tag everything with the mentality that they will go back and find a gem of insight in it someday: the data miners. At a recent conference, I asked an audience of business leaders how many of them had built expensive data warehouses filled with numbers that no one will ever look at; every hand in the room went up. Beyond the waste, this approach to data mining falls victim to semantics once again. As Mick McWilliams, SVP of LRW, puts it, "Data mining should really be called 'knowledge mining'. The analogy for data mining is more like dirt mining." He goes on to explain, "Most of big data is big noise," filled mostly with dirt and fool's gold.

There are two major problems with this "tag everything" approach. First, with large quantities of data, savvy analysts can "create" any outcome they'd like, a problem when you're searching for unbiased answers. Second, while data mining (or knowledge mining) can be a fruitful method for analysis when done correctly, if you don't start with a specific end objective in mind, you'll inevitably be making decisions based on data that is, at best, inconsistent. Bad information is cheap to acquire but expensive to use.
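
The first problem is easy to demonstrate. The following is a minimal sketch, not a rigorous analysis: it assumes 1,000 hypothetical metrics that are pure random noise, splits each into two groups that have no real difference between them, and counts how many comparisons still clear the usual 5% significance threshold. The function name, group sizes, and seed are all illustrative choices.

```python
import random
import statistics


def count_spurious_findings(num_metrics=1000, group_size=100, seed=0):
    """Count metrics that look 'significant' even though every value is noise."""
    rng = random.Random(seed)
    spurious = 0
    for _ in range(num_metrics):
        # Both groups are drawn from the same distribution,
        # so any difference between them is due to chance alone.
        group_a = [rng.gauss(0, 1) for _ in range(group_size)]
        group_b = [rng.gauss(0, 1) for _ in range(group_size)]

        diff = statistics.mean(group_a) - statistics.mean(group_b)
        std_err = (
            statistics.variance(group_a) / group_size
            + statistics.variance(group_b) / group_size
        ) ** 0.5

        # |z| > 1.96 is the conventional two-sided 5% cutoff.
        if abs(diff / std_err) > 1.96:
            spurious += 1
    return spurious


found = count_spurious_findings()
print(f"{found} of 1000 noise-only metrics look 'significant'")
```

Roughly one comparison in twenty should pass the cutoff by chance, so an analyst who tags everything and searches afterward will always have a few "findings" to choose from.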

To get the most out of your big data, and to create a foundation of knowledge that can be leveraged to inform future work, start by identifying specific questions that you want to answer. Does including the month in our newsletter subject line increase conversion rate? Does the use of text buttons or icons improve time on site? These types of specific questions will help you establish what you need to measure, as well as how to track variables in a way that ensures you're actually measuring what you think you are. Start simple, document and socialize learnings, then build upon them.
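
As an illustration of starting from a specific question, the newsletter subject line example could be checked with a simple two-proportion z-test. This is a sketch under stated assumptions: the function name is hypothetical, the send and conversion counts are invented purely for the example, and it assumes recipients were randomly split between the two subject lines.

```python
import math


def compare_conversion(conv_control, sent_control, conv_variant, sent_variant):
    """Two-proportion z-test: did the variant subject line convert differently?"""
    rate_control = conv_control / sent_control
    rate_variant = conv_variant / sent_variant

    # Pooled rate under the assumption that the subject line makes no difference.
    pooled = (conv_control + conv_variant) / (sent_control + sent_variant)
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / sent_control + 1 / sent_variant)
    )

    z = (rate_variant - rate_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_control, rate_variant, z, p_value


# Made-up counts: 5,000 emails per subject line, with and without the month.
control, variant, z, p = compare_conversion(100, 5000, 150, 5000)
print(f"control={control:.2%} variant={variant:.2%} z={z:.2f} p={p:.4f}")
```

Because the question was fixed before any data was collected, there is exactly one comparison to make, and only two things need to be tracked: how many emails went out under each subject line and how many of them converted.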

Today, we can build smart sensors the size of human red blood cells, and near-ubiquitous Internet connectivity makes data aggregation easier and faster than ever. We can truly measure anything we want. When we stop seeking data and start seeking knowledge, we can shift the question from "what happened" to "what do we want to know?" And that is big.