This post first appeared in the CIOReview magazine in Nov 2015…
On Monday, 24 Aug 2015, 1 in 7 people on Earth used Facebook to connect with their friends and family. That’s a billion, with a “b”, in a single day.
Every 60 seconds, we send or post 168 million emails, 11 million instant messages, 98,000 tweets and 695,000 Facebook updates. Every minute, more than 300 hours of video are uploaded to YouTube alone.
What does this mean for individuals and organizations? How does “Big Data” shape the world around us? How can we use it to our advantage?
What is “Big Data”?
The term “Big Data” refers to the exponential growth and availability of data, both structured and unstructured, that is difficult to process using traditional (database and software) techniques. Such data, when captured and analysed, can help an organization gain useful insights in areas ranging from customer retention to revenue growth.
In a research report in 2001, industry analyst Doug Laney articulated three defining characteristics of Big Data: 1. Volume, 2. Velocity and 3. Variety…
Volume refers to the fact that Big Data does not sample; it merely observes and tracks what happens. Think of it as N=all.
Wikipedia asserts that “Big Data size is a constantly moving target, ranging from a few dozen terabytes to many petabytes of data”. And it will only continue to grow as we adopt more modes of communication and connectivity.
Velocity implies that Big Data is available in real time, and not after all the action is over.
When scientists first decoded the human genome in 2003, it had taken them nearly a decade to sequence all three billion base pairs, a task that can now be done in a day. The pace at which data is being generated is increasing exponentially. It follows that the pace at which it can be used must keep up.
Variety means that Big Data is available in a variety of formats – structured and unstructured.
Maps are data, and so are GPS coordinates. Tweets are data, so are likes on your friend’s Facebook post. The photos you share on Instagram are data, and so are the sentiments of your fans and followers.
Experts estimate that over 90% of the world’s data was generated over the last two years; more data has been created in the last couple of years than in the entire history of the human race. Some folks like to think of Big Data as data that cannot be handled by an Excel worksheet, while others argue that it’s not so much the actual size of the data, as what you do with it.
Whichever way you look at it, Big Data implies a vast quantity of data that we now have the ability to process in real time.
“Big Data” Thinking
“Big Data” thinking allows you to do things at a large scale that simply cannot be done at a smaller one.
Let’s take the example of Retailing. While brick-and-mortar stores could always keep track of their inventory and know what was selling (and what was not), an online storefront enables the business to not only track what customers are buying, but also what else was considered during the buying process (cookies and server logs), what promotions influenced the outcome (ad impressions and banner optimization) and how other customers’ opinions helped or hurt the purchase (social media and comment analytics). A traditional business could not access such a vast repertoire of information, let alone act on it in a timely manner.
Access to large volumes of data also impacts how we work with it.
Until recently, most of our experiences were based on “small data” thinking, wherein we developed elaborate techniques (like sampling methods) to use as little data as possible. In fact, the need for sampling is a relic of the analogue era – a time when information was too scarce and too expensive to collate. Think census, for instance. By definition, a census is a complex exercise, and one that involves considerable cost and resources. Therefore, it is conducted only rarely.
But, when all of the data is available at hand, our reliance on ‘exactitude’ can take a back seat. More data often means it’s OK for it to be messier, as long as we get a sense of general direction, with more accurate insights at the macro level.
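To make the trade-off concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made up for illustration: the “true” value of 50, the spread between individuals, the size of the measurement noise, and the two sample sizes. It compares the average of a small, carefully measured sample with the average of a very large, noisy data set.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

TRUE_MEAN = 50.0  # hypothetical "true" value we want to estimate
SPREAD = 10.0     # hypothetical natural variation between individuals
NOISE = 20.0      # hypothetical measurement error in the messy data set


def clean_small_sample(n):
    """n carefully measured observations (no measurement error)."""
    return [random.gauss(TRUE_MEAN, SPREAD) for _ in range(n)]


def messy_large_dataset(n):
    """n observations, each with unbiased random noise added."""
    return [random.gauss(TRUE_MEAN, SPREAD) + random.gauss(0.0, NOISE)
            for _ in range(n)]


small = clean_small_sample(30)
large = messy_large_dataset(200_000)

# How far each average lands from the true value.
small_error = abs(statistics.fmean(small) - TRUE_MEAN)
large_error = abs(statistics.fmean(large) - TRUE_MEAN)

print(f"clean sample of 30:        off by {small_error:.3f}")
print(f"messy data set of 200,000: off by {large_error:.3f}")
```

Under these assumptions, the typical error of an average shrinks with the square root of the number of observations: roughly 1.8 for the clean sample of 30, against roughly 0.05 for the messy 200,000. One caveat matters: this only works when the messiness is random. A systematic bias in how the data is collected does not average out, however much of it you gather.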
The Enterprise Challenge
In the enterprise, traditional notions of using data have often involved concepts like data warehouses, management information systems, and extract-transform-load (ETL) operations to make the data more usable for decision making. But that approach does not help Amazon offer recommendations to customers, based on what they (and other customer groups) are adding to their shopping carts, in near real time.
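As a rough illustration of the difference, the Python sketch below keeps a running count of which items show up in the same cart, and updates it the moment a cart is seen rather than waiting for a nightly batch load. This is not a description of how Amazon works; it is a deliberately simple co-occurrence counter, and the function names (record_cart, recommend) and the sample carts are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# co_counts[a][b] = number of carts seen so far containing both a and b
co_counts = defaultdict(Counter)


def record_cart(items):
    """Update the counts as soon as a cart is seen (no batch job)."""
    for a, b in combinations(sorted(set(items)), 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1


def recommend(item, k=2):
    """Return up to k items most often seen alongside `item`."""
    seen_with = co_counts.get(item, Counter())
    return [other for other, _ in seen_with.most_common(k)]


# Hypothetical carts, made up for illustration.
record_cart(["kettle", "tea", "mug"])
record_cart(["tea", "mug"])
record_cart(["tea", "honey"])
record_cart(["kettle", "toaster"])

print(recommend("tea", k=1))  # prints ['mug']
```

Each new cart changes the recommendations immediately, which is the property a warehouse-and-ETL pipeline struggles to offer. A production system would need far more than this (scale, freshness, personalization), but the shape of the idea is the same.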
The fact is that data doesn’t just reside in neat tables inside the enterprise. Mobile devices, sensors, scanners, microphones, GPS devices, programs, software, social media, cameras – almost every electronic device or service around you generates data. There are already more than 4.6 billion mobile-phone users across the globe, with Cisco estimating that there will be 50 billion+ devices and objects (IoT) connected to the cloud by 2020.
That’s why organizations will need to understand their customers in all their dimensions, and learn how to work with the vast amount of data being generated every day.
Consider the example of a website revamp or a new mobile app being developed on behalf of a large enterprise. When organizations start taking their first steps towards gathering data, it is typically in response to a very specific need, and often based on the assumption that building processes and mechanisms to collect, store and analyse data on a large set of parameters will be resource-intensive. Accordingly, they may conclude that it is better to stick to a few parameters that are high priority and of immediate value, rather than build the means to track everything.
But, “Big Data” thinking shows us that the opposite is true – patterns can emerge from large data sets of seemingly trivial information, if only we had enough data along with the means to track and analyse it. And today, we do.
In a “Big Data” world, ‘more’ trumps ‘better’, messiness (in a very, very large data set) trumps high accuracy (from a small sample), and tags yield better insights than taxonomies.
A New, World Order
We may just be at the beginning of a revolution that will shape every one of us, for years to come.
Google Flu Trends already uses search terms, combined with mathematical models, to predict the spread of the flu virus long before the CDC is able to spot it. In Finance, two-thirds of the US equity market is already traded by computer algorithms that crunch enormous amounts of data to predict gains and reduce risk. As more and more organizations gain a better understanding of how the world of Big Data works, they will be in a better position to put it to work.
The digital age has already given us the means to collect, store and analyse enormous quantities of data with ease. We now live in a society that shares more content, from more sources, with more people, more often and more quickly. Cloud computing is now available at our fingertips, offering us access to highly scalable infrastructure at affordable costs.
What’s needed is a change in our thinking. What’s needed is for us to embrace the new, world order.