Big data is suddenly everywhere. Everyone seems to be collecting it, analyzing it, making money from it and celebrating (or fearing) its powers. Whether we’re talking about analyzing zillions of Google search queries to predict flu outbreaks, or zillions of phone records to detect signs of terrorist activity, or zillions of airline stats to find the best time to buy plane tickets, big data is on the case. By combining the power of modern computing with the plentiful data of the digital era, it promises to solve virtually any problem (crime, public health, the evolution of grammar, the perils of dating) just by crunching the numbers.
Or so its champions allege. “In the next two decades,” the journalist Patrick Tucker writes in the latest big data manifesto, “The Naked Future,” “we will be able to predict huge areas of the future with far greater accuracy than ever before in human history, including events long thought to be beyond the realm of human inference.” Statistical correlations have never sounded so good.
Is big data really all it’s cracked up to be? There is no doubt that big data is a valuable tool that has already had a critical impact in certain areas. For instance, almost every successful artificial intelligence computer program in the last 20 years, from Google’s search engine to the I.B.M. “Jeopardy!” champion Watson, has involved the substantial crunching of large bodies of data. But precisely because of its newfound popularity and growing use, we need to be levelheaded about what big data can — and can’t — do.
The first thing to note is that although big data is very good at detecting correlations, especially subtle correlations that an analysis of smaller data sets might miss, it never tells us which correlations are meaningful.
Second, big data can work well as an adjunct to scientific inquiry but rarely succeeds as a wholesale replacement.
Third, many tools that are based on big data can be easily gamed.
Fourth, even when the results of a big data analysis aren’t intentionally gamed, they often turn out to be less robust than they initially seem.
A fifth concern might be called the echo-chamber effect, which also stems from the fact that much of big data comes from the web.
A sixth worry is the risk of too many correlations.
Seventh, big data is prone to giving scientific-sounding solutions to hopelessly imprecise questions.
Finally, big data is at its best when analyzing things that are extremely common, but often falls short when analyzing things that are less common.
Wait, we almost forgot one last problem: the hype.
Phi Beta Iota: Big data is only as good as the holistic analytic model, the assumptions, and the true cost economics integrated into its collection, processing, and machine-enabled analysis. This is to say, most big data is thin gruel.