
Someone is once again raining on the big data parade, urging us to consider carefully before jumping on the bandwagon. FT Magazine warns, “Big Data: Are We Making a Big Mistake?” Writer Tim Harford points to Google’s much-lauded Google Flu Trends as an emblematic example in the field. That project notes an increase in certain search terms, like “flu symptoms” or “pharmacies near me”, by point of origin. With those data points, its algorithm extrapolates the spread of the disease. In fact, it does so with only one day’s delay, compared to a week or more for the CDC’s analysis based on doctors’ reports.
The thing is, this successful project is also an example of the blind faith many are putting into the results of data analysis. The scientists behind it aren’t afraid to admit they don’t know which search terms are most fruitful or how, exactly, the algorithm is constructing its correlations; it’s all about the results. Correlation over causation, as Harford puts it. However, Google Flu Trends hit a speed bump in 2012: it greatly overestimated the flu’s spread, unnecessarily alarming the public. Correlation is much, much easier to determine than causation, but we must not let ourselves believe it is just as good.
The article cautions:
“Cheerleaders for big data have made four exciting claims, each one reflected in the success of Google Flu Trends: that data analysis produces uncannily accurate results; that every single data point can be captured, making old statistical sampling techniques obsolete; that it is passé to fret about what causes what, because statistical correlation tells us what we need to know; and that scientific or statistical models aren’t needed because, to quote ‘The End of Theory’, a provocative essay published in Wired in 2008, ‘with enough data, the numbers speak for themselves’.
“Unfortunately, these four articles of faith are at best optimistic oversimplifications. At worst, according to David Spiegelhalter, Winton Professor of the Public Understanding of Risk at Cambridge university, they can be ‘complete bollocks. Absolute nonsense.’”
Another quote from Spiegelhalter summarizes the problem with letting ourselves be seduced by big data’s promise of certainty: “There are a lot of small data problems that occur in big data. They don’t disappear because you’ve got lots of the stuff. They get worse.” The article goes on to discuss in detail the statistical flaws behind big data’s promises. It is an important read for anyone facing the alluring shimmer of the big data trend.
Cynthia Murrell, April 25, 2014
Sponsored by ArnoldIT.com, developer of Augmentext