Stephen E. Arnold: Big Data and Data Quality — Plus Robert Steele on the Role of the University

IO Impotency
Stephen E. Arnold

Big Data Should Mean Big Quality

Posted: 28 Aug 2014 03:28 AM PDT

Why does logic seem to fail in the face of fancy jargon? DataFusion’s Blog addresses the jargon fallacy in the post “It All Begins With Data Quality.” The post explains that, amid new terms like big data, real-time analytics, and self-service business intelligence, the fundamentals that make this technology work are forgotten. Cleansing, data capture, and governance form the foundation for data quality. Without data quality, big data software is useless. According to a recent Aberdeen Group study, data quality was ranked as the most important data management function.

Data quality also leads to other benefits:

“When examining organizations that have invested in improving their data, Aberdeen’s research shows that data quality tools do in fact deliver quantifiable improvements. There is also an additional benefit: employees spend far less time searching for data and fixing errors. Data quality solutions provided an average improvement of 15% more records that were complete and 20% more records that were accurate and reliable. Furthermore, organizations without data quality tools reported twice the number of significant errors within their records; 22% of their records had these errors.”

Data quality tools save man-hours, uncover errors that would otherwise be missed, and delete duplicate records. The Aberdeen Group’s study also revealed that poor data quality is a top concern. Organizations should deploy a data quality tool, so they too can take advantage of its many benefits. It is a logical choice.
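
For readers who want to see what "complete records" and "duplicate records" mean in practice, here is a minimal sketch in Python. It is not taken from the Aberdeen study or from any particular data quality product. The function names `completeness` and `deduplicate`, the field names, and the sample records are all hypothetical, and the matching rule (trim whitespace, ignore case) is an assumption chosen for illustration.

```python
from typing import Dict, List, Optional

# A record is a simple mapping of field name to value (hypothetical shape).
Record = Dict[str, Optional[str]]


def completeness(records: List[Record], required: List[str]) -> float:
    """Return the fraction of records in which every required field is non-empty."""
    if not records:
        return 0.0
    complete = sum(
        1 for r in records
        if all((r.get(field) or "").strip() for field in required)
    )
    return complete / len(records)


def deduplicate(records: List[Record], key_fields: List[str]) -> List[Record]:
    """Keep the first record for each normalized key and drop later duplicates.

    Normalization here is an assumption: values are trimmed and lowercased.
    """
    seen = set()
    unique = []
    for r in records:
        key = tuple((r.get(field) or "").strip().lower() for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


# Made-up sample data, for illustration only.
records = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "ada lovelace ", "email": "ADA@example.com"},  # duplicate after normalization
    {"name": "Alan Turing", "email": None},                 # incomplete: no email
]

share_complete = completeness(records, ["name", "email"])
cleaned = deduplicate(records, ["name", "email"])
```

Real data quality tools go much further (fuzzy matching, validation against reference data, governance workflows), but even checks this small make the share of incomplete and duplicate records visible.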

Whitney Grace, September 02, 2014
Sponsored by, developer of Augmentext

Robert Steele

ROBERT STEELE: There are some very important universities that eschew “commercial” applications, among which “big data” is included. This is in my view an error, in part because the role of the university is to ensure data quality and data replicability and, in the ideal, data exploitability and integrability across all disciplinary boundaries. This should include geospatial tagging of all data at all points in the time spectrum for any given behavior, policy, product, or service. The university is the data incubator for society. It is the university that should be setting open source everything standards for open data, such that the commercial world has no excuse for its deep and broad failures in the big data arena, failures that will persist for decades into the future absent the adoption of the Open Source Everything meme and mind-set.

See Also:

Big Data @ Phi Beta Iota

Open Source Everything Home Page