World Future Society March 11, 2012
Institute for Ethics & Emerging Technologies
Okay. You got me. I can’t really tell you everything you need to know about big data. The one thing I discovered last week, as I joined more than 2,500 data junkies from around the world for the O’Reilly Strata conference in rainy Santa Clara, California, is that nobody can: not Google, not Intel, not even IBM. All I can guarantee you is that you’ll be hearing a lot more about it.
What is big data? Roughly defined, it refers to massive data sets that can be used to predict or model future events. That can include everything from the online purchase history of millions of Americans (to predict what they’re about to buy) to where people in San Francisco are most likely to jog (according to GPS) to Facebook posts, Twitter trends, and 100-year storm records.
Phi Beta Iota: Big data is most important for what it can tell you about true cost and whole system cause and effect, inclusive of political corruption and organizational fraud. These are past and present issues, not future issues. We design the future based on the integrity present today. This is why “open everything” matters.
With that in mind, here are the three most important things you need to know about big data right now:
1. The data experts are organizing and they want a revolution!
Data mining (the primordial ancestor of what is today called predictive analytics) used to be considered a company- or organization-specific problem. The data and the people who worked on it were, in effect, “siloed.” What would a statistics expert in the military and a number cruncher working in retail marketing have to talk about?
These days, it turns out, there’s a lot to discuss. First, new open source data crunching tools like Hadoop (a distributed storage and processing framework that lets you gang together thousands of computers to solve problems) are helping organizations big and small develop their own data departments at a small fraction of what specialized software cost a few years ago. That means that the skills that miners are acquiring in one industry, like retail, are increasingly applicable in other sectors, like government. Second, combining data sets yields new insights, and the number of available sets (in some easily crunchable form like XML or just Excel) is growing. [Emphasis added.]
“Over the last couple of years we’ve seen the horizontalization of data scientist,” says Alistair Croll of Bitcurrent, one of the organizers of the conference.
There was a considerable (but not surprising) consensus among attendees that data and analytics should drive a lot more decision-making within organizations, even if that better-informed strategizing comes at the expense of traditional managers, who will argue that their hard-won expertise is much more valuable than any model based on statistics. More and more often, they’ll be shown to be wrong.
Phi Beta Iota: Reality-based decision making, and collective intelligence combined with multinational consensus, is superior to political ideology, academic stovepipes, and corporate vapor-ware. What has been missing is a public appreciation of how little the “elite” and the “experts” actually know. Open space wave leadership and open everything are the human side of “big data” done right. See Also: COLLECTIVE INTELLIGENCE: Creating a Prosperous World at Peace, Thinking Fast and Slow, Too Big to Know: Rethinking Knowledge, Wave Rider: Leadership for High Performance in a Self-Organizing World, and THE OPEN EVERYTHING MANIFESTO: Transparency, Truth & Trust.
There’s plenty of debate over whether everyone who works with large data sets in a technical way should get to call themselves a “data scientist.” It may be a matter of the uniqueness of the research, or just a price point.
According to JC Hertz, “if you’ve got someone in your organization that can do analytics, don’t call them a data scientist. They’ll ask you for a $20,000 raise and then get a job down the road.”
What that means, according to Hertz: “Data driven decisions have consequences. There can be political and cultural fallout. This is a gating condition that you need in the beginning. You have to say, this might [anger] x, y, z, and know that in the beginning. Not just outside the organization, but within. You need to know the political consequences of any given data-driven decision and who that decision will tick off.”
Phi Beta Iota: Hertz means well but he lacks integrity. Big data is about true cost. The truth at any cost lowers all other costs. A *major* benefit of big data is that it reveals corruption on all fronts by showing the public the true cost of political decisions based on ideology and campaign finance. That is the center of gravity for fixing our world and achieving resilience and sustainability. See also: 2012-03-08 GOD MAN INTERVAL Reformatted & Linked, Journal: Politics & Intelligence–Partners Only When Integrity is Central to Both, Journal: Reflections on Integrity UPDATED + Integrity RECAP
2. You’re going to be asked to opt-in to sharing your data a lot more.
A major topic for discussion this week was the Target Snafu. As originally reported in the New York Times (reg req.), Target raised a lot of eyebrows when the company used customer data and predictive analytics to figure out that one of their customers was pregnant, and, more remarkably, what trimester she was in. They emailed her some promotional material and the girl’s father discovered his daughter was pregnant based on the coupons she started receiving from a big box retailer, which gave rise to an awkward conversation, no doubt.
Most of the people I spoke with here agreed that Target made a mistake in that case, but they believed the error wasn’t in collecting the data and then using it for marketing so much as doing so without permission.
Big organizations are just beginning to realize the huge upside potential of using massive amounts of data to predict everything from what their customers are going to start buying to which of their employees will complete a certain project on time. More importantly, that data is getting increasingly easy and cheap to collect, and there’s already an enormous storehouse of it to aid in pattern extrapolation.
So where is the middle ground? According to many of the folks here, it’s the point where people knowingly agree to contribute data. As one programmer put it, “Spying is the act of collecting data in secret. Transparent data collection with defined boundaries is NOT spying.”
What that means: more companies will look to make the case that allowing them to track your behavior will benefit you. If enough people buy the pitch, societal attitudes about data tracking will change. There are a lot of things organizations can do to make the offer a good one for consumers, but they haven’t yet.
As Alistair Croll of Bitcurrent put it, “Imagine if that [New York Times] article had said, Target figured out that 1% of its customer base had cancer and it told them. I would sign up for a program that tracked my purchases to let me know if there was a correlation between what I bought and what people that got colon cancer bought.”
Phi Beta Iota: Again, the lack of integrity and integral consciousness has the corporations focusing on how they can exploit data to sell more stuff, rather than on how they can exploit data to render true cost information that makes their profit sustainable in a triple bottom line sense. It’s probably also true that they are not collecting true cost data.
3. The stuff you can predict is amazing, the stuff you can’t is frustrating.
This conference was full of amazing case examples of people using big data to predict things. According to Google’s Hal Varian, the volume of search queries such as “Sign up for unemployment” can predict future unemployment claims with a high degree of accuracy one week before official numbers are released by the U.S. government. Coupon and rebate search queries are an excellent predictor of weak economic times ahead.
Having said that, the hype on big data is likely to grow faster than the actual capabilities, as are incidents of “data washing” or making some especially considering how early we are on the hype cycle (see graph above).
“The most prevalent model in the industry to address this problem is MCU, make crap up,” according to marketing guru Avinash Kaushik.
What that means: Too many organizations are too focused on collecting data without a clear sense of what to do with it. The order should be reversed, according to several presenters. If you want to get started with data-driven decision making, first set goals and then start amassing and crunching data sets around those goals.
Phi Beta Iota: “Too many organizations are too focused on collecting data without a clear sense of what to do with it.” The US Intelligence “Community” is the poster-child for corrupt decision-making on this point, something that Robert Steele and others have been saying since at least 1995. The longer we allow the pinnacle of “national intelligence” to remain corrupt, the longer we remain ignorant and ineffective.
Most importantly, many agreed that having great data collection and analysis capability is useless if an organization doesn’t have internal processes in place to allow people to use the new info, and not just at the top of the corporate pyramid.
“You’ve got to empower every person to make decisions with data,” according to Kaushik. “Say, ‘You, Janitor! You will be in charge of using Data to make your job better!’”
Phi Beta Iota: Nice. Exactly right. We need to train students to access and leverage all information in all languages all the time. To be constructive, not to earn PhDs that know everything about nothing.
Bottom line: Big data is going to change the way organizations and individuals deal with information and plan ahead. Many of those transitions will be difficult; but, ten years from now, we’ll wonder how we got along without it. Even after the hype cycle on big data goes from peak to valley, there’s still a lot to look forward to.
Phi Beta Iota: Afterthought: big data is multilingual and multimedia. Single nations cannot get it right without creating the global grid of open source standards and M4IS2 (multinational, multiagency, multidisciplinary, multidomain information-sharing and (human-based, machine assisted) sense-making).