Stephen E. Arnold: Search is a Lost Cause — And Big Data Is Incapable of “Social Reflection”

A Non Search Person Explains Why Search Is a Lost Cause

The author of “2013: the Year ‘the Stream’ Crested” is focused on tapping into flows of data. Twitter and real time “Big Data” streams are the subtext for the essay. I liked the analysis. In one 2,500 word write up, the severe weaknesses of enterprise and Web search systems are exposed.

The main point of the article is that “the stream”—that is, flows of information and data—is what people want. The flow is of sufficient volume that making sense of it is difficult. Therefore, an opportunity exists for outfits like The Atlantic to provide curation, perspective, and editorial filtering. The write up’s code for this higher-value type of content process is “the stock.”

The article asserts:

This is the strange circumstance that obtained in 2013, given the volume of the stream. Regular Internet users only had three options: 1) be overwhelmed 2) hire a computer to deploy its logic to help sort things 3) get out of the water.

The take away for me is that the article makes clear that search and retrieval just don’t work. Something “new” is needed. Perhaps this frustration with search is the trigger behind the interest in “artificial intelligence” and “machine learning”? Predictive analytics may have a shot at solving the problem of finding and identifying needed information, but from what I have seen, there is a lot of talk about fancy math and little evidence that it works at low cost in a manner that makes sense to the average person. Data scientists are not a dime a dozen. Average folks are.

Will the search and content processing vendors step forward and provide concrete facts that show a particular system can solve a Big Data problem for Everyman and Everywoman? We know Google is shifting to an approach to search that yields revenue. Money, not precision and recall, is increasingly important. The search and content vendors who toss around the word “all” have not been able to deliver unless the content corpus is tightly defined and constrained.

Isn’t it obvious that processing infinite flows and changes to “old” content is likely to cost a lot of money? Google, Bing, and Yandex search are not particularly “good.” Each is becoming a system designed to support other functions. In fact, looking for information that is only five or six years “old” is an exercise in frustration. Where has that document “gone”? What other data are not in the index? The vendors are not talking.

In the enterprise, the problem is almost as hopeless. Vendors invent new words to describe a function that seems to convey high value. Do you remember this catchphrase: “One step to ROI”? How do you think that company performed? The founders were able to sell the company and some of the technology lives on today, but the limitations of the system remain painfully evident.

Search and retrieval is complex, expensive to implement in an effective manner, and stuck in a rut. Giving away a search system seems to reduce costs? But are license fees the major expense? Embracing fancy math seems to deliver high value answers? But are the outputs accurate? Users just assume these systems work.

Kudos to The Atlantic for helping to make clear that in today’s data world, something new is needed. Changing the words used to describe such out of favor functions as “editorial policy”, controlled terms, scheduled updates, and the like is more popular than innovation.

Stephen E Arnold, December 16, 2013

Big Data: Is Grilling Better with Math?

Is there a connection between Big Data and grilling? Is there a connection between Big Data and your business?

I read “Big Data Beyond Business Intelligence: Rise Of The MBAs.” The write up is chock full of statements about large data sets and the numerical recipes required to tame them. But none of the article’s surprising comments matches one point I noticed.

Here’s the quote:

Software automation can’t improve without reorganizing a company around its data. Consider it organizational self-reflection, learning from every interaction humans have with work-related machines. Collaborative, social software is at the heart of this interaction. Software must find innovative ways to interface data with employees, visualization being the most promising form of data democratization.

I will be the first to admit that the economic revolution has left some businesses reeling, particularly in rural Kentucky. Other parts of the country are, according to some pundits, bursting with health.

Is a business reorganization better with Big Data?

Will Big Data deliver better grilled meat? Buy a copy of this book by Lilly and Gibson and see if there are ways to reorganize the business of grilling around self reflection. If Big Data cannot deliver a sure fire winning steak, will it deliver for other businesses?

But for the business that is working hard to make sales, meet payroll, and serve its customers, Big Data as a concept is one facet of senior managers’ work. Information is important to a business. The idea that more information will contribute to better decisions is one of the buttons that marketers enjoy mashing. Software is useful, but it is by itself not a panacea. Software can sink a business as well as float it.

However, figuring out the nuances buried within Big Data, a term that is invoked, not defined, is difficult. The rise of the data scientist is a reminder that having volumes of data to review requires skills many do not possess. Data integrity is one issue. Another is the selection of mathematical tools to use. Then there is the challenge of configuring the procedures to deliver outputs that make sense.

Are there business owners who want to toss out traditional methods of organizing work around Big Data? There are examples of companies able to successfully exploit information from systems that report on clicks, tweets, and similar sources.

The notion of “self reflection” adds another twist to the challenge Big Data presents to organizations.

Are managers today equipped to integrate “self reflection” into business processes based on Big Data? Is “automation” dependent on Big Data? Is software the key to business success?

When I think about information retrieval, Big Data is easy to talk about. Tossing in a handful of buzzwords works like the secret sauce used by cooks on the BBQ Pitmasters television show. The base ingredient is ketchup or some equally mundane commodity. The difference between the winning chef and the losers is how a human deals with cooking a slab of beef.

Big Data, at the present time, requires the equivalent of skilled grill masters. The “self reflection,” the democratization of data, and the notion of reorganization of a business around Big Data are parts of the cheerleaders’ spiel. Effective use of data may require more than self reflection, a smattering of math, and a lot of go go confidence.

Like search and retrieval systems, Big Data systems seem like such a darned good idea. Neither is a new kid on the block. Unsupported assertions about enterprise search contributed to the downfall of a number of vendors. Are enthusiasts of Big Data following a similar path even if it adds a stopover to perform self reflection?

Selling glittering generalities like Big Data or search for that matter is risky. When the systems do not deliver a pay off, some serious issues arise. Self reflection can make it easy not to notice erroneous assumptions like the need for specific expertise to make sense of certain types of data, big or little.

When I want a good steak, I don’t rely on software alone. When I need an informed decision, I don’t rely on software alone. Do you?

Stephen E Arnold, December 16, 2013