Stephen E. Arnold: Search Hits the “Big O” – Robert Steele Comments on IT’s Three Sucking Chest Wounds

0Shares

Posted: 27 Jul 2013 04:30 AM PDT

In several of my recent lectures, I pointed out that most end users cannot differentiate among search systems. The comment made about these systems is often, “Why can’t these systems be like Google?” I concluded that the similarity of requests suggests that systems are essentially identical.

One reason is that training in university and the “use what works” approach in the real world produces search, content processing, and analytics systems that are pretty much indistinguishable. There are differences, but these can be appreciated only when a person takes the systems apart. Even then, differences are difficult to explain; for example, why a threshold value in System A is 15 percent lower than in System B. When dealing with sketchy data, the difference is usually irrelevant.

Another reason is that today’s systems are struggling to cope with operations that stretch the capabilities of even the most robust systems. Developers have to balance what the engineering plan wants to do with what can be done in a reasonable amount of time on an existing system.

Enter Big O.

You may want to take a look at “Big O Notation Explained by a Self-Taught Programmer.” I found the write up interesting and clear. The main point in my opinion is:

Consider this function:

def all_combinations(the_list): results = [] for item in the_list: for inner_item in the_list: results.append((item, inner_item)) return resultsThis matches every item in the list with every other item in the list. If we gave it an array [1,2,3], we’d get back [(1,1) (1,2), (1,3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]. This is part of the field of combinatorics(warning: scary math terms!), which is the mathematical field which studies combinations of things. This function (or algorithm, if you want to sound fancy) is considered O(n^2). This is because for every item in the list (aka n for the input size), we have to do n more operations. So n * n == n^2.

Below is a comparison of each of these graphs, for reference. You can see that an O(n^2) function will get slow very quickly where as something that operates in constant time will be much better.

One of Three -- Click on Image to Enlarge — One of Three — Click on Image to Enlarge

Net net: Developers have to do what works. Search and related content processes are complex. In order to get the work done, search systems have embraced “what works.” Over time, we get undifferentiable systems.

Disagree? Use the comments section to explain.

Stephen E Arnold, July 27, 2013

Sponsored by Xenky

ROBERT STEELE: Information Technology (IT) has three sucking chest wounds that will persist into the foreseeable future.

1. Free energy and unlimited clean desalinated water have not been a priority for the information era nation-state and corporations. Big mistake. NSA is the poster child for poor leadership in this regard, putting a massive computing center in Utah that has neither renewable energy nor any concept of what it means to need 1.7 million gallons a day from aquifers that are so low the entire state of Utah is on water restriction for front lawns.

2. IT continues to ignore the human factor — as Jim Bamford so famously concludes one of his books on NSA, the human brain is vastly more powerful than any computer NSA might build for 20 years into the future — and as Crisis Mapping and humanitarian technologies are showing, harnessing the distributed intelligence of any given diaspora changes everything about what and when and how one can know — stuff CIA will never master under its current paradigm.

3. IT continues to ignore the demonstrated limitations of proprietary software badly coded and undocumented, generally far from standardization. Proprietary is unaffordable, is not inter-operable, and does not scale. Until we made the turn to Open Source Everything (OSE), IT will continue to return — as Paul Strassmann has documented so ably — a NEGATIVE return on investment. More money for IT in its present configuration “makes bad management worse.”

1995 GIQ 13/2 Creating a Smart Nation: Strategy, Policy, Intelligence, and Information

2002 The New Craft of Intelligence–What Should the T Be Doing to the I in IT?

2010 The Ultimate Hack Re-Inventing Intelligence to Re-Engineer Earth (Chapter for Counter-Terrorism Book Out of Denmark)