Stephen E. Arnold: Search Hits the “Big O” – Robert Steele Comments on IT’s Three Sucking Chest Wounds

Advanced Cyber/IO, IO Impotency
Stephen E. Arnold

Big O Explained: Why Systems Are Alike?

Posted: 27 Jul 2013 04:30 AM PDT

In several of my recent lectures, I pointed out that most end users cannot differentiate among search systems. The comment made about these systems is often, “Why can’t these systems be like Google?” I concluded that the similarity of requests suggests that systems are essentially identical.

One reason is that training in university and the “use what works” approach in the real world produce search, content processing, and analytics systems that are pretty much indistinguishable. There are differences, but these can be appreciated only when a person takes the systems apart. Even then, differences are difficult to explain; for example, why a threshold value in System A is 15 percent lower than in System B. When dealing with sketchy data, the difference is usually irrelevant.

Another reason is that today’s systems are struggling to cope with operations that stretch the capabilities of even the most robust systems. Developers have to balance what the engineering plan wants to do with what can be done in a reasonable amount of time on an existing system.

Enter Big O.

You may want to take a look at “Big O Notation Explained by a Self-Taught Programmer.” I found the write up interesting and clear. The main point in my opinion is:

Consider this function:

    def all_combinations(the_list):
        results = []
        for item in the_list:
            for inner_item in the_list:
                results.append((item, inner_item))
        return results

This matches every item in the list with every other item in the list. If we gave it an array [1,2,3], we’d get back [(1,1), (1,2), (1,3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]. This is part of the field of combinatorics (warning: scary math terms!), which is the mathematical field which studies combinations of things. This function (or algorithm, if you want to sound fancy) is considered O(n^2). This is because for every item in the list (aka n for the input size), we have to do n more operations. So n * n == n^2.

Below is a comparison of each of these graphs, for reference. You can see that an O(n^2) function will get slow very quickly, whereas something that operates in constant time will be much better.
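To make the growth claim concrete, here is a minimal sketch in Python that reuses the quoted all_combinations function and counts how many pairs it produces as the input grows. The helper names first_item and pair_counts are hypothetical, added only for illustration; they do not come from the cited write up.

```python
def all_combinations(the_list):
    """Pair every item with every item (itself included): n * n results."""
    results = []
    for item in the_list:
        for inner_item in the_list:
            results.append((item, inner_item))
    return results


def first_item(the_list):
    """Constant-time contrast (hypothetical helper): one step for any size."""
    return the_list[0]


def pair_counts(sizes):
    """Return {n: number of pairs produced} for each input size n."""
    return {n: len(all_combinations(list(range(n)))) for n in sizes}


if __name__ == "__main__":
    # Each tenfold increase in n multiplies the work by one hundred.
    for n, count in pair_counts([10, 100, 1000]).items():
        print(f"n={n:>5}  pairs={count:>9}")
```

The pair count equals n * n for every size, which is the quadratic growth the quote describes, while first_item does the same single step whether the list holds three items or three million.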

Net net: Developers have to do what works. Search and related content processes are complex. In order to get the work done, search systems have embraced “what works.” Over time, we get undifferentiable systems.

Disagree? Use the comments section to explain.

Stephen E Arnold, July 27, 2013

Sponsored by Xenky

Robert David Steele

ROBERT STEELE:  Information Technology (IT) has three sucking chest wounds that will persist into the foreseeable future.

1.  Free energy and unlimited clean desalinated water have not been a priority for the information era nation-state and corporations.  Big mistake.  NSA is the poster child for poor leadership in this regard, putting a massive computing center in Utah that has neither renewable energy nor any concept of what it means to need 1.7 million gallons a day from aquifers that are so low the entire state of Utah is on water restriction for front lawns.

2.  IT continues to ignore the human factor — as Jim Bamford so famously concludes one of his books on NSA, the human brain is vastly more powerful than any computer NSA might build for 20 years into the future — and as Crisis Mapping and humanitarian technologies are showing, harnessing the distributed intelligence of any given diaspora changes everything about what and when and how one can know — stuff CIA will never master under its current paradigm.

Paul Strassmann

3.  IT continues to ignore the demonstrated limitations of proprietary software that is badly coded, undocumented, and generally far from standardization.  Proprietary is unaffordable, is not inter-operable, and does not scale.  Until we make the turn to Open Source Everything (OSE), IT will continue to yield, as Paul Strassmann has documented so ably, a NEGATIVE return on investment.  More money for IT in its present configuration “makes bad management worse.”

See Also:

1994 Talking Points to the Public Interest Summit: Connectivity, Content, Coordination, and C4 Security

1995 GIQ 13/2 Creating a Smart Nation: Strategy, Policy, Intelligence, and Information

2002 The New Craft of Intelligence–What Should the T Be Doing to the I in IT?

2010 The Ultimate Hack Re-Inventing Intelligence to Re-Engineer Earth (Chapter for Counter-Terrorism Book Out of Denmark)

Stephen E. Arnold: Metadata for Documents

Advanced Cyber/IO, Data
Stephen E. Arnold

Metadata for Documents

Posted: 19 Jul 2013 06:37 PM PDT

We read numbers about the amount of time wasted on searching for documents all the time, and they are not pretty. When we stumbled upon Document Cloud, we could not help but wonder if this type of service will help with the productivity and efficiency issues that are currently all too common.

The homepage takes potential users through the steps of what using Document Cloud is like. First, users will have access to more information about their documents. Second, annotating documents and highlighting sections can be done with ease.

Finally, sharing work is possible:

Continue reading “Stephen E. Arnold: Metadata for Documents”

Robin Good: Where to Find Hot Trending Fluff Online

Advanced Cyber/IO
Robin Good

SEOMomma provides some really useful pointers for finding “trending content” online:

  1. http://www.google.com/trends/hottrends takes you to where Google curates the trending queries. If you can find something here that you can spin and link to your niche, you could get a nice bump in traffic.
  2. http://www.google.com/trends/topcharts: everyone loves ‘top tens’, and at this link Google curates the most popular ‘top ten charts’, from songs to space objects, children’s TV to politicians, whisky to coffee, and lots in between. It may inspire you to produce your own ‘top ten’.
  3. http://www.hashtags.org/ will give you a list of trending hashtags and http://www.hashtags.org/trending-on-twitter/ will give you what’s trending on Twitter.
  4. http://whatthetrend.com/ has general subjects and if you investigate you’ll see how sites like Huffington Post use the hashtag to create content that could pull in visitors.

If you want more of these, just head on to: http://seomomma.com/content-creation-curated-content/ for the full list.

Useful. Resourceful. 8/10

Original article: http://seomomma.com/content-creation-curated-content/

Stephen E. Arnold: IBM Makes Hadoop Quick and Easy with BigInsight

Advanced Cyber/IO

Stephen E. Arnold

IBM Makes Hadoop Quick and Easy with BigInsight

The article titled InfoSphere BigInsights on IBM promotes the use of Apache Hadoop, an open source software framework, with IBM’s BigInsights. Not only is the product free to download, but IBM offers BigInsights to simplify Hadoop for users. To begin, visit the Quick Start Edition page, with video tutorials that walk you through each step toward collecting insights from Big Data. The article explains,

“InfoSphere BigInsights can help you increase operational efficiency by augmenting your data warehouse environment. It can be used as a query-able archive, allowing you to store and analyze large volumes of multi-structured data without straining the data warehouse. It can be used as a pre-processing hub, helping you to explore your data, determine what is the most valuable, and extract that data cost-effectively. It can also allow for ad hoc analysis, giving you the ability to perform analysis on all of your data.”

IBM has managed to turn Hadoop into something resembling user-friendly. The complexity of Big Data scares many people, but IBM hopes to change that bias by allowing users a hands-on learning experience without any data capacity or time limits. Features including Text analytics, BigSheets, Development Tools and Management Capabilities let users explore large data sets and extract information from them.

Chelsea Kerwin, July 22, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Owl: Putin Switches to Typewriters — Are All Computers Compromised?

Advanced Cyber/IO
Who?  Who?

Is Going Back to Typewriters the Answer?

“In Russia, President Putin’s office just stopped using PC’s and switched to typewriters.  What do they know that we don’t? Perhaps it’s Intel NSA inside.”

The NSA has been incredibly thorough in nailing down every possible way to tap into communications. Yet the one company’s name that hasn’t come up as part of the surveillance network is Intel. Perhaps they are the only good guys in the entire Orwellian mess.

Or perhaps the NSA, working with Intel and/or Microsoft, has wittingly put backdoors in the microcode updates. A backdoor is a way of gaining illegal remote access to a computer by getting around the normal security built into the computer. Typically someone trying to sneak malicious software onto a computer would try to install a rootkit (software that tries to conceal the malicious code). A rootkit tries to hide itself and its code, but security conscious sites can discover rootkits with tools that check kernel code and data for changes.

But what if you could use the configuration and state of microprocessor hardware in order to hide? You’d be invisible to all rootkit detection techniques that check the operating system. Or what if you could make the microprocessor random number generator (the basis of encryption) not so random for a particular machine? (The NSA’s biggest coup was inserting backdoors in crypto equipment the Swiss sold to other countries.)

Rather than risk getting caught messing with everyone’s updates, my bet is that the NSA has compromised the microcode update signing keys, giving the NSA the ability to selectively target specific computers. (Your operating system ensures security of updates by checking downloaded update packages against the signing key.) The NSA then can send out backdoors disguised as a Windows update for “security.” (Ironic but possible.)
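The signing-key argument above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: real update systems use asymmetric signatures, whereas this sketch substitutes a keyed hash (HMAC-SHA256 from the standard library) as a stand-in, and the key, package contents, and function names are all hypothetical. The point it shows is narrow: a check against the signing key rejects a forged package only while the key stays secret.

```python
import hashlib
import hmac

# Hypothetical values, for illustration only.
VENDOR_KEY = b"hypothetical-vendor-signing-key"
GENUINE = b"microcode update 1.0"
FORGED = b"microcode update 1.0 plus backdoor"


def sign(package: bytes, key: bytes) -> bytes:
    """Stand-in for a vendor signature: HMAC-SHA256 over the package."""
    return hmac.new(key, package, hashlib.sha256).digest()


def verify(package: bytes, signature: bytes, key: bytes) -> bool:
    """Accept the package only if the signature matches under the key."""
    return hmac.compare_digest(sign(package, key), signature)


if __name__ == "__main__":
    # A genuine update signed by the vendor passes the check.
    print(verify(GENUINE, sign(GENUINE, VENDOR_KEY), VENDOR_KEY))
    # A forgery signed with a guessed key is rejected.
    print(verify(FORGED, sign(FORGED, b"attacker-guess"), VENDOR_KEY))
    # A forgery signed with the stolen vendor key is accepted.
    print(verify(FORGED, sign(FORGED, VENDOR_KEY), VENDOR_KEY))
```

Whoever holds the key can produce a package the check will accept, which is why a compromised signing key would allow targeted updates without tampering with anyone else's downloads.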

More (check out the many very instructive replies to this article):

Your Computer May Already be Hacked – NSA Inside?

 

Reflections on Tired Databases versus Wired Analytics + Jack Davis & Analytic Tradecraft RECAP

Advanced Cyber/IO, All Reflections & Story Boards
Robert David STEELE Vivas

There are multiple analytic flaws with most source data, and particularly with any source data labeled in relation to terrorism. If Israelis have touched the data in any way, shape, or form (especially including the software), it must be considered contaminated and severely suspect.  While I have nothing critical to say about the application of Sentinel to flawed data, I do feel obliged to point out the sucking chest wounds that afflict most source data.

01 Terrorism is not a threat, it is a tactic. Defining every criminal or political incident in a pre-determined “hot-spot” as being “terrorism-related” is not analysis, it is sophomoric.

02  Databases are an important counterintelligence target.  Who created the analytic model and source data parameters, who wrote every bit of code touching the data, who the source feeders and intermediaries are: all of this matters.  Apart from databases corrupted by elements of the US Government to their own ends, we have databases and information technology systems so totally penetrated by enemy powers and special interests as to call into question the integrity and utility of at least 50% of all standing databases.

03 Incidents are what should be plotted, in context not only of time and space, but of all of the other pre-conditions of revolution that the US Government in particular refuses to contemplate because it would make it obvious that it has severe shortcomings in its intelligence function. Still-born babies and babies born deformed as so many have been in Fallujah, Iraq, are just as important to stability analytics as are incidents of direct violence against “authority” figures or forces.

Continue reading “Reflections on Tired Databases versus Wired Analytics + Jack Davis & Analytic Tradecraft RECAP”
