Berto Jongman: Washington Post Discovers Deep Web — and the World Bank’s Unindexed PDFs — PBI Technical Team Comments

Advanced Cyber/IO, Commercial Intelligence, Earth Intelligence, Ethics
Berto Jongman

Only fifteen years after Abe Lederman said the same thing at OSS!

The solutions to all our problems may be buried in PDFs that nobody reads

What if someone had already figured out the answers to the world’s most pressing policy problems, but those solutions were buried deep in a PDF, somewhere nobody will ever read them?

According to a recent report by the World Bank, that scenario is not so far-fetched. The bank is one of those high-minded organizations — Washington is full of them — that release hundreds, maybe thousands, of reports a year on policy issues big and small. Many of these reports are long and highly technical, and just about all of them get released to the world as a PDF report posted to the organization’s Web site.

The World Bank recently decided to ask an important question: Is anyone actually reading these things? They dug into their Web site traffic data and came to the following conclusions: Nearly one-third of their PDF reports had never been downloaded, not even once. Another 40 percent of their reports had been downloaded fewer than 100 times. Only 13 percent had seen more than 250 downloads in their lifetimes. Since most World Bank reports have a stated objective of informing public debate or government policy, this seems like a pretty lousy track record.

Read full article.
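The World Bank percentages quoted above are simple bands over per-report download counts. The following is a minimal Python sketch of that tabulation. It assumes we have one download count per report. The band edges follow the quoted figures: never downloaded, fewer than 100, and more than 250. The 100 to 250 band is only implied by the article. The sample counts are invented for illustration and are not World Bank data.

```python
from collections import Counter


def bucket(downloads: int) -> str:
    """Assign one report to a download band (edges follow the quoted figures)."""
    if downloads == 0:
        return "never downloaded"
    if downloads < 100:
        return "1-99 downloads"
    if downloads <= 250:
        return "100-250 downloads"
    return "more than 250 downloads"


def download_profile(counts: list[int]) -> dict[str, float]:
    """Return the percentage of reports that fall in each band."""
    tally = Counter(bucket(c) for c in counts)
    total = len(counts)
    return {band: 100.0 * n / total for band, n in tally.items()}


# Hypothetical per-report download counts, for illustration only.
sample = [0, 0, 0, 12, 45, 80, 99, 150, 300, 1200]
print(download_profile(sample))
```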

Phi Beta Iota: The Post has published something useful. We held it back for a day while we queried our technical team. The Deep Web is not new. Deep Web Technologies, led by Abe Lederman, remains the best in the world, and it is also completely ignored by the US Government and the US Intelligence Community. There are some aspects of this discussion that we want to bring forward:

01 PDFs are only as good as the source diversity and integrity, the analytic processing, and the ostensibly expert analysis that goes into them, and the degree to which their findings — not necessarily the report in full — are disseminated and have effect. At this time the World Bank and virtually all other organizations fail on all four fronts.

02 Google Search is a very shallow service. It has also been corrupted by paid advertising, to the point that it will often show you what someone else wants you to see rather than what you need to see. But Google also has reserves of information. Google has indexed material that is not visible to the public. This includes proprietary and secret information accessed via its Enterprise services, information that was never supposed to escape those confines. If there is financial demand for Deep Web access, Google's capabilities are considerable.
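Point 02 turns on Deep Web access, meaning content that sits behind a database's own search form rather than on pages a crawler can follow. One common way to reach such content is federated search. The user's query goes to each source's search interface at the same time, and the results are merged. The following is a minimal Python sketch. It assumes each source can be wrapped as a function that takes a query and returns (url, title) pairs. The two sources and their URLs are hypothetical stubs. This is not a description of Google's system or any other vendor's.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A source takes a query string and returns (url, title) hits. In practice
# each one would wrap a database's own search form or API (assumption).
Source = Callable[[str], list[tuple[str, str]]]


def federated_search(query: str, sources: list[Source]) -> list[tuple[str, str]]:
    """Query every source in parallel, then merge hits, dropping duplicate URLs."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # pool.map returns results in the same order as `sources`.
        result_lists = list(pool.map(lambda src: src(query), sources))
    seen: set[str] = set()
    merged: list[tuple[str, str]] = []
    for hits in result_lists:
        for url, title in hits:
            if url not in seen:  # keep the first copy of each document
                seen.add(url)
                merged.append((url, title))
    return merged


# Hypothetical stub sources, for illustration only.
def library_catalog(q: str) -> list[tuple[str, str]]:
    return [("https://example.org/a.pdf", f"Report on {q}")]


def agency_repository(q: str) -> list[tuple[str, str]]:
    return [("https://example.org/a.pdf", f"Report on {q}"),
            ("https://example.net/b.pdf", f"Working paper: {q}")]


print(federated_search("water policy", [library_catalog, agency_repository]))
```

Deduplicating on URL is the simplest merge rule. A real system would also need per-source timeouts and some way to rank hits that come from different sources.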

03 PDFs can be indexed in full text. PDFs should be indexed in full text. Indeed, we would go so far as to suggest that one of the greatest failings of the entire web infrastructure has been the failure to combine two things. The first is a precise URL for each and every document. The second is full-text indexing of all forms of knowledge, treated as a responsibility of the provider. This is something that could be franchised and leased, with free versions for .edu and .org organizations.
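Point 03 pairs a precise URL for every document with full-text indexing. The standard data structure for full-text search is an inverted index, which maps each word to the documents that contain it. The following is a minimal Python sketch with documents keyed by URL. It assumes the text has already been extracted from each PDF, which would need a separate PDF library. The URLs and text are hypothetical. The tokenizer is deliberately crude and does no stemming or ranking.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document URLs whose full text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in docs.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every word in the query (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical URLs; text assumed to be already extracted from each PDF.
docs = {
    "https://example.org/reports/1.pdf": "Rural water access and sanitation policy",
    "https://example.org/reports/2.pdf": "Urban transport policy review",
}
idx = build_index(docs)
print(search(idx, "water policy"))
```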

04 Abe Lederman has thought about this more than any other person on the planet (at least in English). He has pointed out to us that "the long tail" represents an exponential opportunity for repurposing knowledge. As Amazon has discovered, the greatest value lies not in the convenience of the now but in the convenience of the hard to find. As a gross generalization, the web, and the use of information on the web, stops at the 20% of people who can "see" information as it is presented and use it then. The other 80% remain unaware of the existence, relevance, and potential ease of access of that information. If that information were more broadly available, there would be monetization implications all around.
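Point 04's long-tail argument is about how demand spreads across a catalogue. One way to test it against real access logs is to rank items by demand and measure what share falls outside the most popular fraction. The following is a minimal Python sketch applied to documents rather than to readers. The 20% cutoff mirrors the rough figure above and is exposed as a parameter. The Zipf-like demand curve is invented for illustration and is not measured.

```python
def tail_share(counts: list[int], head_fraction: float = 0.2) -> float:
    """Fraction of total demand that falls outside the most popular items.

    Items are ranked by count. The top `head_fraction` of them form the
    "head"; everything else is the "long tail".
    """
    ranked = sorted(counts, reverse=True)
    head_size = int(len(ranked) * head_fraction)
    total = sum(ranked)
    return sum(ranked[head_size:]) / total if total else 0.0


# Hypothetical Zipf-like demand: the item at rank r gets about 1000 / r requests.
demand = [1000 // r for r in range(1, 1001)]
print(f"{tail_share(demand):.0%} of demand lies outside the top 20% of items")
```

A flat catalogue puts most demand in the tail. A steep one concentrates it in the head. Running the function on real download counts shows which case a given repository is in.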

05 All of the above refers to the digital world marketplace of knowledge. It does not cover the other 80% of knowledge: the knowledge that is in analog form, or that is known by humans who must be found and approached, and from whom tailored knowledge must be elicited in near real time. That is the full-spectrum Human Intelligence (HUMINT) challenge that no one is taking seriously. Part of the reason is that no one is actually committed to providing decision support for every Cabinet Department, every Congressional oversight committee, state and local officials, and so on across the eight tribes. Deep Web Technologies, in our view, is the foundation for doing what the secret world cannot do: getting a grip on useful knowledge across all mission areas, in all languages and mediums. There are other pieces that no one, anywhere, is focusing on, and geospatial tiling of all information in all languages and mediums is among them. Our mission at Earth Intelligence Network is to focus on these issues and to help anyone who wants help to get it right.
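Point 05 names geospatial tiling of information as a missing piece. One widely used tiling scheme is the Web Mercator "slippy map" grid that most web maps use. At zoom level z, this grid divides the world into 2^z by 2^z tiles, each addressed as (z, x, y). The following is a minimal Python sketch. It assumes each document already carries a single longitude/latitude pair. The document IDs and coordinates are hypothetical. The sketch shows the general idea, not any specific system Earth Intelligence Network has in mind.

```python
import math
from collections import defaultdict


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Web Mercator (slippy map) tile x/y that contains a point at a zoom level."""
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp points on the antimeridian (lon = 180) or beyond Web Mercator's
    # latitude limit into the valid tile range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_documents(docs: list[tuple[str, float, float]], zoom: int) -> dict:
    """Group (doc_id, lon, lat) records under the (zoom, x, y) tile containing them."""
    tiles: dict[tuple[int, int, int], list[str]] = defaultdict(list)
    for doc_id, lon, lat in docs:
        x, y = lonlat_to_tile(lon, lat, zoom)
        tiles[(zoom, x, y)].append(doc_id)
    return dict(tiles)


# Hypothetical documents tagged with a location (two near Paris, one near Washington).
docs = [("report-A", 2.35, 48.86), ("report-B", 2.29, 48.85), ("report-C", -77.04, 38.90)]
print(tile_documents(docs, zoom=3))
```

Because every zoom level splits each tile into four children, the same documents can be rolled up or drilled down by changing `zoom`. This is what makes a tile key a convenient common index across sources and languages.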

See Especially:

Abe Lederman: Deep Web versus Dark Web — Correcting TIME Magazine’s 2013 Errors About the “Secret Web”

We Just Ran Twenty-Three Million Queries of the World Bank’s Website – Working Paper 362

See Also:

2014 Robert Steele: Appraisal of Analytic Foundations – Email Provided, Feedback Solicited – UPDATED

Anonymous Feedback on Robert Steele’s Appraisal of Analytic Foundations — Agreement & Extension

Mini-Me: Google Hurting

Mini-Me: US Army DCGS-A Failures — and Palantir Keeps Trying to Over-Sell Its Shallow Pit

Rob Dover & Robert Steele: Intelligence and National Strategy? Rethinking Intelligence – Seven Barriers to Reform

Robert Steele & Anonymous: Most Analysis Software Sucks — And Story of How Steele Correctly Called BSA Not Being Signed in Afghanistan

Robert Steele: Why Big Data is Stillborn (for Now) + Comments from EIN Technical Council

Yoda: Exascale by 2020? No Way, Jose! Four Socko Graphics and Bottom Line Upfront — Human Brain Still a Million Times More Power Efficient

Yoda: Tutorial – How GoogleEarth and “Tiling” Work to Enable All-Source Near-Real-Time Big Data Sparse Matric Compression & Tailored Exploitation

Yoda: US Government Gets It Right on Open Data

and generally:

2015 Steele’s New Book
2014 Beyond OSA
2014 PhD Proposal
2014 Steele’s Open Letter
2014 UN @ Phi Beta Iota
2013 Intelligence Future
2012 Academy Briefing
1989+ Intelligence Reform
1976+ Intelligence Models 2.1
1957+ Decision Support Story