By Marcus P. Zillman, Published on December 18, 2012
LLRX.com (Law and Technology Resources for Legal Professionals)
Bots, Blogs and News Aggregators (http://www.BotsBlogs.com/) is a keynote presentation that I have been delivering over the last several years, and much of its information comes from the extensive research I have completed into the “invisible” web, or what I like to call the “deep” web. The Deep Web comprises somewhere in the vicinity of one trillion plus pages of information located throughout the World Wide Web in various files and formats that the current Internet search engines either cannot find or have difficulty accessing. At the time of this writing, the current search engines index hundreds of billions of pages. This report is constantly updated at http://DeepWeb.us/.
In the last several years, some of the more comprehensive search engines have written algorithms to search the deeper portions of the World Wide Web by attempting to find files such as .pdf, .doc, .xls, .ppt, .ps, and others. These files are predominantly used by businesses to communicate information within their organizations or to disseminate it to the outside world. Searching for this information using deeper search techniques and the latest algorithms allows researchers to obtain a vast amount of corporate information that was previously unavailable or inaccessible. Research has also shown that even deeper information can be obtained from these files by searching and accessing the “properties” metadata embedded in them!
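To illustrate the kind of “properties” metadata these business files carry, the sketch below (a hypothetical example, not part of the original report) reads the core-properties record of a .docx file. Office Open XML documents (.docx, .xlsx, .pptx) are ordinary ZIP archives, and author and title fields sit in a small XML file named docProps/core.xml, which can be read with nothing but the Python standard library. The sample document and names here are invented for demonstration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Office Open XML "properties" live in docProps/core.xml inside the ZIP,
# using Dublin Core and core-properties XML namespaces.
DC = "http://purl.org/dc/elements/1.1/"
CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"

def read_core_properties(docx_bytes):
    """Return a dict of creator/title/subject found in docProps/core.xml."""
    props = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        with zf.open("docProps/core.xml") as f:
            root = ET.parse(f).getroot()
    for tag in ("creator", "title", "subject"):
        el = root.find(f"{{{DC}}}{tag}")
        if el is not None and el.text:
            props[tag] = el.text
    return props

# Build a minimal stand-in .docx (just a ZIP with core.xml) for demonstration.
core_xml = (
    f'<cp:coreProperties xmlns:cp="{CP}" xmlns:dc="{DC}">'
    '<dc:creator>Jane Analyst</dc:creator>'
    '<dc:title>Quarterly Forecast</dc:title>'
    '</cp:coreProperties>'
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docProps/core.xml", core_xml)

print(read_core_properties(buf.getvalue()))
# {'creator': 'Jane Analyst', 'title': 'Quarterly Forecast'}
```

Run against a real document gathered from the open web, the same function surfaces the author names and titles that researchers mine from corporate files; fields that are absent from the XML are simply skipped.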
This report and guide are designed to give you the resources you need to better understand the history of deep web research, as well as various classified resources that allow you to search the currently available web and find those key nuggets of information discoverable only by understanding how to search the “deep web”.
Articles, Papers, Forums, Audios and Videos
Cross Database Articles
Cross Database Search Services
Cross Database Search Tools
Peer to Peer, File Sharing, Grid/Matrix Search Engines
Resources – Deep Web Research
Resources – Semantic Web Research
Bot and Intelligent Agent Research Resources and Sites
Phi Beta Iota: Should be studied along with the pioneering work of Stephen E. Arnold (Arnold IT), Mats Bjore (InfoSphere/Silobreaker), Ran Hock (The Extreme Searcher’s), Abe Lederman (Deep Web Technologies), Arno Reuser (Reuser’s Information Services), and Dr. Dr. Dave Warner (UnityNet), among so many others that we track here. Most have failed at leveraging the wider world of information for three reasons: 1) refusal to acknowledge that information is fragmented and worth going after in a deliberate manner that leverages the human factor via the eight tribes; 2) refusal to be serious about the 183 languages that the Chinese are now focusing on; and 3) refusal to create truly deep “help desks” to help analysts and consumers (desk officers, staff officers) both pull from the 80+ databases in the secret world AND access all information in all languages, all the time, outside the wire. As a general rule, analysts should NOT spend time accessing databases; that needs to be part of the expanded librarian function, with librarian specialists cross-helping one another internally (e.g., the NSA librarian speaks NSA database, the CIA librarian speaks CIA, and so on) and a vast array of external “finders” on retainer across ALL subject-matter areas.