Phi Beta Iota: Read the original by clicking above; in this case a complete safety copy is below. Google (and Amazon and other digital data services) are deleting history. This matters.
We spotted a very interesting article in Tablix: “Google Index Coverage”. We weren’t looking for the article, but it turned up in a list of search results and one of the DarkCyber researchers called it to my attention.
Background: Years ago we did a bit of work for a company engaged in data analysis related to the health and medical sectors. We had to track down the names of the companies who were hired by the US government to do some outsourced fraud investigation. We were able to locate the government statements of work and even some of the documents related to investigations. We noticed a couple of years ago that our bookmarks to some government documents did not resolve. With USA.gov dependent on Bing, we checked that index. We tried US government Web sites related to the agencies involved. Nope. The information had disappeared, but in one case we did locate documents on a US government agency’s Web site. The data were “there” but the data were not in Bing, Exalead, Google, or Yandex. We also checked the recyclers of search results: Startpage, the DuckDuck thing, and MillionShort.
We had other information about content disappearing from sites like the Wayback Machine too. From our work for assorted search companies and our own work years ago on ThePoint.com, which we sold to Lycos, we had considerable insight into the realities of paying for indexing that did not generate traffic or revenue. The conclusion we had reached, and which we assumed other vendors would reach, was:
Online search is not a “free public library.”
A library is/was/should be an archiving entity; that is, someone has to keep track and store physical copies of books and magazines.
Online services are not libraries. Online services sell ads, as we did to Zima, which wanted its drink in front of our users. This means one thing:
Web indexes dump costs.
The Tablix article makes clear that some data are expendable. Delete them.
Our view is:
Get used to it.
There are some knock-on effects from the simple logic of reducing costs and increasing the efficiency of the free Web search systems. I have written about many of these, and you can search the 12,000 posts on this blog or pay to search commercial indexes for information in my more than 100 published articles related to search. You may even have a copy of one of my more than a dozen monographs; for example, the original Enterprise Search Reports or The Google Legacy.
- Content is disappearing from indexes on commercial and government Web sites. Examples range from the Tablix experience to the loss of the MIC contracts which detail exclusives for outfits like Xerox.
- Once the content is not findable, it may cease to exist for those dependent on free search and retrieval services. Sorry, Library of Congress, you don’t have the content, nor does the National Archives. The situation is worse in countries in Asia and Eastern Europe.
- Individuals — particularly the annoying millennials who want me to provide information for free — do not have the tools at hand to locate high value information. There are services which provide some useful mechanisms, but these are often affordable only by certain commercial enterprises, some academic research organizations, and law enforcement and intelligence agencies. This means that most people are clueless about the “accuracy”, “completeness,” and “provenance” of certain information.
Net net: If data generate revenue, they may be available online and findable. If the data do not, hasta la vista. The situation is one that gives me and my research team considerable discomfort.
Imagine how smart software trained on available data will behave. Probably in a pretty stupid way. Information is not what people believe it to be. Now we have a generation or two of people who think research is looking something up on a mobile device. Quite a combo: ill-informed humans and software trained on incomplete data.
Yeah, that’s just great.
Stephen E Arnold, April 28, 2019
ROBERT STEELE: I drafted the original warning letter to the US Government on cyber, in 1994, and have been generally ignored for a quarter century because the US Government does not “do” (or care about) “intelligence as decision-support. The US Government in its current “pay to play” mode sells decisions irrespective of the facts or the public good, and has absolutely no interest in holistic analytics or true cost economics. The US Government has also been out-sourcing inherent responsibilities of government, including intelligence and counterintelligence and commercial fraud investigations, to firms ultimately owned and controlled by Zionists. Those firms have been selling “indulgences” to those being investigated; fabricating false testimony against those to be framed (including any government without a Central Bank refusing to be “owned” by the Chabad Cult Zionist – Rothschilds, Vatican, City of London – Wall Street Deep State, and now we learn, doing wholesale deletion of what should be official permanently archived records.
An Open Source Agency, such as I have been proposing since 1992, when I helped Senator David Boren (D-OK) with that part of the National Security Act of 1992 put down by Senator John Warner (R-VA), would create a digital national library and help others create their own variations. Time is the one strategic variable that can be neither bought nor replaced. History is the one strategic variable that can be too easily lost and never replaced. The eradication of languages and oral histories has been a large part of the decline of civilization and human liberty, and I am glad to be reminded by Stephen E. Arnold that our information is in enemy hands and being digitally assassinated.
Steele, Robert with James Anderson, William Caelli, and Winn Schwartau, “Correspondence, Sounding the Alarm on Cyber Security,” McLean, VA: Open Source Solutions, Inc., August 23, 1994.
Steele, Robert. “1993 Talking Points for the Director of Central Intelligence,” McLean, VA: Open Source Solutions Network, Inc., 20 July 1993.
Steele, Robert. “Augmented Intelligence with Human-Machine Integrity: Future-Oriented Hybrid Governance Integrating Holistic Analytics, True Cost Economics, and Open Source Everything Engineering (OSEE),” in Daniel Araya. Augmented Intelligence: Smart Systems and the Future of Work and Learning. Bern, CH: Peter Lang Publishing, 2018.
Steele, Robert. Letter to Secretary-General of the United Nations Antonio Guterres, “Subject: Achieving the Sustainable Development Goals (SDG) with Open Source Everything Engineering (OSEE),” January 2, 2017.
Steele, Robert. For the President of the United States of America Donald Trump: Subject: Eradicating Fake News and False Intelligence with an Open Source Agency That Also Supports Defense, Diplomacy, Development, & Commerce (D3C) Innovation to Stabilize World. Earth Intelligence Network, 2017.
Steele, Robert. Beyond Data Monitoring – Achieving the Sustainability Development Goals Through Intelligence (Decision-Support) Integrating Holistic Analytics, True Cost Economics, and Open Source Everything, Oakton, VA: Earth Intelligence Network, 14 October 2014, as submitted to the High Level Panel on the Post-2015 Development Agenda of the United Nations.
Steele, Robert. “The Evolving Craft of Intelligence,” in Robert Dover, Michael Goodman, and Claudia Hillebrand (eds.). Routledge Companion to Intelligence Studies, Oxford, UK: Routledge, July 31, 2013.
Steele, Robert. “The Ultimate Hack: Re-Inventing Intelligence to Re-Engineer Earth,” in U. K. Wiil (ed.), Counterterrorism and Open-Source Intelligence, Lecture Notes in Social Networks 2, Springer-Verlag/Wien, 2011.