Google: A Losing Battle for Relevance
I wrote a feature for Beyond Search which summarized the relevance problems for the query “ocr programs.” You can find that article at http://goo.gl/aBDjyI. The main point is that an average user would find links to crapware, flawed software, or irrelevant information. But Google was not the only offender. Bing and Yandex returned results almost as frustrating to me as Google’s output.
You may know that indexing the Web is expensive, technically challenging, and filled with pitfalls. Over the years, Web indexing systems which depend on advertising to pay the bills have walked a knife edge. On one side are spoofers who want to exploit free visibility in a search results list. On the other side are purists like me who expect a search and retrieval system to return results which are objective and conform to standard tests such as those for precision and recall.
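For readers who have not bumped into precision and recall, here is a minimal sketch of the two textbook measures. Precision is the share of returned results which are relevant; recall is the share of relevant items which were returned. The function name and the document identifiers are made up for illustration, and the sketch assumes someone has already judged which documents are relevant. It is not how any Web index scores itself.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids the search system returned (hypothetical data)
    relevant:  document ids a human judged relevant (hypothetical data)
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents that were returned

    # Guard against empty sets so the sketch never divides by zero.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Illustrative data only: four results returned, three documents judged relevant.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case half of the returned results are relevant, and one relevant document never shows up at all. A results list stuffed with crapware pushes precision down; a useful page buried or dropped from the index pushes recall down.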
The Web indexes try to balance the two sides while calculating furiously how to keep traffic up, keep revenues growing, and massage both sides into remaining faithful. Take Google as the example. For those looking for free visibility, Google wants to offer an advertising option in the event that a site drops or disappears from a results list. For the inner librarians, Google has to insist that results are indeed relevant to the users.
I am okay with distorted results. I am okay with the search engine optimization folks who charge large sums to spoof Google. I am okay with librarians who grouse about the lack of date filtering and advanced search operations. I am pretty much okay with the state of search.
Search has degraded over the years. As the user base became the great clicking majority, vendors have had to make searching easy. A big part of easy is “good enough.” The idea is that most searchers would not know an accurate result from a silly result. As long as most users are happy, the system satisfies. For mobile, vendors rely on predictions based on various indicators. With each abstraction from a well-crafted Boolean query, results become good for topics like Lady Gaga and not-so-good for a subject like “Google[x] Labs.” Those brackets almost guarantee a result set which requires considerable human effort to winnow for the useful items of information.
A larger problem has crashed into the automated indexing systems for Bing, Google, and Yandex. Exalead indexed a modest subset of the available Web, so I don’t rely on that system. Blekko has become too unpredictable. The implementation of Blekko on Topix, the news aggregation service, returns nonsense for most of my queries. My email to Topix and Blekko has gone unanswered. With Gigablast becoming an open source search system, most of those looking for information online have few options. In my view, none of the Big Three is delivering objective results.
So what’s the new problem?
“Google Makes a Huge Change in Press Release SEO” explains that Google cannot differentiate between information which is accurate and information which is baloney. I reported this fact in several lectures last year when Google carried information reporting a nonexistent Google acquisition. (See http://goo.gl/ZIuRr) Not only did Google run the incorrect information, but the company also left the information in its index. Good enough? Sure.
The “huge change” is that press releases with article marketing signals, advertorials with links, and juiced up “anchor text” (words to which a link is attached) will pull down a Google “score.” This is not just the PageRank score. Specific components of the PageRank such as “quality” will be returning the equivalent of a grade school “D” or “F.” SEO and PR mavens will be as annoyed as the parents of a doe-eyed child who racks up Fs in conduct.
Several observations are in order:
First, Google and probably its competition will take further steps to minimize the vulnerability of the indexing methods to spoofing and outright manipulation. Yes, these Web indexing systems are spoofable. (I offer a for-fee briefing which identifies eight points of vulnerability common to modern indexing and content processing systems. I also explain why these “holes” and “vulnerabilities” will be tough to remediate. Write seaky2000 at yahoo dot com if you want a price quote.)
Second, the knowledge about Web indexing vulnerabilities is based on mathematical premises, not the baloney of the SEO world. However, information about these gaps is now diffusing, and Google is chasing the wave, not riding it.
Third, manipulating search results can have an impact on the Google advertising matching processes. That system operates on the assumption that most results are related to the user’s query. That assumption is based on the world as it was in 2001 and 2002. Today’s world is different.
The bottom line is that relevance for Web search is taking a hit. Is there a fix? Nope, just brute force, knee-jerk reactions like the one reported in the “Huge Change” article. Is there a way to get better Web search results? Yes, but few users have the patience or the expertise to use the needed techniques. I will be reviewing some of the options in my ISS talk in September 2013. The briefing is intended for those who are active members of law enforcement or the intelligence community. But the good news is that there are ways to address a growing problem of irrelevant search results. For more information, write me at seaky2000 at yahoo dot com.
Stephen E Arnold, August 20, 2013