
Searching for a Better Search Engine

Updated on April 23, 2012

Crede Vaccam

What do you think this picture is about? Do you think it's about vacuum cleaners?

Why do we need search engines at all? Why don't we just search the entire internet ourselves for what we are trying to find? I always thought the reason we didn't was that it would take us so much more time than it would take a computer, and that's why we assigned this routine work to a search engine.

I was already searching legal databases using primitive search engines back in the 1980s when I was in law school. At the time, the search algorithm was not a big, proprietary secret. It was access to the database that was paid for by subscription. The search algorithm was spelled right out for us.

Here is how I understood the explanations:

  • If one search term was submitted with no special symbols, all documents containing that term would be listed in the search results, either chronologically ordered or alphabetically ordered, depending on the default of that particular search engine or a default that the user selected.
  • If two or more search terms were submitted in a sequence (without quotes or other symbols), such as Vacuum County or crede vaccam or Nabal Cabeza de Vaca, then the search engine would first return all those documents that had the exact words in that exact sequence with nothing between them, followed by all other documents where the search terms appeared in that order, but possibly with other intervening terms. (They would be ordered so that the ones with the fewest intervening terms would appear first on the list.) After that, all documents that had all the terms in whatever order would be returned. After these would be listed all the documents that had at least one of the search terms.
  • If two or more search terms were submitted between quotes, like this "Vacuum County" or "crede vaccam" or "Nabal Cabeza de Vaca", then only those documents that contained the exact search terms in exactly that sequence, without intervening items, would be returned.
  • If one of the search terms was submitted with a plus sign in front of it, such as +vaccam, then no document would be returned that did not contain it.
  • If one of the search terms was submitted with a minus sign in front of it, such as -cleaner, then no document would be returned that did contain it.
  • There was no magic formula and no secret algorithm, and the whole purpose of cluing us into these rules was the better to help us find what we were looking for.
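For readers who like to see rules spelled out as code, here is a minimal Python sketch of the rules listed above. It is only an illustration of how I understood them, not the code of any real legal database. The function names, the case-insensitive matching, the alphabetical tie-break, and the choice to skip documents that match none of the plain terms are all assumptions of mine.

```python
import re

# Assumptions (not from the old manuals): matching is case-insensitive,
# words are runs of letters or digits, and ties are broken alphabetically.

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def parse_query(query):
    """Split a query into quoted phrases, +required, -excluded and plain terms."""
    phrases = [tokenize(p) for p in re.findall(r'"([^"]+)"', query)]
    phrases = [p for p in phrases if p]
    rest = re.sub(r'"[^"]+"', " ", query)
    required, excluded, plain = [], [], []
    for tok in rest.split():
        if tok.startswith("+") and len(tok) > 1:
            required.append(tok[1:].lower())
        elif tok.startswith("-") and len(tok) > 1:
            excluded.append(tok[1:].lower())
        else:
            plain.extend(tokenize(tok))
    return phrases, required, excluded, plain

def min_gap(words, terms):
    """Fewest intervening words with `terms` appearing in order, else None."""
    best = None
    for start, word in enumerate(words):
        if word != terms[0]:
            continue
        pos = start
        for term in terms[1:]:
            try:
                pos = words.index(term, pos + 1)  # earliest later occurrence
            except ValueError:
                pos = None
                break
        if pos is not None:
            gap = pos - start - (len(terms) - 1)
            best = gap if best is None else min(best, gap)
    return best

def search(docs, query):
    """docs maps a document name to its text; returns names, best match first."""
    phrases, required, excluded, plain = parse_query(query)
    ranked = []
    for name, text in docs.items():
        words = tokenize(text)
        present = set(words)
        if any(t not in present for t in required):
            continue                      # +term: must contain it
        if any(t in present for t in excluded):
            continue                      # -term: must not contain it
        if any(min_gap(words, p) != 0 for p in phrases):
            continue                      # "quoted phrase": exact sequence only
        key = (0, 0)
        if plain:
            gap = min_gap(words, plain)
            if gap is not None:
                key = (0, gap)            # in order; fewest intervening words first
            elif all(t in present for t in plain):
                key = (1, 0)              # all terms, in whatever order
            elif any(t in present for t in plain):
                key = (2, 0)              # at least one term
            else:
                continue
        ranked.append((key, name))
    return [name for key, name in sorted(ranked)]

DOCS = {
    "novel": "Welcome to Vacuum County, Texas",
    "north": "North County Vacuum sales",
    "every": "Vacuum cleaners for every county",
    "repair": "vacuum cleaner repair",
}
print(search(DOCS, "Vacuum County"))    # ['novel', 'every', 'north', 'repair']
print(search(DOCS, '"Vacuum County"'))  # ['novel']
```

Notice that nothing in the sketch knows or cares who wrote a document or how many people link to it. The only inputs are the words in the query and the words in the documents.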

Nobody supposed at the time that the documents we were searching for would have a partisan interest in being found first, or that the people who had written the documents would bribe the search engine into tricking us to overlook competing documents. The purpose was really to find the best possible match for the exact search term entered. If you typed in Vacuum you got documents that had the term Vacuum. If you typed in vaccam you got documents that contained the word vaccam. It was inconceivable that somebody could type in Nabal and the search engine would decide all by itself and without consulting us that a document with the word naval was a better match.

Those were the days!

What Google finds when we search for Vacuum County

Vacuum County is my second novel. I finished writing it in 1993, and then I spent a couple of years trying to interest an agent or a publisher in the manuscript. And then I figured, why not just let people read it for free? After all, there's this newfangled thing called the internet, and anybody can find anything just by looking it up on AltaVista. All they have to do is put in a couple of search terms, and voila! -- the closest match will be returned. And since I'm the only person who has ever written a novel called Vacuum County about a man named Nabal Cabeza de Vaca, and since the phrase crede vaccam is exceedingly rare, anybody who even just by accident juxtaposes those words will get a glimpse of my novel. Sure, I'd love to have been able to sell it, but more importantly I want it to be read. So there it sat, and pretty much nobody read it for the past fifteen years.

For all you novelists out there, here's some information that might come in handy. The worst thing that can happen to your novel is not that someone will steal it. The worst possible thing that can happen is that nobody will ever read it.

But that's okay, right? People didn't read it, because they weren't interested. That's fair enough. Every time they looked up "Vacuum County" and found my book, they quickly left the site. Is that what happened? Or, somewhere in the past ten years, when Google became the "best search engine in the world", might it not be possible that my pages didn't even come up in the results, because the new algorithm made sure that they wouldn't?

Google lets us make little videos of search results. See what happened when I searched for Vacuum County.

Vacuum County is hard to find if you don't know what it is

Without the quotes, the sequence Vacuum County yielded a top result that didn't even have the words in that order: "North County Vacuum". This was followed by lots of other information about vacuum cleaners in which the word county appeared somewhere in the text. Wouldn't my pages that had the words in the correct order get priority in an old-fashioned search?

When the words were placed within quotes, the top search item was "Wholesale Vacuum County buy Vaccum County lots." Now what is that? It doesn't even make sense. And if you click on it at aliex.press.com > Wholesale Product you will find there is no "Vacuum County" there. It's some kind of scam: whatever you happen to be looking for, they insert the words you used into the search result.

This is followed by my own hub "The Problem with Genre" and a CreateSpace blog of mine that briefly mentioned Vacuum County and then more listings involving vacuum cleaners. Should I be happy that my hub and my blog surfaced at all? Fine, I'm happy, but don't you think that the novel is a better match?

For the search term Nabal Cabeza de Vaca, the top returns were about Alvar Nunez Cabeza de Vaca, and the word Nabal doesn't even appear in the entire document anywhere. The word naval is in bold in the Google listing, leading me to believe that Google overlooked pages that had the word Nabal in them in favor of pages with the word naval. Is that what the best search engine in the world does? Even a decent librarian wouldn't do that.

But now see what happens when we try to look for the phrase crede vaccam. The top two results substitute creed for crede. Meanwhile Google tries to tempt the searcher to go look at more vacuum cleaners by asking "Did you mean crede vacuum?"

If you agree with Google and say "Yeah, that's it, I believe in the vacuum, not the cow," here's what you'd see: "Joe Crede is a f****ng vacuum", followed by more information about vacuum cleaners.

What's in a typo?

Why the Google Algorithm Allows Popularity to Affect Page Rank

What was the idea behind the Google algorithm? The idea was to save time on searches by prioritizing based on popularity. This makes a certain amount of sense if you are looking for a sequence that comes up very, very often. If somebody is looking for the phrase vacuum cleaner, then because it is such a common phrase, maybe it would make sense to allow the strength of the linking to the site to play a part in deciding what search item appears first. But even here, you wouldn't reverse the order of the words. You wouldn't put a site that had the sequence cleaner vacuum above one that had vacuum cleaner, no matter how popular the cleaner vacuum site was. You'd go with the sequence the searcher gave you first.

When a search sequence is rare, then there is no competition, and no reason to look at popularity. If on the entire web there are only three documents with the sequence crede vaccam, those documents should rank first in a search for crede vaccam. If they don't, then something is wrong with the search engine. It's as simple as that.
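To make the point concrete, here is a small Python sketch of popularity used only as a tie-breaker. It is my own illustration and not Google's actual algorithm, which is not public. The field names `match_tier` and `inbound_links` and the example URLs are hypothetical.

```python
def rank(candidates):
    """Order candidates by how well they match, then by popularity.

    Each candidate is a tuple (url, match_tier, inbound_links), where
    match_tier 0 means the exact sequence the searcher typed, and higher
    numbers mean progressively looser matches (hypothetical scale).
    Popularity only decides the order among equally good matches.
    """
    ordered = sorted(candidates, key=lambda c: (c[1], -c[2]))
    return [url for url, tier, links in ordered]

candidates = [
    ("cleaner-vacuum.example", 1, 9000),     # words reversed, very popular
    ("vacuum-cleaner-small.example", 0, 3),  # exact sequence, obscure
    ("vacuum-cleaner-big.example", 0, 500),  # exact sequence, popular
]
print(rank(candidates))
# ['vacuum-cleaner-big.example', 'vacuum-cleaner-small.example', 'cleaner-vacuum.example']
```

Under this ordering, a page with only three inbound links still outranks the most popular page on the web if it is the one that contains the exact sequence.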

Competing Search Engines: Yahoo

If I look up Vacuum County on http://www.yahoo.com, my top two results today are:

  1. VACUUM COUNTY, Chapter Twenty

    VACUUM COUNTY. PART THREE, Chapter Twenty. Copyright 1991 Aya Katz. Chapter 20 PROMISES KEPT. FROM VACUUM COUNTY FILES. PROGRESS REPORT. VACUUM COUNTY ADULT PROBATION
    www.well.com/user/amnfn/vac20.html - Cached
  2. VACUUM COUNTY, Chapter Twenty-Seven

    VACUUM COUNTY. PART THREE, Chapter Twenty-Seven. Copyright 1991 Aya Katz. Chapter 27 THE SHEEP AND THE SHEPHERD. THE NEW YORK TIMES. Tax-Evading Rancher Wonders Why He Doesn't ...
    www.well.com/user/amnfn/vac27.html - Cached


Now I don't know why they chose those particular chapters, or why they went into vacuum cleaner wholesalers immediately after those two entries. This is not an unqualified endorsement of Yahoo. I think they've been bribed, too. But they're a lot more decent about it, don't you think?

If I look up crede vaccam on Yahoo today, the first thing that comes up is something about vacuum cleaners, but then the second and third entries are two chapters from Vacuum County that contain the sequence crede vaccam.

If I look up Nabal Cabeza de Vaca on Yahoo today, I get two chapters from Vacuum County, followed by these two sites:

  1. Películas gratis de Nabal | Filmografia Nabal | Cartelera ...

    con la filmografía de Nabal Presentamos tráilers de cine gratis online para ... Cabeza de Vaca | 1990; Fugitivos Rebeldes | 1954; La mujer milagro | 1931; Calígula | 1979
    pejino.com/cine/nabal - Cached
  2. Personaje bíblico | cristianismo | Nabal | Laredo Cantabria

    25:14 Pero uno de los criados dio aviso a Abigail mujer de Nabal, diciendo: He ... Cabeza de Vaca | 1990; Fugitivos Rebeldes | 1954; La mujer milagro | 1931; Calígula | 1979
    pejino.com/pelicula/cristianismo/nabal - Cached

The sites in Spanish actually contain all the words in the sequence Nabal Cabeza de Vaca, though not in that order. Yahoo can help the searcher identify the Biblical character Nabal, on whom my novel is based. The fact that in Spanish Cabeza de Vaca isn't just a name, but also three independent words, allows searchers to identify the semantic relationship between the name of the famous explorer and the occupation of the Biblical character Nabal. So even though I think Yahoo violated the ordinary rules of priority in search, I can't feel very upset about it, because they contribute a better understanding of the background of my novel to anyone who might care to know what it is really about.

While we're thinking about those Spanish listings, don't you think it's interesting that all the top Google listings about Alvar Nunez Cabeza de Vaca weren't even in Spanish? How are those the best results, even if I were looking for the famous explorer? Wouldn't his own book Naufragios y Comentarios be a better, more primary source?

Competing Search Engines: DuckDuckGo

At duckduckgo.com, the first result for Vacuum County is chapter twenty of my novel. The rest of the results are vacuum cleaner sites. For crede vaccam, the top result is chapter eighteen of my novel, followed by documents in Latin that contain both words, though either out of order or with other words between them. Nabal Cabeza de Vaca at DuckDuckGo yields chapter nine of my novel, followed by the site in Spanish about the Biblical character, followed by sites that contain long lists of names.

So? DuckDuckGo is less corrupt than Google, but not as generous as Yahoo.

Competing Search Engines: Bing

At Bing, Vacuum County yields chapters twenty-seven and eleven of my novel as the two top results, followed by vacuum cleaner listings. Crede vaccam on Bing gets us a vacuum cleaner listing in the top spot, followed by two of my chapters, followed by more vacuum cleaners. Nabal Cabeza de Vaca at Bing gets us chapters twenty-seven and seven of my novel, followed by the Spanish language biblical listing on Nabal, followed by the list of names, followed by Spanish texts.

I'd say Bing is not as good as Yahoo, but possibly equal to DuckDuckGo in the value of the results, though by no means identical.

Vacuum County is now available on Amazon

Is it paranoid to conclude that Google is corrupt?

In discussing the recent algorithm change, there are many opinions. Some are angry with Google and others think this is just a settling down period. Some even say that the bad results are getting top billing in order to find the "bad guys" and punish them.

Me? I don't think there are any bad guys among the listings. The listings are inanimate. They are just information. Information is neither black hat nor white hat. It is what it is. The readers get to decide what they want to read. It should be up to the search engines to arrange the pages according to comprehensible rules. The algorithm should not be a proprietary secret. It should be known to all -- especially the people who are searching, so that they can know what terms to input in order to get the best results for them.

Google claims its algorithm exists to help the searcher find the best results. But the best results are different depending on who you are. In fact, how much weight popularity gets in ordering the results should be something that each searcher can select for himself, and each searcher should be able to use his own private algorithm the better to help him find what he is looking for. If Google really cared about us, that is what they would let us do.
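As a rough illustration of what a searcher-selected weighting could look like, here is a short Python sketch. No search engine I know of exposes such a setting, so everything here is hypothetical: the `Result` fields, the 0-to-1 scales, the `popularity_weight` parameter, and the example URLs.

```python
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    match_quality: float  # hypothetical scale: 1.0 = exact phrase, 0.0 = no match
    popularity: float     # hypothetical scale: 0.0 = obscure, 1.0 = very popular

def rank_for_searcher(results, popularity_weight=0.0):
    """Rank results with a popularity weight chosen by the searcher.

    popularity_weight=0.0 ignores popularity entirely;
    popularity_weight=1.0 ignores match quality entirely.
    """
    def score(r):
        return ((1 - popularity_weight) * r.match_quality
                + popularity_weight * r.popularity)
    return [r.url for r in sorted(results, key=score, reverse=True)]

results = [
    Result("novel.example/chapter20", match_quality=1.0, popularity=0.1),
    Result("vacuums.example/wholesale", match_quality=0.4, popularity=0.9),
]
print(rank_for_searcher(results, popularity_weight=0.0))
# ['novel.example/chapter20', 'vacuums.example/wholesale']
print(rank_for_searcher(results, popularity_weight=0.8))
# ['vacuums.example/wholesale', 'novel.example/chapter20']
```

The same two pages come back in either order. The only difference is who gets to set the dial.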

When someone asserts that Google wouldn't dare give slanted search results, for fear of losing its market share, I have to laugh. They've been doing it for years. All the major search engines are doing it, to a greater or lesser extent. They do it because they don't get paid by the searchers. They get paid by advertisers. The algorithm is all about exactly how many vacuum cleaner sales sites get priority in a search for Vacuum County, sites that would never have made the list in the first place under a simple Boolean search.

How to get around this? Write your own search engine. You won't get rich doing it, because nobody will pay you. But if the results are slanted, they'll be slanted to your bias and nobody else's!

© 2011 Aya Katz
