WWW Search Engine Test Methods

by: Randy D. Ralph, John W. Felts and Ben J. Lea

Last updated April 30, 1996

Test Searches

Over twenty (20) search engine sites that purport to be global and to index the entire World Wide Web (WWW) were subjected to a general pass/fail test on a standard search for MANSOOR AMARNA, entered as a bound phrase, coordinated with Boolean AND logic, or coordinated with text proximity operators, where applicable. Only ten (10) of the original test group found the WWW site for The Mansoor Amarna Collection of Ancient Egyptian art. In a second test on the phrase CHATEAU CHAMBORD, only eight (8) of the remaining search engines performed well: Alta Vista, Excite!, Infoseek, Inktomi, Lycos, Open Text Index, WebCrawler and Yahoo! For this reason, extensive testing of search engines on ten (10) standard queries was limited to these top eight (8) sites.

Ten (10) standard test searches were devised in diverse subject areas and genres in order to gauge the overall performance of the eight (8) selected search engines. In general, single-term search queries were presented to the search engines using their own search form interfaces with default settings, except that the Lycos engine was configured for "fair match." Multiple-term queries were presented in one of several ways, depending on the options each search engine offers:

  1. as bound phrases; that is, enclosed in double quotes, where applicable.

  2. linked with the "AND" Boolean operator, if available, where phrase searching was not an option.

  3. for the Alta Vista engine linked with the "NEAR" operator.

  4. for the Open Text Index engine using the "FOLLOWED BY" proximity option.

Additionally, one or more significant terms were used for sorting in the Alta Vista search engine to group the most relevant items toward the top of the output.
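The query formulations listed above can be sketched in code. The following is a minimal illustration only: the function name, the method labels and the exact operator spellings are assumptions made for the example, not the documented syntax of any of the tested engines.

```python
from typing import Sequence


def format_query(terms: Sequence[str], method: str) -> str:
    """Combine search terms for one engine.

    `method` is a hypothetical label for the option the engine offers:
    "phrase", "and", "near" or "followed_by". The operator spellings
    below are illustrative assumptions.
    """
    if len(terms) == 1:
        # Single-term queries are submitted unchanged.
        return terms[0]
    if method == "phrase":
        # Bound phrase: the terms enclosed in double quotes.
        return '"' + " ".join(terms) + '"'
    joiners = {
        "and": " AND ",
        "near": " NEAR ",
        "followed_by": " FOLLOWED BY ",
    }
    if method not in joiners:
        raise ValueError("unknown method: " + method)
    return joiners[method].join(terms)


print(format_query(["Fitzwilliam", "Virginal"], "phrase"))
print(format_query(["Fitzwilliam", "Virginal"], "near"))
print(format_query(["Kykuit"], "phrase"))
```

Keeping the method as an explicit argument mirrors the text: the formulation was chosen per engine rather than derived automatically.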

The actual test queries and the rationale behind them are presented below:

Query #1 - "Black '47"

To test the engines on current pop culture a search was performed on the Boston-based, Irish-American rock group "Black '47."

Query #2 - "Criminal Behavior"

To test the engines on a general topic for which false coordination might become a limiting factor.

Query #3 - "Fitzwilliam Virginal"

To test the engines on a very specific and relatively esoteric topic in music literature and performance arts - pieces from the Fitzwilliam Virginal Book.

Query #4 - "Kykuit"

To test the engines on a very specific bit of Americana - Kykuit, the baronial home of the Rockefellers on the Hudson River in New York.

Query #5 - "Melungeon"

To test the engines on a very obscure topic - the Melungeons of the Appalachian Mountain Region, often a subject on the STUMPERS-L listserv.

Query #6 - "Neuschwanstein"

To test the engines on a fairly well-known tourist attraction in Germany - Neuschwanstein Castle.

Query #7 - "Okidata OL840 Printer Driver"

To test the engines on a specific technology-oriented subject - locating printer drivers for the Okidata OL840 series printers.

Query #8 - "R. Crumb"

To test the engines on a current bit of pop culture - the '60s drawings of cartoonist Robert Crumb and a recent film about him.

Query #9 - "Randall Jarrell"

To test the engines on a well-known literary figure - Randall Jarrell.

Query #10 - "Tom Metzger"

To test the engines on locating the site of the well-known but elusive white supremacist with a relatively common name - Tom Metzger.

The search engines were rated on these search queries in four ways:

  1. The total number of hits retrieved.

  2. The position in the list of the first relevant hit.

  3. The number of relevant hits in the first ten listed.

  4. A relative ranking for each query based on an evaluation of the first three factors where:

    • 8 = top of 8 tested
    • 1 = bottom of 8 tested
    • 0 = failed the search
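The second and third measures above are simple counts over a result list. The sketch below assumes each result list has already been judged by hand and recorded as a list of True/False relevance flags in display order; the function names are hypothetical. The fourth measure, the relative ranking, rests on an evaluation of the other three and is not computed here.

```python
from typing import Optional, Sequence


def first_relevant_position(relevant: Sequence[bool]) -> Optional[int]:
    """Return the 1-based list position of the first relevant hit, or None."""
    for position, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            return position
    return None


def relevant_in_first_ten(relevant: Sequence[bool]) -> int:
    """Count the relevant hits among the first ten listed."""
    return sum(1 for is_relevant in relevant[:10] if is_relevant)


# Hypothetical judgments for one query on one engine.
judgments = [False, True, True, False, False, True,
             False, False, False, False, True, False]
print(first_relevant_position(judgments))
print(relevant_in_first_ten(judgments))
```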

Relative rankings were summed over all search queries, and each engine's cumulative total was then expressed as a percentage of the highest total (Alta Vista = 62). The number of times each of the eight (8) search engines was ranked in the top three (3) was also tabulated in the overall rankings. Each search engine was given a subjective overall ranking on a five (5) star scale.
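The arithmetic behind the cumulative figures can be sketched as follows. The engine names and per-query rankings below are invented purely for illustration; the only figure taken from the text is the top total of 62. Counting a ranking of 6, 7 or 8 as "top three" assumes eight engines and no ties.

```python
from typing import Dict, List


def cumulative_scores(per_query_ranks: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum each engine's relative rankings over all queries."""
    return {engine: sum(ranks) for engine, ranks in per_query_ranks.items()}


def percent_of_best(totals: Dict[str, int]) -> Dict[str, float]:
    """Express each total as a percentage of the highest total."""
    best = max(totals.values())
    if best == 0:
        # Every engine failed every query; avoid dividing by zero.
        return {engine: 0.0 for engine in totals}
    return {engine: 100.0 * total / best for engine, total in totals.items()}


def top_three_counts(per_query_ranks: Dict[str, List[int]],
                     engines_tested: int = 8) -> Dict[str, int]:
    """Count how often each engine placed in the top three for a query."""
    threshold = engines_tested - 2  # rankings 8, 7 and 6 when 8 are tested
    return {engine: sum(1 for rank in ranks if rank >= threshold)
            for engine, ranks in per_query_ranks.items()}


# Hypothetical rankings for three queries (not the study's data).
ranks = {"Engine A": [8, 7, 6], "Engine B": [7, 8, 0], "Engine C": [6, 6, 8]}
totals = cumulative_scores(ranks)
print(totals)
print(percent_of_best(totals))
print(top_three_counts(ranks))
```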

The speed with which search results were returned was not significant in the overall rankings. All of the tested search engines generally returned results in under ten (10) seconds, and usually within five (5), even at times of peak Internet usage. The total number of hits returned was significant in the overall rankings. Search engines which either do not return total hit counts or which make discovery of total hits difficult were ranked lower, all other things being equal. Search engines which returned enormous hit counts were also rated lower, all other things being equal, as responding poorly for the average user. Search engines were ranked as having failed if no hits were returned at all, if no relevant hits were returned, or if no relevant hits were returned within the first twenty-five (25) sites listed.
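The three failure conditions reduce to a single check: a search fails when none of the first twenty-five listed sites is relevant. A minimal sketch, assuming relevance has been judged by hand and recorded as True/False flags in display order (the function name and parameter are hypothetical):

```python
from typing import Sequence


def search_failed(relevant: Sequence[bool], cutoff: int = 25) -> bool:
    """Return True when no relevant hit appears in the first `cutoff` results.

    An empty result list (no hits at all) and a list with no relevant
    hits anywhere both fall under the same test.
    """
    return not any(relevant[:cutoff])


print(search_failed([]))                      # no hits returned
print(search_failed([False] * 40))            # hits, but none relevant
print(search_failed([False] * 25 + [True]))   # first relevant hit at position 26
print(search_failed([False] * 24 + [True]))   # first relevant hit at position 25
```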

