WWW Search Engine Test Methods

by: Randy D. Ralph, MLIS, Ph.D.

In place April 18, 1997. Copyright © 1997, Randy D. Ralph. All rights reserved.

Test Search Protocols

The top eight (8) global WWW search engines identified in comparative testing in 1996 were selected for comparative benchmark testing again in 1997. The 1996 testing was part of an Indexing and Abstracting (LIS565) course project at the Department of Library and Information Studies of the School of Education at the University of North Carolina at Greensboro. One search engine previously tested, Inktomi, has since been wholly replaced by the new HotBot engine.

Twenty-five (25) standard test searches were devised in diverse subject areas and genres to gauge the overall performance of the eight (8) selected search engines. In general, single-term search queries were presented to the search engines using their own search form interfaces with default settings, except that the Lycos engine was configured for "loose match." Multiple-term queries were presented in one of several ways, depending on the operators each search engine offers:

  1. as bound phrases; that is, enclosed in double quotes ("), where applicable.

  2. linked with the AND Boolean operator, if available, where phrase searching was not an option.

  3. linked with the NEAR, FOLLOWED BY or ADJ proximity operators, where applicable.

  4. prefixed with the (+) term qualifier to strengthen matching, where applicable.

Additionally, one or more significant terms were used for sorting in the Alta Vista search engine to group the most relevant items toward the top of the output.
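The four ways of presenting a multiple-term query can be illustrated with a minimal Python sketch. The function name format_query and the style labels are hypothetical, chosen only for illustration, and the exact operator syntax accepted by each 1997 search engine is not given in this document, so the output strings are assumptions rather than the literal queries submitted.

```python
def format_query(terms, style):
    """Combine search terms using one of the four strategies listed above.

    The style labels ("phrase", "and", "near", "plus") are hypothetical names
    for this sketch; real engines differed in the syntax they accepted.
    """
    if style == "phrase":
        # 1. Bound phrase: the whole query enclosed in double quotes.
        return '"' + " ".join(terms) + '"'
    if style == "and":
        # 2. Terms linked with the AND Boolean operator.
        return " AND ".join(terms)
    if style == "near":
        # 3. Terms linked with a proximity operator (NEAR shown here).
        return " NEAR ".join(terms)
    if style == "plus":
        # 4. Each term prefixed with the (+) qualifier.
        return " ".join("+" + term for term in terms)
    raise ValueError(f"unknown style: {style}")


terms = ["Baroncelli", "Chapel"]
for style in ("phrase", "and", "near", "plus"):
    print(style, "->", format_query(terms, style))
```

Which style applies to a given engine depends on the features that engine offers, as described in the list above.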


The Search Queries

The actual test queries and the rationale behind them are presented below:

These initial ten queries were used in the 1996 benchmark testing.

Query #1 - "Black '47"

To test the engines on current pop culture, a search was performed on the Boston-based, Irish-American rock group "Black '47."

Query #2 - "Criminal Behavior"

To test the engines on a general topic for which false coordination might become a limiting factor.

Query #3 - "Fitzwilliam Virginal"

To test the engines on a very specific and relatively esoteric topic in music literature and performance arts - pieces from the Fitzwilliam Virginal Book.

Query #4 - "Kykuit"

To test the engines on a very specific bit of Americana - Kykuit, the baronial home of the Rockefellers on the Hudson River in New York.

Query #5 - "Melungeon"

To test the engines on a very obscure topic - the Melungeons of the Appalachian Mountain Region, often a subject on the STUMPERS-L listserv.

Query #6 - "Neuschwanstein"

To test the engines on a fairly well-known tourist attraction in Germany - Neuschwanstein Castle.

Query #7 - "Okidata OL840 Printer Driver"

To test the engines on a specific technology-oriented subject - locating printer drivers for the Okidata OL840 series printers.

Query #8 - "R. Crumb"

To test the engines on a current bit of pop culture - the '30s drawings of cartoonist Robert Crumb and a recent film about him.

Query #9 - "Randall Jarrell"

To test the engines on a well-known literary figure - Randall Jarrell.

Query #10 - "Tom Metzger"

To test the engines on locating the site of the well-known but elusive white supremacist with a relatively common name - Tom Metzger.


New Queries for 1997 Testing

The following fifteen new search queries were designed for the 1997 benchmark testing:

Query #11 - "Abelard and Heloise Letters"

To test the engines on a fairly well-circumscribed and commonly written-about theme in Medieval literature and philosophy - the love letters of Abelard and Heloise.

Query #12 - "Amboog-A-Lard"

To test the engines on current pop culture - the somewhat controversial Marilyn Manson death metal rock band.

Query #13 - "Ambreine"

To test the engines on retrieval of a specific and somewhat obscure chemical compound name - ambreine, a constituent of the ambergris aroma.

Query #14 - "Baroncelli Chapel"

To test the engines on a well-known and celebrated architectural subject - the Baroncelli Chapel of Santa Croce.

Query #15 - "Book of Kells"

To test the engines on retrieval of electronic sources on medieval manuscripts - the Book of Kells illuminated manuscript.

Query #16 - "Chateau de Villandry"

To test the engines on retrieval of a specific tourist attraction - the Chateau de Villandry in the Loire Valley of France.

Query #17 - "Francis Pilkington Madrigals"

To test the engines on retrieval of an obscure musical reference - the Elizabethan madrigals of Francis Pilkington.

Query #18 - "Long Haymes Carr Lintas"

To test the engines on retrieval of a local, but nationally known business establishment - the Long Haymes Carr Lintas advertising agency.

Query #19 - "Mansoor Amarna Collection"

To test the engines on retrieval of information on museums and art collections - the controversial Mansoor Amarna collection of ancient Egyptian art.

Query #20 - "Muir Woods"

To test the engines on retrieval of information on a very popular tourist attraction - Muir Woods redwood forest preserve in Marin County, California.

Query #21 - "Olympic Park Bomber"

To test the engines on retrieval of a celebrated news event - the bombing at the Centennial Olympic Park in Atlanta.

Query #22 - "Omega 3 Fatty Acids"

To test retrieval of information on popular health issues - sources of omega 3 fatty acids.

Query #23 - "Paphiopedilum callosum"

To test retrieval of information on a specific plant species - the orchid Paphiopedilum callosum.

Query #24 - "Percheron Breed History"

To test retrieval of information on the history of a specific animal breed - the Percheron horse breed.

Query #25 - "Peyronie's Disease"

To test retrieval of true medical and popular medicine information on a common complication of diabetes - Peyronie's disease.


Ranking Protocols and Computations

The search engines were rated on these search queries in four ways:

  1. The total number of hits retrieved.

  2. The position in the search output list of the first relevant hit.

  3. The number of relevant items in the first twenty five (25) hits listed.

  4. An Ordinal Ranking for each search engine within each query based on an evaluation of the first three factors where:

    • 8 = top of 8 tested
    • 1 = bottom of 8 tested
    • 0 = failed the search

The Relative Ranking was assigned to each search engine on the basis of a comparison of the Absolute Rankings computed for each search engine on each query. The Absolute Ranking achieved by each search engine was computed using three figures:

  1. Hit Ranking - an assigned index value based on the total of the hits retrieved, where:

    3 = less than 1,000 hits retrieved

    2 = 1,001 to 10,000 hits retrieved

    1 = over 10,000 hits retrieved

    Returns of less than 1,000 hits were considered to be positive for users and were given the highest Hit Ranking of 3. Returns of more than 10,000 hits were considered to be the most negative for users and were given the lowest Hit Ranking of 1.

  2. Position in the search output listing of the first solid hit.

  3. Number of solid hits in the first 25 returned items in the output list.
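The Hit Ranking bands described in item 1 can be sketched as a small Python function. Two points are assumptions and are marked as such in the comments: the stated bands leave a return of exactly 1,000 hits unassigned, and this sketch places it in the top band; and a return of zero hits is given a Hit Ranking of 0, which follows the Lycos row of the example table in this section rather than the band definitions.

```python
def hit_ranking(total_hits):
    """Map a total hit count to the Hit Ranking index (3, 2, 1, or 0)."""
    if total_hits <= 0:
        # Assumption: a search returning nothing gets 0, as in the
        # Lycos row of the example table.
        return 0
    if total_hits <= 1000:
        # Assumption: exactly 1,000 hits is placed in the top band; the
        # stated bands ("less than 1,000" and "1,001 to 10,000") omit it.
        return 3
    if total_hits <= 10000:
        return 2
    return 1


for hits in (50, 1236, 6380, 0, 25000):
    print(hits, "->", hit_ranking(hits))
```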

The Absolute Ranking was computed as follows:

Absolute Ranking = Hit Ranking x (10 - Position of First Hit) x Number of Hits in the First 25 Items

That is, the Hit Ranking is multiplied by the difference between 10 and the position of the first solid hit, and the result is multiplied by the number of solid hits in the first 25 displayed items. This formula yielded a number for the Absolute Ranking of each engine for each query. The Absolute Rankings were used to compute the Relative Rankings, with the engine receiving the highest Absolute Ranking at 100%.
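As a worked check of the formula, the following Python sketch computes the Absolute Ranking from the three figures. The function name absolute_ranking is hypothetical, and the input values are taken from the Abelard and Heloise example table in this section. The text does not say how a first solid hit beyond position 10 was handled, so the sketch applies the formula literally without any adjustment.

```python
def absolute_ranking(hit_rank, first_hit_position, solid_hits_in_25):
    """Hit Ranking x (10 - position of first solid hit) x solid hits in first 25.

    Applied literally: no special handling for positions beyond 10, since
    the source text does not describe any.
    """
    return hit_rank * (10 - first_hit_position) * solid_hits_in_25


# (Hit Ranking, position of first solid hit, solid hits in first 25),
# copied from the example table for the Abelard and Heloise query.
example = {
    "AltaVista": (3, 1, 15),
    "Excite!": (3, 1, 16),
    "Infoseek": (2, 5, 12),
    "Yahoo!": (2, 1, 11),
}

for engine, figures in example.items():
    print(engine, absolute_ranking(*figures))
```

For AltaVista this gives 3 x 9 x 15 = 405, which agrees with the Absolute Ranking shown in the example table.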

Ordinal Rankings from 8 (highest) to 1 (lowest), with 0 for a failed search, were assigned on the basis of the computed Absolute Ranking. Where computed Absolute Rankings were identical, the higher Ordinal Ranking was given to the search engine with the best overall retrieval results. Where all else was equal, search engines with identical computed Absolute Rankings were given the same Ordinal Ranking.

Example:

QUESTION:  Abelard and Heloise Letters

Search      Total  Hit      To First  Hits/     Absolute  %        Ordinal
Engine      Hits   Ranking  Hit       First 25  Ranking   Ranking  Ranking
AltaVista   50     3        1         15        405       22.13    6
Excite!     128    3        1         16        432       23.61    7
HotBot      216    3        1         16        432       23.61    8
Infoseek    1236   2        5         12        120       6.56     3
Lycos       0      0        0         0         0         0        0
Open Text   5      3        1         3         81        4.43     2
WebCrawler  6      3        1         6         162       8.85     4
Yahoo!      6380   2        1         11        198       10.82    5

In the example result table displayed above, HotBot receives the highest Ordinal Ranking even though the computed Absolute Ranking for Excite! is identical, because HotBot's Total Hits retrieved is higher. Lycos receives Ordinal Ranking 0 because it failed the search.
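The % Ranking and Ordinal Ranking columns of the example can be reproduced with the short Python sketch below. It rests on two assumptions that go beyond the stated method. First, the percentages in the example table match each engine's share of the sum of all Absolute Rankings (405 / 1830 = 22.13%), so the sketch computes them that way. Second, ties on Absolute Ranking are broken by higher Total Hits, which mirrors the one tie explained above; the general criterion of "best overall retrieval results" may have involved more than this. The function names are hypothetical.

```python
# (engine, total hits, Absolute Ranking) from the example table.
rows = [
    ("AltaVista", 50, 405),
    ("Excite!", 128, 432),
    ("HotBot", 216, 432),
    ("Infoseek", 1236, 120),
    ("Lycos", 0, 0),
    ("Open Text", 5, 81),
    ("WebCrawler", 6, 162),
    ("Yahoo!", 6380, 198),
]


def percent_rankings(rows):
    """Each engine's share of the summed Absolute Rankings, in percent.

    Assumption: this is the computation that matches the example table.
    """
    total = sum(absolute for _, _, absolute in rows)
    return {
        name: round(100 * absolute / total, 2) if total else 0.0
        for name, _, absolute in rows
    }


def ordinal_rankings(rows, top=8):
    """Assign 8 (best) downward; engines with Absolute Ranking 0 get 0.

    Assumption: ties on Absolute Ranking are broken by higher total hits;
    engines equal on both figures share the same Ordinal Ranking.
    """
    ordinals = {name: 0 for name, _, _ in rows}
    ranked = sorted(
        (row for row in rows if row[2] > 0),
        key=lambda row: (row[2], row[1]),
        reverse=True,
    )
    previous_key, previous_rank = None, None
    for index, (name, hits, absolute) in enumerate(ranked):
        key = (absolute, hits)
        rank = previous_rank if key == previous_key else top - index
        ordinals[name] = rank
        previous_key, previous_rank = key, rank
    return ordinals


percents = percent_rankings(rows)
ordinals = ordinal_rankings(rows)
for name, _, absolute in rows:
    print(name, absolute, percents[name], ordinals[name])
```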

The speed of returned search results was not significant in the overall rankings generated using these methods. All of the tested search engines generally returned results in under ten (10) seconds, and usually within five (5), even at times of peak Internet usage. The total number of hits returned was significant in the overall rankings. Search engines which either do not return total hit counts or which make discovery of total hits difficult were ranked lower, all other things being equal. Search engines which returned enormous hit counts were, all other things being equal, rated lower as responding poorly for the average user. Search engines were ranked as having failed if no hits were returned at all, if no relevant hits were returned, or if no relevant hits were returned within the first twenty-five (25) sites listed.
