Test Search Protocols
The top eight (8) global WWW search engines identified in previous comparative testing
in 1996, as part of an Indexing and Abstracting (LIS565) course project at the Department
of Library and Information Studies of the School of Education at the University of North
Carolina at Greensboro, were selected for comparative benchmark testing again in 1997.
One search engine previously tested, Inktomi, has since been wholly replaced by the new
HotBot engine.
Twenty-five (25) standard test searches were devised in diverse subject areas and genres
to gauge the overall performance of the eight (8) selected search engines. In general,
single-term search queries were presented to the search engines using
their own search form interfaces with default settings, except that the Lycos engine was configured
for "loose match." Multiple-term queries were presented in one of several ways, depending
on the features each engine offered:
- as bound phrases; that is, enclosed in double quotes ("), where applicable.
- linked with the AND Boolean operator, if available, where phrase searching was not
an option.
- linked with the NEAR, FOLLOWED BY or ADJ proximity operators, where applicable.
- prefixed with the (+) term qualifier to strengthen matching, where applicable.
Additionally, one or more significant terms
were used for sorting in the Alta Vista search engine to group the most relevant items
toward the top of the output.
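The query formulations listed above can be illustrated with a short Python sketch. It is not part of the original test procedure: the `Strategy` names and the `format_query` helper are hypothetical, and the exact syntax each engine accepted in 1997 is not reproduced here. The sketch only shows how the same terms turn into different query strings.

```python
from enum import Enum


class Strategy(Enum):
    """Hypothetical labels for the four formulations described in the text."""
    PHRASE = "phrase"        # bound phrase enclosed in double quotes
    BOOLEAN_AND = "and"      # terms linked with the AND operator
    PROXIMITY = "proximity"  # terms linked with NEAR, FOLLOWED BY, or ADJ
    REQUIRED = "required"    # each term prefixed with the (+) qualifier


def format_query(terms, strategy, operator="NEAR"):
    """Build a query string from a list of terms (illustrative helper only)."""
    if not terms:
        raise ValueError("at least one term is required")
    if len(terms) == 1:
        # Single-term queries were submitted as-is with default settings.
        return terms[0]
    if strategy is Strategy.PHRASE:
        return '"' + " ".join(terms) + '"'
    if strategy is Strategy.BOOLEAN_AND:
        return " AND ".join(terms)
    if strategy is Strategy.PROXIMITY:
        return f" {operator} ".join(terms)
    if strategy is Strategy.REQUIRED:
        return " ".join("+" + term for term in terms)
    raise ValueError(f"unknown strategy: {strategy}")


for strategy in Strategy:
    print(strategy.name, "->", format_query(["Muir", "Woods"], strategy))
```

Which formulation applied to a given engine depended on the features that engine offered, so a real harness would need a per-engine choice of strategy.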
The Search Queries
The actual test queries and the rationale behind them are presented below.
These initial ten queries were used in the 1996 benchmark testing.
- Query #1 - "Black '47"
- To test the engines on current pop culture a search was performed on the
Boston-based, Irish-American rock group "Black '47."
- Query #2 - "Criminal Behavior"
- To test the engines on a general topic for which false coordination might become
a limiting factor.
- Query #3 - "Fitzwilliam Virginal"
- To test the engines on a very specific and relatively esoteric topic in music
literature and performance arts - pieces from the Fitzwilliam Virginal Book.
- Query #4 - "Kykuit"
- To test the engines on a very specific bit of Americana - Kykuit, the
baronial home of the Rockefellers on the Hudson River in New York.
- Query #5 - "Melungeon"
- To test the engines on a very obscure topic - the Melungeons of the Appalachian
Mountain Region, often a subject on the STUMPERS-L listserv.
- Query #6 - "Neuschwanstein"
- To test the engines on a fairly well-known tourist attraction in Germany -
Neuschwanstein Castle.
- Query #7 - "Okidata OL840 Printer Driver"
- To test the engines on a specific technology-oriented subject - locating printer
drivers for the Okidata OL840 series printers.
- Query #8 - "R. Crumb"
- To test the engines on a current bit of pop culture - the '30s drawings of
cartoonist Robert Crumb and a recent film about him.
- Query #9 - "Randall Jarrell"
- To test the engines on a well-known literary figure - Randall Jarrell.
- Query #10 - "Tom Metzger"
- To test the engines on locating the site of the well-known but elusive
white supremacist with a relatively common name - Tom Metzger.
New Queries for 1997 Testing
The following fifteen new search queries were designed for the 1997 benchmark
testing:
- Query #11 - "Abelard and Heloise Letters"
- To test the engines on fairly well-defined and commonly written-about themes in medieval literature and philosophy - the love
letters of Abelard and Heloise.
- Query #12 - "Amboog-A-Lard"
- To test the engines on current pop culture - the somewhat controversial death metal rock band Amboog-A-Lard, associated with Marilyn Manson.
- Query #13 - "Ambreine"
- To test the engines on retrieval of a specific and somewhat obscure chemical compound name - ambreine, a constituent of the ambergris aroma.
- Query #14 - "Baroncelli Chapel"
- To test the engines on a well-known and celebrated architectural subject - the Baroncelli Chapel of Santa Croce.
- Query #15 - "Book of Kells"
- To test the engines on retrieval of electronic sources of incunabula - the Book of Kells illuminated manuscript.
- Query #16 - "Chateau de Villandry"
- To test the engines on retrieval of a specific tourist attraction - the Chateau de Villandry in the Loire Valley of France.
- Query #17 - "Francis Pilkington Madrigals"
- To test the engines on retrieval of an obscure musical reference - the Elizabethan madrigals of Francis Pilkington.
- Query #18 - "Long Haymes Carr Lintas"
- To test the engines on retrieval of a local, but nationally known business establishment - the Long Haymes Carr Lintas advertising agency.
- Query #19 - "Mansoor Amarna Collection"
- To test the engines on retrieval of information on museums and art collections - the controversial Mansoor Amarna collection of ancient Egyptian art.
- Query #20 - "Muir Woods"
- To test the engines on retrieval of information on a very popular tourist attraction - Muir Woods redwood forest preserve in Marin County, California.
- Query #21 - "Olympic Park Bomber"
- To test the engines on retrieval of a celebrated news event - the bombing at the Centennial Olympic Park in Atlanta.
- Query #22 - "Omega 3 Fatty Acids"
- To test retrieval of information on popular health issues - sources of omega 3 fatty acids.
- Query #23 - "Paphiopedilum callosum"
- To test retrieval of information on a specific plant species - the orchid Paphiopedilum callosum.
- Query #24 - "Percheron Breed History"
- To test retrieval of information on the history of a specific animal breed - the Percheron horse breed.
- Query #25 - "Peyronie's Disease"
- To test retrieval of true medical and popular medicine information on a common complication of diabetes - Peyronie's disease.
Ranking Protocols and Computations
The search engines were rated on these search queries in four ways:
- The total number of hits retrieved.
- The position in the search output list of the first relevant hit.
- The number of relevant items in the first twenty-five (25) hits listed.
- An Ordinal Ranking for each search engine within each query based on an evaluation of
the first three factors where:
- 8 = top of 8 tested
- 1 = bottom of 8 tested
- 0 = failed the search
The Relative Ranking was assigned to each search engine on the basis of a comparison of the
Absolute Ranking computed for each search engine on each query. The Absolute Ranking
achieved by each search engine was computed using three figures:
- Hit Ranking - an assigned index value based on the total of the hits retrieved, where:
3 = less than 1,000 hits retrieved
2 = 1,001 to 10,000 hits retrieved
1 = over 10,000 hits retrieved
Returns of less than 1,000 hits were considered to be positive for users and were given
the highest Hit Ranking of 3. Returns of more than 10,000 hits were considered to be the most
negative for users and were given the lowest Hit Ranking of 1.
- Position in the search output listing of the first solid hit.
- Number of solid hits in the first 25 returned items in the output list.
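The Hit Ranking thresholds can be written as a small Python function. This is a sketch under two assumptions not stated in the text: a return of exactly 1,000 hits, which falls between the published bands, is placed in the top band, and a return of zero hits receives a Hit Ranking of 0, as the Lycos row of the example table in this section suggests. The function name `hit_ranking` is illustrative.

```python
def hit_ranking(total_hits: int) -> int:
    """Map a total hit count to the Hit Ranking index described above."""
    if total_hits < 0:
        raise ValueError("total_hits cannot be negative")
    if total_hits == 0:
        # Assumption: no hits at all scores 0 (see the Lycos row in the example).
        return 0
    if total_hits <= 1000:
        # Assumption: exactly 1,000 hits is grouped with "less than 1,000".
        return 3
    if total_hits <= 10000:
        return 2
    return 1


# Total hit counts taken from the example table in this section.
for engine, hits in [("AltaVista", 50), ("Infoseek", 1236), ("Yahoo!", 6380), ("Lycos", 0)]:
    print(engine, hit_ranking(hits))
```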
The Absolute Ranking was computed as follows:
Hit Ranking X (10 - Position of First Hit) X # Hits in the First 25 Items
That is, the Hit Ranking is multiplied by the difference between 10 and the position of
the first solid hit, and the result is multiplied by the number of solid hits in the
first 25 displayed items. This formula yielded a number for the
Absolute Ranking of each engine for each query. The Absolute Rankings were used to
compute the Relative Rankings, with the engine receiving the highest Absolute Ranking
at 100%.
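As a worked check of the formula, the following Python sketch recomputes Absolute Rankings from figures in the example table in this section. The function names are illustrative. One caution: the "% Ranking" column of that table matches each engine's share of the sum of all Absolute Rankings for the query (405 / 1830 = 22.13%), so `percent_shares` follows the table. That reading is an inference from the published figures, not a stated rule.

```python
def absolute_ranking(hit_rank: int, first_hit_position: int, solid_hits_in_first_25: int) -> int:
    """Hit Ranking x (10 - Position of First Hit) x # Hits in the First 25 Items."""
    return hit_rank * (10 - first_hit_position) * solid_hits_in_first_25


def percent_shares(absolute_rankings: dict) -> dict:
    """Each engine's share of the query's summed Absolute Rankings, in percent.

    Assumption: this is how the example table's "% Ranking" column was derived.
    """
    total = sum(absolute_rankings.values())
    if total == 0:
        return {name: 0.0 for name in absolute_rankings}
    return {name: round(100 * value / total, 2) for name, value in absolute_rankings.items()}


# (Hit Ranking, position of first hit, solid hits in first 25) from the example table.
figures = {
    "AltaVista": (3, 1, 15),
    "Excite!": (3, 1, 16),
    "HotBot": (3, 1, 16),
    "Infoseek": (2, 5, 12),
    "Lycos": (0, 0, 0),
    "Open Text": (3, 1, 3),
    "WebCrawler": (3, 1, 6),
    "Yahoo!": (2, 1, 11),
}
scores = {name: absolute_ranking(*values) for name, values in figures.items()}
print(scores)
print(percent_shares(scores))
```

The text does not say how a first solid hit at position 10 or later was scored; the formula as written would then yield zero or a negative value.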
Ordinal Rankings from 8 (highest) to 1 (lowest) and 0 (failed) were assigned
on the basis of the computed Absolute Ranking. Where computed Absolute Rankings were
identical, the highest Ordinal Ranking was given to the search engine with the best overall
retrieval results. Where all else was equal, search engines with identical computed
Absolute Rankings were given the same Ordinal Ranking (8 - highest, 1 - lowest, 0 - failed).
Example:
QUESTION: Abelard and Heloise Letters

Search Engine  Total Hits  Hit Ranking  To First Hit  Hits/First 25  Absolute Ranking  % Ranking  Ordinal Ranking
AltaVista              50            3             1             15               405      22.13                6
Excite!               128            3             1             16               432      23.61                7
HotBot                216            3             1             16               432      23.61                8
Infoseek             1236            2             5             12               120       6.56                3
Lycos                   0            0             0              0                 0          0                0
Open Text               5            3             1              3                81       4.43                2
WebCrawler              6            3             1              6               162       8.85                4
Yahoo!               6380            2             1             11               198      10.82                5
In the example result table displayed above, HotBot receives the highest Ordinal Ranking even though the computed Absolute
Ranking for Excite! is identical, because the Total Hits retrieved by HotBot is higher.
Lycos receives Ordinal Ranking 0 because it failed the search.
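The ordinal assignment in the example can be reproduced with the Python sketch below. It rests on several assumptions: ties in Absolute Ranking are broken by the higher Total Hits figure, as in the HotBot and Excite! case; an Absolute Ranking of 0 is treated as a failed search; and engines tied on both figures share an ordinal, with the next engine taking its position-based value. The original evaluation weighed "best overall retrieval results" by judgment, so this is only an approximation, and `ordinal_rankings` is an illustrative name.

```python
def ordinal_rankings(results: dict, engines_tested: int = 8) -> dict:
    """Assign ordinals (8 = top, 0 = failed) from (absolute_ranking, total_hits) pairs."""
    # Assumption: an Absolute Ranking of 0 marks a failed search.
    ranked = sorted(
        (name for name, (absolute, _) in results.items() if absolute > 0),
        key=lambda name: results[name],  # sort by Absolute Ranking, then Total Hits
        reverse=True,
    )
    ordinals = {name: 0 for name in results}
    previous = None
    for offset, name in enumerate(ranked):
        if previous is not None and results[name] == results[previous]:
            # Assumption: a complete tie shares the previous engine's ordinal.
            ordinals[name] = ordinals[previous]
        else:
            ordinals[name] = engines_tested - offset
        previous = name
    return ordinals


# (Absolute Ranking, Total Hits) pairs from the example table.
example = {
    "AltaVista": (405, 50),
    "Excite!": (432, 128),
    "HotBot": (432, 216),
    "Infoseek": (120, 1236),
    "Lycos": (0, 0),
    "Open Text": (81, 5),
    "WebCrawler": (162, 6),
    "Yahoo!": (198, 6380),
}
print(ordinal_rankings(example))
```

Because Lycos failed, only seven engines receive ordinals from 8 downward, which is why Open Text ends at 2 rather than 1 in the table.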
The speed of returned search results was not significant in the overall
rankings generated using these methods.
All of the tested search engines generally returned results in under ten (10) seconds, and
usually within five (5) seconds,
even at times of peak Internet usage. The total number of hits returned was significant
in the overall rankings. Search engines which either do not return
total hit counts, or which make discovery of total hits difficult, were ranked
lower, all other things being equal. Search engines which returned enormous hit
counts were, all other things being equal, rated lower as responding poorly for the
average user. Search engines were ranked as having failed if no hits were returned at
all, if no relevant hits were returned, or if no relevant hits were returned within
the first twenty-five (25) sites listed.