A Search Logic Primer

by Randy D. Ralph, MLIS, Ph.D.


Logical (Boolean) and Text (Proximity) Operators
How to Build Effective Search Strategies Using Them
in the Major WWW Search Engines

Last update September 11, 1996

Logical Operators: AND, OR, NOT
Text Operators: NEAR, FOLLOWED BY

Boolean Logic

Boolean Logic is used to construct logical search statements using logical operators. This may sound forbidding but the basic principles are pretty easy to understand and are fairly intuitive. Many of you who went to school during the Age of Relevancy in the sixties and seventies were introduced to it under the name Set Theory. Remember Set Theory? It doesn't matter. If you can organize your laundry, peel potatoes or put all the thumb tacks and push pins in a messy desk drawer into a single bin, you've already used Boolean Logic. Boolean Logic and its logical operators just gather or separate things into neat piles depending on how you use them.

Boolean Logic uses three basic logical operators, also called Boolean operators: AND, OR, and NOT.


Boolean Operators


The AND Logical Operator

The diagram at the left shows how the Boolean logical operator AND works. It shows two universes: The universe of FROGS and the universe of TOADS. The area occupied by these two universes overlaps. The area at the left is pure FROGS. The area at the right is pure TOADS. But the area of overlap, in yellow, is FROGS AND TOADS. This special area, where the two universes intersect, contains both FROGS and TOADS.

If we were doing a real search for documents about FROGS and TOADS and used the logical expression FROGS AND TOADS, all of the resulting documents would have to contain both of the terms, FROGS and TOADS. Any documents containing just one of the two terms and not the other would be excluded.

Examples:

PARIS AND LOUVRE AND MUSEUM
would retrieve all documents that contained the terms PARIS, LOUVRE and MUSEUM in any order and anywhere in the document. The fact that they are all present in a document, though, does not mean that they're linked logically or conceptually in any way. It just means that they all co-occur in the document somewhere - that they're all present. That's all. False coordinations are common when AND logic is used. The classic example that's used to illustrate false coordination is:

VENETIAN AND BLIND
which could retrieve information on Venetian blinds or blind Venetians. All it guarantees is that the two terms will co-occur in the resulting documents.

AND logic focuses, coordinates and narrows a search.
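
If it helps to see AND logic spelled out, here is a minimal sketch in Python. It is not how any particular search engine works inside. The three sample documents, their labels and the function names terms and search_and are all made up for illustration. The sketch treats each document as a bag of words and keeps only the documents that contain every search term.

```python
import re

# Hypothetical mini-collection; the labels and texts are invented.
DOCS = {
    "blinds": "How to clean a Venetian blind without removing it",
    "history": "A blind Venetian merchant is said to have funded the chapel",
    "glass": "Venetian glass from Murano",
}

def terms(text):
    """Return the set of upper-cased words found in a text."""
    return set(re.findall(r"[A-Za-z]+", text.upper()))

def search_and(docs, *wanted):
    """Labels of the documents that contain every one of the wanted terms."""
    wanted = {w.upper() for w in wanted}
    return sorted(label for label, text in docs.items()
                  if wanted <= terms(text))

print(search_and(DOCS, "VENETIAN", "BLIND"))  # ['blinds', 'history']
```

Both the window-covering document and the blind merchant come back, which is the false coordination described above. AND only checks that the terms co-occur.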


The OR Logical Operator

The diagram at the left shows how the Boolean logical operator OR works. It shows the same two universes: The universe of FROGS and the universe of TOADS. The area occupied by these two universes overlaps. The Boolean logical operator OR takes in everything: the overlap and both of the areas outside it. Both universes are completely encompassed when OR is used. Any point in either universe is FROGS, TOADS or both. The whole space is FROGS OR TOADS.

If we were doing a real search for documents about FROGS and TOADS and used the logical expression FROGS OR TOADS, any resulting document could contain either the term FROGS or the term TOADS. Some documents, obviously, would also contain both terms, but the essential point is that only one need be present to cause the document to be found in our result.

Example:

GASOLINE OR PETROL
would retrieve all documents that contained either the term GASOLINE or the term PETROL anywhere in the document. OR logic is used to gather like things into a single place. Things like synonyms or alternate spellings, for example. Think of it like making up a shopping list. You want to leave the store with a bag filled with all the things on your list - eggs OR butter OR milk. That's what OR logic does.

OR logic broadens, includes and expands a search.
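
The same idea can be sketched for OR logic. Again, this is only an illustration under assumptions: the sample documents, their labels and the function names terms and search_or are invented, and real search engines use indexes rather than scanning every text. A document is kept if it contains at least one of the search terms.

```python
import re

# Hypothetical mini-collection; the labels and texts are invented.
DOCS = {
    "us": "Gasoline prices rose again this week",
    "uk": "Petrol stations report shortages",
    "both": "Americans say gasoline where the British say petrol",
    "other": "Diesel engines need no spark plugs",
}

def terms(text):
    """Return the set of upper-cased words found in a text."""
    return set(re.findall(r"[A-Za-z]+", text.upper()))

def search_or(docs, *wanted):
    """Labels of the documents that contain at least one wanted term."""
    wanted = {w.upper() for w in wanted}
    return sorted(label for label, text in docs.items()
                  if wanted & terms(text))

print(search_or(DOCS, "GASOLINE", "PETROL"))  # ['both', 'uk', 'us']
```

Only the diesel document is left out. Every added synonym can only make the result larger, never smaller.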


The NOT Logical Operator

The diagram at the left shows how the Boolean logical operator NOT works. Once again, it shows the same two universes: The universe of FROGS and the universe of TOADS. The area occupied by these two universes overlaps. The Boolean logical operator NOT excludes all TOADS from the FROGS universe. What's left is only that portion of the FROGS universe, in yellow, that contains no TOADS. The portion of the FROGS universe that also contains TOADS is cut away by the NOT logic.

Think of NOT logic as something like peeling a potato. A peeled potato is POTATO NOT PEEL. There's only one trouble: some of the good part of the potato goes with the peel. It's unavoidable. So, you need to use NOT logic with as much care as you would a paring knife. Some search engines use the operator BUT NOT instead, but it works the same way.

If we were doing a real search for documents about FROGS and TOADS and used the logical expression FROGS NOT TOADS, any resulting document absolutely could not contain any reference to the term TOADS. This would also exclude those documents which mentioned both FROGS and TOADS. If we were interested in FROGS we might miss these by excluding TOADS. That's why you need to use NOT logic only when you're absolutely sure you want to exclude a term from your result.

Example:

TODDLERS NOT TEENAGERS
would retrieve only documents that mentioned TODDLERS but contained no references at all to TEENAGERS anywhere. The search would exclude any document that mentioned TEENAGERS from the result. We wouldn't see it at all. This might be dangerous. What if some of the documents that mentioned TEENAGERS did so only in passing and were really mostly about TODDLERS? We'd miss these altogether and that might not have been our intent. NOT logic is used to exclude things and needs to be used with care.

NOT logic narrows, excludes and limits a search.
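
Here is a small Python sketch of NOT logic that shows the danger in action. The sample documents, their labels and the function names terms and search_not are assumptions made up for this illustration. A document is kept only if it contains the first term and does not contain the second.

```python
import re

# Hypothetical mini-collection; the labels and texts are invented.
DOCS = {
    "feeding": "Feeding toddlers: a guide for new parents",
    "naps": "Toddlers need routine; unlike teenagers, they nap daily",
    "jobs": "Teenagers and part-time jobs",
}

def terms(text):
    """Return the set of upper-cased words found in a text."""
    return set(re.findall(r"[A-Za-z]+", text.upper()))

def search_not(docs, keep, drop):
    """Labels of the documents that contain `keep` but not `drop`."""
    keep, drop = keep.upper(), drop.upper()
    return sorted(label for label, text in docs.items()
                  if keep in terms(text) and drop not in terms(text))

print(search_not(DOCS, "TODDLERS", "TEENAGERS"))  # ['feeding']
```

The document labeled naps is mostly about toddlers and mentions teenagers only in passing, yet it is gone from the result. That is the good part of the potato going out with the peel.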


Proximity Operators


The NEAR Text Operator

Two of the major WWW subject search engines, Alta Vista and OpenText, permit the use of the text or proximity operator NEAR. It works slightly differently in each search engine but the result is nearly (forgive the pun) the same.

When two terms or phrases are linked with the NEAR operator the search engine finds documents in which these terms or phrases occur within a few words of one another somewhere in the text of the documents returned. This usually means that they'll be in the same sentence, or, at least, in the same paragraph.

Unlike simple AND logic, which only requires that two terms or phrases be present together anywhere in a document, the NEAR operator ensures that they are close together in the text. Strictly, this only means that they probably occur in the same context in the document, but it often means that there is a conceptual link as well. This gives the NEAR operator considerably greater power in focusing in on a topic.

There are only two troubles with this approach. First, you are always at the mercy of the author's language, and there are WWW authors who delight in jokes. Second, sometimes it doesn't work quite as you might have expected it to. For example, the search engine may find two terms or phrases in close proximity but at the border between two distinct and unrelated sections of a document. This doesn't happen often, but it can explain otherwise obscure search results.

Example:

CHATEAU NEAR CHAMBORD
would find documents in which the terms CHATEAU and CHAMBORD were located within just a few words, usually ten (10), of one another. You might expect that the documents found would have something to do with the Chateau Chambord in the Loire Valley of France. But, you might just as easily run across a document that mentioned a bottle of Chambord liqueur resting on the mantelpiece of a chateau in Quebec! Stranger things have happened.
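
A proximity test like this can be sketched in a few lines of Python. This is an illustration only, not the actual method used by Alta Vista or OpenText. The function names words and near are made up, and counting "within ten words" as a difference of at most ten word positions is an assumption.

```python
import re

def words(text):
    """Split a text into upper-cased words, in order."""
    return re.findall(r"[A-Za-z]+", text.upper())

def near(text, a, b, window=10):
    """True if terms a and b occur within `window` word positions of each other.

    The window of 10 follows the figure given in the text; how an engine
    really measures the distance is an assumption here.
    """
    tokens = words(text)
    pos_a = [i for i, w in enumerate(tokens) if w == a.upper()]
    pos_b = [i for i, w in enumerate(tokens) if w == b.upper()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

castle = "The chateau at Chambord is the largest in the Loire Valley"
quebec = "A bottle of Chambord rested on the mantelpiece of a chateau in Quebec"
far = "Chambord " + "filler " * 20 + "chateau"  # terms 21 positions apart

print(near(castle, "CHATEAU", "CHAMBORD"))  # True
print(near(quebec, "CHATEAU", "CHAMBORD"))  # True
print(near(far, "CHATEAU", "CHAMBORD"))     # False
```

The third text would still satisfy CHATEAU AND CHAMBORD, but NEAR rejects it. The second text shows the limit of the approach: the words are close together, but the subject is a bottle of liqueur.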


The FOLLOWED BY Text Operator

Only the OpenText search engine uses this operator in its search form interface. The operator links two terms or phrases and requires that they follow directly on one another. This ensures the closest possible link between them. When this operator is used you can be virtually assured that the two terms or phrases occur within the same context within the document, and you can be pretty sure that they're linked conceptually as well. The FOLLOWED BY operator really zeros in on a topic.

In place of this operator, many other search engines permit you to enter phrases within double quotes. These are called bound phrases and are searched in the same way, that is, as literal phrases in the text of documents. Either way, the observations about the FOLLOWED BY operator below apply.

Like the NEAR operator, the FOLLOWED BY operator also puts you at the mercy of the author's language. It depends largely on word order. Sense is usually conveyed by word order, especially in English, but sense is not always clear from word order alone. For example, consider the phrase SHIP SINKS. Does it refer to a disaster at sea or to an invoice for plumbing? Although tight word order usually confers specific meaning, you may still see strange results.

Example:

"CAESAR SALAD" or CAESAR FOLLOWED BY SALAD
would retrieve documents in which the phrase CAESAR SALAD was found. You might presume that this would focus in pretty well on this culinary delight, but consider what might happen if the document mentioned the Roman statesman Julius Caesar, salad lover! Search engines largely ignore punctuation (with the possible exception of hyphens) when looking for terms and phrases, so the comma does not keep CAESAR and SALAD apart. It's something to consider if you get a strange result.
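
The effect of ignored punctuation is easy to reproduce in a short Python sketch. This is an illustration under assumptions, not OpenText's actual method. The function names words and followed_by are made up, and this simple word splitter throws away hyphens along with all other punctuation, which a real engine might not do.

```python
import re

def words(text):
    """Split a text into upper-cased words, discarding all punctuation."""
    return re.findall(r"[A-Za-z]+", text.upper())

def followed_by(text, first, second):
    """True if `second` comes directly after `first` somewhere in the text."""
    tokens = words(text)
    pair = (first.upper(), second.upper())
    # Compare the wanted pair against every pair of adjacent words.
    return any(p == pair for p in zip(tokens, tokens[1:]))

print(followed_by("Toss the Caesar salad with croutons", "CAESAR", "SALAD"))  # True
print(followed_by("Julius Caesar, salad lover", "CAESAR", "SALAD"))           # True
print(followed_by("Caesar ordered a salad", "CAESAR", "SALAD"))               # False
```

Once the comma is dropped, the second text reads JULIUS CAESAR SALAD LOVER to the matcher, so it counts as a hit for the phrase.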



If you have any questions, suggestions or comments please contact:

Randy Ralph

Email: rdralph@netstrider.com