Searching the World Wide Web
Search Engines
- Search Engines periodically scan the web, reading and indexing individual homepages. They then create master indexes, which are used to match the keywords a user types in with words found on those homepages. Google takes a somewhat different approach, examining both keywords and the way different sites link to one another. This affects the way its results are ranked and displayed.
Advantages: Indexes cover 35+ million sites (High Recall) and are updated more frequently than directory indexes.
Disadvantages: Searches often retrieve numerous irrelevant sites (Low Precision). Relevant sites may be missed because of the particular way each search engine operates.
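A master index of the kind described above is usually called an inverted index: for each word, it records which pages contain it. The Python sketch below is a minimal illustration under simplifying assumptions (pages already fetched into a dictionary, words split on letters and digits, hypothetical example URLs). It does not describe how any particular engine, Google included, actually stores or ranks its index.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build a master (inverted) index: word -> set of page URLs containing it.

    `pages` is assumed to map each page URL to its already-fetched text.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def lookup(index, query):
    """Return the URLs whose text contains every keyword in `query`."""
    terms = tokenize(query)
    if not terms:
        return set()
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return matches

# Hypothetical pages, for illustration only.
pages = {
    "http://example.edu/a": "Apples and oranges",
    "http://example.edu/b": "Oranges only",
    "http://example.edu/c": "Apple pie",
}
index = build_index(pages)
print(lookup(index, "apples oranges"))  # {'http://example.edu/a'}
```

Note that "Apple pie" does not match the keyword "apples": a plain word index matches exact words only, which is one reason relevant sites can be missed.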
Multiple Search Engines
- Multiple Search Engines run the same search over several search engines to increase the number of sites retrieved. The combined sites are ranked for relevancy, duplicates are eliminated, and the top 10 or so sites are displayed.
Advantages: A fast way to find a few good sites on a topic. Easy-to-use search screen.
Disadvantages: Difficult to do a comprehensive search. Complex searches involving keywords and phrases may not work well with all search engines. Default settings limit the number of sites retrieved and bypass search engines that do not respond quickly to search requests.
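The guide does not say how multiple-search services rank the combined results. As one plausible stand-in, the Python sketch below merges ranked lists with reciprocal rank fusion, a technique from the retrieval literature and not necessarily what any of these services used. The crude `normalize` duplicate check, the constant `k = 60`, and the sample result lists are all assumptions, and the sketch does not model skipping engines that respond slowly.

```python
from collections import defaultdict

def normalize(url):
    """Crude duplicate detection (an assumption): ignore case and a trailing slash."""
    return url.strip().lower().rstrip("/")

def merge_results(result_lists, top_n=10, k=60):
    """Merge ranked result lists from several engines into one list.

    Each engine's list is assumed to be ordered best-first. A site earns
    1 / (k + rank) from every engine that returns it (reciprocal rank fusion),
    so sites ranked highly by several engines rise to the top.
    """
    scores = defaultdict(float)
    for results in result_lists:
        seen = set()
        for rank, url in enumerate(results, start=1):
            key = normalize(url)
            if key in seen:  # skip repeats within one engine's list
                continue
            seen.add(key)
            scores[key] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda u: (-scores[u], u))
    return ranked[:top_n]

# Hypothetical result lists from two engines.
engine_a = ["http://a.com/", "http://b.com", "http://c.com"]
engine_b = ["http://B.com/", "http://d.com"]
print(merge_results([engine_a, engine_b]))
# ['http://b.com', 'http://a.com', 'http://d.com', 'http://c.com']
```

Here b.com is returned by both engines, so its duplicate entries collapse into one site that outranks everything else; `top_n=10` mirrors the "top 10 or so" display.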
Search Directories
- Search Directories list WWW sites by subject. Indexers examine individual WWW sites and place them in the appropriate subject category.
Advantages: Searches often yield relevant sites (High Precision). Useful for broad, general topics.
Disadvantages: Indexes cover less than 5% of the total number of WWW sites (Low Recall). There is a time lag before new sites are indexed.
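The "recall" and "precision" labels used in this guide are standard information-retrieval measures. For a single search they are defined as:

```latex
\text{precision} = \frac{\text{relevant sites retrieved}}{\text{all sites retrieved}},
\qquad
\text{recall} = \frac{\text{relevant sites retrieved}}{\text{all relevant sites that exist}}
```

For instance, with hypothetical numbers: a search that returns 200 sites, 20 of them relevant, when 50 relevant sites exist in total, has precision 20/200 = 0.10 and recall 20/50 = 0.40. A very large engine index tends to raise recall at the cost of precision; a small, hand-selected directory tends to do the reverse.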
When NOT to Use a WWW Search Engine
To locate published works on your subject:
To locate published journal articles:
- Use a periodical index. A listing of available indexes is on the Library Homepage.
To begin research in an unfamiliar discipline:
- Consider using a WWW Subject Guide. A listing of guides created by Mount Holyoke librarians is available from the Library Homepage.
How to Use a Search Engine
General Tips
- Consult the Help screens for each Search Engine. Each service uses different commands and operates in different ways.
- Use more than one Search Engine. Search engines vary in size, in their method of searching, and in the way the search is performed, so different search engines may yield different results.
- Review some of the literature that examines Search Engines. Examples:
- Keep a "Cheat Sheet" handy. Example: Internet Search Engines.
Search Strategies that Usually Work
- Keyword Searching. Finds pages that contain all of the words you have specified. Warning: Unlike library catalogs, which include broad subject headings that aim to bring like things together, search engines look for words wherever they appear in a document. Many irrelevant pages will be returned because the search finds the words in any order and in any location. Conversely, pages that use different words to describe the same topic may be missed.
- Example: The search Consumer Product Chemistry could retrieve a page with the words "Some consumer groups are advocating product warning labels on children's chemistry sets" and miss a page containing the words "Household Products Chemistry."
- Phrase Searching. Generally requires quotation marks around the phrase. Finds pages with only the words you type in, in that exact order and with no words in between them. Warning: Relevant pages that vary the phrase in any way will be missed.
- Example: The search "Mount Holyoke Library" would miss sites that use the words Mount Holyoke College Library.
- Boolean Operators offer a way to include related keywords and/or refine a topic.
- AND. Makes sure that all the words appear at the selected sites. For most Search Engines, AND is assumed if no operator is used. For example: Apples Oranges will produce the same search as Apples AND Oranges.
- OR. Used to retrieve sites that contain either word. Offers a good way to include synonyms or broader/narrower terms in the search. For example: Apples OR Fruit.
- NOT. Used to exclude all sites that contain the word following it. For example: Modem NOT Internal. Warning: Use this operator with caution, as it could exclude many relevant sites that happen to contain the excluded word.
- NEAR. Finds sites in which the words appear within a few characters of one another. For example: Harry NEAR Truman will retrieve sites with the words Harry S Truman and Harry Truman. Warning: Not all Search Engines use NEAR in the same way; some do not offer it at all, and others use a different term (e.g., ADJ).
- Parentheses ( ). Used to combine Boolean Operators within a single search. For example: (Apples OR Fruit) AND (pesticides OR insecticides).
- Wildcards. Generally use an asterisk (*) to include variant spellings.
- For example: Colo*r will retrieve both the British and American spellings (colour and color).
- Limiting by type of WWW site. Some Search Engines can restrict a search to retrieve only certain types of sites, such as those maintained by educational, non-profit, or governmental institutions. This can sometimes be a way to eliminate questionable commercial sites and personal pages.
- "Follow the Links." If you do find a good page, see if the author provides links to other useful sites.
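The keyword, phrase, Boolean, NEAR, and wildcard strategies above can be modeled in a few lines of Python, which makes their differences concrete. This is a minimal sketch under simplifying assumptions, not how any particular engine works: pages are plain strings, words are split on letters, digits, and apostrophes, NEAR counts distance in words with an arbitrary window of 3 (engines differ, as noted above), and the sample pages are hypothetical. Boolean AND, OR, and NOT correspond to set intersection, union, and difference.

```python
import re
from fnmatch import fnmatchcase

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def keyword_match(query, text):
    """Keyword search: every query word appears somewhere, in any order."""
    words = set(tokenize(text))
    return all(term in words for term in tokenize(query))

def phrase_match(phrase, text):
    """Phrase search: the words appear in exactly this order, with nothing in between."""
    target, words = tokenize(phrase), tokenize(text)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))

def near_match(a, b, text, window=3):
    """NEAR: both words occur within `window` words of each other (assumed unit and size)."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def wildcard_match(pattern, text):
    """Wildcard: '*' stands for any run of characters, so colo*r matches color and colour."""
    return any(fnmatchcase(w, pattern.lower()) for w in tokenize(text))

# Hypothetical pages for the Boolean examples.
pages = {
    "p1": "Apples and oranges at the market",
    "p2": "Fruit prices rise",
    "p3": "Apples treated with pesticides",
}

def sites_with(word):
    """All pages containing `word`; Boolean operators then act on these sets."""
    return {url for url, text in pages.items() if keyword_match(word, text)}

both = sites_with("apples") & sites_with("oranges")         # AND -> {'p1'}
either = sites_with("apples") | sites_with("fruit")         # OR  -> {'p1', 'p2', 'p3'}
without = sites_with("apples") - sites_with("pesticides")   # NOT -> {'p1'}
grouped = (sites_with("apples") | sites_with("fruit")) & (
    sites_with("pesticides") | sites_with("insecticides"))  # ( ) -> {'p3'}
```

With these definitions, `keyword_match("Consumer Product Chemistry", ...)` accepts the children's chemistry set sentence but rejects "Household Products Chemistry", and `phrase_match("Mount Holyoke Library", "Mount Holyoke College Library")` is False, reproducing the warnings in the text.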
Interpreting and Evaluating Your Results
- Most Search Engines list results based on a "relevancy" algorithm that varies from one service to another and is often poorly documented. Common variables include:
- The query terms are found in the first few words of the document (especially in the title of the web page or in the hidden "meta tag" field).
- The query terms are found in close proximity to one another in the document.
- The document contains more of the search terms than other documents.
- "Other Considerations."
- Tips on Evaluating WWW sites. "Buyer Beware": some sites are not what they appear to be. For tips on evaluating sites:
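Returning to the relevancy variables listed under Interpreting and Evaluating Your Results: since the actual algorithms are undocumented, the Python sketch below is only a toy that combines those three variables into one score. The weights, the 10-word "first few words" cutoff, the proximity bonus, and the sample documents are arbitrary assumptions, not any engine's formula, and the hidden meta tag field is not modeled.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def relevancy(query, title, body, lead=10):
    """Toy relevancy score; the weights and the `lead` cutoff are arbitrary assumptions."""
    terms = set(tokenize(query))
    words = tokenize(body)
    title_words = set(tokenize(title))

    # More of the search terms present -> higher score.
    score = 1.0 * len(terms & (title_words | set(words)))

    # Bonus for terms in the title or the first `lead` words of the body.
    score += 0.5 * len(terms & (title_words | set(words[:lead])))

    # Bonus when two different terms appear close together (closer -> bigger bonus).
    hits = [(i, w) for i, w in enumerate(words) if w in terms]
    gaps = [j - i for (i, a), (j, b) in zip(hits, hits[1:]) if a != b]
    if gaps:
        score += 1.0 / min(gaps)
    return score

def rank(query, docs):
    """Sort (title, body) pairs by descending toy relevancy score."""
    return sorted(docs, key=lambda d: relevancy(query, d[0], d[1]), reverse=True)

# Hypothetical documents.
docs = [
    ("Fruit", "Many words here about markets and then apples much later oranges"),
    ("Apples and Oranges", "Apples and oranges compared"),
]
print([title for title, _ in rank("apples oranges", docs)])
# ['Apples and Oranges', 'Fruit']
```

Both documents contain both terms, so the second wins only on title placement and proximity. Changing the weights can reorder results, which is one reason different engines list the same sites in different orders.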
Style Guides for Citing Electronic Sources
- The use of electronic resources in scholarly research and writing is still evolving. Here are some sources with examples of how to cite electronic information, following various established formats:
Learning More About Searching the WWW
- Check current awareness services such as the Scout Report.
- For everything you ever wanted to know about search engines, check Search Engine Watch.
- Review some of the recent literature published on the topic. For example: Rodrigues, Dawn. The Research Paper and the World Wide Web. Prentice Hall, 1997. (MH Main: LB 2369 R585 1997 REF)
- Internet Tutorials from the University of Albany Libraries include a wealth of information, ranging from basic connection and browser tips to "second generation" searching highlights.
- Contact a Reference Librarian.