Search Engines: Introduction
Electronic information is being created and distributed on the World Wide Web at a fantastic rate with no unifying organizational structure. The challenge has become not only finding information, but also assessing its value. Presently, search engines are the primary means of meeting this need. It is important that we recognize their strengths and limitations because they serve as the initial interface between users and web pages as users filter, analyze, and evaluate what is presented to them. Understanding search engines also gives us a better sense of how the use of metadata can be improved.
Search engines like Google index billions of pages and other information resources, but countless more remain undiscovered, and more are added every day. Despite these shortcomings in coverage, search engines are the primary tool users have for learning that a web page exists. For most users, relevance is a greater concern than reaching every possible page on a particular subject. Exact procedures for relevance ranking vary from engine to engine and are closely held proprietary secrets, but all share a common goal: matching the user-supplied search terms to page content, then ordering the results so that the highest-scoring pages (presumably the most relevant to the user) appear at the top. Pages of lesser interest fall further down a list that may range from dozens to thousands of items.
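The general idea of matching search terms to page content and ordering the results by score can be sketched in a few lines of Python. This is only a toy illustration: real engines keep their ranking procedures secret and use far more elaborate signals. The function names, the scoring rule (a simple count of term occurrences), and the sample pages are all assumptions made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, page_text):
    """Toy relevance score: how often the query terms occur in the page."""
    counts = Counter(tokenize(page_text))
    return sum(counts[term] for term in set(tokenize(query)))


def rank(query, pages):
    """Return (url, score) pairs for matching pages, highest score first."""
    scored = [(url, score(query, text)) for url, text in pages.items()]
    matches = [(url, s) for url, s in scored if s > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical pages, invented for this example.
PAGES = {
    "a.example/metadata": "Metadata standards help describe web pages. Metadata matters.",
    "b.example/recipes": "A recipe page with no relevant terms.",
    "c.example/search": "Search engines may ignore metadata supplied by authors.",
}

results = rank("metadata standards", PAGES)
for url, page_score in results:
    print(page_score, url)
```

Pages that contain none of the search terms are dropped, and the rest are listed from highest to lowest score, which mirrors the ordered result list a user sees.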
At first glance, metadata would seem to be the perfect solution for bringing users together with appropriate pages. Simple HTML meta tags like "keywords" and "description" offer web authors the opportunity to explicitly categorize their pages, while "author" allows for the most basic documentation. Most search engines, however, do not currently look at, or give primary consideration to, meta tags when indexing a page. Why? Primarily because of abuse of the system by unscrupulous authors. "Spamdexing," as it is known, can take many forms, including keyword stuffing (repeating keywords) and the use of false or misleading keywords. These practices have led many search engines to declare that author-provided metadata is unreliable across the board. It's a vicious circle: because unscrupulous authors have given meta tags a bad name, search engines tend to ignore them as an indexing tool; because search engines ignore them, there's little incentive to include meta tags, and most scrupulous authors don't take the time to do it.
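To make the meta tags and the keyword-stuffing problem concrete, here is a minimal sketch that reads the "keywords", "description", and "author" tags from a page and flags keywords that are repeated suspiciously often. It uses only Python's standard html.parser module. The sample page, the helper names, and the repeat threshold are assumptions chosen for illustration; no search engine is known to use this particular test.

```python
from collections import Counter
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect the content of <meta name="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html):
    """Return a dict mapping meta tag names to their content."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


def repeated_keywords(keywords_value, max_repeats=2):
    """Return keywords listed more than max_repeats times (arbitrary threshold)."""
    terms = [t.strip().lower() for t in keywords_value.split(",") if t.strip()]
    counts = Counter(terms)
    return sorted(term for term, n in counts.items() if n > max_repeats)


# A made-up page used only for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Example</title>
<meta name="keywords" content="loans, cheap loans, loans, loans, loans">
<meta name="description" content="A made-up page used only for illustration.">
<meta name="author" content="A. Writer">
</head><body></body></html>
"""

meta = extract_meta(SAMPLE_PAGE)
stuffed = repeated_keywords(meta.get("keywords", ""))
print(meta["description"])
print("Possibly stuffed keywords:", stuffed)
```

A check like this catches only the crudest repetition. False or misleading keywords cannot be detected from the tag alone, which is part of why engines came to distrust author-supplied metadata.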
Metadata standardization efforts hold the promise of improving this situation. The Dublin Core Metadata Element Set (http://www.dublincore.org) is an international standard that consists of fifteen elements such as Title, Creator, Subject, and Date. The standard has proven especially popular for web resources among librarians, archivists, records managers, and in government settings. One example of the Dublin Core at work can be found on Minnesota's state government portal, North Star (http://www.state.mn.us). Thousands of government electronic information resources have been tagged with Dublin Core metadata, and the state's Inktomi/Ultraseek search engine is optimized to give those pages higher standing.
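As an illustration of what tagging a page with Dublin Core can look like, the sketch below generates HTML meta tags using the common "DC." naming convention and a schema link to the Dublin Core element namespace. The record contents and the function name are hypothetical, and the sketch is not a description of how North Star or any particular site produces its tags.

```python
from html import escape

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1.
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def dublin_core_meta(record):
    """Render a dict of Dublin Core elements as HTML <link> and <meta> tags."""
    lines = ['<link rel="schema.DC" href="%s">' % DC_NAMESPACE]
    for element, value in record.items():
        # Escape the value so quotes and angle brackets cannot break the tag.
        content = escape(value, quote=True)
        lines.append('<meta name="DC.%s" content="%s">' % (element.lower(), content))
    return "\n".join(lines)


# Hypothetical record, invented for this example.
record = {
    "Title": "Annual Report",
    "Creator": "Example Agency",
    "Subject": "budget; finance",
    "Date": "2003-01-15",
}

tags = dublin_core_meta(record)
print(tags)
```

The output is meant to be placed in the head section of a page, where a search engine configured to read Dublin Core elements can pick it up.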
Another example of standardization is the Resource Description Framework (RDF), a language that allows for the exchange of information (e.g., metadata) between different applications on the Web (http://www.w3.org/RDF). RDF facilitates a variety of activities, such as cataloging and description, searching, and content rating. It's also helping set the stage for the Semantic Web, the next incarnation of the World Wide Web, which "will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users" (Tim Berners-Lee, James Hendler, and Ora Lassila, "The Semantic Web," Scientific American, May 2001, http://www.scientificamerican.com/article.cfm?colID=1&articleID=00048144-10D2-1C70-84A9809EC588EF21).
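RDF expresses information as statements made up of a subject, a predicate, and an object. The sketch below writes two such statements about a page in N-Triples, one of the plain-text syntaxes for RDF, using Dublin Core properties as the predicates. The page URL, the values, and the helper names are assumptions for illustration, and the escaping is deliberately simplified; a production system would normally rely on an RDF library rather than hand-built strings.

```python
# Namespace URI of the Dublin Core Metadata Element Set, version 1.1.
DC = "http://purl.org/dc/elements/1.1/"


def escape_literal(value):
    """Escape backslashes, double quotes, and newlines (simplified)."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(subject, properties):
    """Serialize {predicate URI: literal value} pairs as N-Triples lines."""
    lines = []
    for predicate, value in properties.items():
        lines.append('<%s> <%s> "%s" .' % (subject, predicate, escape_literal(value)))
    return "\n".join(lines)


# Hypothetical page and values, invented for this example.
statements = to_ntriples(
    "http://www.example.org/report.html",
    {
        DC + "title": "Annual Report",
        DC + "creator": "Example Agency",
    },
)
print(statements)
```

Because every part of a statement except the literal value is a full URI, a second application can interpret the metadata without knowing anything about the program that produced it.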
For more information about metadata, visit the State Archives' metadata home page (http://www.mnhs.org/preserve/records/metadata.html).
18 March 2003




