USGS - science for a changing world

USGS Thesaurus and Science Topics Catalog

Frequently Asked Questions about the USGS Thesaurus

What's a thesaurus?
A thesaurus is a type of controlled vocabulary, a collection of terms.

Terms represent concepts, but it is the concepts themselves and their relationships, not the terms, that constitute the thesaurus.

Terms are related to one another in three different ways:

Hierarchy
A term always has an "is a" relationship with its broader term (BT); a narrower term (NT) can always be said to be "a type of", "a part of", or "an instance of" the parent term.
Preference
For a given concept, one term is chosen as the preferred term or label, and is referred to as the descriptor. Other terms that refer to the same concept are referred to as lead-in terms or non-preferred terms. A non-preferred term is not necessarily a synonym of the preferred term.

We also support a USE-WITH relationship in which a non-preferred term is associated with two descriptors. An example is "soil pollution" USE "contamination and pollution" WITH "soil resources".

Generic relationships
Where concepts are related in some way that cannot be expressed as an "is a" sentence, the thesaurus simply connects one term to another without specifying the nature of the relationship. This is different from more elaborate knowledge-management systems such as topic maps or ontologies in which such generic relationships are always identified and categorized.
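As an illustration only, the three kinds of relationship can be modeled as a small data structure. The Python sketch below is not the software behind the USGS Thesaurus: the class and method names are hypothetical, and the hierarchy terms ("natural hazards", "earthquakes", "floods") and the generic link are made up for the example. Only the "soil pollution" USE-WITH entry comes from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """A concept, labelled by its preferred term (descriptor)."""
    descriptor: str
    broader: Optional[str] = None                       # BT
    narrower: List[str] = field(default_factory=list)   # NT
    related: List[str] = field(default_factory=list)    # generic, untyped links


class Thesaurus:
    """Hypothetical in-memory model; not the USGS implementation."""

    def __init__(self):
        self.concepts = {}   # descriptor -> Concept
        self.lead_ins = {}   # non-preferred term -> tuple of descriptors

    def add(self, descriptor, broader=None):
        """Add a descriptor, optionally under an existing broader term."""
        self.concepts[descriptor] = Concept(descriptor, broader)
        if broader is not None:
            self.concepts[broader].narrower.append(descriptor)

    def relate(self, a, b):
        """Record a generic relationship without saying what kind it is."""
        self.concepts[a].related.append(b)
        self.concepts[b].related.append(a)

    def use(self, non_preferred, *descriptors):
        """USE (one descriptor) or USE ... WITH ... (two descriptors)."""
        self.lead_ins[non_preferred] = descriptors

    def resolve(self, term):
        """Return the descriptor(s) a searcher should be sent to."""
        if term in self.concepts:
            return (term,)
        return self.lead_ins.get(term, ())


demo = Thesaurus()
demo.add("natural hazards")                   # made-up hierarchy
demo.add("earthquakes", broader="natural hazards")
demo.add("floods", broader="natural hazards")
demo.add("contamination and pollution")       # placed at top level only for this sketch
demo.add("soil resources")
demo.relate("floods", "soil resources")       # made-up generic link
demo.use("soil pollution", "contamination and pollution", "soil resources")

print(demo.resolve("soil pollution"))
print(demo.concepts["natural hazards"].narrower)
```

Keeping lead-in terms in a separate lookup reflects the point that terms are only labels: a search on a non-preferred term is redirected to one or two concepts, and the hierarchy and generic links are recorded on the concepts themselves.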
Is there only one thesaurus?
No. It is helpful to use different controlled vocabularies for the purposes they serve best. Science Topics currently uses the following thesauri:
Type       Thesaurus name                                      Interface
theme      USGS Thesaurus                                      PHP, AJAX
feature    Alexandria Digital Library Feature Type Thesaurus   PHP, AJAX
place      Common geographic areas                             PHP, AJAX
lithology  Lithclass 6.2                                       PHP
What is the purpose of the USGS Thesaurus?
The USGS Thesaurus is specifically intended to help people outside USGS find information on USGS web sites without specific knowledge of the organizational structure and operations of the USGS. For those inside USGS, the thesaurus provides a source of consistent index terms that spans the full range of USGS activities. Such terms can be used to refine or clarify labels and to support internet search, and the relationships among them suggest linkages across programs.

Thesaurus interfaces such as Science Topics are not intended to replace traditional search or browse interfaces. In concert with the new "USGS by State" and "About USGS" sites, they supplement existing navigational aids to USGS web information.

Who has worked on the USGS Thesaurus?
The USGS Thesaurus Working Group is composed of specialists in library and information sciences, communications, the natural sciences, scientific software development, and data management. Its purpose is to create and maintain controlled vocabularies, use those vocabularies to create catalogs and indexes, and develop methodology that will help people find and understand online USGS information resources. The group is associated with the USGS home page design team and coordinates its work with other project tasks as appropriate.
Name              Organization                    Expertise
Alan Allwardt     Geology-Pacific Science Center  Earth Science, Library Science
Dave Govoni       GIO-SIEO                        Earth sciences, Information architecture
Peter Schweitzer  Geology                         Earth science, software development
Lisa Zolly        Biology                         Library Science

Former personnel

The following people have worked with the group at various times in the past. Their influence is substantial.
Name              Organization     Expertise
USGS employees
Hylan Beydler     Geography-MCMC   Land characterization
Nancy Blair       GIO-Library      Library coordination, cataloging & indexing
Linda Broussard   Biology-Library  Life sciences, records management
Pamela Callais    GIO-Library      Cataloging & indexing
Brian Carpenter   GIO-Library      Library Science
Liz Ciganovich    Water-CAPP       Publications
Wendy Danchuk     Hydrology        Cartography, publications
Jeff Dietterle    GIO-EWeb         Geography, publication
Carmelo Ferrigno  GIO-EWeb         Information architecture & design
Karen Kaye        Biology          Information architecture
Richard Huffine   GIO-SIEO         Library Science
Irena Kavalek     GIO-Library      Cataloging & indexing
Celso Puente      Water            Hydrology
Gary Waggoner     Biology-CBI      Life sciences
Gail Wendt        Communications   Hydrology, communication, publications
Consultants and outside reviewers
Linda Hill        Alexandria Digital Library, UC Santa Barbara
Gail Hodge        Information International Associates, Inc.
Candy Schwartz    Graduate School of Library and Information Sciences, Simmons College
Jessica Milstead  The JELEM Company
Amy Warner        Lexonomy Information Architecture Consulting
How was the thesaurus developed? What other vocabularies did you consult?

Philosophy

Search alone is not sufficient to help people find information. Applications intended to help people find information must also help people understand the scientific, technical, and business context in which it is meaningful. People do not in any usable sense find information without knowing what it is they have found and how it relates to other information.

Design goals

  1. The USGS Thesaurus is designed to conform with a recognized standard, ANSI/NISO Z39.19. This standard has been in widespread use throughout the information science community for many years.
  2. The thesaurus is broad and shallow. It is not intended to enumerate or distinguish the fine details of USGS science, and it is not intended to duplicate detailed search within a scientific database on a particular topic that would ordinarily be provided by a web site developer.
  3. The thesaurus is explicitly intended for use in a web browsing environment. Consequently, it is strictly hierarchical: no term has more than one broader term, and alternative broader terms are shown as related terms instead. The number of top terms is also intentionally kept small so that browse interfaces function well.
  4. The thesaurus is monolingual. Foreign-language equivalents are possible in principle but have not been incorporated into the current design.
  5. The thesaurus is intended to cover only those facets of information for which other controlled vocabularies were either not available or were not optimal for categorizing USGS information. Consequently the thesaurus does not include place names, types of named geographic features, detailed biological taxonomy, chemical and mineral names, USGS publication series names, or names of organizational units and programs.
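The strict hierarchy described in design goal 3 can be sketched in a few lines of Python. This is a minimal illustration and not the group's software: the function names and sample terms are hypothetical, and keeping the first candidate as the broader term is an assumption of the sketch (in practice the choice is an editorial one).

```python
def enforce_single_parent(candidates):
    """Split candidate broader terms into one BT plus related terms.

    `candidates` maps each term to an ordered list of proposed broader
    terms. The first is kept as the BT (an assumption of this sketch);
    any others are demoted to related terms.
    """
    broader, related = {}, {}
    for term, parents in candidates.items():
        broader[term] = parents[0] if parents else None
        related[term] = list(parents[1:])
    return broader, related


def path_to_top(term, broader):
    """Return the chain from `term` up to its top term; reject cycles."""
    path = [term]
    while broader.get(path[-1]) is not None:
        parent = broader[path[-1]]
        if parent in path:
            raise ValueError(f"cycle detected at {parent!r}")
        path.append(parent)
    return path


# Made-up terms, for illustration only.
candidates = {
    "natural hazards": [],
    "water resources": [],
    "floods": ["natural hazards", "water resources"],
}
broader, related = enforce_single_parent(candidates)
top_terms = [term for term, parent in broader.items() if parent is None]

print(path_to_top("floods", broader))
print(related["floods"])
print(top_terms)
```

Because every term has at most one broader term, each term has exactly one path to a top term, which is what lets a browse interface show a single unambiguous location for it.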

Development methods

Specialists recognize two strategies for building controlled vocabularies: top-down, in which terms and their relationships are defined intuitively before they are applied in an indexing situation; and bottom-up, in which terms and relationships are added to the vocabulary in the process of indexing. The same specialists also recognize that most vocabularies are developed using a combination of these two abstract approaches. We developed the USGS Thesaurus using this combined strategy. We began by listing many important terms, grouped those terms using a card-sorting procedure, and then refined the hierarchy intuitively (that is, by relying on what we know). Subsequent revisions have been made by group deliberation.

Preliminary development of the thesaurus was conducted by a contractor using commercial software (MultiTES). Subsequent development and revision have occurred in a web-based database application that the group developed to meet the specific needs of this project.

Review of existing controlled vocabularies

We examined many similar controlled vocabularies of various types before and during this process. Examples are the GEOREF thesaurus produced by the American Geological Institute, the CERES thesaurus (http://ceres.ca.gov/thesaurus/), the Geographic Names Information System (GNIS), the Integrated Taxonomic Information System (ITIS), the categorization scheme used in the Marine Realms Information Bank (http://mrib.usgs.gov/), and numerous smaller or more specialized vocabularies such as glossaries of scientific and technical terms presented on USGS web sites.

How is the thesaurus stored? Can I get a copy of it?
The USGS Thesaurus and the other controlled vocabularies we use are stored in a relational database. The structure of the database is described in a table named "about" that is included in the database.
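To give a feel for what a self-describing relational store of this kind might look like, here is a small Python sketch using an in-memory SQLite database. Everything in it is an assumption made for illustration: the "term" table, its columns, the layout of the "about" table, and the sample terms are hypothetical and are not taken from the actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one row per term, with a pointer to its broader term.
    CREATE TABLE term (
        code   INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        parent INTEGER REFERENCES term(code)
    );
    -- Hypothetical layout for a table that documents the other tables.
    CREATE TABLE about (
        tabname     TEXT,
        fieldname   TEXT,
        description TEXT
    );
    INSERT INTO about VALUES
        ('term', 'code',   'unique identifier of a term'),
        ('term', 'name',   'text of the term'),
        ('term', 'parent', 'code of the broader term; NULL for a top term');
    INSERT INTO term VALUES
        (1, 'natural hazards', NULL),
        (2, 'earthquakes', 1),
        (3, 'floods', 1);
""")


def describe(conn, table):
    """Read the documentation for one table from the 'about' table."""
    rows = conn.execute(
        "SELECT fieldname, description FROM about "
        "WHERE tabname = ? ORDER BY rowid",
        (table,),
    )
    return rows.fetchall()


def narrower_terms(conn, name):
    """List the terms whose broader term is `name`."""
    rows = conn.execute(
        "SELECT child.name FROM term AS child "
        "JOIN term AS parent ON child.parent = parent.code "
        "WHERE parent.name = ? ORDER BY child.name",
        (name,),
    )
    return [row[0] for row in rows]


for fieldname, description in describe(conn, "term"):
    print(fieldname, "-", description)
print(narrower_terms(conn, "natural hazards"))
```

Storing the broader term as a single column on each row is one simple way to hold a strict hierarchy in a relational table, since it allows at most one parent per term.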