Data Management

Keywords

Use of terms from a thesaurus can make your data more easily discoverable. They are useful for defining shared context or meaning within and across domains of science.

Keywords in a Thesaurus

Keywords from a thesaurus (e.g. the USGS Thesaurus) help to categorize data and allow people and computers to compose lists of datasets that share important characteristics; your dataset will be included in some of those lists and not others.

USGS Thesaurus FAQs

Everyday Use for Keywords

We use keywords all the time. At the grocery store, we look for signs indicating which aisle contains certain items, like soup. The signs indicate types of items, and are used to help us separate things that are soup from things that are not soup.

Introduction to Keywords

Keywords is one of the most useful sections of formal metadata, yet it is often misunderstood and consequently less effective than it could be. Keywords is a section within the Identification_Information section of a CSDGM metadata record, and it contains any number of subsections for Theme, Place, Stratum, and Temporal keywords. This page goes into detail about keywords in CSDGM metadata records, but keywords can also be used in ISO 19115 metadata records. See the Metadata page for more information about these two standards. All of the subsections have a similar structure: a thesaurus identifier indicating the source from which the keywords were drawn, followed by one or more keywords drawn from that source:

Identification_Information:
     Keywords:
          Theme:
               Theme_Keyword_Thesaurus: (Official name for a vocabulary)
               Theme_Keyword: a term
               Theme_Keyword: another term
               Theme_Keyword: yet another term
          Place:
               Place_Keyword_Thesaurus: (Official name for a vocabulary)
               Place_Keyword: a term
               Place_Keyword: another term
               Place_Keyword: yet another term
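The outline above can also be written as XML. The following is a minimal sketch, assuming the conventional CSDGM XML short names (idinfo, keywords, theme, themekt, themekey, place, placekt, placekey). The helper name build_keywords and the sample terms are illustrative only; a real record should use the exact thesaurus names and terms from the vocabularies you choose.

```python
import xml.etree.ElementTree as ET


def build_keywords(theme, place):
    """Return a CSDGM-style <keywords> element.

    `theme` and `place` are (thesaurus_name, [terms]) tuples.
    This helper is hypothetical, written only for this example.
    """
    keywords = ET.Element("keywords")
    sections = (
        ("theme", "themekt", "themekey", theme),
        ("place", "placekt", "placekey", place),
    )
    for tag, thesaurus_tag, key_tag, (thesaurus, terms) in sections:
        section = ET.SubElement(keywords, tag)
        # The thesaurus identifier comes first, then one element per keyword.
        ET.SubElement(section, thesaurus_tag).text = thesaurus
        for term in terms:
            ET.SubElement(section, key_tag).text = term
    return keywords


# Sample values are placeholders, not recommendations for any dataset.
idinfo = ET.Element("idinfo")
idinfo.append(
    build_keywords(
        theme=("USGS Thesaurus", ["mine drainage", "water quality"]),
        place=("Geographic Names Information System", ["Colorado"]),
    )
)
print(ET.tostring(idinfo, encoding="unicode"))
```

Each subsection carries exactly one thesaurus identifier, so keywords drawn from two different vocabularies belong in two separate Theme (or Place) subsections rather than in one mixed list.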

 

What is the purpose of keywords?

Keywords are intended to categorize your data. That allows people and computers to compose lists of datasets that share some important characteristics that are meaningful to other people; your dataset will be included in some of those lists and not others. It is tempting to add keywords that have other purposes such as identification or description, but those keywords do not improve the categorization of the data and those purposes are better served by improving other sections of the metadata record.
 

Keywords are NOT for the following purposes:

  • supporting full-text search, because full-text search should match text anywhere in the metadata record. 
     
  • specifying all of the scientific problems to which your data might be applied. 
     
  • listing identifiers for your data. Remember that an identifier doesn't say what the data are about, it says which data they are.
     
  • describing your data. Descriptions of your data should be in the Title, Abstract, Purpose, Process_Step, and especially in the Attribute_Definition elements.
     

What keywords should I use?

Keywords are provided by controlled vocabularies. A controlled vocabulary is a collection of terms chosen for a specific purpose with clearly indicated meanings and relationships. Those relationships can be important, as in a strict hierarchy where each narrower term is a type of or part of its broader term, or those relationships can be unimportant, as in an alphabetical list where each item in the list is no more strongly related to any term than to any other. The USGS Thesaurus is arranged in a strict hierarchy, so that an information search system can return related resources that were categorized using terms more specific than the one you asked for. So data in the category mine drainage are included if you search for pollution, because mine drainage is a type of pollution.
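To illustrate how a strict hierarchy lets a search for a broad term return data categorized with narrower terms, here is a small sketch. The hierarchy fragment, the dataset identifiers, and the function names are all hypothetical. Only the relationship between pollution and mine drainage comes from the text above, and a real system would load the vocabulary from the thesaurus itself.

```python
# Hypothetical fragment of a hierarchy: each term maps to its narrower terms.
NARROWER = {
    "pollution": ["mine drainage", "air pollution"],
    "mine drainage": ["acid mine drainage"],
}

# Hypothetical catalog: dataset identifier -> theme keywords assigned to it.
DATASETS = {
    "ds-001": {"mine drainage"},
    "ds-002": {"air pollution"},
    "ds-003": {"earthquakes"},
}


def expand(term):
    """Return the term together with all of its narrower terms."""
    found = {term}
    stack = [term]
    while stack:
        for child in NARROWER.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(term):
    """Return datasets tagged with the term or any narrower term."""
    wanted = expand(term)
    return sorted(ds for ds, kws in DATASETS.items() if kws & wanted)


print(search("pollution"))      # ['ds-001', 'ds-002']
print(search("mine drainage"))  # ['ds-001']
```

Because the expansion only walks downward, tagging a dataset with the most specific applicable term loses nothing: it is still found by searches on every broader term.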

Many controlled vocabularies are available. You should use a vocabulary if its scope overlaps the ideas, methods, or other characteristics of your data, and if you might expect that vocabulary to be exploited well in the discovery interfaces provided by the organization.

Generally you should choose the most specific keywords that apply to your data. 

Use the keyword finder to explore some controlled vocabularies that are available through web services.
 

How can I assign keywords to my metadata?

When you're writing metadata, you should try to choose keywords that categorize your data well relative to other data produced by the organization. The keywords should be reviewed and possibly revised by another person who understands the data but who also has a broad knowledge of the larger collection of data produced by the organization. That means you should try to assign good category terms, but don't agonize over them. Instead, the metadata author should help the reviewer understand the data, so that the reviewer can help to choose appropriate keywords.

The mechanics of assigning keywords to metadata vary with the software tools you use to edit metadata. The simplest method is to type the elements and values using a text editor. More sophisticated tools can include keyword-selection software that relies on web services to help you choose keywords. The Metadata Wizard, the Online Metadata Editor, and Tkme are metadata tools that use web services to help you choose keywords and enter them into your metadata record. See “Tools for Creating Metadata Records” for more information about these tools.
 

How can I see the value of keywords?

The best reason to use controlled vocabularies when choosing keywords is so that the meaning and spelling of the terms you choose will match those that other people use in their metadata. When many metadata records use the same set of controlled terms, those terms can be shown as links to people who are looking for data, so that the data seekers don't have to guess what terms we used and they don't have to guess how we spelled them. If the controlled vocabulary is hierarchical, we can also give the users options to choose broader terms or narrower terms, so they can drill down to topics that interest them and see what data we have pertaining to those topics.
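The idea of showing controlled terms as links can be sketched as an index from each keyword to the records that use it. This is only an illustration: the record titles, keywords, and the build_index function are made up for the example, and a real catalog would read the keywords from its metadata records.

```python
from collections import defaultdict

# Hypothetical records; titles and keywords are invented for the example.
RECORDS = [
    {"title": "Stream chemistry below abandoned mines",
     "theme": ["mine drainage", "water quality"]},
    {"title": "Groundwater sampling results",
     "theme": ["water quality"]},
    {"title": "Seismic station catalog",
     "theme": ["earthquakes"]},
]


def build_index(records):
    """Map each theme keyword to the titles of the records that use it."""
    index = defaultdict(list)
    for record in records:
        for term in record["theme"]:
            index[term].append(record["title"])
    return dict(index)


index = build_index(RECORDS)

# Each term becomes a browse entry listing the records tagged with it.
for term in sorted(index):
    print(f"{term} ({len(index[term])})")
    for title in index[term]:
        print(f"  - {title}")
```

The index works only because every record spells the term the same way. A record tagged "Water Quality" or "water-quality" would land in a separate entry, which is the problem a controlled vocabulary avoids.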

The Geoscience Data Catalog is an example in which a variety of category terms are shown as entry points from which users can choose metadata records or, if they wish, can navigate to more appropriate category terms.

 

Tools 

  • USGS Thesaurus (For Theme Keywords)

    Description: The USGS Thesaurus is designed as a formal thesaurus with rigid adherence to the hierarchical (BT, NT) term relationships, generic non-hierarchical (RT) relationships, and lead-in term relationships linking non-preferred terms to descriptors either singly (UF) or in a compound USE-WITH relationship. The thesaurus is faceted, meaning its top terms delineate general aspects of information resources. Go to the link below and check "USGS Thesaurus". Click on the right tab, then browse through "Sciences" and "Topics" to view theme keywords for inclusion in metadata.

    URL: www.usgs.gov/science/tab-term.html
     

  • USGS Biocomplexity Thesaurus Project (For Theme Keywords)

    Description: The Biocomplexity Thesaurus Project is a thesaurus of term relationships and definitions in nearly every scientific field. The Biocomplexity Thesaurus serves as a controlled vocabulary for facilitating improved access and retrieval of data and information. Users can query the thesaurus for matching and related terms both specific and broad.

    URL: https://www1.usgs.gov/csas/biocomplexity_thesaurus/
     

  • ISO Topic Themes (For Theme Keywords)

    Description: The International Organization for Standardization (ISO) metadata standard (ISO 19115) provides a set of Core metadata elements that must occur in every national profile/implementation. Most of these elements either map to existing CSDGM metadata elements or represent properties of the data that can be determined and populated using a data-integrated metadata tool. Topic Category is the only mandatory element of the ISO core metadata set that requires new information that cannot be directly captured from the data. Go to the link below and check "ISO 19115 Topic Category". Click on the right tab, then browse through the topics to view theme keywords for inclusion in metadata.

    URL: https://www2.usgs.gov/science/tab-term.html
     

  • NAL Agricultural Thesaurus (For Theme Keywords)

    Description: The NAL Agricultural Thesaurus (NALT) is updated annually, and the 2007 edition contains over 65,800 terms organized into 17 subject categories. NALT is searchable online and is available in several formats (PDF, ASCII text, XML, SKOS) for download from the web site. NALT has standard hierarchical, equivalence, and associative relationships and provides scope notes and over 2,400 definitions of terms for clarity. Proposals for new terminology can be sent to thes@nal.usda.gov. Published by the National Agricultural Library, United States Department of Agriculture.

    URL: https://agclass.nal.usda.gov/agt.shtml
     

  • NASA Thesaurus (For Theme Keywords)

    Description: Contains authorized subject terms of the NASA Aeronautics and Space Database for aerospace engineering, and all supporting areas of engineering and physics, the natural space sciences (astronomy, astrophysics, and planetary science), Earth science, and to some extent, the biological sciences. The Thesaurus contains over 18,000 terms, 4,000 definitions, and 4,400 USE references. Terms are organized within a hierarchical structure, and also include "related terms" lists. Edited by the NASA Center for AeroSpace Information (CASI).

    URL: https://www.sti.nasa.gov/thesvol1.pdf
     

  • ETDE/INIS - Department of Energy (For Theme Keywords)

    Description: The Joint Thesaurus contains the controlled terminology for indexing all information within the subject scopes of the International Nuclear Information System (INIS) and the Energy Technology Data Exchange (ETDE). It contains 21,147 valid descriptors and 9,114 forbidden terms. The terminology is intended for use in subject descriptions for input or retrieval of information in these systems. The thesaurus may be revised at any time; please refer to the supplements content for cumulative references for changes made to the initial printing of the thesaurus.

    URL: https://www.etde.org/edb/reference.html 
     

  • Geographic Names Information System (For US Place Names)

    Description: The GNIS contains information about physical and cultural geographic features of all types in the United States, associated areas, and Antarctica, current and historical, but not including roads and highways. The database holds the Federally recognized name of each feature and defines the feature location by state, county, USGS topographic map, and geographic coordinates.

    URL: https://geonames.usgs.gov/apex/f?p=136:1:0:::::
     

  • NGA GEOnet Names Server (For Foreign Place Names)

    Description: The GEOnet Names Server (GNS) is the official repository of standard spellings of all foreign geographic names, sanctioned by the United States Board on Geographic Names (US BGN). The database also contains variant spellings (cross-references), which are useful for finding purposes, as well as non-Roman script spellings of many of these names. All the geographic features in the database contain information about location, administrative division, and quality. The database can be used for a variety of purposes, including establishing official spellings of foreign place names, cartography, GIS, GEOINT, and finding places.

    URL: https://geonames.nga.mil/gns/html/
     

  • Getty Thesaurus of Geographic Names (For Place Names)

    Description: The TGN includes names and associated information about places. Places in TGN include administrative political entities (e.g., cities, nations) and physical features (e.g., mountains, rivers). Current and historical places are included. TGN is intended to aid cataloging, research, and discovery of art historical, archaeological, and other scholarly information.

    URL: http://www.getty.edu/research/tools/vocabularies/tgn/index.html

 

What the U.S. Geological Survey Manual Requires 

USGS Survey Manual Chapter 502.7 – Fundamental Science Practices: Metadata for USGS Scientific Information Products Including Data states that metadata records for data should include, but not be limited to, authorship, title, abstract and purpose, theme keywords, data quality, temporal extent, and physical location.

 
