USGS - science for a changing world

Core Science Analytics and Synthesis - Metadata

Best Practices for Writing Metadata

Best Practices for Writing Quality Metadata

  1. Organize Your Information: Before you start to write metadata, spend some time getting organized. Gather information you have already documented about your dataset, such as an abstract and purpose statement written when you applied for a grant.
  2. Prepare to Create a Robust Record: Minimal metadata is acceptable for discovery, but can be useless for the reuse of data. Therefore, document your entity and attributes carefully. It is important to define your units of measure, the meaning of your column headers, and the parameters of your domains.
  3. Create a Meaningful Title: A file name is not the same as a title. Develop a title for your dataset that conveys as much information in a short space as you can. Include such information as the topic, geographic location, dates, and scale. Example: Greater Yellowstone Rivers from 1:126,700 U.S. Forest Service Visitor Maps (1961-1983). Note that this title shows the “who”, “what”, “where”, “scale”, and “when” about the dataset, which is much more meaningful than a file name.
  4. Define Terminology: Remember that metadata can last for years. Try to avoid the use of jargon, and clearly spell out acronyms. Example: Geographic Information Systems (GIS). Use “None” and “Unknown” meaningfully. “None” usually means that you looked for the data and know that nothing exists. “Unknown” means that you don’t know whether the data exists or not.
  5. Be Specific and Quantify When Possible: The goal of a metadata record is to allow a user to acquire enough information about the data to use it without contacting the dataset owner. Let your reader know how well your data is quality controlled. Example of a vague statement: “We checked our work and it looks complete.” Example of a specific statement: “We checked our work using 3 separate sets of check plots reviewed by 2 different people. We determined our work to be 95% complete based on these visual inspections.”
  6. Select Keywords Wisely: A theme keyword is required in metadata. It is recommended that you use a thesaurus to complete this field because you will be able to include broader and narrower terms you may not have thought of on your own. Because scientists search for data in different ways, this will increase the chance that a metadata record will be discovered. Think of a few terms that describe your dataset, and then look those terms up in a thesaurus to find terms to include in your record. Generally, 4-6 keywords will suffice.
  7. Remember a Computer Will Read Your Metadata: Avoid the use of symbols or characters that might be misinterpreted by a computer. Example: < and > are reserved characters that mark the start and end of tags in HTML and XML. When copying and pasting from other sources, use a text editor to eliminate hidden characters.
  8. Review Your Metadata Record: It is good practice to ask another person to review your metadata record. Edit your record based on comments you might receive. Remember that you are very familiar with your dataset. Someone unfamiliar with the dataset should be able to discern all the information they would need to reuse the data.
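
Item 7 can be partly automated. The following is a minimal Python sketch, not an official USGS tool, that assumes the metadata will be stored as XML. The function name clean_metadata_text is hypothetical. It replaces non-breaking spaces, drops invisible control and format characters that often arrive with pasted text, and escapes the characters an XML parser would otherwise treat as markup.

```python
import unicodedata
from xml.sax.saxutils import escape


def clean_metadata_text(text: str) -> str:
    """Remove hidden characters and escape XML-reserved characters.

    Hypothetical helper for illustration; assumes the target record is XML.
    """
    # A non-breaking space looks like a normal space but is a different character.
    text = text.replace("\u00a0", " ")
    # Drop control (Cc) and format (Cf) characters, such as zero-width spaces,
    # while keeping ordinary newlines and tabs.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # Escape &, <, and > so they are read as text rather than as markup.
    return escape(text)
```

For example, clean_metadata_text("depth < 5 m & temp > 2") returns "depth &lt; 5 m &amp; temp &gt; 2". Metadata editors and XML libraries often perform the escaping step themselves, so a helper like this is most useful when text is assembled by hand.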

Best Practice: Creating a Title in Metadata


Description:
When people search data clearinghouses for the most appropriate datasets, they are most likely to use the title as the first criterion for deciding whether a dataset meets their needs. Crafting a title that provides meaning and conveys the most information in the fewest words is critical. At a glance, someone reading a title wants to know if the dataset is of interest. Treat the title as the opportunity to sell your dataset. If a title is vague or overly simple, the record is less likely to be read.

Best Practice:

  • A complete title includes: What, Where, When, Who, and Scale
    • What: What is this dataset about? For example, is it a rivers coverage or a roads network?
    • Where: What part of the world does this data correspond to? Be specific. Is it for North America, Michigan, or the city of Detroit?
    • When: When was the dataset created? If a user is looking for information created after 2004, the title should tell him or her whether the dataset matches that criterion.
    • Who: Who created the dataset?
    • Scale: At what scale is this dataset most appropriately used?
  • When writing metadata, use a title that helps readers locate and interpret your file quickly. Try to:
    • Avoid ambiguity;
    • Consider all the possible interpretations of your word choices;
    • Include as many details as you can so that readers can surmise what is in your data before they go further.

Examples: 

  • Good Title: Greater Yellowstone Rivers from 1:126,700 U.S. Forest Service Visitor Maps (1961-1983)
    • Why is this a good example? This title includes enough information to be informative: Greater Yellowstone (where) Rivers (what) from 1:126,700 (scale) U.S. Forest Service (who) Visitor Maps (1961-1983) (when)
  • Bad Title: Rivers
    • Why is this a bad title? It does not give the reader enough information to determine whether the dataset fits his or her needs.
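
The five title components can also be checked mechanically before a record is published. The following is a minimal Python sketch under stated assumptions: TitleParts, missing_parts, and build_title are hypothetical names, and the word order in build_title simply mirrors the good example above rather than any required format.

```python
from dataclasses import dataclass


@dataclass
class TitleParts:
    """The five components of a complete title (hypothetical structure)."""
    what: str
    where: str
    when: str
    who: str
    scale: str


def missing_parts(parts: TitleParts) -> list:
    """Return the names of any components that were left blank."""
    return [name for name, value in vars(parts).items() if not value.strip()]


def build_title(parts: TitleParts) -> str:
    """Assemble a title, refusing to produce one that omits a component."""
    missing = missing_parts(parts)
    if missing:
        raise ValueError("title is missing: " + ", ".join(missing))
    # Word order follows the Greater Yellowstone example; adjust as needed.
    return f"{parts.where} {parts.what} from {parts.scale} {parts.who} ({parts.when})"


example = TitleParts(
    what="Rivers",
    where="Greater Yellowstone",
    when="1961-1983",
    who="U.S. Forest Service Visitor Maps",  # creator plus source product
    scale="1:126,700",
)
print(build_title(example))
```

A title such as "Rivers" would fail this check, because every component except "what" is blank.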

Best Practice: Selecting Informative Keywords for Metadata


Description:
Selecting good keywords is a critical task in creating metadata and essential for locating records quickly and efficiently in repositories. Use of standard thesauri is highly recommended for selection of terms. In metadata standards, a theme keyword is generally a required element, used to describe the topic of the dataset. Place keywords are also highly recommended if applicable to the data. Keywords are a quick and precise way to make a dataset retrievable.

Best Practice:

  • Select 4-6 theme keywords to describe a dataset. Use unambiguous terms: be clear and precise.
  • Use a thesaurus. A thesaurus suggests terms that have broader, narrower, and related meanings.
    • Begin by thinking of an informative keyword.
    • Enter the term into the thesaurus to determine other terms that may more effectively describe the dataset. 
    • Enter these terms into the metadata record.
  • Select keywords for a variety of categories, such as theme and place.
    • Select Theme keywords that best explain the topic(s) of the dataset.
    • Select Place keywords that best illustrate where data was collected.
  • Document the thesauri used for term selection in the metadata.
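
As an illustration of how theme keywords, place keywords, and their source thesauri sit together in a record, here is a minimal Python sketch. It assumes the record follows the FGDC Content Standard for Digital Geospatial Metadata (CSDGM), whose keywords section uses the elements themekt, themekey, placekt, and placekey; this page does not name a standard, so treat the element names as an assumption. The function name build_keywords and the sample terms are hypothetical and were not taken from any thesaurus lookup.

```python
import xml.etree.ElementTree as ET


def build_keywords(theme_thesaurus, theme_terms, place_thesaurus, place_terms):
    """Build a CSDGM-style <keywords> element (illustrative sketch only)."""
    if not theme_terms:
        # A theme keyword is generally a required element.
        raise ValueError("at least one theme keyword is required")
    keywords = ET.Element("keywords")
    groups = (
        ("theme", "themekt", "themekey", theme_thesaurus, theme_terms),
        ("place", "placekt", "placekey", place_thesaurus, place_terms),
    )
    for group, thesaurus_tag, term_tag, thesaurus, terms in groups:
        if not terms:
            continue  # place keywords are recommended, not required
        node = ET.SubElement(keywords, group)
        # Record which thesaurus the terms came from.
        ET.SubElement(node, thesaurus_tag).text = thesaurus
        for term in terms:
            ET.SubElement(node, term_tag).text = term
    return keywords


# Illustrative values only.
element = build_keywords(
    "USGS Thesaurus",
    ["rivers", "streams", "hydrography", "surface water"],
    "Geographic Names Information System",
    ["Yellowstone National Park", "Wyoming"],
)
print(ET.tostring(element, encoding="unicode"))
```

Keeping the thesaurus name next to the terms it supplied means a later reader can trace each keyword back to its controlled vocabulary.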

Examples of Thesauri: 

  • Name: Biocomplexity Thesaurus
    Maintained by: Core Science Analytics and Synthesis
    Description: The CSAS Biocomplexity Thesaurus contains over 9,500 terms, including broad (BT), narrow (NT), and ‘use-for’ (UF) terms. It is ‘rotatable’: the thesaurus can be rotated to examine facets of a particular concept by clicking on the hyperlinked terms in the results list.
    URL:
  • Name: USGS Thesaurus
    Maintained by: USGS
    Description: The USGS Thesaurus is the principal controlled vocabulary supporting Science Topics. It is designed as a formal thesaurus conforming to ANSI/NISO Z39.19, with rigid adherence to the hierarchical (BT, NT) term relationships, generic non-hierarchical (RT) relationships, and lead-in term relationships linking non-preferred terms to descriptors either singly (UF) or in a compound USE-WITH relationship. The thesaurus is faceted, meaning its top terms delineate general aspects of information resources.
    URL: http://www.usgs.gov/science/about/
  • Name: Global Change Master Directory
    Maintained by: NASA
    Description: The GCMD holds more than 25,000 Earth science data set and service descriptions, which cover subject areas within the Earth and environmental sciences.
    URL: http://gcmd.nasa.gov/index.html