Use of Controlled Vocabularies in USGS Information Applications: Requirements Analysis for Automated Processes and Services (Bureau-wide Application)

Large online data catalogs use controlled vocabularies to categorize datasets in ways that allow end users to sort and select data matching their needs. The eventual goal of this project is to build functional services so that the USGS Thesaurus and other USGS-controlled vocabularies will be available to the English-speaking scientific community, especially within the USGS where they can be used to improve metadata quality and data discovery.



The project team used the Tetherless World Constellation (TWC) Semantic Web Methodology, which is designed to examine use cases and determine both functional and nonfunctional system requirements without committing in advance to particular technologies, platforms, hardware, or software for meeting those requirements. This iterative process was developed in 2008 by Peter Fox and Deborah McGuinness of the TWC at Rensselaer Polytechnic Institute and has been taught by Fox and his collaborators to members of the project team (Fox and McGuinness, 2008). During the first year of the project, the team developed a set of use cases and a conceptual model and engaged a panel of expert reviewers to evaluate them (fig. 21). Afterward, the team tested the resulting system requirements by developing prototype vocabulary services and by modifying an existing USGS metadata tool, the Metadata Wizard, to make use of the vocabularies offered by the new services. The proposal also included modifying the Science Data Catalog to make use of the controlled vocabularies. The project team planned the catalog modifications; however, the plans were not implemented because of a shortage of metadata that included controlled terms. Finally, and largely as a result of input from the expert review panel, the project team drafted a “Controlled Vocabulary Manifesto” that proposes a strategy for full implementation of controlled vocabularies in the USGS. The aim of that strategy is for people using USGS data catalogs to be confident that their search results are both comprehensive and focused, with good recall (nothing relevant missed) and good precision (nothing irrelevant included).
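The recall and precision goals named above can be made concrete with a small calculation. The following is a minimal sketch in Python, not part of the project's software: the function name `precision_recall` and the dataset identifiers are hypothetical, and returning 0.0 when a set is empty is an assumed convention.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one catalog search.

    retrieved: identifiers of the datasets the search returned
    relevant:  identifiers of the datasets that actually match the need
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant datasets that were found

    # Precision: share of returned results that are relevant.
    # Assumed convention: 0.0 when nothing was returned.
    precision = len(hits) / len(retrieved) if retrieved else 0.0

    # Recall: share of relevant datasets that were returned.
    # Assumed convention: 0.0 when nothing is relevant.
    recall = len(hits) / len(relevant) if relevant else 0.0

    return precision, recall


# Hypothetical example: four datasets are relevant; the search returns
# three results, two of which are relevant.
relevant = {"ds1", "ds2", "ds3", "ds4"}
retrieved = {"ds1", "ds2", "ds5"}
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case, one irrelevant result lowers precision and two missed datasets lower recall; consistent use of controlled terms in metadata is intended to improve both.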



The project has taken the first steps toward development of Web services and applications that will enable researchers and data managers to use community-standard vocabularies so that USGS data can be found as easily as possible, especially when people do not already know that those data exist. Within the Community for Data Integration (CDI) Science Support Framework (SSF), the project is integrating semantics into Web services and applications principally to support the “Describe” (Metadata) and “Publish/Share” components of the Science Data Lifecycle Model.

Accomplishments

Note: This description is from the Community for Data Integration 2015 Annual Report.