Office of Science Quality and Integrity

E.2 Data

E. Extended Guidance and Specific Products

 

 

E.2.1. What are some examples of a dataset and a database?

Aggregated data received from an analytical laboratory for field samples and measurements made directly during fieldwork are both examples of datasets. If a number of datasets are combined into a searchable product or defined system, this product or system is an example of a database regardless of whether a formal database management system is used. A geologic map has a geospatial dataset, and when this dataset is combined with other regional datasets, the result is another example of a database. The National Water Information System (NWIS) is a database. Data retrieved from NWIS (such as a table of data) are a dataset.

 

E.2.2. What are the requirements related to planning and conducting data collection and research?

USGS data collection and research activities are governed by work plans that are reviewed by appropriate experts and approved at some level higher than the project chief, generally by the Science Center Manager or equivalent. A work plan, which can be a component of a proposal, is handled through the Bureau planning process. Proper documentation is required to ensure that scientific goals are achievable and are appropriate to the mission of the USGS, and that research can be interpreted appropriately. Data collection and research activities are carried out in a consistent, objective, and replicable manner that has been vetted through a vigorous and open process of peer review (refer to SM 502.2). Scientific information products resulting from data collection and research activities, regardless of the outlet in which they are published, must follow the appropriate requirements for review, approval, and release.

 

E.2.3. What Federal Government policies require the release of scientific data and how does the USGS intend to meet these requirements?

The OSTP's February 22, 2013, memorandum Increasing Access to the Results of Federally Funded Scientific Research requires that all Federal Government agencies with a research budget greater than $100 million must develop and implement a plan to support increased public access to the results of federally funded research. Agencies must ensure that the public can read, download, and analyze, in digital form, final peer-reviewed manuscripts or final published documents within a timeframe that is appropriate for each type of research conducted or sponsored by the agency. Further, OSTP requires digitally formatted scientific data resulting from unclassified research supported wholly or in part by Federal funding to be stored and publicly accessible to search, retrieve, and analyze. Also refer to SM 502.8, which describes the requirements for review and approval of USGS scientific data prior to release. Additionally, the OMB's May 9, 2013, memorandum M-13-13, Open Data Policy—Managing Information as an Asset requires agencies to collect or create scientific information in a way that supports downstream information processing and dissemination activities, including using machine-readable and open formats, data standards, and common core and extensible metadata for all new scientific information creation and collection efforts. The Bureau's web page Public Access to Results of Federally Funded Research at the U.S. Geological Survey provides information related to how the Bureau meets these OSTP and OMB requirements and includes a link to the USGS Public Access Plan. The USGS Public Access Plan requires that digital data, upon which scholarly conclusions in USGS funded publications are based, be made available no later than the time of publication of those scholarly conclusions in conformance with applicable USGS data management policies.

 

E.2.4. How do I reference and cite the data supporting my publication?

Include a complete bibliographic citation for the data source in the references section of the publication. Example data citations are available on the USGS Data Management web site in the section titled "Citing Your Data." Cite the data source in the body of the report the same way you cite other references.

 

E.2.5. What is a USGS author's obligation when data collected by an outside source are used (with permission) in a USGS scientific information product and have not been publicly released by the data collector, and who is responsible for releasing the data?

Data used in USGS science information products should be made widely available to help ensure the accuracy, validity, and reproducibility of the scientific results. USGS scientists must ensure that the data associated with their research have proper acknowledgment regarding how the data were collected, where the data will reside, who will release the data, and how the data will be released. This information about data release must be described in data management plans (DMPs) that are included in associated research project plans (refer to guidance on developing DMPs). If the party collecting the data is another Federal agency, that agency has the primary responsibility for releasing the data according to their specific requirements. Refer to Guide to Data Releases With or Without a Companion Publication for six scenarios that describe data release obligations according to roles of USGS scientists and project funding arrangements.

Provisions for handling proprietary data and information, that is, data that cannot be released to the public for specific reasons, are found in SM 502.5. The author must ensure discussion about data release takes place with the data collector prior to signing any cooperative or collaborative agreement to use proprietary data, and decisions about these data releases should also be reflected in the DMP.

Data that are part of a USGS science data information product or used in interpretive work are subject to Freedom of Information Act (FOIA) requests. If the data are considered Federal records, the USGS must comply with requirements related to responding to FOIA requests. Contact the USGS FOIA Officer for additional guidance.

 

E.2.6. If a non-USGS lead author does not release data collected using Federal funds, is the USGS coauthor responsible for providing public access to those data?

The OSTP and OMB requirements for open data apply to data collected using Federal funds. Regardless of authorship, if the research was federally funded, then the funding agency is responsible for providing public access to those data. If the research is not federally funded, then the non-USGS lead author is not required to release the data to the public but has discretion to do so. It is common practice throughout the scientific publishing community to release the data upon which scholarly conclusions are based. Major publishers including Science, Nature, American Geophysical Union, Elsevier, and Wiley require access to the data upon which scholarly conclusions are based as a condition for publication.

 

E.2.7. Who owns the data collected during research or produced as a scientific information product on behalf of the USGS?

Data collected on behalf of the USGS or by using USGS funds belong to the USGS and not to the individual who collected the data (for example, a USGS employee, student, emeritus or other volunteer, or contractor). If a USGS employee is under contract with a cooperator to collect data funded by the cooperator, the DMP should specify the data ownership, the distribution rights for USGS use of the data, the data preservation responsibilities, and the party responsible for providing the data to the public (refer to SM 502.6 and to guidance on developing DMPs).

 

E.2.8. What outlets are available for releasing data?

The preferred path for USGS data release is through USGS data repositories or portals, such as ScienceBase, NWIS, or BioData. The goal is to ensure that the USGS maintains the authoritative copy of the data it releases. The USGS has guidance available on acceptable digital repositories for releasing USGS data at Standards for Establishing Trusted Repositories for USGS Digital Assets. This guidance includes a list of repositories that will be updated as additional repositories are deemed acceptable.

 

E.2.9. How are raw data handled?

"Raw data" refers to digital and nondigital data that are unprocessed and unverified. Examples include field observations and unaltered output from sensors. Retention of raw data is important in support of reproducible science and for recovering from processing errors. Raw data must be archived according to the USGS records disposition schedule and can be released as either provisional or approved data according to the USGS policy on data release (SM 502.8). Raw data may also be subject to FOIA requirements. In the event such data are requested, contact the USGS FOIA Officer for additional guidance.

 

E.2.10. What about using non-Federal data repositories to provide or host the required public access to my data?

Use of non-Federal repositories is acceptable as described at Standards for Establishing Trusted Repositories for USGS Digital Assets. The authoritative copy of the data, however, must be hosted on USGS servers or a federally maintained data service (refer to SM 502.9). For established agreements with the USGS, these arrangements, including a hosting agreement, need to be clearly spelled out in the DMP. In all cases, regardless of where the data reside or are hosted, a metadata record as described in SM 502.7, including a DOI link back to the data source, must be included in the USGS Science Data Catalog.

 

E.2.11. What are the recordkeeping requirements regarding research activities?

Documentation and recordkeeping requirements associated with data collection and research activities are found in the USGS Mission-Specific Disposition Schedules and General Records Disposition Schedules.

 

E.2.12. What data qualify as USGS funded scientific data and therefore are subject to USGS open data requirements and must be released to the public?

USGS funded scientific data are those data collected with federally appropriated funds provided to the Bureau as part of the congressionally enacted USGS budget and therefore are subject to USGS requirements for data release (refer to SM 502.8). Data collected by USGS scientists with funding from another Federal agency or a non-Federal cooperator are not considered USGS funded data, and the responsibility for providing public release of these data should be clearly specified in the contract or agreement with the other agency or cooperator as well as in the USGS project data management plan. All final scientific data either released by the USGS or provided to research partners under cooperative or other collaborative agreements must first be reviewed for quality and accuracy and must include complete metadata.

 

E.2.A. Data Management Planning

E.2.A.1. What are data management plans or DMPs and why do I need to create them?

A DMP is a document that outlines the data management considerations of a given project. A DMP describes intended actions for acquiring, processing, analyzing, preserving, publishing/sharing, describing, managing the quality of, backing up, and securing USGS data holdings. The document describes where and how you will acquire data, what standards you will use, and how data will be handled and protected during and after the completion of the project. The DMP is created before the project begins and is updated throughout the research process, as needed, to reflect the reality of project activities (SM 502.6). The USGS has a responsibility to steward its data to meet OMB open data requirements for managing Federal Government scientific information as an asset throughout the lifecycle. USGS researchers must create DMPs to help meet this responsibility. This planning enables Bureau management to anticipate the need to provide infrastructure and other support for scientific data.

 

E.2.A.2. What information is included in a DMP?

A DMP includes information about the data and metadata standards to be used and intended actions for acquiring, processing, analyzing, preserving, publishing/sharing, describing, managing the quality of, backing up, and securing data holdings. The DMP captures the point of contact for the project and its data, and assigns roles to people who will be responsible for data management, updating the DMP, and generating metadata and other documentation. Descriptive information about the expected data input and output from the project is also important to include in the DMP, such as the estimated volume of the data and the format of the data and the accompanying metadata. With regard to length, the DMP should be as long as it needs to be to fully describe the data management activities on a given project. Once a project proposal has been accepted and the project is underway, the DMP is expected to be updated throughout the length of the project.
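The items listed above can be pictured as a simple record with a completeness check. The following is a minimal sketch only, not an official USGS template: the class name, field names, and placeholder values are hypothetical and chosen for illustration, and a real DMP is a narrative document that may need more detail than any fixed set of fields.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List

# Lifecycle actions named in the paragraph above.
LIFECYCLE_ACTIONS = (
    "acquiring", "processing", "analyzing", "preserving",
    "publishing/sharing", "describing", "managing quality",
    "backing up", "securing",
)


@dataclass
class DataManagementPlan:
    """Hypothetical container for DMP content (illustrative field names)."""
    point_of_contact: str = ""
    roles: Dict[str, str] = field(default_factory=dict)      # responsibility -> person
    data_standards: List[str] = field(default_factory=list)
    metadata_standards: List[str] = field(default_factory=list)
    estimated_volume: str = ""
    data_format: str = ""
    metadata_format: str = ""
    planned_actions: Dict[str, str] = field(default_factory=dict)  # action -> description

    def missing_items(self) -> List[str]:
        """Names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def unplanned_actions(self) -> List[str]:
        """Lifecycle actions that have no description yet."""
        return [a for a in LIFECYCLE_ACTIONS if not self.planned_actions.get(a)]


# Placeholder values for illustration only.
plan = DataManagementPlan(
    point_of_contact="Project chief (placeholder)",
    planned_actions={"acquiring": "Field sensors (placeholder)"},
)
print(plan.missing_items())
print(plan.unplanned_actions())
```

Because the DMP is expected to be updated throughout the project, a check like this could be rerun whenever the plan changes to show which items still need attention.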

 

E.2.A.3. What is the relationship between a project work plan and a DMP?

The overall project work plan of every research project (as discussed in SM 502.2) must include a DMP. Project work plans are broad in scope and cover all aspects of a project including project purpose, significance, methodology, staffing, budget, timelines and deliverables. DMPs are focused on the data-related aspects of the project. Both project and data management plans are essential and should be maintained as project documents.

 

E.2.A.4. Where can I find some DMP examples?

DMP examples from various institutions can be found at https://dmptool.org/. Additional examples and data management planning guidance can be found on the USGS Data Management Plans web page.

 

E.2.A.5. What tools are available to help me create my DMP?

The USGS Data Management website provides guidance on developing DMPs and understanding data management best practices. The Data Management Planning Considerations Checklist can be used to help ensure that all issues that may affect your data have been addressed in your DMP. Additionally, the USGS has partnered with the DMPTool to develop a DMP template. The DMPTool is a free, web-based application that is also used by several Federal agencies including the National Science Foundation and the National Institutes of Health. The tool presents each section of a DMP and allows you to save, preview, and export your plans as well as share your DMPs with collaborators.

 

E.2.A.6. Can a Science Center-wide plan be leveraged for my project-specific DMP?

Yes. A number of centers have developed guidance for how their scientists should be managing their data. Any center-specific information on how data are managed should be considered when writing your DMP. Every project is unique and will have particular data management requirements, however, so it is important that those project-specific details are captured in the DMP. No single, center-wide plan will be completely applicable for every project within a Science Center.

 

E.2.A.7. What if I do not know all the details of how my data will be managed?

A data management plan is just that, a plan to get all the various actors in your program or center thinking about and committed to data management responsibilities before a project begins. You may need to get data management input from the principal investigator, co-investigators, data collectors, data analysts, information technology (IT) staff, modelers, geographic information systems (GIS) staff, and metadata experts, as each party involved may bring certain expertise related to specific aspects of the plan. If data management details change, your DMP should be updated to reflect those changes.

 

E.2.A.8. Who is responsible for ensuring that DMPs are developed and implemented for each project within a center or office?

Science Center Directors or their designees ensure compliance with data management requirements for data produced in their centers or offices and consult with their ADs, RDs, Managers (program and project), scientists, and others on their staff as needed with regard to carrying out data management activities, including ensuring the development of DMPs. They also assign or ensure the assigning of data managers to oversee or steward the lifecycle activities of their respective data products (SM 502.6).

 

E.2.B. Metadata

E.2.B.1. What are metadata?

The term “metadata” refers to documentation of important aspects of data that describe where, when, and why the data were collected; who collected the data; what types of data were collected; what processes were used to create the data; what quality assurance controls were used; and where the collected data are located. Metadata are provided in a human-readable form as well as in a format that is machine readable (for example, XML) for automated use.
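To illustrate the difference between the machine-readable and human-readable forms, the sketch below builds a very small XML record and then prints it as indented text. It is a hedged illustration, not a valid or complete FGDC record: the element short names (idinfo, citation, citeinfo, origin, pubdate, title, onlink, descript, abstract, purpose) are assumed to follow the FGDC CSDGM convention, the function names are hypothetical, and every value is a placeholder. Use a recommended metadata tool to produce real records.

```python
import xml.etree.ElementTree as ET


def build_minimal_record(title, originator, pubdate, abstract, purpose, onlink):
    """Build a tiny, incomplete metadata record (illustrative only)."""
    metadata = ET.Element("metadata")
    idinfo = ET.SubElement(metadata, "idinfo")
    citation = ET.SubElement(idinfo, "citation")
    citeinfo = ET.SubElement(citation, "citeinfo")
    ET.SubElement(citeinfo, "origin").text = originator   # who collected the data
    ET.SubElement(citeinfo, "pubdate").text = pubdate     # when the data were released
    ET.SubElement(citeinfo, "title").text = title         # what the data are
    ET.SubElement(citeinfo, "onlink").text = onlink       # where the data are located
    descript = ET.SubElement(idinfo, "descript")
    ET.SubElement(descript, "abstract").text = abstract
    ET.SubElement(descript, "purpose").text = purpose     # why the data were collected
    return metadata


def to_plain_text(element, depth=0):
    """Render the record as indented 'tag: value' lines for human readers."""
    text = (element.text or "").strip()
    line = "  " * depth + (f"{element.tag}: {text}" if text else f"{element.tag}:")
    lines = [line]
    for child in element:
        lines.extend(to_plain_text(child, depth + 1))
    return lines


# Placeholder values; none of these describe a real dataset.
record = build_minimal_record(
    title="Example field measurements (placeholder)",
    originator="Example originator (placeholder)",
    pubdate="2020",
    abstract="Placeholder abstract.",
    purpose="Placeholder purpose.",
    onlink="https://example.org/placeholder-dataset",
)
xml_text = ET.tostring(record, encoding="unicode")  # machine-readable form
print(xml_text)
print("\n".join(to_plain_text(record)))             # human-readable form
```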

 

E.2.B.2. Why do we need metadata for data?

Metadata enable users to find, understand, and reuse the data, thus extending the life of the data. In addition, a metadata record is required by the USGS for including data in the Science Data Catalog. Federal Government mandates (including Executive Order 12906 and OMB Circular A-16) and USGS policy SM 502.7 require metadata as an integral part of data released to the public.

 

E.2.B.3. What do metadata records look like?

Examples of metadata for data include those found at https://archive.usgs.gov/archive/sites/sofia.usgs.gov/metadata/index.php.html and other examples on the USGS Data Management web page at https://www.usgs.gov/datamanagement/describe/metadata.php#hide-FGDC-CSDGM-Standard-Metadata-Examples.

 

E.2.B.4. How do I create metadata?

Various tools for creating metadata are available on the USGS Data Management metadata web page. For example, the Online Metadata Editor (OME) helps you create a valid FGDC metadata record by compiling the answers you provide to common questions about your data. The OME tool can be used to start new records, upload and edit existing records, and save completed or ongoing records to the database or directly to your desktop.

 

E.2.B.5. What is a metadata review, and who can perform it?

A metadata review includes both checking for compliance with metadata standards by using a recommended metadata validation tool and performing quality checks. A minimum of one metadata review by a qualified reviewer is required for all USGS scientific data prepared for release. The role of the metadata reviewer is to evaluate the accuracy, completeness, and usability of the metadata. The metadata review can be conducted as part of the peer review (SM 502.3) or data review (SM 502.8), or it can be conducted separately as appropriate. Science Center management determines who serves as qualified metadata reviewers for the data produced by authors in their Centers. A metadata checklist that provides guidelines to reviewers is available. Additional information on metadata reviews is available on the USGS Data Management Metadata web page. A written report of all metadata reviews (reviewer comments and how they were reconciled) must be included in the internal IPDS review package that is submitted for Bureau approval as described in SM 502.7.

 

E.2.B.6. When do I create metadata?

A metadata record needs to be finalized and disseminated when the data are ready to be released to others. Authors should develop an approach for compiling the metadata record at the data-management planning stage as described in SM 502.6. Metadata are collected, used, and revised as a living resource throughout the data lifecycle. Therefore, metadata creation should be started as soon as the project begins. When recorded throughout the lifecycle of data, the metadata information is likely to be more accurate and will require less effort than if it is recorded at the end of the project. The metadata information must be updated periodically to document any changes to the data, such as corrections or additions.

 

E.2.B.7. I have a lot of data packaged in different datasets and databases. For what packages of data do I produce a metadata record?

It depends on how the data will be used. You need a metadata record that describes the data package that will be cited, which is generally also the package that will be searched for in the Science Data Catalog and public search engines. Additional metadata records might be needed for separate parts of data packages that have different creation or processing details.

 

E.2.B.8. Are metadata records required for any size dataset?

There is no established size for a dataset that prescribes when a metadata record is required. A separate metadata record may not be needed if only a few sample results are presented in their entirety in a published table. However, if the table contains analytical or summary results, or a larger set of data is used from which a small number of records are extracted to create a table for a USGS series or outside publication, then it is appropriate to also have a metadata record for that larger dataset. When data are released, they must be accompanied by a metadata record. If a product uses a subset or summary of separately released data, no additional metadata record is needed.

 

E.2.B.9. Are metadata records needed for scientific datasets and databases that are provided by non-USGS authors and are subsequently included in USGS datasets, databases, or publications?

Generally, yes, because these items become part of USGS scientific information products. When incorporated into USGS information products, these datasets or databases from non-USGS sources need to comply with USGS data release requirements, including review and approval of data and documentation of the source data. This is important because metadata records establish the provenance of incorporated data and include a link to the original source data. If sufficient metadata do not exist, record(s) should be created and included as part of the new information product's data package or cited in the metadata for the new USGS data product.

 

E.2.B.10. Are the output data generated by a model simulation also subject to the metadata requirement?

Yes, model simulation data that will be made publicly available through the data release process need metadata. Source data used for the model should be well documented and cited in the metadata to allow the work to be understood and replicated by others.

 

E.2.B.11. Do summary data tables in scientific information products such as USGS publication series products or outside publications (for example, journals) need metadata?

No. The data behind the summary table, however, if not also presented in the body of the product, will need metadata and will need to go through the data release process.

 

E.2.B.12. Is USGS SPN editing required for metadata records?

No, an SPN editorial review is not required, but Science Centers have the option of obtaining such a review as described in SM 1100.2 for any product as they deem appropriate.

 

E.2.B.13. Where do the metadata records go once we have created them?

A copy of the metadata record must stay with its associated data. Upon formal release of the data, copies of metadata records for all USGS data-related products, including non-geospatial data, must also be placed in the USGS Science Data Catalog. More information is available about how to include metadata in the Science Data Catalog.

 

E.2.B.14. If my data are associated with a publication, where does the DOI pointing to that publication get placed in the metadata?

The DOI for the associated publication should be placed in the “Cross-referenced: Citation: Online Linkage:” element of the metadata record. To be an actionable link within the metadata record, the DOI must follow this format example: https://doi.org/10.3133/sir20155010.
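DOIs are often copied in other forms, such as a bare identifier or with a "doi:" prefix. The sketch below shows one way to convert those forms to the actionable https://doi.org/ format before placing the link in a metadata record. The function name, the list of accepted prefixes, and the pattern used to recognize a DOI are assumptions made for illustration; the pattern is a common heuristic, not an official definition of DOI syntax.

```python
import re

# Heuristic pattern for a bare DOI such as 10.3133/sir20155010 (an assumption).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes that are stripped before the DOI is rebuilt as an https URL.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def to_actionable_doi(raw: str) -> str:
    """Return the DOI as an https://doi.org/ link, or raise ValueError."""
    value = raw.strip()
    lowered = value.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(value):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return "https://doi.org/" + value


print(to_actionable_doi("doi:10.3133/sir20155010"))
print(to_actionable_doi("10.3133/sir20155010"))
```

Both calls print the same link, in the format shown in the example above.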

 

E.2.B.15. If I cannot explain how the data were created in the metadata record, where do I place the URL or DOI for the product that describes the data creation process?

This information should be added to the “Supplemental Information:” element of the metadata record.  To be an actionable link within the metadata record, the DOI, if used, must follow this format example: https://doi.org/10.1029/2005WR004455.

 

E.2.B.16. Where can I find additional guidance or information about metadata?

Additional guidance on metadata creation, quality control and content review, tools, and best practices is available on the USGS Data Management Metadata web page.

 

E.2.C. Data Products

E.2.C.1. What are the review and approval requirements for releasing scientific data to the public?

Data intended for public release are subject to USGS FSP review, approval, and release requirements. These requirements include one data review and one metadata review followed by Bureau approval documented in the IPDS as described in SM 502.8. Data are never placed in the IPDS—only the documentation of the required metadata review and data review and any necessary reconciliation are placed in the IPDS as part of the approval package. Data are approved for release by Science Center Directors or their designees. USGS scientific data are considered noninterpretive; however, the scholarly publications associated with the data that describe the process used to create data, if interpretive and previously unpublished, must be peer reviewed and are approved by BAOs in the OSQI (refer to SM 205.18). Additional information about USGS scientific data is available at Distinctions between New Research or Interpretive Information Products, Previously Published or Noninterpretive Information Products, and Scientific Data.

 

E.2.C.2. What outlets are available for releasing data?

The preferred path for USGS data release is through USGS data repositories or portals, such as ScienceBase, NWIS, or BioData. The goal is to ensure that the USGS maintains the authoritative copy of the data it releases. The USGS has guidance available on acceptable digital repositories for releasing USGS data at Acceptable Digital Repositories for USGS Scientific Publications and Data. This guidance includes a list of repositories that will be updated as additional repositories are deemed acceptable.

 

E.2.C.3. Can I log in and enter data into the USGS ScienceBase before those data have been approved for release?

Yes. Under the manage option in ScienceBase, users can enter data and keep the data private, that is, available only to them and others within the USGS. Data can be added to ScienceBase at any time and can be used as a resource during the data review process prior to release. Once the data are approved for release, the same manage option can be used to make the data public.

 

E.2.C.4. Where do I get a DOI for USGS data that have been approved for release?

For specific guidance on DOIs, refer to Data Management Preserve.

 

E.2.C.5. Where can I find additional guidance related to releasing USGS scientific data?

Additional guidance is available on the USGS Data Management web page and the FSP web page.

 

E.2.C.6. What Federal Government policies require the release of scientific data, and how does the USGS intend to meet these requirements?

The OSTP's February 22, 2013, memorandum Increasing Access to the Results of Federally Funded Scientific Research requires that all Federal Government agencies with a research budget greater than $100 million must develop and implement a plan to support increased public access to the results of federally funded research. Agencies must ensure that the public can read, download, and analyze, in digital form, final peer-reviewed manuscripts or final published documents within a timeframe that is appropriate for each type of research conducted or sponsored by the agency. Further, the OSTP requires digitally formatted scientific data resulting from unclassified research supported wholly or in part by Federal funding to be stored and publicly accessible to search, retrieve, and analyze. Also refer to SM 502.8, which describes the requirements for review and approval of USGS scientific data prior to release. Additionally, the OMB's May 9, 2013, memorandum M-13-13, Open Data Policy—Managing Information as an Asset requires agencies to collect or create scientific information in a way that supports downstream information processing and dissemination activities, including using machine-readable and open formats, data standards, and common core and extensible metadata for all new scientific information creation and collection efforts. The Bureau's web page Public Access to Results of Federally Funded Research at the U.S. Geological Survey provides information related to how the Bureau meets these OSTP and OMB requirements and includes a link to the USGS Public Access Plan. The USGS Public Access Plan requires that digital data, upon which scholarly conclusions in USGS funded publications are based, be made available no later than the time of publication of those scholarly conclusions in conformance with applicable USGS data management policies.

 
