Office of Science Quality and Integrity

E.6 Software

E. Extended Guidance and Specific Products

 

 

E.6.1. What is considered USGS scientific software and what are the USGS requirements for releasing it?

Scientific software is a discrete package of computer code and documentation that contains source code implementing scientific algorithms or producing scientific data. All USGS scientific software products intended for public release are reviewed and approved in accordance with USGS FSP requirements as described in IM OSQI 2019-01. All scientific software must be accompanied by the appropriate disclaimer statement(s). Software is considered noninterpretive information.

E.6.2. What is the difference between approved and provisional software?

USGS releases both provisional or preliminary software and approved software. Provisional or preliminary scientific software, which is subject to change, has yet to be approved as a USGS information product. Release of in-progress scientific software for collaborative or informal sharing through a publicly accessible code repository requires Science Center level approval of the methods and practices used and assurance that personal, private, or otherwise sensitive information is not shared, but does not require tracking through the IPDS or approval as an information product.

E.6.3. How is USGS scientific software released?

Scientific software can be released as either approved or provisional software (refer to IM OSQI 2019-01). Software can be released as a stand-alone product, or released as a separate product associated with another USGS scientific information product, such as a data release or a publication series report or map, or released in association with an outside publication such as a journal article. For more information, refer to the USGS Software Management website.

E.6.4. When is USGS scientific software ready for approval and public release?

Scientific software is ready for approval and public release when a version of the software is no longer under development, includes the appropriate documentation, and has been reviewed in accordance with IM OSQI 2019-01 requirements. USGS software authors, developers, and Science Center Directors determine when to publish a software release.

E.6.5. What types of review are required before approval of scientific software for publication as a software release?

Approval of scientific software requires two reviews: a code review and a domain review, as described below. A single reviewer can perform either review or both reviews. The software author selects the reviewer(s), with the concurrence of a Science Center Director, on the basis of their qualifications to perform such reviews. Additional information is available in the “Review” section of the USGS Software Management website.

  • Code Review: The review ensures overall software code quality. Typical quality checks include, but are not limited to, coding standards, unit tests passing, user input cleansing, memory leaks, vulnerabilities, and optimizations. Additionally, the code review ensures that personally identifiable information (PII), absolute file system paths, internal server host names or IP addresses, usernames/passwords, and other personal, private, or otherwise sensitive information is not included with the software.
  • Domain Review: The review ensures the software generates output that aligns with published or otherwise well-known expected results. This may involve comparing output with external data sets, comparing algorithms with published scholarly articles about the algorithm, and reviewing unit and integration test results.
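As an illustration of the sensitive-information portion of a code review, the following minimal sketch scans source text for a few of the items listed above (IP addresses, absolute user paths, and password assignments). The patterns, the names `PATTERNS` and `scan_text`, and the sample input are hypothetical and far from exhaustive; this is an aid a reviewer might use, not a USGS-prescribed tool or a substitute for the review itself.

```python
import re

# Hypothetical patterns for illustration only; a real review would rely on a
# broader, vetted set of checks and on human judgment.
PATTERNS = {
    "ipv4_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "absolute_path": re.compile(r"(?:/(?:home|Users)/\w+|[A-Za-z]:\\Users\\\w+)"),
    "password_assignment": re.compile(r"(?i)\b(?:password|passwd|pwd)\s*[=:]\s*\S+"),
}


def scan_text(text):
    """Return (line_number, pattern_name) pairs for lines matching any pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# Made-up sample input containing three flagged lines and one clean line.
sample = (
    'data = open("/home/jdoe/project/input.csv")\n'
    'host = "10.1.2.3"\n'
    'password = "hunter2"\n'
    "x = 1\n"
)
findings = scan_text(sample)
```

Each finding identifies a line for the reviewer to inspect; a match is a prompt for a closer look, not proof that sensitive information is present.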

E.6.6. What are the approval requirements for USGS scientific software releases?

Bureau approval for USGS scientific software releases is granted by the Science Center Director (or a designee) and documented in the IPDS. The authoritative or original reference copy of all approved scientific software releases must be maintained on a USGS hosting platform such as code.usgs.gov. Nonauthoritative copies of a USGS scientific software release can also be shared on externally hosted platforms, but the associated DOI (refer to FAQ E.6.12) for these copies must point to the repository on the USGS hosting platform. Provisional software is subject to change or revision and may be released without Bureau approval in order to support collaborative software development with colleagues and partners from outside entities. Provisional software can be shared online through a USGS hosting platform; the hosting location for provisional software must be approved by a Science Center Director.

E.6.7. How do FSP requirements for release of scientific software compare to those for release of scientific data?

USGS scientific software and data follow the same basic FSP requirements for review and approval and both are considered noninterpretive information. Both releases are subject to the requirements of the 2013 OSTP directive on increasing access to the results of federally funded scientific research. Similar to FSP requirements for data release products, the FSP requirements for scientific software release products include documentation as described in IM OSQI 2019-01; review and Science Center Director approval tracked in the IPDS before public release; and ultimate release from a USGS-approved hosting platform. A software release, like a data release, can be a stand-alone product or can be released separately in association with an interpretive product, such as a USGS publication series product or a journal article. Data and software each have unique review requirements. A software release requires a code review and a domain review, whereas a data release requires a data review and a metadata review.

E.6.8. When does a software release product require a version number?

Because software may be further developed and subsequently updated after the original version is released, all revised software should be released with a new version number. It is recommended that all releases follow a version numbering scheme. Although the approach to version numbering can vary, revised software releases can be grouped into three general categories: major revision, minor revision, and patch.

  • Major Revision: This is a numbered software release that includes significant code changes. The changes may: render the software incompatible with previous release dependencies such as operating systems or code reference libraries; include changes to the underlying development framework (for example, Java version or Spring Framework version); result in output that differs from or is incompatible with output from prior versions; or indicate a substantial deviation from a previous major release through a combination of minor releases. Major releases require a new approval record in the IPDS.
  • Minor Revision: This is a numbered software release that includes less significant code changes, such as new features, bug fixes, or other minor changes, but the software largely (or wholly) remains backward-compatible with the previous major software release.
  • Patch: This is a numbered software release that includes backward-compatible bug fixes only.
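The three revision categories above can be sketched as a small helper that computes the next version number. The FAQ does not prescribe a numbering format; the MAJOR.MINOR.PATCH layout used here is an assumption borrowed from the common semantic versioning convention, and the function name `bump_version` is hypothetical.

```python
def bump_version(version, revision):
    """Return the next MAJOR.MINOR.PATCH string for the given revision category.

    Assumes the version is written as three dot-separated integers,
    which is a convention chosen for this example only.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if revision == "major":
        # Significant or incompatible changes: reset minor and patch.
        return f"{major + 1}.0.0"
    if revision == "minor":
        # Largely backward-compatible changes: reset patch.
        return f"{major}.{minor + 1}.0"
    if revision == "patch":
        # Backward-compatible bug fixes only.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown revision category: {revision!r}")
```

For example, under this assumed scheme a patch to version 1.4.2 yields 1.4.3, a minor revision yields 1.5.0, and a major revision yields 2.0.0.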

E.6.9. Can I contribute code to open source projects, and if so, how should I identify and document my contributions?

Yes. Software, as with other USGS authored or produced data and information that are released publicly, is in the public domain. During software development by USGS employees, including development of scientific software codes, some aspects of the software may be partially or wholly owned by a non-Federal entity such as a university collaborator. In those cases, the other entity may have the ability and desire to maintain intellectual property rights for the software project and could elect to apply a license, including an open source license that restricts public domain access. In these cases, the USGS employees may choose to structure their contributions so that specific components, such as a statistical algorithm, are available in the public domain. It is a best practice, when possible, for USGS software authors and developers to document their contributions in the codebase of the open-source project as those of a Federal Government employee. Commonly, this can be handled effectively through a code management tool such as Git, in which USGS contributions will be made via pull requests from a USGS repository/branch and can include USGS documentation. Conversely, the use of any copyrighted code in USGS software products requires written permission from the copyright holder and a statement indicating the copyrighted material is used with permission (refer to SM 1100.6).

E.6.10. My software release repository requires an open-source license. What can I use? What do I need to know about licenses?

The software developer needs to have an understanding of the repository’s open source licensing requirements; however, the following information is useful to consider.

  • When an official license declaration is required or appropriate to include, the Creative Commons CC0 license may be used (currently CC0 1.0). This assumes the software is either completely original or includes software that also uses the CC0 license.
  • "MIT/X11" is an option when governing jurisdictions (such as those outside the United States) do not recognize the public domain dedication.
  • Another option is the Unlicense (unlicense.org).

E.6.11. What is required to comply with Federal Source Code Policy?

To comply with the 2016 Federal Source Code Policy (as described in OMB M-16-21) and USGS FSP requirements, conduct a strategic analysis of mission goals and consider existing open, mixed, and proprietary software solutions free of preconceived preferences prior to starting a new software project. Custom developed code may be considered only if existing solutions do not adequately satisfy USGS needs or the purpose of science and innovation.

With limited exceptions, all source code associated with USGS software releases must be made available at minimum for Federal Government-wide reuse. Source code for USGS software releases must:

  • Include an appropriate open-source license. For contract work, agencies must secure re-use rights sufficient for Federal Government-wide reuse at minimum.
  • Be included in the USGS source code inventory. This requires the source code to be hosted on one of the Bureau-wide source-code repositories and include a code.json snippet.
  • Be accompanied by documentation and other supporting materials sufficient to facilitate reuse and participation by third parties.
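As a rough illustration of the inventory requirement above, the following sketch assembles a metadata record and serializes it as JSON text of the kind a code.json file might hold. The field names follow the general shape of the code.gov metadata schema but are assumptions here, as are the function name `build_inventory_entry`, the project name, the repository URL, and the choice to wrap the record in a list. The fields actually required for the USGS source code inventory should be taken from the USGS Software Management website.

```python
import json


def build_inventory_entry(name, version, repository_url, description):
    """Assemble one hypothetical inventory record as a plain dictionary.

    Field names are illustrative assumptions, not the authoritative
    USGS-required set.
    """
    return {
        "name": name,
        "version": version,
        "description": description,
        "repositoryURL": repository_url,
        "status": "Production",
        "permissions": {
            "usageType": "openSource",
            "licenses": [{"name": "CC0-1.0"}],
        },
    }


entry = build_inventory_entry(
    name="example-model",  # hypothetical project name
    version="1.0.0",
    repository_url="https://code.usgs.gov/example-group/example-model",  # hypothetical URL
    description="Illustrative entry only.",
)

# Serialize the record; wrapping it in a list is an assumption for this sketch.
code_json_text = json.dumps([entry], indent=2)
```

Generating the file from a single function, rather than editing it by hand, is one way to keep the version and license fields consistent across releases.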

E.6.12. Do software releases need a DOI?

Yes. Approved USGS software releases must be assigned a DOI with the 10.5066 prefix once approved for release. Software release products should be assigned a meaningful resource type in the DOI metadata such as software, model, workflow, or another appropriate term. DOIs can be obtained through the USGS Digital Object Identifier (DOI) Creation Tool. USGS scientists and software developers may employ conventions for indicating discrete versions of software that should be individually cited by assigning separate DOIs for each major version, indicating the version number in the title, and ensuring that the DOI points to an appropriate repository on a “.gov” server (such as USGS OpenSource GitLab) or an access point for the specific version being cited. The USGS inventories official software products cataloged in the DOI registration system, so it is important to ensure that the DOI metadata appropriately describe the software product.
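A simple check of the prefix requirement can be sketched as follows. The function name `has_usgs_prefix` and the example suffixes are hypothetical, and the check covers only the 10.5066 prefix stated above; it does not confirm that a DOI is registered or that its metadata are appropriate.

```python
USGS_DOI_PREFIX = "10.5066"


def has_usgs_prefix(doi):
    """Return True if the DOI (bare, 'doi:' form, or doi.org URL) uses 10.5066."""
    bare = doi.strip()
    # Strip a leading resolver URL or 'doi:' label if one is present.
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if bare.lower().startswith(lead):
            bare = bare[len(lead):]
            break
    # A DOI has the form prefix/suffix; both parts must be present.
    prefix, sep, suffix = bare.partition("/")
    return prefix == USGS_DOI_PREFIX and sep == "/" and suffix != ""
```

For example, `has_usgs_prefix("10.5066/EXAMPLE")` returns `True`, whereas a DOI with a different prefix, or a bare prefix with no suffix, returns `False`.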

E.6.13. What are some reasons to not use a public repository for developing software?

In general, the open sharing of ideas and code development for both data management and analytical use is becoming the norm across the scientific community. This rationale is supported in the USGS software release IM policy through the concepts of provisional software release, including the use of publicly available repositories (IM OSQI 2019-01). However, all cloud services used in the Department of the Interior require FedRAMP authorization. The Department of the Interior or USGS can discontinue use of any service that does not meet Federal requirements. In addition, the scientific software IM and related USGS policies contain strict requirements for release. Sharing software in public repositories without adhering to these policy requirements may result in one or more of the following violations:

  • Release of unreviewed scientific interpretive information may violate USGS FSP requirements.
  • Release of scientific results through an informal venue may obviate the ability to publish that material through some external publications (such as peer-reviewed journals).
  • Informal release of information removes the Bureau’s ability to later withhold pre-decisional material sought under Freedom of Information Act requests.
  • Release of security sensitive information or personally identifiable information would violate the USGS Privacy Act Program.

To avoid these and potentially other violations, the software developer may use an internal code repository, such as the internal USGS Git hosting platform, USGS OpenSource GitLab, or USGS InnerSource GitLab, which can be restricted to a specific group of collaborators.

E.6.14. What is the distinction in software release requirements between scripts used to prepare data for analysis or graphical visualizations and models or other packages of source code?

The requirements described in IM OSQI 2019-01 do not address every case where USGS scientific software and related code is produced and released. Rather, the IM policy lays out a set of principles and practices that should be followed in all cases, and additional guidance is provided in these FAQs and other documentation to supplement the IM requirements. It is the author's responsibility to work together with their Science Center Director to determine if a set of source code is appropriate for release as a USGS scientific software release or if the algorithms or statistical methods would alternatively be more appropriate for inclusion in an associated manuscript or the metadata of a data release.

E.6.15. What hosting platforms are available for releasing USGS scientific software?

The authoritative (or reference) copy of an approved USGS scientific software release must be distributed through a “.gov” server (such as USGS OpenSource GitLab) to comply with USGS open access requirements. The USGS software release citation DOI must link to the authoritative USGS source repository. USGS scientific software that is considered provisional may be made available from a public repository with Science Center Director approval if the requirements for safeguarding unpublished USGS information prior to release (SM 502.5) are met. Other services may be used for distributing software if the USGS software release requirements are met. These types of public distribution outlets can make USGS scientific software code more accessible and immediately deployable for use.

E.6.16. Where can I find additional information about managing and releasing USGS software?

Refer to the Software Management website for additional information, including best practices and guidance related to the management and release of USGS software.

 
