
USGS Data Management


[Figure: USGS Data Lifecycle Diagram (Plan; Acquire; Preserve; Publish/Share; Describe (Metadata and Documentation); Manage Quality; Backup and Secure)]

Data Templates

Creating data templates for data collection, data storage, and metadata saves time and increases consistency. Using form validation makes data entry more reliable.

Why Use a Data Template?

Key Points

  • Data templates standardize procedures and processes of a dataset that is handled by more than one person.
  • By applying error checking and controlled vocabularies, data templates help reduce the number of errors made during data collection.
  • Develop data templates before collecting your data.
  • Data entry templates:
    • specify the columns and rows of the data to be collected.
    • establish how zeros and missing data will be handled.
    • define the data type for every column or row.
    • include appropriate column and row header names.
    • include the units (for measured data) or datum (for geographic data).
    • use predefined formats for geographic coordinate values.
    • define any codes and acronyms.
    • can reduce human error by establishing strict data entry rules (e.g. forms).
  • Before assigning new codes to data, check whether standard codes have already been established.
  • To check data after they have been entered, have a second person reenter the data (and compare the result against the first data entry), use a software program to read back the data entered, or do both.
  • Plot or graph data to check for unexpected outliers.
  • Data storage and metadata templates also help standardize procedures for handling data after data entry.

Before data collection begins, develop a plan for each step of the data collection, data entry, and data storage process. More than one person is usually involved in each of these three activities, and human error can reduce the quality of the data. Templates allow these processes to be controlled and standardized.

Templates During Data Entry

Before data collection, it is vital to plan how the data will be recorded in the template. This includes ensuring that the template is consistent, has clearly defined headers, rows, and terms, and has mechanisms to reduce human error. These mechanisms are typically called forms within a data processing program such as Microsoft Excel, Microsoft Access, Google Forms, or OpenOffice Calc (see "Tools" below for more information).

Forms can be created in any of these programs so that a field accepts only a certain data type. If a value of the wrong type is entered, the form rejects it. Forms can also constrain a field to a pull-down menu that uses a controlled vocabulary, which ensures that only permitted values are entered and that they go in the correct locations.
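
The behavior described above (type checking plus a controlled vocabulary) can be sketched outside a spreadsheet program. The following Python example is only an illustration under stated assumptions: the field names, the species codes, and the `validate_record` helper are hypothetical and are not part of any USGS tool or of the form features in the programs listed above.

```python
from datetime import date

# Hypothetical template: field name -> (expected type, allowed values or None).
# The species codes are made-up placeholders, not an established standard.
TEMPLATE = {
    "site_id": (str, None),
    "sample_date": (date, None),
    "species_code": (str, {"SP_A", "SP_B", "SP_C"}),  # controlled vocabulary
    "height_cm": (float, None),
}


def validate_record(record, template=TEMPLATE):
    """Return a list of error messages; an empty list means the record is accepted."""
    errors = []
    for field, (expected_type, allowed) in template.items():
        if field not in record:
            errors.append(f"{field}: missing field")
            continue
        value = record[field]
        if not isinstance(value, expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        elif allowed is not None and value not in allowed:
            errors.append(f"{field}: {value!r} is not in the controlled vocabulary")
    # Reject fields that the template does not define.
    for field in record:
        if field not in template:
            errors.append(f"{field}: unexpected field")
    return errors


good_record = {
    "site_id": "A1",
    "sample_date": date(2014, 4, 8),
    "species_code": "SP_A",
    "height_cm": 12.5,
}
# Wrong data type for height_cm and a code outside the vocabulary.
bad_record = dict(good_record, height_cm="tall", species_code="OAK")

print(validate_record(good_record))
print(validate_record(bad_record))
```

A record is accepted only when the returned list is empty, which mirrors a form that refuses an entry until every field passes its rule.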

When creating the form, give special consideration to how the columns and headers are labeled. Good labels are long enough to describe the data in the cells below or beside them, but short enough to be read easily by both humans and machines.

Avoid using spaces or special characters (like * or +) in the name, as they may cause issues when the data are loaded into a database.
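
A quick check for problem names can be automated. The sketch below assumes one possible naming convention (a leading letter followed by letters, digits, or underscores); the pattern, the helper names, and the sample headers are illustrative assumptions rather than a USGS rule.

```python
import re

# Assumed convention: start with a letter, then letters, digits, or underscores.
VALID_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def problem_headers(headers):
    """Return the headers that contain spaces or special characters."""
    return [h for h in headers if not VALID_NAME.match(h)]


def suggest_name(header):
    """Suggest a database-friendly name by replacing runs of other characters with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", header.strip()).strip("_")
    return cleaned.lower()


headers = ["site_id", "Water Temp*", "depth+offset"]
for name in problem_headers(headers):
    print(f"{name!r} -> {suggest_name(name)!r}")
```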

When entering data that has a unit associated with it, be sure to include that unit in the header (such as "Height (cm)"). Also, include the datum for any geographical data (such as "NAVD88").

After Data Entry

After data have been collected, there are several ways to check the data entry. Either while the data are first being entered or afterwards, consider having a second person reenter the data, then compare the two versions to make sure they match. Use a program to read the data back so that the values entered into the system can be checked against the original dataset. Check the beginning and end of the dataset for errors, and then randomly spot-check other values throughout the form. Lastly, consider graphing or mapping the data to ensure there are no unexpected outliers.
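
The double-entry check described above can be sketched as a comparison of two independently keyed copies of the same table. The `compare_entries` function and the sample rows below are hypothetical; the sketch assumes both copies list the rows in the same order.

```python
def compare_entries(first, second):
    """Compare two independently entered copies of the same table.

    Each copy is a list of dicts (one dict per row). Returns a list of
    (row_index, column, first_value, second_value) tuples, one per mismatch.
    """
    mismatches = []
    if len(first) != len(second):
        mismatches.append((None, "row_count", len(first), len(second)))
    for index, (row_a, row_b) in enumerate(zip(first, second)):
        for column in sorted(set(row_a) | set(row_b)):
            if row_a.get(column) != row_b.get(column):
                mismatches.append(
                    (index, column, row_a.get(column), row_b.get(column))
                )
    return mismatches


first_entry = [
    {"site_id": "A1", "height_cm": "12.5"},
    {"site_id": "A2", "height_cm": "9.8"},
]
second_entry = [
    {"site_id": "A1", "height_cm": "12.5"},
    {"site_id": "A2", "height_cm": "98"},  # decimal point dropped by the second typist
]

for mismatch in compare_entries(first_entry, second_entry):
    print(mismatch)
```

Each mismatch points to a cell that should be checked against the original field sheet, since the comparison alone cannot say which copy is correct.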

When a template is used to store data, spreadsheet programs can help process and visualize the data. However, some of these operations can cause problems. For example, a spreadsheet lets the user select a single column and sort it from A to Z or from Z to A. Unless the option to include all other columns is selected, only that column is sorted and the others stay in their original order, so the connections between values in the same row are lost.
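
The same pitfall exists when data are processed in code. The short Python sketch below, using made-up rows, contrasts sorting one column on its own with sorting whole rows by a key column so that each measurement stays attached to its site.

```python
from operator import itemgetter

# Made-up example rows; each dict keeps the values of one row together.
rows = [
    {"site_id": "C3", "height_cm": 14.0},
    {"site_id": "A1", "height_cm": 12.5},
    {"site_id": "B2", "height_cm": 9.8},
]

# Risky: sorting one column by itself detaches it from the other columns.
site_ids_only = sorted(row["site_id"] for row in rows)
heights_unsorted = [row["height_cm"] for row in rows]
broken_pairs = list(zip(site_ids_only, heights_unsorted))

# Safer: sort whole rows by the key column so every row stays intact.
sorted_rows = sorted(rows, key=itemgetter("site_id"))

print(broken_pairs)
print(sorted_rows)
```

In `broken_pairs`, site A1 is paired with the height that was measured at C3, which is exactly the loss of connection described above.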

Data Storage and Metadata

Templates for data storage and metadata are also useful for standardizing procedures. Templates for data storage specify how to name files and folders consistently and stipulate where the data should be kept, either for short- or long-term storage.
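
One way to apply a file-naming template consistently is to generate names from a single function instead of typing them by hand. The pattern below (project, site, collection date, version) is an assumed example convention, and `build_file_name` is a hypothetical helper, not a USGS requirement.

```python
from datetime import date


def build_file_name(project, site_id, collected, version, extension="csv"):
    """Build a name following the assumed pattern project_site_YYYYMMDD_vNN.ext."""
    parts = [
        project.lower(),
        site_id.lower(),
        collected.strftime("%Y%m%d"),  # ISO-style date sorts chronologically
        f"v{version:02d}",
    ]
    return "_".join(parts) + "." + extension


example_name = build_file_name("streamtemp", "A1", date(2014, 4, 8), 1)
print(example_name)
```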

Similarly, metadata should have a standard template. As notes and information about the dataset are added over time and by various people on the team, creating a standard metadata template that specifies how this information is entered will keep the metadata organized and easily readable.
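
A metadata template can be as simple as a fixed list of fields that every team member fills in the same way. The field names below are illustrative assumptions and do not represent a formal metadata standard; the helper functions are hypothetical.

```python
# Hypothetical field list; a real project would follow its chosen metadata standard.
METADATA_FIELDS = (
    "title",
    "abstract",
    "contact",
    "date_collected",
    "units",
    "datum",
)


def new_metadata_record():
    """Return an empty record containing every field in the template."""
    record = {field: "" for field in METADATA_FIELDS}
    record["notes"] = []  # dated notes added over time by team members
    return record


def missing_metadata(record):
    """Return the template fields that are still empty."""
    return [field for field in METADATA_FIELDS if not record.get(field)]


record = new_metadata_record()
record["title"] = "Example stream temperature survey"
record["units"] = "degrees Celsius"
record["notes"].append("2014-04-08: initial entry")

print(missing_metadata(record))
```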

Best Practices

  • Data Templates
    • Use forms when entering data so that some QA/QC is automatically taken care of during data entry.
    • Consistency
      • Be sure that each column of data contains only one data type (numbers, dates, or text) and the same kind of information throughout.
      • Names, codes, and formats must be consistent.
      • Dates should all use the same format, and geographic data should all use the same format and datum.
    • Efficient organization - Data that are organized efficiently can be read by statistical packages.
    • Descriptive names
      • Column names should be descriptive and should not contain special characters.
      • The file name should be descriptive and concise.
    • Standardize how missing data are represented. Decide how you are going to handle missing/null values before entering data. If you use a specific value to represent missing data (e.g., 9999), be sure there is no chance that the value will later be needed to represent actual data.
      • Use a separate column to track and describe missing data across the rows.
    • Create forms (e.g., in Google Forms, Microsoft Excel, Microsoft Access, or OpenOffice Calc) to control how the data are entered.
      • Only allow certain data types (e.g., numbers, letters) to be entered in designated columns so that if the data type is entered incorrectly, the form will reject the entry.
    • Consider having more than one person enter the data. Then you can compare the entries to make sure they match.
    • Using a program that can read data back to you is useful as it allows you to check the data that have been entered into the system against the original dataset.
    • Check the beginning and end portions of your dataset for errors, and then randomly spot-check other values throughout the form.
    • Consider graphing or mapping your data to ensure there are no unexpected outliers.

  • Long-term Storage
    • Use file formats consistently, and prefer formats that will remain readable in the long term.
      • ASCII, Unicode, non-proprietary, unencrypted, uncompressed
      • Use comma-delimited ASCII files to represent tabular data
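
Two of the practices above, a separate column that describes missing data and comma-delimited text output, can be combined in a short sketch. It assumes that an empty cell is the agreed representation of a missing value; the `write_table` helper, the `missing_note` column name, and the sample rows are hypothetical.

```python
import csv
import io

MISSING = ""  # assumed convention: an empty cell means the value is missing


def write_table(rows, fieldnames, handle):
    """Write rows as comma-delimited text with an extra column describing gaps."""
    writer = csv.DictWriter(
        handle, fieldnames=list(fieldnames) + ["missing_note"], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        out = {
            name: MISSING if row.get(name) is None else row[name]
            for name in fieldnames
        }
        out["missing_note"] = row.get("missing_note", "")
        writer.writerow(out)


rows = [
    {"site_id": "A1", "height_cm": 12.5},
    {"site_id": "A2", "height_cm": None, "missing_note": "sensor failure"},
]

buffer = io.StringIO()  # stands in for a file opened in text mode
write_table(rows, ["site_id", "height_cm"], buffer)
print(buffer.getvalue())
```

To write to disk instead, the same function can be given a handle from `open(path, "w", newline="", encoding="ascii")`, which keeps the output as plain comma-delimited ASCII.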

Disclaimer: Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Tools

References

URL: http://origin-www.usgs.gov/datamanagement/plan/datatemplates.php
Page Last Modified: Tuesday, April 08, 2014