Core Science Analytics and Synthesis - Metadata
Increasingly, scientists are sharing data to help solve new scientific challenges. This data sharing, and ultimately the integration of datasets, cannot occur without a critical step in the lifecycle of data management: the creation of metadata. Metadata is defined as documentation that explains the data in a detailed and standardized way.
Understand Data: Consider the following scenario: Scientist A requests observational data from Scientist B. The data arrives in the form of a tabular spreadsheet containing hundreds of columns, each with a column header neatly labeled with a code. Further, the columns contain values recorded in a variety of units of measure.
Without an accompanying metadata record, Scientist A will spend valuable time trying to interpret this data and may not succeed. Scientist A will need to contact Scientist B to gather the information needed to reuse the dataset. In this scenario, Scientist B is available, but in many situations the scientist who created the data is not available to answer questions about it. The data then becomes useless because it cannot be interpreted.
However, a detailed metadata record accompanying the data immediately solves this problem. It contains the meaning of each code, the unit of measure used in each column, and the domain of values to expect. A metadata record also contains other valuable information about the dataset: why it was created; how, when, and where the data was gathered; whether there are any gaps in the data; what quality checks the data went through; what other sources were used to create the dataset; and how it should be cited. This record allows the data to be reused for purposes that may not have been foreseen when it was collected. In this way, metadata helps advance science.
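To make the scenario concrete, the following is a minimal sketch of how the attribute portion of a metadata record (code meaning, unit of measure, and expected domain) lets a data user interpret a coded column. The column codes, definitions, and value ranges are hypothetical, and the `ColumnMetadata` structure is an illustrative assumption; formal standards such as the FGDC Content Standard for Digital Geospatial Metadata define their own, richer structure.

```python
from dataclasses import dataclass


@dataclass
class ColumnMetadata:
    """Hypothetical entry describing one coded column header."""
    code: str         # the cryptic header as it appears in the spreadsheet
    definition: str   # what the column actually measures
    unit: str         # unit of measure used for every cell in the column
    min_value: float  # lower bound of the expected domain of values
    max_value: float  # upper bound of the expected domain of values


# Hypothetical data dictionary keyed by column code (illustrative values only).
DATA_DICTIONARY = {
    "WT_C": ColumnMetadata("WT_C", "Water temperature", "degrees Celsius", 0.0, 40.0),
    "DO_MG": ColumnMetadata("DO_MG", "Dissolved oxygen", "milligrams per liter", 0.0, 20.0),
}


def describe(code: str, value: float) -> str:
    """Interpret a single cell value using the metadata for its column."""
    meta = DATA_DICTIONARY.get(code)
    if meta is None:
        # Without metadata, the value is just an unlabeled number.
        return f"{code}: no metadata, value {value} cannot be interpreted"
    status = "within" if meta.min_value <= value <= meta.max_value else "outside"
    return (f"{meta.definition} = {value} {meta.unit} "
            f"({status} expected domain {meta.min_value} to {meta.max_value})")


print(describe("WT_C", 12.5))
print(describe("XQ7", 3.1))
```

The first call prints a readable description with its unit and a domain check; the second shows the situation Scientist A faces with a column that has no metadata entry.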
Avoid Data Duplication: Developing a dataset is a time-consuming and costly endeavor. Simply making metadata records available and discoverable allows scientists to determine what data already exist and avoid duplicating effort.
Share and Access Reliable Information: Metadata records allow scientists to share reliable information with ease and find out how to access it.
Evaluate Data: A metadata record allows a scientist to quickly determine if a dataset is appropriate for use in a project.
Reduce Workload: Creating a metadata record requires some work up front. However, when a data call comes in for data that a scientist created years earlier, the metadata record will provide details about that data that might otherwise have been long forgotten.
Make Data Transcend People and Time: A metadata record allows the data to remain usable after the data developer has moved on to other projects. It protects the investment in the data by providing the information needed to use it indefinitely.
Institutional Memory: Metadata creates institutional memory for organizations, allowing an organization to have accessible knowledge of all the work it has produced.
Metadata has value for data users, data developers, and organizations. No dataset should be considered complete without accompanying metadata. Data without metadata is useless.