Knowledge Extraction Algorithms (KEA): Turning Literature Into Data

Identifying, extracting, and mobilizing information from current and historical literature is a time-consuming part of organizing and collating synthetic data products. This project explored the use of algorithm-based methods to identify and extract occurrence information from the GeoDeepDive (GDD) literature database to support upkeep of the Nonindigenous Aquatic Species (NAS) Database. The GeoDeepDive API was extended to support queries on terms from the Integrated Taxonomic Information System (ITIS). This functionality helped identify literature that mentions or focuses on species tracked by the NAS Database. These methods were paired with algorithms that extract location information associated with each term mention. Efforts to improve these algorithms and the overall workflow are ongoing.
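The idea of pairing a taxonomic term mention with nearby location information can be illustrated with a minimal sketch. This is not the project's algorithm and does not call the GeoDeepDive API. It assumes a simple rule (a species term and a place name count as associated when they occur in the same sentence), and the function name, the sample text, and the short place-name list are all hypothetical, made up for illustration.

```python
import re


def pair_terms_with_locations(text, species_terms, place_names):
    """Return (species, place, sentence) triples for same-sentence co-occurrences.

    Hypothetical helper: the same-sentence rule is an assumed heuristic,
    not the method used by the project.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    # Abbreviations such as "St." would be split incorrectly.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    def mentioned(term, sentence):
        # Case-insensitive, whole-word match of the literal term.
        pattern = r"\b" + re.escape(term) + r"\b"
        return re.search(pattern, sentence, re.IGNORECASE) is not None

    pairs = []
    for sentence in sentences:
        found_species = [t for t in species_terms if mentioned(t, sentence)]
        found_places = [p for p in place_names if mentioned(p, sentence)]
        for species in found_species:
            for place in found_places:
                pairs.append((species, place, sentence))
    return pairs


# Invented sample text, for illustration only.
sample = (
    "Zebra mussels (Dreissena polymorpha) were collected in Lake Erie. "
    "No nonindigenous mussels were reported from Lake Tahoe."
)
pairs = pair_terms_with_locations(
    sample,
    species_terms=["Dreissena polymorpha"],  # e.g. a name drawn from ITIS
    place_names=["Lake Erie", "Lake Tahoe"],  # stand-in for a gazetteer
)
for species, place, _ in pairs:
    print(species, "->", place)
```

A same-sentence rule is easy to reason about but will miss locations stated in neighboring sentences and can produce false pairings, which is one reason a production workflow would need more careful extraction and review.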