Grammar to graph—An approach for semantic transformation of annotations to triples

September 2, 2025

Data annotation is the process of labeling data to show the outcome that a related data model should predict. In this study, annotation data were transformed into semantic graph triples, mainly for use with the Resource Description Framework (RDF), a type of entity-relationship-attribute data model for graph databases. The transformation of annotation data to semantic graph triples provides complex linguistic meaning with data handling advantages such as reduced data storage needs, improved logical specification of relations between objects, and reusable classes and properties that support logic and inference. A grammar-based framework in graph form supports user questions and queries.

The words defining approximately 334 topographic feature types compiled by the U.S. Geological Survey were tokenized as units of analysis and grouped by part of speech. Their dependency relations were identified for this study using natural language processing libraries. Dependency concepts are used as structured semantic relations among part-of-speech classes. Tokens, units equivalent to words, form instances of classes and were quantified within a tabular output format using PostgreSQL data storage software. Table data were logically aligned as triples following a mapping file and stored with an ontology file using Ontop virtual triplestore software. A grammar ontology schema for the data was synchronized to match queries whose results validated the graph’s structure. The text analysis produced 8 part-of-speech classes of content words for object representations and 4 classes of function words for operational applications. Dependency relations formed 27 ontology properties for topographic subgraph structures. Token occurrences shaped overall ontology salience and formed a lexicon of syntactic terms for subgraph objects and properties. The schema ontology of class and property population shapes formed the lexicon of English terms. SPARQL Protocol and RDF Query Language (SPARQL) was used with the lexicon to conform data to RDF guidelines.
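The table-to-triple step described above can be illustrated with a minimal sketch in plain Python. It is not the report's implementation: the study used natural language processing libraries, PostgreSQL, and an Ontop mapping file, none of which appear here. The namespace, the sample phrase "a small stream", its hand-written annotations, the edge direction (head token to dependent token), and the helper names `iri`, `rows_to_triples`, and `match` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical namespace; the report's actual IRIs are not given in this abstract.
BASE = "http://example.org/grammar#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"

# Hand-annotated rows for an invented phrase, "a small stream", standing in
# for the tabular output of tokenization and dependency parsing.
# Columns: (token_id, token, part_of_speech, head_id, dependency_relation)
ROWS = [
    (1, "a", "DET", 3, "det"),
    (2, "small", "ADJ", 3, "amod"),
    (3, "stream", "NOUN", 0, "root"),  # head_id 0 marks the root token
]


def iri(local_name):
    """Wrap a local name in the hypothetical namespace."""
    return f"<{BASE}{local_name}>"


def rows_to_triples(rows):
    """Map each table row to (subject, predicate, object) triples.

    Each token becomes an instance of its part-of-speech class, and each
    non-root dependency becomes a property from the head to the dependent.
    """
    triples = []
    for token_id, token, pos, head_id, dep in rows:
        subject = iri(f"token{token_id}")
        triples.append((subject, RDF_TYPE, iri(pos)))
        triples.append((subject, RDFS_LABEL, f'"{token}"'))
        if head_id != 0:
            triples.append((iri(f"token{head_id}"), iri(dep), subject))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


triples = rows_to_triples(ROWS)

# Count token instances per part-of-speech class.
class_counts = Counter(o for _, _, o in match(triples, p=RDF_TYPE))

# Find what the token "stream" is modified by through the amod relation.
modifiers = match(triples, s=iri("token3"), p=iri("amod"))

for s, p, o in triples:
    print(f"{s} {p} {o} .")
```

The `match` helper plays the role of a single triple pattern in a SPARQL query; a real deployment would run SPARQL against the triplestore rather than filter a Python list.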

This study confirms the hypothesis that although linguistic logic differs from description logic, its approximation applies to ontology design. Property and query use case patterns extracted from the analysis support queries concerning complex topographic relations and patterns normally embedded within text definitions. The method used in this study could be applied to text forms in other domains, such as survey notes.

Publication Year: 2025
Title: Grammar to graph—An approach for semantic transformation of annotations to triples
DOI: 10.3133/sir20255064
Authors: Dalia Varanka, Emily Abbott
Publication Type: Report
Publication Subtype: USGS Numbered Series
Series Title: Scientific Investigations Report
Series Number: 2025-5064
Index ID: sir20255064
Record Source: USGS Publications Warehouse
USGS Organization: Center for Geospatial Information Science (CEGIS)