Grammar to graph—An approach for semantic transformation of annotations to triples
Data annotation is the process of labeling data to show the outcome that a related data model should predict. In this study, annotation data were transformed into semantic graph triples, mainly for use with the Resource Description Framework (RDF), a type of entity-relationship-attribute data model for graph databases. The transformation of annotation data to semantic graph triples provides complex linguistic meaning with data handling advantages such as reduced data storage needs, improved logical specification of relations between objects, and reusable classes and properties that support logic and inference. A grammar-based framework in graph form supports user questions and queries.
The words defining approximately 334 topographic feature types compiled by the U.S. Geological Survey were tokenized as units of analysis and grouped by part of speech. Their dependency relations were identified using natural language processing libraries, and dependency concepts were used as structured semantic relations among part-of-speech classes. Tokens, units equivalent to words, formed instances of classes and were quantified within a tabular output format using PostgreSQL data storage software. Table data were logically aligned as triples following a mapping file and stored with an ontology file using Ontop virtual triplestore software. A grammar ontology schema for the data was synchronized to match queries whose results validated the graph’s structure.
The text analysis produced 8 part-of-speech classes of content words for object representations and 4 classes of function words for operational applications. Dependency relations formed 27 ontology properties for topographic subgraph structures. Token occurrences shaped overall ontology salience and formed a lexicon of syntactic terms for subgraph objects and properties. The schema ontology of class and property population shapes formed the lexicon of English terms. SPARQL Protocol and RDF Query Language (SPARQL) was used with the lexicon to conform data to RDF guidelines.
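The step from tokens and dependency relations to triples can be illustrated with a minimal Python sketch. It is not the report's implementation, which used natural language processing libraries, PostgreSQL, and an Ontop mapping file. Everything here is an assumption made for illustration: the `http://example.org/grammar#` namespace, the `Token` record, the made-up definition "A natural stream of water", its hand-assigned part-of-speech and dependency labels, and the choice to point each dependency property from the head token to its dependent.

```python
from dataclasses import dataclass

# Hypothetical namespace; the report's actual IRIs are not given in the abstract.
BASE = "http://example.org/grammar#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Token:
    index: int  # position of the token in the definition
    text: str   # surface form
    pos: str    # part-of-speech class (hand-assigned here)
    dep: str    # dependency relation to the head (hand-assigned here)
    head: int   # index of the governing token; the root points to itself


def token_iri(feature: str, tok: Token) -> str:
    """Build an IRI for one token instance of a feature-type definition."""
    return f"{BASE}{feature}_{tok.index}_{tok.text.lower()}"


def tokens_to_triples(feature: str, tokens: list[Token]) -> list[tuple[str, str, str]]:
    """Emit one class-membership triple per token and one
    head --dependency--> dependent triple per non-root token."""
    by_index = {t.index: t for t in tokens}
    triples = []
    for tok in tokens:
        subject = token_iri(feature, tok)
        # The token is an instance of its part-of-speech class.
        triples.append((subject, RDF_TYPE, BASE + tok.pos))
        # The dependency relation becomes a property between two tokens.
        if tok.head != tok.index:
            head = by_index[tok.head]
            triples.append((token_iri(feature, head), BASE + tok.dep, subject))
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize IRI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Illustrative definition, not taken from the USGS feature-type list.
EXAMPLE = [
    Token(0, "A", "DET", "det", 2),
    Token(1, "natural", "ADJ", "amod", 2),
    Token(2, "stream", "NOUN", "ROOT", 2),
    Token(3, "of", "ADP", "prep", 2),
    Token(4, "water", "NOUN", "pobj", 3),
]

triples = tokens_to_triples("stream", EXAMPLE)
print(to_ntriples(triples))
```

In this sketch the part-of-speech labels play the role of ontology classes and the dependency labels play the role of ontology properties. The class and property sets reported in the study may be organized differently.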
This study confirms the hypothesis that although linguistic logic varies from description logic, its approximation applies to ontology design. Property and query use case patterns extracted from the analysis support queries concerning complex topographic relations and patterns normally embedded within text definitions. The method used in this study could be applied to text forms in other domains, such as survey notes.
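To show the kind of query pattern that a grammar-based graph can support, the following Python sketch matches triple patterns over a small, hand-written set of triples. It stands in for a SPARQL basic graph pattern and is not the study's query set. The prefixed names (`ex:stream`, `ex:prep`, and so on), the example triples, and the helper functions `match` and `prepositional_objects` are all hypothetical.

```python
# Hand-written triples for an illustrative definition ("a natural stream of water").
# All prefixed names are hypothetical placeholders, not terms from the report.
TRIPLES = {
    ("ex:stream", "rdf:type", "ex:NOUN"),
    ("ex:natural", "rdf:type", "ex:ADJ"),
    ("ex:of", "rdf:type", "ex:ADP"),
    ("ex:water", "rdf:type", "ex:NOUN"),
    ("ex:stream", "ex:amod", "ex:natural"),
    ("ex:stream", "ex:prep", "ex:of"),
    ("ex:of", "ex:pobj", "ex:water"),
}


def match(triples, s=None, p=None, o=None):
    """Return the sorted triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def prepositional_objects(triples, head):
    """Two-hop pattern: head -ex:prep-> adposition -ex:pobj-> object."""
    results = []
    for _, _, adposition in match(triples, s=head, p="ex:prep"):
        for _, _, obj in match(triples, s=adposition, p="ex:pobj"):
            results.append((adposition, obj))
    return results


# Which tokens relate to "stream" through a prepositional phrase?
print(prepositional_objects(TRIPLES, "ex:stream"))
```

The two-hop pattern recovers a relation ("stream ... of water") that would otherwise stay embedded in the text of the definition. This is the type of relation the abstract describes as accessible through property and query patterns.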
Citation Information
| Field | Value |
|---|---|
| Publication Year | 2025 |
| Title | Grammar to graph—An approach for semantic transformation of annotations to triples |
| DOI | 10.3133/sir20255064 |
| Authors | Dalia Varanka, Emily Abbott |
| Publication Type | Report |
| Publication Subtype | USGS Numbered Series |
| Series Title | Scientific Investigations Report |
| Series Number | 2025-5064 |
| Index ID | sir20255064 |
| Record Source | USGS Publications Warehouse |
| USGS Organization | Center for Geospatial Information Science (CEGIS) |