AI-driven science synthesis tools for resource managers
The scientific literature is abundant on drought-related topics in the western United States, including areas of the Colorado River Basin and the sagebrush biome. Scientists, resource managers, and decision makers use this science to assess, predict, and respond to the effects of drought on people and the environment. However, it can be difficult to quickly synthesize large amounts of research. To assist resource managers in streamlining management decisions, we are evaluating the potential applications of tools such as large language models (LLMs) and artificial intelligence (AI) in science synthesis and literature review processes.
The western U.S. is experiencing intensified droughts that cost the nation billions of dollars in agricultural losses and infrastructure damage, and that threaten public safety and health through worsening wildfire seasons and reduced water quality.
Decades of scientific research on drought in the West can help predict and mitigate future consequences, but it can be difficult for resource managers to locate and quickly synthesize large volumes of research.
Objectives
We aim to help resource managers streamline decision-making by developing efficient, reliable tools for synthesizing scientific evidence. This includes new software designed to read and process a comprehensive body of literature using various AI approaches, such as machine learning, deep learning, and generative AI. Specifically, we are using LLM-based technologies offered by proprietary AI platforms. The workflow includes quality assurance checks at every stage, relying on predefined benchmark goals and multi-stage assessments.
Our software is aimed at helping with the following tasks:
- Process abstracts and large bodies of text extracted from Portable Document Format (PDF) files into structured file formats so that content can be referenced accurately
- Identify geographies of study areas
- Rank the relevance of large volumes of literature to a decision-making context
- Identify knowledge gaps related to a decision-making context
- Evaluate relevant literature for patterns and themes
- Provide users with the flexibility to analyze results using additional criteria to select relevant literature
- Generate statistical metrics, using external data provided by users and results from the preceding tasks, to evaluate the precision, accuracy, and reliability of the AI outputs
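As a rough illustration of the last task, the sketch below compares AI relevance decisions with reviewer-provided decisions and reports precision and accuracy. It is not the project's actual code: the function name `screening_metrics`, the label format (paper identifier mapped to a relevant/not-relevant flag), and the example paper identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    precision: float  # share of AI "relevant" calls that reviewers also marked relevant
    accuracy: float   # share of all shared papers where AI and reviewers agree


def screening_metrics(ai_labels: dict[str, bool],
                      reviewer_labels: dict[str, bool]) -> ScreeningMetrics:
    """Compare AI relevance labels with reviewer labels for papers present in both."""
    shared = sorted(set(ai_labels) & set(reviewer_labels))
    if not shared:
        raise ValueError("no papers in common between the two label sets")

    true_pos = sum(ai_labels[p] and reviewer_labels[p] for p in shared)
    false_pos = sum(ai_labels[p] and not reviewer_labels[p] for p in shared)
    agreements = sum(ai_labels[p] == reviewer_labels[p] for p in shared)

    # Precision is undefined when the AI marks nothing relevant; report 0.0 in that case.
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else 0.0
    accuracy = agreements / len(shared)
    return ScreeningMetrics(precision=precision, accuracy=accuracy)


# Hypothetical labels: True means "relevant to the decision-making context".
ai = {"paper_a": True, "paper_b": True, "paper_c": False, "paper_d": False}
reviewer = {"paper_a": True, "paper_b": False, "paper_c": True, "paper_d": False}

metrics = screening_metrics(ai, reviewer)
print(f"precision={metrics.precision:.2f} accuracy={metrics.accuracy:.2f}")
```

Reliability is not covered here; one possible approach, again an assumption, would be to repeat the same screening run several times and measure how often the AI labels agree with each other.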
Research Implications
We are developing and testing this software in the context of drought impacts on ecosystem processes, an area with extensive research, particularly for sagebrush ecosystems in the Colorado River Basin. Results will inform management of drought in this region by synthesizing current knowledge on drought in these ecosystems and identifying critical hydrological thresholds associated with ecosystem change.
How are we using AI?
Artificial intelligence is the broad field concerned with building machines that can perform tasks that typically require human intelligence. Large language models (LLMs), machine learning, deep learning, and similar approaches are different types of AI.
LLMs generate text by predicting the most likely sequence of words based on patterns learned from extensive training data. Using retrieval-augmented generation (RAG), we supply relevant literature entries to the LLM in a secure environment and pose topical questions. The LLM then uses this augmented context to produce informed responses.
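The retrieval step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: it ranks entries with a simple bag-of-words cosine similarity (production systems more commonly use learned embeddings), the helper names `retrieve` and `build_prompt` are hypothetical, and the three literature entries are made-up placeholder sentences rather than text from real papers. The sketch stops at assembling the augmented prompt and does not call any model.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, entries: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k entries most similar to the question."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        entries,
        key=lambda eid: cosine(q_vec, Counter(tokenize(entries[eid]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, entries: dict[str, str], k: int = 2) -> str:
    """Assemble the augmented prompt: retrieved entries first, then the question."""
    context = "\n".join(f"[{eid}] {entries[eid]}" for eid in retrieve(question, entries, k))
    return (
        "Answer using only the literature entries below and cite entry ids.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Placeholder entries for illustration only; not drawn from real publications.
entries = {
    "entry_1": "Drought reduces sagebrush seedling survival in dry soils.",
    "entry_2": "Streamflow declines in the river basin during multi-year drought.",
    "entry_3": "Wildfire smoke affects air quality in urban areas.",
}
question = "How does drought affect sagebrush?"

prompt = build_prompt(question, entries, k=2)
print(prompt)
```

Only the retrieved entries reach the model, which keeps the prompt short and ties each response to identifiable sources that a reviewer can check.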
To enable scalable use of local AI models, our software will use high-performance computing resources provided by the USGS. Our software also uses an application programming interface (API) to access a cloud hosting solution authorized under the Federal Risk and Authorization Management Program (FedRAMP). FedRAMP provides a standardized approach to security assessment and authorization for cloud products and services used by federal agencies.