An important component in the fields of ecology and conservation biology is understanding the environmental
conditions and geographic areas that are suitable for a given species to inhabit. A common tool
for determining such areas is species distribution modeling, which uses computer algorithms to estimate
the spatial distribution of organisms. Most commonly, the correlative relationships between the organism
and environmental variables are the primary consideration. The data requirements for this type of
modeling consist of known presence and possibly absence locations of the species as well as the values
of environmental or climatic covariates thought to define the species' habitat suitability at these locations.
These covariate data are generally extracted from remotely sensed imagery, interpolated/gridded
historical climate data, or downscaled climate model output. Traditionally, ecologists and biologists
have constructed species distribution models using workflows and data that reside primarily on their
local workstations or networks. This workflow is becoming challenging as scientists increasingly try to
use these modeling techniques to inform management decisions under different climate change scenarios.
This challenge arises because remote sensing products, gridded historical climate data, and
downscaled climate models are not only increasing in spatial and temporal resolution but also
proliferating. In addition, any rigorous assessment of uncertainty requires a computationally intensive sensitivity analysis
accounting for various sources of uncertainty. The scientists fitting these models generally do not have
the background in computer science required to take advantage of recent advances in web-service-based
data acquisition, remote high-powered data processing, or scientific workflow systems. Ecological
modelers need a tractable platform that abstracts away the computational complexity
inherent in the burgeoning field of coupled climate and ecological response modeling.
In this paper we describe the computational challenges in species distribution modeling and solutions
based on scientific workflow systems. We focus on the Software for Assisted Habitat Modeling (SAHM), a
package within VisTrails, an open-source scientific workflow system.
|Title||Data management challenges in species distribution modeling|
|Authors||Colin Talbert, Marian Talbert, Jeffrey T. Morisette, David Koop|
|Publication Subtype||Journal Article|
|Series Title||Bulletin of the Technical Committee on Data Engineering|
|Record Source||USGS Publications Warehouse|
|USGS Organization||Fort Collins Science Center|