Models of high-dimensional environmental or ecological data


Generalized linear mixed models have become popular among scientists and applied statisticians for fitting models of environmental and ecological processes. However, these models are limited in their ability to address i) unexplained variation at cluster scales (e.g., organism or chemistry variables might be sampled at multiple locations in each of multiple lakes) and ii) multiple outcomes (e.g., multiple species, multiple dissolved nutrients or multiple chemical contaminants). Models that pertain to distributions of multiple species are often termed joint species distribution models. An intern with a statistical or mathematical interest could further develop existing multivariate models; an intern with a scientific interest could instead use already-developed multivariate models to tackle natural resource questions that are multivariate in nature. The intern could also employ machine learning models.

Project Hypothesis or Objectives:

The objective of this project may be theoretical or applied. A theoretical objective would be statistical and focus on elaborating current methods for making inferences or predictions from multivariate and moderately high-dimensional data, often consisting of regular and irregular time series. Such an approach would entail evaluating a proposed method using simulated data and an example environmental or ecological dataset. An applied or science-driven approach would focus on using recently developed computational methods that have seen little application to natural resource questions (e.g., joint species distribution models). A project could also have both theoretical and applied components. One example would be the development and/or use of models with counts from multiple fish species to estimate fish community associations with environmental predictors, or similar models with detections of multiple plant species. Another would be to elaborate current methods to address left censoring of multiple chemicals (e.g., nutrients in freshwater or PCBs in fish tissue).
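As a rough illustration of what left censoring involves, the sketch below computes a log-likelihood for a single chemical whose log-concentrations are assumed to be normally distributed. Detected values contribute a density term, and values reported only as below a detection limit contribute the probability of falling under that limit. The distributional assumption and the function names (`normal_logpdf`, `normal_logcdf`, `censored_loglik`) are hypothetical choices made for this example, not part of the project description, and the multivariate case the project targets would need a joint model across chemicals.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_logcdf(x, mu, sigma):
    """Log of the normal CDF at x (numerically naive in the far lower tail)."""
    z = (x - mu) / sigma
    # 0.5 * erfc(-z / sqrt(2)) equals the standard normal CDF at z.
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def censored_loglik(log_values, censored, mu, sigma):
    """Log-likelihood for left-censored log-concentrations (assumed normal).

    log_values: log-concentration if detected, otherwise the log detection limit.
    censored:   True where the observation is only known to be below the limit.
    """
    total = 0.0
    for y, is_censored in zip(log_values, censored):
        if is_censored:
            # Only know the true value is below y: use P(Y <= y).
            total += normal_logcdf(y, mu, sigma)
        else:
            # Fully observed value: use the density.
            total += normal_logpdf(y, mu, sigma)
    return total


# Hypothetical data: two detected values and one non-detect at log-limit -1.0.
example = censored_loglik([0.3, -0.2, -1.0], [False, False, True], mu=0.0, sigma=1.0)
```

Maximizing a function like this over `mu` and `sigma` would give estimates that use the non-detects, rather than discarding them or substituting an arbitrary value such as half the detection limit.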

Duration: Up to 12 months

Internship Location: La Crosse, WI (conceivably virtual)

Field(s) of Study: Life Science, Computing, Statistics, Data Science

Applicable NSF Division: Biological Sciences; Computer and Information Sciences; Geological Sciences; Mathematics, Physics, and Astronomical Sciences

Intern Type Preference: Any Type of Intern

Keywords: Biology/microbiology/biochemistry; Chemistry/geochemistry; Computer/Data science; Ecology/Ecosystems; Environmental Health; Hydrology; Modeling; Population Dynamics; Statistics

Expected Outcome:

Expected outcomes include exposure of the intern to natural resource science practice in the USGS; evaluation of methods for use with multivariate and clustered data (theoretical objective) or use of recently described methods to obtain inferences from multivariate natural resource data (applied objective); and one or more peer-reviewed publications.

Special skills/training Required:

Familiarity with statistical methods (including probability distributions and generalized linear models) and/or machine learning, and with analytical software (e.g., R or SAS). If a statistical approach will be taken, the intern will need to have completed graduate-level mathematical statistics courses. Modest familiarity with hydrology, ecology or inorganic chemistry is not required but would be helpful.


The intern will develop models appropriate for use with multivariate and clustered environmental or ecological data, or apply already-developed models to such data in a new ecological or environmental setting. Models will be evaluated using statistical or mathematical software and both simulated and measured data. The intern will have the opportunity to work in a natural resource setting, to interact with natural resource scientists, and to share findings with partner natural resource agencies and the scientific community, the latter via one or more peer-reviewed publications.