A practical guide to understanding and validating complex models using data simulations

November 18, 2022
  1. Biologists routinely fit novel and complex statistical models to push the limits of our understanding. Examples include, but are not limited to, flexible Bayesian approaches (e.g. BUGS, Stan), frequentist and likelihood-based approaches (e.g. the lme4 package) and machine learning methods.
  2. These software tools afford the user greater control and flexibility in tailoring complex hierarchical models. However, this control and flexibility place a higher degree of responsibility on the user to evaluate the robustness of their statistical inference. To determine how often biologists run model diagnostics on hierarchical models, we reviewed 50 papers published in 2021 in the journal Nature Ecology & Evolution. We found that the majority did not report any validation of their hierarchical models, making it difficult for the reader to assess the robustness of their inference. This lack of reporting likely stems from a lack of standardized guidance on best practices and methods.
  3. Here, we provide a guide to understanding and validating complex models using data simulations. To determine how often biologists use data simulation techniques, we also reviewed 50 papers published in 2021 in the journal Methods in Ecology and Evolution. We found that 78% of the papers that proposed a new estimation technique, package or model used simulations or generated data in some capacity (18 of 23 papers), but very few of those papers (5 of 23 papers) included either a demonstration that the code could recover realistic estimates for a dataset with known parameters or a demonstration of the statistical properties of the approach. To distil the variety of simulation techniques and their uses, we provide a taxonomy of simulation studies based on the intended inference. We also encourage authors to include a basic validation study whenever novel statistical models are used, which, in general, is easy to implement.
  4. Simulating data helps a researcher gain a deeper understanding of the models and their assumptions and establish the reliability of their estimation approaches. Wider adoption of data simulations by biologists can improve statistical inference, reliability and open science practices.
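
The basic validation study described in point 3 can be sketched in a few lines. The example below is not taken from the article: it is a minimal illustration that assumes a simple linear regression as a stand-in for a more complex model, and the function names (`simulate_dataset`, `fit_slope`, `validation_study`) and parameter values are hypothetical. It simulates many datasets with a known slope, fits each one, and reports two statistical properties of the estimator: bias, and how often an approximate 95% interval contains the true value.

```python
import math
import random
import statistics


def simulate_dataset(rng, n, intercept, slope, sigma):
    """Simulate one dataset from y = intercept + slope * x + Normal(0, sigma)."""
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    y = [intercept + slope * xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y


def fit_slope(x, y):
    """Return the ordinary least squares slope estimate and its standard error."""
    n = len(x)
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope_hat = sxy / sxx
    intercept_hat = y_bar - slope_hat * x_bar
    # Residual sum of squares, used to estimate the error variance.
    rss = sum((yi - intercept_hat - slope_hat * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope_hat, se


def validation_study(n_sims=500, n=200, intercept=1.0, slope=0.5, sigma=2.0, seed=1):
    """Repeat simulate-then-fit and summarise bias and interval coverage.

    All parameter values are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    # Normal quantile for an approximate 95% interval (reasonable for large n).
    z = statistics.NormalDist().inv_cdf(0.975)
    estimates = []
    covered = 0
    for _ in range(n_sims):
        x, y = simulate_dataset(rng, n, intercept, slope, sigma)
        slope_hat, se = fit_slope(x, y)
        estimates.append(slope_hat)
        if slope_hat - z * se <= slope <= slope_hat + z * se:
            covered += 1
    bias = statistics.fmean(estimates) - slope
    coverage = covered / n_sims
    return bias, coverage


bias, coverage = validation_study()
print(f"bias = {bias:.4f}, 95% interval coverage = {coverage:.3f}")
```

If the fitting code is correct, the bias should be close to zero and the coverage close to the nominal 95%. The same simulate, fit, and compare loop applies in principle to a hierarchical model, with the data-generating function and the fitting call replaced by those of the model being validated.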
Publication Year 2023
Title A practical guide to understanding and validating complex models using data simulations
DOI 10.1111/2041-210X.14030
Authors Graziella Vittoria Direnzo, Ephraim Hanks, David A. W. Miller
Publication Type Article
Publication Subtype Journal Article
Series Title Methods in Ecology and Evolution
Index ID 70250311
Record Source USGS Publications Warehouse
USGS Organization Coop Res Unit Leetown; Patuxent Wildlife Research Center