Missing data in ecology: Syntheses, clarifications, and considerations
In ecology and related sciences, missing data are common and occur in a variety of contexts. When missing data are not handled properly, subsequent statistical estimates tend to be biased and inefficient and to lack proper confidence interval coverage. Missing data are often grouped into three categories: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). We review each category and compare their benefits and drawbacks. We review several approaches to handling missing data, including complete case analysis, imputation, inverse probability weighting, and data augmentation. We clarify what types of variables should accompany imputation methods and how those variables are influenced by the analysis methods. Additionally, we discuss missing data that lack a formal basis for measurement and hence are fundamentally different from MCAR, MAR, and MNAR missing data. Throughout, we introduce concepts and numeric examples using both simulated data and data from the United States Environmental Protection Agency's 2016 National Wetland Condition Assessment. We conclude by providing five considerations for ecologists and other scientists handling missing data.
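The three missingness categories and the bias of complete case analysis can be illustrated with a small simulation. The sketch below is not taken from the article. The data-generating model (`y = 2 + x + noise`), the missingness probabilities, and the function name `simulate` are all assumptions chosen for illustration, and the inverse probability weights use the true missingness probabilities, which in practice would have to be estimated.

```python
import math
import random
from statistics import fmean


def sigmoid(t):
    """Logistic function, used here as a hypothetical missingness probability."""
    return 1.0 / (1.0 + math.exp(-t))


def simulate(n=20000, seed=1):
    """Compare estimates of mean(y) under MCAR, MAR, and MNAR missingness.

    Assumed model (illustrative only): x ~ N(0, 1), y = 2 + x + N(0, 1),
    so the true mean of y is 2. x is always observed; y may be missing.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [2.0 + x + rng.gauss(0.0, 1.0) for x in xs]

    # MCAR: y is missing with a constant probability of 0.5.
    mcar = [y for y in ys if rng.random() > 0.5]

    # MAR: the probability that y is missing depends only on the observed x.
    mar = [(x, y) for x, y in zip(xs, ys) if rng.random() > sigmoid(x)]

    # MNAR: the probability that y is missing depends on y itself.
    mnar = [y for y in ys if rng.random() > sigmoid(y - 2.0)]

    # Inverse probability weighting under MAR: weight each observed y by
    # 1 / P(observed | x). The true probabilities are used here (assumption).
    weights = [1.0 / (1.0 - sigmoid(x)) for x, _ in mar]
    ipw = sum(w * y for w, (_, y) in zip(weights, mar)) / sum(weights)

    return {
        "full": fmean(ys),                    # no missing data
        "mcar_cc": fmean(mcar),               # complete cases, MCAR
        "mar_cc": fmean(y for _, y in mar),   # complete cases, MAR
        "mar_ipw": ipw,                       # weighted complete cases, MAR
        "mnar_cc": fmean(mnar),               # complete cases, MNAR
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:8s} {value:.3f}")
```

In this setup the complete case mean should stay close to 2 under MCAR and fall below 2 under MAR and MNAR, because larger values of `y` are more likely to be missing. Weighting by the inverse of the observation probability should bring the MAR estimate back toward 2; no comparable correction is available for the MNAR case from `x` alone.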
Citation Information
| Field | Value |
|---|---|
| Publication Year | 2025 |
| Title | Missing data in ecology: Syntheses, clarifications, and considerations |
| DOI | 10.1002/ecm.70037 |
| Authors | Michael Dumelle, Rob Trangucci, Amanda M. Nahlik, Anthony R. Olsen, Kathryn Irvine, Karen A. Blocksom, Jay Ver Hoef, Claudio Fuentes |
| Publication Type | Article |
| Publication Subtype | Journal Article |
| Series Title | Ecological Monographs |
| Index ID | 70272202 |
| Record Source | USGS Publications Warehouse |
| USGS Organization | Northern Rocky Mountain Science Center |