# Estimating species misclassification with occupancy dynamics and encounter rates: A semi-supervised, individual-level approach

April 5, 2022

1. Large-scale, long-term biodiversity monitoring is essential to conservation, land management, and identifying threats to biodiversity. However, multispecies surveys are prone to several types of observation error, including false positive and false negative detections, and misclassification, where a species is thought to have been encountered but is not correctly identified. Previous methods assume that an imperfect classifier produces species-level classifications, but in practice, particularly with human observers, we may end up with extraspecific classifications, including "unknown", morphospecies designations, and taxonomic identifications coarser than species. Disregarding these types of species misclassification in biodiversity monitoring datasets can bias estimates of ecologically important quantities such as demographic rates, occurrence, and species richness.

2. Here we present a joint classification-occupancy model that accounts for species non-detection and misclassification. Our framework accommodates extinction and colonization dynamics, allows for additional uncertain morphospecies designations, and makes use of individual specimens with known species identities in a semi-supervised setting. We compare the performance of our model to a classification-only model that discards information about occupancy and encounter rate. We illustrate our model with an empirical case study of the carabid beetle (Carabidae) community at the National Ecological Observatory Network Niwot Ridge Mountain Research Station, near Boulder, CO, USA. We also use simulations to evaluate model performance through validation metrics where varying fractions of the data are confirmed.

3. The model supported imperfect classifier accuracy and strongly favored certain true species classifications for some morphospecies. The model outperformed the reduced model that discarded occupancy information on validation metrics such as precision, and these differences were most pronounced for abundant species.

4. Spatial and temporal dynamics from modeled occupancy and encounter rates may inform species misclassification probability, but this idea has not yet been tested. Our statistical framework explores this opportunity, and can be applied to datasets with imperfect species detection and classification, limited verification data, and non-species classifications.
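The data-generating process summarized above can be sketched as a small simulation. This is only an illustrative sketch under stated assumptions, not the model fitted in the paper: the function name `simulate`, every parameter name and default value, the Poisson encounter model, and the uniform off-diagonal classification matrix are hypothetical choices made for the example. It draws occupancy states with colonization and extinction, generates individual encounters only where a species is present, assigns each individual an imperfect label that may fall in an extra non-species category (e.g., a morphospecies), and marks a fraction of individuals as confirmed.

```python
import numpy as np

def simulate(n_seasons=3, n_sites=20, n_species=3, n_extra=1,
             psi=0.6, gamma=0.2, eps=0.3, lam=4.0, acc=0.8,
             frac_confirmed=0.2, seed=0):
    """Hypothetical generative sketch; all names and values are assumptions."""
    rng = np.random.default_rng(seed)
    n_labels = n_species + n_extra  # species labels plus extra categories

    # Occupancy dynamics: initial occupancy psi, then colonization (gamma)
    # of empty sites and persistence (1 - eps) at occupied sites.
    z = np.zeros((n_seasons, n_sites, n_species), dtype=bool)
    z[0] = rng.random((n_sites, n_species)) < psi
    for t in range(1, n_seasons):
        p_occ = np.where(z[t - 1], 1.0 - eps, gamma)
        z[t] = rng.random((n_sites, n_species)) < p_occ

    # Encounter counts: Poisson where the species is present, zero otherwise.
    counts = rng.poisson(lam, size=z.shape) * z

    # Classification matrix: row k gives label probabilities for true species k.
    # Correct label with probability acc; remaining mass spread uniformly.
    theta = np.full((n_species, n_labels), (1.0 - acc) / (n_labels - 1))
    theta[np.arange(n_species), np.arange(n_species)] = acc

    # Expand counts to one record per encountered individual.
    species_per_cell = np.tile(np.arange(n_species), n_seasons * n_sites)
    true_species = np.repeat(species_per_cell, counts.ravel())
    labels = np.array(
        [rng.choice(n_labels, p=theta[k]) for k in true_species], dtype=int
    )

    # Semi-supervised setting: only some individuals have a confirmed identity.
    confirmed = rng.random(true_species.size) < frac_confirmed

    return {"z": z, "counts": counts, "theta": theta,
            "true_species": true_species, "labels": labels,
            "confirmed": confirmed}
```

In such a sketch, a classification-only analysis would use `labels` and the confirmed subset of `true_species` alone, whereas a joint analysis would also use `counts` across sites and seasons, which is the contrast the comparison above is meant to capture.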