A cross-validation package driving Netica with python

October 3, 2014

Bayesian networks (BNs) are powerful tools for probabilistically simulating natural systems and emulating process models. Cross-validation is a technique for avoiding the overfitting that results from overly complex BNs. Overfitting reduces predictive skill. Cross-validation for BNs is known but rarely implemented, partly because of a lack of software tools designed to work with available BN packages. CVNetica is open-source, written in Python, and extends the Netica software package to perform cross-validation and to read, rebuild, and learn BNs from data. Insights gained from cross-validation, and their implications for prediction versus description, are illustrated with two examples: a data-driven oceanographic application and a model-emulation application. These examples show that overfitting occurs when BNs become more complex than the supporting data allow, and that overfitting incurs computational costs as well as reducing prediction skill. CVNetica evaluates overfitting using several complexity metrics (we used level of discretization) and its impact on performance metrics (we used skill).
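The cross-validation idea can be illustrated without Netica. The following is a minimal, hypothetical sketch and not CVNetica's API. It uses a one-parent "network" whose only complexity knob is the number of equal-width bins used to discretize the input, and it predicts the output from the conditional mean of the training data in each bin. The skill score 1 - SSE/SST is an assumption and may differ from the definition used in the paper. Calibration skill is computed on the training folds and validation skill on the held-out folds. The data are synthetic.

```python
import math
import random
import statistics


def make_data(n=60, seed=42):
    """Hypothetical synthetic data: y = sin(x) plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def discretize(x, edges):
    """Index of the equal-width bin holding x; out-of-range values clamp to the end bins."""
    for i in range(len(edges) - 2):
        if x < edges[i + 1]:
            return i
    return len(edges) - 2


def fit(xs, ys, n_bins):
    """Stand-in for BN learning: the mean of y within each bin of x."""
    lo, hi = min(xs), max(xs)
    edges = [lo + (hi - lo) * k / n_bins for k in range(n_bins + 1)]
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = discretize(x, edges)
        sums[b] += y
        counts[b] += 1
    fallback = statistics.fmean(ys)  # bins with no training data predict the overall mean
    return edges, [s / c if c else fallback for s, c in zip(sums, counts)]


def skill(obs, pred):
    """Assumed skill score 1 - SSE/SST: 1 is perfect, 0 is no better than the sample mean."""
    mean = statistics.fmean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean) ** 2 for o in obs)
    return 1.0 - sse / sst


def kfold_skill(xs, ys, n_bins, k=5, seed=0):
    """Mean calibration (training) and validation (held-out) skill over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cal, val = [], []
    for fold in (idx[j::k] for j in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        edges, table = fit([xs[i] for i in train], [ys[i] for i in train], n_bins)
        pred = {i: table[discretize(xs[i], edges)] for i in idx}
        cal.append(skill([ys[i] for i in train], [pred[i] for i in train]))
        val.append(skill([ys[i] for i in fold], [pred[i] for i in fold]))
    return statistics.fmean(cal), statistics.fmean(val)


if __name__ == "__main__":
    xs, ys = make_data()
    for n_bins in (2, 4, 8, 16, 32):
        cal, val = kfold_skill(xs, ys, n_bins)
        print(f"{n_bins:>2} bins: calibration {cal:.2f}, validation {val:.2f}")
```

The bin counts here are powers of two, so each finer grid refines the coarser one, and calibration skill cannot decrease as bins are added. Only the held-out validation skill can show the point at which added discretization stops improving prediction and begins to overfit.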

Publication Year 2014
Title A cross-validation package driving Netica with python
DOI 10.1016/j.envsoft.2014.09.007
Authors Michael N. Fienen, Nathaniel G. Plant
Publication Type Article
Publication Subtype Journal Article
Series Title Environmental Modelling and Software
Index ID 70128127
Record Source USGS Publications Warehouse
USGS Organization Wisconsin Water Science Center