Study of L-kurtosis and several distribution families for prediction of uncertainty distributions, An applied software technical note concerning L-kurtosis use in daily salinity prediction from multiple machine learning methods

December 11, 2023

Statistical predictions based on multiple machine learning (MML) methods (including those arising from differing training regimes) differ from one another. When the predictions are combined into a final estimate, the individual predictions leave residuals spread around that estimate. It is common to assume normality or near-normality of the residuals (errors), but that assumption might be tenuous for the error distribution of MML predictions in some applied settings. For unbiased predictions, the errors are assumed to have a mean of zero, and the variability of the errors determines the spread of the distribution. One simplifying assumption is that the errors are generally symmetric about the prediction, and hence that skewness is zero. The symmetry constraint narrows the list of candidate probability distributions suitable for modeling error distributions, and the peakedness of the distribution can then be used as a fitting parameter. Three symmetrical distributions were selected because they can be fit to the L-kurtosis (peakedness) of a sample: the normal-polynomial quantile mixture, the 3-parameter Student t, and the polynomial density-quantile function4 (PDQ4). Using a mean of zero, a spread of 1 standard deviation unit, an L-skew of zero, and ranges of L-kurtosis, the shapes of these distributions are compared against the standard normal distribution as a common reference. The lower and upper bounds of the 90-percent prediction interval, the 5th and 95th percentiles, are reference points of particular interest. The PDQ4 can attain the widest range of L-kurtosis values and hence is the most flexible of these distributions for fitting empirical distributions of prediction errors; its form also conveniently requires minimal assumptions about other distributional properties.
The PDQ4 might be especially attractive for honoring the properties of daily salinity predictions from MML methods documented in the workflows of the makESTUSAL software.
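
As an illustration of the quantities named above, the following is a minimal Python sketch of how sample L-skew and L-kurtosis could be computed from a set of prediction residuals, using the standard unbiased probability-weighted-moment estimators. It is not taken from the software release itself; the function name `sample_lmoments`, the constant `TAU4_NORMAL`, and the simulated residuals are hypothetical and serve only as an example. The L-kurtosis of the normal distribution (about 0.1226) is included as the reference point against which a sample value can be compared.

```python
import math
import random

# L-kurtosis of the normal distribution, (30/pi)*atan(sqrt(2)) - 9, about 0.1226.
TAU4_NORMAL = 30.0 / math.pi * math.atan(math.sqrt(2.0)) - 9.0


def sample_lmoments(values):
    """Return the first four sample L-moments and the L-skew/L-kurtosis ratios.

    Hypothetical helper for illustration; uses the unbiased
    probability-weighted moments b_r of the ordered sample.
    """
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("at least four values are needed for L-kurtosis")

    def pwm(r):
        # b_r = (1/n) * sum_i [C(i-1, r) / C(n-1, r)] * x_(i), i = 1..n.
        # enumerate() is zero-based, so its index already equals i-1.
        denom = n * math.comb(n - 1, r)
        return sum(math.comb(i, r) * xi for i, xi in enumerate(x)) / denom

    b0, b1, b2, b3 = pwm(0), pwm(1), pwm(2), pwm(3)
    l1 = b0                                   # L-location (mean)
    l2 = 2 * b1 - b0                          # L-scale
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    if l2 == 0:
        raise ValueError("L-scale is zero; the L-moment ratios are undefined")
    return {"l1": l1, "l2": l2, "tau3": l3 / l2, "tau4": l4 / l2}


# Simulated residuals standing in for errors about a combined MML estimate.
rng = random.Random(0)
residuals = [rng.gauss(0.0, 1.0) for _ in range(2000)]
lm = sample_lmoments(residuals)
print("L-skew:", lm["tau3"])
print("L-kurtosis:", lm["tau4"], "normal reference:", TAU4_NORMAL)
```

A sample L-kurtosis well above or below the normal reference would suggest that a symmetric family with adjustable L-kurtosis, such as those compared in this study, may describe the errors better than the normal distribution.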

Publication Year: 2023
Title: Study of L-kurtosis and several distribution families for prediction of uncertainty distributions, An applied software technical note concerning L-kurtosis use in daily salinity prediction from multiple machine learning methods
DOI: 10.5066/P9IJMLY4
Authors: William H Asquith
Product Type: Software Release
Record Source: USGS Digital Object Identifier Catalog
USGS Organization: Lower Mississippi-Gulf Water Science Center - Nashville, TN Office