ICP-MS measurements and R Code to determine provenance soil type from analyses of Pinus ponderosa ash collected in Arizona and Colorado
Needle samples from 140 Pinus ponderosa were analyzed for trace elements via inductively coupled plasma mass spectrometry (ICP-MS). Samples were collected from eight locations representing five distinct soil types across the Colorado Plateau near Flagstaff, Arizona, and Boulder, Colorado. The data include a full spectral scan over m/z 5-245, which was condensed to 72 dominant atomic masses. Instrument drift, matrix effects, and differences in sample mass were corrected using internal standard ion count intensities and individual sample weights. Further statistical analysis was performed in R, run within the RStudio environment. Classification used three preprocessing techniques and five machine learning algorithms, including hierarchical modeling structures to optimize separation.
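The two-step correction described above can be illustrated with a minimal Python sketch. It is not taken from the release's R scripts, and the exact formula used by the authors is not stated here; the sketch assumes a simple ratio correction in which each analyte's ion count is divided by the internal standard's ion count and then by the sample mass. The function name `normalize_counts`, the m/z keys, and all numeric values are hypothetical.

```python
def normalize_counts(raw_counts, internal_std_counts, sample_mass_g):
    """Divide each ion count by the internal standard count and the sample mass.

    Assumption: a plain ratio correction. The release may use a different
    formula (for example, one internal standard per mass range).
    """
    if internal_std_counts <= 0 or sample_mass_g <= 0:
        raise ValueError("internal standard counts and sample mass must be positive")
    return {
        mz: counts / internal_std_counts / sample_mass_g
        for mz, counts in raw_counts.items()
    }


# Hypothetical ion counts for two atomic masses in one sample.
sample = {63: 12000.0, 88: 4500.0}
corrected = normalize_counts(sample, internal_std_counts=60000.0, sample_mass_g=0.25)
print(corrected)
```

Dividing by the internal standard first corresponds to the intermediate table (T04), and the further division by sample mass corresponds to the final table (T05), under the assumption stated above.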
Data files provided here include the metadata file, PineAsh_Metadata.xml; a Microsoft Office Excel file, PineAsh_Data.xlsx, with six spreadsheets containing the Introduction and data tables T01-T05; and the individual data tables as comma-separated value (.csv) files. T01_Pinus_ponderosa_DataDiction.csv is the data dictionary containing entity and attribute metadata in table format for T02-T05. T02_ICP_Mass_Spectrum.csv contains the mass spectral scan results of 140 samples across the m/z range 5-245. T03_ICP_Mass_Spec_Censored.csv contains the mass spectral scan results of 140 samples for the 72 dominant atomic masses. T04_IntStd_Corrected.csv contains the same 72-mass results normalized using internal standard ion count intensities. T05_Mass_Corrected.csv contains the same 72-mass results normalized using both internal standard ion count intensities and sample mass.
R code includes three scripts. Preprocessing_ModelTraining.R is the data preprocessing and model training script; it evaluates multiple preprocessing strategies and classification models to determine the best combination. It applies transformations, partitions the dataset, and conducts cross-validation to assess model performance across different classification algorithms and structures. The required data file, ICP_8pineAsh_onlyAtoms.mat, is included. ModelSelection_VariableImportance.R is the model selection and variable importance script; it takes the best-performing model, chosen by accuracy and Cohen’s kappa, and calculates variable importance using model-specific and model-independent metrics to determine the most influential features for classification. The required data file, Ash_rework.csv, is included. EnhancedVisualization.R is a visualization script that refines the figures produced by the prior scripts so that results are clearer to interpret and present.
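The two selection metrics named above, accuracy and Cohen’s kappa, can be computed from paired true and predicted class labels. The following Python sketch is illustrative only and is not the release's R code; the function name `accuracy_and_kappa` and the soil-type labels are hypothetical. Kappa is computed with the standard definition, (observed agreement - chance agreement) / (1 - chance agreement).

```python
from collections import Counter


def accuracy_and_kappa(true_labels, predicted_labels):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    n = len(true_labels)
    if n == 0 or n != len(predicted_labels):
        raise ValueError("label sequences must be non-empty and of equal length")

    # Observed agreement: fraction of samples classified correctly.
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n

    # Chance agreement: sum over classes of the product of marginal frequencies.
    true_freq = Counter(true_labels)
    pred_freq = Counter(predicted_labels)
    expected = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)

    if expected == 1.0:
        # Kappa is undefined when chance agreement is already perfect.
        return observed, float("nan")
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical soil-type labels for four samples.
truth = ["A", "A", "B", "B"]
predicted = ["A", "A", "B", "A"]
acc, kappa = accuracy_and_kappa(truth, predicted)
print(acc, kappa)
```

Kappa discounts the agreement expected by chance, which makes it a useful companion to accuracy when the classes are unevenly represented.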
A photo of a smoke plume from a managed wildfire rising above Ponderosa pine trees in New Mexico can be found in the USGS Images Library at https://www.usgs.gov/media/images/a-convective-smoke-plume-a-managed-wi….
Citation Information
| Publication Year | 2025 |
|---|---|
| Title | ICP-MS measurements and R Code to determine provenance soil type from analyses of Pinus ponderosa ash collected in Arizona and Colorado |
| DOI | 10.5066/P14F74ZA |
| Authors | Michael Ketterer, Lauren T Reid, Fernanda D Cornelio, James A Jordan, Tyler B Coplen, Caelin P Celani, Helder V Carneiro, Karl S Booksh |
| Product Type | Data Release |
| Record Source | USGS Asset Identifier Service (AIS) |
| USGS Organization | Water Resources Mission Area - Headquarters |
| Rights | This work is marked with CC0 1.0 Universal |