# Waveform Data and Metadata Used to Train National Earthquake Information Center Deep-Learning Models

August 18, 2021

These data were used to train the machine-learning models supporting the USGS software release "NEIC Machine Learning Applications Software" (https://doi.org/10.5066/P9ICQPUR) and its companion publication in Seismological Research Letters, "Leveraging Deep Learning in Global 24/7 Real-Time Earthquake Monitoring at the National Earthquake Information Center" (https://doi.org/XXXXX). The data are stored as Python NumPy arrays and can be read by the Python code used to generate deep-learning models that classify waveform phases, refine automatic pick times, and estimate source distances.

The cataloged picks and associated metadata were obtained from the USGS PDE catalog (https://earthquake.usgs.gov/data/pde.php). Waveform segments were obtained from the IRIS Data Management Center through its publicly available web services (https://ds.iris.edu/ds/nodes/dmc/). IRIS Data Services are funded through the Seismological Facilities for the Advancement of Geoscience (SAGE) Award of the National Science Foundation under Cooperative Support Agreement EAR-1851048.

Each NumPy array is distributed as a gzip-compressed tar archive (`.tar.gz`). Large archives were split into several pieces whose extensions begin with `.parta` followed by a letter; the alphabetical order of these extensions gives the order in which the pieces must be rejoined. To extract a split archive, first reassemble the pieces with `cat name.tar.gz.parta* > name.tar.gz`, then decompress the result with `tar xvzf name.tar.gz`. Archives that were not split can be decompressed directly with `tar xvzf name.tar.gz`.

Each extracted NumPy file contains a single array, and its file name describes the data it holds: first the phase type (P or S), then the data type, and finally whether the array belongs to the training or the testing dataset. Within each phase and dataset, the arrays are index-aligned: entry *i* of one P-wave training array refers to the same pick as entry *i* of every other P-wave training array, and the same holds for the P-wave testing arrays and for the S-wave training and testing arrays.

The data types are Dist (distance, in degrees), Azi (back azimuth, in degrees), EQID (PDE earthquake ID), Mag (preferred PDE earthquake magnitude), and WF (seismic waveform data). Each waveform is a 60-second window sampled at 40 samples per second and centered on the arrival time. Its rows correspond to the vertical component, the north (or 1) component, and the east (or 2) component.
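The reassembly and extraction steps can also be done from Python, for example on systems without `cat` or `tar`. The sketch below relies only on the `.parta*` piece naming described above; the helper names `reassemble_parts` and `extract` are hypothetical and are not part of the NEIC software release. Where the running Python supports tar extraction filters, the sketch uses the `"data"` filter so that no member can be written outside the destination directory.

```python
from __future__ import annotations

import glob
import shutil
import tarfile
from pathlib import Path


def reassemble_parts(archive: str) -> Path:
    """Join archive.parta* pieces into `archive`, like `cat archive.parta* > archive`.

    All pieces share the same prefix, so sorting the paths orders them
    alphabetically by their suffix letters.
    """
    parts = sorted(glob.glob(glob.escape(archive) + ".parta*"))
    if not parts:
        raise FileNotFoundError(f"no .parta* pieces found for {archive}")
    with open(archive, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # append this piece's raw bytes
    return Path(archive)


def extract(archive: str, dest: str = ".") -> list[str]:
    """Extract a .tar.gz archive, like `tar xvzf`, and return the member names."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject members that would escape `dest`.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return names
```

For a split archive, `extract(str(reassemble_parts("name.tar.gz")))` mirrors the two shell commands; for an unsplit archive, call `extract("name.tar.gz")` directly.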
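Because the arrays for a given phase and dataset share the same index, a single pick can be assembled by indexing every array at the same position. The sketch below assumes, hypothetically, that extracted files are named like `P_WF_Train.npy` (phase, data type, and dataset joined by underscores) and that the `WF` array has shape `(n_examples, 3, n_samples)`; check both against the actual files before relying on it. `load_split` and `get_example` are illustrative helpers, not functions from the NEIC code. If an array such as `EQID` was saved with an object dtype, `np.load` will also need `allow_pickle=True`.

```python
from __future__ import annotations

from pathlib import Path

import numpy as np

SAMPLING_RATE_HZ = 40.0  # 40 samples per second, per the data description
DATA_TYPES = ("Dist", "Azi", "EQID", "Mag", "WF")
COMPONENTS = ("Z", "N/1", "E/2")  # row order of each waveform


def load_split(directory: str, phase: str, split: str) -> dict[str, np.ndarray]:
    """Load every data type for one phase (P or S) and one dataset split.

    ASSUMPTION (hypothetical naming): files are called f"{phase}_{dtype}_{split}.npy".
    """
    arrays = {}
    for dtype in DATA_TYPES:
        arrays[dtype] = np.load(Path(directory) / f"{phase}_{dtype}_{split}.npy")
    # The arrays are index-aligned, so they must all have the same length.
    lengths = {dtype: len(a) for dtype, a in arrays.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"arrays are not index-aligned: {lengths}")
    return arrays


def get_example(arrays: dict[str, np.ndarray], i: int) -> dict:
    """Collect the metadata and waveform for pick i.

    ASSUMPTION: the WF array has shape (n_examples, 3, n_samples).
    """
    wf = arrays["WF"][i]
    if wf.shape[0] != 3:
        raise ValueError(f"expected 3 component rows, got shape {wf.shape}")
    n = wf.shape[1]
    # Seconds relative to the arrival, which sits at the center of the window.
    time_s = (np.arange(n) - n // 2) / SAMPLING_RATE_HZ
    return {
        "distance_deg": float(arrays["Dist"][i]),
        "back_azimuth_deg": float(arrays["Azi"][i]),
        "eqid": arrays["EQID"][i],
        "magnitude": float(arrays["Mag"][i]),
        "time_s": time_s,
        "waveform": dict(zip(COMPONENTS, wf)),  # one row per component
    }
```

Checking the lengths up front catches a mismatched or partially extracted file before it silently pairs labels with the wrong waveforms.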