Compilation of Images from Rivers Reaches across the United States (CIRRUS)
This data release provides a Compilation of Images from River Reaches across the United States (CIRRUS). These images were retrieved programmatically using the Google Maps Application Programming Interface (API). The data set consists of a total of 281,024 individual images from rivers throughout the contiguous U.S. and Alaska, with an emphasis on locations likely to have rapids. For the purposes of this data release, rapids are defined as areas of a river where the water surface is wavy, irregular, or even broken (i.e., whitewater), presumably due to high flow velocities and turbulence, that stand out from adjacent areas of smoother flow and are thus visible in satellite and aerial imagery. The image compilation provided herein was created to support remote sensing and deep learning applications. For example, developing automated tools for recognizing images that contain rapids could help to inform planning of recreational activities, assessment of habitat conditions, and estimation of river discharge.
Please refer to the "Entity and attribute" and "Process step" sections of the metadata for further detail regarding these files and how they were produced, but the following is a brief summary of the contents of this data release.
The individual images themselves are stored in JPEG (*.jpg) format and provided in a series of uncompressed tar folders.
The file image_list.csv contains a listing of all the images in CIRRUS with fields for image file name, a root name for each site, the latitude and longitude of the image center, the zoom level, a time stamp for when the image was retrieved from the API, the two- and four-digit hydrologic unit codes (HUCs), the tar folder containing the image, and the predicted probability of the image containing rapids from the rapids classifier model trained on the baseline rapids class dataset (i.e., without masking or active learning).
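As a rough illustration of how the predicted probabilities in image_list.csv might be used, the following sketch selects images at or above a probability threshold using only the Python standard library. The column names (file_name, huc4, rapids_prob) and the sample rows are hypothetical stand-ins, because the actual header names are documented in the "Entity and attribute" section of the metadata rather than here.

```python
import csv
import io


def select_likely_rapids(csv_file, prob_field, threshold=0.5):
    """Return rows whose predicted rapids probability is at least `threshold`."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if float(row[prob_field]) >= threshold]


# Tiny in-memory stand-in for image_list.csv.
# Header names and values are invented for illustration only.
sample = io.StringIO(
    "file_name,huc4,rapids_prob\n"
    "site_a_01.jpg,1401,0.92\n"
    "site_b_01.jpg,0108,0.07\n"
    "site_c_01.jpg,1706,0.50\n"
)

likely = select_likely_rapids(sample, prob_field="rapids_prob", threshold=0.5)
likely_names = [row["file_name"] for row in likely]
print(likely_names)
```

With the real file, the same function could be called on an open file handle, for example `open("image_list.csv", newline="")`, after substituting the actual probability column name. The csv module reads every field as text, which keeps any leading zeros in HUC codes intact.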
The file rapids_split_regions.csv provides information on which HUC4 regions were assigned to the train, validation, or test subsets when developing and evaluating models for classifying rapids. The two fields in this file are the HUC4 code and rapid_split, which has values of "train", "test", or "val" (short for validation) that indicate which subset the images from that HUC4 were assigned to during the rapids classifier development and testing. This file consists of 227 rows. Of all 245 HUC4 codes, 3 were removed because they are entirely in Mexico (HUC4 1310, 1311, 1312), 9 were removed from HUC2 20 (Hawaii Region), 4 from HUC2 21 (Caribbean Region), and 2 from HUC2 22 (South Pacific Region). No images in these removed regions are included in this data release.
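The row count stated above can be checked with simple arithmetic, and the file lends itself to a lookup from HUC4 code to subset. The sketch below does both. The field name rapid_split comes from the description above, whereas the HUC4 column name ("huc4"), the sample rows, and the zero-padding step are assumptions made for illustration.

```python
import csv
import io

# Arithmetic check of the stated row count: 245 - (3 + 9 + 4 + 2) = 227
total_huc4 = 245
removed = {
    "Mexico (HUC4 1310, 1311, 1312)": 3,
    "HUC2 20 (Hawaii Region)": 9,
    "HUC2 21 (Caribbean Region)": 4,
    "HUC2 22 (South Pacific Region)": 2,
}
remaining = total_huc4 - sum(removed.values())


def load_split_lookup(csv_file, huc4_field="huc4", split_field="rapid_split"):
    """Map each HUC4 code (as a 4-character string) to train, val, or test."""
    lookup = {}
    for row in csv.DictReader(csv_file):
        # Assumption: pad to four characters in case leading zeros were dropped.
        lookup[row[huc4_field].strip().zfill(4)] = row[split_field].strip()
    return lookup


# Invented sample rows standing in for rapids_split_regions.csv.
sample = io.StringIO("huc4,rapid_split\n108,train\n1401,val\n1706,test\n")
split_lookup = load_split_lookup(sample)
print(remaining, split_lookup)
```

Because the split is defined by HUC4 region rather than by individual image, every image from a given region falls in the same subset, so a single dictionary lookup on an image's HUC4 code is enough to recover its assignment.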
The file river_mask_labels.csv provides metadata for the image-mask pairs used to train a segmentation model for isolating the river channel within an image. This file contains fields for image file name, a root name for each site, the latitude and longitude of the image center, the zoom level, a time stamp for when the image was retrieved from the API, a binary variable indicating that the image has a mask (1 in all cases for this file), the two- and four-digit hydrologic unit codes (HUCs), and a field named rapid_split that indicates whether the image was used for training, validation, or testing. The same HUC4-based train-test-validation split described above was used to develop the segmentation models.
The file river_mask_dataset.tar is an uncompressed tar file containing 885 image-mask pairs used to train the river segmentation model. The images are stored as 640x640 JPEGs, and the masks are stored as bit-compressed NumPy arrays (NPY). When extracted from the tar file, the masks can be loaded into Python using this code (replacing mask_path with the name of a specific mask): import numpy as np; mask = np.load("mask_path.npy"); mask = np.unpackbits(mask, axis=-1).
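The loading step above can be wrapped in a small function and exercised on a synthetic mask. This sketch assumes the masks are two-dimensional 0/1 arrays that were packed with np.packbits along the last axis, which is what the unpackbits call implies but is not stated explicitly. The random mask and the file name example_mask.npy are placeholders.

```python
import os
import tempfile

import numpy as np


def load_mask(path):
    """Load a bit-packed mask and expand it to one uint8 value (0 or 1) per pixel."""
    packed = np.load(path)
    return np.unpackbits(packed, axis=-1)


# Synthetic 640x640 binary mask standing in for a real river mask.
rng = np.random.default_rng(0)
original = (rng.random((640, 640)) > 0.5).astype(np.uint8)

# Packing along the last axis stores 8 pixels per byte: shape (640, 80).
packed = np.packbits(original, axis=-1)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_mask.npy")
    np.save(path, packed)
    restored = load_mask(path)

print(packed.shape, restored.shape)
```

Because 640 is a multiple of 8, unpacking restores the original width with no padding bits. For an array whose last dimension is not a multiple of 8, the count argument of np.unpackbits can be used to trim the padding.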
The file rapids_labels.csv provides metadata for the images used to train a model for classifying the presence or absence of rapids within an image. This file contains fields for image file name, a root name for each site, the latitude and longitude of the image center, the zoom level, a time stamp for when the image was retrieved from the API, a binary variable indicating whether a water mask was available for the image, a binary variable indicating whether the image contains rapids, the two- and four-digit hydrologic unit codes (HUCs), a field named rapid_split that indicates whether the image was used for training, validation, or testing using the HUC4-based train-test-validation split described above, a binary field indicating whether the image was labeled through active learning, and a binary field indicating whether a mask had been applied to the image.
The file rapids_label_dataset.tar is an uncompressed tar file containing 4,975 images that have been labeled as rapids or non-rapids. The labels for this dataset are stored in rapids_labels.csv. This dataset includes 4,465 unmasked images, 510 of which also appear as masked images that are denoted by appending an "m" to the end of the file name. The masked images occur only in the train split (no validation or test images have masked counterparts).
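One way to match masked images to their unmasked counterparts is to list the tar members and look for names that differ only by a trailing "m". The sketch below builds a tiny in-memory tar file to show the idea. It assumes the "m" is appended to the file name just before the .jpg extension, and the member names are invented.

```python
import io
import tarfile
from pathlib import PurePosixPath


def find_masked_pairs(names):
    """Map each unmasked image name to its masked counterpart, if one exists."""
    by_stem = {
        PurePosixPath(n).stem: n for n in names if n.lower().endswith(".jpg")
    }
    pairs = {}
    for stem, name in by_stem.items():
        # Treat a name as masked only if the same stem without "m" also exists.
        if stem.endswith("m") and stem[:-1] in by_stem:
            pairs[by_stem[stem[:-1]]] = name
    return pairs


# Build a small in-memory tar with placeholder members (not real images).
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ["site_a_01.jpg", "site_a_01m.jpg", "site_b_01.jpg"]:
        payload = b"placeholder"
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    member_names = [m.name for m in tar.getmembers() if m.isfile()]

pairs = find_masked_pairs(member_names)
print(pairs)
```

Applied to the real archive with `tarfile.open("rapids_label_dataset.tar")`, the number of pairs would be expected to equal 510 if the naming assumption holds, consistent with 4,465 unmasked plus 510 masked images totaling 4,975.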
The file known_rapids_locations.tar is an uncompressed tar file containing images from locations with known rapids based on the National Hydrography Dataset and an OpenStreetMap rapids layer.
The file File_Overview.csv contains a brief summary of the code developed to process the images included in this data release and to develop segmentation models for producing river masks and classification models for identifying rapids. The fields in this file include the folder name within the rapid-detection-collection-code-release.zip zip folder, code file name, and a brief description of the purpose of each code file.
The file rapid-detection-collection-code-release.zip is a zip folder that contains all the code files described in File_Overview.csv as well as a README.txt file that provides guidance on how to use the code.
Users are advised to thoroughly read the metadata file associated with this data release to understand the appropriate use and limitations of the data provided herein.
Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Although this information product, for the most part, is in the public domain, the data set also contains copyrighted materials as noted on the watermark for each image obtained using the Google Maps API. Permission to reproduce copyrighted items must be secured from the copyright owner. The following entities retain copyright on all images: "© 2025 Airbus, Maxar Technologies, USDA FPAC/GEO".
Citation Information
| Publication Year | 2025 |
|---|---|
| Title | Compilation of Images from Rivers Reaches across the United States (CIRRUS) |
| DOI | 10.5066/P13JWRXP |
| Authors | Carl J Legleiter, Kelvyn Bladen, Nick Brimhall |
| Product Type | Data Release |
| Record Source | USGS Asset Identifier Service (AIS) |
| USGS Organization | Water Resources Mission Area - Headquarters |
| Rights | This work is marked with CC0 1.0 Universal |