The U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center in Sioux Falls, SD, has developed a cloud validation dataset from Collection 2 images spanning the history of Landsat. Two North American locations with high overlap between WRS-1 and WRS-2 were chosen. For each location, 20 images were selected at random from the Landsat archive, with at least one scene taken from each Landsat satellite between 1972 and 2024. This provides a sampling of the 50-year history of Landsat data over the two chosen locations: New Brunswick and Tucson, AZ. More locations are intended to be added to this dataset in the future.
For each scene, a manual cloud validation mask was created. Although these validation masks were drawn subjectively by a single analyst, they provide useful information for quantifying the accuracy of the clouds flagged by various cloud masking algorithms. Each mask is provided in GeoTIFF format and is accompanied by all bands from the original Landsat Level-1 Collection 2 data product (COG GeoTIFF) and its associated Level-1 metadata (MTL.txt file).
The interpretation for the pixel values in each cloud mask is as follows:
Value    Interpretation
0        Fill
128      Clear
192      Thin Cloud
255      Cloud
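As an illustration of how these pixel values might be used, the following minimal sketch tallies the share of each class among the non-fill pixels of a mask. The function name `class_fractions` and the sample values are hypothetical; in practice the values would be read from a mask GeoTIFF with a raster library, which is not shown here.

```python
from collections import Counter

# Pixel values and labels as listed in the table above.
MASK_CLASSES = {0: "Fill", 128: "Clear", 192: "Thin Cloud", 255: "Cloud"}


def class_fractions(pixels):
    """Return the fraction of non-fill pixels that fall in each class.

    `pixels` is any iterable of integer mask values (hypothetical input).
    """
    counts = Counter(pixels)
    unknown = set(counts) - set(MASK_CLASSES)
    if unknown:
        raise ValueError(f"unexpected mask values: {sorted(unknown)}")
    # Fill pixels (value 0) lie outside the imaged area, so exclude them.
    valid = sum(n for value, n in counts.items() if value != 0)
    if valid == 0:
        return {}
    return {
        MASK_CLASSES[value]: counts.get(value, 0) / valid
        for value in (128, 192, 255)
    }


# Made-up sample: 2 fill, 3 clear, 1 thin cloud, 4 cloud pixels.
sample = [0, 0, 128, 128, 128, 192, 255, 255, 255, 255]
fractions = class_fractions(sample)
print(fractions)
```

Excluding fill from the denominator is a design choice made for this sketch, so that scenes with different amounts of fill remain comparable.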
The methodology used to create these masks is the same as that used for previous USGS Landsat cloud truth masks (https://doi.org/10.5066/F7251GDH). A pixel is marked as Cloud if it contains opaque, clearly identifiable cloud. A pixel is marked as Thin Cloud if it contains transparent cloud or if its classification as cloud is uncertain. A pixel that contains cloud with less than 50% opacity, or no cloud at all, is marked as Clear. In some masks, the borders around clouds have been dilated to encompass the edges of irregular clouds.
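One way these masks could be used to quantify the accuracy of a cloud masking algorithm is sketched below. This is an illustrative assumption, not a procedure prescribed by the dataset. The function `score_cloud_flags`, its `thin_is_cloud` switch, and the sample data are hypothetical. Whether Thin Cloud counts as cloud is left to the user, since the class also covers uncertain pixels.

```python
FILL, CLEAR, THIN_CLOUD, CLOUD = 0, 128, 192, 255


def score_cloud_flags(truth, predicted, thin_is_cloud=True):
    """Compare an algorithm's boolean cloud flags with a manual mask.

    `truth` holds manual mask values (0, 128, 192, 255); `predicted` holds
    True where the algorithm flagged cloud. Fill pixels are skipped.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    cloud_values = {CLOUD, THIN_CLOUD} if thin_is_cloud else {CLOUD}
    tp = fp = fn = tn = 0
    for t, p in zip(truth, predicted):
        if t == FILL:
            continue
        is_cloud = t in cloud_values
        if is_cloud and p:
            tp += 1
        elif is_cloud:
            fn += 1
        elif p:
            fp += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / total if total else None,
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }


# Made-up example: eight pixels, the first of which is fill.
truth = [0, 128, 128, 192, 255, 255, 128, 255]
predicted = [True, False, True, False, True, True, False, False]

lenient = score_cloud_flags(truth, predicted, thin_is_cloud=True)
strict = score_cloud_flags(truth, predicted, thin_is_cloud=False)
print(lenient["recall"], strict["recall"])
```

Reporting results under both settings shows how sensitive a score is to the treatment of thin or uncertain cloud.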