Landsat Bulk Metadata Now Accessible in New Format
Landsat product metadata are now available in the Apache Parquet file format, in addition to the existing CSV and XML files. Parquet files let users query the metadata more efficiently and retrieve only the fields they need.
Each day, the Landsat Bulk Metadata Service updates files that list all available Landsat Collection 2 Level-1, Level-2, and U.S. Analysis Ready Data products archived at the U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center. These files contain millions of records spanning more than 53 years and can be extremely large, making them difficult to open and analyze with standard software.
The newly added Apache Parquet format is designed to make large datasets easier and faster to use. Parquet files offer several benefits:
- Smaller file sizes. A CSV file that is 3 GB may shrink to about 700 MB when saved as a Parquet file.
- Faster searches. Because Parquet stores data by column, you can read only the fields you need, such as cloud cover or image date, without loading the entire file.
- Better data analysis. Parquet files work well with programming languages such as Python and R, which helps speed up analysis.
- Compatible with big-data tools. Parquet files work with analytics and processing systems such as Tableau, Apache Spark, Hadoop, and cloud platforms.
- Flexible updates. New data can be added to Parquet files without breaking existing workflows.
Visit the Landsat Bulk Metadata Service to access the Landsat product metadata.
Please contact USGS EROS User Services with any questions about using Parquet files.