Cloud-native repositories for big scientific data

February 15, 2021

Scientific data have traditionally been distributed via downloads from a data server to a local computer. This way of working suffers from limitations as scientific datasets grow toward the petabyte scale. A “cloud-native data repository,” as defined in this article, offers several advantages over traditional data repositories: performance, reliability, cost-effectiveness, collaboration, reproducibility, creativity, downstream impacts, and access and inclusion. These objectives motivate a set of best practices for cloud-native data repositories: analysis-ready, cloud-optimized (ARCO) formats and loose coupling with data-proximate computing. The Pangeo Project has developed a prototype implementation of these principles by using open-source scientific Python tools. By providing an ARCO data catalog together with on-demand, scalable distributed computing, Pangeo enables users to process big data at rates exceeding 10 GB/s. Several challenges must be resolved in order to realize cloud computing’s full potential for scientific research, such as organizing funding, training users, and enforcing data privacy requirements.

Publication Year	2021
Title	Cloud-native repositories for big scientific data
DOI	10.1109/MCSE.2021.3059437
Authors	Ryan Abernathey, Tom Augspurger, Anderson Banihirwe, Charles C. Blackmon-Luca, Timothy Crone, Chelle Gentemann, Joseph Hamman, Naomi Henderson, Chiara Lepore, Theo McCaie, Niall Robinson, Richard P. Signell
Publication Type	Article
Publication Subtype	Journal Article
Series Title	Computing in Science and Engineering
Index ID	70220313
Record Source	USGS Publications Warehouse
USGS Organization	Woods Hole Coastal and Marine Science Center

Richard P Signell, Ph.D. (Former Employee)

Oceanographer
