Amazon Web Services Open Data (AWSOD) and Amazon Sustainability (AS) are working together to make sustainability datasets available on Amazon Simple Storage Service (S3), and they are removing the undifferentiated heavy lifting by pre-processing the datasets for optimal retrieval. Sustainability datasets commonly come from satellites, geological studies, weather radars, maps, agricultural studies, atmospheric studies, government agencies, and many other sources.
On December 10, 2018, the AWSOD and AS teams released the first group of datasets. These datasets add a new category of data to the existing AWS Open Data datasets. While these sustainability datasets were already publicly available, AWS is making them easier to access, for example by separating large archive files into smaller addressable chunks that can be retrieved independently. AWS stores the data in S3 buckets configured for public access. Amazon Simple Notification Service (SNS) is used to notify consumers of new data, and CloudFront is used in a few cases to make data available via application programming interfaces for faster retrieval.
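As a rough illustration of what independent retrieval from a public bucket can look like, the sketch below builds the HTTPS URL for an object and prepares a request for only a byte range of it, using nothing but the Python standard library. The bucket name, object key, and region are hypothetical placeholders rather than names of actual datasets, and the helper functions are illustrative and not part of any AWS SDK. It assumes the bucket allows anonymous reads, as the article describes.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def public_object_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    """Build the virtual-hosted-style HTTPS URL for an object in a public bucket."""
    # Percent-encode the key but keep "/" so the prefix structure stays readable.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


def range_request(url: str, start: int, end: int) -> Request:
    """Prepare a GET request for bytes start..end (inclusive) of the object."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    # S3 honors the standard HTTP Range header on object GETs.
    return Request(url, headers={"Range": f"bytes={start}-{end}"})


def fetch_bytes(url: str, start: int, end: int, timeout: float = 30.0) -> bytes:
    """Download only the requested byte range (performs a network call)."""
    with urlopen(range_request(url, start, end), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Hypothetical bucket and key, used only to show the shape of the request.
    url = public_object_url("example-open-data-bucket", "tiles/2018/scene 01.tif")
    request = range_request(url, 0, 1023)
    print(request.full_url)
    print(request.get_header("Range"))
```

Calling `fetch_bytes(url, 0, 1023)` against a real public object would return at most the first 1,024 bytes, so a consumer can read a file header or a single chunk without downloading the whole archive.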
To further stimulate usage of the new datasets, AWS is working with the Group on Earth Observations (GEO) to grant $1.5 million in cloud credits for projects that use the data to gain insights about the planet.
AWS provides documentation for using the sustainability open datasets, along with tags for searching them. The dataset "Africa Soil Information Service (AfSIS) Soil Chemistry" includes a walkthrough Jupyter notebook and can serve as a starting point for learning how to apply machine learning to open data. Members of the open data community are also publishing blog posts with walkthroughs on how to use the public datasets. Walkthroughs include:
- Seeing Earth from Space - from Raw Satellite Data to Beautiful High-Resolution Images
- Exploring AWS Lambda with cloud-hosted Hubble public data
Additionally, AWS customers are already using the cloud to support sustainability work, including:
- Helping to End Future Famines with Machine Learning
- Estimating Hurricane Wind Speeds with Machine Learning
Sebastian Fritsch, who works on data analytics for agriculture, participated in a Q&A with AWS about the usage of satellite datasets and was asked, "Were there any highlights for you?" He answered, "Being able to scale up data products from a relatively small pilot region up to global availability just by changing a few lines of code is a highlight for us."
Before the release of the sustainability datasets, AWS Global Open Data Lead Jed Sundwall spoke about continuously learning to improve how AWS stages petabytes of open data. AWS is adding a variety of indexes to the open datasets to make them easier to access, including external indexes, file naming conventions, and internal indexes. AWS staff are observing a community coming together, and they realize they can gauge the success of the datasets by the mechanisms the community builds to process them. Lastly, AWS has a well-defined program that covers the cost of hosting and allows new contributors to make their public datasets available through AWS.