In a recent blog post, Amazon introduced a new market data publisher/subscriber service called AWS Data Exchange. This service is an add-on to the existing AWS Marketplace and contains more than 1000 licensable data products, from more than 80 data providers. These data feeds include both free and paid offerings that span industries such as financial services, health care, life sciences, geospatial, weather and mapping.
The marketplace service brings together data publishers and data subscribers. Amazon's role is to provide a scalable platform where data providers can bundle their data and have Amazon distribute it in a consistent manner across subscribers.
Factual, a company that provides point of interest (POI) data sets to organizations like Uber, Facebook and Apple, is using AWS Data Exchange to serve its customers. Rob Jonas, chief revenue officer at Factual, explains why the company chose to publish its data sets on AWS Data Exchange:
Customers of AWS can now license Factual’s high-quality Places datasets, which include over 130 million places and points of interest across 52 countries, directly within AWS Data Exchange. They can easily search for and purchase datasets packaged by country, or they can request customized POI datasets to suit their business needs. Our team will provide the full-service pre- and post-purchase support we’re known for, ensuring businesses have what they need to power products, drive context and personalization, and build compelling customer experiences.
Amazon built this service to simplify access to data sets by leveraging the scale of the AWS cloud. Jeff Barr, chief evangelist for AWS, explains:
We live in a data-intensive, data-driven world! Organizations of all types collect, store, process, analyze data and use it to inform and improve their decision-making processes. The AWS Cloud is well-suited to all of these activities; it offers vast amounts of storage, access to any conceivable amount of compute power, and many different types of analytical tools.
Image source: (screenshot) https://www.youtube.com/watch?v=Lu9QVJ0Rml4
Organizations looking to subscribe to data sets can do so through an online product catalogue. Within the catalogue, users can filter by vendor (data publisher) or pricing plan, or use a search function.
Image source: https://aws.amazon.com/blogs/aws/aws-data-exchange-find-subscribe-to-and-use-data-products/
Once a user has subscribed to a data feed, they can choose the Amazon S3 bucket where the data should be delivered. After the data has been stored in an S3 bucket, it can be consumed by a variety of AWS services, including AWS Lambda, where data can be parsed and enriched. A Lambda function can then send the results to an Amazon DynamoDB table for long-term storage. In addition, an ETL service such as AWS Glue can be used together with Amazon Athena: Athena queries the output of an AWS Glue ETL job, and the results are loaded into an Amazon QuickSight dashboard.
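The S3-to-Lambda-to-DynamoDB flow described above can be sketched in Python. This is a minimal illustration, not AWS's or any provider's implementation. It assumes the delivered file is CSV, which the article does not state. The function names (`extract_s3_objects`, `enrich_rows`, `process_event`) and the `source_bucket`/`source_key` fields are hypothetical. To keep the sketch runnable without AWS credentials, the S3 read and the DynamoDB write are passed in as plain callables.

```python
import csv
import io
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def enrich_rows(csv_text, bucket, key):
    """Parse CSV text (an assumed format) and tag each row with its origin."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{**row, "source_bucket": bucket, "source_key": key} for row in reader]


def process_event(event, read_object, put_item):
    """Parse and enrich every delivered object, then store each row.

    read_object(bucket, key) -> str and put_item(item) are hypothetical
    stand-ins for the S3 read and the DynamoDB write.
    """
    written = 0
    for bucket, key in extract_s3_objects(event):
        csv_text = read_object(bucket, key)
        for item in enrich_rows(csv_text, bucket, key):
            put_item(item)
            written += 1
    return written
```

In an actual Lambda function, `read_object` would typically wrap boto3's S3 `get_object` call, and `put_item` would wrap the `put_item` method of a DynamoDB table resource. Injecting them as parameters also makes the parsing and enrichment logic easy to test locally.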
From a data publisher's perspective, organizations need to seek approval from Amazon before publishing a data set. Once they have received approval, they can complete the AWS Data Exchange data set creation wizard. Data can be either imported from an S3 location or uploaded manually. Publishers then need to choose how they will monetize their data set. Monetization options include private licensing plans for individual customers. For organizations that already have existing agreements with customers, Amazon will let those customers bring their own subscription offer.
Image source: https://aws.amazon.com/blogs/aws/aws-data-exchange-find-subscribe-to-and-use-data-products/
Amazon prohibits some categories of data from AWS Data Exchange. Barr explains:
Certain categories of data are not permitted on AWS Data Exchange. For example, your data products may not include information that can be used to identify any person, unless that information is already legally available to the public. See Publishing Guidelines for detailed guidelines on what categories of data are permitted.
Data providers must also be a valid legal entity domiciled in the United States or in a member state of the European Union.