AWS recently announced the general availability (GA) of Amazon DataZone. This data management service allows users to catalog, discover, share, and govern data stored across AWS, on-premises, and third-party sources.
The authors of a Big Data blog post write:
With Amazon DataZone, data users like data engineers, data scientists, and data analysts can share and access data across AWS accounts using a unified data portal, allowing them to discover, use, and collaborate on this data across their teams and organizations. Additionally, data owners and data stewards can make data discovery simpler by adding business context to data while balancing access governance to the data via pre-defined approval workflows in the user interface.
Overview of Amazon DataZone capabilities and concepts (Source: AWS News blog post)
The data management service was preannounced at re:Invent last year and publicly previewed in March this year. With the GA, the service offers improvements over the public preview. For instance, in the "Business Data Catalog" component, users can now attach multiple business glossary terms to an asset and attach glossary terms to individual columns within that asset.
In addition, projects will serve as business use case-based logical containers. Through governed data sharing, users can set up subscription terms that are attached to assets when they are published. Subscription grant fulfillment is automated for AWS-managed data lakes and Amazon Redshift, and can be customized for other sources using EventBridge events.
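As a rough illustration of what custom fulfillment for other sources might look like, the following is a minimal sketch of a Lambda-style handler that an EventBridge rule could invoke. The `source`, `detail-type`, and `detail` keys are the standard EventBridge envelope fields. Everything else is an assumption made for illustration: the source and detail-type values, the `assetId` and `projectId` field names, and the `grant_access_in_custom_source` helper are hypothetical and should be checked against the DataZone documentation.

```python
from typing import Any, Dict

# Assumed values for illustration only; verify against the DataZone event reference.
ASSUMED_SOURCE = "aws.datazone"
ASSUMED_DETAIL_TYPE = "Subscription Request Accepted"


def grant_access_in_custom_source(asset_id: str, project_id: str) -> str:
    """Hypothetical placeholder for source-specific grant logic,
    for example issuing a GRANT statement in a third-party database."""
    return f"granted project {project_id} access to asset {asset_id}"


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, str]:
    """Lambda-style entry point for an EventBridge rule target."""
    # Ignore anything that is not the (assumed) subscription approval event.
    if (event.get("source") != ASSUMED_SOURCE
            or event.get("detail-type") != ASSUMED_DETAIL_TYPE):
        return {"status": "ignored"}

    detail = event.get("detail", {})
    asset_id = detail.get("assetId")      # hypothetical field name
    project_id = detail.get("projectId")  # hypothetical field name
    if not asset_id or not project_id:
        return {"status": "error", "message": "missing assetId or projectId"}

    message = grant_access_in_custom_source(asset_id, project_id)
    return {"status": "fulfilled", "message": message}
```

The handler only reads the event and returns a status dictionary, so it can be exercised locally with a hand-built event before being wired to a real rule.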
Other cloud providers, like Microsoft and Google, offer data management services similar to Amazon DataZone. Microsoft, for instance, offers Purview, a unified data governance solution to help manage and govern users' on-premises, multi-cloud, and software-as-a-service (SaaS) data, while Google provides Data Loss Prevention as part of the Sensitive Data Protection family of services. Third-party solutions are also available from Alation, Collibra, and OneTrust.
Peter Hanssens, an AWS Serverless Hero, tweeted:
Amazon DataZone is an interesting release... it's at a price point that will be seriously competitive compared to other players out there in the market but I have question marks around its usability (which I hope will become more data folk-friendly).
Yet, in an Amazon Science news item, the author writes:
Going forward, the team will continue to expand Amazon DataZone’s integration with third-party data tools and sources. In addition, the team will continue to focus on introducing additional simplification via automation that will make data more easily discoverable, make it more understandable, and facilitate the extraction of insights.
Currently, Amazon DataZone is available in eleven AWS regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (Stockholm), and South America (São Paulo). Furthermore, pricing details are available on the pricing page, while more information and guidance on the service can be found through the user guide.