Amazon recently announced the general availability of Amazon Redshift Data Sharing, a feature to share live data across Amazon Redshift clusters. It makes it possible to extend a single-cluster data warehouse to multi-cluster deployments and to share data instantly, without copying or moving it.
Queries on shared data get live access to it but run on the compute resources of the consumer cluster, without impacting the performance of the producer cluster. Multiple datashares can be created in the same Amazon Redshift database, but each datashare is associated with a single database: only objects from that database can be added to it. Superusers and database owners can create datashares.
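As a rough illustration of the producer/consumer workflow, the sketch below assembles the Redshift SQL statements involved: the producer creates a datashare, adds objects from its own database, and grants usage to a consumer namespace; the consumer creates a local database from the datashare and queries it with its own compute. The share, schema, table, and database names and the namespace identifiers are hypothetical placeholders, and the Python wrapper is only a convenience for printing the statements, which would normally be run from a SQL client connected to each cluster.

```python
from typing import List


def producer_statements(share: str, schema: str, table: str, consumer_ns: str) -> List[str]:
    """SQL to run on the producer cluster, in the database that owns the objects."""
    return [
        f"CREATE DATASHARE {share};",
        # Only objects from the database the datashare belongs to can be added.
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
        f"ALTER DATASHARE {share} ADD TABLE {schema}.{table};",
        # The consumer cluster is identified by its namespace GUID (placeholder here).
        f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_ns}';",
    ]


def consumer_statements(share: str, local_db: str, producer_ns: str,
                        schema: str, table: str) -> List[str]:
    """SQL to run on the consumer cluster."""
    return [
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} OF NAMESPACE '{producer_ns}';",
        # Reads live producer data, but the query runs on the consumer's compute.
        f"SELECT COUNT(*) FROM {local_db}.{schema}.{table};",
    ]


if __name__ == "__main__":
    for stmt in producer_statements("sales_share", "public", "sales", "<consumer-namespace-guid>"):
        print(stmt)
    for stmt in consumer_statements("sales_share", "sales_db", "<producer-namespace-guid>",
                                    "public", "sales"):
        print(stmt)
```

The schema is added before the table because a table can only be shared once its schema is part of the datashare.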
Amazon identifies four major use cases for data sharing, not all of them related to cost optimization: sharing data from a central cluster in a hub-and-spoke architecture; sharing data among multiple business groups, where each cluster can produce some datasets and consume others; sharing data as a service across the organization and with external companies; and sharing data between development, test, and production environments.
Raghu, founder of CloudStaq, tweets:
An important step for Redshift that enables easy onboarding of new use cases without the need for copying data from one cluster to another.
source: https://aws.amazon.com/blogs/big-data/announcing-amazon-redshift-data-sharing-preview
Debu Panda, product manager at AWS, adds:
With Data Sharing, you can isolate diverse workloads across different Amazon Redshift clusters while still sharing live, transactionally consistent data by leveraging Redshift Managed Storage across these clusters without the complexity and delays associated with data copies and data movement.
In a separate article, "Implementing multi-tenant patterns in Amazon Redshift using data sharing", AWS explains how the new data sharing feature of Amazon Redshift can be used to implement an Analytics as a Service solution with a multi-tenant architecture. The authors cover three storage strategies: the pool model, where data for all tenants is stored in a single database schema; the bridge model, where storage for each tenant is controlled at the individual schema level; and the silo model, where storage and access control for each tenant's data are maintained in separate databases.
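To make the difference between the three models concrete, the sketch below shows how a query for one tenant's rows would be scoped in each case. The `sales` table, the shared `analytics` schema, the `tenant_id` column, and the convention of naming the per-tenant schema (bridge) or database (silo) after the tenant are hypothetical choices for illustration, not taken from the AWS article; the `%s` placeholder assumes a DB-API driver using the "format" parameter style.

```python
import re
from typing import Tuple

# Tenant and table names are interpolated as SQL identifiers, so restrict them
# to a conservative pattern (hypothetical naming rule for this sketch).
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def tenant_query(model: str, tenant: str, table: str = "sales") -> Tuple[str, tuple]:
    """Return (sql, params) for reading one tenant's rows under a storage model."""
    if not _IDENT.match(tenant) or not _IDENT.match(table):
        raise ValueError("invalid identifier")
    if model == "pool":
        # Pool: one schema for all tenants; isolation relies on a tenant column filter.
        return f"SELECT * FROM analytics.{table} WHERE tenant_id = %s", (tenant,)
    if model == "bridge":
        # Bridge: one schema per tenant inside a shared database.
        return f"SELECT * FROM {tenant}.{table}", ()
    if model == "silo":
        # Silo: one database per tenant, addressed with database.schema.table notation.
        return f"SELECT * FROM {tenant}_db.public.{table}", ()
    raise ValueError(f"unknown model: {model}")
```

The trade-off shows up in the queries themselves: the pool model is the simplest to operate but depends on every query carrying the tenant filter, while the bridge and silo models move isolation into schema- or database-level permissions.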
Amazon also announced the general availability of Amazon Redshift Cross-database queries, the ability to query across databases in a Redshift cluster. The feature lets projects organize data across multiple databases, for example to support multi-tenant configurations, while still querying and joining across the different datasets.
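Cross-database queries address objects with three-part database.schema.object names. The sketch below builds a join across two databases in the same cluster, assuming the session is connected to `sales_db`; the `sales_db` and `crm_db` databases, their tables, and the join columns are hypothetical, and the Python functions only assemble the SQL text.

```python
def qualified(database: str, schema: str, obj: str) -> str:
    """Three-part name used to reference an object in any database of the cluster."""
    return f"{database}.{schema}.{obj}"


def cross_db_join() -> str:
    """Join a table in the connected database with one in another database."""
    orders = qualified("sales_db", "public", "orders")        # connected database
    customers = qualified("crm_db", "public", "customers")    # another database
    return (
        f"SELECT c.customer_name, SUM(o.amount) AS total "
        f"FROM {orders} o JOIN {customers} c ON o.customer_id = c.customer_id "
        f"GROUP BY c.customer_name;"
    )


if __name__ == "__main__":
    print(cross_db_join())
```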
While Amazon introduced the preview of Amazon Redshift ML and other new features for data analysts in 2020, the growth of Snowflake, the cloud-based data storage and analytics service, led some users on Reddit to question the long-term viability of Redshift. Corey Quinn, cloud economist at The Duckbill Group, writes in his weekly newsletter:
Along with cross-database queries, Amazon Redshift is boldly releasing new features / crying in the corner begging Snowflake to just take their lunch money and leave them alone already.
Both Redshift Data Sharing and Redshift Cross-database queries are available in all regions where RA3 node types are available.