Google Cloud Introduces HDD Tier for Spanner Database, Cutting Cold Storage Costs by 80%

Google has recently introduced tiered storage for Spanner, its distributed SQL database on Google Cloud. This tiered storage is based on a new HDD storage option that is 80% cheaper than the existing SSD option, allowing for cost optimization of older data while minimizing the overhead associated with traditional data migration.

While the default SSD tier is designed for data that requires high throughput and low latency, the new HDD tier is designed for larger datasets that are infrequently accessed or not latency-sensitive. Storage tiering is policy-driven: asynchronous background processes automatically move data from SSD to HDD as part of maintenance tasks, according to user-defined policies. SQL queries can access data across both tiers regardless of where it is stored, and backup policies are applied consistently to all data.

The team at Google highlights that for most database records, operational value decreases over time while their importance for reporting and compliance increases. This shift means that older "cold" data doesn't need the same high-performance access as current "hot" transactional data, encouraging companies to seek more cost-effective storage for historical information.

Source: Google Cloud blog

Matthew Muckloo, software engineer at Google, and Piyush Mathur, group product manager at Google, write:

Moving to alternative types of storage typically requires complicated data pipelines and can impact the performance of the operational system. Manually separating data across storage solutions can result in inconsistent reads that require application-level reconciliation. Furthermore, the separation imposes significant limits on how applications can query across current and historical data for things like responding to regulators; it also increases governance touchpoints that need to be audited.

Storage tiering strategies can now be implemented at various Spanner levels (database, table, column, or secondary index), with the flexibility to move specific data to slower but less expensive HDD storage. For example, rarely accessed data such as JSON product attributes can be moved to HDD without restructuring tables, and indexes can be kept on faster SSD while the underlying data is stored on HDD.

To enable tiered storage, a locality group that defines the storage option (SSD, the default, or HDD) must be created. It can optionally define an ssd_to_hdd_spill_timespan, which specifies how long data should be retained on SSD before a compaction cycle moves it to HDD. For example:

CREATE LOCALITY GROUP recent_on_ssd OPTIONS (storage = 'ssd', ssd_to_hdd_spill_timespan = '15d');

creates an SSD-to-HDD spill policy that retains data on SSD for 15 days before it becomes eligible to move. The minimum amount of time that data must be stored on SSD before it's moved is one hour.
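Once a locality group exists, it is attached to schema objects through their options. The following GoogleSQL DDL is a minimal sketch of the table-, column-, and index-level placement described earlier. The Orders table, its columns, the OrdersByCustomer index, and the ssd_only and hdd_only groups are hypothetical names chosen for illustration. The locality_group option syntax is an assumption based on the Spanner DDL for locality groups and should be checked against the current Spanner reference before use.

```sql
-- Hypothetical locality groups: one pinned to SSD, one stored directly on HDD.
CREATE LOCALITY GROUP ssd_only OPTIONS (storage = 'ssd');
CREATE LOCALITY GROUP hdd_only OPTIONS (storage = 'hdd');

-- Table level: rows use the recent_on_ssd group from the example above,
-- so they start on SSD and spill to HDD after 15 days.
CREATE TABLE Orders (
  OrderId    INT64 NOT NULL,
  CustomerId INT64 NOT NULL,
  CreatedAt  TIMESTAMP NOT NULL,
  -- Column level: rarely read JSON attributes are placed on HDD.
  Attributes JSON OPTIONS (locality_group = 'hdd_only')
) PRIMARY KEY (OrderId), OPTIONS (locality_group = 'recent_on_ssd');

-- Index level: keep the lookup index on SSD even as table rows age onto HDD.
CREATE INDEX OrdersByCustomer ON Orders(CustomerId)
  OPTIONS (locality_group = 'ssd_only');
```

In this sketch the index stays on fast storage, so lookups by customer remain quick, while reading an aged row or its JSON attributes may incur HDD latency. Database-level defaults are left out here.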

Google Spanner is not the only distributed cloud database offering tiered storage. Amazon DynamoDB, for example, offers Standard and Standard-IA table classes with different storage and retrieval fees, but hides the underlying storage technology.

Spanner’s tiered storage supports both the GoogleSQL and PostgreSQL dialects and is available in all Google Cloud regions where Spanner is available. HDD usage can be monitored from System Insights.
