Amazon S3 Introduces Metadata Feature for Improved Data Management and Querying in Preview

Dec 10, 2024 2 min read

Amazon Web Services (AWS) has unveiled Amazon S3 Metadata, a new feature designed to simplify data discovery and management for Amazon S3 users. Currently in preview in the US East (Ohio and N. Virginia) and US West (Oregon) regions, S3 Metadata lets users query and analyze metadata about their S3 objects, with near real-time metadata updates and integration with AWS analytics services.

Amazon S3 Metadata automatically captures and organizes metadata for S3 objects, offering insights into system-defined properties (such as object size, storage class, and encryption status) and user-defined tags. This capability allows businesses to curate, identify, and use their data more effectively for a wide array of applications, including:

Business analytics
Real-time inference applications
AI model training

Metadata is updated within minutes of changes to S3 objects, keeping it accurate in near real time. It is stored in S3 Tables, which introduces a new bucket type, the table bucket, that stores tables as subresources.

S3 Metadata employs Apache Iceberg, allowing users to store metadata in fully managed Iceberg tables. This compatibility facilitates high-performance querying at scale using Iceberg-compatible tools such as Apache Spark, Amazon Athena, and Amazon QuickSight.

With Iceberg, each update generates a new row in the table, providing a historical record of object changes that can be easily retrieved and analyzed.
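The append-only behavior described above can be illustrated with a minimal Python sketch that replays change rows to recover the current state of each object, or the full history of one object. The column names (`key`, `sequence_number`, `record_type`, `size`), the record type values, and the sample rows are assumptions made for illustration; check them against the actual metadata table schema before relying on them.

```python
from typing import Dict, Iterable, List

# Hypothetical change rows, shaped like an append-only metadata table.
# Column names and record types are assumptions, not the confirmed schema.
SAMPLE_ROWS: List[dict] = [
    {"key": "a.csv", "sequence_number": 1, "record_type": "CREATE", "size": 100},
    {"key": "b.csv", "sequence_number": 2, "record_type": "CREATE", "size": 50},
    {"key": "a.csv", "sequence_number": 3, "record_type": "UPDATE_METADATA", "size": 100},
    {"key": "b.csv", "sequence_number": 4, "record_type": "DELETE", "size": None},
]


def latest_state(rows: Iterable[dict]) -> Dict[str, dict]:
    """Replay rows in sequence order and keep the newest live row per key."""
    current: Dict[str, dict] = {}
    for row in sorted(rows, key=lambda r: r["sequence_number"]):
        if row["record_type"] == "DELETE":
            # A delete row removes the object from the current view.
            current.pop(row["key"], None)
        else:
            current[row["key"]] = row
    return current


def history_for(rows: Iterable[dict], key: str) -> List[dict]:
    """Return every recorded change for one object key, oldest first."""
    matching = [r for r in rows if r["key"] == key]
    return sorted(matching, key=lambda r: r["sequence_number"])


if __name__ == "__main__":
    print(sorted(latest_state(SAMPLE_ROWS)))        # keys that still exist
    print(len(history_for(SAMPLE_ROWS, "b.csv")))   # number of changes to b.csv
```

In practice the same replay would more likely be expressed as a SQL window query over the Iceberg table; the in-memory version only shows the idea.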

Amrutha Gujjar, CEO of Structured Labs, concluded in a blog post:

By embracing Iceberg, AWS aligns itself with the industry’s move toward open table formats. This not only ensures interoperability with tools like Apache Spark and Flink but also future-proofs investments in S3-based architectures.

S3 Metadata tables integrate seamlessly with AWS analytics tools, enabling robust data processing and visualization. Key integrations include:

AWS Glue Data Catalog (currently in preview)
Amazon Athena, Redshift, EMR, and QuickSight for streaming and querying metadata
Amazon Bedrock, which annotates AI-generated videos stored in S3 with metadata like origin, creation timestamp, and the model used.

The metadata schema includes over 20 elements, from bucket names and object keys to encryption details and user-defined tags. Users can enrich this data further by joining it with application-specific tables.

Enabling S3 Metadata involves three simple steps:

Create a Table Bucket: Use the create-table-bucket command, the AWS Management Console, or an API call to create a bucket for storing metadata.
Attach Metadata Configuration: Specify a configuration file to link your data bucket with the metadata table.
Run Queries: Use tools like Apache Spark or AWS analytics services to query the metadata, enabling insights into object storage, updates, and other critical details.
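The second and third steps can be sketched in Python as follows. The block builds a metadata configuration document and a query string without calling AWS. The configuration shape (`S3TablesDestination`, `TableBucketArn`, `TableName`) is an assumption based on the AWS announcement and should be verified against the current S3 API reference. The ARN, account ID, and table names are placeholders, and both helper functions are hypothetical.

```python
import json


def metadata_table_config(table_bucket_arn: str, table_name: str) -> dict:
    """Build the configuration linking a data bucket to a metadata table.

    The key names are assumed from the AWS announcement, not verified here.
    """
    return {
        "S3TablesDestination": {
            "TableBucketArn": table_bucket_arn,
            "TableName": table_name,
        }
    }


def recent_objects_query(table_bucket: str, table_name: str, limit: int = 10) -> str:
    """Build a query listing the most recently modified objects."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT key, size, storage_class, encryption_status "
        f"FROM {table_bucket}.aws_s3_metadata.{table_name} "
        f"ORDER BY last_modified_date DESC LIMIT {limit}"
    )


if __name__ == "__main__":
    # Placeholder ARN and names; replace with real values.
    config = metadata_table_config(
        "arn:aws:s3tables:us-east-2:111122223333:bucket/mytablebucket",
        "my_table",
    )
    print(json.dumps(config, indent=2))  # contents for the configuration file
    print(recent_objects_query("mytablebucket", "my_table"))
```

The printed JSON would be saved to a file and supplied when attaching the metadata configuration. The query string can be passed to `spark.sql(...)` or adapted for Athena.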

(Source: AWS News blog post)

An example query looks like this:

spark.sql("SELECT key, size, storage_class, encryption_status FROM mytablebucket.aws_s3_metadata.my_table ORDER BY last_modified_date DESC LIMIT 10").show(false)

Ian Mckay, cloud principal at Kablamo and AWS Community Hero, tweeted:

S3 buckets now support queryable metadata (Iceberg tables) functionality, allowing for a live queryable view of the creations, updates, and deletions of objects using tools like Athena. Check pricing before usage, as the cost increase is non-trivial.

Users can also configure and manage S3 Metadata through the Metadata tab in the Amazon S3 console. Pricing is based on the number of updates (object creation, deletion, and metadata changes), plus storage costs for the metadata table. Detailed pricing information is available on the S3 Pricing page.

About the Author

Steef-Jan Wiggers

Steef-Jan Wiggers is one of InfoQ's senior cloud editors and works as a Principal Consultant Cloud/DevOps at Team Rockstars IT in the Netherlands. His current technical expertise focuses on integration platform implementations, Azure DevOps, AI, and Azure Platform Solution Architectures. Steef-Jan is a regular speaker at conferences and user groups and writes for InfoQ. Furthermore, Microsoft has recognized him as a Microsoft Azure MVP for the past fifteen years.
