Improving Distributed System Data Integrity with Amazon S3 Conditional Writes

AWS recently announced support for conditional writing in Amazon S3, allowing users to check for the existence of an object before creating it. This feature helps prevent overwriting existing objects when uploading data, making it easier for applications to manage data.

Conditional writes simplify how distributed applications with multiple clients can update data in parallel across shared datasets. Each client can write objects conditionally, ensuring that it doesn't overwrite any objects already written by another client. This means there's no need to build client-side consensus mechanisms to coordinate updates or use additional API requests to check for the presence of an object before uploading data.

Instead, developers can offload such validations to S3, which improves performance and efficiency for large-scale analytics, distributed machine learning, and highly parallelized workloads. To use conditional writes, developers can add the HTTP If-None-Match conditional header to PutObject and CompleteMultipartUpload API requests.

Uploading an object with a conditional write via the AWS CLI's put-object command and the --if-none-match parameter could look like this:

aws s3api put-object --bucket amzn-s3-demo-bucket --key dir-1/my_images.tar.bz2 --body my_images.tar.bz2 --if-none-match "*"

In a Hacker News thread, one commenter observed that most current systems requiring a reliable managed service for distributed locking use DynamoDB, and asked whether there are scenarios where S3 is preferable to DynamoDB for implementing such locking. Another user answered:

Using only s3 would be more straightforward, with less setup, less code, and less expensive
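The lock pattern the commenters allude to can be built on a conditional PUT: each client tries to create a well-known lock key with If-None-Match: "*", and only the first writer succeeds. The sketch below is illustrative rather than real AWS code: `FakeBucket`, `acquire_lock`, and `PreconditionFailed` are hypothetical names, and the in-memory bucket stands in for S3 so the example runs standalone; a real implementation would issue a PutObject request with the If-None-Match header and catch the 412 error.

```python
# Illustrative sketch of a distributed lock built on S3 conditional writes.
# FakeBucket mimics the If-None-Match: "*" semantics (success on create,
# 412 Precondition Failed if the key already exists); a real client would
# call PutObject with the If-None-Match header instead.

class PreconditionFailed(Exception):
    """Stands in for S3's 412 Precondition Failed response."""

class FakeBucket:
    def __init__(self):
        self._objects = {}

    def put_if_absent(self, key, body):
        # Conditional write: fail if an object with this key already exists.
        if key in self._objects:
            raise PreconditionFailed(key)
        self._objects[key] = body

def acquire_lock(bucket, lock_key, owner):
    """Try to take the lock; only the first caller for a given key wins."""
    try:
        bucket.put_if_absent(lock_key, owner)
        return True
    except PreconditionFailed:
        return False

bucket = FakeBucket()
print(acquire_lock(bucket, "locks/job-42", "client-a"))  # True: lock acquired
print(acquire_lock(bucket, "locks/job-42", "client-b"))  # False: already held
```

One design consequence worth noting: unlike a DynamoDB-based lock with a TTL attribute, a conditional write gives no built-in expiry, so a lock key left behind by a crashed client would need out-of-band cleanup (for example, a DeleteObject on the key).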

The conditional write behavior is as follows, according to the company’s documentation:

  • When performing conditional writes in Amazon S3, if no object with the same key name exists in the bucket, the write operation succeeds with a 200 response.
  • If an object with the same key name already exists, the write operation fails with a 412 Precondition Failed response. When versioning is enabled, S3 checks for the presence of a current object version with the same name.
  • If no current object version exists, or the current version is a delete marker, the write operation succeeds.
  • Multiple conditional writes for the same object name result in the first write operation succeeding and subsequent writes failing with a 412 Precondition Failed response. Additionally, concurrent requests may result in a 409 Conflict response.
  • If a delete request for an object succeeds before a conditional write operation completes, the delete request takes precedence. After receiving a 409 error with PutObject or CompleteMultipartUpload, a retry may be needed.
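The rules above can be condensed into a small state model. The following snippet is a toy simulation of the documented responses, not real S3 code: the `ConditionalStore` class and its method names are invented for illustration, and it models only the 200/412 outcomes plus the delete-marker rule, not concurrency or the 409 case.

```python
# Toy model of S3's documented conditional-write responses.
# DELETE_MARKER represents a versioned delete: per the documentation,
# a conditional write succeeds when the current version is a delete marker.

DELETE_MARKER = object()

class ConditionalStore:
    def __init__(self):
        self._current = {}  # key -> current version body (or DELETE_MARKER)

    def conditional_put(self, key, body):
        current = self._current.get(key)
        if current is None or current is DELETE_MARKER:
            self._current[key] = body
            return 200  # no current object version: write succeeds
        return 412      # current version exists: Precondition Failed

    def delete(self, key):
        self._current[key] = DELETE_MARKER  # versioned delete leaves a marker

store = ConditionalStore()
print(store.conditional_put("report.csv", b"v1"))  # 200: first write wins
print(store.conditional_put("report.csv", b"v2"))  # 412: object already exists
store.delete("report.csv")
print(store.conditional_put("report.csv", b"v3"))  # 200: current version is a delete marker
```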

412 Precondition Failed response (Source: Conditional Requests Documentation)

Paul Meighan, a product manager at AWS, stated in a LinkedIn post:

This is a big simplifier for distributed applications that have the potential for many concurrent writers and, in general, a win for data integrity.

Gregor Hohpe followed with a comment:

Now, that's what I call a distributed system "primitive": conditional write.

Currently, the conditional writes feature in Amazon S3 is available at no additional charge in all AWS regions, including the AWS GovCloud (US) Regions and the AWS China regions. In addition, samples are available in a GitHub repository.
