Amazon recently introduced Batch Replication for S3, an option to replicate existing objects and synchronize buckets. The new feature is designed for use cases such as setting up disaster recovery, reducing latency, or transferring ownership of existing data.
While S3 Replication has been available since 2015, until now customers had to develop their own solutions for copying objects that were created before the replication rule was set up. Moreover, copying objects manually between buckets did not preserve metadata such as version ID or object creation time.
Source: https://aws.amazon.com/blogs/aws/new-replicate-existing-objects-with-amazon-s3-batch-replication/
Marcia Villalba, senior developer advocate at AWS, highlights the main use cases for the latest functionality:
Customers might want to copy their data to a new AWS Region for a disaster recovery setup. (...) Another reason to copy existing data comes from organizations that are expanding around the world. (...) One other common use case we see is customers going through mergers and acquisitions where they need to transfer ownership of existing data from one AWS account to another.
S3 Batch Replication can be used to replicate existing objects in several scenarios: objects that were added to a bucket before any replication rule was configured, objects that previously failed to replicate due to insufficient permissions, objects that were already replicated to a different destination bucket, and replicas of objects that were created by a replication rule. A minimal sketch of such a job is shown below.
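A Batch Replication job is created through the S3 Batch Operations API, either from a supplied manifest or from an auto-generated one. As a minimal sketch, assuming placeholder bucket ARNs, account ID, and IAM role (none of which appear in the announcement), a job with a generated manifest might look like this in boto3:

```python
import boto3

# Hypothetical identifiers for illustration only.
ACCOUNT_ID = "111122223333"
SOURCE_BUCKET_ARN = "arn:aws:s3:::example-source-bucket"
BATCH_ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/example-batch-replication-role"

s3control = boto3.client("s3control", region_name="us-east-1")

# Create a Batch Replication job; the manifest of existing objects is
# generated automatically instead of being supplied from S3 Inventory.
response = s3control.create_job(
    AccountId=ACCOUNT_ID,
    ConfirmationRequired=True,
    Operation={"S3ReplicateObject": {}},
    Priority=1,
    RoleArn=BATCH_ROLE_ARN,
    Report={
        "Bucket": "arn:aws:s3:::example-report-bucket",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "batch-replication-reports",
        "ReportScope": "AllTasks",
    },
    ManifestGenerator={
        "S3JobManifestGenerator": {
            "SourceBucket": SOURCE_BUCKET_ARN,
            "EnableManifestOutput": False,
            # Only queue objects never replicated or that failed previously.
            "Filter": {
                "EligibleForReplication": True,
                "ObjectReplicationStatuses": ["NONE", "FAILED"],
            },
        }
    },
)
print("Created job:", response["JobId"])
```

The replication rules already configured on the source bucket still determine the destination and replication settings; the Batch Replication job only selects which existing objects are queued for replication.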
Paul Meighan, senior manager at AWS, summarizes in a tweet:
Amazon S3 Batch Replication gives you an easy way to backfill a newly created bucket with existing objects, retry objects that were previously unable to replicate, migrate data across accounts, or add new buckets to your data lake.
The article "Replicating existing objects between S3 buckets" has been updated to reflect the latest feature, with Akhil Aendapally, senior solution architect at AWS, and Steven Dolan, enterprise support lead at AWS, recommending:
In order to monitor the replication status of your existing objects, configure Amazon S3 Inventory on the source bucket at least 48 hours prior to enabling the replication.
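S3 Inventory delivers a scheduled report that includes each object's replication status. As a sketch of that recommendation, assuming a placeholder source bucket and report destination (neither taken from the article), the inventory configuration could be set up as follows:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket names for illustration only.
SOURCE_BUCKET = "example-source-bucket"
REPORT_BUCKET_ARN = "arn:aws:s3:::example-inventory-reports"

# Enable a daily inventory report on the source bucket, including the
# ReplicationStatus field used to monitor existing objects.
s3.put_bucket_inventory_configuration(
    Bucket=SOURCE_BUCKET,
    Id="replication-status-inventory",
    InventoryConfiguration={
        "Id": "replication-status-inventory",
        "IsEnabled": True,
        "IncludedObjectVersions": "All",
        "Schedule": {"Frequency": "Daily"},
        "OptionalFields": ["ReplicationStatus"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": REPORT_BUCKET_ARN,
                "Format": "CSV",
                "Prefix": "inventory",
            }
        },
    },
)
```

Since inventory reports are generated on a daily or weekly schedule, configuring them at least 48 hours before enabling replication ensures that a report covering the existing objects is available once replication starts.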
Corey Quinn, cloud economist at The Duckbill Group, warns in his newsletter:
This is a remarkably strong candidate for what could potentially be "the most expensive API call in all of AWS." Be careful with this one!
On top of the storage costs for the replicated data in the destination bucket, customers are charged replication fees, data transfer fees, batch operations fees, an optional manifest generation fee, and Key Management Service (KMS) costs. S3 Batch Replication is available in all AWS regions.
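To make the cost components concrete, the sketch below sums them for a hypothetical job. Every unit price is a placeholder, not an actual AWS rate, and should be replaced with the current pricing for the source and destination regions:

```python
# Back-of-the-envelope cost model for a Batch Replication job.
# All unit prices are PLACEHOLDERS, not quoted AWS rates; substitute
# the current pricing for the regions and storage classes involved.

objects = 10_000_000          # existing objects to replicate
total_gb = 5_000              # total data volume in GB

storage_per_gb = 0.023        # destination storage, per GB-month (placeholder)
transfer_per_gb = 0.02        # inter-region data transfer, per GB (placeholder)
put_per_1k = 0.005            # replication PUT requests, per 1,000 (placeholder)
batch_job_fee = 0.25          # per Batch Operations job (placeholder)
batch_per_million = 1.00      # per million objects processed (placeholder)
manifest_per_million = 0.75   # optional manifest generation, per million (placeholder)

# KMS request charges are omitted; they apply when objects are SSE-KMS encrypted.
first_month_cost = (
    total_gb * storage_per_gb
    + total_gb * transfer_per_gb
    + objects / 1_000 * put_per_1k
    + batch_job_fee
    + objects / 1_000_000 * batch_per_million
    + objects / 1_000_000 * manifest_per_million
)
print(f"Estimated first-month cost: ${first_month_cost:,.2f}")
```

Even with modest placeholder rates, the per-object request fees dominate for buckets with many small objects, which illustrates Quinn's warning about how expensive a full backfill can become.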