AWS recently announced support for dead-letter queue redrive in SQS using the AWS SDK or the Command Line Interface. The new capability allows developers to move unconsumed messages out of an existing dead-letter queue and back to their source queue.
When errors occur, SQS moves the unconsumed message to a dead-letter queue (DLQ), allowing developers to inspect messages that are not consumed successfully and debug their application failures. Sébastien Stormacq, principal developer advocate at AWS, explains:
Each time a consumer application picks up a message for processing, the message receive count is incremented by 1. When ReceiveCount > maxReceiveCount, Amazon SQS moves the message to your designated DLQ for human analysis and debugging. You generally associate alarms with the DLQ to send notifications when such events happen.
Once the failed message has been debugged or the consumer application is available to consume it, developers can use the new redrive capability to move the messages back to the source queue programmatically, managing the lifecycle of unconsumed messages at scale in distributed systems.
In the past, it was only possible to redrive messages manually in the console, with Jeremy Daly, CEO and founder of Ampt, writing at the time:
It's not a feature, it's not an API, it's an "experience" only available in the AWS Console. Do I want it? Yes! Do I want to log in to the AWS Console to use it? Absolutely not.
To reprocess DLQ messages, developers can use the following API actions: StartMessageMoveTask, to start a new message movement task from the dead-letter queue; CancelMessageMoveTask, to cancel a message movement task; and ListMessageMoveTasks, to get the most recent message movement tasks (up to 10) for a specified source queue.
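As a rough illustration of how the three actions fit together, the Python sketch below wraps the corresponding boto3 SQS client methods (start_message_move_task, list_message_move_tasks and cancel_message_move_task). The helper names are hypothetical and the SQS client is passed in by the caller; the sketch also assumes that the first entry returned by the list call is the most recent task.

```python
from typing import Any, Optional


def start_redrive(sqs: Any, dlq_arn: str) -> str:
    """Start moving messages out of the DLQ and return the task handle."""
    # With no DestinationArn, messages go back to their original source queue.
    response = sqs.start_message_move_task(SourceArn=dlq_arn)
    return response["TaskHandle"]


def latest_task(sqs: Any, dlq_arn: str) -> Optional[dict]:
    """Return the first listed move task for the DLQ, or None if there is none."""
    response = sqs.list_message_move_tasks(SourceArn=dlq_arn, MaxResults=1)
    results = response.get("Results", [])
    return results[0] if results else None


def cancel_if_running(sqs: Any, dlq_arn: str) -> bool:
    """Cancel the listed move task if it is still running."""
    task = latest_task(sqs, dlq_arn)
    if task is None or task.get("Status") != "RUNNING":
        return False
    sqs.cancel_message_move_task(TaskHandle=task["TaskHandle"])
    return True


# Example usage (requires boto3 and AWS credentials; the ARN is a placeholder):
#   import boto3
#   sqs = boto3.client("sqs")
#   handle = start_redrive(sqs, "arn:aws:sqs:us-east-1:123456789012:orders-dlq")
```

Passing the client in, rather than creating it inside the helpers, keeps the functions easy to exercise against a stub before pointing them at a real queue.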
The feature has been well received by the community with Tiago Barbosa, head of cloud and platforms at MUSIC Tribe, commenting:
This is a nice improvement. One of the things I never liked about using DLQs was the need to build the mechanism to re-process the items that ended up there.
Benjamen Pyle, CTO at Curantis Solutions, wrote an article on how to redrive messages with Golang and Step Functions.
When configuring a DLQ redrive, it is possible to specify whether the messages should be sent back to their source queue or to another queue, using an ARN for the custom destination option. Luc van Donkersgoed, lead engineer at PostNL and AWS Serverless Hero, tweets:
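A minimal sketch of the custom destination option, assuming the boto3 SQS client and its start_message_move_task method with the optional DestinationArn parameter. The function name is hypothetical and the client is supplied by the caller.

```python
from typing import Any, Optional


def redrive_to(sqs: Any, dlq_arn: str, destination_arn: Optional[str] = None) -> str:
    """Start a redrive from the DLQ, optionally to a custom destination queue."""
    request = {"SourceArn": dlq_arn}
    if destination_arn is not None:
        # Custom destination: any queue ARN instead of the original source queue.
        request["DestinationArn"] = destination_arn
    # Omitting DestinationArn sends messages back to their source queue.
    return sqs.start_message_move_task(**request)["TaskHandle"]
```

Leaving the destination unset keeps the default behavior, so the same helper covers both cases described above.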
Just redrive to the original queue would have been nice. This is EXTRA nice because it allows us to specify any destination queue. That's a whole class of Lambda Functions... POOF, gone.
The documentation highlights a few limitations: SQS supports dead-letter queue redrive only for standard queues and does not support filtering or modifying messages while reprocessing them. Furthermore, a DLQ redrive task can run for a maximum of 36 hours, with a maximum of 100 active redrive tasks per account. Some developers instead question the lack of support in Step Functions.
SQS does not create a DLQ automatically: the queue must be created and configured before it can receive unconsumed messages.
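The setup step might look like the following Python sketch, which assumes the boto3 SQS client methods create_queue, get_queue_attributes and set_queue_attributes, and the RedrivePolicy queue attribute. The function name, queue names and the default maxReceiveCount of 5 are illustrative choices, not values from the announcement.

```python
import json
from typing import Any


def attach_dead_letter_queue(
    sqs: Any, source_queue_url: str, dlq_name: str, max_receive_count: int = 5
) -> str:
    """Create a DLQ and configure the source queue to use it; return the DLQ ARN."""
    # 1. A DLQ is an ordinary queue, so it is created like any other.
    dlq_url = sqs.create_queue(QueueName=dlq_name)["QueueUrl"]

    # 2. The redrive policy refers to the DLQ by ARN, not by URL.
    attributes = sqs.get_queue_attributes(
        QueueUrl=dlq_url, AttributeNames=["QueueArn"]
    )
    dlq_arn = attributes["Attributes"]["QueueArn"]

    # 3. Attach the policy to the source queue as a JSON-encoded attribute.
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    sqs.set_queue_attributes(
        QueueUrl=source_queue_url,
        Attributes={"RedrivePolicy": json.dumps(policy)},
    )
    return dlq_arn


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   sqs = boto3.client("sqs")
#   attach_dead_letter_queue(sqs, "https://sqs.../orders", "orders-dlq")
```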