AWS recently introduced Incident Manager, a new capability of AWS Systems Manager that helps customers prepare for and respond to application and infrastructure incidents.
Incident Manager automates response plans using runbook actions, incident updates, and chat-based collaboration, and it automatically notifies the assigned contacts. Julien Simon, global technical evangelist at AWS, explains the main use cases for the new service:
As pagers go wild, on-duty engineers scramble to restore service, and every second counts (...) You can’t afford to waste any time locating and accessing the appropriate runbooks and procedures (...) Serious issues often require escalations. Although it’s great to get help from team members, collaboration and a speedy resolution require efficient communication. Without it, uncoordinated efforts can lead to mishaps that confuse or worsen the situation. Last but not least, it’s equally important to document the incident and how you responded to it.
When a new incident is triggered, a dashboard is automatically created in the Systems Manager console that serves as the point of reference for all components involved in managing the escalation. The dashboard includes an overview of the incident, CloudWatch metrics and alarms, and a timeline of all events added by Incident Manager along with any custom events added manually by responders. The service can notify responders through Slack and supports automated runbooks.
In a series of two articles, Harshitha Putta, senior cloud infrastructure architect at AWS, and Guyu Ye, cloud architect at AWS, show how customers can use Incident Manager to mitigate failures by creating escalation plans and integrating with Amazon CloudWatch. They explain that the new release stems from an internal project at Amazon:
Customers often ask us how we manage incidents internally. To simplify incident response management, we have just released a new AWS Systems Manager capability, Incident Manager, that incorporates the best practices we follow for internal incident management at Amazon.
Source: https://aws.amazon.com/blogs/aws/resolve-it-incidents-faster-with-incident-manager-a-new-capability-of-aws-systems-manager/
A few users question the official name AWS Systems Manager Incident Manager. Gabriel Mangiurea, software development engineer at Adobe, tweets:
I'm quite certain there is an AWS Systems Manager Future-product Manager Naming Manager Wheel of fortune Manager that manages the management of the naming for future products in a manageable way, with all the Cloudwatch logs and stuff.
Others, such as a user in a Reddit thread, are surprised by the apparent lack of an on-call system:
I can’t seem to work out if it has an on-call functionality? I can see there’s escalation plans but not if you can rotate the contacts.
The new service is priced by the number of response plans and by SMS and voice messages, starting at $7 per month per response plan, with 100 SMS or voice messages included.