PagerDuty has released a number of updates and enhancements to its incident response platform. These include new integrations with Amazon DevOps Guru, AWS Control Tower, and Microsoft Teams. Other changes include improvements to mapping failures back to changes, automated triggers, and content-based alert grouping.
Amazon DevOps Guru is a machine-learning service that detects when behaviours deviate from normal operating patterns. Once a deviation is detected, such as elevated error rates, increased latency, or resource constraints, an alert is generated with details on the issue and potential remediations. It is now possible to integrate these alerts with PagerDuty via Amazon Simple Notification Service (SNS). Once the event is forwarded into PagerDuty, it can be used to notify the appropriate on-call personnel.
The integration with AWS Control Tower simplifies how organizations with multiple AWS accounts can respond to incidents. AWS Control Tower enables governance and compliance across multiple AWS accounts, providing organizations with a way to ensure consistency across their accounts. As with DevOps Guru, Control Tower uses SNS to send notifications. These can be integrated into PagerDuty to trigger incidents when noncompliant resources are detected within an account.
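Both the DevOps Guru and Control Tower integrations rest on the same idea: an SNS notification is turned into a PagerDuty event. The actual integrations are configured by subscribing a PagerDuty integration endpoint to the SNS topic, so no custom code is required. Purely as an illustration of the mapping, the following minimal sketch converts a standard SNS notification envelope into an Events API v2 trigger event. The function name, the default severity, and the choice of which SNS fields feed which event fields are assumptions made for this example, not PagerDuty's implementation.

```python
import json

# Events API v2 endpoint for alert events (as opposed to change events).
PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def sns_to_pagerduty_event(sns_body: str, routing_key: str,
                           severity: str = "warning") -> dict:
    """Map an SNS notification envelope to an Events API v2 trigger event.

    Hypothetical helper: the field mapping below is an illustrative choice.
    """
    envelope = json.loads(sns_body)
    if envelope.get("Type") != "Notification":
        raise ValueError("expected an SNS message of Type 'Notification'")

    # SNS subjects are optional, so fall back to the start of the message body.
    summary = envelope.get("Subject") or envelope["Message"][:1024]

    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Reusing the SNS message ID means redelivered messages deduplicate.
        "dedup_key": envelope["MessageId"],
        "payload": {
            "summary": summary,
            "source": envelope["TopicArn"],
            "severity": severity,
            "timestamp": envelope["Timestamp"],
            "custom_details": {"message": envelope["Message"]},
        },
    }
```

The resulting dictionary could then be serialized and posted to `PAGERDUTY_EVENTS_URL` with a routing key from the target PagerDuty service.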
The new integration with Microsoft Teams allows for incorporating PagerDuty's incident management tooling into Teams. This includes sending incident notifications directly into specific team channels. It is also possible for incidents to be created directly from within team channels.
This release also adds new integrations for the change impact mapping feature. PagerDuty can ingest change events from CI/CD pipelines and code repositories to help identify which changes may have caused failures. The new integrations are with Ansible, Buildkite, GitLab, Jenkins, Rundeck, and ServiceNow.
Change events can represent any number of changes such as a deployment, build completion, or configuration updates. Alongside the change event integrations, it is also possible to send events via the Events API v2. For example, a new change event representing a build finishing could be sent to https://events.pagerduty.com/v2/change/enqueue
with the following payload data:
{
  "routing_key": "samplekeyhere",
  "payload": {
    "summary": "Build Success: Increase snapshot create timeout to 30 seconds",
    "timestamp": "2020-07-17T08:42:58.315+0000",
    "source": "acme-build-pipeline-tool-default-i-9999",
    "custom_details": {
      "build_state": "passed",
      "build_number": "2",
      "run_time": "1236s"
    }
  },
  "links": [
    {
      "href": "https://acme.pagerduty.dev/build/2",
      "text": "View more details in Acme!"
    }
  ]
}
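As a minimal sketch of how a build pipeline might send the payload above, the following Python uses only the standard library. The helper names `build_change_event` and `send_change_event` are hypothetical and chosen for this example; the routing key is the placeholder from the sample payload and would need to be replaced with a real integration key.

```python
import json
import urllib.request

# Change events endpoint from the article.
CHANGE_EVENTS_URL = "https://events.pagerduty.com/v2/change/enqueue"


def build_change_event(routing_key, summary, source, timestamp,
                       custom_details=None, links=None):
    """Assemble a change event body shaped like the sample payload above."""
    payload = {"summary": summary, "source": source, "timestamp": timestamp}
    if custom_details:
        payload["custom_details"] = custom_details
    event = {"routing_key": routing_key, "payload": payload}
    if links:
        event["links"] = links
    return event


def send_change_event(event, url=CHANGE_EVENTS_URL, timeout=10.0):
    """POST the event as JSON and return the HTTP status code.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    event = build_change_event(
        routing_key="samplekeyhere",  # placeholder, not a working key
        summary="Build Success: Increase snapshot create timeout to 30 seconds",
        source="acme-build-pipeline-tool-default-i-9999",
        timestamp="2020-07-17T08:42:58.315+0000",
        custom_details={"build_state": "passed", "build_number": "2",
                        "run_time": "1236s"},
        links=[{"href": "https://acme.pagerduty.dev/build/2",
                "text": "View more details in Acme!"}],
    )
    print(json.dumps(event, indent=2))
    # send_change_event(event)  # uncomment once a real routing key is set
```

Separating payload construction from the network call keeps the event shape easy to inspect or test without contacting PagerDuty.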
With automated triggers, responders can automate specific actions in response to incidents and alerts. This is done by defining a webhook using event rules and then sending a custom payload to the specified endpoint. Rules can be created through the UI, the Rulesets API, or the new Service Events Rules API, and can be scheduled to be active only during certain time blocks. This feature is part of the Event Intelligence add-on package, which is purchased separately from the core product.
Other changes include improvements to how alerts can be grouped. Alerts can now be grouped based on an exact match of one or more fields, combined with logic such as "for all" and "for any". An alert grouping preview shows which alerts would have been grouped over the past 45 days to help fine-tune the grouping rules. There is also a new audit trail reporting UI that provides audit records of configuration changes over the past year.
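To make the "for all" and "for any" logic concrete, here is a small sketch of exact-match grouping over alert fields. It is a simplified illustration, not PagerDuty's algorithm: the function names, the representation of alerts as plain dictionaries, and the rule that each new alert is compared against the first alert of every existing group are all assumptions made for this example.

```python
from typing import Dict, List, Sequence

Alert = Dict[str, str]


def fields_match(a: Alert, b: Alert, fields: Sequence[str],
                 mode: str = "all") -> bool:
    """Compare two alerts on the given fields using exact string equality.

    mode="all": every listed field must be present in both and equal.
    mode="any": at least one listed field must be present in both and equal.
    """
    if not fields:
        raise ValueError("at least one field is required")
    comparisons = [f in a and f in b and a[f] == b[f] for f in fields]
    if mode == "all":
        return all(comparisons)
    if mode == "any":
        return any(comparisons)
    raise ValueError("mode must be 'all' or 'any'")


def group_alerts(alerts: Sequence[Alert], fields: Sequence[str],
                 mode: str = "all") -> List[List[Alert]]:
    """Place each alert in the first group whose first alert it matches."""
    groups: List[List[Alert]] = []
    for alert in alerts:
        for group in groups:
            if fields_match(group[0], alert, fields, mode):
                group.append(alert)
                break
        else:
            # No existing group matched, so this alert starts a new one.
            groups.append([alert])
    return groups


if __name__ == "__main__":
    alerts = [
        {"source": "db-1", "class": "latency"},
        {"source": "db-1", "class": "errors"},
        {"source": "db-2", "class": "latency"},
    ]
    print(len(group_alerts(alerts, ["source", "class"], mode="all")))
    print(len(group_alerts(alerts, ["source", "class"], mode="any")))
```

With the sample alerts, "all" keeps every alert separate because no two share both fields, while "any" merges them into one group because each later alert shares either the source or the class with the first.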
More details on these and other changes can be found on the PagerDuty blog. A 14-day free trial is available for those interested in taking a closer look at these changes.