At the AWS re:Invent 2016 conference, held in Las Vegas, USA, a distributed tracing service named AWS X-Ray was released in preview within all 12 public AWS Regions. In a similar fashion to Google’s Dapper, Twitter’s Zipkin and the OpenTracing API, AWS X-Ray helps developers analyse and debug distributed applications, such as those built using a microservices architectural style. A web-based UI is available that shows a topological ‘service map’, the distributed traces in graphical format, and a queryable list of all traces recorded.
As discussed by Jeff Barr on the AWS blog, the design and deployment of software applications have trended over the past several years towards complex distributed ‘service-based’ systems. Accordingly, the act of debugging software applications has changed, particularly when looking at patterns of behaviour at scale.
The combination of cloud computing, microservices, and asynchronous, notification-based architectures has brought forth systems that have hundreds or thousands of moving parts. The challenge of identifying and addressing performance issues in these complex systems has only grown, as has the difficulty of aggregating individual, service-level observations into meaningful top-level results.
The primary challenge for builders and operators of these types of cloud-based distributed systems has revolved around “following-the-thread” as execution traverses application services, containers, compute instances, database-as-a-service and messaging-as-a-service.
AWS X-Ray traces requests as they travel through an entire system deployed within an AWS environment. It aggregates the data generated by the individual services and resources that make up an application, providing an ‘end-to-end view of how the application is performing’. The tracing feature makes it possible to follow an arbitrary request path in order to pinpoint where within an application a performance issue is occurring. AWS X-Ray also provides annotations that attach metadata to traces, making it possible to tag and filter trace data.
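The idea of tagging and filtering trace data with annotations can be illustrated with a minimal Python sketch. This is not the AWS X-Ray SDK or API: the helper names `annotate` and `filter_by_annotation`, and the use of plain dictionaries with an `annotations` key, are assumptions made purely for illustration.

```python
# Hypothetical illustration only; not the AWS X-Ray SDK or its query API.

def annotate(segment, **annotations):
    """Return a copy of a trace record with extra key/value annotations attached."""
    updated = dict(segment)
    updated["annotations"] = {**segment.get("annotations", {}), **annotations}
    return updated


def filter_by_annotation(segments, key, value):
    """Keep only the records whose annotation `key` equals `value`."""
    return [s for s in segments if s.get("annotations", {}).get(key) == value]


# Example data (made up): two requests tagged with a customer tier.
traces = [
    annotate({"name": "checkout"}, customer_tier="gold"),
    annotate({"name": "search"}, customer_tier="free"),
]
gold_only = filter_by_annotation(traces, "customer_tier", "gold")
```

In this sketch, `gold_only` contains only the `checkout` record, which is the kind of narrowing-down that annotations are intended to support.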
According to the AWS blog post detailing the release, AWS X-Ray works with Amazon EC2, Amazon EC2 Container Service (Amazon ECS), AWS Elastic Beanstalk, and Amazon API Gateway. The AWS X-Ray SDK can be used with applications written in Java, Node.js, and .NET that are deployed on these services, and it is possible to trace requests made to applications that span multiple AWS accounts, AWS Regions, and Availability Zones. Support for AWS Lambda is coming soon.
AWS X-Ray implements ‘follow-the-thread’ tracing by adding an HTTP header (including a unique ID) to requests that do not already have one, and passing the header along to additional tiers of request handlers. The data collected at each point is called a segment (analogous to a Span within the OpenTracing API specification), and is stored as a chunk of JSON data. A segment represents a unit of work, and includes request and response timing, along with optional sub-segments that represent smaller work units. Adrian Cole, a core committer within the CNCF-backed OpenTracing initiative, noted on Twitter that the AWS X-Ray segment data format has “very little structure”.
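The header-propagation mechanism described above can be sketched in a few lines of Python. This is a simplified illustration and not the AWS X-Ray SDK: the header name `X-Amzn-Trace-Id`, the `Root=1-<hex timestamp>-<random hex>` ID layout, and the segment field names are assumptions that go beyond the announcement discussed here, and all function names are hypothetical.

```python
import json
import secrets
import time

# Assumed header name; the announcement only says "an HTTP header".
TRACE_HEADER = "X-Amzn-Trace-Id"


def new_trace_id(now=None):
    """Build a unique trace ID (assumed layout: version, hex epoch seconds, random hex)."""
    epoch = int(now if now is not None else time.time())
    return f"1-{epoch:08x}-{secrets.token_hex(12)}"


def ensure_trace_header(headers):
    """Add a trace header only if the request does not already carry one."""
    # Simplification: real HTTP header names are case-insensitive.
    out = dict(headers)
    if TRACE_HEADER not in out:
        out[TRACE_HEADER] = f"Root={new_trace_id()}"
    return out


def root_trace_id(header_value):
    """Extract the trace ID from a 'Root=...;Key=Value' style header value."""
    for part in header_value.split(";"):
        key, _, value = part.strip().partition("=")
        if key == "Root":
            return value
    return None


def make_segment(name, headers, start, end, subsegments=None):
    """Serialise one unit of work as a JSON segment carrying the shared trace ID."""
    segment = {
        "name": name,
        "id": secrets.token_hex(8),
        "trace_id": root_trace_id(headers[TRACE_HEADER]),
        "start_time": start,
        "end_time": end,
    }
    if subsegments:
        segment["subsegments"] = list(subsegments)
    return json.dumps(segment)
```

A front-end tier would call `ensure_trace_header` on an incoming request and forward the resulting headers unchanged to downstream tiers. Each tier then emits its own segment with the same trace ID, which is what allows the segments to be stitched into a single trace.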
The AWS X-Ray documentation states that a ‘statistically meaningful’ sample of the segments is routed to X-Ray. The AWS X-Ray SDK does not send trace data directly to the service; instead, it sends the data to an AWS X-Ray daemon that must be running on the associated EC2 instances or within each ECS container. The daemon collects segments for multiple requests and uploads them in batches. The tracing data that is collected can be viewed within the AWS X-Ray web-based UI, or accessed via the AWS X-Ray APIs and AWS CLI.
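The sample-then-batch flow described above can be approximated with the following Python sketch. It is not the actual X-Ray daemon or SDK: the class names `Sampler` and `BatchingDaemon`, the fixed sampling rate, and the batch size are all hypothetical, and the real service's sampling rules and transport are not modelled.

```python
import random


class Sampler:
    """Decide whether a request is traced, using a fixed probability (an assumption)."""

    def __init__(self, rate, rng=None):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0 and 1")
        self.rate = rate
        self._rng = rng or random.Random()

    def should_trace(self):
        # random() returns a value in [0.0, 1.0), so rate=1.0 always traces.
        return self._rng.random() < self.rate


class BatchingDaemon:
    """Collect segments locally and hand them to `upload` in batches."""

    def __init__(self, upload, batch_size=50):
        self._upload = upload          # callable that receives a list of segments
        self._batch_size = batch_size  # hypothetical threshold
        self._pending = []

    def submit(self, segment):
        """Buffer one segment and flush once the batch is full."""
        self._pending.append(segment)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        """Upload whatever is buffered, if anything."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._upload(batch)
```

Separating the sampler from the batching buffer mirrors the split described in the documentation: the application decides which requests to record, while a local process owns the cost of talking to the remote service.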
More information about AWS X-Ray can be found in the AWS blog post titled “AWS X-Ray – See Inside of Your Distributed Application”, on the AWS X-Ray product page, and in the AWS X-Ray Documentation. A summary of other AWS re:Invent announcements and product releases can be found within the "AWS re:Invent Recap" InfoQ article.