AWS announced the release of CloudTrail Lake, a fully managed data lake for storing and analyzing CloudTrail logs. CloudTrail Lake can aggregate logs across regions and accounts. Once in the lake, the logs can be queried using SQL syntax.
CloudTrail Lake provides a single location for aggregating logs from multiple regions and multiple accounts. Multiple lakes can be created to isolate logs by region if desired. Multi-account support is only possible for accounts under the same AWS Organization. This improves on the query functionality of CloudTrail event history, which can only pull from a single region within a single account. Event history also cannot filter on more than one attribute at a time.
The logs are stored immutably with a default retention period of seven years. The retention period is adjustable anywhere between seven days and seven years. It is currently possible to collect both management events and data events. Management events include all control plane operations such as configuring security, registering devices, configuring rules for routing data, and setting up logging. Data events, which CloudTrail trails do not log by default, cover data plane operations such as S3 API activity on buckets and objects, Lambda function execution activity, or DynamoDB object-level API activity.
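These settings correspond to options on the AWS CLI's `create-event-data-store` command (`--retention-period`, `--multi-region-enabled`, `--organization-enabled`). The Python sketch below is a hypothetical helper, not part of any AWS SDK, that builds that command line and rejects retention values outside the stated range. It assumes "seven years" is expressed as 7 × 365 = 2,555 days, so check the current API limit before relying on the upper bound.

```python
# Hypothetical helper: builds argv for `aws cloudtrail create-event-data-store`.
MIN_RETENTION_DAYS = 7
MAX_RETENTION_DAYS = 7 * 365  # assumption: "seven years" treated as 2,555 days


def create_event_data_store_args(name, retention_days=MAX_RETENTION_DAYS,
                                 multi_region=True, organization=False):
    """Return the CLI argument list, rejecting out-of-range retention periods."""
    if not MIN_RETENTION_DAYS <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError(
            f"retention must be between {MIN_RETENTION_DAYS} and "
            f"{MAX_RETENTION_DAYS} days, got {retention_days}"
        )
    return [
        "aws", "cloudtrail", "create-event-data-store",
        "--name", name,
        "--retention-period", str(retention_days),
        # Boolean options come in --flag / --no-flag pairs in the AWS CLI.
        "--multi-region-enabled" if multi_region else "--no-multi-region-enabled",
        # Organization-wide collection only covers accounts in the same AWS Organization.
        "--organization-enabled" if organization else "--no-organization-enabled",
    ]


if __name__ == "__main__":
    # Print the command for a 90-day, multi-region, single-account store.
    print(" ".join(create_event_data_store_args("my-lake", retention_days=90)))
```

The returned list can be handed directly to `subprocess.run`, which avoids shell quoting issues with the store name.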
Once stored within the lake, the logs can be queried using standard SQL syntax. As the data lake is immutable, only SELECT queries are permitted. The service includes a number of sample queries which can be used as templates. For example, to show all the recorded API activity for a specific IAM access key you could use the following sample query:
```sql
SELECT eventTime, eventName, userIdentity.principalId
FROM $EDS_ID
WHERE userIdentity.accessKeyId like 'AKIAXZUQIC6XEVCJJFM7'
```
Note that the event data store ID is used as the table name within the query. Another sample query will show any security group changes after a certain time:
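When such a query is generated from a script rather than typed into the console, the ID has to be spliced into the `FROM` clause as text. A minimal Python sketch, assuming event data store IDs consist only of letters, digits and hyphens, and that embedded single quotes in string literals are escaped by doubling them (standard SQL); `access_key_activity_query` and `sql_literal` are hypothetical names.

```python
import re

# Assumption: event data store IDs are UUID-like (letters, digits, hyphens only).
_EDS_ID = re.compile(r"[A-Za-z0-9-]+")


def sql_literal(value):
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def access_key_activity_query(eds_id, access_key_id):
    """Build the sample access-key query, using the data store ID as the table name."""
    # The ID is inserted as an identifier, so reject anything unexpected.
    if not _EDS_ID.fullmatch(eds_id):
        raise ValueError(f"unexpected event data store ID: {eds_id!r}")
    return (
        "SELECT eventTime, eventName, userIdentity.principalId\n"
        f"FROM {eds_id}\n"
        f"WHERE userIdentity.accessKeyId like {sql_literal(access_key_id)}"
    )


if __name__ == "__main__":
    print(access_key_activity_query("a1b2c3d4-5678-90ab-cdef-EXAMPLE11111",
                                    "AKIAXZUQIC6XEVCJJFM7"))
```

Validating the identifier and quoting the value separately keeps the two injection points distinct: the table name cannot be quoted as a string literal, so it is checked against a whitelist pattern instead.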
```sql
SELECT eventname, useridentity.username, sourceIPAddress, eventtime, element_at(requestParameters, 'groupId') as SecurityGroup, element_at(requestParameters, 'ipPermissions') as ipPermissions
FROM $EDS_ID
WHERE (element_at(requestParameters, 'groupId') like '%sg-%')
and eventtime > '2017-11-01T00:00:00Z'
order by eventtime asc;
```
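Hard-coding a date, as the sample does, suits a one-off investigation; a recurring check usually wants a relative window such as "the last seven days". A minimal Python sketch, assuming the same `YYYY-MM-DDTHH:MM:SSZ` UTC string format that the sample compares `eventtime` against; `eventtime_cutoff` and `security_group_changes_query` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def eventtime_cutoff(days_back, now=None):
    """Return a UTC timestamp string like '2017-11-01T00:00:00Z' for days_back days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now.astimezone(timezone.utc) - timedelta(days=days_back)
    return cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")


def security_group_changes_query(eds_id, days_back, now=None):
    """Fill the sample security-group query with a relative cutoff."""
    # Assumption: IDs contain only letters, digits and hyphens.
    if not eds_id or not all(c.isalnum() or c == "-" for c in eds_id):
        raise ValueError(f"unexpected event data store ID: {eds_id!r}")
    return (
        "SELECT eventname, useridentity.username, sourceIPAddress, eventtime, "
        "element_at(requestParameters, 'groupId') as SecurityGroup, "
        "element_at(requestParameters, 'ipPermissions') as ipPermissions\n"
        f"FROM {eds_id}\n"
        "WHERE (element_at(requestParameters, 'groupId') like '%sg-%')\n"
        f"and eventtime > '{eventtime_cutoff(days_back, now)}'\n"
        "order by eventtime asc;"
    )
```

The optional `now` argument makes the output deterministic for testing; in normal use it defaults to the current UTC time.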
The query editor lists all the available event properties that can be queried. Within the console, queries can be saved for future access. A log of recent queries is also available for review.
The release also includes a number of CLI commands for creating, querying, and working with CloudTrail Lake. For example, the command `aws cloudtrail list-event-data-stores` will show all event data stores within a given account. Queries can be started via the CLI as well, using `aws cloudtrail start-query`. The status of a query can be obtained with `aws cloudtrail describe-query`, and if the run was successful, the results can be retrieved with `aws cloudtrail get-query-results`.
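Chained together, these commands form a start, poll, fetch loop. The Python sketch below shells out to the AWS CLI through `subprocess`. It assumes the JSON output fields `QueryId`, `QueryStatus` and `QueryResultRows`, that `QUEUED` and `RUNNING` are the only non-terminal statuses, and that each result row arrives as a list of single-key objects. It reads only the first page of results (larger result sets return a `NextToken` that would need further calls), and `run_lake_query` is an illustrative name rather than an AWS API.

```python
import json
import subprocess
import time

PENDING = {"QUEUED", "RUNNING"}  # assumption: every other status is terminal


def run_aws(args):
    """Run an AWS CLI command and parse its JSON output."""
    out = subprocess.run(["aws", *args, "--output", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def run_lake_query(eds_id, sql, runner=run_aws, sleep=time.sleep, poll_seconds=2.0):
    """start-query, poll describe-query, then fetch the first page of results."""
    query_id = runner(["cloudtrail", "start-query",
                       "--query-statement", sql])["QueryId"]
    ids = ["--event-data-store", eds_id, "--query-id", query_id]
    # Poll until the query leaves the queued/running states.
    while True:
        status = runner(["cloudtrail", "describe-query", *ids])["QueryStatus"]
        if status not in PENDING:
            break
        sleep(poll_seconds)
    if status != "FINISHED":
        raise RuntimeError(f"query {query_id} ended with status {status}")
    page = runner(["cloudtrail", "get-query-results", *ids])
    # Merge each row's list of single-key objects into one dict per row.
    return [{k: v for cell in row for k, v in cell.items()}
            for row in page.get("QueryResultRows", [])]
```

The `runner` and `sleep` parameters are there so the polling logic can be exercised with a fake CLI and no AWS credentials; in real use the defaults call the `aws` binary and wait between status checks.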
CloudTrail Lake is available in most AWS regions and can be enabled via the CloudTrail console, an AWS SDK, or the AWS CLI. More details on using CloudTrail Lake can be found in the documentation.