Stefan Thies, DevOps Evangelist at Sematext, discusses in a recent post ten important container monitoring metrics and their implications for operating Docker containers, specifically when running many containers per host. Combined in a single correlated view, these metrics provide a starting point for monitoring Docker-based environments.
According to the author, the rationale for proposing a dedicated set of metrics for operating containers is that containers pose monitoring challenges of their own, since they behave differently from VMs: "Traditional monitoring solutions take metrics from each server and applications they run. These servers and applications running on them are typically very static, with very long uptimes."
Containers, in contrast:
- can be short-lived and dynamically scheduled;
- are processes but have their own environments, virtual networking, and different storage management;
- share resources of the underlying host, with possibly short-lived batch jobs and long-running processes scheduled on the same host.
The metrics Thies proposes can be grouped into container-based and host-based metrics:
Container-based metrics
Resource sharing requires imposing sensible container quotas, which in turn require visibility into container resource usage. According to a survey, a Docker host typically runs five containers. Moreover, container orchestration solutions such as Docker Swarm, Kubernetes or Mesos allow for efficient scheduling of many containers on clusters of hosts, so container behaviour should be monitored and tuned accordingly.
- Container CPU – Throttled CPU Time: Observing the total time that a container’s CPU usage is throttled provides the information needed to correctly set CPU shares in Docker (setting and reading these values is illustrated in the sketches after this list). The kernel will throttle container CPU time only when the host CPU is maxed out. "A spike of this metric is typically a good indication of one or more containers needing more CPU power than the host can provide."
- Container Memory Usage, Container Swap and Container Memory Fail Counter: Spikes in these metrics suggest that containers need more memory than allocated. Thies suggests using container memory limits to ensure that applications do not use too much memory and thereby affect other containers on the same host. However, as the Docker documentation adds: "Note that the memory control group adds a little overhead, because it does very fine-grained accounting of the memory usage on your host."
- Container Disk I/O: Multiple containers can use the same host resources concurrently. Monitoring helps with assigning "higher throughput to critical applications like data stores or web servers, while throttling disk I/O for batch operations."
- Container Network Metrics: Monitoring the virtual network is important for interconnected containers such as containerised load balancers, Thies contends. Dropped packets should be tracked, but "in addition, network traffic might be a good indicator how much applications are used by clients and sometimes you might see high spikes, which could indicate denial of service attacks, load tests, or a failure in client apps."
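To make the quota side of this concrete, here is a minimal sketch of starting a container with the kinds of limits discussed above (CPU shares, a memory limit and a block I/O weight), using the Docker SDK for Python; the image name and all limit values are illustrative assumptions that would need tuning based on the observed metrics.

```python
# Minimal sketch: starting a container with resource quotas via the Docker SDK
# for Python (the `docker` package); image name and limit values are illustrative.
import docker

client = docker.from_env()  # connects to the local Docker daemon

container = client.containers.run(
    "nginx:alpine",        # hypothetical workload image
    detach=True,
    name="web-limited",
    cpu_shares=512,        # relative CPU weight, only enforced when the host CPU is contended
    mem_limit="256m",      # hard memory limit for the container
    memswap_limit="256m",  # memory + swap limit; equal to mem_limit effectively disables swap
    blkio_weight=300,      # relative block I/O weight (10-1000), lower for batch jobs
)
print(container.short_id)
```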
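The container-level numbers themselves (throttled CPU time, memory usage, the memory fail counter, dropped packets) come from the Docker stats API; the sketch below pulls one snapshot per running container with the same SDK. The field names follow the stats payload on a cgroup v1 host and may be absent elsewhere, hence the defensive lookups.

```python
# Minimal sketch: one-shot collection of the per-container metrics discussed above
# from the Docker stats API; field availability varies with the cgroup version.
import docker

client = docker.from_env()

for c in client.containers.list():              # running containers only
    s = c.stats(stream=False)                   # single stats snapshot as a dict

    throttled_ns = (s.get("cpu_stats", {})
                      .get("throttling_data", {})
                      .get("throttled_time", 0))          # total throttled CPU time in ns

    mem = s.get("memory_stats", {})
    usage, limit = mem.get("usage", 0), mem.get("limit", 0)
    failcnt = mem.get("failcnt", 0)                        # memory fail counter (cgroup v1)

    dropped = sum(ifc.get("rx_dropped", 0) + ifc.get("tx_dropped", 0)
                  for ifc in s.get("networks", {}).values())

    print(f"{c.name}: throttled={throttled_ns}ns mem={usage}/{limit} "
          f"failcnt={failcnt} dropped_packets={dropped}")
```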
Host-based metrics
Compared to containers, Docker hosts are long-lived and should therefore be capacity-managed and resource-optimised, especially when running multiple workloads on a single host.
- Host CPU and Host Memory: Understanding the CPU and memory utilisation of hosts and containers helps optimise the resource usage of Docker hosts, the author writes: "When the resource usage is optimised, a high CPU utilisation might actually be expected and even desired, and alerts might make sense only for when CPU utilisation drops (service outages) or increases for a longer period over some max limit (e.g. 85%)."
- Host Disk Space: Monitoring and alerting on disk space are essential, since containers, images, and host-mounted volumes all consume disk space, says Thies. It is good practice to reclaim host disk space by regularly removing unused containers and images (see the sketches after this list).
- Total Number of Containers Running on a host: In static use cases, knowing the current and historical number of containers helps during updates to check that everything is running as it was before deployment. In more dynamic scenarios, however, where cluster managers like Kubernetes automatically schedule different types of workloads, that metric might change irregularly. Thies therefore suggests using anomaly detection to alert on sudden changes.
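For the host-based side, a rough sketch of collecting host CPU, memory and disk-space utilisation together with the number of running containers follows, using psutil alongside the Docker SDK for Python; the 85% thresholds mirror the example limit quoted above, and the /var/lib/docker path assumes the default Docker data root.

```python
# Minimal sketch: host-level CPU, memory and disk-space utilisation plus the
# number of running containers; thresholds and paths are illustrative.
import docker
import psutil

client = docker.from_env()

cpu_pct = psutil.cpu_percent(interval=1)            # host CPU utilisation in %
mem_pct = psutil.virtual_memory().percent           # host memory utilisation in %
disk = psutil.disk_usage("/var/lib/docker")         # default Docker data root
running = len(client.containers.list())             # current number of running containers

print(f"cpu={cpu_pct}% mem={mem_pct}% disk={disk.percent}% containers={running}")

# Example alert condition following the 85% limit mentioned above.
if cpu_pct > 85 or disk.percent > 85:
    print("WARNING: host resource usage above example threshold")
```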
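As a companion to the disk-space advice, the periodic clean-up of unused containers and images could be scripted with the SDK's prune calls (the equivalent of docker container prune and docker image prune); how aggressively to prune is an operational assumption.

```python
# Minimal sketch: reclaiming host disk space by removing stopped containers
# and dangling images, as suggested for regular house-keeping.
import docker

client = docker.from_env()

pruned_containers = client.containers.prune()                      # removes stopped containers
pruned_images = client.images.prune(filters={"dangling": True})    # removes untagged image layers

reclaimed = (pruned_containers.get("SpaceReclaimed") or 0) + \
            (pruned_images.get("SpaceReclaimed") or 0)
print(f"reclaimed {reclaimed} bytes of disk space")
```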
The author recommends using modern monitoring tools that can cope with the dynamic nature of containers. Sematext provides a monitoring solution that integrates monitoring, logging and events in a single view, enabling time correlation between them. The Docker Agent extends that solution for monitoring containers, including features such as container auto-detection and collection of Docker events, logs and metrics.
Other options for monitoring Docker containers include cAdvisor, sysdig and DataDog. Comparisons of some of these tools have been published in previous posts by Rancher and InfoWorld.