Operations management Content on InfoQ
-
Growing from the Few to the Many: Scaling the Operations Organization at Facebook
Pedro Canahuati describes how Facebook's operations organization maintains its infrastructure, including challenges faced and lessons learned in prioritizing calls, managing technical debt, and incident management.
-
Evolution of the Netflix API
Ben Christensen describes the Netflix API's evolution into a web service platform serving all devices and users, and the challenges met in operations, deployment, performance, fault tolerance, and innovation.
-
Asgard, the Grails App that Deploys Netflix to the Cloud
Joe Sondow presents how Netflix uses Asgard to deploy code updates and manage resources in the Amazon cloud.
-
Actionable Metrics - Enabling Decision-Making in Netflix’s Decentralized Environment
Roy Rapoport discusses how Netflix uses metrics to monitor and manage its operating environment, along with some notes about its event management system.
-
Polyglot Parallelism: A Case Study in Using Erlang and Ruby at Rackspace
Phil Toland discusses using Erlang and Ruby to provide backup for 20k network devices running in 8 datacenters across 3 continents for Rackspace’s operations.
-
"Big Data" and the Future of DevOps
Ram C Singh discusses using Big Data for infrastructure telemetry, along with good practices and an autonomic engine, to create an autonomic computing infrastructure that might prevent downtime.
-
Scaling Devops - Breaking Down the Barriers between Development and IT Operations
Jez Humble discusses how to deal with risk management, regulatory compliance, ITIL, and audit requirements in a large organization that intends to adopt DevOps.
-
Silos Are for Farmers: Production Deployments Using All Your Team
Julian Simpson argues that dev and ops should be one team, achieved through collaboration, respecting everyone, having lunch together, co-location, discussing problems, joint retrospectives, etc.