Failure Content on InfoQ
-
Netflix's Principles of Chaos Engineering
Based on their experience with arbitrarily shutting down servers or simulating the shutdown of an entire data center in production, Netflix has proposed a number of principles of chaos engineering.
-
Getting Actions Done to Make Change Happen
Even with the best intentions, it can be challenging for people to follow up on actions they agreed to do. They may start to doubt whether they can carry out the actions and become afraid of failing. Several authors have recognized this and have come up with suggestions for dealing with it and making change happen.
-
Building 'Failure as a Service' at Netflix without the Simian Army
At QCon New York 2015, Kolton Andrus discussed Netflix's Failure Injection Testing (FIT) platform, which allows arbitrary failure scenarios to be injected and monitored for a targeted group of customers using the Netflix production web services. FIT allows Netflix to maintain an ‘antifragile’ programming culture, which results in systems that are resilient to failure.
-
Uncertainty in Agile and the Discovery Mindset
InfoQ interviewed Andrea Provaglio about business models for execution, optimization and discovery, dealing with uncertainty and leveraging it to create business value, understanding both value and cost, growing a discovery mindset, and creating a culture where people have the courage to make mistakes and can learn from them.
-
Experiment using Behavior Driven Development
Behavior Driven Development (BDD) uses examples, preferably in conversations, to illustrate behavior. Many people doing BDD focus on the tools, but having the conversations is more important than writing them down and automating them, according to Liz Keogh. This article explores using BDD to run experiments that deal with complex problems and lead to discoveries.
-
Anti-patterns for Handling Failure
Oliver Hankeln shares the anti-patterns he found for handling failure in organizations: hiding mistakes, engaging in the blame game, the arc of escalation, and cowardice. He then suggests corrective actions for each of them.
-
Using Pairing for Experimenting in Presentations
In the closing keynote of the Agile Eastern Europe 2015 conference, Yves Hanoulle experimented with pair presenting together with his son Joppe. InfoQ interviewed Joppe and Yves Hanoulle about doing experiments, checking the safety of the environment and ways to make it safer, learning from failure, and presenting in pairs at conferences.
-
Failure Injection Testing: Controlling Failure in Production
Netflix's Failure Injection Testing bridges the gap between isolated testing and unmitigated chaos testing by controlling the impact of the test. FIT establishes a context that other components of Netflix's production testing and infrastructure systems interpret, adjusting the behavior of the system accordingly.
-
Mindfulness and Situational Awareness in Organizations
To thoroughly remove waste from a process, you need flow to deliver just in time, as well as mindfulness and situational awareness in the organization to handle problems, with processes that have human intelligence built in. Organizations apply concepts from flow to develop what is needed when it is needed, and use pull to prevent inventories. What they also need is “Jidoka”: mindfulness and situational awareness.
-
How Netflix Handled the Reboot of 218 Cassandra Nodes
Amazon performed a major maintenance update at the end of September to patch a security vulnerability in the Xen hypervisor affecting about 10% of its global fleet of cloud servers. The update required rebooting those servers, with consequences for AWS users and the services they provide, including one of AWS's largest clients, Netflix.
-
Avoidance of Organizational Dysfunction Leads to Scrum Masters' Failure
Bob Marshall attributes the failure of Scrum Masters in most organizations to a lack of awareness, on the part of those adopting Scrum, of the Scrum Master's responsibility to tackle organizational dysfunction.
-
Leslie Lamport on Distributed Systems and Precise Thinking
Leslie Lamport is the author of some of the most cited computer science papers and won the 2013 Turing Award for his seminal work on distributed and concurrent systems. This is a summary of an interview that Lamport gave to Software Engineering Radio, touching on themes such as his early work in distributed systems and the importance of precise thinking in programming.
-
Fail Fast Means Learn Fast
Failing fast and often is one of the practices encouraged for agile teams. Sander Hoogendoorn, author of the book This is Agile, discusses on his blog the importance of having a strategy that helps you decide whether to abort a project, so that its failure is recognized at an early stage.
-
Working with Investors as a Lean Startup
Entrepreneurs using lean startup can work with investors to raise capital for their business. Business plans from lean startups often differ from those of traditional startups, and lean startup encourages learning from failure and pivoting, which might scare off investors. Can entrepreneurs and investors use the lean startup approach together to do fundraising?
-
Attitudes for Sustainable Lean Startup Teams
Ramli John gave an Ignite talk about the minimum viable attitudes for lean startup teams at the 2013 Lean Startup Conference. According to Ramli, three attitudes help teams stay lean sustainably over time: humbleness, hunger, and happiness.