Cucumber was created to overcome ambiguous requirements and misunderstandings, and it targets both non-technical and technical members of a project team. But if you think Cucumber is a testing tool, you are wrong, Aslak Hellesøy, who created Cucumber in 2008, stated a few years ago. In an interview with InfoQ he describes his experiences using Behaviour-Driven Development (BDD) and Cucumber, and what he thinks about the future of the tool, which is now 10 years old.
InfoQ: Can you briefly describe Cucumber as of today, and its connection to Behaviour-Driven Development (BDD)?
Aslak Hellesøy: Cucumber is a tool that supports BDD, which is a variant of TDD (Test-Driven Development). With BDD, *all* the tests are customer acceptance tests, written in plain (human) language so that non-technical stakeholders can understand them. Cucumber combines requirements specifications, automated tests and living documentation into a single format called Gherkin, which is just plain English with a little more structure.
The benefits you can get from BDD (if you do it well) are less rework, fewer bugs and more maintainable code. In order to reap those benefits you have to invest some effort in exploring requirements and designing the software to be testable. Exploring requirements typically pays off quicker, while designing testable software tends to pay off in the long run. BDD helps teams discover mistakes quickly, which makes software development more enjoyable and sustainable.
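To make the "plain English with a little more structure" idea concrete, the following Python sketch holds a small Gherkin scenario as text and maps each Given-When-Then line onto a function. The scenario wording, the `step` decorator and the `run_scenario` helper are all invented for illustration; this is a toy under those assumptions, not how Cucumber itself is implemented.

```python
import re

# A hypothetical Gherkin scenario, held as plain text.
FEATURE = """\
Feature: Shopping basket
  Scenario: Adding items
    Given an empty basket
    When I add 2 apples
    And I add 3 apples
    Then the basket contains 5 apples
"""

STEPS = []  # (compiled pattern, function) pairs


def step(pattern):
    """Register a function as the automation code for matching step text."""
    def register(func):
        STEPS.append((re.compile(pattern + r"$"), func))
        return func
    return register


@step(r"an empty basket")
def empty_basket(world):
    world["apples"] = 0


@step(r"I add (\d+) apples")
def add_apples(world, count):
    world["apples"] += int(count)


@step(r"the basket contains (\d+) apples")
def check_basket(world, expected):
    assert world["apples"] == int(expected)


def run_scenario(text):
    """Run every Given/When/Then/And/But line against the registered steps."""
    world = {}
    for line in text.splitlines():
        match = re.match(r"\s*(Given|When|Then|And|But) (.+)", line)
        if not match:
            continue  # Feature:, Scenario: and blank lines carry no step
        step_text = match.group(2)
        for pattern, func in STEPS:
            found = pattern.match(step_text)
            if found:
                func(world, *found.groups())
                break
        else:
            raise LookupError(f"No step definition for: {step_text}")
    return world
```

Calling `run_scenario(FEATURE)` executes the four steps in order and returns the final state, `{"apples": 5}`. The same text serves as a readable specification and as an automated check, which is the combination described above.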
InfoQ: Why did you create Cucumber 10 years ago?
Hellesøy: I had been a software consultant for 10 years before I created Cucumber. For the first five years of my career (1998-2003), all of the projects I was on had the same problems:
- I often didn't know whether I had fully understood the requirements
- I found it difficult to build the confidence that what we were building would work
- I was scared to change existing code, not knowing what would break
- We were constantly late
Five years before I created Cucumber (2003) I became involved in three overlapping communities that contributed a lot of new ideas to what's now known as BDD. During those five years I had the opportunity to work on some really productive teams where I didn't experience the problems from the beginning of my career. We delivered, we were confident, and we had fun.
The first community was the XP community, where TDD came from. Fit by Ward Cunningham and Fitnesse by Robert C. Martin were also a great inspiration to me.
The second community was within ThoughtWorks, where I used to work at the time. Dan North, Chris Matts, Liz Keogh, Joe Walnes, Paul Hammant and Chris Stevenson were exploring variants of TDD with more emphasis on language, trying to bridge the gap between business requirements and programming.
The third community was the Ruby community where David Chelimsky, Steven Baker and Dave Astels built RSpec, which was the first BDD tool that had significant traction.
Dan North and David Chelimsky implemented support for an early version of the Given-When-Then syntax in RSpec. I fell in love with the idea of executable specifications and wanted to take it further, but I felt constrained by the design decisions in RSpec itself, which has a more low-level focus.
I extracted the Given-When-Then parser from RSpec into a separate library and started making lots of improvements to make it more user friendly, such as good error messages, printing snippets, adding colours to the output, extending the grammar, making it faster etc.
My wife suggested I call it Cucumber (for no particular reason), so that's how it got its name. I also decided to give the Given-When-Then syntax a name, to separate it from the tool. That's why it's called Gherkin (a small, pickled cucumber).
InfoQ: How has the adoption or usage of Cucumber in the community been during these years?
Hellesøy: Beyond all expectations! Adoption has grown steadily for 10 years, doubling the number of users every 18 months. The growth isn't yet showing signs of slowing down. In 2017 alone Cucumber was downloaded 20 million times (combining the Java, .NET, Ruby, JavaScript, PHP and Python implementations).
The first wave of growth came with Ruby on Rails around 2008-2011.
The second wave started in 2011 after Matt Wynne and I published The Cucumber Book, which has sold nearly 20k copies. Our book brought Cucumber and BDD to a new audience. Product owners and business analysts started to realise that this could be a way to reduce misunderstandings around requirements. Gojko Adzic's books "Bridging the Communication Gap" and "Specification by Example" also helped spread that mindset.
Our book also brought Cucumber to the tester communities, who saw it as a way to automate tests. Ironically, this also led to the misconception that BDD and Cucumber are for testing, and testing only. When Dan North came up with BDD, his goal was to make TDD more accessible, making it easier for developers to understand that TDD is a technique for designing and developing software, not testing it.
But if all you have is a hammer, everything looks like a nail I guess.
Today, Cucumber has become part of the standard toolset of many organisations. Some of them use it for BDD, but most of them use it just for "BDD testing", whatever that means. It's right there in the name — Behaviour-Driven *Development*.
InfoQ: How has Cucumber evolved during the 10 years since it was created?
Hellesøy: A couple of years after the initial Ruby version of Cucumber was released in 2008, more language implementations became available:
- Gaspar Nagy created SpecFlow for .NET
- Julien Biezemans created Cucumber.js for JavaScript and Node.js
- Konstantin Kudryashov created Behat for PHP
- I created Cucumber-JVM for Java and other JVM languages
Today there is a Cucumber implementation in every major programming language (and some more obscure ones). Most of these implementations are very similar to the original Ruby implementation, but since the teams that maintain them work fairly independently of each other, they have diverged a little bit. This has mostly been a good thing; there is more innovation that way. It's a great symbiotic relationship. We copy ideas from each other. Some of the most significant features that have been added over the years are:
- Cucumber Expressions — an alternative to Regular Expressions, which is more user friendly
- Tag expressions — a boolean query language for Gherkin tags, making it easier to select what scenarios to run
- Suites (Behat only for the time being) — easily run the same scenario in different configurations (through the UI or underneath it)
- IDE integration in IDEA, IntelliJ and Visual Studio
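Two of the features listed above can be sketched in a few lines of Python. The first function translates a Cucumber-Expression-style string with `{int}` and `{word}` placeholders into a regular expression; the second evaluates a boolean tag expression with `and`, `or`, `not` and parentheses. The names `match_expression`, `tags_match` and `PARAMETER_TYPES` are hypothetical, only two parameter types are handled, and there is no error handling for malformed input. It is a simplified illustration, not the code or API of any Cucumber implementation.

```python
import re

# Simplified parameter types (an assumption); real Cucumber Expressions support more.
PARAMETER_TYPES = {
    "int": (r"(-?\d+)", int),
    "word": (r"(\S+)", str),
}


def match_expression(expression, text):
    """Return converted arguments if text matches the expression, else None."""
    converters = []
    pattern = ""
    # re.split with a capturing group keeps placeholder names at odd indexes.
    for index, part in enumerate(re.split(r"\{(\w+)\}", expression)):
        if index % 2 == 0:
            pattern += re.escape(part)  # literal text
        else:
            regex, converter = PARAMETER_TYPES[part]
            pattern += regex
            converters.append(converter)
    match = re.fullmatch(pattern, text)
    if match is None:
        return None
    return [convert(value) for convert, value in zip(converters, match.groups())]


def tags_match(expression, tags):
    """Evaluate a tag expression such as '@smoke and not @slow' against a set of tags."""
    tokens = re.findall(r"\(|\)|[^\s()]+", expression)
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def advance():
        nonlocal position
        token = tokens[position]
        position += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "or":
            advance()
            right = parse_and()  # always parse, so tokens are consumed
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "and":
            advance()
            right = parse_not()
            value = value and right
        return value

    def parse_not():
        if peek() == "not":
            advance()
            return not parse_not()
        if peek() == "(":
            advance()
            value = parse_or()
            advance()  # consume ")"
            return value
        return advance() in tags

    return parse_or()
```

For example, `match_expression("I have {int} cucumbers in my belly", "I have 42 cucumbers in my belly")` returns `[42]`, with the argument already converted to an integer, and `tags_match("@smoke and not @slow", {"@smoke"})` returns `True`.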
While divergence of the different implementations fosters innovation, it also poses some challenges. There is a lot of duplicate effort in maintaining each implementation and documentation. To address this, we're soon going to release a new website which will document all the official Cucumber implementations in one place. This will also encourage contributors to converge their implementations so they become more consistent. After 10 years we don't need the divergence and innovation we needed in the early days — what we need more now is consistency and a sustainable pace of maintenance.
Charlie Rudolph from the Cucumber.js team has recently written an experimental shared binary in Go that can potentially be used as a foundation by all Cucumber implementations. This has the potential to simplify all of the implementations, which means we can deliver a better product to our users.
Finally, we're planning to make some changes to the Gherkin language itself, to make it more consistent with how BDD as a practice has evolved over the years. We'll make sure we make it backwards compatible — I don't want to receive a million emails!
InfoQ: What is the relation between Cucumber and BDD?
Hellesøy: My own approach to BDD, and that of my Cucumber/BDD partners in crime, has changed quite a bit over the past 10 years, and this influences Cucumber. Sometimes we make a change to Cucumber, and that changes people's practices. Cucumber and BDD go hand in hand and I've got a few examples of this.
Konstantin from the Behat project introduced me to a new BDD technique a few years ago that he dubbed "Modelling by Example". The idea is simple: you start with a couple of scenarios that describe some intended behaviour. I had known for a while that declarative scenarios (without technical detail) are usually better than imperative ones (lots of UI detail). What I hadn't considered until he showed me was that this also lets you choose a different starting point for development.
Previously I would always start with the UI. Konstantin showed me that the scenarios could also be used to drive development of the domain logic, underneath the UI and underneath the protocol layer (such as HTTP). This allows you to run Cucumber and the system you're developing in the same process. These tests are incredibly fast, typically taking milliseconds, and have a profound effect on productivity. They allow you to iterate very quickly towards a stable domain model.
Once the domain model is stable, it is a much better time to start developing the UI (or the protocol layer, such as HTTP). Because the scenarios are free from implementation details, we can use the same scenarios (with different automation code) to drive the development of the UI or the protocol layer.
This workflow is a lot easier if you can just tell Cucumber to run each scenario twice — once under the UI (trading confidence for speed), and once through the UI (trading speed for confidence). Konstantin added a special feature called "suites" to Behat, and this is something I want to add to the other Cucumber implementations. So that's one way a change in BDD practice influences the tool.
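The idea of running one declarative scenario in two configurations can be sketched in Python as follows. Everything here is hypothetical: the `Library` domain model, the two driver classes and the `SUITES` mapping are invented for illustration, and the "UI" driver is only a stand-in that passes form-like data and reads back a rendered message. It does not reproduce Behat's suites feature; it only shows why a scenario without UI detail can be driven from either starting point.

```python
class Library:
    """Domain model: the logic the scenario describes."""

    def __init__(self):
        self.loans = {}

    def borrow(self, member, title):
        if title in self.loans:
            return False
        self.loans[title] = member
        return True


class DomainDriver:
    """Automation code that calls the domain model directly (under the UI)."""

    def __init__(self):
        self.library = Library()

    def borrow(self, member, title):
        return self.library.borrow(member, title)


class FakeUiDriver:
    """Stand-in for automation through a UI: submits a form, reads a message."""

    def __init__(self):
        self.library = Library()

    def _submit_form(self, fields):
        ok = self.library.borrow(fields["member"], fields["title"])
        return "Loan confirmed" if ok else "Already on loan"

    def borrow(self, member, title):
        page = self._submit_form({"member": member, "title": title})
        return page == "Loan confirmed"


def scenario_book_can_only_be_borrowed_once(driver):
    """Declarative scenario: no UI or protocol detail, so any driver can run it."""
    assert driver.borrow("Ada", "Dune") is True
    assert driver.borrow("Bob", "Dune") is False


# Each "suite" pairs the same scenario with different automation code.
SUITES = {"domain": DomainDriver, "ui": FakeUiDriver}


def run_all_suites():
    results = {}
    for name, make_driver in SUITES.items():
        scenario_book_can_only_be_borrowed_once(make_driver())
        results[name] = "passed"
    return results
```

In a real project the second driver would go through an actual browser or HTTP client, which is where the extra confidence (and the extra run time) comes from.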
Example Mapping is another recent BDD practice that might influence Cucumber. It's an analysis technique my colleague Matt Wynne came up with. Example Mapping is a simple, collaborative technique for breaking down user stories into examples which can later become scenarios. Today I don't even consider writing a Cucumber scenario without first doing Example Mapping. It has fundamentally changed how our team (and the hundreds of teams we have trained) approach requirements analysis and test design.
Example Mapping groups examples underneath rules. Because Gherkin doesn't have a construct for rules, this causes a bit of an impedance mismatch when you try to "translate" an example map into Gherkin. So we're planning to add a "Rule" keyword to Gherkin and add "Example" as a synonym for "Scenario".
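As a rough picture of what that could look like, the Python sketch below holds a feature written with the planned "Rule" and "Example" keywords and groups the examples under their rules, mirroring the shape of an example map. The feature text is a guess at the syntax based only on the description above, and `group_examples_by_rule` is a hypothetical helper; the released Gherkin grammar may differ.

```python
# Hypothetical Gherkin using the planned "Rule" and "Example" keywords.
FEATURE = """\
Feature: Free delivery
  Rule: Orders of 50 or more ship for free
    Example: Order of exactly 50
    Example: Order of 80
  Rule: Orders under 50 pay for delivery
    Scenario: Order of 20
"""


def group_examples_by_rule(text):
    """Return {rule: [example names]}, treating Scenario as a synonym for Example."""
    rules = {}
    current = None
    for line in text.splitlines():
        keyword, _, name = line.strip().partition(": ")
        if keyword == "Rule":
            current = name
            rules[current] = []
        elif keyword in ("Example", "Scenario") and current is not None:
            rules[current].append(name)
    return rules
```

Here the first rule ends up with two examples and the second with one, which is the same rule-to-examples grouping an example map produces on index cards.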
InfoQ: You are an expert on BDD and Cucumber; has your work with the tooling changed over the years?
Hellesøy: Absolutely. One of the things I obsess about is test speed, and there have been some great advances in this in the past couple of years.
Research by Jakob Nielsen and Mihaly Csikszentmihalyi suggests that waiting more than 10 seconds for feedback makes it impossible for most people to stay focused on the task at hand. UX designers know this; that's why you see progress bars and spinners when things take a while. Programmers and people who make tools for programmers don't seem to have realised this yet.
The difference in productivity between a programmer who has to wait 1-5 seconds for test feedback and one who has to wait 30+ seconds is significant. At 1-5 seconds you can attain (and stay in) a state of flow where you're hyper productive, for hours. That just doesn't happen if you're interrupted all the time. We've become so accustomed to slow feedback that we've invented and adopted practices to work around it rather than fixing it. The test pyramid is one such workaround. Conventional wisdom tells us that full-stack tests that go through the UI are slow and brittle. Therefore, we'll have fewer of the slow, flaky tests, and more of the fast, consistent ones.
A much better way to address this is to make slow tests fast and make flaky tests stable. What's not to like about confidence *and* speed?
Nat Pryce and Josh Chisholm have independently explored ways to make full-stack tests run sub-second by removing all I/O and running everything in-process. This used to be something we only knew how to do with domain-level tests. For example, if you're writing a Node.js app with a DOM UI (such as React) you can use Cucumber-Electron to run the client and the server in the same Node.js process, with no I/O at all. Dozens to hundreds of full-stack tests per second. Nobody cares about a testing ice cream cone if it's that fast.
New ways to reduce brittleness and improve maintainability have also become more popular. The Screenplay pattern is one way to do this, thanks to Antony Marcano, Andy Palmer, John Ferguson Smart and Jan Molak.
InfoQ: Is the community using tools other than Cucumber and BDD to accomplish the same goal?
Hellesøy: It depends what the goal is. Let's face it — most people who use Cucumber don't use it for BDD. They use it for testing, writing their tests afterwards. If that's your workflow, Cucumber isn't significantly better than many other testing tools. If no customers, domain experts or product owners are going to read the Gherkin documents, I would usually recommend using a typical xUnit tool instead (JUnit, RSpec, NUnit, Mocha etc).
I always use a mix: Cucumber if it's something I want feedback on from non-technical stakeholders, and xUnit if it's technical and not directly related to the business requirements.
Cucumber is good for validating assumptions, discovering ambiguities and building a shared understanding of what to build. There are several alternatives to Cucumber that have the same goal, but not nearly as many alternatives as you'll find for traditional testing tools. I'm not sure which ones are the most popular amongst them, but Concordion, Robot Framework, JBehave and Fitnesse seem to be the ones that are closest in spirit to Cucumber.
InfoQ: What's the future for Cucumber and for BDD?
Hellesøy: Cucumber has grown to a big ecosystem with many implementations. The number of users (hundreds of thousands to millions) compared with the relatively small number of regular contributors (about a dozen) means we need to keep reducing the workload or increase the team. A shared binary might reduce some of the workload, but it's too early to say.
Distributed and parallel execution is something people are asking for a lot. There are workarounds and 3rd party tools for it, but nothing native in Cucumber yet. We'd love to add that. Better reporting is also something we'd like to improve.
As for BDD as a practice, I think the most important thing we can do as a community is to keep educating users about BDD. There are a few books, but I think we need more bite-sized education material that is a bit more accessible to people. Nowadays we're focusing a lot on teaching people Example Mapping; we've seen teams get lots of benefits from that, even without using Cucumber.
We also need to document patterns and techniques for fast tests, especially in distributed architectures such as microservices and serverless architecture. Patterns like ports and adapters and contract testing are not mainstream, and I think they can address many of the problems people are facing with BDD in those contexts.
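One way to read the combination of ports and adapters with contract testing is sketched below in Python: the domain depends on a port, a fast in-memory adapter is used in most tests, and a shared set of contract checks confirms that the fast adapter and the slower one behave the same. The `OrderStore` port, both adapters and `check_order_store_contract` are hypothetical names chosen for this illustration, and an in-memory SQLite database merely stands in for a real external service.

```python
import sqlite3
from typing import Optional, Protocol


class OrderStore(Protocol):
    """Port: the storage operations the domain logic depends on."""

    def save(self, order_id: str, total: int) -> None: ...

    def total_for(self, order_id: str) -> Optional[int]: ...


class InMemoryOrderStore:
    """Adapter for fast tests: no I/O, just a dictionary."""

    def __init__(self) -> None:
        self._orders = {}

    def save(self, order_id: str, total: int) -> None:
        self._orders[order_id] = total

    def total_for(self, order_id: str) -> Optional[int]:
        return self._orders.get(order_id)


class SqliteOrderStore:
    """Adapter standing in for a real database-backed implementation."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total INTEGER)")

    def save(self, order_id: str, total: int) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO orders (id, total) VALUES (?, ?)",
            (order_id, total),
        )

    def total_for(self, order_id: str) -> Optional[int]:
        row = self._db.execute(
            "SELECT total FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        return row[0] if row else None


def check_order_store_contract(store: OrderStore) -> None:
    """Contract: every adapter must behave the same way for these cases."""
    assert store.total_for("missing") is None
    store.save("A1", 120)
    assert store.total_for("A1") == 120
    store.save("A1", 90)  # saving again overwrites the earlier total
    assert store.total_for("A1") == 90


def run_contract_checks():
    """Run the same contract against each adapter."""
    for adapter in (InMemoryOrderStore, SqliteOrderStore):
        check_order_store_contract(adapter())
    return True
```

Because both adapters pass the same contract, scenarios can run against the in-memory one for speed while the contract checks keep it honest about how the slower adapter behaves.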
InfoQ: You and your colleagues have managed to bootstrap a company around Cucumber; how do you look at the future from that perspective?
Hellesøy: We founded Cucumber Ltd in 2014, and since then our primary revenue stream has been training. During those years we've built a collaboration tool for Cucumber called Cucumber Pro. It's a tool that publishes Gherkin documents along with their results and allows people to collaborate on those documents. We threw the first version away and started over again when we realised our customers needed something different.
We decided early on to build a bootstrapped company, profitable from day one without external investment. Today we are 10 employees and we have about 10 associates who help us deliver the training around the world.
We don't have an office, so we meet daily using video conferencing. Those of us working on Cucumber Pro use remote mob programming while others are visiting clients delivering training. We meet once every two months to strategise and socialise.
Having no external investors gives us more control over priorities and makes it easier to build the company we want, but it also means we don't progress with our commercial products as fast as we could if we hired a dozen developers. It's hard, but fun!