How Airbnb Used LLMs to Accelerate Test Migration

Thanks to the right mix of workflow automation and large language models, Airbnb significantly accelerated the process of updating its codebase to adopt React Testing Library (RTL), converting nearly 3.5K React test files that originally used Enzyme.
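To give a sense of what each conversion involves, the following is a minimal before/after sketch using a hypothetical Greeting component; it is not taken from Airbnb's codebase, but it shows the shift from Enzyme's internals-oriented API to RTL's DOM-oriented queries.

```tsx
import React from 'react';
import { shallow } from 'enzyme';
import { render, screen } from '@testing-library/react';
import Greeting from './Greeting'; // hypothetical component, used only for illustration

// Before: Enzyme shallow-renders the component and asserts against its internals.
it('renders the greeting (Enzyme)', () => {
  const wrapper = shallow(<Greeting name="Ada" />);
  expect(wrapper.find('h1').text()).toBe('Hello, Ada');
});

// After: React Testing Library renders to the DOM and queries it the way a user would.
it('renders the greeting (React Testing Library)', () => {
  render(<Greeting name="Ada" />);
  expect(screen.getByRole('heading').textContent).toBe('Hello, Ada');
});
```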

When applying LLMs to code generation, success depends on several variables, including the prompt you use to describe the task and the context you provide, which may include source files, examples, validation errors, guidelines, and more.

One of the most significant lessons Airbnb engineers learned is that prompt engineering was less effective than simply retrying the conversion multiple times until it worked, an approach Airbnb staff software engineer Charles Covey-Brandt described as "brute force".

To make this process more effective, Airbnb engineers broke the migration down into a sequence of steps: first refactoring from Enzyme to RTL, then fixing any failures reported by Jest, then by the linter, and finally by the TypeScript compiler.

Covey-Brandt writes: "This step-based approach provided a solid foundation for our automation pipeline. It enabled us to track progress, improve failure rates for specific steps, and rerun files or steps when needed."

Covey-Brandt says that another advantage of this approach was the ability to run the migration concurrently on hundreds of files at a time.
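The pipeline itself is not public, but the step sequence and per-file fan-out could be sketched roughly as follows; the validation commands and helper functions below are assumptions for illustration, not Airbnb's actual tooling.

```typescript
import { exec } from 'child_process';
import { promisify } from 'util';
import { promises as fs } from 'fs';

const sh = promisify(exec);

// Run a shell command and report pass/fail instead of throwing on a non-zero exit.
async function passes(cmd: string): Promise<boolean> {
  try {
    await sh(cmd);
    return true;
  } catch {
    return false;
  }
}

// Ordered validation checks, one per migration step.
const steps = [
  {
    name: 'refactor Enzyme to RTL',
    // Considered done once the file no longer imports enzyme.
    check: async (file: string) => !(await fs.readFile(file, 'utf8')).includes("from 'enzyme'"),
  },
  { name: 'Jest tests pass', check: (file: string) => passes(`npx jest ${file}`) },
  { name: 'lint passes', check: (file: string) => passes(`npx eslint ${file}`) },
  { name: 'types check', check: (_file: string) => passes('npx tsc --noEmit') },
];

// Push a single file through every step in order. In the real pipeline a failing
// check would trigger the LLM retry loop sketched after the next paragraph.
async function migrateFile(file: string): Promise<boolean> {
  for (const step of steps) {
    if (!(await step.check(file))) return false;
  }
  return true;
}

// Fan the pipeline out across hundreds of files at once.
async function migrateBatch(files: string[]): Promise<boolean[]> {
  return Promise.all(files.map(migrateFile));
}
```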

At each step, a retry loop guided the LLM toward a correct result: if that step's validation failed, for example because the linter reported errors, the LLM was prompted with all of the errors found and asked to fix them in the source file. This loop was repeated until either no validation errors remained or a maximum number of retries was reached.
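A minimal sketch of such a retry loop, assuming placeholder validation and LLM-fix functions rather than Airbnb's actual implementation, might look like this:

```typescript
import { promises as fs } from 'fs';

interface ValidationResult {
  ok: boolean;
  errors: string[]; // e.g. failing Jest output, ESLint messages, tsc diagnostics
}

// Placeholders standing in for a step's validation command and the LLM call.
type Validator = (filePath: string) => Promise<ValidationResult>;
type LlmFixer = (source: string, errors: string[]) => Promise<string>;

// Run one migration step with a bounded retry loop: validate, feed every error
// back to the LLM alongside the current source, write its fix, and try again.
async function runStepWithRetries(
  filePath: string,
  validate: Validator,
  askLlmToFix: LlmFixer,
  maxAttempts = 10,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await validate(filePath);
    if (result.ok) return true; // step passed, move on to the next one

    const source = await fs.readFile(filePath, 'utf8');
    const fixed = await askLlmToFix(source, result.errors);
    await fs.writeFile(filePath, fixed, 'utf8');
  }
  return false; // retries exhausted; the file joins the long tail
}
```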

Covey-Brandt also describes how the prompts grew over time: "By the end of the migration, our prompts had expanded to anywhere between 40,000 to 100,000 tokens, pulling in as many as 50 related files, a whole host of manually written few-shot examples, as well as examples of existing, well-written, passing test files from within the same project."
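A rough sketch of how such a prompt could be assembled from those ingredients follows; the section layout, wording, and helpers are assumptions for illustration, not Airbnb's actual prompt format.

```typescript
import { promises as fs } from 'fs';
import { basename } from 'path';

// Concatenate a titled section from a list of files.
async function section(title: string, files: string[]): Promise<string> {
  const bodies = await Promise.all(
    files.map(async f => `// ${basename(f)}\n${await fs.readFile(f, 'utf8')}`),
  );
  return `## ${title}\n\n${bodies.join('\n\n')}`;
}

async function buildPrompt(
  targetTest: string,
  relatedFiles: string[],     // up to ~50 files the component under test depends on
  fewShotExamples: string[],  // manually written Enzyme-to-RTL conversion examples
  passingExamples: string[],  // well-written, passing RTL tests from the same project
  validationErrors: string[] = [],
): Promise<string> {
  const parts = [
    'Convert the following Enzyme test file to React Testing Library.',
    await section('Related source files', relatedFiles),
    await section('Conversion examples', fewShotExamples),
    await section('Passing tests from this project', passingExamples),
    `## File to convert\n\n${await fs.readFile(targetTest, 'utf8')}`,
  ];
  if (validationErrors.length > 0) {
    parts.push(`## Errors from the previous attempt\n\n${validationErrors.join('\n')}`);
  }
  return parts.join('\n\n');
}
```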

The retry-loop approach proved effective for files of simple to medium complexity, allowing 75% of all test files to be migrated in four hours with a maximum of ten retries per file. However, this left a long tail of about 900 files.

For those files, a less automated approach was required. Airbnb engineers adopted a "sample, tune, sweep" strategy: analyzing failed cases, updating the prompts and scripts to address the cause of each failure, and then rerunning the process for the affected files.
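As a rough sketch of that outer loop, with every callback as a placeholder (the tuning step in particular stands in for a human updating prompts and scripts, not an automated call):

```typescript
// Hypothetical "sample, tune, sweep" loop over the long tail of failing files.
async function sampleTuneSweep(
  failingFiles: string[],
  migrate: (file: string) => Promise<boolean>,          // the step pipeline with retries
  getLastErrors: (file: string) => Promise<string[]>,   // errors from a file's last failed run
  tunePrompts: (sampledErrors: string[]) => Promise<void>, // manual prompt/script update
  maxRounds = 10,
): Promise<string[]> {
  let remaining = failingFiles;
  for (let round = 0; round < maxRounds && remaining.length > 0; round++) {
    // Sample: inspect a handful of failures to find a common cause.
    const sample = remaining.slice(0, 5);
    const sampledErrors = (await Promise.all(sample.map(getLastErrors))).flat();

    // Tune: adjust the prompt or validation scripts for that cause.
    await tunePrompts(sampledErrors);

    // Sweep: rerun the pipeline across every file that is still failing.
    const results = await Promise.all(remaining.map(migrate));
    remaining = remaining.filter((_, i) => !results[i]);
  }
  return remaining; // whatever is left gets fixed by hand
}
```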

While less complex files required at most ten retries, long-tail files required between 50 and 100, which slowed the process considerably. After four days, however, the team had converted 97% of the files overall. The remaining files, fewer than 100, were fixed manually.

Overall, leveraging LLMs for test migration condensed an estimated 1.5-year engineering project into just six weeks while preserving the original test intent and code coverage.
