Enzyme’s lack of support for React 18 made Slack’s existing unit tests unusable and jeopardized the confidence those tests provided, Sergii Gorbachov said at QCon San Francisco. He showed how Slack migrated all of its Enzyme tests to React Testing Library (RTL) to ensure the continuity of their test coverage.
React Testing Library is the industry-accepted choice, and there were no other officially supported alternatives for React 18; this made the decision straightforward and aligned with best practices, Gorbachov said.
Enzyme’s approach emphasized testing internal component details, such as state, props, and implementation specifics. Transitioning to React Testing Library introduced a different methodology, focused on simulating how users interact with components and verifying how they behave in a realistic environment, Gorbachov said.
They started by using a Large Language Model (LLM) as a standalone tool, relying on it to generate fully converted and functional tests. While this approach achieved partial success, its effectiveness varied with the complexity of the tests, Gorbachov mentioned.
Next, they integrated the LLM into a larger pipeline, combining it with Abstract Syntax Tree (AST) transformations and adding verification, linting, and iterative-feedback steps to repair failed first attempts, as Gorbachov explained:
This hybrid approach possessed the strengths of both methods and helped us achieve an 80% conversion success rate, significantly improving efficiency while maintaining test quality.
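A minimal sketch of such a feedback loop is shown below; runCodemod, askLlm, runJest, and runEslint are hypothetical helper names standing in for the real pipeline steps, not Slack’s actual API:

// Sketch of an LLM + AST conversion pipeline with a verification loop.
// runCodemod, askLlm, runJest, and runEslint are hypothetical helpers.
async function convertTestFile(filePath, maxAttempts = 3) {
  // Deterministic AST pass: mechanical conversions, plus annotations
  // for cases too ambiguous to transform programmatically.
  let source = await runCodemod(filePath);
  let feedback = [];

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // LLM pass: finish the conversion, guided by annotations and by
    // any errors collected from the previous attempt.
    source = await askLlm(source, feedback);

    // Verification pass: actually run the converted test and the linter.
    const tests = await runJest(source);
    const lint = await runEslint(source);
    if (tests.passed && lint.clean) return { source, success: true };

    feedback = [...tests.errors, ...lint.errors];
  }
  return { source, success: false }; // left for manual conversion
}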
To assess the quality of the generated code, they created a detailed quality rubric that evaluated:

- codemod functionality: reliability and error handling
- imports conversion: replacing Enzyme imports with RTL equivalents
- rendering: converting Enzyme rendering to RTL’s
- Enzyme methods: replacing Enzyme methods or adding annotations
- assertions: updating to RTL-compatible ones
- JS/TS logic: preserving test functionality

Gorbachov described how they selected files to measure progress:
We selected three Enzyme test files from each difficulty level (easy, medium, and complex) and evaluated them manually to measure our progress. The difficulty was determined by characteristics such as the number of test cases, imports, Enzyme methods, and assertions, with test cases weighted more heavily; we averaged the complexity ratings to place each file into one of the difficulty levels.
They converted the files automatically and evaluated the results using the quality rubric, with human-converted files as the benchmark. Their tool achieved 80% accuracy on average with manual adjustments required for more complex cases, Gorbachov stated.
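To make those rubric categories concrete, an illustrative before/after pair touching imports, rendering, Enzyme methods, and assertions might look like this (our example, not one of Slack’s test files; Profile is a hypothetical component):

// Before: Enzyme (imports, mount rendering, .find/.text, toBe assertion)
import { mount } from 'enzyme';

const wrapper = mount(<Profile name="Ada" />);
expect(wrapper.find('h1').text()).toBe('Ada');

// After: React Testing Library equivalents for each rubric category
import { render, screen } from '@testing-library/react';

render(<Profile name="Ada" />);
expect(screen.getByRole('heading')).toHaveTextContent('Ada');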
Combining deterministic methods with AI helped overcome the LLM’s limitations, such as its lack of real-time feedback and its reliance on pre- and post-processing, Gorbachov said. By integrating AST-based conversions, annotations, and carefully crafted prompts that modeled context after human workflows, they improved the conversion success rate by 20-30% over the LLM’s out-of-the-box capabilities, he added.
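As a sketch of the deterministic half, a jscodeshift-style transform can handle the unambiguous cases, such as swapping the Enzyme import for RTL’s (a simplified illustration, not Slack’s published codemod):

// Simplified jscodeshift transform: replace the Enzyme import with RTL's.
// The real codemod covers many more cases; this shows only the mechanism.
module.exports = function transformer(file, api) {
  const j = api.jscodeshift;
  const root = j(file.source);

  root
    .find(j.ImportDeclaration, { source: { value: 'enzyme' } })
    .replaceWith(() =>
      j.importDeclaration(
        [
          j.importSpecifier(j.identifier('render')),
          j.importSpecifier(j.identifier('screen')),
        ],
        j.literal('@testing-library/react')
      )
    );

  return root.toSource();
};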
While LLMs excel in handling high-complexity, unstructured tasks, it’s better to avoid them for deterministic tasks, as Gorbachov explained:
Our approach, which includes execution and verification of generated code, is transferable to projects like unit test generation, code modernization, and readability improvements, enabling a complete end-to-end flow.
To reuse the approach and build a similar pipeline for other types of projects, Gorbachov referred to the Enzyme to RTL codemod, the convert-test-files workflow for the full implementation, and their blog post on AI-Powered Conversion From Enzyme to React Testing Library at Slack.
InfoQ interviewed Sergii Gorbachov about their migration to React Testing Library.
InfoQ: How did you adapt your automated tests?
Sergii Gorbachov: For example, consider this code:
<div>
  <button onClick={toggle}>Toggle</button>
  <p>{isOn ? "Switch is ON" : "Switch is OFF"}</p>
</div>
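For context, here is a minimal component this markup could belong to (an assumption on our part; the talk showed only the snippet above). A class component is used because Enzyme’s wrapper.setState, shown next, only works on class components:

class ToggleSwitch extends React.Component {
  // internal state that the Enzyme test below manipulates directly
  state = { isOn: false };
  toggle = () => this.setState({ isOn: !this.state.isOn });

  render() {
    const { isOn } = this.state;
    const toggle = this.toggle;
    return (
      <div>
        <button onClick={toggle}>Toggle</button>
        <p>{isOn ? "Switch is ON" : "Switch is OFF"}</p>
      </div>
    );
  }
}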
With Enzyme, we might test this behavior like this:
wrapper.setState({ isOn: true });
expect(wrapper.find('p').text()).toBe('Switch is ON');
This directly manipulates the component’s internal state to assert the result. In RTL, we simulate user interaction instead:
userEvent.click(screen.getByText('Toggle'));
expect(screen.getByText('Switch is ON')).toBeInTheDocument();
This approach focuses on verifying the outcome of a user’s action and makes sure the component behaves as expected from their perspective, rather than its internal workings.
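Assembled into a self-contained test file (assuming the hypothetical ToggleSwitch component sketched above and a ./ToggleSwitch module path), the RTL version could look like this:

import React from 'react';
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { ToggleSwitch } from './ToggleSwitch'; // hypothetical module path

test('clicking Toggle turns the switch on', async () => {
  render(<ToggleSwitch />);
  expect(screen.getByText('Switch is OFF')).toBeInTheDocument();

  // user-event interactions are async in v14 and should be awaited
  await userEvent.click(screen.getByText('Toggle'));
  expect(screen.getByText('Switch is ON')).toBeInTheDocument();
});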
InfoQ: What benefits did you get from using abstract syntax tree codemod and large language models?
Gorbachov: Using an Abstract Syntax Tree (AST) codemod together with Large Language Models (LLMs) significantly enhanced the test conversion process. The AST codemod our team created handled straightforward code conversions and added annotations for complex scenarios where the rules were too ambiguous to define programmatically.
For instance, this Enzyme assertion:

expect(component.find('div')).toHaveLength(2);

will be transformed into:

// Conversion suggestion: .find('div') --> Use component rendered DOM to get the appropriate selector and method: screen.getByRole('selector') or screen.getByTestId('<data-id=...>')
expect(component.find('div')).toHaveLength(2);
This annotation was created specifically for the LLM and provided detailed, relevant guidance to help the model generate an accurate conversion for this specific instance. Most such instances were annotated in the original files, which served as the source code for conversion. By including partially converted pseudocode with annotations, along with tailored instructions, in the LLM prompts, we reduced ambiguity and minimized the hallucinations often seen in AI approaches that rely solely on prompts.
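As an illustration of that prompt structure, here is a sketch of how the pieces might be assembled (the field names and wording are our assumptions; Slack’s exact prompt format is described in their blog post):

// Illustrative prompt assembly: annotated source plus rendered DOM plus
// tailored instructions. Parameter names are assumptions for this sketch.
function buildConversionPrompt({ annotatedSource, renderedDom }) {
  return [
    'Convert this Enzyme test to React Testing Library.',
    'Follow every "// Conversion suggestion:" annotation exactly.',
    'Use the rendered DOM below to pick selectors (getByRole, getByTestId).',
    '',
    '--- Annotated test file ---',
    annotatedSource,
    '',
    '--- Rendered component DOM ---',
    renderedDom,
  ].join('\n');
}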