Mark Christian and Johnny Rodgers recently described on the Slack Engineering blog how Slack successfully rebuilt its desktop application. The article cites an incremental rewrite and release strategy as a key success factor.
Two decades ago, citing Netscape's misadventures, Joel Spolsky, co-founder of Stack Overflow, argued in his landmark essay Things You Should Never Do that rewriting code from scratch is the single worst strategic mistake any software company can make. Spolsky identified two issues with rewrites. First, the existing codebase often embeds hard-earned knowledge about corner cases and weird bugs, which may be lost in the rewrite. Second, rewrites can be lengthy and divert resources that could otherwise improve the existing codebase and products, leaving the product open to being displaced by competitors. For opponents of rewrites, refactoring legacy code rather than rewriting it seems a less risky option.
Conversely, there are arguments in favor of complete rewrites. The conventional wisdom "If it ain't broke, don't fix it" cuts both ways: software that can no longer be evolved at an acceptable cost is arguably broken, and a rewrite may then be justified. This may happen when maintaining and adding features to the legacy codebase is prohibitively expensive, or when earlier technology choices cannot support new use cases. Some rewrite proponents favor using the effort to build an entirely new application. Others prefer replicating the existing application's features without adding new ones, to reduce project risk.
The Slack team credits the success of its rewrite of the Slack desktop application to a middle-ground, incremental rewrite strategy. Code was not rewritten from scratch and released in one go. Instead, the legacy code and new code coexisted for the duration of the rewrite project, with the new code progressively replacing the old code.
Slack's strategy involved defining a target architecture, together with interoperability rules that kept the old and new code separate while allowing targeted reuse between them:
(...) We introduced a few rules and functions in a concept called legacy-interop:
- old code cannot directly import new code: only new code that has been “exported” for use by the old code is available.
- new code cannot directly import old code: only old code that has been “adapted” for use by modern code is available.
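The two rules quoted above can be read as a simple import check. The following is a minimal sketch of that reading, not Slack's actual legacy-interop implementation: the `ModuleInfo` shape, the `bridged` flag (standing for "exported" on modern modules and "adapted" on legacy ones), and all module paths are hypothetical names introduced for illustration.

```typescript
// Hypothetical model of the two legacy-interop rules.
// None of these names come from Slack's codebase.
type Zone = "legacy" | "modern";

interface ModuleInfo {
  path: string;
  zone: Zone;
  // Modern module: true if it has been "exported" for use by legacy code.
  // Legacy module: true if it has been "adapted" for use by modern code.
  bridged: boolean;
}

interface ImportEdge {
  importer: ModuleInfo;
  imported: ModuleInfo;
}

function isImportAllowed(importer: ModuleInfo, imported: ModuleInfo): boolean {
  // Imports within the same zone are unrestricted.
  if (importer.zone === imported.zone) return true;
  // Cross-zone imports must go through an exported or adapted module.
  return imported.bridged;
}

// Returns a readable description of every import that breaks the rules.
function findViolations(edges: ImportEdge[]): string[] {
  return edges
    .filter((e) => !isImportAllowed(e.importer, e.imported))
    .map((e) => `${e.importer.path} -> ${e.imported.path}`);
}

// Illustrative modules (paths are made up).
const emojiPicker: ModuleInfo = { path: "modern/emoji-picker", zone: "modern", bridged: true };
const modernStore: ModuleInfo = { path: "modern/store", zone: "modern", bridged: false };
const legacyView: ModuleInfo = { path: "legacy/view", zone: "legacy", bridged: false };
const legacyUtils: ModuleInfo = { path: "legacy/utils", zone: "legacy", bridged: true };

const violations = findViolations([
  { importer: legacyView, imported: emojiPicker }, // allowed: exported modern code
  { importer: legacyView, imported: modernStore }, // violation: not exported
  { importer: modernStore, imported: legacyUtils }, // allowed: adapted legacy code
  { importer: emojiPicker, imported: legacyView }, // violation: not adapted
]);
console.log(violations);
```

In a real project, a check of this kind would more likely live in a lint rule or build step than in application code; the sketch only shows the decision logic.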
The classic version of Slack (which loaded both the new code and the old code) would coexist with the modern version (which only included the new code) until the modern version reached feature parity with the classic version:
(...) [That way] legacy code could access new code as it got modernized, and new code could access old code until it got modernized.
The incremental rewrite strategy was paired with an incremental release strategy. The article explains:
The first “modern” piece of the Slack app was our emoji picker, which we released more than two years ago — followed thereafter by the channel sidebar, message pane, and dozens of other features. (...) Releasing incrementally allowed us to (...) de-risk the release of the new client by minimizing how much completely new code was being used by our customers for the first time.
After using the modern-only version of the app internally for much of the last year, the Slack team is now ready to release the new modern desktop application to customers.
The old version of the desktop application was built on a stack that included jQuery, Signals, and direct DOM manipulation on Electron. Its memory consumption increased rapidly with the number of workspaces, and its performance degraded for large workspaces due to eager data loading. The modern version of the application is built on React, and its memory consumption stays near-constant as the number of workspaces grows. The new version also loads data lazily and has a better performance profile.
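The difference between eager and lazy data loading can be sketched as follows. This is a generic illustration of the two approaches, not Slack's code: `WorkspaceData`, `Loader`, `loadEagerly`, and `LazyWorkspaces` are hypothetical names, and the loader is kept synchronous for brevity.

```typescript
// Hypothetical types for illustration only.
type WorkspaceId = string;

interface WorkspaceData {
  channels: string[];
}

type Loader = (id: WorkspaceId) => WorkspaceData;

// Eager: every workspace is loaded up front, so the startup cost
// grows with the number of workspaces.
function loadEagerly(ids: WorkspaceId[], load: Loader): Map<WorkspaceId, WorkspaceData> {
  const all = new Map<WorkspaceId, WorkspaceData>();
  for (const id of ids) {
    all.set(id, load(id));
  }
  return all;
}

// Lazy: a workspace is loaded only on first access, then cached.
class LazyWorkspaces {
  private cache = new Map<WorkspaceId, WorkspaceData>();
  private load: Loader;

  constructor(load: Loader) {
    this.load = load;
  }

  get(id: WorkspaceId): WorkspaceData {
    let data = this.cache.get(id);
    if (data === undefined) {
      data = this.load(id);
      this.cache.set(id, data);
    }
    return data;
  }

  // Number of workspaces actually loaded so far.
  get loadedCount(): number {
    return this.cache.size;
  }
}
```

With the lazy variant, a user signed in to many workspaces only pays for the ones they actually open, at the cost of a short delay on first access.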