Jake Zimmerman, Technical Lead of Sorbet at Stripe, and Getty Ritter, Ruby Infrastructure Engineer at Stripe, presented "Refactoring Stubborn, Legacy Codebases" at the 2024 QCon San Francisco conference.
Zimmerman kicked off the presentation by enumerating some common complaints about stubborn codebases, drawn from a 2017 company survey. To refactor a codebase into a happier state, Zimmerman maintained, the work should be centralized: having one team drive the refactoring concentrates expertise, incentivizes automation, and raises the probability of finishing.
A centralized migration needs two things: leverage over the codebase, so that a small group of problem-solvers can have a large effect on system behavior; and a way to "ratchet" incremental progress. The term "ratchet" refers to a mechanical gear that can turn in only one direction, so progress, once made, is not lost. With this in mind, Zimmerman introduced Sorbet, a type checker for Ruby. He maintained:
To refactor a large, stubborn codebase, you need to have a point of leverage and to pick good ratchets.
Building Sorbet was key to addressing all of these complaints, because it introduced points of leverage over the codebase.
Ratcheting is achieved with a # typed: comment, or "sigil," at the top of each Ruby file. It acts like a ratchet because it's easy to "go up a little": a file's strictness can be raised one level at a time. Valid uses of this construct are:
- # typed: false is used for syntax and constants
- # typed: true is used for type inference in methods
- # typed: strict is used when every method needs a signature
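Because the sigil is a plain comment, the "only turns one way" rule can be enforced mechanically. The following is a minimal sketch under stated assumptions, not Sorbet's or Stripe's actual tooling: the helper names sigil_level and ratchet_violations are hypothetical, the baseline hash of previously recorded levels is assumed to be stored somewhere, and a file with no sigil is treated as false. Any file whose level has dropped relative to the baseline is reported as a violation.

```ruby
# Hypothetical ratchet check for Sorbet strictness sigils (illustration only,
# not Sorbet's or Stripe's tooling). Levels are ordered least to most strict.
STRICTNESS = %w[ignore false true strict strong].freeze
SIGIL = /^#\s*typed:\s*(ignore|false|true|strict|strong)\b/

# Returns the level declared in a file's source. A file with no recognized
# sigil is treated as "false" (an assumption of this sketch).
def sigil_level(source)
  match = source.match(SIGIL)
  match ? match[1] : "false"
end

# Compares current sources against a baseline of previously recorded levels
# and returns the paths whose strictness went down (ratchet violations).
# Files missing from the baseline are new and are not checked.
def ratchet_violations(baseline, sources)
  sources.filter_map do |path, source|
    previous = baseline[path]
    next unless previous

    current = sigil_level(source)
    path if STRICTNESS.index(current) < STRICTNESS.index(previous)
  end
end

baseline = { "a.rb" => "true", "b.rb" => "false" }
sources = {
  "a.rb" => "# typed: false\nclass A; end\n",  # moved down: violation
  "b.rb" => "# typed: strict\nclass B; end\n"  # moved up: allowed
}
p ratchet_violations(baseline, sources) # prints ["a.rb"]
```

Raising a file from false to strict passes, while lowering one from true to false fails, so the recorded levels can only move up.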
While this may seem straightforward, Zimmerman cautioned that the order in which the false and true levels are applied can have unintended consequences. He used the following example:
# typed: true
# ^ start with false
class KnownParent
  def method_on_parent(x); end
end
class MyClass < UnknownParent
  def example
    self.method_on_parent()
  end
end
Starting directly with # typed: true, Sorbet highlighted both the UnknownParent constant and the call to method_on_parent() in MyClass. This can be confusing, since method_on_parent() is defined in the KnownParent class; the real problem is the wrong superclass.
After switching to # typed: false, only the UnknownParent constant is highlighted, because that constant doesn't exist. The issue can now be easily resolved by changing the superclass to KnownParent.
Returning to # typed: true, only the call to method_on_parent() in MyClass is now highlighted, alerting the developer that the method requires an argument.
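For reference, the example might look like this once both diagnostics are addressed: the superclass becomes KnownParent and an argument is passed to method_on_parent. The argument value 42 is an arbitrary placeholder, and the begin/rescue at the end is an addition, not from the talk; it shows that plain Ruby only discovers the original zero-argument call at runtime, as an ArgumentError, whereas Sorbet reports it before the code runs.

```ruby
# typed: true
class KnownParent
  # Takes one required positional argument and returns nil.
  def method_on_parent(x); end
end

# Fix 1: inherit from the constant that actually exists.
class MyClass < KnownParent
  def example
    # Fix 2: pass the required argument (42 is an arbitrary placeholder).
    self.method_on_parent(42)
  end
end

MyClass.new.example # runs without error

# Without Fix 2, Ruby would only report the mistake at runtime:
begin
  KnownParent.new.method_on_parent
rescue ArgumentError => e
  puts "runtime error: #{e.message}"
end
```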
As a result, developer satisfaction improved: a significant portion of the large, stubborn codebase was refactored, Sorbet gave the team a point of leverage, and the team selected good ratchets.
Ritter then discussed why modularity is important and how to make a Ruby monolith more modular. As an example, he used a toy logger that can leak sensitive information into its output.
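# a toy logger
class Logger
  # Write to standard output unless another IO is supplied.
  def initialize(output = $stdout)
    @output = output
  end

  # Logs a message followed by key=value pairs for any extra context.
  def log(message, **storytime)
    payload = storytime.map do |k, v|
      "#{k}=#{v.inspect}"
    end.join(" ")
    @output.puts("#{Time.now.to_i}: #{message} #{payload}")
  end
end
# elsewhere
logger = Logger.new
logger.log("Attempting operation", op: my_op, merchant: m)
# 1730756308: Attempting operation op=:update merchant=#<Merchant id=22 secret="hunter2">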
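This output leaks the merchant's secret. The obvious fix, teaching the logger to redact Merchant objects, is well-intentioned but creates tangled code: a general-purpose logger now depends on a specific business object.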
# ...
payload = storytime.map do |k, v|
  if v.is_a?(Merchant) # if we're logging a merchant...
    "#{k}=Merchant(id=#{v.id}, ...)" # redact most fields
  else
    "#{k}=#{v.inspect}" # other objects can be logged as-is
  end
end.join(" ")
# ...
Ritter discussed two points of leverage: packaging, which is built into Sorbet but isn't enough on its own to address modularity; and layering, whose "essential principle is that any element of a layer depends only on other elements in the same layer or on elements of the layer 'beneath' it. Communication upward must pass through some indirect mechanism," as defined by Eric Evans.
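Applied to the toy logger, one possible "indirect mechanism" is to invert the dependency: the logger asks each value for a log-safe representation through a duck-typed hook, and Merchant, in a higher layer, decides what to redact. This is a minimal sketch for illustration, not the approach Ritter presented; the hook name to_log_s and the Merchant constructor are hypothetical.

```ruby
require "stringio"

# Lower layer: the logger knows nothing about Merchant.
class Logger
  def initialize(output = $stdout)
    @output = output
  end

  def log(message, **storytime)
    payload = storytime.map do |k, v|
      # Indirect mechanism: use the value's own log-safe form if it offers one
      # (hypothetical to_log_s hook); otherwise fall back to inspect.
      text = v.respond_to?(:to_log_s) ? v.to_log_s : v.inspect
      "#{k}=#{text}"
    end.join(" ")
    @output.puts("#{Time.now.to_i}: #{message} #{payload}")
  end
end

# Higher layer: Merchant decides which of its fields are safe to log.
class Merchant
  attr_reader :id

  def initialize(id, secret)
    @id = id
    @secret = secret
  end

  def to_log_s
    "Merchant(id=#{id}, ...)" # redact everything except the id
  end
end

out = StringIO.new
Logger.new(out).log("Attempting operation", op: :update, merchant: Merchant.new(22, "hunter2"))
puts out.string
```

The redaction policy now lives with the object that owns the secret, so adding a new sensitive type never requires editing the logger.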
Ritter presented several code examples of layering that used the ratchets Zimmerman had described earlier, and he outlined the attributes that make a good ratchet.
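A layering rule can be ratcheted in the same spirit as the # typed: sigils. The sketch below is hypothetical throughout: the layer names and numbers, the check_layering helper, and the list of known violations are illustrative assumptions, not Stripe's tooling. Each package is assigned a layer; a dependency is allowed only if it points to the same or a lower layer; and violations that existed when the rule was introduced are recorded in a list that may shrink but never grow. A dependency from util to merchants resembles the tangled logger above, where a low-level utility reaches up into business code.

```ruby
require "set"

# Hypothetical layer assignments: higher numbers sit above lower ones.
LAYERS = { "util" => 0, "storage" => 1, "merchants" => 2, "api" => 3 }.freeze

# A dependency is allowed if it stays in the same layer or points downward.
def allowed?(from, to)
  LAYERS.fetch(to) <= LAYERS.fetch(from)
end

# deps:  array of [from_package, to_package] pairs found in the codebase.
# known: Set of violations recorded when the ratchet was introduced.
# Returns [new_violations, fixed_violations].
def check_layering(deps, known)
  current = deps.reject { |from, to| allowed?(from, to) }.to_set
  new_violations = current - known # these should fail the build
  fixed = known - current          # these can be removed from the list
  [new_violations, fixed]
end

deps = [%w[api merchants], %w[util merchants], %w[storage api]]
known = Set[%w[util merchants], %w[merchants api]]
new_violations, fixed = check_layering(deps, known)
p new_violations.to_a # prints [["storage", "api"]]
p fixed.to_a          # prints [["merchants", "api"]]
```

Removing fixed entries from the known list is what turns the check into a ratchet: once a violation is gone, reintroducing it fails the build.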
How can this all fall apart? Walter J. Savitch, relaying a quote overheard at a computer science conference, stated:
In theory, there is no difference between theory and practice. In practice, there is.
Tools aren't always perfect at first, said Ritter, who recommended not rushing a project's launch. With a point of leverage and good ratchets, one team can refactor a large, stubborn codebase.