QCon SF 2024: Refactoring Large, Stubborn Codebases

Jake Zimmerman, Technical Lead of Sorbet at Stripe, and Getty Ritter, Ruby Infrastructure Engineer at Stripe, presented Refactoring Stubborn, Legacy Codebases at the 2024 QCon San Francisco conference.

Zimmerman kicked off the presentation by enumerating some common complaints about stubborn codebases, drawn from a 2017 company survey. To reach a happy state, he maintained, a refactoring should be centralized: having one team drive it concentrates expertise, incentivizes automation, and raises the probability of finishing.

A centralized migration needs two things: leverage over the codebase, so that a small problem-solving force can have a large effect on system behavior; and a method to "ratchet" incremental progress. The term "ratchet" refers to a mechanical gear that can turn in only one direction. With this in mind, Zimmerman introduced Sorbet, a type checker for Ruby. He maintained:

To refactor a large, stubborn codebase, you need to have a point of leverage and to pick good ratchets.

Building Sorbet was key to addressing these complaints, because it gave the team points of leverage over the codebase.

Ratcheting is achieved with the # typed comment located at the top of each Ruby file. It acts like a ratchet because it's easy to "go up a little." The valid levels include:

  • # typed: false reports errors related to syntax and constants
  • # typed: true additionally reports type errors found by inference in method bodies
  • # typed: strict additionally requires every method to have a signature
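
One way to picture the ratchet is a check that rejects any change that lowers a file's level. The following is a minimal sketch, not Stripe's or Sorbet's own tooling: the helper names sigil_level and ratchet_violations are hypothetical, and a file without a # typed comment is assumed to count as false. The ordering of levels (ignore, false, true, strict, strong) follows Sorbet's documented strictness levels.

```ruby
# Sorbet strictness levels, ordered from least to most strict.
LEVELS = %w[ignore false true strict strong].freeze

# Returns the index of a source string's `# typed:` sigil in LEVELS.
# Assumption: a file without a sigil is treated as `false`.
def sigil_level(source)
  match = source.match(/^#\s*typed:\s*(\w+)/)
  name = match ? match[1] : "false"
  LEVELS.index(name) || raise(ArgumentError, "unknown sigil: #{name}")
end

# Compares two snapshots ({path => source}) and returns the paths whose
# strictness went down. New files and upgraded files are always allowed.
def ratchet_violations(before, after)
  after.select do |path, source|
    before.key?(path) && sigil_level(source) < sigil_level(before[path])
  end.keys
end

before = { "a.rb" => "# typed: true\n", "b.rb" => "# typed: false\n" }
after  = { "a.rb" => "# typed: false\n", "b.rb" => "# typed: strict\n" }
p ratchet_violations(before, after)
```

Here a.rb is reported because it moved from true down to false, while b.rb moving up to strict is accepted. Run on every change, a check like this would let progress move in only one direction.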

While this may seem easy enough, Zimmerman cautioned that starting a file at # typed: true instead of # typed: false can produce confusing errors. He used the following example:

    
# typed: true
#         ^ start with false

class KnownParent
    def method_on_parent(x); end
end

class MyClass < UnknownParent
    def example
        self.method_on_parent()
    end
end
    

With # typed: true in place initially, both the UnknownParent constant and the call to method_on_parent() in MyClass are highlighted. This can be confusing, since method_on_parent() is defined in the KnownParent class.

Upon switching to # typed: false, only the UnknownParent constant is highlighted, because it doesn't exist. The issue can then be resolved by changing the constant to KnownParent.

After returning to # typed: true, only the call to method_on_parent() in MyClass is highlighted, alerting the developer that the method call requires an argument.
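
For reference, a version of the example with both problems resolved might look like the sketch below. The argument value 42 is an arbitrary placeholder, since the original example does not say what x should be.

```ruby
# typed: true

class KnownParent
  # Takes one required argument; the body is empty, so it returns nil.
  def method_on_parent(x); end
end

# Inherit from the constant that actually exists.
class MyClass < KnownParent
  def example
    # Pass the required argument (42 is a placeholder value).
    self.method_on_parent(42)
  end
end

MyClass.new.example
```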

As a result, developer satisfaction improved because a significant amount of the large, stubborn codebase was refactored, the team had a point of leverage in Sorbet, and they selected good ratchets.

Ritter then discussed how to make a Ruby monolith more modular and why modularity is important. He used a simple logger application containing personally identifiable information as an example.

    
# a toy logger
class Logger
    def log(message, **storytime)
        payload = storytime.map do |k, v|
            "#{k}=#{v.inspect}"
        end.join(" ")

        @output.puts("#{Time.now.to_i}: #{message} #{payload}")
    end
end

# elsewhere
logger.log("Attempting operation", op: my_op, merchant: m)
# 1730756308: Attempting operation op=:update merchant=#<Merchant id=22 secret="hunter2">
    

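The log line above leaks the merchant's secret. The obvious fix, special-casing Merchant inside the logger, is well intentioned but creates tangled code, because the logger now depends on the Merchant class.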

    
# ...
payload = storytime.map do |k, v|
    if v.is_a?(Merchant) # if we're logging a merchant...
        "#{k}=Merchant(id=#{v.id}, ...)" # redact most fields
    else
        "#{k}=#{v.inspect}" # other objects can be logged as-is
    end
end.join(" ")
# ...
    

Ritter discussed two points of leverage: packaging, which is built into Sorbet but isn't enough on its own to address modularity; and layering, which Eric Evans defines as follows: the "essential principle is that any element of a layer depends only on other elements in the same layer or on elements of the layer 'beneath' it. Communication upward must pass through some indirect mechanism."
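
To illustrate the "indirect mechanism" in Evans's definition, the logger could ask each value how it wants to be logged instead of naming Merchant itself. This is a sketch under stated assumptions, not the approach shown in the talk: the ToyLogger name, the to_log method, the Merchant struct, and the output constructor argument are all hypothetical.

```ruby
require "stringio"

# Lower layer: the logger knows nothing about Merchant.
class ToyLogger
  def initialize(output = $stdout)
    @output = output
  end

  def log(message, **fields)
    payload = fields.map do |k, v|
      # Hypothetical protocol: values may define `to_log` to control
      # their own representation; everything else falls back to inspect.
      repr = v.respond_to?(:to_log) ? v.to_log : v.inspect
      "#{k}=#{repr}"
    end.join(" ")

    @output.puts("#{Time.now.to_i}: #{message} #{payload}")
  end
end

# Upper layer: the model decides which of its fields are safe to log.
Merchant = Struct.new(:id, :secret) do
  def to_log
    "Merchant(id=#{id}, ...)"
  end
end

out = StringIO.new
logger = ToyLogger.new(out)
logger.log("Attempting operation", op: :update, merchant: Merchant.new(22, "hunter2"))
puts out.string
```

With this arrangement the dependency points downward only: Merchant knows about the logging protocol, while the logger never references Merchant.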

Ritter provided many code examples of layering, applied the ratchets that Zimmerman had described earlier, and outlined the attributes of a good ratchet.

How can this all fall apart? Walter J. Savitch, relaying a quote overheard at a computer science conference, stated:

In theory, there is no difference between theory and practice. In practice, there is.

Tools aren't always perfect at first, said Ritter, who recommended not rushing the launch of a project. His conclusion: one team can refactor a large, stubborn codebase.
