The Renaissance of Code Documentation: Introducing Code Walkthrough

Key Takeaways

  • Developers deserve and need good documentation but in most cases, code documentation is lacking and out of date, so wary developers don’t trust it — or simply don’t create it in the first place.
  • The Continuous Documentation methodology is a useful paradigm that helps ensure that high-quality documentation is created, maintained, and readily available. It asserts that documentation should be treated like crucial parts of the development workflow — such as tests, or the code itself.
  • There are two common types of documentation. Inline documentation, usually code comments, explains a specific line or area in the code but is disconnected from the bigger picture. High-level documentation provides the big picture, such as code architecture, business logic, or major decisions, but is detached from the specifics of the code.
  • What is often missing is a third type of documentation: Code Walkthroughs. These take the reader on a “walk” — visiting at least two stations in the code — describe flows and interactions, and often incorporate code snippets.
  • Walkthroughs are similar to getting familiarized with a codebase with the help of an experienced contributor who walks you through the code. They can show recurring patterns, interactions between parts of the code, or explain a process that involves code in multiple repositories.

We are entering a new era for code collaboration, and a substantial breakthrough is imminent. What is changing? And more importantly, why?

This is the second part of the Continuous Documentation Manifesto, in which we called for creating and maintaining code documentation in a way that incorporates it into the ongoing development workflow. This time we are shifting focus to explain the untapped potential of an often overlooked category of documentation - Code Walkthrough Documentation - which is made possible by practicing Continuous Documentation.

Let us start by agreeing on the foundations: developers and development teams deserve and need good documentation. In theory, the solution sounds easy. Let’s all write good documentation, and all will be better. Right?

Not so fast. Current documentation practices don’t necessarily serve developers. In most cases, code documentation is lacking and out of date, so wary developers don’t trust it. Or simply don’t create it in the first place. This can severely impact a team and a business. Transitioning from this experience of code documentation requires new methods and tools.

Recap on Continuous Documentation

The Continuous Documentation methodology asserts that you need to treat documentation as you treat other crucial parts of your development workflow - like tests, or the code itself. Continuous Documentation relies on three principles:

  • Code-coupled - Documentation that explicitly references parts of the code. 
  • Always up to date - Continuously verifying that the current state of the documentation matches the current state of the codebase, as the code evolves.
  • Created when best - Creating documentation when the knowledge is fresh.
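
To make the "always up to date" principle concrete, here is a minimal sketch in C of the simplest possible check: does a snippet quoted in a document still appear verbatim in the current source? The function name `snippet_is_current` is hypothetical and the check is deliberately naive. It only illustrates the idea of verifying documentation against code and is not a description of how any particular tool works.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true when the snippet quoted in a
 * document still appears verbatim in the current source text.
 * A plain substring search is an assumption made for brevity; it
 * cannot tell a renamed symbol from a deleted one. */
bool snippet_is_current(const char *source, const char *snippet)
{
    assert(source != NULL && snippet != NULL);
    return strstr(source, snippet) != NULL;
}
```

Run as part of a build or a pre-merge check, a test like this would flag a document whose quoted code no longer exists, which is the moment the document needs attention.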

Common Types of Documentation

In-line documentation (low-level tactical documentation like comments) and high-level documentation, which gives you the bigger picture, appear to be the most common types of documentation. However, on their own, they are insufficient as a means to introduce the code to someone who wants to understand how it works and possibly make changes to it.

Low-Level / In-line Documentation

This tactical documentation is useful for explaining a specific line or area of the code in isolation. Code comments are the most common form. For example, if a line of code includes division by 24601, a comment explaining why this number is there is in-line documentation for that specific line. It doesn’t show how the line fits into a broader flow, or what its role is within a complex architecture. Another example is a comment next to a function’s declaration, explaining what the function does, its parameters, or its return value. Such a comment explains the function in isolation, not why it is used as part of a certain flow.
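
As an illustration, both kinds of in-line documentation might look like the following C sketch. The constant name `RECORDS_PER_SHARD`, the function `shard_index`, and the meaning given to 24601 are invented for this example; only the number itself comes from the text above.

```c
#include <assert.h>

/* Hypothetical meaning for the number: how many records one shard holds.
 * Without this comment, a bare 24601 in the division below would be opaque. */
#define RECORDS_PER_SHARD 24601

/* Returns the index of the shard that holds the given record.
 * record_id must be non-negative. */
int shard_index(long record_id)
{
    assert(record_id >= 0);
    /* Integer division: records 0..24600 land in shard 0,
     * records 24601..49201 in shard 1, and so on. */
    return (int)(record_id / RECORDS_PER_SHARD);
}
```

Both comments are accurate and useful, yet neither says who calls `shard_index` or where shards fit in the wider system, which is exactly the limitation described above.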

Some prefer not to write comments at all, but rather strive to make their code highly readable and clear, which we will argue is another form of in-line documentation. 

High-Level Documentation

If low-level documentation gives you the minute details, high-level documentation attempts to give you the big picture. This can include a layout of the code architecture, the business logic behind it, and reasons for major decisions taken about both. Sometimes it can describe various repositories or main modules within a specific repository. This kind of documentation can be very valuable when being introduced to a codebase for the first time. However, its value pretty much ends there. Since it rarely contains practical value for day-to-day tasks, engineers don’t often revisit it.

Why are these types of documentation insufficient? Inline comments describe only the code area they are attached to, without a broader scope, so they are always limited. High-level documents can indeed provide the big picture, but they lack the details that developers need for their work. For example, in documentation about extending git’s source code, you can describe the general process of creating a new git command in a high-level document. However, you won’t be able to do so effectively without getting into specific details and giving examples from the code itself: the files’ paths, the function’s signature, and the `commands` array within `git.c`. Adding these to your documentation would mean it is no longer high-level.

Code Walkthrough Documentation

Code Walkthrough Documentation takes the reader on a “walk” made up of at least two stations within the code. It describes flows and interactions, and it may incorporate code snippets or tokens to do so. In other words, it is code-coupled, in accordance with the principles of Continuous Documentation.

This kind of document provides an experience similar to getting familiarized with a codebase with the help of an experienced contributor to the codebase - when the latter walks you through the code. Just as the contributor would show you parts of the code and explain them, Walkthrough Documents do the same.

Consider the three examples listed below:

Recurring Code Patterns

Examples of recurring repository patterns could be adding a configuration value, inheriting from an interesting base class, creating a new command, or reading data from the database. Some of these patterns are really small and intuitive to understand, and consist of a function that is being called many times (calling the logging function is also a pattern). More interesting patterns that call for Walkthrough Documentation usually expand across multiple files. As these patterns occur many times in the code, they are crucial to understand, yet to really explain them well - the reader must get a grasp of the pattern as a whole, alongside the specific implementation details.

Consider, for example, the process of adding a new command to git’s CLI. Suppose you are a contributor to git and would like to create a new command, for example `git new-command`. Let us quickly go through the process of adding such a command. We will not go into all the details, as they are not crucial to our point, and no previous knowledge of the C programming language is required.

To understand how a command is added, let us look at an existing instance of this pattern and consider the `git add` command. The entry point for this command is located in `builtin/add.c`, as every command has a file with a corresponding name under the `builtin` folder.

int cmd_add(int argc, const char **argv, const char *prefix)
{

The actual implementation of this function does not matter to understand the general pattern. It is important to see the function’s signature - as all commands take the same arguments.

The command’s declaration needs to be included in `builtin.h`:

int cmd_add(int argc, const char **argv, const char *prefix);

The pattern doesn’t end here though. To make git “aware” of the `add` command, it needs to be registered by adding a `cmd_struct` to the `commands[]` array within the file `git.c`:

static struct cmd_struct commands[] = {
	{ "add", cmd_add, RUN_SETUP | NEED_WORK_TREE },
	…
	…
};

Also, in order to be able to build the project, we must add our command into `BUILTIN_OBJS` within the Makefile:

BUILTIN_OBJS += builtin/add.o

Note that the explanation here is highly coupled to the codebase and highlights many specific details: the naming convention of the `cmd_` prefix, the folder hierarchy (specifically, the folder `builtin`), the signature of commands, filenames (`builtin.h`, `git.c`, ...), names of variables (`BUILTIN_OBJS`), and many more. It also takes the reader through a pattern that spans four different files, though logically it is all related to adding a single CLI command.

Getting such a directed tour in a single document makes the process of understanding this pattern much easier.

Of course, this is a simple example, and as the patterns get more complicated and span across additional files - the impact of such Walkthrough Documentation becomes even bigger.
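
To see why the registration step matters, the following self-contained C sketch imitates the pattern in a single file. It is a simplified stand-in and not git’s actual code: the `cmd_struct` layout, the flag values, the stubbed command bodies, and the helpers `find_command` and `dispatch` are all assumptions made for illustration, and `cmd_new_command` is the hypothetical command from the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the struct used in git.c (layout assumed). */
struct cmd_struct {
    const char *cmd;
    int (*fn)(int argc, const char **argv, const char *prefix);
    unsigned int option;
};

/* Illustrative flag values, not the ones git defines. */
#define RUN_SETUP      (1u << 0)
#define NEED_WORK_TREE (1u << 1)

/* Stub standing in for the existing command's entry point. */
static int cmd_add(int argc, const char **argv, const char *prefix)
{
    (void)argc; (void)argv; (void)prefix;
    return 0;
}

/* Hypothetical new command, following the cmd_ naming convention and
 * the shared signature. It returns argc so the dispatch is observable. */
static int cmd_new_command(int argc, const char **argv, const char *prefix)
{
    (void)argv; (void)prefix;
    return argc;
}

/* Registration: a command missing from this table cannot be found. */
static struct cmd_struct commands[] = {
    { "add", cmd_add, RUN_SETUP | NEED_WORK_TREE },
    { "new-command", cmd_new_command, RUN_SETUP },
};

/* Hypothetical helper: linear lookup of a command by name. */
static struct cmd_struct *find_command(const char *name)
{
    for (size_t i = 0; i < sizeof(commands) / sizeof(commands[0]); i++) {
        if (strcmp(commands[i].cmd, name) == 0)
            return &commands[i];
    }
    return NULL;
}

/* Hypothetical helper: runs argv[0] as a command, or returns -1
 * when no command with that name is registered. */
int dispatch(int argc, const char **argv)
{
    assert(argc >= 1 && argv != NULL);
    struct cmd_struct *p = find_command(argv[0]);
    if (p == NULL)
        return -1;
    return p->fn(argc, argv, NULL);
}
```

A `main` function could forward its arguments, minus the program name, to `dispatch`. The point of the sketch is that the function, its shared signature, and its table entry only make sense when read together, which is what a walkthrough provides.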

Interactions between Parts of the Code

Some parts of the code, even crucial ones, can be hard to understand because they involve several areas that interact with one another in ways that are not necessarily obvious. Imagine a very simple web app where, to fully understand a flow, one needs to go back and forth between various parts of the Frontend and the Backend. Especially for someone who is new to the project, a document explaining how the various code areas interact can drastically ease onboarding and clarify the constituents and interactions involved.

A process with multiple repositories or services

The case described in the previous section becomes even more complicated if the code spans multiple repositories. Consider, for example, a system where the Frontend’s code is in one repository and the Backend’s code is in another. The problem worsens as the number of repositories involved in the flow increases. With a document that contains code parts from all relevant repositories, understanding the flow becomes far more straightforward.

So what do all of these have in common?

  1. They explain something about the code that includes more than one specific isolated area of it.
  2. They include the information needed to practically understand the code.
  3. The explanation they provide is likely to become outdated when the relevant code changes. 

The last point is why, without practicing Continuous Documentation, it’s rare to find documents like these: developers prefer not to create them in the first place.

However, creating Code Walkthrough Documentation is an act of kindness and developer love, to both your team and your future self. Without it, code exploration becomes a tedious process that may take months and drive the proliferation of knowledge silos in software organizations.

Conclusions

So why is it so rare to find detailed documentation that describes processes, flows, and interactions? Because like with any long-term, committed relationship - if you don’t continue to invest in it, it will fall apart. The amount of time, attention, and work that is needed to keep this type of documentation up-to-date when the code it describes changes is far too much for most developers, who would rather be spending this time on creating new features and solving bugs…

Enter Continuous Documentation. Practicing this methodology can unlock the enormous value that Walkthrough Documentation can provide by creating it when the knowledge is fresh and keeping it up-to-date as part of the development workflow.

Omer Rosenbaum and Tom Ahi Dror are co-founders of Swimm, the company that syncs teams with code. 

About the Authors

Omer Rosenbaum is the CTO and co-founder of Swimm, the company making continuous documentation an integral part of the development lifecycle. Omer founded the Check Point Security Academy and was the Cyber Security Lead at ITC, an educational organization that trains talented professionals to develop careers in technology.

Tom Ahi Dror is the CBO and co-founder of Swimm, the company making continuous documentation an integral part of the development lifecycle. Tom is an accomplished leader with unique experience in technology, training, strategy and business development. A graduate of Talpiot, Israel’s most prestigious military academy, Tom went on to become commander of the program. Most recently, prior to co-founding Swimm, Tom was VP of Business Development of ITC (Israel Tech Challenge).
