Microsoft has completed the move of their Windows codebase from Source Depot to Git using GVFS.
According to Brian Harry, Corporate VP at Microsoft, the company decided to change their engineering system a couple of years ago. While some of the tools were common across the enterprise, others were specific to one or a few teams, which made collaboration more difficult and led to tool and process differences, with all the negative impact associated with them. The engineering system for a company the size of Microsoft covers many domains – source control, build, release, testing, telemetry, static analysis, security, etc. – too many to address in one step, so they decided to tackle work planning, source control, and build first.
One of the first steps was to standardize the entire company on Visual Studio Team Services, from which all the other tools would be made available. When it comes to source control, most of the company used TFS, except the very large teams – Windows and Office – which remained on Source Depot, a VCS introduced at Microsoft in the early 2000s. Nobody wanted to embark on the task of moving those two teams to a different system, the costs being considered too high.
After debating several options – TFVC, Source Depot, Git, Mercurial – Microsoft decided to try Git and see whether it was a suitable solution. While Microsoft had no problem moving small or medium repositories to Git, they encountered serious scaling issues with large ones:
There aren’t many companies with code bases the size of some of ours. Windows and Office, in particular (but there are others), are massive. Thousands of engineers, millions of files, thousands of build machines constantly building it, quite honestly, it’s mind boggling. To be clear, when I refer to Window in this post, I’m actually painting a very broad brush – it’s Windows for PC, Mobile, Server, HoloLens, Xbox, IOT, and more. And Git is a distributed version control system (DVCS). It copies the entire repo and all its history to your local machine. Doing that with Windows is laughable (and we got laughed at plenty). TFVC and Source Depot had both been carefully optimized for huge code bases and teams. Git had *never* been applied to a problem like this (or probably even within an order of magnitude of this) and many asserted it would *never* work.
In numbers, the Windows repo is about 300GB, with ~3.5M files, 4,000 engineers, and ~1,700 daily builds across 440 branches. Cloning a Git repo means downloading everything, which makes the operation slow for large projects. The solution to the problem was virtualization, in the form of the Git Virtual File System (GVFS). With GVFS, Git behaves as if all resources were local, but the system actually monitors the user's actions and brings resources over the network only when they are required. Using a virtual file system driver required only a few changes to git.exe. They did not want to alter Git too much:
For sure we didn’t want to fork Git – that would be a disaster. And we didn’t want to change it in a way that the community would never take our contributions back either. So we walked a fine line doing as much "under" Git with a virtual file system driver as we could.
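The on-demand behavior described above can be illustrated with a small sketch. This is a toy model only, not GVFS code: the class `LazyWorkingTree`, the `fetch_from_remote` helper, and the sample paths are all hypothetical names introduced here. It assumes that the full list of paths is known up front (so the tree looks complete) while file contents are downloaded only on first read.

```python
from typing import Callable, Dict, List


class LazyWorkingTree:
    """Toy model of on-demand file download (hypothetical, not GVFS code)."""

    def __init__(self, paths: List[str], fetch: Callable[[str], bytes]) -> None:
        # Every path is known up front, so listings look complete.
        self._paths = set(paths)
        # Callback that retrieves one file's contents from the server.
        self._fetch = fetch
        # Contents that have actually been downloaded so far.
        self._local: Dict[str, bytes] = {}

    def listdir(self) -> List[str]:
        """List all files, whether or not their contents are local yet."""
        return sorted(self._paths)

    def read(self, path: str) -> bytes:
        """Return file contents, downloading them on first access only."""
        if path not in self._paths:
            raise FileNotFoundError(path)
        if path not in self._local:
            self._local[path] = self._fetch(path)
        return self._local[path]

    def downloaded(self) -> List[str]:
        """List the files whose contents are already local."""
        return sorted(self._local)


# Simulated server contents (made-up paths and data, for illustration only).
REMOTE: Dict[str, bytes] = {
    "base/kernel.c": b"int main(void) { return 0; }",
    "shell/explorer.c": b"/* shell */",
    "docs/readme.txt": b"hello",
}
fetch_log: List[str] = []


def fetch_from_remote(path: str) -> bytes:
    # Record each network round trip so it can be inspected afterwards.
    fetch_log.append(path)
    return REMOTE[path]


tree = LazyWorkingTree(list(REMOTE), fetch_from_remote)
tree.read("base/kernel.c")
tree.read("base/kernel.c")  # second read is served from the local copy
```

After the two reads, the tree still lists all three files, but only `base/kernel.c` has been downloaded, and it was fetched once. A real virtual file system driver does this interception at the operating system level, which is why tools such as Git can remain largely unaware of it.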
Over a period of three months, Microsoft moved the entire Windows team from Source Depot to Git hosted by VS Team Services. It is the largest Git repository on the planet, and it is a success, said Harry. They had some performance issues at first, which were fixed with some tuning. Based on an internal survey of the developers involved, over 70% of them are either "very satisfied" or "somewhat satisfied" with Git.
Microsoft has open sourced GVFS and invites other interested companies to use it and contribute to it, recommending it as a viable solution for very large Git repositories.