Last week, Microsoft announced Git Virtual File System (GVFS) for large Git repositories. GVFS is a solution for users who run into the limits of the Git client, which struggles to handle huge repositories. Since the launch, Microsoft has expanded on the details of GVFS, explaining which problems it solves.
In two blog posts, Microsoft’s Brian Harry discusses how GVFS is basically just Git. However, it solves several problems found in the client. Harry also explains how Microsoft has been scaling Git in recent years and how the client is merged with GVFS.
As we mentioned last week, the main problem GVFS solves is handling a large number of files. Git is unable to work well with hundreds of thousands of files within a working set. GVFS remedies this by optimizing operations so that commit is fast and push and pull work comfortably.
Git Virtual File System can work with bigger files, and do so faster than Git, because it only pulls in content when it is needed:
GVFS takes the “file system beneath your repo and makes it appear as though all the files in your repo are present, but in reality only downloads a file the first time it is opened. GVFS also actively manages how much of the repo Git has to consider in operations like checkout and status, since any file that has not been hydrated can be safely ignored.”
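The behavior described in that quote can be illustrated with a small toy model. This is only a sketch of the idea of on-demand hydration, not how GVFS is actually implemented: the class `LazyRepoView`, its methods, and the in-memory dictionary standing in for the server are all hypothetical names chosen for illustration.

```python
class LazyRepoView:
    """Toy model of on-demand hydration (hypothetical, not GVFS code)."""

    def __init__(self, remote_files):
        # remote_files maps path -> content and stands in for the server.
        self._remote = remote_files
        self._hydrated = {}   # files that have actually been downloaded
        self.downloads = 0    # how many files were fetched so far

    def listdir(self):
        # Every file appears to be present, downloaded or not.
        return sorted(self._remote)

    def open(self, path):
        # Fetch the content only the first time the file is opened.
        if path not in self._hydrated:
            self._hydrated[path] = self._remote[path]
            self.downloads += 1
        return self._hydrated[path]

    def status_candidates(self):
        # Files that were never hydrated cannot have local changes,
        # so a status-like operation only needs to look at these.
        return sorted(self._hydrated)


view = LazyRepoView({"a.txt": "A", "b.txt": "B", "c.bin": "C"})
view.open("a.txt")
view.open("a.txt")  # second open is served locally, no new download
```

In this sketch the listing shows all three files, yet only one download happens and only one file has to be considered when checking for changes.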
Expanding on this theme, Harry says GVFS is also adept at handling bigger files. Git struggled with big binary files and would slow down under the load. The new service only pulls files when they are needed, allowing for bigger files to be managed.
Similarly, GVFS addresses the problem of sheer file volume, which Git handles poorly. Lots of history and binary files result in a .git directory that is too big to manage. Because GVFS only pulls down files when they are needed, the repository stays smaller, faster, and easier to manage.
Harry also explains how the service helps manage large numbers of users in two ways:
Lots of branches – Users of Git create branches pretty prolifically. It’s not uncommon for an engineer to build up ~20 branches over time and multiply 20 by, say 5000 engineers and that’s 100,000 branches. Git just won’t be usable. To solve this, we built a feature we call “limited refs” into our Git service (Team Services and TFS) that will cause the service to pretend that only the branches “you care about” are projected to your Git client. You can favorite the branches you want and Git will be happy.
Lots of pushes – Lots of people means lots of code flowing into the server. Git has critical serialization points that will cause a queue to back up badly. Again, we did a bunch of work on our servers to handle the serialized index file updates in a way that causes very little contention.
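To make the branch arithmetic concrete, here is a rough sketch of the "limited refs" idea as Harry describes it: the server holds every branch but projects only the ones a user has favorited. The function `visible_refs` and the branch naming scheme are assumptions made up for this example and are not part of Team Services, TFS, or Git.

```python
def visible_refs(all_refs, favorites):
    """Return only the refs the user has favorited (hypothetical helper)."""
    wanted = set(favorites)
    return [ref for ref in all_refs if ref in wanted]


# 5000 engineers with ~20 branches each, as in the example above.
engineers = 5000
branches_each = 20
all_refs = [
    f"refs/heads/users/eng{e}/topic{b}"
    for e in range(engineers)
    for b in range(branches_each)
]

# One engineer favorites two branches out of the whole set.
favorites = [
    "refs/heads/users/eng42/topic0",
    "refs/heads/users/eng42/topic7",
]
projected = visible_refs(all_refs, favorites)
```

Here `all_refs` contains 100,000 entries, while the client in this sketch would only ever see the two favorited ones.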
GVFS is Git
Harry points out that Git Virtual File System is just Git. The repos are ordinary Git repos, and the protocols are the same. Indeed, Microsoft says that users who are happy with their current Git repo performance have no need to adopt GVFS. However, if your Git repos are too big or you need to manage bigger files, then GVFS is the solution.