Introduction
When you work with GitHub for a long time, you may find that a repository has grown too large. This tip shows how to reduce the repository size and improve the experience for any contributors you might attract.
Background
We have been developing a project on GitHub for the past two years. There have been quite a few feature branches, and many files have been added and removed during development. All of these files remain in the Git history, which is downloaded every time the repository is cloned. At its peak, the repository was over 200 MB, while the active code was only a few megabytes. At that point, the size started to affect the efficiency of developers working on the platform. We researched several options for reducing the size. The main concern was keeping the history of files that are still active, while still allowing existing customers to see and merge changes into their older versions.
Solution
We ended up using BFG Repo-Cleaner (https://rtyley.github.io/bfg-repo-cleaner/). It quickly removes large files from the Git history while preserving the files in the latest HEAD commit, so we didn’t have to worry about deleting something that was still in use. We wanted to delete as many binary files and DLLs as possible. Most libraries had been moved to NuGet packages, and only the ones without up-to-date NuGet packages remained in the repository.
One of the requirements was to keep the files and history of version 1.x, which at the time lived in the v1.x branch. We had to remove that branch from the main repository to reduce its size, because it contained many files that are no longer used in the current 2.x. To solve this, we moved the whole branch to a new repository, so that the complete history for 1.x (including large files) could be preserved. To move the branch, we used the following command:
git push https://github.com/VirtoCommerce/vc-community-1.x.git v1.x:master
The command pushes the v1.x branch to the vc-community-1.x GitHub repository and names it “master” there.
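The argument v1.x:master is a refspec of the form source:destination. As a small illustration, the sketch below repeats the same kind of push between two throwaway local repositories. The directory names (source, archive.git) and the demo identity are made up for the example and are not part of the original setup.

```shell
#!/bin/sh
# Sketch: push one branch into a separate repository under a new name.
# All paths and names are hypothetical and live in a scratch directory.
set -e
work=$(mktemp -d)
cd "$work"

# A source repository with a single commit on a branch called v1.x.
git init -q source
cd source
git symbolic-ref HEAD refs/heads/v1.x
echo "legacy code" > legacy.txt
git add legacy.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "1.x work"

# An empty bare repository stands in for the new GitHub repository.
git init -q --bare "$work/archive.git"

# source:destination refspec: local branch v1.x becomes master over there.
git push -q "$work/archive.git" v1.x:master

# Both names should now point at the same commit.
src_commit=$(git rev-parse v1.x)
dst_commit=$(git --git-dir="$work/archive.git" rev-parse refs/heads/master)
echo "source v1.x:    $src_commit"
echo "archive master: $dst_commit"
```

Because the full commit graph reachable from the branch is pushed, the new repository carries the complete history of that branch, large files included.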
The rest of the commands are related to deleting large files from the existing repository:
git clone --mirror https://github.com/VirtoCommerce/vc-community.git .git
java -jar bfg.jar --delete-folders '{.nuget,Architecture,Projects,SDK,Shared,Tests,Tools,packages,src}' --protect-blobs-from master,dev .git
java -jar bfg.jar --delete-files '*.{dll,exe,lib,pfx,nupkg,zip,eot,otf,ttf,woff,bmp,gif,ico,jpg,jpeg,png,sql}' --protect-blobs-from master,dev .git
java -jar bfg.jar --strip-blobs-bigger-than 100K --protect-blobs-from master,dev .git
cd .git
git reflog expire --expire=now --all; git gc --prune=now --aggressive
git push
The script above makes a mirror clone of the repository into a local folder (a bare copy of the Git data, without any checked-out content files), removes the large files from the history, and then pushes all the changes back to the repository.
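The reflog expire and gc step matters because rewriting history only makes the old objects unreachable; it does not delete them. The following sketch shows that effect on a throwaway repository. It does not use BFG at all: the unwanted commit is dropped simply by deleting a scratch branch, and the file names, sizes, and demo identity are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: an unreachable blob survives until reflogs are expired and gc prunes it.
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

echo "source code" > code.txt
git add code.txt
commit "keep this"

# Commit a large binary-like file on a scratch branch and remember its blob id.
git checkout -q -b scratch
head -c 200000 /dev/urandom > big.bin
git add big.bin
commit "add large file"
blob=$(git rev-parse HEAD:big.bin)

# Drop the branch: no branch references the blob any more,
# but the HEAD reflog still does, so the object is kept.
git checkout -q master
git branch -q -D scratch
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# Same cleanup as in the script above (without --aggressive, to keep the demo fast).
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi

echo "before cleanup: $before, after cleanup: $after"
```

The output of the actual push from the script above follows.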
Counting objects: 47842, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (12659/12659), done.
Writing objects: 100% (47842/47842), 13.03 MiB | 377.00 KiB/s, done.
Total 47842 (delta 34134), reused 47842 (delta 34134)
To https://github.com/VirtoCommerce/vc-community.git
+ cda685f...b8cf9ab dev -> dev (forced update)
+ f763931...5e12fb4 master -> master (forced update)
+ 80c1ae1...af480a0 community/dev -> community/dev (forced update)
+ 96c221b...b96e4b5 v1.10 -> v1.10 (forced update)
+ b835d58...0c9a746 v1.11 -> v1.11 (forced update)
+ 19f286e...f3c251a v1.12 -> v1.12 (forced update)
+ 192d38a...c646866 v1.13 -> v1.13 (forced update)
+ 30b0f11...368e912 v1.9 -> v1.9 (forced update)
+ 48aae27...a79c544 v1.9.732 -> v1.9.732 (forced update)
+ bec5eaa...211d6d6 v2.1 -> v2.1 (forced update)
+ 3da7501...c6f6cb2 v2.2 -> v2.2 (forced update)
+ 54ce0ac...75fc554 v2.3 -> v2.3 (forced update)
+ 788b7fd...0d53a8d v2.4 -> v2.4 (forced update)
! [remote rejected] refs/pull/1/head -> refs/pull/1/head (deny updating a hidden ref)
! [remote rejected] refs/pull/1/merge -> refs/pull/1/merge (deny updating a hidden ref)
! [remote rejected] refs/pull/2/head -> refs/pull/2/head (deny updating a hidden ref)
! [remote rejected] refs/pull/3/head -> refs/pull/3/head (deny updating a hidden ref)
! [remote rejected] refs/pull/37/head -> refs/pull/37/head (deny updating a hidden ref)
! [remote rejected] refs/pull/37/merge -> refs/pull/37/merge (deny updating a hidden ref)
! [remote rejected] refs/pull/4/head -> refs/pull/4/head (deny updating a hidden ref)
! [remote rejected] refs/pull/4/merge -> refs/pull/4/merge (deny updating a hidden ref)
error: failed to push some refs to 'https://github.com/VirtoCommerce/vc-community.git'
So the push failed to rewrite some hidden references (the refs/pull/* refs that GitHub maintains for pull requests), and the size of the repository reported on GitHub is 94 MB. A regular clone, however, is only 14 MB.
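One way to avoid the rejected updates is to push only branches and tags instead of every ref in the mirror. The sketch below tries this on local scratch repositories; it is not what we ran at the time. The repository names, the hand-made refs/pull/1/head ref, and the commit-tree call that stands in for a history rewrite are all assumptions for illustration. The destination is given as a path rather than as the origin remote, since a mirror clone configures origin to push everything.

```shell
#!/bin/sh
# Sketch: push rewritten branches and tags from a mirror clone, leaving refs/pull/* alone.
set -e
work=$(mktemp -d)
cd "$work"

# server.git stands in for the hosted repository.
git init -q --bare server.git
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo one > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first"
git tag v2.4
old=$(git rev-parse HEAD)
git push -q "$work/server.git" refs/heads/master refs/tags/v2.4

# Simulate a pull request ref of the kind GitHub keeps on the server.
git --git-dir="$work/server.git" update-ref refs/pull/1/head "$old"

# The mirror clone fetches every ref, including refs/pull/*.
git clone -q --mirror "$work/server.git" "$work/mirror.git"
cd "$work/mirror.git"

# Stand-in for a history rewrite: a new root commit with the same tree.
tree=$(git rev-parse 'refs/heads/master^{tree}')
new=$(git -c user.name=demo -c user.email=demo@example.com commit-tree "$tree" -m "rewritten")
git update-ref refs/heads/master "$new"
git update-ref refs/pull/1/head "$new"

# Push branches and tags only; refs/pull/* is never sent.
git push -q --force "$work/server.git" 'refs/heads/*:refs/heads/*' 'refs/tags/*:refs/tags/*'

server_master=$(git --git-dir="$work/server.git" rev-parse refs/heads/master)
server_pull=$(git --git-dir="$work/server.git" rev-parse refs/pull/1/head)
echo "server master:      $server_master"
echo "server pull/1/head: $server_pull"
```

This only keeps the push free of errors. The pull request refs on the server still point at the old commits, which is presumably why the size reported on GitHub stays larger than a fresh clone.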
Conclusion
The process was fairly straightforward using tools provided by the Git community. One more tool I’d like to mention is Git Extensions (http://gitextensions.github.io/). It includes a large file plugin that lists all the large files in a repository when you open it. This is a very useful way to figure out why your repository is bloated.
We are now very careful about adding binary files to the repository, as they are not handled well on GitHub.
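A similar overview can be approximated from the command line with plain Git. The sketch below is not the Git Extensions plugin; it lists the largest blobs anywhere in history, including files that were deleted long ago. The scratch repository and its file names are made up for the example, and the pipeline at the end is the part that can be run inside a real repository.

```shell
#!/bin/sh
# Sketch: list the largest blobs in the whole history of a repository.
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

# One large file that is added and later deleted, plus one small file.
head -c 50000 /dev/urandom > big.bin
echo "small" > small.txt
git add big.bin small.txt
commit "add files"
git rm -q big.bin
commit "remove big file"

# rev-list prints "<id> <path>" for every object in history;
# cat-file adds the size and type and keeps the path via %(rest).
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objectsize) %(objecttype) %(rest)' |
  awk '$2 == "blob"' |
  sort -rn |
  head -n 5)
echo "$largest"
```

Even though big.bin no longer exists in the latest commit, it still appears at the top of the list, which is exactly the kind of file that keeps a repository bloated.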