Introduction
Here at CodeProject Command and Control, we had been using SVN as our source code repository forever. While it has served us well, having a centralized code repository has its issues.
Most notably, the repository is vulnerable to server errors and failures. Rebuilding the repository from backups and local copies is not an exercise for the faint of heart. Also, merging branches can result in interesting issues and conflicts that can take hours to resolve, if they can be resolved at all. When they cannot, a manual merge by copying files is necessary, which often results in missed files, overwritten changes, and broken builds while trying to ensure everything is corrected.
Git, on the other hand, is a distributed source control system. This means that the 'central' copy is just another copy. It can be rebuilt from any clone, or multiple clones to get all changes. Furthermore, you don't need access to the 'central' repository to view the history. You have the whole history in your local clone.
We talked about moving to Git for some time, and with our release of Workspaces the time was right to make the move. The work by Kamil and his team has resulted in a platform on which we are confident enough to run our business, manage our code, and manage our Bugs, Features, and Backlogs.
As the title suggests, the organization of our SVN repository is, unintentionally, structured in a non-standard way. This meant that the various guides, blogs, and documentation for this process were only starting points. In this article, I'll walk you through the process I used to get from SVN to a Workspace Git repository, indicating which steps I had to change or add due to our SVN repository structure.
Background
The tools I will be using for this exercise, assuming you already have SVN installed, are:
- Git Gui and Git Bash, which come with GitExtensions. I chose these because most examples and guides use *NIX shell scripts.
- The git online documentation.
- Visual Studio 2013, because that is what we use here, and it has built-in Git support.
The starting point for my journey was John Albin's blog Converting a Subversion repository to Git. This is a simple, step-by-step guide for performing the migration. Unfortunately, it assumes a Standard Layout of the SVN repository. A standard SVN repository has a root directory with Trunk, Branches, and Tags subdirectories, with one project per repository as shown below.
Unfortunately, we didn't set it up this way. What we ended up with was one repository containing several projects. This structure is shown below, with the two main 'repositories' that we wish to migrate highlighted.
Performing the Migration
Step 1: Creating a list of SVN Committers
When cloning the SVN repository to Git, git svn uses an authors file that maps the SVN identifiers to Git-format authors. Each line in this file will look like matthew = matthew <matthew@codeproject.com>.
John Albin suggests running the script (line break added for readability)
svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2);
print $2" = "$2" <"$2">"}' | sort -u > authors-transform.txt
Since it has been some time since I have used shell scripting, I did this a little differently. I used a Windows console and ran the command svn log -q > svnlog.txt in my temp directory. The svnlog.txt file contains information in this format:
------------------------------------------------------------------------
r22779 | chris | 2014-01-18 21:52:05 -0400 (Sun, 18 Jan 2014)
------------------------------------------------------------------------
r22778 | matthew | 2014-01-16 17:41:09 -0400 (Fri, 16 Jan 2014)
------------------------------------------------------------------------
r22777 | chris | 2014-01-16 16:39:13 -0400 (Fri, 16 Jan 2014)
------------------------------------------------------------------------
r22776 | chris | 2014-01-16 15:42:53 -0400 (Fri, 16 Jan 2014)
------------------------------------------------------------------------
r22775 | chris | 2014-01-16 15:41:58 -0400 (Fri, 16 Jan 2014)
I then wrote a little C# console application to create the Git names file. It is executed by the command line SvnUsers2Git svnlog.txt names.txt. The source is shown below. Note that the names are case sensitive, so you may end up with multiple entries for the same author that vary only by case. This is what is required, so don't remove the apparently duplicate entries. Trust me, your migration will crash if you do.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace SvnUsers2Git
{
    class Program
    {
        // Unique SVN user names. The default comparer is case sensitive on purpose.
        static HashSet<string> names = new HashSet<string>();

        static void Main(string[] args)
        {
            if (args.Length != 2)
            {
                Console.WriteLine("Usage: SvnUsers2Git inputfile outputfile");
                return;
            }
            ReadNames(args[0]);
            WriteNames(args[1]);
        }

        // Collects the author column from each "rNNN | author | date" line.
        private static void ReadNames(string filename)
        {
            try
            {
                using (StreamReader inputFile = File.OpenText(filename))
                {
                    string line;
                    string[] parts;
                    while ((line = inputFile.ReadLine()) != null)
                    {
                        // Skip the dashed separator lines.
                        if (line.StartsWith("-"))
                            continue;
                        parts = line.Split('|');
                        if (parts.Length == 3)
                            names.Add(parts[1].Trim());
                    }
                }
            }
            catch (Exception ex)
            {
                throw new Exception("Error opening input file " + filename, ex);
            }
        }

        // Writes one "svnname = name <name@codeproject.com>" line per author.
        private static void WriteNames(string filename)
        {
            try
            {
                using (FileStream outputFile = File.Open(filename, FileMode.Create))
                {
                    using (StreamWriter writer = new StreamWriter(outputFile))
                    {
                        foreach (string name in names.OrderBy(n => n))
                            writer.WriteLine("{0} = {1} <{1}@codeproject.com>", name, name.ToLower());
                    }
                }
            }
            catch (Exception ex)
            {
                throw new Exception("Error opening output file " + filename, ex);
            }
        }
    }
}
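For readers who would rather stay in Git Bash, the same mapping can be sketched with awk. This is only an illustration of what the C# program above does, not something I ran against our repository. The function name make_authors and the idea of passing the e-mail domain as an argument are my own assumptions.

```shell
#!/usr/bin/env bash
# Sketch: turn `svn log -q` output (read from stdin) into git-svn authors lines.
# make_authors and its domain argument are illustrative names, not part of any tool.
make_authors() {
    local domain="$1"
    awk -F'|' -v domain="$domain" '
        # Revision lines look like: r22779 | chris | 2014-01-18 ...
        /^r[0-9]+ / && NF == 3 {
            name = $2
            gsub(/^ +| +$/, "", name)            # trim the padding around the name
            if (!(name in seen)) {               # keep each spelling once, case-sensitively
                seen[name] = 1
                lower = tolower(name)
                print name " = " lower " <" lower "@" domain ">"
            }
        }' | LC_ALL=C sort
}

# Example: make_authors codeproject.com < svnlog.txt > names.txt
```

As in the C# version, names that differ only by case each get their own line, because the left-hand side has to match the SVN log exactly.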
Step 2: Clone the SVN repository
John Albin uses a bash command line of
git svn clone [SVN repo URL] --no-metadata -A authors-transform.txt --stdlayout ~/temp/newrepo
Because I wanted to be able to update the Git repository with new changes from SVN, I did not use the --no-metadata option. Furthermore, due to our non-standard SVN layout, I did not use the --stdlayout option. Instead, I used the -T Trunk/Src option to specify the Trunk directory in the SVN repository. I executed the following bash command in my temp directory, creating a Git repository d:\temp\SvnGitRepo that can be updated from SVN.
git svn clone https://cp-######/svn/codeproject -T Trunk/Src/ --authors-file=names.txt /d/temp/SvnGitRepo >> repo.log
For our repository, this process takes well over a day to execute, hence the need to be able to refresh the repository. I then made a copy to d:\temp\SvnGitRepo.bak, just in case I messed up in the next steps.
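With a clone that runs this long, it may be worth checking the authors file before starting, since git svn stops when it meets a committer that is not listed. The following is a minimal sketch of such a pre-flight check, assuming the svnlog.txt and names.txt files from Step 1. The helper name missing_authors is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list SVN committers that appear in the log but not in the authors file.
# missing_authors is an illustrative helper name.
missing_authors() {
    local log_file="$1" authors_file="$2"
    awk -F'|' -v authors="$authors_file" '
        BEGIN {
            # Remember the left-hand side of every "svnname = Name <email>" line.
            while ((getline line < authors) > 0) {
                sub(/ *=.*/, "", line)
                known[line] = 1
            }
        }
        /^r[0-9]+ / && NF == 3 {
            name = $2
            gsub(/^ +| +$/, "", name)
            if (!(name in known) && !(name in reported)) {
                reported[name] = 1
                print name
            }
        }' "$log_file"
}

# Example: missing_authors svnlog.txt names.txt   (no output means every committer is mapped)
```

The comparison is case sensitive, so a committer recorded as both chris and Chris needs both entries.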
Step 3: Create the .gitignore file
Here I followed what John Albin recommended, with slight changes for my layout. I created a batch file MakeGitIgnore.bat to run in a Git Bash window, containing
cd /d/temp/SvnGitRepo
git svn show-ignore > .gitignore
git add .gitignore
git commit -m 'Convert svn:ignore properties to .gitignore.'
cd /d/temp
Copy the .gitignore file somewhere safe. If you need to use your backup copy of the repository, as I will, you won't have to re-run the batch file, which can take a while.
Update: When I attempted to perform the full migration, I found that this .gitignore file did not work as expected. It was huge and difficult to debug. Using a .gitignore file from another VS-created project as a template, I created a much smaller file that is easier to understand. You will likely not be trying to convert a solution with 100 projects, so you may have better luck.
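To give an idea of what a small hand-written file can look like, here is a sketch that writes one from Git Bash. It is not the file we actually use. Only the *.suo entry is mentioned in this article; the other patterns are typical Visual Studio choices that I am assuming for illustration, and write_gitignore is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: write a small Visual Studio oriented .gitignore into a directory.
# Only *.suo comes from the article; the other patterns are assumed, typical choices.
write_gitignore() {
    local target_dir="$1"
    cat > "$target_dir/.gitignore" <<'EOF'
# Per-user Visual Studio state
*.suo
*.user

# Build output
[Bb]in/
[Oo]bj/

# NuGet packages restored at build time
packages/
EOF
}

# Example: write_gitignore /d/temp/SvnGitRepo
```

A file of this size is easy to review line by line, which was the main problem with the generated one.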
Step 4: Create a local Git only repository
In order to create the Git repository containing only the project of interest, I will need to clone a repository, prune it, and push it to Workspaces. The first step is to create a local Bare Git repository and push the Git-SVN repository to it. This gives me something I can manipulate that doesn't take a long time to recreate.
Before I do this, I need to make sure that my Git-SVN repository is up to date. It had been 5 days since I did the initial SVN clone, and a lot of bug fixes and new features had been checked in since then. Since I had made a mess of my repository while determining how to do the migration, I first deleted the old SvnGitRepo folder and copied my backup into a new one. The latest changes are fetched by the bash commands
cd /d/temp/SvnGitRepo
git svn fetch
[Figures: Before Fetch, After Fetch]
Then, following the post-import guidance in the Git and Other Systems - Migrating to Git section of the Pro Git book on the Git website, I executed the following commands. Note: I used the PDF file, which is slightly different from the online version.
$ cp -Rf .git/refs/remotes/tags/* .git/refs/tags/
$ rm -Rf .git/refs/remotes/tags
$ cp -Rf .git/refs/remotes/* .git/refs/heads/
$ rm -Rf .git/refs/remotes
This makes all the remote tags local (even though I didn't have any), and makes the remote branches local as well.
Next I will run a batch file InitBareAndClone.bat. As the name suggests, it will create a Bare repository, push the SvnGitRepo into it, and clone it to a local repository d:\temp\theRepo. The contents of the batch file are shown below.
git init --bare /d/temp/bareRepo.git
cd /d/temp/bareRepo.git
git symbolic-ref HEAD refs/heads/trunk
cd /d/temp/SvnGitRepo
git remote add bareRepo /d/temp/bareRepo.git
git config remote.bareRepo.push 'refs/remotes/*:refs/heads/*'
git push bareRepo --all
cd /d/temp/
git clone bareRepo.git theRepo
cd theRepo
git branch -m trunk master
cd theRepo
After a few minutes, this will complete. The repository now contains the files for all the projects and is on the branch 'master', the 'trunk' branch having been renamed to 'master'.
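Before pruning, it can be reassuring to confirm which branch is checked out and how much history came across. The following is a small sketch of such a check, assuming a reasonably recent Git (it relies on git -C and rev-list --count). The helper name repo_summary is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the checked-out branch and the number of commits reachable from it.
# repo_summary is an illustrative helper name.
repo_summary() {
    local repo_dir="$1"
    local branch commits
    branch="$(git -C "$repo_dir" symbolic-ref --short HEAD)" || return 1
    commits="$(git -C "$repo_dir" rev-list --count HEAD)" || return 1
    printf 'branch=%s commits=%s\n' "$branch" "$commits"
}

# Example: repo_summary /d/temp/theRepo
```

Running it again after the pruning in the next step shows how many commits actually touched the project that was kept.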
Now we are ready to prune this repository to hold only the project we are interested in.
Step 5: Pruning the Repository
The repository we created in the previous step contains all the projects that were in our non-standard SVN repository. We want to create a repository that contains only the CodeProject-2.7 project.
A quick Google search led me to a blog post by Dalibor Nasevic titled Permanently remove files and folders from a git repository. While this did not do exactly what I wanted, it did point me to the Git command I needed, git-filter-branch, in the online documentation.
This command has an option, --subdirectory-filter, which removes all root directories other than the one specified, and then moves the contents of that directory to the root. Just what I need :). It also uses a temp directory, which can be specified with the -d <directory> option, as it needs to check out the repository to do its work. It is recommended that the temp directory be on another drive for performance reasons. I ran the following command in my Bash window in the theRepo repository.
git filter-branch --subdirectory-filter CodeProject-2.7 -d /c/TempGitRepo
After several minutes I now have a repository that contains only the project I wanted, along with all of its history.
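One detail worth knowing is that git filter-branch keeps a backup of the original refs under refs/original/. Once the pruned result has been checked, those backups can be dropped so that the old history becomes eligible for pruning. The following sketch follows the general approach described in the git filter-branch documentation; it is not a step I list in my batch files, and the helper name drop_filter_branch_backups is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: discard the refs/original/ backups that git filter-branch leaves behind,
# then expire reflogs and repack so the unwanted history can actually be pruned.
# drop_filter_branch_backups is an illustrative helper name.
drop_filter_branch_backups() {
    local repo_dir="$1"
    git -C "$repo_dir" for-each-ref --format='%(refname)' refs/original/ |
        while IFS= read -r ref; do
            git -C "$repo_dir" update-ref -d "$ref"
        done
    git -C "$repo_dir" reflog expire --expire=now --all
    git -C "$repo_dir" gc --quiet --prune=now
}

# Example: drop_filter_branch_backups /d/temp/theRepo
```

This removes the safety net, so it should only be run after the rewritten repository has been verified. Other refs that still point at the old history, such as remote-tracking branches, will keep those objects alive until they are removed as well.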
Step 6: Adding the .gitignore file
This is where I am glad I saved a copy of the .gitignore file, as the contents of the root directory have been replaced with the contents of the selected project. My .gitignore is gone. I copied my saved copy to the repository directory.
I then opened the solution in Visual Studio 2013 to verify that the project loads. The solution loaded successfully, so I added the .gitignore file to my Solution Items folder for easy reference.
The next step is to check these changes into the local repository. To do this you need to be using the Git Source Code Provider. This can be found in the Tools->Options->Source Control->Plug-in Selection options settings as shown below. You may have to re-open the solution after making the change. I normally don't, but did while I was writing this section.
Open the Team Explorer from the View menu.
Click on Changes to get to the Git support for committing changes. If required, drag the .gitignore file from the Untracked Files to the Included Changes section. Also, if required, right click on the CodeProject.suo file and have it ignore *.suo files as they are user specific. My .gitignore file has an entry for this, so it was not required.
Enter a commit message such as "Added .gitignore file", and click the Commit button. Close the solution as we need to change the Git origin to Workspaces in the next step.
Step 7: Pushing to Workspaces
The first step is to get the URL for the Git Repository in Workspaces. We have an existing Workspace, and so will use the ::GitMachine instance that comes with it. You could also create a new Workspace or add a new instance of ::GitMachine to an existing Workspace. This needs to be an empty repository. You can get the URL for the repository from the ::GitMachine instance as shown below. More detailed instructions are also included on the page.
I created a batch file PushToWorkspaces.bat to do the work. This file contains
cd /d/temp/theRepo
git remote remove origin
git remote add origin https://git.codeproject.com/[a-user]/[a-workspace]/cp-code
git push -u origin master
You will be asked for your email address and password. Replace the URL in the batch file with your own Git repository's URL, which you can get from your Workspaces Git Repository as described above.
Once this is done, you can clone the repository and start working. I had to make a few changes to my solution, but these were to fix a NuGet package restore issue. Otherwise, everything built and ran. It is likely that the version of the CodeProject website you are reading this on was built from the Workspaces Git Repository.
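The batch file above removes origin and adds it again, which fails on a second run because origin no longer exists at that point. As an alternative, here is a sketch of a re-runnable version that uses git remote set-url when origin is already present. This swaps in a different technique from the one in my batch file, and the helper name repoint_origin and the URL in the example are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: point "origin" at a new URL whether or not an origin remote already exists.
# repoint_origin is an illustrative helper name; the URL in the example is a placeholder.
repoint_origin() {
    local repo_dir="$1" url="$2"
    if git -C "$repo_dir" config --get remote.origin.url >/dev/null; then
        git -C "$repo_dir" remote set-url origin "$url"
    else
        git -C "$repo_dir" remote add origin "$url"
    fi
    git -C "$repo_dir" config --get remote.origin.url   # echo the result for confirmation
}

# Example (placeholder URL):
#   repoint_origin /d/temp/theRepo https://git.example.com/user/workspace/repo
#   git -C /d/temp/theRepo push -u origin master
```

Note that set-url only changes the URL. Unlike remove followed by add, it leaves any existing remote-tracking branches for origin in place.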
Points of Interest
Visual Studio 2013 Update 2, with its improved built-in Git support, and Workspaces make a wonderful environment to manage both your Code and your Tasks, Bugs, and Backlog.