
Migrating a Non-standard SVN Repository to Git using Workspaces::GitMachine

3 Jun 2014
Migrating CodeProject's source code repositories from SVN to Workspaces::GitMachine. Despite the non-standard layout of the SVN repository, migrating the code and history was a simple process.

Introduction

Here at CodeProject Command and Control, we had been using SVN as our source code repository forever. While it served us well, having a centralized code repository had its issues.

Most notably, the repository is vulnerable to server errors and failures. Rebuilding the repository from backups and local copies is not an exercise for the faint of heart. Also, merging branches can produce conflicts that take hours to resolve, if they can be resolved at all. When they can't, a manual merge by copying files is necessary. This often results in missed files, overwritten changes, and broken builds while we try to make sure everything is corrected.

Git, on the other hand, is a distributed source control system. This means that the 'central' copy is just another copy. It can be rebuilt from any clone, or multiple clones to get all changes. Furthermore, you don't need access to the 'central' repository to view the history. You have the whole history in your local clone.

We had talked about moving to Git for some time, and with our release of Workspaces the time was right to make the move. The work by Kamil and his team has resulted in a platform on which we are confident enough to run our business, manage our code, and manage our bugs, features, and backlogs.

As the title suggests, our SVN repository is, unintentionally, organized in a non-standard way. This meant that the various guides, blogs, and documentation for this process were only starting points. In this article, I'll walk you through the process I used to get from SVN to a Workspaces Git repository, indicating which steps I had to change or add because of our SVN repository structure.

Background

The tools I will be using for this exercise, assuming you already have SVN installed, are:

  • Git Gui and Git Bash, which come with GitExtensions. I chose these because most examples and guides use *NIX shell scripts.
  • The git online documentation.
  • Visual Studio 2013, because that is what we use here, and it has built-in Git support.

The starting point for my journey was John Albin's blog post Converting a Subversion repository to Git. This is a simple, step-by-step guide to performing the migration. Unfortunately, it assumes a standard layout for the SVN repository. A standard SVN repository has a root directory with Trunk, Branches, and Tags subdirectories, with one project per repository, as shown below.

We didn't set ours up this way. What we ended up with was one repository containing several projects. This structure is shown below, with the two main 'repositories' that we wish to migrate highlighted.

 

Performing the Migration

Step 1: Creating a list of SVN Committers

When cloning the SVN repository to Git, git svn can use a file that maps the SVN identifiers to Git-format authors. Each line in this file looks like matthew = matthew <matthew@codeproject.com>.

John Albin suggests running the script (line break added for readability):
 

svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); 
print $2" = "$2" <"$2">"}' | sort -u > authors-transform.txt
        

Since it has been some time since I have used shell scripting, I did this a little differently. I opened a Windows console and ran the command svn log -q > svnlog.txt in my temp directory. The svnlog.txt file contains information in this format:

------------------------------------------------------------------------
r22779 | chris | 2014-01-18 21:52:05 -0400 (Sun, 18 Jan 2014)
------------------------------------------------------------------------
r22778 | matthew | 2014-01-16 17:41:09 -0400 (Fri, 16 Jan 2014)
------------------------------------------------------------------------
r22777 | chris | 2014-01-16 16:39:13 -0400 (Fri, 16 Jan 2014)
------------------------------------------------------------------------
r22776 | chris | 2014-01-16 15:42:53 -0400 (Fri, 16 Jan 2014)
------------------------------------------------------------------------
r22775 | chris | 2014-01-16 15:41:58 -0400 (Fri, 16 Jan 2014)      

I then wrote a little C# console application to create the Git names file. It is executed with the command line SvnUsers2Git svnlog.txt names.txt. The source is shown below. Note that the names are case sensitive, so you may end up with multiple entries for the same author that differ only by case. These entries are required, so don't remove the apparent duplicates. Trust me, your migration will crash if you do.

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    namespace SvnUsers2Git
    {
        // Converts the output of "svn log -q" into an authors file for git svn migration.
        class Program
        {
            // The default comparer is case sensitive, so "Chris" and "chris" stay separate entries.
            static HashSet<string> names = new HashSet<string>();

            static void Main(string[] args)
            {
                if (args.Length != 2)
                {
                    Console.WriteLine("Usage: SvnUsers2Git inputfile outputfile");
                    return;
                }

                ReadNames(args[0]);
                WriteNames(args[1]);
            }

            // Collects the author column from lines like "r22779 | chris | 2014-01-18 ...".
            private static void ReadNames(string filename)
            {
                try
                {
                    using (StreamReader inputFile = File.OpenText(filename))
                    {
                        string line;
                        while ((line = inputFile.ReadLine()) != null)
                        {
                            // Skip the dashed separator lines between revisions.
                            if (line.StartsWith("-"))
                                continue;
                            string[] parts = line.Split('|');
                            if (parts.Length == 3)
                                names.Add(parts[1].Trim());
                        }
                    }
                }
                catch (Exception ex)
                {
                    throw new Exception("Error reading input file " + filename, ex);
                }
            }

            // Writes one "name = name <name@codeproject.com>" line per author, sorted.
            private static void WriteNames(string filename)
            {
                try
                {
                    using (StreamWriter writer = File.CreateText(filename))
                    {
                        foreach (string name in names.OrderBy(n => n))
                            writer.WriteLine("{0} = {1} <{1}@codeproject.com>", name, name.ToLower());
                    }
                }
                catch (Exception ex)
                {
                    throw new Exception("Error writing output file " + filename, ex);
                }
            }
        }
    }
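If you would rather stay in Git Bash, the same authors file could also be produced with a shell function. This is a minimal sketch, not the tool I used above. The function name svn_authors is hypothetical, and the function assumes the svn log -q format shown earlier and the same @codeproject.com addresses as the C# program. Like the C# version, it keeps names that differ only by case as separate entries.

```shell
# Hypothetical helper: turns "svn log -q" output (read from stdin) into a
# git svn authors file in the same format as the C# SvnUsers2Git tool.
svn_authors() {
  awk -F '|' '
    # Revision lines look like: r22779 | chris | 2014-01-18 21:52:05 -0400 (...)
    /^r[0-9]/ && NF == 3 {
      name = $2
      sub(/^ +/, "", name)   # trim the space after the first "|"
      sub(/ +$/, "", name)   # trim the space before the second "|"
      print name " = " tolower(name) " <" tolower(name) "@codeproject.com>"
    }' | LC_ALL=C sort -u    # one line per distinct (case-sensitive) name
}
```

Usage, from the directory holding the log: svn_authors < svnlog.txt > names.txt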

Step 2: Clone the SVN repository

John Albin uses a bash command line of

git svn clone [SVN repo URL] --no-metadata -A authors-transform.txt --stdlayout ~/temp/newrepo

Because I wanted to be able to update the Git repository with new changes from SVN, I did not use the --no-metadata option. Furthermore, because of our non-standard SVN layout, I did not use the --stdlayout option. Instead, I used the -T Trunk/Src option to specify the trunk directory in the SVN repository. I executed the following bash command in my temp directory, creating a Git repository d:\temp\SvnGitRepo that can be updated from SVN.

git svn clone https://cp-######/svn/codeproject -T Trunk/Src/ --authors-file=names.txt  /d/temp/SvnGitRepo >> repo.log

For our repository, this process takes well over a day to execute, hence the need to be able to refresh the repository. I then made a copy to d:\temp\SvnGitRepo.bak, just in case I messed up in the next steps.

Step 3: Create the .gitignore file

Here I follow what John Albin recommended, with slight changes for my layout. I created a batch file MakeGitIgnore.bat to run in a Git Bash window, containing

cd /d/temp/SvnGitRepo
git svn show-ignore > .gitignore
git add .gitignore
git commit -m 'Convert svn:ignore properties to .gitignore.'
cd /d/temp

Copy the .gitignore file somewhere safe. If you need to use your backup copy of the repository, as I will, you won't have to re-run the batch file, which can take a while.

Update: When I attempted to perform the full migration, I found that this .gitignore file did not work as expected. It was huge and difficult to debug. Using a .gitignore file from another Visual Studio project as a template, I created a much smaller one that is easier to understand. You will likely not be trying to convert a solution with 100 projects, so you may have better luck.
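The following sketch writes a minimal Visual Studio .gitignore, as an example of the kind of smaller file described in the update. The patterns are common Visual Studio ignore entries that I chose for illustration. They are not the file I actually used, and write_vs_gitignore is a hypothetical helper, so adjust the list for your own solution.

```shell
# Hypothetical helper: writes a small .gitignore for a Visual Studio solution
# into the given directory (default: current directory), replacing any existing one.
# The patterns below are common Visual Studio entries, not a definitive list.
write_vs_gitignore() {
  local target="${1:-.}/.gitignore"
  cat > "$target" <<'EOF'
# User-specific files
*.suo
*.user

# Build output
[Bb]in/
[Oo]bj/

# NuGet packages (assumes package restore is enabled)
packages/
EOF
}
```

Usage: write_vs_gitignore /d/temp, then copy the resulting file somewhere safe as described above.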

Step 4: Create a local Git only repository

In order to create a Git repository containing only the project of interest, I need to clone the repository, prune it, and push it to Workspaces. The first step is to create a local bare Git repository and push the Git-SVN repository to it. This gives me something I can manipulate that doesn't take a long time to recreate.

Before I do this, I need to make sure that my Git-SVN repository is up to date. It had been 5 days since I did the initial SVN clone, and a lot of bug fixes and new features had been checked in since then. Since I had made a mess of my repository while working out how to do the migration, I first deleted the old SvnGitRepo folder (if necessary) and copied my backup into a new one. The latest changes are fetched with the bash commands:

cd /d/temp/SvnGitRepo
git svn fetch   
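When an authors file is in use, git svn aborts if it meets an SVN committer with no entry in that file. This can happen when someone new has committed since the initial clone. The sketch below is one way to check for that before fetching. missing_authors is a hypothetical helper. It assumes an authors file in the svnname = Name <email> format and svn log -q output on standard input. Any names it prints would need to be added to names.txt.

```shell
# Hypothetical helper: prints SVN user names found in "svn log -q" output
# (stdin) that have no entry in the authors file named by $1.
missing_authors() {
  AUTHORS_FILE="$1" awk -F '|' '
    BEGIN {
      # Collect the left-hand side of each "svnname = Name <email>" line.
      file = ENVIRON["AUTHORS_FILE"]
      while ((getline line < file) > 0) {
        key = line
        sub(/[ \t]*=.*$/, "", key)
        sub(/^[ \t]+/, "", key)
        if (key != "") known[key] = 1
      }
      close(file)
    }
    /^r[0-9]/ && NF == 3 {
      name = $2
      sub(/^ +/, "", name)
      sub(/ +$/, "", name)
      # Exact, case-sensitive comparison, matching the note about case above.
      if (!(name in known)) print name
    }' | LC_ALL=C sort -u
}
```

Usage: regenerate svnlog.txt with svn log -q, then run missing_authors names.txt < svnlog.txt. An empty result means every committer is covered.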

[Screenshots: Before Fetch, After Fetch]

Then, following the post-import guidance in the "Git and Other Systems - Migrating to Git" section of the Pro Git book on the Git website, I executed the following commands. Note: I used the PDF version, which is slightly different from the online version.

$ cp -Rf .git/refs/remotes/tags/* .git/refs/tags/
$ rm -Rf .git/refs/remotes/tags
$ cp -Rf .git/refs/remotes/* .git/refs/heads/
$ rm -Rf .git/refs/remotes

This turns the remote tags into local tags (even though I didn't have any) and turns the remote branches into local branches.
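The cp and rm commands work on the loose ref files under .git/refs. Any refs that Git has moved into .git/packed-refs would therefore be skipped. The sketch below performs the same conversion with Git's own ref commands. local_ref_for and convert_git_svn_refs are hypothetical names. Like the commands above, the sketch removes the refs/remotes entries, so run it inside the Git-SVN clone (or a copy of it).

```shell
# Hypothetical helper: maps a git-svn remote ref to the local ref that the
# cp/rm commands would produce. Fails for refs outside refs/remotes/.
local_ref_for() {
  case "$1" in
    refs/remotes/tags/*) printf 'refs/tags/%s\n' "${1#refs/remotes/tags/}" ;;
    refs/remotes/*)      printf 'refs/heads/%s\n' "${1#refs/remotes/}" ;;
    *)                   return 1 ;;
  esac
}

# Hypothetical helper: creates the local refs with git update-ref, so refs kept
# in .git/packed-refs are converted too, then deletes the refs/remotes/ originals.
convert_git_svn_refs() {
  local refs ref target
  # Snapshot the list first. Ref names cannot contain spaces or glob
  # characters, so plain word splitting is safe here.
  refs="$(git for-each-ref --format='%(refname)' refs/remotes)" || return 1
  for ref in $refs; do
    target="$(local_ref_for "$ref")" || continue
    git update-ref "$target" "$ref" && git update-ref -d "$ref" || return 1
  done
}
```

Usage: cd /d/temp/SvnGitRepo, then run convert_git_svn_refs in place of the four cp/rm commands.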

Next, I run a batch file, InitBareAndClone.bat. As the name suggests, it creates a bare repository, pushes SvnGitRepo into it, and clones it to a local repository d:\temp\theRepo. The contents of the batch file are shown below.

git init --bare /d/temp/bareRepo.git
cd /d/temp/bareRepo.git
git symbolic-ref HEAD refs/heads/trunk
cd /d/temp/SvnGitRepo
git remote add bareRepo /d/temp/bareRepo.git
git config remote.bareRepo.push 'refs/remotes/*:refs/heads/*'
git push bareRepo --all
cd /d/temp/
git clone bareRepo.git theRepo
cd theRepo
git branch -m trunk master

After a few minutes, this completes. The repository now contains the files for all the projects and is on the branch 'master', the 'trunk' branch having been renamed to 'master'.

Now we are ready to prune this repository to hold only the project we are interested in.

Step 5: Pruning the Repository

The repository we created in the previous step contains all the projects that were in our non-standard SVN repository. We want to create a repository that contains only the CodeProject-2.7 project.

A quick Google search led me to a blog post by Dalibor Nasevic titled Permanently remove files and folders from a git repository. While it did not do exactly what I wanted, it did point me to the Git command I needed, git-filter-branch, in the online documentation.

This command has an option, --subdirectory-filter, which rewrites the history so that only the specified directory remains, with its contents moved to the repository root. Just what I need :). It also accepts a temporary directory, specified with the -d <directory> option, to use as its working area while it rewrites the history (by default it uses .git-rewrite inside the repository). It is recommended that the temp directory be on another drive for performance reasons. I ran the following command in my Bash window in the theRepo repository.

git filter-branch --subdirectory-filter CodeProject-2.7 -d /c/TempGitRepo

After several minutes I now have a repository that contains only the project I wanted, along with all of its history.
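On a large repository, it can pay to check two things before starting a long filter-branch run. The directory should exist in the current commit, and the temp directory should not exist yet, because filter-branch refuses to start with an existing temporary directory unless forced. The wrapper below is a sketch under my own assumptions. keep_only_subdirectory is a hypothetical helper, and it passes HEAD explicitly so that only the current branch is rewritten.

```shell
# Hypothetical helper: rewrites the current branch so that <dir> becomes the
# repository root, using git filter-branch --subdirectory-filter.
# Usage: keep_only_subdirectory <dir> <temp-dir>
keep_only_subdirectory() {
  local dir="$1" tempdir="$2"
  # Make sure the directory exists in the current commit before a long run.
  if ! git cat-file -e "HEAD:$dir" 2>/dev/null; then
    echo "keep_only_subdirectory: '$dir' not found at HEAD" >&2
    return 1
  fi
  # filter-branch refuses to start if its temp directory already exists.
  if [ -e "$tempdir" ]; then
    echo "keep_only_subdirectory: '$tempdir' already exists" >&2
    return 1
  fi
  # HEAD is passed explicitly, so only the current branch is rewritten.
  git filter-branch -d "$tempdir" --subdirectory-filter "$dir" HEAD
}
```

Usage, in the theRepo repository: keep_only_subdirectory CodeProject-2.7 /c/TempGitRepo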

Step 6: Adding the .gitignore file

This is where I was glad I had saved a copy of the .gitignore file. The contents of the root directory have been replaced with the contents of the selected project, so my .gitignore is gone. I copied my saved copy to the repository directory.

I then opened the solution in Visual Studio 2013 to verify that it loads. The solution loaded successfully, so I added the .gitignore file to my Solution Items folder for easy reference.

The next step is to commit these changes to the local repository. To do this, you need to be using the Git Source Code Provider, which is selected in the Tools->Options->Source Control->Plug-in Selection settings, as shown below. You may have to re-open the solution after making the change. I normally don't, but did while I was writing this section.

Open Team Explorer from the View menu.

Click Changes to get to the Git support for committing changes. If required, drag the .gitignore file from the Untracked Files section to the Included Changes section. Also, if required, right-click the CodeProject.suo file and have it ignore *.suo files, as they are user specific. My .gitignore file has an entry for this, so it was not required.

Enter a commit message such as "Added .gitignore file", and click the Commit button. Close the solution as we need to change the Git origin to Workspaces in the next step.

Step 7: Pushing to Workspaces

The first step is to get the URL for the Git repository in Workspaces. We have an existing Workspace, so we will use the ::GitMachine instance that comes with it. You could also create a new Workspace or add a new instance of ::GitMachine to an existing Workspace. Either way, this needs to be an empty repository. You can get the URL for the repository from the ::GitMachine instance as shown below. More detailed instructions are also included on that page.

I created a batch file PushToWorkspaces.bat to do the work. This file contains

cd /d/temp/theRepo
git remote remove origin
git remote add origin https://git.codeproject.com/[a-user]/[a-workspace]/cp-code 
git push -u origin master 

You will be asked for your email address and password. Replace the URL above with your own Git repository's URL, which you can get from your Workspaces Git repository as shown below.
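The remove/add pair in PushToWorkspaces.bat can also be written with git remote set-url, which changes origin's URL in place. The sketch below falls back to git remote add when the clone has no origin at all. push_to_new_origin is a hypothetical helper, and the URL you pass is whatever your ::GitMachine page shows.

```shell
# Hypothetical helper: points "origin" at the given (empty) remote repository,
# creating the remote if it does not exist, then pushes master and tracks it.
push_to_new_origin() {
  local url="$1"
  if git config --get remote.origin.url >/dev/null 2>&1; then
    git remote set-url origin "$url" || return 1
  else
    git remote add origin "$url" || return 1
  fi
  git push -u origin master
}
```

Usage, from /d/temp/theRepo: push_to_new_origin https://git.codeproject.com/[a-user]/[a-workspace]/cp-code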

Once this is done, you can clone the repository and start working. I had to make a few changes to my solution, but these were to fix a NuGet package restore issue. Otherwise, everything built and ran. It is likely that the version of the CodeProject website you are reading this on was built from the Workspaces Git repository.

Points of Interest

Visual Studio 2013 Update 2, with its improved built-in Git support, and Workspaces make a wonderful environment to manage both your code and your tasks, bugs, and backlog.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
