Migrating Bricolage CVS to Git

Following a discussion on the Bricolage developers mail list, I started down the path last week of migrating the Bricolage Subversion repository to Git. This turned out to be much more work than I expected, but to the benefit of the project, I think. Since I had a lot of questions about how to do certain things and how Git thinks about certain things, I wanted to record what I worked out here over the course of a few entries. Maybe it will help you manage your migration to Git.

The first thing I tried to do was use git-svn to migrate Bricolage to Git. I pointed it to the root directory and let it rip. I immediately saw that it noticed that the root was originally at the root of the repository, rather than the “bricolage” subdirectory, and so followed that path and started pulling stuff down. In a separate terminal window, I was watching the branches build up, and there were a lot of them, many named like:

David
David@5248
David@584
tags/Release_1_2_1
tags/Release_1_2_1@5249
tags/Release_1_2_1@577

Although many of those branches and tags hadn't been used since the beginning of time, and certainly not since Bricolage was moved to Subversion from its original home in SourceForge CVS, because Subversion has no real concept of branches or tags, git-svn was duly copying them all, including the separate histories for each. Yow.

I could have dealt with that, renaming things, deleting others, and grafting where appropriate (more on grafting in a minute), but then I got this error from git-svn:

bricolage/branches/rev_1_8/lib/Bric/App/ApacheConfig.pm was not
found in commit e5145931069a511e98a087d4cb1a8bb75f43f899 (r5256)

This was annoying, especially since the file clearly does exist in that commit:

svn list -r5256 http://svn.bricolage.cc/bricolage/branches/rev_1_8/lib/Bric/App/ApacheConfig.pm
ApacheConfig.pm

I posted to the Git mail list about this issue, but unfortunately got no reply. Given that it was taking around 30 hours(!) to get to that point (and about 18 hours once I started using a local copy of the Subversion repository, thank to a suggestion from Ask Bjørn Hansen), I started thinking about how to simplify things a bit.

Since most of the moving stuff around happened immediately after the move to Subversion, and before we started committing working code to the repository, it occurred to me that I could probably go back to the original Bricolage CVS Repository on SourceForge, migrate that to Git, and then just migrate from Subversion starting from the first real commit there. Then I could just stitch the two repositories together.

From CVS to Git

Thanks to advice from IRC, I used cvs2git to build a repository from a dump from CVS. Apparently, git cvsimport makes a lot of mistakes, while cvs2git does a decent job keeping branches and tags where they should be. It's also pretty fast; once I set up its configuration and ran it, it took only around 5 minutes for it to build import files for git fast-import. It also has some nice features to rename symbols (tags), ignore tags, assign authors, etc. I'm aware of not tool to migrate Subversion to Git that does the same thing.

Once I had my dump, I started writing a script to import it into Git. The basic import looks like this:

GITREPO=/Users/david/Desktop/git_from_cvs
rm -rf $GITREPO
mkdir $GITREPO
chdir $GITREPO
git init
cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import
svn2git --no-clone
git gc
git reset --hard

I used svn2git to convert remote branches to local tags and branches The --no-clone option is what keeps it from doing the Subversion stuff; everything else is the same for a new conversion from CVS. I also had to run git reset --hard to throw out uncommitted local changes. What changes? I'm not sure where they came from, but after the last commit is imported from CVS, all of the local files in the master branch are deleted, but that change is not committed. Strange, but by doing a hard reset, I reverted that change with no harm done.

Next, I started looking at the repository in GitX, which provides a decent graphical interface for browsing around a Git repository on Mac OS X. There I discovered that a major benefit to importing from CVS rather than Subversion is that, because CVS has real tags, those tags are properly migrated to Git. What this means is that, because the Bricolage project (nearly) always tagged merges between branches and included the name of the appropriate tag name in a merge commit message, I was able to reconstruct the merge history in Git.

For example, there were a lot of tags named like so:

% git tag
rev_1_8_merge-2004-05-04
rev_1_6_merge-2004-05-02
rev_1_6_merge-2004-04-10
rev_1_6_merge-2004-04-09
rev_1_6_merge-2004-03-16

So if I wanted to find the merge commit that corresponded to that first tag, all I had to do was sort the commits in GitX by date and look near 2004-05-04 for a commit message that said something like:

Merge from rev_1_8. Will tag that branch "rev_1_8_merge-2004-05-04".

That commit's SHA key is “b786ad1c0eeb9df827d658a81dc2d32ec6108e92”. Its parent's SHA key is “11dbbd49644aaa607bd83f8d542d37fcfbd5e63b”. So then all I had to do was to tell git that there is a second parent for that commit. Looking in GitX for the commit tagged “rev_1_8_merge-2004-05-04”, I found that its SHA key is “4fadb117a71a49add69950eccc14b77a04c8ec68”. So to assign that as a second parent, I write a line to the file .git/info/grafts that describes its parentage:

b786ad1c0eeb9df827d658a81dc2d32ec6108e92 11dbbd49644aaa607bd83f8d542d37fcfbd5e63b 4fadb117a71a49add69950eccc14b77a04c8ec68

Once I had all the grafts written, I just ran git filter-branch and they were permanently rewritten to the new hierarchy.

And that's it! The parentage is now correct. It was a lot of busy work to create the mapping between tags and merges, but it's nice to have it all done and properly mapped out historically in Git. I even found a bunch merges with no corresponding tags and figured out the proper commit to link them up to (though I stopped when I got back to 2002 and things get really confusing). And now, because the merge relationships are now properly recorded in Git, I can drop those old merge tags: as workarounds for a lack of merge tracking in CVS, they are no longer necessary in Git.

Next up, how I completed the merge from Subversion. I'll write that once I've finally got it nailed down. Unfortunately, it takes an hour or two to export from Subversion to Git, and I'm having to do it over and over again as I figure stuff out. But it will be done, and you'll hear more about it here.

Backtalk