Migrating Bricolage CVS to Git
Following a discussion on the Bricolage developers mail list, I started down the path last week of migrating the Bricolage Subversion repository to Git. This turned out to be much more work than I expected, but to the benefit of the project, I think. Since I had a lot of questions about how to do certain things and how Git thinks about certain things, I wanted to record what I worked out here over the course of a few entries. Maybe it will help you manage your migration to Git.
The first thing I tried to do was use
git-svn to migrate Bricolage to Git. I
pointed it to the root directory and let it rip. It immediately noticed that
the project originally lived at the root of the repository, rather than in
the “bricolage” subdirectory, and so followed that path and started pulling
stuff down. In a separate terminal window, I was watching the branches build up,
and there were a lot of them, many named like:
    David
    David@5248
    David@584
    tags/Release_1_2_1
    tags/Release_1_2_1@5249
    tags/Release_1_2_1@577
Although many of those branches and tags hadn’t been used since the beginning of
time, and certainly not since Bricolage was moved to Subversion from its
original home in SourceForge CVS, because Subversion has no real concept of
branches or tags,
git-svn was duly copying them all, including the separate
histories for each. Yow.
I could have dealt with that, renaming things, deleting others, and grafting
where appropriate (more on grafting in a minute), but then I got this error from
git-svn:
    bricolage/branches/rev_1_8/lib/Bric/App/ApacheConfig.pm was not found in commit e5145931069a511e98a087d4cb1a8bb75f43f899 (r5256)
This was annoying, especially since the file clearly does exist in that commit:
    svn list -r5256 http://svn.bricolage.cc/bricolage/branches/rev_1_8/lib/Bric/App/ApacheConfig.pm
    ApacheConfig.pm
I posted to the Git mail list about this issue, but unfortunately got no reply. Given that it was taking around 30 hours(!) to get to that point (and about 18 hours once I started using a local copy of the Subversion repository, thanks to a suggestion from Ask Bjørn Hansen), I started thinking about how to simplify things a bit.
Since most of the moving stuff around happened immediately after the move to Subversion, and before we started committing working code to the repository, it occurred to me that I could probably go back to the original Bricolage CVS Repository on SourceForge, migrate that to Git, and then just migrate from Subversion starting from the first real commit there. Then I could just stitch the two repositories together.
From CVS to Git
Thanks to advice from IRC, I used
cvs2git to build a repository from a dump
from CVS. Apparently,
git cvsimport makes a lot of mistakes, while
cvs2git does a decent job keeping branches and tags where they should be. It’s also
pretty fast; once I set up its configuration and ran it, it took only around 5
minutes for it to build import files for
git fast-import. It also has some
nice features to rename symbols (tags), ignore tags, assign authors, etc. I’m
aware of no tool to migrate Subversion to Git that does the same thing.
Once I had my dump, I started writing a script to import it into Git. The basic import looks like this:
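    GITREPO=/Users/david/Desktop/git_from_cvs
    # Start from a clean, empty repository on every run
    rm -rf "$GITREPO"
    mkdir "$GITREPO"
    cd "$GITREPO"
    git init
    # Feed the blob and dump files written by cvs2git to git fast-import
    cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import
    # Convert remote branches to local branches and tags, skipping the clone
    svn2git --no-clone
    git gc
    # Discard the uncommitted deletions left in the working tree by the import
    git reset --hard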
I used svn2git to convert remote branches to local tags and branches. The
--no-clone option is what keeps it from doing the Subversion stuff; everything
else is the same for a new conversion from CVS. I also had to run
git reset --hard to throw out uncommitted local changes. What changes? I’m not
sure where they came from, but after the last commit is imported from CVS, all
of the local files in the master branch are deleted, but that change is not
committed. Strange, but by doing a hard reset, I reverted that change with no
harm done.
Next, I started looking at the repository in GitX, which provides a decent graphical interface for browsing around a Git repository on Mac OS X. There I discovered that a major benefit to importing from CVS rather than Subversion is that, because CVS has real tags, those tags are properly migrated to Git. What this means is that, because the Bricolage project (nearly) always tagged merges between branches and included the appropriate tag name in the merge commit message, I was able to reconstruct the merge history in Git.
For example, there were a lot of tags named like so:
    % git tag
    rev_1_8_merge-2004-05-04
    rev_1_6_merge-2004-05-02
    rev_1_6_merge-2004-04-10
    rev_1_6_merge-2004-04-09
    rev_1_6_merge-2004-03-16
So if I wanted to find the merge commit that corresponded to that first tag, all I had to do was sort the commits in GitX by date and look near 2004-05-04 for a commit message that said something like:
Merge from rev_1_8. Will tag that branch "rev_1_8_merge-2004-05-04".
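If you would rather not scroll through GitX, the same lookup can be roughed out on the command line. This is only a sketch: both function names are made up for illustration, and it assumes the merge messages quote the tag name the way the example above does.

```shell
#!/bin/sh
# Hypothetical helper: pull the quoted merge tag name out of a commit message
# such as:  Merge from rev_1_8. Will tag that branch "rev_1_8_merge-2004-05-04".
# Prints nothing if the message does not follow that convention.
merge_tag_from_message() {
    printf '%s\n' "$1" |
        sed -n 's/.*"\([A-Za-z0-9_.]*_merge-[0-9-]*\)".*/\1/p'
}

# Hypothetical helper: list commits on any branch whose message mentions the
# given tag name, printing the commit SHA, its parent SHA(s), and the subject.
commits_mentioning() {
    git log --all --fixed-strings --grep="$1" --format='%H %P %s'
}

# Usage, from inside the converted repository:
#   commits_mentioning rev_1_8_merge-2004-05-04
```

The first field of each output line is the merge commit and the second is its existing parent, which are two of the three SHAs needed for a graft.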
That commit’s SHA key is “b786ad1c0eeb9df827d658a81dc2d32ec6108e92”. Its
parent’s SHA key is “11dbbd49644aaa607bd83f8d542d37fcfbd5e63b”. So then all I
had to do was to tell git that there is a second parent for that commit. Looking
in GitX for the commit tagged “rev_1_8_merge-2004-05-04”, I found that its
SHA key is “4fadb117a71a49add69950eccc14b77a04c8ec68”. So to assign that as a
second parent, I wrote a line to the file
.git/info/grafts that lists the commit followed by its parents:
    b786ad1c0eeb9df827d658a81dc2d32ec6108e92 11dbbd49644aaa607bd83f8d542d37fcfbd5e63b 4fadb117a71a49add69950eccc14b77a04c8ec68
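With dozens of merges to record, a typo in one of those 40-character keys is easy to make. Here is a minimal sketch of a helper that builds a graft line and refuses anything that is not a full SHA. The names is_sha and graft_line are hypothetical, not part of Git.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument is a full 40-character
# lowercase hexadecimal SHA-1.
is_sha() {
    case "$1" in
        *[!0-9a-f]*) return 1 ;;
    esac
    [ "${#1}" -eq 40 ]
}

# Hypothetical helper: print one graft line in the form
#   <commit> <parent> [<parent> ...]
# and fail without printing anything if any argument is not a full SHA-1.
graft_line() {
    for sha in "$@"; do
        is_sha "$sha" || { echo "not a full SHA-1: $sha" >&2; return 1; }
    done
    echo "$*"
}

# Prints the graft line for the merge described above. To record it, append
# the output to .git/info/grafts from the top of the work tree.
graft_line b786ad1c0eeb9df827d658a81dc2d32ec6108e92 \
           11dbbd49644aaa607bd83f8d542d37fcfbd5e63b \
           4fadb117a71a49add69950eccc14b77a04c8ec68
```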
Once I had all the grafts written, I just ran
git filter-branch and they were
permanently rewritten to the new hierarchy.
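For the record, here is roughly what that step can look like. Treat it as a sketch under assumptions rather than the exact command I ran: the function name is made up, and the choice to rewrite every ref and carry the tags along is mine to illustrate.

```shell
#!/bin/sh
# Hypothetical helper: make the parentage described in .git/info/grafts
# permanent by rewriting history on every branch and tag.
bake_in_grafts() {
    # "-- --all" asks filter-branch to rewrite all refs, and
    # "--tag-name-filter cat" keeps each tag name while pointing it at the
    # rewritten commit. FILTER_BRANCH_SQUELCH_WARNING=1 skips the warning
    # pause that newer versions of Git add to filter-branch.
    FILTER_BRANCH_SQUELCH_WARNING=1 \
        git filter-branch --tag-name-filter cat -- --all
}

# Usage, from the top of the work tree once the grafts file is complete:
#   bake_in_grafts
```

Once the rewrite finishes, the new commits carry their real parents, so the grafts file is no longer needed for them. Newer versions of Git also offer git replace --graft as another way to record the same relationship.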
And that’s it! The parentage is now correct. It was a lot of busy work to create the mapping between tags and merges, but it’s nice to have it all done and properly mapped out historically in Git. I even found a bunch of merges with no corresponding tags and figured out the proper commit to link them up to (though I stopped when I got back to 2002 and things got really confusing). And because the merge relationships are now properly recorded in Git, I can drop those old merge tags: as workarounds for a lack of merge tracking in CVS, they are no longer necessary in Git.
Next up, how I completed the merge from Subversion. I’ll write that once I’ve finally got it nailed down. Unfortunately, it takes an hour or two to export from Subversion to Git, and I’m having to do it over and over again as I figure stuff out. But it will be done, and you’ll hear more about it here.