Migrating Bricolage Subversion to Git
Following up on last week’s post on migrating the old Bricolage SourceForge CVS repository to Git, here are my notes on migrating the current Bricolage Subversion repository to Git.
It turns out that migrating from Subversion is much more of a pain than
migrating from CVS. Why? Because CVS has real tags, while Subversion does not.
So while git-svn
tries to identify all of your tags and branches, it’s really
relying on your Subversion repository using standard directories for all of your
branches and tags. And while we’ve used a standard for branches directory, our
tags setup is a bit more complicated.
The problem was that we used tags every time we merged between branches. This meant that we ended up with a lot of tags with names like “merge_rev_1_10_5665” to indicate a merge from the “rev_1_10” branch into trunk at r5665. Plus we had tags for releases. So Marshall took it upon himself to reorganize the tags in the Subversion tree so that all release tags went into the “releases” subdirectory, and merges went into subdirectories named for the branch from which the merge derived. Those subdirectories went into the “merges” subdirectory. We ended up with a directory structure organized like this:
/tags/
/releases/
/1.10.1/
/1.10.2/
/1.10.3/
/merges/
/dev_ajax/
/trunk-7890
/rev_1_10/
/trunk-7043/
/trunk-7194/
/trunk-7300/
This was useful for keeping things organized in Subversion, so that we could
easily find a tag for a previous merge in order to determine the revisions to
specify for a new merge. But because older tags were moved from previous
locations, and because newer tags were in subdirectories of the “tags”
directory, git-svn
did not identify them as tags. Well, that’s not really
fair. It did identify earlier tags, before they were moved, but all the other
tags were not found. Instead I ended up with tags in Git named tags/releases
and tags/merges
, which was useless. But even if all of our tags had been
identified as tags, none had parent commit IDs, so there was no place to see
where they actually came from.
So to rebuild the commit, release, and merge history from Subversion, I first
created a local copy of the subversion repository using svnsync
. Then I cloned
it to Git like so:
SVNREPO=file:///Users/david/svn_bricolage_cc
git svn init $SVNREPO --stdlayout
git config svn.authorsfile /Users/david/bric_authors.txt
git svn fetch --no-follow-parent --revision 5517:HEAD
By starting with r5517, which was the first real commit to Subversion, I avoided
the git-svn
error I reported last week. In truth, though, I ended up running
this clone many, many times. The first few times, I ran it with
--no-metadata
, as recommended in various HOWTOs. But then I kept getting
errors such as:
git svn log
fatal: bad default revision 'refs/remotes/git-svn'
----------------------------------------------------
This was more than a little annoying, and it took me a day or so to realize that
this was because I had been using --no-metadata
. Once I killed off that
option, things worked much better
Furthermore, by starting at r5517 and passing the --no-follow-parent
option,
git-svn
ran much more quickly. Rather than taking 30 hours to get all
revisions including stuff that had been moved around (and then failing), it now
took around 90 minutes to do the export. Much more manageable, although I also
started making backup copies and restoring from them as I experimented with
fixing branches and tags. Ultimately, I ended up also passing the
--ignore-paths
option, to exclude various branches that were never really used
or that I had already fetched in their entirety from CVS:
git svn fetch --no-follow-parent --revision 5517:HEAD \
--ignore-paths '(David|Kineticode|Release_|dev_(callback|(media_)?templates)|rev_1_([024]|[68]_temp)|tags/(Dev-|Release_|Start|help|mark|rel_1_([24567]|8_0)|rev_1_([26]|8_merge-2004-05-04)))|tmp'
svn2git --no-clone
The call to svn2git converts remote branches to local tags and branches. Now I had a reasonably clean copy of the repository (aside from the 120 or so commits from when Marshall did the tags reorganization) for me to work with. I opened it up with GitX and started scripting out merges.
To assist in this, I took a hint from Ask Bjørn Hansen, sent in email in response to a Tweet, and tagged every single commit with its corresponding Subversion revision number, like so (in Perl):
for my $c (`git rev-list --all --date-order --timestamp | sort -n | awk '{print \$2}'`) {
chomp $c;
my ($svnid) = `git show -s $c | tail -1` =~ /[@](\d+)\s+/;
system qw(git tag -f), $svnid, $c;
}
The nice thing about this is that it made it easy for me to scan through the commits in GitX and see where things were. It also meant that I could reference these tags when I wrote the code to manage the merges. So what I did was sort the commits in reverse chronological order, and then search for those with the word “merge” in their subjects. When one was clearly for a merge (as opposed to simply using the word “merge”), I would disable the search, scroll through the commits until I found the selected commit, and then look for a likely prior commit that it merged from.
This was a bit of pain in the ass, because, unfortunately, GitX doesn’t keep the selected commit record in the middle of the screen when you cancel the search. Mail.app does this right: If I do a search, select a message, then cancel the search, the selected message is still in the middle of the screen. But with GitX, as I said, I have to scroll to find it. This wasn’t going to scale very well. So what I did instead was search for “merge”, then I took a screen shot of the results and cancelled the merge. Then I just opened the screenshot in Preview, looked at the records there, then found them in GitX. This made things go quite a bit faster.

As a result, I added a migration function to properly tag merges. It looked like this:
sub graft_merges {
print "Grafting merges\n";
# Handle the merges.
for my $graft (
[qw( trunk@5524 rev_1_8@5523 )],
[qw( trunk@5614 rev_1_8@5613 )],
[qw( rev_1_8@5591 trunk@5590 )],
) {
my ($commit, $parent) = map { s/.+[@]//; $_ } @$graft;
my $cmd = "\$(git rev-parse $commit) "
. "\$(git rev-parse $commit^) "
. "\$(git rev-parse $parent)";
`echo "$cmd" >> .git/info/grafts`;
}
}
By referencing revision tags explicitly, I was able to just use git rev-parse
to look up SHA1 hash IDs to put into .git/info/grafts
. This saved me the
headache of dealing with very long IDs, but also allowed me to easily keep track
of revision numbers and branches (the branch information is actually superfluous
here, but I kept it for my sanity). So, basically, for
[qw( trunk@5524 rev_1_8@5523 )]
, it ends up writing the SHA1 hashes for r5524,
the existing parent commit for r5524 (that’s the $commit^
bit), and for the
new parent, r5523. I ended up with 73 merges that needed to be properly
recorded.
With the merges done, I next dove into branches. For some reason, git-svn
failed to identify a parent commit for any branch. Maybe because I started
with r5517? I have no idea. So I had to search through the commits to see when
branches were started. I mainly did this by looking at the branches in ViewVC.
By clicking each one, I was able to see the earliest commit, which usually had a
name like “Created a branch for my SoC project.” I would then look up that
commit in ViewVC, such as r7423, which started the “dev_ajax” branch, just to
make sure that it was copied from trunk. Then I simply went into GitX, found
r7423, then looked back to the last commit to trunk before r7423. That was the
parent of the branch. With such data, I was able to write a function like this:
sub graft_branches {
print "Grafting branches\n";
for my $graft (
[qw( dev_ajax@7423 trunk@7301 )],
[qw( dev_mysql@7424 trunk@7301 )],
[qw( dev_elem_occurrence@7427 trunk@7301 )],
) {
my ($commit, $parent) = map { s/.+[@]//; $_ } @$graft;
my $cmd = "\$(git rev-parse $commit) "
. "\$(git rev-parse $parent)";
`echo "$cmd" >> .git/info/grafts`;
}
}
Here I only needed to look up the revision and its parent and write it to
.git/info/grafts
. Then all of my branches had parents. Or nearly all of them;
those that were also in the old CVS repository will have to wait until the two
are stitched together to find their parents.
Next I needed to get releases properly tagged. This was not unlike the merge tag work: I just had to find the proper revision and tag it. This time, I looked through the commits in GitX for those with “tag for” in their subjects because, conveniently, I nearly always used this phrase in a release tag, as in “Tag for the 1.8.11 release of Bricolage.” Then I just looked back from the tag commit to find the commit copied to the tag, and that commit would be tagged with the release tag. The function to create the tags looked like this:
sub tag_releases {
print "Tagging releases\n";
for my $spec (
[ 'rev_1_8@5726' => 'v1.8.1' ],
[ 'rev_1_8@5922' => 'v1.8.2' ],
[ 'rev_1_8@6073' => 'v1.8.3' ],
) {
my ($where, $tag) = @{$spec};
my ($branch, $rev) = split /[@]/, $where;
my $tag_date = `git show --pretty=format:%cd -s $rev`;
chomp $tag_date;
local $ENV{GIT_COMMITTER_DATE} = $tag_date;
system qw(git tag -fa), $tag, '-m', "Tag for $tag release of Bricolage.", $rev;
}
}
I am again indebted to Ask for the code here, especially to set the date for the tag.
Since I had created new release tags and recreated the merge history in Git, I
no longer needed the old tags from Subversion, so next I rewrote the
--ignore-paths
option to exclude all of the tags directories, as well as some
branches that were never used:
SVNREPO=file:///Users/david/svn_bricolage_cc
git svn init $SVNREPO --stdlayout
git config svn.authorsfile /Users/david/bric_authors.txt
git svn fetch --no-follow-parent --revision 5517:HEAD
git svn fetch --no-follow-parent --revision 5517:HEAD \
--ignore-paths '(David|Kineticode|Release_|dev_(callback|(media_)?templates)|rev_1_([024]|[68]_temp)|tags/)|tmp';
With this in hand, I killed off the call to svn2git
, opting to convert trunk
and the remote branches myself (easily done by copying-and-pasting the relevant
Perl code). Then all I needed to do was clean up the extant tags and run
git-filter-branch
to make the grafts permanent:
sub finish {
print "Deleting old tags\n";
my @tags = grep m{^tags/}, map { s/^\s+//; s/\s+$//; $_ } `git branch -a`;
system qw(git branch -r -D), $_ for @tags;
print "Deleting revision tags\n";
@tags_to_delete = grep { /^\d+$/ } map { s/^\s+//; s/\s+$//; $_ } `git tag`;
system qw(git tag -d), $_ for @tags_to_delete;
print "Grafting...\n";
system qw(git filter-branch);
system qw(git gc);
}
And now I have a nicely organized Git repository based on the Bricolage Subversion repository, with all (or most) merges in their proper places, release tags, and branch tracking. Now all I have to do is stitch it together with the repository based on CVS and I’ll be ready to put this sucker on GitHub! More on that in my next post.
Looking for the comments? Try the old layout.