SVN to GIT migration (1)

Preface

Having abandoned SVN in our projects (gluegen, jogl and jogl-demos) for various reasons, see http://git.or.cz/gitwiki/GitSvnComparsion,
we had to choose a new SCM.

Git wins over mercurial/hg for us mainly because of its branch/merge handling, its maturity and its wide, growing user base.
However, it may also be just a matter of taste ..

Some git resources:

SVN to git Conversion

The conversion demonstrated here uses the dev.java.net gluegen repository as an example. We have converted jogl and jogl-demos the same way.

We have used git version 1.6.3.3.

Instead of using git-svn directly, we utilize the tool svn-all-fast-export, http://repo.or.cz/w/svn-all-fast-export.git,
which not only uses a fast local SVN repository, but also migrates all SVN branches and tags to git branches.
SVN tags end up as git branches because SVN tags are actually branches.

Update: It turns out that the tool http://repo.or.cz/w/svn-all-fast-export.git
is not able to find the proper tag and branch points within the original repository,
hence we will use git-svn.

Directory Layout

We assume the following layout ..

/../JOGL
/../JOGL/svn-server-sync
/../JOGL/svn
/../JOGL/git-svn

All subsequent actions start from the root directory JOGL.

SVN Preparation

Clone Repository

Update: This task is optional. However it is highly recommended since it dramatically speeds up the SVN checkout process.

First we create a local clone of the SVN repository. This is already tricky, since svn versions are not backward compatible,
i.e. the dev.java.net SVN repository cannot be synchronized with svn version 1.6. The latest compatible version is svn 1.5.2.
Here we used an installed svn version 1.4.4 on a MacOSX machine!

1: mkdir svn-server-sync
2: cd svn-server-sync
3: svnadmin create gluegen
4: cp gluegen/hooks/pre-revprop-change.tmpl gluegen/hooks/pre-revprop-change
5: vi gluegen/hooks/pre-revprop-change
6: chmod ugo+x gluegen/hooks/pre-revprop-change
7: svnsync --username userid --password pwd init file://`pwd`/gluegen https://gluegen.dev.java.net/svn/gluegen
8: svnsync sync file://`pwd`/gluegen

Steps [1..3] create the local SVN repository.

Steps [4..6] create the hook file that svnsync depends on and make it executable, while step 5 edits it as follows:

diff -Nur gluegen/hooks/pre-revprop-change.tmpl gluegen/hooks/pre-revprop-change
--- gluegen/hooks/pre-revprop-change.tmpl       2009-07-08 09:18:55.000000000 -0700
+++ gluegen/hooks/pre-revprop-change    2009-07-08 09:20:06.000000000 -0700
@@ -60,6 +60,8 @@
 PROPNAME="$4"
 ACTION="$5"

+exit 0
+
 if [ "$ACTION" = "M" -a "$PROPNAME" = "svn:log" ]; then exit 0; fi

 echo "Changing revision properties other than svn:log is prohibited" >&2

Step 7 links the local SVN repository to its remote master.

Step 8 finally pulls it down to our local copy.

This is done for the jogl and jogl-demos repositories as well.
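
When repeating steps 4 to 6 for several repositories, the manual edit in vi can be avoided. The following is only a sketch of one possible shortcut: instead of patching the template, it writes a hook that does nothing but exit successfully, which has the same effect as the added "exit 0" line above. The helper name install_permissive_hook is made up for this example and is not an svn tool.

```shell
#!/bin/sh
# Sketch: install a pre-revprop-change hook that accepts every revision
# property change, so that svnsync can write to the local mirror.
# install_permissive_hook is a hypothetical helper name.
install_permissive_hook() {
    repo=$1
    hook="$repo/hooks/pre-revprop-change"
    # The hook only has to exit with status 0.
    printf '#!/bin/sh\nexit 0\n' > "$hook" || return 1
    # Same permission change as step 6.
    chmod ugo+x "$hook"
}
```

Called inside svn-server-sync after svnadmin create, e.g. install_permissive_hook gluegen, and likewise for jogl and jogl-demos.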

Create author mapping file

Create a svn.authors file, which maps the svn author names to git author names:

#!/bin/bash
authors=$(svn log -q | grep -e '^r' | awk 'BEGIN { FS = "|" } ; { print $2 }' | sort | uniq)
for author in ${authors}; do
  echo "${author} = ${author} <${author}@dev.java.net>" >> svn.authors ;
done

Use the above script in all of your original checked-out svn repository directories (gluegen, jogl and jogl-demos) to catch all author names.
Fix the svn.authors file’s email addresses if appropriate.
Add the mandatory author names:

(no author) = First Last <no.author@domain.net>
gfxadmin = gfxadmin <gfxadmin@domain.net>
root = root <root@domain.net>
httpd = httpd <httpd@domain.net>
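
Since a missing mapping interrupts the conversion later on, it may be worth checking the finished svn.authors file against the SVN log before starting. A minimal sketch, assuming the usual "rNNN | author | date" line format of svn log -q and the "name = ..." format of svn.authors; missing_authors is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: read `svn log -q` output on stdin and print every author
# that has no "author = ..." line in the given authors file.
# missing_authors is a hypothetical helper name.
missing_authors() {
    authors_file=$1
    # Take the second |-separated field of each revision line, trimmed.
    awk -F'|' '/^r[0-9]/ { gsub(/^ +| +$/, "", $2); print $2 }' | sort -u |
    while IFS= read -r author; do
        # Compare against the text left of " = " in each mapping line.
        if ! awk -v a="$author" -F' = ' \
               '$1 == a { found = 1 } END { exit !found }' "$authors_file"; then
            printf '%s\n' "$author"
        fi
    done
}
```

Run in each old SVN checkout as: svn log -q | missing_authors /path/to/svn.authors. An empty output means every author seen in the log has a mapping.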

Move old local SVN repositories

We move away the old SVN checkouts ..

mkdir svn
mv gluegen jogl jogl-demos svn/

Converting to git

Preparation

Get the tool svn-all-fast-export, http://repo.or.cz/w/svn-all-fast-export.git, via git clone git://repo.or.cz/svn-all-fast-export.git, build and install it.

Go to our parent directory JOGL, where svn-server-sync resides.

mkdir git-svn
cd git-svn

Copy the previously created svn.authors file to this directory git-svn.

Copy the file merged-branches-tags.rules out of the checked out svn-all-fast-export.git
repository, svn-all-fast-export/samples/merged-branches-tags.rules, to this directory git-svn.

Converting

cd git-svn

We customize the svn-all-fast-export configuration file, here for gluegen:
sed 's/myproject/gluegen/g' merged-branches-tags.rules > gluegen-merged-branches-tags.rules

git svn --authors-file=./svn.authors clone -s file://`pwd`/../svn-server-sync/gluegen gluegen 2>&1 | tee gluegen.log

This task migrates the SVN repository to git. Note that all tags of the SVN repository are represented as branches within git,
which we will fix in the next step. Don’t forget to check the log file, here gluegen.log.

In case something goes wrong, i.e. the task is incomplete, e.g. due to a missing author-name mapping,
fix that error (add the author name to svn.authors) and continue:

cd gluegen
git svn --authors-file=../svn.authors fetch  2>&1 | tee ../gluegen-2.log

Pick remote branches & manual tagging

cd git-svn/gluegen

The converted repository references its branches as remote branches of the SVN origin,
so we have to pick these and make them local.

Also, as mentioned above, SVN tags are actually branches, so we have to convert them to git tags.

git branch -a
* master
  remotes/1.0b06-maint
  remotes/1.0b06a
  remotes/JOGL_2_SANDBOX
  remotes/tags/1.0b06
  remotes/tags/1.0b06a
  remotes/trunk
git tag -l

The above shows us a few remote branches and no tags.

We have to distinguish the remote branches from the tags,
which is easy by looking at the SVN branches page. Valid remote branches are 1.0b06-maint and JOGL_2_SANDBOX.
The commands below create local branches from the remote ones.

git checkout --track -b 1.0b06-maint remotes/1.0b06-maint
git checkout --track -b JOGL_2_SANDBOX remotes/JOGL_2_SANDBOX

Last but not least, we have to manually create git tags from the fake SVN tags, which are visible as remotes/tags/* branches.

1: git checkout --track -b tag_1.0b06 remotes/tags/1.0b06
2: git checkout --track -b tag_1.0b06a remotes/tags/1.0b06a
3: git checkout master
4: git tag 1.0b06 tag_1.0b06
5: git tag 1.0b06a tag_1.0b06a
6: git branch -D tag_1.0b06 tag_1.0b06a
7: git branch -a
  1.0b06-maint
  JOGL_2_SANDBOX
* master
  remotes/1.0b06-maint
  remotes/1.0b06a
  remotes/JOGL_2_SANDBOX
  remotes/tags/1.0b06
  remotes/tags/1.0b06a
  remotes/trunk
8: git tag -l
  1.0b06
  1.0b06a
9: git branch -r -D 1.0b06-maint 1.0b06a JOGL_2_SANDBOX tags/1.0b06 tags/1.0b06a trunk
X: git branch -a
  1.0b06-maint
  JOGL_2_SANDBOX
* master

Steps [1..2] create local branches with the prefix tag_ from the remote ones.
Step 3 switches back to the master branch.
Steps [4..5] create tags from the local branches.
Step 6 deletes the local branches which we have just tagged.
Step 7 shows the branches.
Step 8 shows the tags.
Step 9 removes the remote branches, which disconnects us from the svn repository.
Step X shows the branches.
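
As a possible shortcut, git tag also accepts a remote ref as the commit to tag, so the temporary tag_ branch may not be needed at all. A minimal sketch, assuming the git-svn layout shown above, with the SVN tags under remotes/tags/; tag_from_svn_ref is a hypothetical helper name, not a git command:

```shell
#!/bin/sh
# Sketch: turn one git-svn tag ref (remotes/tags/<name>) into a git tag
# without creating a temporary local branch first.
# tag_from_svn_ref is a hypothetical helper name.
tag_from_svn_ref() {
    name=$1
    # Create a lightweight tag at the commit the remote ref points to.
    git tag "$name" "refs/remotes/tags/$name" || return 1
    # Drop the remote ref, as step 9 does for the remote branches.
    git branch -r -D "tags/$name"
}
```

For example, tag_from_svn_ref 1.0b06 should leave the same tag 1.0b06 behind as steps 1, 4 and 6 together, since both variants tag the commit that remotes/tags/1.0b06 points to.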

In case you have many branches, you may use a simple shell script, fed with a file containing the branch names,
without the ref-path (remotes/)

#! /bin/sh

branchfile=$1
shift

if [ -z "$branchfile" ] ; then
    echo Usage $0 branchfile containing branchnames without ref path
    exit 1
fi

for i in `cat $branchfile` ; do
    git checkout --track -b $i remotes/$i
    git checkout master
    git branch -r -D $i
done

In case you have many tags, you may use a simple shell script, fed with a file containing the tag names,
without the ref-path (remotes/tags/)

#! /bin/sh

tagfile=$1
shift

if [ -z "$tagfile" ] ; then
    echo Usage $0 tagfile containing tagnames without ref path
    exit 1
fi

for i in `cat $tagfile` ; do
    git checkout --track -b tag_"$i" remotes/tags/$i
    git tag $i tag_"$i"
    git checkout master
    git branch -D tag_"$i"
    git branch -r -D tags/$i
done
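
The tagfile for the script above does not have to be typed by hand. A minimal sketch, assuming the branch listing format shown earlier (tags under remotes/tags/ or, with git branch -r, under tags/); list_tag_names is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: read `git branch -r` or `git branch -a` output on stdin and
# print the SVN tag names without their ref path, one per line.
# list_tag_names is a hypothetical helper name.
list_tag_names() {
    # Keep only lines under tags/ and strip the leading spaces,
    # the optional "remotes/" prefix and the "tags/" prefix.
    sed -n 's|^ *\(remotes/\)\{0,1\}tags/||p'
}
```

Used as: git branch -r | list_tag_names > tagfile. The branchfile is better still reviewed by hand, since not every remote branch is a valid one, as seen with 1.0b06a above.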

Cleanup

1: cd git-svn/gluegen
2: git gc
3: cd ..
4: git clone --mirror gluegen gluegen.git

Since the conversion left a lot of dirt in the git repository, and we have juggled around with a lot of branches,
we manually garbage collect (gc) the repository in step 2.
Step 4 creates a bare mirror clone of the cleaned-up git repository, which you may like to use as a backup.

Pre-cleanup: 15 MByte, post-cleanup: 6.6 MByte, bare mirror: 2.7 MByte.

Publish git repository to the server

cd git-svn/gluegen.git

Now we push our conversion to the git server’s repository:

git push --mirror ssh://username@git.kenai.com/gluegen~gluegen-git

The mirror option ensures a 1:1 copy including all branches and tags.

git Usage

We assume we are in our root directory JOGL.

The gluegen, jogl and jogl-demos directories no longer exist here, since we moved them away to the new subdirectory svn.

Clone

1a: git clone git://kenai.com/gluegen~gluegen-git  gluegen
1b: git clone ssh://username@git.kenai.com/gluegen~gluegen-git gluegen
2: cd gluegen
3: git branch -a
4: git remote show origin

Step 1a or 1b creates the local gluegen git clone in the directory gluegen,
where 1a is the anonymous checkout and 1b uses the SSH login shell remote connection.
The latter is recommended for putbacks.

Step 2 cd’s into it.

Step 3 lists all branches.

Step 4 shows the remote tracked branches.

Checkout a remote tracked branch

In case you like to browse around within some branch which is not yet local, i.e. still remote,
you have to check out the remote one to make it local, e.g. JOGL_2_SANDBOX.

git remote show origin
git checkout --track -b JOGL_2_SANDBOX origin/JOGL_2_SANDBOX
git checkout JOGL_2_SANDBOX

Then you can switch between local branches as usual ..

Switch local branches

git checkout JOGL_2_SANDBOX
git checkout master

Remove a remote branch on server

You may have created and published the new branch my_branch derived from master:

git branch --track my_branch origin/master
git checkout my_branch
.. changes ..
git push --all

.. or you were using a remote branch ..

git checkout --track -b my_branch origin/my_branch
git checkout my_branch
.. changes ..
git push --all

Now we would like to remove the branch my_branch locally,
and remove it on the server as well.

1: git branch -D my_branch
2: git push origin :refs/heads/my_branch

Step 1 removes the branch my_branch in the local repository.
Step 2 removes the branch my_branch on the server repository.

Another user may want to clean up her local repository, i.e. remove the tracked remote branch:

1: git branch -r -D origin/my_branch
2: git fetch

Last but not least …

.. participation and contributions.

Git’s model is not really laid out for a central SCM,
though it can be abused as such, as you have seen above.

Its real power lies in its distributed nature!

Either a contributor has her own repository online, via a webserver or ssh access,
or she can send the patch via email.
Whichever way it is offered to an integrator within the chain of trust (in a project),
she asks the integrator to pull the branch from the mentioned online repository or to apply the emailed patch.
The integrator then verifies the patch somehow and merges it into her own branch, etc.
The key feature of git that makes this work is
very reliable, high-performance branching and merging.

2 comments to SVN to GIT migration (1)

  • How about providing a nightly SVN snapshot from the git repository?

    I’m sure everyone will be using git in a couple years time, just like we all migrated from CVS to SVN. But right now, SVN is really well supported in terms of tools. A simple nightly snapshot would make the JOGL sources available to all those not quite ready to make the jump to yet another SCM solution.

  • No, thank you .. we abandon the ugly SVN for good :)
    But feel free to setup a SVN repository and maintain it, you can.
    Everything not distributed (hg or git) is too weak nowadays ..