Jan 20, 2012
Scratching one's itch
Project at $JOB, March 2010 — 5 years, 22000 commits, 2.8GB
- git-svn — 4 days, NN GB RAM, failed at 90%
- svn-all-fast-export — skip 3 commits or spew indefinitely
- svn2git.py — quick for 50% then slowed to glacial pace
Complete and timely conversion impossible!
Sufficiently numerous anecdotes are data
Maybe it's not just my repository?
Reached out to the git community for anecdotes
Came with a proposal in hand
Saved the world in less than 30 days
OK, so it was hard. Why?
Mapping Subversion history to git DAG non-trivial
- Empty directories
- File permissions
- Symlink representation
- Commit and file metadata
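To make the permission and symlink bullets concrete, here is a minimal sketch of how a translator might pick a git file mode for a Subversion file node. It assumes the usual Subversion conventions (the `svn:executable` and `svn:special` properties, and symlink text stored as `link <target>`); the function and constant names are hypothetical and not taken from svn-fe. Empty directories are not handled because git trees only record paths that lead to files.

```python
# Hypothetical helper: illustrates the mapping, not svn-fe's actual code.
GIT_MODE_FILE = "100644"     # regular file
GIT_MODE_EXEC = "100755"     # executable file
GIT_MODE_SYMLINK = "120000"  # symbolic link


def git_entry(props, content):
    """Return (git_mode, blob_bytes) for a Subversion file node.

    props   -- dict of Subversion properties set on the node
    content -- the node's full text as bytes
    """
    if "svn:special" in props and content.startswith(b"link "):
        # Subversion stores a symlink as a special file whose text is
        # "link <target>"; git stores only the target in the blob.
        return GIT_MODE_SYMLINK, content[len(b"link "):]
    if "svn:executable" in props:
        return GIT_MODE_EXEC, content
    return GIT_MODE_FILE, content
```

For example, `git_entry({"svn:special": "*"}, b"link ../lib")` yields a `120000` entry whose blob is `../lib`.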
So what was the proposal?
Translator — Subversion dump to git fast-import
Data structure that maps between them and scales linearly
Independent of both projects
Merged svn-dump-fast-export into git-contrib as svn-fe
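As a rough illustration of the output side of such a translator, the sketch below builds a small git fast-import stream using the documented `blob`, `commit`, `mark`, `data`, `M` and `D` commands. The helper names and the sample committer are assumptions made for the example; svn-fe itself is written in C and structured differently.

```python
def data(payload: bytes) -> bytes:
    """Emit a fast-import 'data' command with an exact byte count."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def blob(mark: int, content: bytes) -> bytes:
    """Emit a blob and label it with a mark for later reference."""
    return b"blob\nmark :%d\n" % mark + data(content)


def commit(ref, mark, committer, when, message, changes, parent=None):
    """Emit one commit.

    changes is a list of ("M", mode, blob_mark, path) or ("D", path).
    when is seconds since the epoch; the timezone is fixed to +0000.
    """
    out = [
        b"commit %s\n" % ref.encode(),
        b"mark :%d\n" % mark,
        b"committer %s %d +0000\n" % (committer.encode(), when),
        data(message.encode()),
    ]
    if parent is not None:
        out.append(b"from :%d\n" % parent)
    for change in changes:
        if change[0] == "M":
            _, mode, blob_mark, path = change
            out.append(b"M %s :%d %s\n" % (mode.encode(), blob_mark, path.encode()))
        else:
            out.append(b"D %s\n" % change[1].encode())
    out.append(b"\n")
    return b"".join(out)


# One Subversion revision that adds a single file becomes a blob plus a commit.
stream = blob(1, b"hello\n") + commit(
    "refs/heads/master", 2, "A U Thor <author@example.com>", 1263945600,
    "r1: add greeting\n", [("M", "100644", 1, "greeting.txt")],
)
```

Piping `stream` into `git fast-import` inside an empty repository should create one commit on `master`.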
Sounds easy, why more than 30 days?
Although it 'uses only timeless fs concepts', the verbs in the Subversion dump format are not clearly defined.
The obvious one was delete, the rest full of surprises.
Thus began a 3-month protocol reverse-engineering effort.
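The verbs in question arrive as `Node-action` headers (`add`, `delete`, `change`, `replace`) on each node record. The following is a minimal sketch of reading those records, assuming the common layout of `Key: value` header lines, a blank line, then `Content-length` bytes of body. It ignores property parsing and `--deltas` dumps, and `read_records` is a hypothetical name.

```python
import io


def read_records(stream):
    """Yield (headers, body) for each record in a Subversion dump stream."""
    while True:
        line = stream.readline()
        while line == b"\n":          # skip blank lines between records
            line = stream.readline()
        if not line:                  # end of stream
            return
        headers = {}
        while line not in (b"\n", b""):
            key, _, value = line.rstrip(b"\n").partition(b": ")
            headers[key.decode()] = value.decode()
            line = stream.readline()
        # Records without a Content-length header (e.g. deletes) have no body.
        body = stream.read(int(headers.get("Content-length", 0)))
        yield headers, body


# A tiny hand-written dump, for illustration only.
sample = io.BytesIO(
    b"SVN-fs-dump-format-version: 2\n\n"
    b"Revision-number: 1\nProp-content-length: 10\nContent-length: 10\n\n"
    b"PROPS-END\n\n"
    b"Node-path: trunk/a.txt\nNode-kind: file\nNode-action: add\n"
    b"Text-content-length: 3\nContent-length: 3\n\nhi\n\n\n"
    b"Node-path: trunk/a.txt\nNode-action: delete\n\n"
)
actions = [h["Node-action"] for h, _ in read_records(sample) if "Node-action" in h]
```

Here `actions` collects the verb of each node record; deciding what each verb means for copies, replacements and directories is where the surprises start.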
So we have a translator, how is it fed?
Extracting the complete history from a Subversion server is hard.
- svnsync + svnadmin dump
- svk sync + svk admin dump
For large repositories, ask for a compressed dump.
Enter svnrdump, the product of Ram's GSoC 2010 project.
svnrdump — Google Summer of Code 2010
- First-class Subversion tool
- svnrdump dump
- svnrdump load
- General-purpose tool suitable for *nix-style interaction
Any issues with git fast-import?
Protocol extensions for bi-directional communication
- cat-blob — read existing or imported blob data
- ls — inspect active commit or named trees
Complexity can be removed from the translator.
Applying Subversion binary deltas becomes straightforward.
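A minimal sketch of the frontend's side of this conversation follows, assuming the response formats described in the git fast-import documentation: `<sha1> SP 'blob' SP <size> LF <contents> LF` for `cat-blob`, and `<mode> SP <type> SP <dataref> HT <path> LF` or `missing SP <path> LF` for `ls`. The helper names are hypothetical, quoted paths are not handled, and the object name in the usage lines is only a placeholder.

```python
import io


def cat_blob_command(dataref: str) -> bytes:
    """Request a blob by mark (e.g. ':5') or by object name."""
    return b"cat-blob %s\n" % dataref.encode()


def ls_command(dataref: str, path: str) -> bytes:
    """Ask what lives at path inside the named commit or tree."""
    return b"ls %s %s\n" % (dataref.encode(), path.encode())


def read_cat_blob(stream):
    """Parse one cat-blob response and return (object_name, contents)."""
    name, kind, size = stream.readline().split()
    if kind != b"blob":
        raise ValueError("unexpected cat-blob response")
    contents = stream.read(int(size))
    stream.read(1)  # consume the LF that follows the contents
    return name.decode(), contents


def parse_ls(line: bytes):
    """Parse one ls response line; return None if the path is missing."""
    line = line.rstrip(b"\n")
    if line.startswith(b"missing "):
        return None
    meta, _, path = line.partition(b"\t")
    mode, kind, dataref = meta.split(b" ")
    return mode.decode(), kind.decode(), dataref.decode(), path.decode()


# Simulated response, as it would arrive on the descriptor passed to
# git fast-import via --cat-blob-fd.
fake_name = b"0" * 40  # placeholder object name, not a real hash
reply = io.BytesIO(fake_name + b" blob 6\nhello\n\n")
name, old_text = read_cat_blob(reply)
```

With the previous version of a file in hand (`old_text` here), the translator can apply a Subversion delta against it without keeping its own copy of every blob.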
git-remote-svn — Google Summer of Code 2011
- svn-fe additional command-line flags
- git-remote-svn wrapper in Python
- Numerous bug fixes for git fast-import
- Documentation fixes for git fast-import
- Use cat-blob as a hint for delta computation
Sverre Rabbelier, Jeff King, et al.
In parallel with all this work, the remote helper infrastructure saw many improvements; it would ultimately be used to integrate the translator into the natural git workflow.
Jérémie Nikaes' git-remote-mediawiki helped spur things along.
- Merge outstanding GSoC patches into git-core
- Upstream vcs-svn from git-core to svn-dump-fast-export
- Implement fast-import protocol extensions for hg or bzr
- Reverse translator to enable writing to Subversion
- Integrate svn-dump-fast-export with hg or bzr
Merge outstanding GSoC patches into git-core
- 90+ patches not yet merged into jch/master
- prioritise topics and reorder for submission
- garner support for inclusion
Questions & Answers
For more information or to get involved, email email@example.com and consider cc'ing firstname.lastname@example.org.
Git Logo and Icon by Dylan Beattie distributed under Creative Commons Attribution-ShareAlike 3.0 Unported License.