Migrating to Git, Part 3: Moving your code to Git
September 11, 2012
This is the third and final part of a three-part series about migrating to Git. I recommend reading parts one and two if you haven’t already. Migrating to Git is worth it, but it’s not a trivial decision.
Migrating to Git, Part 1: Advantages
Migrating to Git, Part 2: Prerequisites
If you’re ready to proceed with a migration to Git, congratulations! You’re about to start using one of the best development tools available today.
Anyone who has worked with me knows I’m a big fan of checklists, and the transition to Git is non-trivial enough to warrant a solid understanding of the questions that should be answered before pulling the trigger on migration.
1. Do you need to maintain commit history?
If you’re moving from SVN to Git, the migration is much simpler if you don’t need to retain revision history. If you do, you’ll need to explore SubGit or git-svn.
Some of the steps required for git-svn are:
- Retrieve a list of SVN committers and transform them to Git users (name, e-mail)
- Clone the SVN repo using git-svn
- Convert svn:ignore to .gitignore
- Push the SVN repo to a bare Git repo
- Rename “trunk” to “master”
- Clean up branches and tags
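Two of the steps above lend themselves to a small script. The following is a minimal sketch, not the exact procedure we used: it builds an authors file in the `username = Full Name <email>` form that git-svn reads, and turns svn:ignore properties into anchored .gitignore lines. The function names, the sample committer data, and the `username@domain` e-mail convention are all assumptions made for illustration.

```python
def build_authors_file(committers, email_domain):
    """Return authors-file text, one 'username = Full Name <email>' line per committer.

    committers: dict mapping SVN username -> full name (hypothetical input).
    email_domain: assumed convention that every e-mail is username@email_domain.
    """
    lines = []
    for username in sorted(committers):
        full_name = committers[username]
        lines.append(f"{username} = {full_name} <{username}@{email_domain}>")
    return "\n".join(lines) + "\n"


def svn_ignore_to_gitignore(ignore_props):
    """Convert svn:ignore properties into root-level .gitignore lines.

    ignore_props: dict mapping a directory (relative to the repository root,
    "" for the root itself) to the text of its svn:ignore property.
    svn:ignore only applies to the directory it is set on, so each pattern
    is anchored with a leading slash and the directory path.
    """
    entries = []
    for directory in sorted(ignore_props):
        cleaned = directory.strip("/")
        prefix = "/" + cleaned + "/" if cleaned else "/"
        for pattern in ignore_props[directory].splitlines():
            pattern = pattern.strip()
            if pattern:
                entries.append(prefix + pattern)
    return entries
```

The authors text would be saved to a file and passed to git-svn when cloning, and the ignore entries would be written to a .gitignore at the root of the new repository.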
These aren’t the most time-consuming tasks, but they’re not quick either. If you’re unconcerned about porting over history, I highly recommend skipping most of the steps above. We simply kept historical data in FishEye and instructed developers to query it if they had any questions about pre-Git revisions. Other version control systems will have their own tools to help with the migration, but converting from each VCS is beyond the scope of this series.
2a. Install Git and other dependencies on the central Git server
At a bare minimum you’ll need to install Git. I’m assuming you’ll be using a Linux server, so this is an easy step. You may also need to install an up-to-date version of Java or Ruby (depending on what repository management tool you choose below).
It’s easy to set up Git by hand initially, but maintaining a central Git repository becomes more and more of a burden without an easy-to-use management tool. Popular repository management tools include GitHub Enterprise, Atlassian Stash, GitLab, and Kando.
My personal favourite is GitHub Enterprise, but the cost is hard to justify. GitHub Enterprise would cost our team of over 20 developers $10,000 per year in licensing. Ouch.
We’re using Atlassian Stash at the moment instead, and while it’s frustratingly bare-bones, it provides the bare minimum of what we need and is much cheaper. We’re banking on it gaining features as it matures.
Kando adds workflow functionality, similar to the concept of ClearCase swim-lanes. Depending on your existing workflow it may be too prescriptive for your needs or it may be exactly what you need.
msysgit is a no-brainer for those in a Windows environment. It should be installed on all developer workstations before the migration, and each developer should create a local repository as a test to make sure it’s set up properly. This helps avoid day-of-migration surprises that kill velocity.
Other helpful tools include:
- An advanced three-way diff and merge tool such as Beyond Compare
- A UI tool such as TortoiseGit, SmartGit, or Tower
Our developers are in a Windows environment, and we use Beyond Compare, TortoiseGit, and bash. For the most part we’re happy, but I’ve also heard good things about SmartGit. I’ve heard amazing things about Tower, too, and I would not hesitate to switch to it if our developers used OS X.
It’s worth the time to stop and identify areas of weakness in your current process. Opportunities for improvement may exist with support from the right choice of tools. This is an especially good opportunity to identify a solid code review tool, which will also force the team to decide on a post-commit or pre-commit workflow.
- Post-commit workflow: Review code after it has been pushed and merged to the development branch
- Pre-commit workflow: Review code before it is merged
- Pull requests, as opposed to direct pushes, are an example of a pre-commit workflow
Some decent code review tools for Git include:
As of version 1.3, Stash supports pull requests, which are an excellent choice for code reviews. GitHub Enterprise and GitLab also support pull requests, so it’s not always necessary to augment your Git management tool with an external review tool. You should evaluate each and decide which works best for your team. One thing to keep in mind is that pull requests only help to review code that has yet to be merged.
3. SSH authentication
Developers will most likely authenticate with remote Git repositories using SSH. Each developer will need to generate a private/public SSH key pair and add the public key to the Git server, either in the authorized_keys file or through the Git repository management tool. A repository management tool like GitHub Enterprise makes maintaining SSH keys much more streamlined than manually adding them to the authorized_keys file.
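If you do end up maintaining authorized_keys by hand, it helps to sanity-check each public key a developer sends you before appending it. The sketch below is one way to do that and is my own illustration, not part of any tool mentioned here. It assumes a plain `key-type base64-blob comment` line with no leading options, and relies on the fact that the decoded blob begins with a length-prefixed copy of the key type. The function name is hypothetical.

```python
import base64
import struct


def looks_like_public_key(line):
    """Return True if line resembles 'key-type base64-blob [comment]'.

    Assumes the line carries no authorized_keys options before the key type.
    This only checks the shape of the key; it does not prove the key is usable.
    """
    fields = line.strip().split(None, 2)
    if len(fields) < 2:
        return False
    key_type, encoded = fields[0], fields[1]
    try:
        blob = base64.b64decode(encoded, validate=True)
    except ValueError:
        # binascii.Error is a ValueError subclass: bad characters or padding
        return False
    if len(blob) < 4:
        return False
    # The blob starts with a 4-byte big-endian length, then the key type name
    (name_length,) = struct.unpack(">I", blob[:4])
    embedded_type = blob[4:4 + name_length]
    return embedded_type == key_type.encode("utf-8")
```

A check like this catches the most common mistakes, such as a developer pasting a private key, a truncated key, or a key broken across lines by an e-mail client.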
Custom scripts with SVN references should be identified, as they will break after the migration. This is typically the most time-consuming task if many SVN scripts and hooks exist as part of the current development workflow.
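A quick way to take inventory is to search your scripts for SVN command names. Here is a minimal sketch under a few assumptions: the script contents have already been read into a dictionary, and the list of words to search for (`svn`, `svnadmin`, `svnlook`, `subversion`) is a starting point rather than a complete one. The function name is hypothetical.

```python
import re

# Whole-word match so that unrelated words containing "svn" are not flagged
SVN_PATTERN = re.compile(r"\b(svn|svnadmin|svnlook|subversion)\b", re.IGNORECASE)


def find_svn_references(files):
    """Return (path, line_number, line) for every line that mentions SVN.

    files: dict mapping a script path to its text content.
    """
    hits = []
    for path in sorted(files):
        for number, line in enumerate(files[path].splitlines(), start=1):
            if SVN_PATTERN.search(line):
                hits.append((path, number, line.strip()))
    return hits
```

The resulting list doubles as a to-do list for the migration: every hit is a line that needs a Git equivalent or needs to be retired.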
The migration itself is fairly straightforward if you aren’t preserving SVN history. If you need to port history this list will look slightly more complicated and you should look into using SubGit.
1. Design a new repository structure
We had a single large SVN repository with every single application in our organization. Rather than port our monolithic SVN repository to Git, we decided to break it into smaller repositories that better fit with the conceptual and logical view of our applications.
We created a separate repository for our:
- Application code
- Application configuration artefacts
- Back-end processing jobs
- Static web resources and Apache configuration
2. Create the new Git repositories locally
We grabbed the latest version of our code from SVN and split our monolithic repository into four separate local folders. We had to clean the SVN cruft from each folder, which was incredibly fun. After all of this was done we initialized each folder as a new Git repository.
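Cleaning out the SVN cruft mostly means deleting the `.svn` metadata directories, which older SVN clients scatter through every folder of a working copy. We did this by hand, but a sketch like the following would do the same job. The function name is hypothetical, and it deletes directories permanently, so try it on a copy first.

```python
import os
import shutil


def remove_svn_metadata(root):
    """Delete every .svn directory under root and return the paths removed."""
    removed = []
    for current, dirnames, _filenames in os.walk(root):
        if ".svn" in dirnames:
            target = os.path.join(current, ".svn")
            shutil.rmtree(target)
            # Prune the walk so os.walk does not try to enter the deleted directory
            dirnames.remove(".svn")
            removed.append(target)
    return removed
```

Once a folder is clean, running `git init` inside it creates the new repository.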
3. Add content to Git before building applications
We added all of our files to Git first before performing a full build or even importing our applications into an IDE. This made it easy to create our .gitignore files, because we performed a commit of what’s in SVN and nothing else (such as compiled resources or IDE project configuration files).
4. Build applications and configure .gitignore
Perform a full build of each application after you have all of your files committed to Git. You’ll immediately be able to identify all the generated build artefacts that shouldn’t be versioned. These artefacts can safely be added to .gitignore. Remember to add and commit your .gitignore file(s) too!
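The idea is simply a set difference: anything present after the build that wasn’t committed beforehand is a candidate for .gitignore. Here is a rough sketch of that comparison, assuming you already have both file lists as repository-relative paths with forward slashes. The function name and the rule of collapsing to the shallowest directory that contains no committed files are my own choices for illustration.

```python
def suggest_ignore_entries(committed, after_build):
    """Suggest .gitignore lines for files that appeared after a build.

    committed: paths committed to Git before the build.
    after_build: every path present in the working folder after the build.
    """
    committed = set(committed)

    # Every directory that holds at least one committed file, at any depth
    committed_dirs = set()
    for path in committed:
        parts = path.split("/")[:-1]
        for depth in range(1, len(parts) + 1):
            committed_dirs.add("/".join(parts[:depth]))

    suggestions = set()
    for path in set(after_build) - committed:
        parts = path.split("/")
        entry = "/" + path  # default: ignore just this one file
        for depth in range(1, len(parts)):
            ancestor = "/".join(parts[:depth])
            if ancestor not in committed_dirs:
                # Whole directory is generated, so ignore it in one line
                entry = "/" + ancestor + "/"
                break
        suggestions.add(entry)
    return sorted(suggestions)
```

Treat the output as suggestions to review rather than something to paste in blindly. A stray backup file in a source folder will show up here too, and that is usually something to delete rather than ignore.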
This is arguably one of the most time-consuming tasks of the migration. For instance, Jenkins/Hudson users will need to update all of their scripts to pull from Git instead of SVN. All custom SVN-related scripts will need to be updated.
6. Create central repositories and push local commits
If you’re using a tool like Stash or GitHub Enterprise, you’ll manage the creation of new repositories through the tool’s UI. Otherwise, you will need to create bare repositories on the Git server.
7. Every developer should clone, commit, push, pull
It’s fairly important for all developers to clone the repo(s) right away and push code to identify any issues that may have cropped up during the migration. It’s recommended that the person leading the migration create a throw-away branch and instruct all developers to check it out and perform a few smoke-test changes.
At the very least the entire team should be able to:
- Build their environments locally
- Switch branches
- Make a small throw-away change
- Push their change
- Pull everyone else’s changes
- Switch branches back to the master (or release) branch
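The Git portion of that checklist can be written down as a short sequence of commands for developers to follow or script. The sketch below only builds the list, with an optional helper to run it. The branch and remote names are placeholders, building the environment locally is left out, and an empty commit stands in for the small throw-away change so the example doesn’t have to touch any files.

```python
import subprocess


def smoke_test_commands(branch, main_branch="master", remote="origin"):
    """Return the Git commands for one developer's smoke test, in order."""
    return [
        ["git", "fetch", remote],
        ["git", "checkout", branch],        # switch to the throw-away branch
        # Stand-in for a small throw-away change: a commit with no file changes
        ["git", "commit", "--allow-empty", "-m", f"Smoke test on {branch}"],
        ["git", "push", remote, branch],    # push the change
        ["git", "pull", remote, branch],    # pull everyone else's changes
        ["git", "checkout", main_branch],   # switch back
    ]


def run_commands(commands, repo_path):
    """Run each command inside repo_path, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=repo_path, check=True)
```

If several developers push to the same branch at once, some pushes will be rejected until they pull first. That’s a useful thing for the team to experience on a throw-away branch rather than on real work.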
If you’re feeling brave, you can even set things up so a change forces merge conflicts. This will get the team used to using the difftool and mergetool commands.
8. Application(s) should be built in a test environment and shaken-down
Once developers are fairly happy that they’re set up and ready to go, the version of code now on the central repositories should be built and deployed to test environments. Presumably this will be done using a continuous integration tool and the fancy new Git scripts that were updated during pre-migration.
Most of the issues that come up post-migration have to do with justifying the switch to Git in the first place. During our transition we experienced very few technical issues with our code, but we experienced the natural frustration of learning a new tool and a new development process all at the same time.
Be prepared to deal with:
- Build script failures after the modification of SVN scripts to support Git
- Learning curve of using Git for the first time
- Resistance from fans of the status quo
- Questions from management as to the return on investment
Is it worth it? Absolutely. After the initial learning curve with Git, the smiles per minute on our team has gone up significantly.