First steps with git
A few weeks ago I started using git not only for tracking changes in my own private repository but also for Mahout development and for reviewing patches. My setup is probably a bit unusual, so I thought I'd describe it first before diving into the specific steps.
Workflow to implement
For my development work I wanted to follow Mahout trunk very closely, merging in upstream changes whenever I resumed work on the code. I also wanted to work from two client machines located at two different physical locations. Publishing changes and intermediate progress online was fine with me.
The tools used
I set up a clone of the official Mahout git repository on GitHub as a place to check changes into and to publish my own work.
On each machine I cloned this GitHub repository. After that I added the official Mahout git repository as a remote named upstream, so that I could fetch and merge in any upstream changes.
Command set
After forking the official Mahout repository into my own GitHub account, I used the following commands on each client machine to clone and set up the repository. See also the GitHub help on forking git repositories.
#clone the github repository
git clone git@github.com:MaineC/mahout.git
#add upstream to the local clone
cd mahout
git remote add upstream git://git.apache.org/mahout.git
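You can try the clone and remote setup above without GitHub or Apache access. The sketch below replays it against local bare repositories: upstream.git and fork.git are hypothetical stand-ins for the Apache repository and the GitHub fork, and the single commit on trunk is invented. Only the clone and remote add steps mirror the post.

```shell
#!/bin/sh
# Offline sketch of the clone/upstream setup.
# upstream.git and fork.git are local stand-ins (assumptions) for
# git://git.apache.org/mahout.git and git@github.com:MaineC/mahout.git.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
W=$(mktemp -d)

# Fake "official" repository with one commit on trunk
git init -q "$W/seed"
cd "$W/seed"
git symbolic-ref HEAD refs/heads/trunk
echo v1 > README
git add README
git commit -q -m "initial import"
git clone -q --bare "$W/seed" "$W/upstream.git"

# "Fork" it: a bare copy, roughly what GitHub's fork button creates
git clone -q --bare "$W/upstream.git" "$W/fork.git"

# The setup commands from the post, pointed at the local stand-ins
git clone -q "$W/fork.git" "$W/mahout"
cd "$W/mahout"
git remote add upstream "$W/upstream.git"

# origin is the fork (where you push), upstream the official repository
git remote -v
```

In this layout, origin is the only remote you ever push to. Upstream is used purely for fetching.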
One additional piece of configuration that made life easier was setting up a list of files and file patterns for git to ignore.
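The post does not give the actual ignore list. As a rough starting point for a Maven-based project such as Mahout, the sketch below assumes that Maven's target/ output directories and the IntelliJ IDEA and Eclipse project files should stay out of version control. Adjust the patterns to your own setup. The sketch runs in a throwaway directory and uses git check-ignore -v to show which rule matches each path.

```shell
#!/bin/sh
# Sketch: a .gitignore for a Maven project edited in IDEA or Eclipse.
# The patterns are assumptions, not the list used in the post.
set -e
cd "$(mktemp -d)"
git init -q

cat > .gitignore <<'EOF'
# Maven build output, in every module
target/
# IntelliJ IDEA project files
.idea/
*.iml
# Eclipse project files
.classpath
.project
.settings/
EOF

# Create a few files that should never be committed
mkdir -p core/target .idea
touch core/target/mahout-core.jar .idea/workspace.xml mahout-core.iml .classpath

# Print each path that matches an ignore rule, together with the rule
git check-ignore -v core/target/mahout-core.jar mahout-core.iml .classpath

# Only .gitignore itself is reported as untracked
git status --short
```

Committing the .gitignore file itself shares the list between both machines. Patterns that only matter on one machine can go into .git/info/exclude instead.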
Each distinct changeset (be it a code review, code style changes, or steps towards my own changes) then lived in its own local branch. To share a branch with other developers and make it available on my second machine, I used the following commands on the machine where development started:
#create the branch
git branch MAHOUT-666
#publish the branch on github
git push origin MAHOUT-666
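Note that git branch only creates the branch. Work continues on whatever is currently checked out until you switch. The sketch below uses git checkout -b to create the branch and switch to it in one step. It then uses git push -u, so that later pushes and pulls on that branch know their counterpart on origin. Both are standard git options, but using them is a suggestion rather than the post's exact commands. As before, fork.git is a hypothetical local stand-in for the GitHub fork.

```shell
#!/bin/sh
# Sketch: create a topic branch, commit to it and publish it on the fork.
# fork.git is a local stand-in (assumption) for the GitHub repository.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
W=$(mktemp -d)

# Minimal fork with one commit on trunk, cloned as the first machine
git init -q "$W/seed"
cd "$W/seed"
git symbolic-ref HEAD refs/heads/trunk
echo v1 > README
git add README
git commit -q -m "initial import"
git clone -q --bare "$W/seed" "$W/fork.git"
git clone -q "$W/fork.git" "$W/machine1"
cd "$W/machine1"

# Create the branch and switch to it in one step
git checkout -q -b MAHOUT-666
echo "work in progress" >> README
git commit -q -am "MAHOUT-666: first step"

# Publish it; -u records origin/MAHOUT-666 as the branch's upstream
git push -q -u origin MAHOUT-666
```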
To get all changes from both my first machine and upstream onto the second machine, all that was needed was:
#select correct local branch
git checkout trunk
#get and merge changes from upstream
git fetch upstream
git merge upstream/trunk
#get changes from github
git fetch origin
git merge origin/trunk
#get branch from above
git checkout -b MAHOUT-666 origin/MAHOUT-666
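The sequence above can be replayed offline. In the sketch below, upstream.git plays the Apache repository, fork.git the GitHub fork, and machine1 and machine2 are two clones of the fork. All of these are hypothetical local stand-ins, and the commits are invented. The commands run on machine2 after the clone are the ones listed above.

```shell
#!/bin/sh
# Sketch: bring a second clone up to date with upstream and the fork.
# upstream.git / fork.git are local stand-ins (assumptions) for the Apache
# repository and the GitHub fork; all commits are made up.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
W=$(mktemp -d)

git init -q "$W/seed"
cd "$W/seed"
git symbolic-ref HEAD refs/heads/trunk
echo v1 > README
git add README
git commit -q -m "initial import"
git clone -q --bare "$W/seed" "$W/upstream.git"
git clone -q --bare "$W/upstream.git" "$W/fork.git"

# Machine 1 publishes a topic branch on the fork
git clone -q "$W/fork.git" "$W/machine1"
cd "$W/machine1"
git checkout -q -b MAHOUT-666
echo "work in progress" > NOTES
git add NOTES
git commit -q -m "MAHOUT-666: first step"
git push -q origin MAHOUT-666

# Meanwhile trunk moves on in the official repository
cd "$W/seed"
echo v2 > README
git commit -q -am "upstream change"
git push -q "$W/upstream.git" trunk

# Machine 2: one clone plus the upstream remote, then the steps from the post
git clone -q "$W/fork.git" "$W/machine2"
cd "$W/machine2"
git remote add upstream "$W/upstream.git"
git checkout -q trunk
git fetch -q upstream
git merge -q upstream/trunk
git fetch -q origin
git merge -q origin/trunk
git checkout -q -b MAHOUT-666 origin/MAHOUT-666
```

Running git push origin trunk afterwards would publish the merged trunk to the fork. The other machine can then pick it up with the same fetch and merge steps.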
Of course, pushing changes into an Apache repository is not possible. So I would still end up creating a patch, submitting it to JIRA for review, and finally applying and committing it via svn. As soon as those changes made it into the official trunk, all branches created earlier for them became obsolete.
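Creating that patch from a branch takes a single git diff. The sketch below assumes the JIRA patch should apply to an svn checkout with patch -p0, as many Apache projects asked for at the time. It therefore uses --no-prefix to drop git's a/ and b/ path prefixes. The three-dot range trunk...MAHOUT-666 limits the diff to what changed on the branch since it left trunk. The repository and its contents are invented for illustration.

```shell
#!/bin/sh
# Sketch: turn a topic branch into an svn-style patch for JIRA.
# Repository, branch content and file names are made up for illustration.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/trunk
echo v1 > README
git add README
git commit -q -m "initial import"

git checkout -q -b MAHOUT-666
echo "improved docs" >> README
git commit -q -am "MAHOUT-666: improve README"

# Changes on MAHOUT-666 since it branched off trunk, without a/ b/ prefixes
git diff --no-prefix trunk...MAHOUT-666 > MAHOUT-666.patch
cat MAHOUT-666.patch

# Check that the patch applies to a clean trunk with -p0, as patch -p0 would
git checkout -q trunk
git apply -p0 --check MAHOUT-666.patch
```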
What still makes me stick with git, especially for reviewing patches and working on multiple changesets, is its ability to create branches quickly and entirely locally. This feature completely changed my established workflow for keeping changesets separate:
With svn I would create a separate checkout of the original repository from the remote server, then make my changes or just apply a patch for review. To speed things up, or to be able to work offline, I would keep one svn checkout clean, copy it to a different location, and apply the patch only there.
Combined with an IDE, this workflow meant re-importing each checkout as a separate project. Even though both IntelliJ IDEA and Eclipse are reasonably fast at importing and setting up projects, this still cost time.
With git, all I need is one clone. After that I can create branches locally without contacting the server again. I usually keep trunk free of local changes: patches are applied to separate branches for review, and the same goes for my own code modifications. That way all work can happen while disconnected from the version control server.
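Reviewing a contributed patch on its own branch could look like the sketch below. The patch text, the file it touches and the branch name review-MAHOUT-1234 are all hypothetical. It uses git apply -p0 on the assumption that the patch came from svn diff and so has no a/ or b/ prefixes. Nothing in it talks to a server.

```shell
#!/bin/sh
# Sketch: review a patch on a throwaway branch, keeping trunk untouched.
# The patch and the branch name review-MAHOUT-1234 are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/trunk
echo v1 > README
git add README
git commit -q -m "initial import"

# A patch as it might be attached to JIRA (svn-style paths, no a/ b/)
cat > MAHOUT-1234.patch <<'EOF'
--- README
+++ README
@@ -1 +1,2 @@
 v1
+patched by contributor
EOF

# Branch off the clean trunk, then apply and commit the patch there
git checkout -q -b review-MAHOUT-1234 trunk
git apply -p0 MAHOUT-1234.patch
git commit -q -am "Apply MAHOUT-1234 for review"

# Switch back to the untouched trunk; this is purely local
git checkout -q trunk
```

Once the review is done, git branch -D review-MAHOUT-1234 throws the branch away. The capital -D is needed because the branch was never merged.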
Combined with IntelliJ IDEA, this becomes even more fun. The IDE regularly scans the file system for updated files, so after each git checkout it automatically adjusts to the changed source code, and no project has to be re-created. The same works with Eclipse; it just takes one extra click on the Refresh button.
For me, git sped up my work and supported use cases that would otherwise have involved mailing patches back and forth between separate mailboxes. Working with patches and changesets felt much more natural and better supported by the version control system itself. In addition, it is of course a great relief to be able to check in, diff, log, check out, etc. even when disconnected from the network. For me, that is still one of the biggest advantages of any distributed version control system.
Update
Lance Norskog recently pointed out one more helpful step:
You didn’t mention how to purge your project branch out of the github fork. From http://help.github.com/remotes/:
Deleting a remote branch or tag
This command is a bit arcane at first glance…
git push REMOTENAME :BRANCHNAME
If you look at the advanced push syntax above it should make a bit more sense. You are literally telling git “push nothing into BRANCHNAME on REMOTENAME”. And, you also have to delete the branch locally also.
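Both halves of that cleanup can be tried offline with the sketch below. It uses a local bare repository (fork.git, an assumption) in place of the GitHub fork and an invented commit on MAHOUT-666. The command git push origin --delete MAHOUT-666 is an equivalent, more readable form of the colon syntax. For the local branch, git branch -d would normally refuse in this workflow. The change reached trunk as a separate svn commit, so git does not consider the branch merged, hence the capital -D.

```shell
#!/bin/sh
# Sketch: remove a finished topic branch from the fork and locally.
# fork.git is a local stand-in (assumption) for the GitHub repository.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
W=$(mktemp -d)
git init -q "$W/seed"
cd "$W/seed"
git symbolic-ref HEAD refs/heads/trunk
echo v1 > README
git add README
git commit -q -m "initial import"
git clone -q --bare "$W/seed" "$W/fork.git"

git clone -q "$W/fork.git" "$W/machine1"
cd "$W/machine1"
git checkout -q -b MAHOUT-666
echo "work in progress" >> README
git commit -q -am "MAHOUT-666: first step"
git push -q origin MAHOUT-666

# Push "nothing" into MAHOUT-666 on origin, i.e. delete the remote branch
# (git push origin --delete MAHOUT-666 does the same)
git push -q origin :MAHOUT-666

# Delete the local branch; -D because git never saw it merged into trunk
git checkout -q trunk
git branch -D MAHOUT-666
```

On the second machine, git fetch --prune origin drops the stale origin/MAHOUT-666 reference. The local branch there is removed the same way.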