Tuesday, 14 October 2008

Git: on rebasing

(This is a follow-up to How remotes work).

We've seen how git manages to merge your local changes when you pull from a remote repository.

This approach has a small aesthetic downside: it creates a large number of merge commits, which makes the history harder to read. Wouldn't it be nice if git let you simply re-apply your local changes on top of what you just pulled?

Rejoice, because that's what the git rebase command is for.

git rebase origin/master will take the local commits (those that are reachable from the master head, but not from origin/master), remove them from the commit tree, and re-apply them on top of origin/master, before moving the master head to the top of the new line of commits it just created. That way, the history is kept linear.

Afterwards, you can simply push your new commits.
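The fetch, rebase, push sequence described above can be tried safely in a throwaway sandbox. This is only a sketch: the directory names (seed, origin.git, mine, theirs), the file names and the demo identity are all made up for the example, and it assumes a reasonably recent git that supports the -C option.

```shell
#!/bin/sh
set -e
# Demo identity so commits work without any global git configuration (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# A seed repository with one commit on master, published as a bare "origin".
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo base > seed/base.txt
git -C seed add base.txt
git -C seed commit -q -m "base"
git clone -q --bare seed origin.git

# Two clones of that origin: mine, and somebody else's.
git clone -q origin.git mine
git clone -q origin.git theirs

# Somebody else publishes a commit.
echo theirs > theirs/theirs.txt
git -C theirs add theirs.txt
git -C theirs commit -q -m "their change"
git -C theirs push -q origin master

# Meanwhile, I commit locally without knowing about it.
echo mine > mine/mine.txt
git -C mine add mine.txt
git -C mine commit -q -m "my change"

# Fetch, re-apply my commit on top of origin/master, then push.
cd mine
git fetch -q origin
git rebase origin/master
git push -q origin master

# The history is a single line, with my commit on top.
git log --oneline --graph master
```

The two changes touch different files, so the rebase applies cleanly; with overlapping edits, git would stop and ask you to resolve the conflict before continuing.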

Important warning: git rebase changes your commits. Because their place in the tree will be different, their SHA-1 hashes will be different as well, and the old ones will disappear. For that reason, you must not rebase commits that you have already published in a shared repository from which someone else might have fetched.

Rebasing is a powerful tool that will enable you to manipulate your branches, moving lines of commits from one location to another. The git-rebase man page has more examples.


Aristotle said...

There is one imprecision in your explanation that won’t be obvious to someone new to git: rebase does not actually remove any commits. The only commit series is still in the repository, it’s just that no named branch is pointing at it. You can retrieve the SHA-1 of your old tip using the reflog, f.ex., and then reset your branch to that commit, and like magic you’re back in your previous state.

Git almost never removes data from the repository if you haven’t explicitly told it to do so.
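The recovery described in this comment can be sketched in a scratch repository. The names here are hypothetical: a local branch called upstream stands in for origin/master, and the files and demo identity exist only for the example. It relies on the branch reflog being enabled, which is the default in a non-bare repository.

```shell
#!/bin/sh
set -e
# Demo identity so commits work without any global git configuration (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
echo base > base.txt
git add base.txt
git commit -q -m "base"

# "upstream" stands in for origin/master; both branches then grow one commit.
git branch upstream
echo mine > mine.txt
git add mine.txt
git commit -q -m "my change"
git checkout -q upstream
echo theirs > theirs.txt
git add theirs.txt
git commit -q -m "their change"
git checkout -q master

old_tip=$(git rev-parse master)
git rebase -q upstream
new_tip=$(git rev-parse master)

# The pre-rebase commit is still in the repository.
git cat-file -t "$old_tip"

# The reflog of master records where the branch pointed before the rebase.
git reflog show master

# Move master back to its previous position, undoing the rebase.
git reset -q --hard "master@{1}"
```

After the reset, the rebased copy of the commit becomes the unreferenced one; it is still reachable through the reflog in exactly the same way.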

Aristotle said...

s/the only commit series/the old commit series/

Rafael said...

Aristotle, that's right, the commits need to be garbage collected to be really disposed of. But that's an implementation detail :)

Also, I forgot to mention that the git pull --rebase command does a fetch followed by a rebase, instead of a fetch followed by a merge (which is the default). That's what you get when you hurry to publish a post that you half-wrote over several evenings...
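To see the difference between the two kinds of pull, here is a small sandbox sketch in which two clones in the same situation pull the same upstream commit, one by merging and one by rebasing. The clone names (merger, rebaser, other) and the demo identity are invented for the example. The merging pull passes --no-rebase and --no-edit explicitly, since recent git versions refuse to guess how to reconcile diverged branches.

```shell
#!/bin/sh
set -e
# Demo identity so commits work without any global git configuration (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
cd "$sandbox"

# A bare "origin" with one commit on master.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo base > seed/base.txt
git -C seed add base.txt
git -C seed commit -q -m "base"
git clone -q --bare seed origin.git

git clone -q origin.git merger
git clone -q origin.git rebaser
git clone -q origin.git other

# Someone else publishes a commit.
echo other > other/other.txt
git -C other add other.txt
git -C other commit -q -m "upstream change"
git -C other push -q origin master

# Both of our clones have one unpublished local commit.
for clone in merger rebaser; do
    echo local > "$clone/$clone.txt"
    git -C "$clone" add "$clone.txt"
    git -C "$clone" commit -q -m "local change in $clone"
done

# Fetch followed by a merge, versus fetch followed by a rebase.
git -C merger pull -q --no-rebase --no-edit origin master
git -C rebaser pull -q --rebase origin master

echo "merge commits after merging pull:  $(git -C merger rev-list --merges --count master)"
echo "merge commits after pull --rebase: $(git -C rebaser rev-list --merges --count master)"
```

Both clones end up with the same files; only the shape of the history differs.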

Aristotle said...

I don’t regard it as an implementation detail. It is very much by design, and the fact that my data is always there for recovery as long as I look in the right places has saved my bacon on more than a couple occasions.

Also, I think git is most easily understood from the internals on up. If you simply regard it as a black box and try to transfer your Subversion or CVS know-how, you will end up very, very confused. The Peepcode git book is very helpful here – it costs US$9, but that’s not much and it’s money well spent. Maybe the community book being written over at git-scm.com (same principal author) will eventually match it.

Having this level of understanding of git, which is feasible because the fundamental model is so simple in its design, has also made me feel much safer and more confident than I ever could with Subversion. I can just go ahead and try things without fear.