April 19, 2017 Posted in git

What's in a Branch

Regardless of how you choose to track your history, one of the things you often want to know is which commits are in what branch. Sounds easy enough, right? And yet, you wouldn't believe just how cumbersome certain version control systems make answering such a simple question. What I think you'll find even harder to believe, however, is the fact that with Git it's as easy as pie.

Graphs and References

Before I tell you all about querying the state of your branches, let’s back up for a second and remind ourselves of how Git views history.

Consider this graph:

Directed acyclic graph

What you’re seeing here is a directed acyclic graph: a fancy name used to describe a group of nodes (graph) where the edges point in a certain direction (directed) and never loop back on themselves (acyclic).

Why is it relevant? Because this is how Git represents history.

In Git’s parlance, each node represents a commit and each edge connects a commit to its parent. Most commits have exactly one parent; a merge commit has two or more, and the very first commit has none. In other words, the directed acyclic graph of a Git history can only go in one direction: backwards.

So far, so good. Now let’s add one more piece of information to the mix:


See that master label? That’s a branch. Branches are simply references that point to specific commits. In fact, a branch is a 41-byte text file that contains the ID of the commit it references. Don’t believe me? Try running this command in the root of your repository:¹

cat .git/refs/heads/master

You’ll get back something like this:


That 40-character string (41 bytes once you count the trailing newline) is the SHA-1 hash of the commit object that’s currently referenced by the master branch. Go ahead, verify it with:²

git show 514e6c9

Hopefully, you’ll believe me now. So, let’s boil it all down to a single sentence to make it easier to remember:

In Git, a branch is a reference to the latest commit in a sequence; the history of a branch is reconstructed starting from that latest commit going backwards, following the chain of parents.
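If you'd like to check both halves of that sentence without touching a real project, here is a minimal sketch that builds a throwaway repository. The temporary directory, the demo identity and the A/B/C file names are placeholders of mine, and it assumes Git's default file-based ref storage, where a freshly updated branch lives as a loose file under .git/refs/heads:

```shell
#!/bin/sh
# Throwaway repository; directory, identity and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Three commits in a row: A <- B <- C
for name in A B C; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
done

# The branch is a small file holding one commit ID...
id_in_file=$(cat .git/refs/heads/master)
# ...and Git resolves the branch name to that very same ID.
id_resolved=$(git rev-parse master)
echo "$id_in_file"
echo "$id_resolved"

# Starting from that commit and following the parents gives the history.
history=$(git log --format=%s master | tr '\n' ' ')
echo "$history"   # C B A
```

The two IDs printed are identical, and the log lists the commits from newest to oldest, which is exactly the "latest commit going backwards" walk described above.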


Now that we have a good mental model for thinking about history, we can talk about the concept of reachability.

Imagine we have a history that looks like this:


Here, we have two branches named master and feature that diverge at commit B. We can immediately observe two things at first glance:

  • The feature branch contains commits E and D, which are not in master.
  • The master branch has commit C, which is not in feature.

Sure, it’s easy enough to tell when your history is this small (and you have a pretty graph to look at), but it might not be as obvious once you deal with more than two branches and a large number of commits.³

But don’t despair: everything becomes much clearer once you start thinking in terms of commits and what is reachable from which branch. Let me explain:

A commit A is said to be reachable from another commit B if there exists a contiguous path of commits that leads from B to A.

In other words, A is reachable from B if you can start from B and arrive at A just by following the chain of parents.
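Git can answer the reachability question directly with git merge-base --is-ancestor, which exits with status 0 when the first commit is reachable from the second. Here is a rough sketch on a throwaway repository shaped like the graph above; the commit and reachable helpers, the tags and the file names are my own scaffolding, not part of Git:

```shell
#!/bin/sh
# Throwaway repository; directory, identity and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: one commit per letter, tagged with that letter.
commit() {
    echo "$1" > "$1.txt" && git add "$1.txt" &&
        git commit -q -m "$1" && git tag "$1"
}

# master: A <- B <- C        feature: A <- B <- D <- E
commit A; commit B
git branch feature
commit C
git checkout -q feature
commit D; commit E

# Hypothetical helper: is commit $1 reachable from commit $2?
reachable() { git merge-base --is-ancestor "$1" "$2"; }

if reachable B feature; then b_from_feature=yes; else b_from_feature=no; fi
if reachable C feature; then c_from_feature=yes; else c_from_feature=no; fi
echo "B reachable from feature: $b_from_feature"   # yes
echo "C reachable from feature: $c_from_feature"   # no
```

B sits on the chain of parents behind feature, so it is reachable; C only exists on master's side of the fork, so it is not.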

Easy, right? Now, combine this concept with the notion that branches are just references to commits and you have all the pieces you need to solve the puzzle!

Reachability is a powerful concept because it allows us to take our initial question:

Which commits are in a branch?

and turn it into:

Which commits are reachable from a branch and not from another?

Git has a way to express this: it’s called the double dot notation. Consider this command:

git log --oneline master..feature
9b571c2 E
fa77581 D

This literally means: show me the commits that are not reachable from the first reference in the range (master) but that are reachable from the second reference (feature). The result is commits E and D:

Reachable from feature

Observe what happens when we switch places between the two branch references:

git log --oneline feature..master
2eec656 C

That’s right, we get commit C, that is the commit not reachable from feature but reachable from master:

Reachable from master

This expression is so useful that I even made an alias for it:

git config --global alias.new "log master..HEAD"

Now, every time I want to know which commits are in my current branch (referenced by HEAD) that I haven’t yet merged into master, I simply say:

git new
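To convince yourself of what the double dot notation selects, you can rebuild the sample history in a scratch repository and compare it with the spelled-out form feature ^master, which Git documents as equivalent. This is only a sketch: the commit helper, the tags and the file names are assumptions of mine.

```shell
#!/bin/sh
# Throwaway repository; directory, identity and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: one commit per letter, tagged with that letter.
commit() {
    echo "$1" > "$1.txt" && git add "$1.txt" &&
        git commit -q -m "$1" && git tag "$1"
}

# master: A <- B <- C        feature: A <- B <- D <- E
commit A; commit B
git branch feature
commit C
git checkout -q feature
commit D; commit E

# Reachable from feature but not from master.
only_in_feature=$(git log --format=%s master..feature | tr '\n' ' ')
# The same question, spelled out: include feature, exclude master.
spelled_out=$(git log --format=%s feature "^master" | tr '\n' ' ')
# Swap the references to ask the opposite question.
only_in_master=$(git log --format=%s feature..master | tr '\n' ' ')

echo "$only_in_feature"   # E D
echo "$spelled_out"       # E D
echo "$only_in_master"    # C
```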

What Was Merged?

If your workflow involves a lot of merge commits (like GitFlow), one of the questions that will pop up a lot is:

Which commits were brought into a branch by a specific merge?

To answer that, let’s consider our two sample branches; this time, we’re going to merge feature into master:

Merged feature into master

Let’s play a bit of Jeopardy⁴: if the answer is commits E and D, what’s the Git command? Remember, we don’t have a pretty graph to look at; all we have is the console and the concept of reachability that we talked about before. Give it some thought. Can you guess it?

Let me give you a hint. Another way of phrasing the question we’re looking for is:

Which commits were not reachable from master before the merge commit but are reachable now?

Considering that the first parent of a merge commit is always the destination branch (that is, the branch that was merged into), one way to express that would be:

git log --oneline M^..M
cad1c97 M
9b571c2 E
fa77581 D

This is saying: show me the commits that are not reachable from the first parent of the merge commit M (that is C) but that are reachable from M.

What was merged into master

As you would expect, we get back M itself followed by E and D, that is the commits merged into master 🎉

This expression is so common that, as of Git 2.11, it even has a shorter (albeit less readable) version:

git log M^-1

Just when you thought Git commands couldn’t get any more cryptic, right? Anyway, this is the equivalent of M^..M where ^-1 refers to the first parent of M.
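Here is a small sketch you can run to see both spellings side by side. It recreates the merged history in a scratch repository; the commit helper, the tags (including M for the merge commit) and the file names are placeholders I made up, and the ^-1 form needs Git 2.11 or later.

```shell
#!/bin/sh
# Throwaway repository; directory, identity and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: one commit per letter, tagged with that letter.
commit() {
    echo "$1" > "$1.txt" && git add "$1.txt" &&
        git commit -q -m "$1" && git tag "$1"
}

# master: A <- B <- C        feature: A <- B <- D <- E
commit A; commit B
git branch feature
commit C
git checkout -q feature
commit D; commit E

# Merge feature into master and tag the merge commit as M.
git checkout -q master
git merge -q -m M feature
git tag M

first_parent=$(git log -1 --format=%s "M^1")
long_form=$(git log --format=%s "M^..M" | tr '\n' ' ')
short_form=$(git log --format=%s "M^-1" | tr '\n' ' ')

echo "$first_parent"   # C
echo "$long_form"      # M E D
echo "$short_form"     # M E D
```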

Of course, we don’t have to limit ourselves to just the list of commits. If we wanted, we could also get a patch containing the collective changes that got merged into master by saying:

git diff M^-1
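If the shorthand feels too opaque, the same patch can be requested by naming the two endpoints explicitly: the first parent of the merge and the merge itself. The sketch below compares the two forms on a scratch repository, using --name-only to keep the output short; as before, the helper, tags and file names are assumptions of mine.

```shell
#!/bin/sh
# Throwaway repository; directory, identity and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: one commit per letter, tagged with that letter.
commit() {
    echo "$1" > "$1.txt" && git add "$1.txt" &&
        git commit -q -m "$1" && git tag "$1"
}

# master: A <- B <- C        feature: A <- B <- D <- E, then merged as M
commit A; commit B
git branch feature
commit C
git checkout -q feature
commit D; commit E
git checkout -q master
git merge -q -m M feature
git tag M

# Explicit endpoints: from the first parent of M to M itself.
explicit=$(git diff --name-only "M^1" M | tr '\n' ' ')
# The shorthand from the text (Git 2.11 or later).
shorthand=$(git diff --name-only "M^-1" | tr '\n' ' ')

echo "$explicit"    # D.txt E.txt
echo "$shorthand"   # D.txt E.txt
```

Only the files introduced by D and E show up, since those are the changes the merge brought into master.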

Git’s syntax might be ridiculously opaque at times, but finding out what’s in a branch is easier than ever thanks to Git’s intuitive branching model.

Was this helpful? If you like, you can find even more ways to slice and dice the history of your Git repository in my Pluralsight course Advanced Git Tips and Tricks.

  1. If you’re on Windows and don’t use Bash, you can replace that with: notepad .git\refs\heads\master

  2. You don’t have to use the entire SHA-1 hash here; just enough for Git to tell which object it belongs to. For most repositories, the first 7 characters are enough to uniquely identify an object. Git calls this the abbreviated hash. 

  3. Actually, it doesn’t take much before this happens: imagine a typical GitFlow scenario where you have multiple feature and bugfix branches running in parallel and you need to tell which commits are available in develop and which aren’t. 😰 

  4. I’ll tell you the answer and you’ll have to guess the question. 

August 25, 2016 Posted in git

Git Undo

Tell me if you recognize this scenario: you’re in the middle of rewriting your local commits when you suddenly realize that you have gone too far and, after one too many rebases, you are left with a history that looks nothing like the way you wanted. No? Well, I certainly do. And when that happens, I wish I could just CTRL+Z my way back to where I started. Of course, it’s never that simple — not even in a GUI.

It was in one of those moments of despair that I finally decided to set out to create my own git undo command. Here’s what I came up with and how I got there.

The Reflog

My story of undoing things in Git starts with the reflog. What’s the reflog, you might ask. Well, I’m here to tell you: every time a branch reference moves,¹ Git records its previous value in a sort of local journal. This journal is called the reference log, or reflog for short.

In a repository there is a reflog for each branch as well as a separate one for the HEAD reference.

Getting the list of entries in a branch’s reflog is as easy as saying git reflog followed by the name of the branch:

git reflog master

shows the reflog entries for the master branch:

Output of git-reflog for the master branch

If you instead wanted to look at HEAD’s own reflog, you would simply omit the argument and say:

git reflog

which yields the same output, only for the HEAD reference:

Output of git-reflog for the HEAD reference

What isn’t immediately obvious is that the entries in the reflog are stored in reverse chronological order with the most recent one on top.

What is obvious, on the other hand, is that each entry has its own index. This turns out to be extremely useful, because we can use that index to directly reference the commit associated with a certain reflog entry. But more on that later. For now, suffice it to say that in order to reference a reflog entry, we have to use the syntax:

reference@{index}

where the two parts separated by the @ sign are:

  • reference which can either be the name of a branch or HEAD
  • index which is the entry’s position in the reflog²

For example, let’s say that we wanted to look at the commit HEAD was referencing two positions ago. To do that, we could use the git show command followed by HEAD@{2}:

git show HEAD@{2}

If we, instead, wanted to look at the commit master was referencing just before the latest one we would say:

git show master@{1}
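To watch these references resolve, here is a minimal sketch that makes three commits in a scratch repository and then asks the reflog where the branch used to point. The directory, identity and A/B/C names are placeholders of mine:

```shell
#!/bin/sh
# Throwaway repository; directory, identity and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Every commit moves master, so every commit adds a reflog entry.
for name in A B C; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
done

# Most recent entry first: index 0 is where master points right now.
git reflog master

latest=$(git log -1 --format=%s "master@{0}")
one_back=$(git log -1 --format=%s "master@{1}")
two_back=$(git log -1 --format=%s "master@{2}")
echo "$latest $one_back $two_back"   # C B A

# The same syntax works for HEAD's own reflog.
git log -1 --format=%s "HEAD@{2}"
```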

The Undo Alias

Here’s my point: the reflog keeps track of the history of commits referenced by a branch, just like a web browser keeps track of the history of URLs we visit.

This means that the commit referenced by @{1} is always the commit that was referenced just before the current one.

If we were to combine the reflog with the git reset command like this:

git reset --hard master@{1}

we would suddenly have a way to move HEAD, the index and the working directory to the previous commit referenced by a branch. This is essentially the same as pressing the back button in our web browser!
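Here is the back button in action, as a rough sketch on a scratch repository (directory, identity and file names are placeholders of mine). It throws away the latest commit and then uses the reflog to get it back:

```shell
#!/bin/sh
# Throwaway repository; directory, identity and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

for name in A B; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "$name"
done

# Drop commit B: master moves back to A and B.txt leaves the working directory.
git reset -q --hard "HEAD^"
after_reset=$(git log -1 --format=%s)
echo "$after_reset"   # A

# The reset itself was recorded, so master@{1} is where master was before it.
git reset -q --hard "master@{1}"
after_back=$(git log -1 --format=%s)
echo "$after_back"    # B

# The working directory follows along.
if [ -f B.txt ]; then file_restored=yes; else file_restored=no; fi
echo "$file_restored" # yes
```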

At this point, we have everything we need to implement our own git undo command, which we do in the form of an alias. Here it is:

git config --global alias.undo '!f() { \
    git reset --hard $(git rev-parse --abbrev-ref HEAD)@{${1-1}}; \
}; f'

I realize it’s quite a mouthful so let’s break it down piece by piece:

  1. !f() { ... }; f
    Here, we’re defining the alias as a shell function named f which is then invoked immediately.

  2. $(git rev-parse --abbrev-ref HEAD)@{...}
    We use the git rev-parse command followed by the --abbrev-ref option to get the name of the current branch, which we then concatenate with @{...} to form the reference to a previous position in the reflog (e.g. master@{1}).

  3. ${1-1}
    We specify the position in the reflog as the first parameter $1 with a default value of 1. This is the whole reason why we defined the alias as a shell function: to be able to provide a default value for the parameter using the standard shell syntax.

The beauty of using an optional parameter like this is that it allows us to undo any number of operations. At the same time, if we don’t specify anything, it’s going to undo just the latest one.
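The default value trick is plain shell parameter expansion, so you can try it outside of Git. In this small sketch, position is a made-up function name that simply echoes what the alias would put between the braces:

```shell
#!/bin/sh
# Hypothetical function: prints its first argument, or 1 when none is given.
position() { echo "${1-1}"; }

with_default=$(position)       # no argument: falls back to 1
with_argument=$(position 3)    # explicit argument wins
with_empty=$(position "")      # set but empty: ${1-1} keeps the empty value

echo "$with_default"    # 1
echo "$with_argument"   # 3
echo "[$with_empty]"    # []
```

Note the last case: ${1-1} only kicks in when the parameter is unset. If you also wanted an empty argument to fall back to 1, the ${1:-1} form would do that.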

Trying It Out

Let’s say that we have a history that looks like this:³

History before the rewrite

We have two branches — master and feature — that have diverged at commit C. For the sake of our example, let’s also assume that we wanted to remove the latest commit in master — that is commit F — and then merge the feature branch:

git reset --hard HEAD^
git merge feature

At this point, we would end up with a history looking like this:

History after the rewrite

As you can see, everything went fine — but we’re still not happy. For some reason, we want to go back to the way history was before. In practice, this means we need to undo our latest two operations: the merge and the reset. Time to whip out that undo alias:

git undo 2

This moves HEAD to the commit referenced by master@{2}, that is, the commit the master branch was pointing to 2 reflog entries ago. Let’s go ahead and check our history again:

History restored with the undo alias

And everything is back the way it was. \o/

But what if we wanted to undo the undo? Easy. Since git undo itself creates an entry in the reflog, it’s enough to say:

git undo

which, without argument, is the equivalent of saying git undo 1.
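If you want to rehearse the whole scenario before trusting the alias with real work, here is a sketch that replays it in a scratch repository. A few things are assumptions of mine: the exact shape of the history (master is A, B, C, F and feature adds D and E on top of C), the commit helper, and the fact that the alias is defined in the repository's local configuration on a single line, so your global settings stay untouched.

```shell
#!/bin/sh
# Throwaway repository; directory, identity and file names are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Same alias as in the text, but local to this repository and on one line.
git config alias.undo '!f() { git reset --hard "$(git rev-parse --abbrev-ref HEAD)@{${1-1}}"; }; f'

# Hypothetical helper: one commit per letter.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

# Assumed history: master A <- B <- C <- F, feature C <- D <- E
commit A; commit B; commit C
git branch feature
commit F
git checkout -q feature
commit D; commit E
git checkout -q master
original=$(git rev-parse master)   # master points at F

# The two operations we will later regret.
git reset -q --hard "HEAD^"
git merge -q --no-edit feature     # fast-forwards here, since master sits at C
merged=$(git rev-parse master)

# Undo both of them in one go.
git undo 2 > /dev/null
after_undo=$(git rev-parse master)

# Undo the undo: back to the merged state.
git undo > /dev/null
after_redo=$(git rev-parse master)

[ "$after_undo" = "$original" ] && echo "undo 2 restored F"
[ "$after_redo" = "$merged" ] && echo "undo brought the merge back"
```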

Did you find this useful? If you're interested in learning other techniques like the one described in this article, I wrote down a few more in my Pluralsight course Advanced Git Tips and Tricks.

  1. That is, it’s modified to point to a different commit than it did before. 

  2. You can also use dates here. Try for example master@{yesterday} or master@{2.days.ago}. Pretty amazing, don’t you think?

  3. I like my history succinct and colorful. For this reason, I never use the plain git log; instead, I define an alias called lg where I use the --pretty option to customize its output. If you want to know more, I wrote about this a while ago when talking about the importance of a good-looking history.

June 20, 2016 Posted in musings

On Being a Good .NET Developer

While reading Rob Ashton’s thought-provoking piece titled “Why you can’t be a good .NET developer” over my morning cappuccino the other day, for the first few paragraphs I found myself nodding in agreement.

Having been a consultant for the past fifteen years, I’ve certainly come across more than a few teams where the “lowest common denominator” was without a doubt the driving force behind every decision. This isn’t in any way unique to .NET, though. I have seen the exact same thing happen on other platforms as well: Java, JavaScript and, to some degree, even C and C++.¹

What they all have in common is a humongous active user base.

You see, it’s simply a matter of statistics: the more popular the platform,² the higher the number of beginners. The two variables are directly proportional to each other; some might argue the relationship is even exponential. If you’re looking for a concrete example, consider the number of novice JavaScript developers brought in by the popularity of jQuery.

The problem is not that .NET has an unusually high number of “lowest common denominators”. That number is simply higher compared to platforms with a narrower, mostly self-selected, audience.

The problem — and this is where I disagree with the underlying message in that article — is failing a platform based on the number of inexperienced programmers who work with it.

I also don’t think that fleeing is the right way to handle the situation. I don’t know about you, but I like to apply the Boy Scout Rule in more than just code; when I join a team, I want to leave it in better shape than I found it. This means that if I join a team that is dominated by inexperienced programmers, I don’t see it as an excuse to hold back on quality. Quite the opposite, I feel compelled to introduce the team to new ways of doing things, new perspectives. Note that I don’t force anything on anyone; instead, I try to lead by example.

For instance, if I see that the team is stuck using TFS, I will still use Git on my machine and add a bridge like git-tfs to collaborate. Sooner or later, without fail, someone is going to wonder why I do that. Driven by curiosity, they’ll ask me to explain how Git is better than TFS and I’ll be more than happy to tell them all about it. After a while, that same person (or someone else on the team) is going to start using Git on their own machine and, soon enough, the entire team will be sitting in a console firing Git commands like there’s no tomorrow, wondering why they hadn’t learned it earlier.

I never compromise on excellence. It’s just that with some teams, the way to get there is longer than with others.

To me the solution isn’t to run away from beginners. It’s to inspire and mentor them so that they won’t stay beginners forever and instead go on to do the same for other people. That applies as much to .NET as it does to any other platform or language.

If you aren’t the type of person who has the time or the interest to raise the lowest common denominator, that’s perfectly fine. I do believe you’re better off moving somewhere else where your ambitions aren’t being held back by inexperienced team members. As for myself, I’ll stay behind — teaching.

  1. C and C++ have a steep learning curve which forces programmers to move past the beginner stage far more quickly than with other languages in order to get anything done. So, while C and C++ are immensely widespread, the number of novices who work with them tends to stay relatively low. 

  2. Just to be clear, by “platform” I mean a programming language together with its ecosystem of libraries, frameworks and tools.