January 19, 2012 Posted in programming

Better Diffs with PowerShell

I love working with the command line. In fact, I love it so much that I even use it as my primary way of interacting with the source control repositories of all the projects I’m involved in. It’s a matter of personal taste, admittedly, but there’s also a practical reason for that.

Depending on what I’m working on, I regularly have to switch among several different source control systems. Just to give you an example, just in the last six months I’ve been using Mercurial, Git, Subversion and TFS on a weekly basis. Instead of having to learn and get used to different UIs (whether it be standalone clients or IDE plugins), I find that I can be more productive by sticking to the uniform experience of the command line based tools.

To enforce my point, let me show you how to check in some code in the source control systems I mentioned above:

  • Mercurial: hg commit -m "Awesome feature"
  • Git: git commit -m "Awesome feature"
  • Subversion: svn commit -m "Awesome feature"
  • TFS: tf checkin /comment:"Awesome feature"

As you can see, it looks pretty much the same across the board.

Of course, you need to be aware of the fundamental differences in how Distributed Version Control Systems (DVCS) such as Mercurial and Git behave compared to traditional centralized Version Control Systems (VCS) like Subversion and TFS. In addition to that, each system tries to characterize itself by having its own set of features or by solving a common problem (like branching) in a unique way. However, there aspects must be taken into consideration regardless of your client of choice. What I’m saying is that the command line interface at least offers a single point of entry into those systems, which in the end makes me more productive.

Unified DIFFs

One of the most basic features of any source control system is the ability to compare two versions of the same file to see what’s changed. The output of such comparison, or DIFF, is commonly represented in text using the Unified DIFF format, which looks something like this:

@@ -6,12 +6,10 @@
-#import <SenTestingKit/SenTestingKit.h>
-#import <UIKit/UIKit.h>
-
@interface QuoteTest : SenTestCase {
}

- (void)testQuoteForInsert_ReturnsNotNull;
+- (void)testQuoteForInsert_ReturnsPersistedQuote;

@end

In the Unified DIFF format changes are displayed at the line level through a set of well-known prefixes. The rule is simple:

A line can either be added, in which case it will be preceded by a + sign, or removed, in which case it will be preceded by a - sign. Unchanged lines are preceded by a whitespace.

In addition to that, each modified section, referred to as hunk, is preceded by a header that indicates the position and size of the section in the original and modified file respectively. For example this hunk header:

@@ -6,12 +6,10 @@

means that in the original file the modified lines start at line 6 and continue for 12 lines. In the new file, instead, that same change starts at line 6 and includes a total of 10 lines.

True Colors

At this point, you may wonder what all of this has to do with PowerShell, and rightly so. Remember when I said that I prefer to work with source control from the command line? Well, it turns out that scrolling through gobs of text in a console window isn’t always the best way to figure out what has changed between two change sets.

Fortunately, since PowerShell allows to print text in the console window using different colors, it only took a switch statement and a couple of regular expressions, to turn that wall of text into something more readable. That’s how the Out-Diff cmdlet was born:

function Out-Diff {
<#
.Synopsis
    Redirects a Universal DIFF encoded text from the pipeline to the host using colors to highlight the differences.
.Description
    Helper function to highlight the differences in a Universal DIFF text using color coding.
.Parameter InputObject
    The text to display as Universal DIFF.
#>
[CmdletBinding()]
param(
    [Parameter(Mandatory=$true, ValueFromPipeline=$true)]
    [PSObject]$InputObject
)
    Process {
        $contentLine = $InputObject | Out-String
        if ($contentLine -match "^Index:") {
            Write-Host $contentLine -ForegroundColor Cyan -NoNewline
        } elseif ($contentLine -match "^(\+|\-|\=){3}") {
            Write-Host $contentLine -ForegroundColor Gray -NoNewline
        } elseif ($contentLine -match "^\@{2}") {
            Write-Host $contentLine -ForegroundColor Gray -NoNewline
        } elseif ($contentLine -match "^\+") {
            Write-Host $contentLine -ForegroundColor Green -NoNewline
        } elseif ($contentLine -match "^\-") {
            Write-Host $contentLine -ForegroundColor Red -NoNewline
        } else {
            Write-Host $contentLine -NoNewline
        }
    }
}

Let’s break this function down into logical steps:

  1. Take whatever input comes from the PowerShell pipeline and convert it to a string.
  2. Match that string against a set of regular expressions to determine whether it’s part of the Unified DIFF format.
  3. Print the string to the console with the appropriate color: green for added, red for removed and gray for the headers.

Pretty simple. And using it is even simpler: just load the script into your PowerShell session using dot sourcing or by adding it to your profile and redirect the output of a ‘diff’ command to the Out-Diff cmdlet through piping to start enjoying colorized DIFFs. For example the following commands:

. .\Out-Diff.ps1
git diff | Out-Diff

will generate this output in PowerShell:

The Out-Diff cmdlet in action The Out-Diff PowerShell cmdlet in action

One thing I’d like to point out is that even if the output of git diff consists of many lines of text, PowerShell will redirect them to the Out-Diff function one line at a time. This is called a streaming pipeline and it allows PowerShell to be responsive and consume less memory even when processing large amounts of data. Neat.

Wrapping up

PowerShell is an extremely versatile console. In this case, it allowed me to enhance a traditional command line tool (diff) through a simple script. Other projects, like Posh-Git and Posh-Hg, take it even further and leverage PowerShell’s rich programming model to provide a better experience on top of existing console based source control tools. If you enjoy working with the command line, I seriously encourage you to check them out.

Enrico Campidoglio

Hi, I'm Enrico Campidoglio. I'm a freelance programmer, trainer and mentor focusing on helping teams develop software better. I write this blog because I love sharing stories about the things I know. You can read more about me here, if you like.