January 19, 2012 Posted in programming

Better Diffs with PowerShell

I love working with the command line. In fact, I love it so much that I even use it as my primary way of interacting with the source control repositories of all the projects I’m involved in. It’s a matter of personal taste, admittedly, but there’s also a practical reason for that.

Depending on what I’m working on, I regularly have to switch among several different source control systems. To give you an example, in the last six months alone I’ve been using Mercurial, Git, Subversion and TFS on a weekly basis. Instead of having to learn and get used to different UIs (whether it be standalone clients or IDE plugins), I find that I can be more productive by sticking to the uniform experience of the command line based tools.

To reinforce my point, let me show you how to check in some code in each of the source control systems I mentioned above:

  • Mercurial: hg commit -m "Awesome feature"
  • Git: git commit -m "Awesome feature"
  • Subversion: svn commit -m "Awesome feature"
  • TFS: tf checkin /comment:"Awesome feature"

As you can see, it looks pretty much the same across the board.

Of course, you need to be aware of the fundamental differences in how Distributed Version Control Systems (DVCS) such as Mercurial and Git behave compared to traditional centralized Version Control Systems (VCS) like Subversion and TFS. In addition to that, each system tries to distinguish itself by having its own set of features or by solving a common problem (like branching) in a unique way. However, these aspects must be taken into consideration regardless of your client of choice. What I’m saying is that the command line interface at least offers a single point of entry into those systems, which in the end makes me more productive.

Unified DIFFs

One of the most basic features of any source control system is the ability to compare two versions of the same file to see what’s changed. The output of such a comparison, or DIFF, is commonly represented in text using the Unified DIFF format, which looks something like this:

@@ -6,12 +6,10 @@
-#import <SenTestingKit/SenTestingKit.h>
-#import <UIKit/UIKit.h>
-
 @interface QuoteTest : SenTestCase {
 }
 
 - (void)testQuoteForInsert_ReturnsNotNull;
+- (void)testQuoteForInsert_ReturnsPersistedQuote;
 
 @end

In the Unified DIFF format, changes are displayed at the line level through a set of well-known prefixes. The rule is simple:

A line can either be added, in which case it will be preceded by a + sign, or removed, in which case it will be preceded by a - sign. Unchanged lines are preceded by a whitespace character.

In addition to that, each modified section, referred to as a hunk, is preceded by a header that indicates the position and size of the section in the original and modified file respectively. For example this hunk header:

@@ -6,12 +6,10 @@

means that in the original file the hunk starts at line 6 and spans 12 lines. In the new file, instead, the same hunk starts at line 6 and spans a total of 10 lines.

True Colors

At this point, you may wonder what all of this has to do with PowerShell, and rightly so. Remember when I said that I prefer to work with source control from the command line? Well, it turns out that scrolling through gobs of text in a console window isn’t always the best way to figure out what has changed between two change sets.

Fortunately, since PowerShell allows you to print text in the console window using different colors, it only took a handful of conditional checks and a couple of regular expressions to turn that wall of text into something more readable. That’s how the Out-Diff cmdlet was born:

function Out-Diff {
<#
.Synopsis
    Redirects Unified DIFF encoded text from the pipeline to the host using colors to highlight the differences.
.Description
    Helper function to highlight the differences in a Unified DIFF text using color coding.
.Parameter InputObject
    The text to display as Unified DIFF.
#>
[CmdletBinding()]
param(
    [Parameter(Mandatory=$true, ValueFromPipeline=$true)]
    [PSObject]$InputObject
)
    Process {
        $contentLine = $InputObject | Out-String
        if ($contentLine -match "^Index:") {
            Write-Host $contentLine -ForegroundColor Cyan -NoNewline
        } elseif ($contentLine -match "^(\+|\-|\=){3}") {
            Write-Host $contentLine -ForegroundColor Gray -NoNewline
        } elseif ($contentLine -match "^\@{2}") {
            Write-Host $contentLine -ForegroundColor Gray -NoNewline
        } elseif ($contentLine -match "^\+") {
            Write-Host $contentLine -ForegroundColor Green -NoNewline
        } elseif ($contentLine -match "^\-") {
            Write-Host $contentLine -ForegroundColor Red -NoNewline
        } else {
            Write-Host $contentLine -NoNewline
        }
    }
}

Let’s break this function down into logical steps:

  1. Take whatever input comes from the PowerShell pipeline and convert it to a string.
  2. Match that string against a set of regular expressions to determine whether it’s part of the Unified DIFF format.
  3. Print the string to the console with the appropriate color: green for added, red for removed and gray for the headers.

Pretty simple. And using it is even simpler: load the script into your PowerShell session, either by dot sourcing it or by adding it to your profile, then pipe the output of a ‘diff’ command to the Out-Diff cmdlet to start enjoying colorized DIFFs. For example, the following commands:

. .\Out-Diff.ps1
git diff | Out-Diff

will generate this output in PowerShell:

The Out-Diff PowerShell cmdlet in action

One thing I’d like to point out is that even if the output of git diff consists of many lines of text, PowerShell will redirect them to the Out-Diff function one line at a time. This is called a streaming pipeline and it allows PowerShell to be responsive and consume less memory even when processing large amounts of data. Neat.
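
To see that streaming behavior in isolation, here is a minimal sketch (Trace-Pipeline is just an illustrative name, not part of Out-Diff) of a function whose process block runs once for every object that flows through the pipeline:

function Trace-Pipeline {
[CmdletBinding()]
param(
    [Parameter(Mandatory=$true, ValueFromPipeline=$true)]
    [PSObject]$InputObject
)
    begin   { Write-Host "Pipeline started" }
    process { Write-Host "Processing: $InputObject" } # Runs once per incoming object
    end     { Write-Host "Pipeline finished" }
}

1..3 | Trace-Pipeline

# The output will be:
# Pipeline started
# Processing: 1
# Processing: 2
# Processing: 3
# Pipeline finished

Since Out-Diff defines only a process block, it works the same way: each line of the diff is colorized and written to the host as soon as it arrives, instead of waiting for the entire diff to be buffered in memory.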

Wrapping up

PowerShell is an extremely versatile console. In this case, it allowed me to enhance a traditional command line tool (diff) through a simple script. Other projects, like Posh-Git and Posh-Hg, take it even further and leverage PowerShell’s rich programming model to provide a better experience on top of existing console based source control tools. If you enjoy working with the command line, I seriously encourage you to check them out.


December 15, 2011 Posted in programming  |  autofixture

Keep your unit tests DRY with AutoFixture Customizations

When I first incorporated AutoFixture as part of my daily unit testing workflow, I noticed how a consistent usage pattern had started to emerge. This pattern can be roughly summarized in three steps:

  1. Initialize an instance of the Fixture class.
  2. Configure the way different types of objects involved in the test should be created by using the Build method.
  3. Create the actual objects with the CreateAnonymous or CreateMany methods.

As a result, my unit tests had started to look a lot like this:

[Test]
public void WhenGettingAListOfPublishedPostsThenItShouldOnlyIncludeThose()
{
    // Step 1: Initialize the Fixture
    var fixture = new Fixture();

    // Step 2: Configure the object creation
    var draft = fixture.Build<Post>()
        .With(a => a.IsDraft, true)
        .CreateAnonymous();
    var publishedPost = fixture.Build<Post>()
        .With(a => a.IsDraft, false)
        .CreateAnonymous();
    fixture.Register(() => new[] { draft, publishedPost });

    // Step 3: Create the anonymous objects
    var posts = fixture.CreateMany<Post>();

    // Act and Assert...
}

In this particular configuration, AutoFixture will satisfy all requests for IEnumerable<Post> by returning the same array with exactly two Post objects: one with the IsDraft property set to True and one with the same property set to False.

At that point I felt pretty satisfied with the way things were shaping up: I had managed to replace entire blocks of boring object initialization code with a couple of calls to the AutoFixture API, my unit tests were getting smaller and all was good.

Duplication creeps in

After a while though, the configuration lines created in Step 2 started to repeat themselves across multiple unit tests. This was naturally due to the fact that different unit tests sometimes shared a common set of object states in their test scenario. Things weren’t so DRY anymore and suddenly it wasn’t uncommon to find code like this in the test suite:

[Test]
public void WhenGettingAListOfPublishedPostsThenItShouldOnlyIncludeThose()
{
    var fixture = new Fixture();
    var draft = fixture.Build<Post>()
        .With(a => a.IsDraft, true)
        .CreateAnonymous();
    var publishedPost = fixture.Build<Post>()
        .With(a => a.IsDraft, false)
        .CreateAnonymous();
    fixture.Register(() => new[] { draft, publishedPost });
    var posts = fixture.CreateMany<Post>();

    // Act and Assert...
}

[Test]
public void WhenGettingAListOfDraftsThenItShouldOnlyIncludeThose()
{
    var fixture = new Fixture();
    var draft = fixture.Build<Post>()
        .With(a => a.IsDraft, true)
        .CreateAnonymous();
    var publishedPost = fixture.Build<Post>()
        .With(a => a.IsDraft, false)
        .CreateAnonymous();
    fixture.Register(() => new[] { draft, publishedPost });
    var posts = fixture.CreateMany<Post>();

    // Different Act and Assert...
}

See how these two tests share the same initial state even though they verify completely different behaviors? Such blatant duplication in the test code is a problem, since it inhibits the ability to make changes. Luckily, a solution was just around the corner: that’s when I discovered customizations.

Customizing your way out

“Customization” is a pretty general term. However, put in the context of AutoFixture, it assumes a specific definition:

A customization is a group of settings that, when applied to a given Fixture, control the way AutoFixture will create anonymous instances of the types requested through that Fixture.

What that means is that I could take all the boilerplate configuration code produced during Step 2 and move it out of my unit tests and into a single place: a customization. That allowed me to specify only once how different objects needed to be created for a given scenario, and to reuse that setup across multiple tests.

public class MixedDraftsAndPublishedPostsCustomization : ICustomization
{
    public void Customize(IFixture fixture)
    {
        var draft = fixture.Build<Post>()
            .With(a => a.IsDraft, true)
            .CreateAnonymous();
        var publishedPost = fixture.Build<Post>()
            .With(a => a.IsDraft, false)
            .CreateAnonymous();
        fixture.Register(() => new[] { draft, publishedPost });
    }
}

As you can see, ICustomization is nothing more than a role interface that describes how a Fixture should be set up. In order to apply a customization to a specific Fixture instance, you simply call the Fixture.Customize(ICustomization) method, as shown in the example below. This newly won encapsulation allowed me to rewrite my unit tests in a much more terse way:

[Test]
public void WhenGettingAListOfDraftsThenItShouldOnlyIncludeThose()
{
    // Step 1: Initialize the Fixture
    var fixture = new Fixture();

    // Step 2: Apply the customization for the test scenario
    fixture.Customize(new MixedDraftsAndPublishedPostsCustomization());

    // Step 3: Create the anonymous objects
    var posts = fixture.CreateMany<Post>();

    // Act and Assert...
}

The configuration logic now exists only in one place, namely a class whose name clearly describes the kind of test data it will produce. If applied consistently, this approach will in time build up a library of customizations, each representative of a given situation or scenario. Assuming that they are created at the proper level of granularity, these customizations could even be composed to form more complex scenarios.
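
To give an idea of what that composition could look like, here is a rough sketch based on AutoFixture’s CompositeCustomization base class; the Author type and the SingleAuthorCustomization class are purely hypothetical, made up for the sake of the example:

// Hypothetical second customization, only here to illustrate composition.
public class SingleAuthorCustomization : ICustomization
{
    public void Customize(IFixture fixture)
    {
        fixture.Register(() => new Author { Name = "Jane Doe" });
    }
}

// Groups smaller customizations into a single scenario-level customization.
public class PublishedBlogScenarioCustomization : CompositeCustomization
{
    public PublishedBlogScenarioCustomization()
        : base(
            new MixedDraftsAndPublishedPostsCustomization(),
            new SingleAuthorCustomization())
    {
    }
}

A test would then only need a single call to fixture.Customize(new PublishedBlogScenarioCustomization()) to set up the whole scenario.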

Conclusion

Customizations in AutoFixture are a pretty powerful concept in and of themselves, but they become even more effective when mapped directly to test scenarios. In fact, they represent a natural place to specify which objects are involved in a given scenario and the state they are supposed to be in. You can use them to remove duplication in your test code and, in time, build up a library of self-documenting modules, which describe the different contexts in which the system’s behavior is being verified.


September 06, 2011 Posted in programming  |  autofixture

Behavior changes in AutoFixture 2.2 – Anonymous numbers

Now that AutoFixture 2.2 is approaching on the horizon, it’s a good time to start talking about some of the changes that were made to the underlying behavior of some existing APIs. I’ll start off this series of posts by focusing on the new generation strategy for anonymous numbers.

The good old fashioned way

Before I jump into the details of what exactly has changed and how, allow me to set the stage a little:

A key part of AutoFixture’s mission statement is to make the process of authoring unit tests faster by providing an easy way of creating test values (or “specimens”) for the variables involved in the test. The goal of providing values that are as neutral as possible to the test scenario at hand is achieved by employing “constrained non-deterministic” generation algorithms.

Put in simple terms, this essentially means that AutoFixture will come up with test values at run time that can be considered "random" within some predefined bounds. These bounds are imposed at the lowest level by the variable’s own data type: a string is a string, a number is a number and so on. More constraints, however, can be added at a higher level, based on any semantics the variable may have in the specific test scenario. For example, a string can’t be longer than 20 characters, or a number must be between 1 and 100.
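
To make the idea of such a higher-level constraint a bit more concrete, here is a rough sketch of how a bound like "a number must be between 1 and 100" could be expressed through AutoFixture’s ISpecimenBuilder extension point (ISpecimenBuilder, ISpecimenContext and NoSpecimen live in the Ploeh.AutoFixture.Kernel namespace); the BoundedNumberBuilder name and the specific range are just illustrative, not something AutoFixture ships with:

// Illustrative builder that keeps anonymous Int32 values between 1 and 100.
public class BoundedNumberBuilder : ISpecimenBuilder
{
    private readonly Random random = new Random();

    public object Create(object request, ISpecimenContext context)
    {
        // Only handle direct requests for the Int32 type.
        if (typeof(int).Equals(request))
        {
            return this.random.Next(1, 101);
        }

        // Let the rest of the builders handle everything else.
        return new NoSpecimen(request);
    }
}

// Usage:
// var fixture = new Fixture();
// fixture.Customizations.Add(new BoundedNumberBuilder());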

AutoFixture comes with a set of built-in generation algorithms that can produce test values for all the primitive types included in the .NET Framework. The algorithm for numeric types has historically been based on individually incremented sequences, one for each numeric data type. Let’s look at an example that illustrates this:

var fixture = new Fixture();
Console.WriteLine("Byte specimen is {0}, {1}",
    fixture.CreateAnonymous(),
    fixture.CreateAnonymous());
Console.WriteLine("Int32 specimen is {0}, {1}",
    fixture.CreateAnonymous(),
    fixture.CreateAnonymous());
Console.WriteLine("Single specimen is {0}, {1}",
    fixture.CreateAnonymous(),
    fixture.CreateAnonymous());

// The output will be:
// Byte specimen is 1, 2
// Int32 specimen is 1, 2
// Single specimen is 1, 2

The key point here is that AutoFixture will only guarantee unique numeric specimens within the scope of a specific data type. Now, you may wonder how this would be a problem. Well, it certainly isn’t in itself, but if you asked AutoFixture to give you an anonymous instance of a class with multiple properties of different numeric types, you would get something like this:

public class NumericBag
{
    public byte ByteValue { get; set; }
    public int Int32Value { get; set; }
    public float SingleValue { get; set; }
}

var fixture = new Fixture();
var specimen = fixture.CreateAnonymous<NumericBag>();
Console.WriteLine("ByteValue property is {0}", specimen.ByteValue);
Console.WriteLine("Int32Value property is {0}", specimen.Int32Value);
Console.WriteLine("SingleValue property is {0}", specimen.SingleValue);

// The output will be:
// ByteValue property is 1
// Int32Value property is 1
// SingleValue property is 1

We can agree that the end result doesn’t exactly live up to the expectation of anonymous values being “random”. Starting from version 2.2, however, this behavior is due to change.

The fresh new way

AutoFixture has taken a different approach to numeric specimen generation and will now by default return unique values across all numeric types. Running our first example in AutoFixture 2.2 will therefore yield a very different result:

var fixture = new Fixture();
Console.WriteLine("Byte specimen is {0}, {1}",
    fixture.CreateAnonymous(),
    fixture.CreateAnonymous());
Console.WriteLine("Int32 specimen is {0}, {1}",
    fixture.CreateAnonymous(),
    fixture.CreateAnonymous());
Console.WriteLine("Single specimen is {0}, {1}",
    fixture.CreateAnonymous(),
    fixture.CreateAnonymous());

// The output will be:
// Byte specimen is 1, 2
// Int32 specimen is 3, 4
// Single specimen is 5, 6

In other words, AutoFixture is being a little more “non-deterministic” when it comes to numeric test values. Take for example the following scenario:

public class NumericBag
{
    public byte ByteValue { get; set; }
    public int Int32Value { get; set; }
    public float SingleValue { get; set; }
}

var fixture = new Fixture();
var specimen = fixture.CreateAnonymous<NumericBag>();
Console.WriteLine("ByteValue property is {0}", specimen.ByteValue);
Console.WriteLine("Int32Value property is {0}", specimen.Int32Value);
Console.WriteLine("SingleValue property is {0}", specimen.SingleValue);

// The output will be:
// ByteValue property is 1
// Int32Value property is 2
// SingleValue property is 3

See how all the numeric properties on the generated object have different values? That’s what I’m talking about.

Now, in theory, this shouldn’t be considered a breaking change. I say this because AutoFixture is all about anonymous variables, which, by definition, can’t be expected to have specific values during a test run. So, as long as you’ve played by this rule, the new behavior shouldn’t impact any of your existing tests.

However, if this does turn out to be a problem, or you simply prefer the old way of doing things, you shouldn’t feel left out in the cold. The previous behavior is still in the box, packaged up in a nice customization unambiguously named NumericSequencePerTypeCustomization. The simple act of adding it to a Fixture instance will restore things to the way they were:

var fixture = new Fixture();
fixture.Customize(new NumericSequencePerTypeCustomization());

If you wish to try this out today, I encourage you to go ahead and grab the latest build off of AutoFixture’s project page on TeamCity. Enjoy.