November 10, 2009 Posted in programming | sql

Speed up your queries in SQL Server by being declarative

It’s interesting how a lot of the work I’ve been doing lately has in some way involved a kind of performance tuning. Previously I’ve talked about how to increase the performance of .NET applications by using delegates instead of reflection in code that runs frequently.

This time it is all about performance in database processing.

The scenario

Imagine an application that manages a wait list. Users of this application put themselves in line and wait for their turn to gain access to some kind of shared resource. Here are the basic rules of this system:

The same user can appear in the wait list multiple times, once for every resource she is queuing for.
The users’ position in the wait list at any given time is decided by a score.
This score is calculated based on the number of credits each user has in the system compared to the amount required by the resource they wish to access.

Let’s say that this wait list is modeled in a Microsoft SQL Server database with the following schema:

WaitListSchema

The position of the different users in the wait list is periodically updated by a Stored Procedure that calculates the current score for each and every row in the WaitList table.

So far so good. Now, imagine that this WaitList table contains somewhere around 30 millions rows, and the Stored Procedure that updates all of the scores takes about 9 hours to complete. And now we have problem.

The imperative SQL approach

Before going into all kinds of general database optimization techniques, let’s start off by looking at how that Stored Procedure is implemented.

Here is a slightly simplified version of it:

CREATE PROCEDURE CalculateWaitListScores_Imperative
AS
BEGIN

DECLARE @rowsToCalculate INT

SELECT @rowsToCalcualte = COUNT(*)
FROM WaitList
AND Score IS NULL

WHILE ( @rowsToCalculate > 0 )
BEGIN

  DECLARE @userID INT
  DECLARE @resourceID INT
  DECLARE @score INT

  SELECT TOP 1 @userID = UserID, @resourceID = ResourceID
  FROM WaitList
  AND Score IS NULL

  -- The actual calculation of the score is omitted for clarity.
  -- Let's just say that it involves a SELECT query that joins
  -- the [WaitList] table with the [User] and [Resource] tables
  -- and applies a formula that associates the values
  -- of the [Credit] columns in each of them.
  -- For the sake of this example we just set it to a constant value
  SET @score = 150

  UPDATE WaitList
  SET Score = @score
  WHERE UserID = @userID
  AND ResourceID = @resourceID
  AND Score IS NULL

  SELECT @rowsToCalcualte = COUNT(*)
  FROM WaitList
  AND Score IS NULL

END

END

If you aren’t into the Transact-SQL language syntax, let me spell out the algorithm for you:

Get the number of rows in the WaitList table where the score has never been calculated
If there are any such rows, get the user and the resource IDs for the first row in the WaitList table where the score has never been calculated
Calculate the score for that user and resource
Update the score with the newly calculated value
Go to Step 1

In the worst case, this set of operations will be repeated 30 millions times, that is once for every row in the WaitList table. Think about it for a moment.

While looking at this code, I immediately imagined this dialogue taking place between SQL Server and the developer(s) who wrote the Stored Procedure:

Developer: Listen up, SQL Server. I want you to calculate a new score and update all of those 3o millions rows, but do it one row at a time.

SQL Server: That’s easy enough, but I’m pretty sure I can find a faster way to do this, if you’ll let me.

Developer: No, no. I want you to do exactly what I said. That way it’s easier for me to understand what’s going on and debug if any problem occurs.

SQL Server: Alright, you’re the boss.

Jokes aside, the bottom line here is this:

By implementing database operations in an imperative manner, you effectively tie up the hands of the query execution engine, thus preventing it from performing a number of optimizations at runtime in order to speed things up.

And that basically means trading performance and scalability for more fine-grained control.

The declarative SQL approach

Let’s see if we can make this Stored Procedure run any faster, by changing our approach to the problem altogether.

This time, we’ll tell the database what we want done in a declarative manner, and we’ll let the query execution engine figure out the best way to get the job done.

Here is a rewritten version of the original Stored Procedure:

CREATE PROCEDURE CalculateWaitListScores_Declarative
AS
BEGIN

UPDATE WaitList
SET Score = dbo.CalculateScore(UserID, ResourceID)
WHERE Score IS NULL

END

What we did is basically removing the explicit loop and merging all operations into a single UPDATE statement executed on the WaitList table, which invokes a custom a scalar function named CalculateScore to calculate the score with the value of the current row.

Now, let’s look at some performance comparison:

WaitListPerfChart

That’s a pretty significant leap in speed. How is that possible? A look at the CPU usage on the database server while running the two versions of the Stored Procedure pretty much explains it all:

CPU usage while executing CalculateWaitListScores_Imperative:

CpuUsageWithImperativeSql

CPU usage while executing CalculateWaitListScores_Declarative:

CpuUsageWithDeclarativeSql

As you see, in the first picture the CPU is steadily at 9-10% and is basically using only one out of four available cores. This is because SQL Server is forced to do its work sequentially and has to wait until the score for the current row has been calculated and updated before proceeding to the next.

In the second picture, we are simply telling SQL Server our intent, rather than dictating exactly how it should be done. This allows SQL Server to parallelize the workload than can now be executed on multiple CPU/Cores at once leveraging the full power of the hardware.

Lessons learned

Here are a couple of getaways I learned from this exercise:

SQL is a declarative language at its core, designed to work with sets of rows. That’s what it does best and that’s how you should use it.
Whenever possible, try to avoid applying an imperative programming mindset when implementing database operations, even if the constructs available in SQL-derived languages like T-SQL make it easy to do so
Don’t be afraid to give up some control over what happens at runtime when your database code runs. Let the database find out the best way to do things, and get ready to experience some great performance improvements.

Hope this helps.

/Enrico

September 04, 2009 Posted in programming | .net

Improving performance with generic delegates in .NET

Lately I have been involved in the performance profiling work of a Windows client application, which customers had lamented to be way too slow for their taste.

The application was originally developed a couple of years ago on top of the .NET Framework 2.0. Its user interface is built using Windows Forms and it retrieves its data from a remote remote server through Web Services using ASMX.

Everything worked just fine from a functionality standpoint. However customers complained over long delays as data was being retrieved from the Web Services and slowly populated the widgets on the screen. Something had to be done to speed things up.

Reflection is a bottleneck

A look with a .NET profiler tool such as JetBrains DotTrace revealed that a lot of time was spent sorting large collections of objects by the value of one of their properties. This would typically be done before binding them to various list controls in the UI. The code would typically look like this and was spread out all over the code base:

// Retrieves the entire list of customers from the DAL
List customerList = CustomerDAO.GetAll();

// Sorts the list of 'Customer' objects
// by the value of the 'FullName' property
customerList.Sort(new PropertyComparer("FullName"));

// Binds the list to a ComboBox control for display
cmbCustomers.DataSource = customerList;

Apparently line 6 was the one that takes forever to execute. Now, since the sorting algorithm used in the IList.Sort method can’t be changed from outside the class, the weak link here must be the PropertyComparer. But what is it doing? Well, here it is:

using System;
using System.Collections.Generic;
using System.Reflection;

namespace Thoughtology.Samples.Collections
{
    public class PropertyComparer : IComparer
    {
        private string propertyName;

        public PropertyComparer(string propertyName)
        {
            this.propertyName = propertyName;
        }

        public int Compare(T x, T y)
        {
            Type targetType = x.GetType();

            PropertyInfo targetProperty = targetType.GetProperty(propertyName);

            string xValueText = targetProperty.GetValue(x, null).ToString();
            string yValueText = targetProperty.GetValue(y, null).ToString();

            int xValueNumeric = Int32.Parse(xValueText);
            int yValueNumeric = Int32.Parse(yValueText);

            if (xValueNumeric < yValueNumeric)
            {
                return -1;
            }
            else if (xValueNumeric == yValueNumeric)
            {
                return 0;
            }
            else
            {
                return 1;
            }
        }
    }
}

Likely not the prettiest code you have ever seen. However, it’s pretty easy to see what it’s doing:

Extracts the value of the specified property from the input objects using reflection.
Converts that value to a String.
Parses the converted value to an Integer.
Compares the numeric values to decide which one is bigger.

That seems like a lot of extra work for a simple value comparison to me.

I’m sure the method was built that way for a reason. This IComparer class is designed to be “generic” and work on any type of value on any object. However my guess is that it won’t work with anything but primitive types (numbers, strings and booleans). In fact the default implementation of the Object.ToString() (used in lines 22-23) method returns the fully qualified name of the class, and that usually doesn’t isn’t much of a sorting criteria in most cases.

The real performance bottleneck here is caused by the use of reflection inside of a method that is called hundreds if not thousands of times from all over the application.

Use delegates instead

At this point it is clear that we need to refactor this class to improve its performance and still retain its original functionality, that is to provide a generic way to compare object by the value of one of their properties.

The key is to find a better way to retrieve the value of a property from any type of object without having to use reflection.

Well, since we do know the type of the objects we are comparing through the generic parameter T, we could let the caller specify which value to compare the objects with by. This can be done by having the caller pass a reference to a method, which would return that value when invoked inside of the Compare method. Let’s try it and see how it works.

Implementing the solution in .NET 2.0

Since the application was on .NET 2.0, we need to define our own delegate type that will allow callers to pass the reference to a method returning the comparable value . Here is the complete implementation of the refactored PropertyComparer class:

using System;
using System.Collections.Generic;

namespace Thoughtology.Samples.Collections
{
    public class PropertyComparer : IComparer
    {
        public delegate IComparable ComparableValue(T arg);

        public PropertyComparer(ComparableValue propertySelector)
        {
            this.PropertySelector = propertySelector;
        }

        public ComparableValue PropertySelector { get; set; }

        public int Compare(T x, T y)
        {
            if (this.PropertySelector == null)
            {
                throw new InvalidOperationException("PropertySelector cannot be null");
            }

            IComparable firstValue = this.PropertySelector(x);
            IComparable secondValue = this.PropertySelector(y);

            return firstValue.CompareTo(secondValue);
        }
    }
}

Our delegate, called ComparableValue, takes an object of the generic type T as input and returns a value to compare that object by.

The comparison itself is than performed by the returned value itself, by invoking the IComparable.CompareTo method on it (see line 27).

All primitive types in .NET implement the [IComparable][13] interface. Custom objects can easily be compared in a semantically meaningful way by manually implementing that same interface.

The caller can now invoke the Sort method by specifying the property to compare the items by with an anoymous delegate:

customerList.Sort(new PropertyComparer(delegate(Customer c)
    {
        return c.FullName;
    });

Notice how the property name is no longer passed a a string. Instead it is actually invoked on the object providing compile-time type checking.

Alternative implementation in .NET 3.5

This same solution can be implemented slightly differently in .NET 3.5 by taking advantage of the built in Func delegate type:

using System;
using System.Collections.Generic;

namespace Thoughtology.Samples.Collections
{
    public class PropertyComparer : IComparer
    {
        public PropertyComparer(Func propertySelector)
        {
            this.PropertySelector = propertySelector;
        }

        public Func PropertySelector { get; set; }

        public int Compare(T x, T y)
        {
            if (this.PropertySelector == null)
            {
                throw new InvalidOperationException("PropertySelector cannot be null");
            }

            IComparable firstValue = this.PropertySelector(x);
            IComparable secondValue = this.PropertySelector(y);

            return firstValue.CompareTo(secondValue);
        }
    }
}

Great, this saved us exactly one line of code. Don’t worry, things get much nicer on the caller’s side where the anonymous delegate is substituted by a much more compact lambda expression:

customerList.Sort(new PropertyComparer(c => c.FullName));

The results

Now that we put reflection out of the picture, it is a good time to run a simple test harness to see how the new comparison strategy performs. For this purpose we will sort an increasingly large collection of objects with the two PropertyComparer implementations and compare how long it takes to complete the operation. Here are the results in a graph:

SortingPerformanceChart

As you see, by using delegates the sorting algorithm stays on the linear O(n). On the other hand with reflection it quickly jumps over in the exponential O(cn) space, where c is the time it takes to make a single comparison.

Lessons learned

This exercise teaches three general guidelines that can be applied when programming in .NET:

Reflection is expensive. Use it sparingly and avoid it whenever possible in code that is executed very often, such as loops.
Generic delegates allow to build flexible code in a fast and strongly-typed fashion. This can be achieved by letting callers “inject” custom code into an algorithm by passing a delegate as argument to a method. The code referred to by the delegate will then be executed at the appropriate stage in the algorithm inside the method.
When reflection is used to dynamically invoke members on a class, the same thing can be achieved by using generic delegates instead, like demonstrated in this article. This technique is widely used by modern isolation frameworks such as Rhino Mocks, Moq and TypeMock Isolator.

/Enrico

August 19, 2009 Posted in technology

Building an HTPC

Almost exactly one year ago, I seriously started considering the problem of having the digital content I care about, mostly made up of music and pictures, scattered around different computers. htpc At home I often found myself thinking “I wish I could watch this movie on the TV instead of sitting in front of a tiny monitor”. At a friend’s house I would sometimes say “I can’t show you the pictures of our last trip right now because they are on my other laptop”. On top of that I started to have the creepy feeling that that not everything was backed up properly and on a regular basis, since it resided on different machines. This had become both annoying and worrying.

That’s how I got interested in Home Theater PC or HTPC for short.

My goal was to be able collect and organize all of my digital content in the form of music, pictures and movies in one central place, more precisely the living room, and to make it available to other PCs in the house as well as enjoying it on the TV’s big screen.

After looking at a couple of commercial products in that category (particularly Apple Mac mini and Apple TV) I realized the most common thing to do for a geek like me was to go off and build my own HTPC. This way I could pick and choose the hardware parts to build a machine that matches my specific needs.

The requirements

A computer that wishes to rightfully be called an HTPC must have the following basic characteristics:

Silent
Low power consumption
TV connectivity
Large storage capacity

On top of that, my personal requirements were:

Be able to play High Definition (HD) movies at high resolution (1080p)
Be able to play some occasional 3D game
Do not look like a computer but rather like some Audio-Video piece of equipment

The hardware

Based on these requirements and my budget, I came up with the following hardware configuration:

Component	Part
Motherboard	Gigabyte GA-MA69GM-S2H
CPU	AMD Athlon X2 Dual-Core 4850e 2.5 GHz
Cooling	Scythe Ninja Mini CPU Cooler
Memory	Kingston DDR2 PC6400 2048MB
Storage	Western Digital 500 GB SATA
Graphics Card	MSI GeForce 8600GT 256MB DDR3
Sound Card	Integrated
Case	Antec Fusion 430 Silver

There are some key points here that lead my decisions I should probably explain.

First of all I decided to go with the cheaper AMD Athlon X2 CPU over an Intel Core 2 Duo, since the performance gain I would get from the Intel processor wasn’t really that important to me to justify the higher price. Moreover the 4850e uses just 45W of electricity, which contributes in keeping the CPU cool and the power consumption low.

My choice of motherboard was based on a couple of factors:

The Antec Fusion V2 case (really slick by the way), has only room for a Mini-ATX size motherboard
It has integrated High Definition Audio sound chip with support for 7.1 channels and DTS (Digital Theater Systems), which basically means great audio for music and movies
It also has a decent ATI Radeon X1250 graphics chip with HDMI and TV-out ports integrated, which is nice to have in case my graphics card fails

I wanted this computer to be silent, and since I’m not a huge fan of water cooling, I figured the best way to keep the volume down would be to keep as few fans as possible. For this reason I substituted the stock CPU cooler that comes with the AMD processor with a Scythe Ninja Mini heat sink (shown in the picture below). This would allow me to cool the CPU without needing a fan. Moreover its low profile fits well in the Antec Fusion case.

As a matter of personal preference, the graphics card had to be an NVIDIA GeForce. This particular MsiGeForce8600GTHeatpipe model not only provides a nice price/performance balance, but is also extremely silent thanks to its fan-less passive cooling through a heat pipe. The downside is that once installed in the case it takes up the equivalent space of two cards, due to the large heat sink on the backside.

The case was the most important (and expensive) piece of the whole configuration. I have to say the Antect Fusion 430 is a great case for an HTPC. AntecFusion430 As far as aesthetics go, it makes a computer look like a fancy hi-fi amplifier with a shiny aluminum front panel. Moreover it has some nice features like an LCD screen with support for IR remotes and even a volume nod, contributing to the overall experience.

On the inside, it is designed to keep the hardware cool without being loud. It has two big fans positioned on one side of the case blowing cool air from the outside on the CPU and the graphics card, which are the hottest components in the system.

In this picture you can see the two fans on the left side directing the air flow towards the two giants heat sinks mounted on the CPU and GPU.

The software

After the build was done, I immediately installed Windows Vista Home Premium and the required device drivers on it to see how it performed.

Here is how Vista rates the system:

Playing HD movies encoded with the H.264 codec at 1080p on to a 40’’ flat HDTV is no problem at all. I use Cyberlink PowerDVD 9 which supports the NVIDIA PureVideo® feature to offload part of the rendering off to the GPU.

I have to admit I was a little worried that the two fans mounted in the Antec case weren’t enough to keep the system from overheating, especially when HTPC is inside of a closet under the TV.

So I decided to run the excellent Prime95 tool to stress test the system and watch the CPU and GPU temperature with CPU-Z and GPU-Z respectively. The screenshots below show the temperature measured at the two CPU cores when the system is idle (on top) and when running under load (below):

CoreTempIdle CoreTempLoad

It seems that the passive cooling is doing a pretty good job at keeping the CPU and GPU at low temperatures, even when the system is put under heavy load.

Conclusion

So far I’ve been pretty satisfied with the HTPC I’ve built. It fits well into the living room thanks to its specially designed case and it’s silent enough that I can’t even tell when it’s turned on or off (OK, I can always look at the power led on the front panel). Also it does everything I need it to without issues.

Having a PC working as media center instead of a proprietary custom device such as the Apple TV, definitely is the most flexible choice in terms of what software you can run. It also allows you to tweak the system to your preference, which is a requirement in itself for anyone with a passion for technology.

/Enrico

« Older Posts Newer Posts »