Sound Code: 2011

Sunday, 18 December 2011

NAudio 1.5 Released

I have finally got round to releasing the long overdue version 1.5 of NAudio. There are loads of bugfixes and improvements, so I recommend upgrading if possible. Here’s the highlights.

Now available on NuGet!
Numerous bugfixes mean we are now working fully in x64 as well as x86, so NAudio.dll is now marked as AnyCPU. (You can still force x86 by marking your own executable as x86 only.)
WaveOutEvent – a new WaveOut mode with event callback, highly recommended instead of WaveOut with function callbacks
24 bit ASIO driver mode (LSB)
Float LSB ASIO driver mode
WaveFileWriter has had a general code review and API cleanup
Preview of new ISampleProvider interface making it much easier to write custom 32 bit IEEE (float) audio pipeline components, without the need to convert to byte[]. Lots of examples in NAudioDemo of using this and more documentation will follow in future.
Several ISampleProvider implementations to get you started. Expect plenty more in future NAudio versions:

PanningSampleProvider
MixingSampleProvider
MeteringSampleProvider
MonoToStereoSampleProvider
NotifyingSampleProvider
Pcm16BitToSampleProvider
Pcm8BitToSampleProvider
Pcm24BitToSampleProvider
SampleChannel
SampleToWaveProvider
VolumeSampleProvider
WaveToSampleProvider

Added AiffFileReader courtesy of Giawa
AudioFileReader to simplify opening any supported file, easy volume control, read/reposition locking
BufferedWaveProvider uses CircularBuffer instead of queue (less memory allocations)
CircularBuffer is now thread-safe
MP3Frame code cleanup
MP3FileReader throws less exceptions
ASIOOut bugfixes for direct 16 bit playback
Some Demos added to NAudioDemo to give simple examples of how to use the library

NAudioDemo has an ASIO Direct out form, mainly for testing the AsioOut class at different bit depths (still recommended to convert to float before you get there).
NAudioDemo has simple MP3 streaming form (play MP3s while they download)
NAudioDemo has simple network streaming chat application
NAudioDemo playback form uses MEF to make it much more modular and extensible (new output drivers, new file formats etc)
NAudioDemo can play aiff

GSM 6.10 ACM codec support
DSP Group TrueSpeech ACM codec support
Fully managed G.711 a-law and mu-law codecs (encode & decode)
Fully managed G.722 codec (encode & decode)
Example of integration with NSpeex
Fix to PlaybackStopped using SyncContext for thread safety
Obsoleted IWavePlayer.Volume (can still set volume on WaveOut directly if you want)
Improved FFT display in WPF demo
WaveFileReader - tolerate junk after data chunk
WaveOut constructor detects if no sync context & choose func callbacks
WaveOut function mode callbacks hopefully chased out the last of the hanging bugs (if in a WaveOutWrite at same time as WaveOutReset, bad things happen - so need locks, but if WaveOutReset called during a previous func callback that is about to call waveOutWrite we deadlock)
Now has an msbuild script allowing me to more easily create releases, run tests etc
Now using Mercurial for source control, hopefully making bug fixing old releases and accepting user patches easier. n.b. this unfortunately means all old submitted patches are no longer available for download on the CodePlex page.
WPF Demo enhancements:

WPF Demo is now .NET 4, allowing us to use MEF, and will be updated hopefully with more examples of using NAudio.
WPF Demo uses windowing before FFT for a more accurate spectrum plot
WPF Demo has visualization plugins, allowing me to trial different drawing mechanisms
WPF Demo has a (very basic) drum machine example

I also intend that this will be the last NAudio that targets .NET 2.0 (1.6 will be .NET 3.5). Let me know if you have any objections.

Hope you have fun using it, and do send me the links to any cool stuff you make with it.

Tuesday, 29 November 2011

Creating Zip Files with IronPython

I’ve been working on automating some of the parts of the release process for NAudio, as I always seem to forget something or make a mistake. One of the tasks was to create a “demo” zip file containing the demo applications together with their supporting files.

I initially attempted to do this with MSBuild and the Zip task from MSBuild Community Extensions, but I ended up double-adding a number of files, as well as struggling to get exactly the right folder structure.

This is exactly the sort of task that Python excels at, and Python comes with the zipfile module built in, meaning that the script I wrote is not IronPython specific. Here’s what I came up with:

import zipfile
import os

folders = ['AudioFileInspector','NAudioDemo','NAudioWpfDemo']
files = {}

def exclude(filename):
    return filename.endswith('.pdb') or ('nunit' in filename)

for folder in folders:
    fullpath = folder + "\\bin\\debug\\"
    for filename in os.listdir(fullpath):
        if not exclude(filename):
            files[filename] = fullpath + filename

zip = zipfile.ZipFile("BuildArtefacts\\test.zip", "w")

for filename, fullpath in files.iteritems():
    if os.path.isdir(fullpath):
        for subfile in os.listdir(fullpath):
            zip.write(fullpath + "\\" + subfile, filename + "\\" + subfile)
    else:
        zip.write(fullpath, filename)

zip.close()

There's not a lot to it really. I first build up a dictionary containing the files I want in my zip, using the filename to exclude duplicates. Then I use the write method on zipfile to specify the file I want to add, and the folder it belongs in.

My Python skills are a bit rusty, so the code above would probably benefit from being refactored a little, but as you can see, it is very easy, and much simpler than fighting MSBuild to make it do what I want.

Monday, 11 July 2011

Screencast: Modular WPF with MEF & MVVM Tutorial Part 2

In Part 1, I showed how to create a very simple modular WPF application, and introduced MEF to allow us to easily add modules without changing any existing code.

In part 2, I introduce the MVVM pattern to show how we can make our applications more easily unit testable, by separating the view from the business logic. I also create some unit tests using the Microsoft unit testing framework and moq, and show how to perform code coverage analysis.

Incidentally, I normally use NUnit and TestDriven.NET instead of the Microsoft unit testing framework, which I find much simpler to use, and also supports code coverage analysis using NCover or the Visual Studio code coverage.

I do plan to follow this up with a third video, in which I replace the modules ListBox with some buttons to switch modules and show some of the typical problems you might run into when using MVVM.

Saturday, 9 July 2011

10 C# keywords you should be using

Most developers who learn C# pick up the basic keywords quite quickly. Within a few weeks of working with a typical codebase you’ll have come across around a third of the C# keywords, and understand roughly what they do. You should have no trouble explaining what the following keywords mean:

public, private, protected, internal, class, namespace, interface, get, set, for, foreach .. in, while, do, if, else, switch, break, continue, new, null, var, void, int, bool, double, string, true, false, try, catch

However, while doing code reviews I have noticed that some developers get stuck with a limited vocabulary of keywords and never really get to grips with some of the less common ones, and so miss out on their benefits. So here’s a list, in no particular order, of some keywords that you should not just understand, but be using on a semi-regular basis in your own code.

is & as

Sometimes I come across a variation of the following code, where we want to cast a variable to a different type but would like to check first if that cast is valid:

if (sender.GetType() == typeof(TextBox))
{
   TextBox t = (TextBox)sender;
   ...
}

While this works fine, the is keyword could be used to simplify the if clause:

if (sender is TextBox)
{
   TextBox t = (TextBox)sender;
   ...
}

We can improve things further by using the as keyword, which is like a cast, but doesn’t throw an exception if the conversion is not valid – it just returns null instead. This means we can write code in a way that doesn’t require the .NET framework to check the type of our sender variable twice:

TextBox t = sender as TextBox;
if (t != null)
{
   ...
}

I feel obliged to add that if your code contains a lot of casts, you are probably doing something wrong, but that is a discussion for another day.

using

Most developers are familiar with the using keyword for importing namespaces, but a surprising number do not make regular use of it for dealing with objects that implement IDisposable. For example, consider the following code:

var writer = new StreamWriter("test.txt");
writer.WriteLine("Hello World");
writer.Dispose();

What we have here is a potential resource leak if there is an exception thrown between opening the file and closing it. The using keyword ensures that Dispose will always be called if the writer object was successfully created.

using (var writer = new StreamWriter("test.txt"))
{
   writer.WriteLine("Hello World");
}

Make it a habit to check whether the classes you create implement IDisposable, and if so, make use of the using keyword.

finally

Which brings us onto our next keyword, finally. Even developers who know and use using often miss appropriate scenarios for using a finally block. Here’s a classic example:

public void Update()
{   
    if (this.updateInProgress)
    {    
        log.WriteWarning("Already updating");    
        return;
    } 
    this.updateInProgress = true;
    ...
    DoUpdate();
    ...
    this.updateInProgress = false;

}

The code is trying to protect us from some kind of re-entrant or multithreaded scenario where an Update can be called while one is still in progress (please ignore the potential race condition for the purposes of this example). But what happens if there is an exception thrown within DoUpdate? Now we are never able to call Update again because our updateInProgress flag never got unset. A finally block ensures we can’t get into this invalid state:

public void Update()
{   
    if (this.updateInProgress)
    {    
        log.WriteWarning("Already updating");    
        return;
    } 
    try
    {
        this.updateInProgress = true;
        ...
        DoUpdate();
        ...
    }
    finally
    {
        this.updateInProgress = false;
    }
}

readonly

OK, this one is a fairly simple one, and you could argue that code works just fine without it. The readonly keyword says that a field can only be written to from within the constructor. It’s handy from a code readability point of view, since you can immediately see that this is a field whose value will never change during the lifetime of the class. It also becomes a more important keyword as you begin to appreciate the benefits of immutable classes. Consider the following class:

public class Person
{
    public string FirstName { get; private set; }
    public string Surname { get; private set; }

    public Person(string firstName, string surname)
    { 
        this.FirstName = firstName;
        this.Surname = surname;
    }
}

Person is certainly immutable from the outside – no one can change the FirstName or Surname properties. But nothing stops me from modifying those properties within the class. In other words, my code doesn’t advertise that I intend this to be an immutable class. Using the readonly keyword, we can express our intent better:

public class Person
{
    private readonly string firstName;
    private readonly string surname;

    public string FirstName { get { return firstName; } }
    public string Surname { get { return surname; } }

    public Person(string firstName, string surname)
    { 
        this.firstName = firstName;
        this.surname = surname;
    }
}

Yes, it’s a shame that this second version is a little more verbose than the first, but it makes it more explicit that we don’t want firstName or surname to be modified during the lifetime of the class. (Sadly C# doesn’t allow the readonly keyword on properties).

yield

This is a very powerful and yet rarely used keyword. Suppose we have a class that searches our hard disk for all MP3 files and returns their paths. Often we might see it written like this:

public List<string> FindAllMp3s()
{
   var mp3Paths = List<string>();
   ... 
   // fill the list
   return mp3Paths;
}

Now we might use that method to help us search for a particular MP3 file we had lost:

foreach(string mp3File in FindAllMp3s())
{
   if (mp3File.Contains("elvis"))
   {
       Console.WriteLine("Found it at: {0}", mp3File);
       break;
   }  
}

Although this code seems to work just fine, it’s performance is sub-optimal, since we first find every MP3 file on the disk, and then search through that list. We could save ourselves a lot of time if we checked after each file we found and aborted the search at that point.

The yield keyword allows us to fix this without changing our calling code at all. We modify FindAllMp3s to return an IEnumerable<string> instead of a List. And now every time it finds a file, we return it using the yield keyword. So with some rather contrived example helper functions (.NET 4 has already added a method that does exactly this) our FindAllMp3s method looks like this:

public IEnumerable<string> FindAllMp3s()
{
   var mp3Paths = List<string>();
   
   for (var dir in GetDirs())
   {
       for (var file in GetFiles(dir))
       {
           if (file.EndsWith(".mp3")
           {
               yield return file;
           }
       }
   }      
}

This not only saves us time, but it saves memory too, since we now don’t need to store the entire collection of mp3 files in a List.

It can take a little while to get used to debugging this type of code since you jump in and out of a function that uses yield repeatedly as you walk through the sequence, but it has the power to greatly improve the design and performance of the code you write and is worth mastering.

select

OK, this one is cheating since this is a whole family of related keywords. I won’t attempt to explain LINQ here, but it is one of the best features of the C# language, and you owe it to yourself to learn it. It will revolutionise the way you write code. Download LINQPad and start working through the tutorials it provides.

interface

So you already know about this keyword. But you probably aren’t using it nearly enough. The more you write code that is testable and adheres to the Dependency Inversion Principle, the more you will need it. In fact at some point you will grow to hate how much you are using it and wish you were using a dynamic language instead. (dynamic is itself another very interesting new C# keyword, but I feel that the C# community is only just beginning to discover how we can best put it to use).

throw

You do know you are allowed to throw as well as catch exceptions right? Some developers seem to think that a function should never let any exceptions get away, and so contain a generic catch block which writes an error to the log and returns giving the caller no indication that things went wrong.

This is almost always wrong. Most of the time your methods should simply allow exceptions to propagate up to the caller. If you are using the using keyword correctly, you are probably already doing all the cleanup you need to.

But you can and should sometimes throw exceptions. An exception thrown at the point you realise something is wrong with a good error message can save hours of debugging time.

Oh, and if you really do need to catch an exception and re-throw it, make sure you use do it the correct way.

goto

Only joking, pretend you didn’t see this one. Just because a keyword is in the language, doesn’t mean it is a good idea to use it. out and ref usually fall into this category too – there are better ways to write your code.

Friday, 8 July 2011

Screencast: Modular WPF with MEF & MVVM Tutorial Part 1

A while ago I began to write a blog tutorial about how to create a modular WPF application using MVVM and MEF. I have decided to present it as a screencast instead, so here is the first part. It uses WPF, but a lot of the techniques I show apply equally well to Silverlight.

My approach is not to start off using MVVM and MEF (or any of the more fully featured WPF frameworks), but to introduce these libraries and design patterns at the point at which they make sense. Hopefully this will make it clearer why we might actually want to use them in the first place.

In this first video I create a very simple application with a menu allowing you to switch between two modules, and show how MEF enables us to work around violations of the “Open Closed Principle”.

I’ve got at least one more episode planned which hopefully will appear shortly and introduce MVVM to make our application more easily testable. If there is demand, I may also post my notes and upload my mercurial repository to bitbucket.

Another reason for presenting this as a screencast is that I am planning to create some NAudio screencasts in the future, and so I need to get some practice in. This one is a little raw as I didn’t do a practice beforehand, but I have tried to keep things flowing along at a reasonable pace. I also know the sound quality isn’t brilliant. Let me know if you think the resolution is acceptable.

Wednesday, 6 July 2011

One Language to Rule them All

You may have heard the story of the “tower of Babel”. The story goes that one day, the people of the earth decided to build a great tower. It would be a monument to the greatness of humanity. But God objected to their pride and intervened to thwart their building program. His technique was not to strike the tower down with an earthquake, or strike the builders down with illness. Instead, he opted for a simple yet effective solution: he “confused the language” and the building project was soon abandoned. Inability to communicate doomed the project to failure.

This ancient tale is a remarkably fitting parable for the state of modern programming. Even the smallest of miscommunications, such as requirements not properly understood, or two components in the system using different units of measure can cause the premature demise of entire software projects.

The analogy works on a wider scale too. Imagine what we as a software development community could build if we shared a common programming language. All the wastage of ‘porting’ libraries from one framework to another would be eliminated. Instead of reinventing the same development tools over and over for every new language, we could focus on genuine innovation.

The Last Programming Language

And this is the point made by Robert C Martin at the NDC 2011 conference, in a fascinating talk entitled “Clojure – The Last Programming Language”.

He begins by arguing that we’ve already fully explored the domain of programming languages. In other words, we have experimented with all the types of languages that there are. Or to put it another way, the new languages we keep inventing are just refinements or rearrangements of ideas from existing languages. It’s a slightly depressing thought in some ways – expect no new revolutionary paradigms – we’ve tried it all already.

Whether or not he is right about his first claim, his proposal for the development community to standardise on a single language certainly grabbed my attention. Before dismissing it out of hand as a pipe dream, it is worth pondering the many benefits it would bring.

You would not need to hire a C#/Java/Ruby/C programmer. You’d just need to hire a programmer.
All platforms and frameworks would be built in that language. No need for all this NUnit, JUnit, PyUnit, Runit business. We’d build on top of what had gone before instead of continually having to reinvent it for every new language.
Articles with example code would be immediately understandable to all developers

Can we agree on anything?

But could developers really agree on one programming language? The idea seems almost preposterous. Uncle Bob counters that other industries have gone through the same transition (e.g. biologists, chemists, medicine, mathematics). Eventually the benefits of a lingua franca win through despite the attachment people inevitably feel to their native tongue.

He also points out that this already happened once with C, which is one of those few languages that can genuinely be used to develop on almost every platform. This for me is the big sticking point in seeing Uncle Bob’s vision become a reality. The one language must be able to develop for every platform, server, client, embedded, handheld. It must be usable for every type of development – desktop apps, websites, games, services, device drivers, nuclear power stations. I’m not sure such a language exists.

Features of the final language

But let’s suspend disbelief for a moment and ask what features the final language should have. Uncle Bob’s wishlist included features such as:

Not controlled by a corporation.
Garbage collected. (I have mixed feelings on this one given .NET’s inability to do low latency audio well)
Polymorphic
Runs on a virtual machine
Able to modify itself at runtime like Ruby (homoiconic)
Pared down syntax
Functional
Hybrid, i.e. Multi-paradigm (support, don’t enforce paradigms)
Simple
Provide access to existing frameworks
Structured (no goto)
Fast
Textual
Dynamically typed (the talk includes an interesting discussion on this).

His proposed language was Clojure. It seems an interesting enough language, although I can’t say I warm to all the parentheses.

A step in the right direction…

In any case, I think that if his vision were to become a reality, we would first have to pick a virtual machine. The two most obvious candidates are the JVM and the CLR, both of which now support a whole host of languages (Clojure can run on either). The CLR may even have the upper hand due to products like Mono which makes it a genuine open source option and available on a very broad range of platforms, although I suspect its ties with Microsoft would generate a lot of resistance.

Libraries that are compiled for a virtual machine like the CLR effectively appear as native libraries for all languages that compile to the same byte code, meaning that there is at least some reduction in the amount of reinvention and relearning required when you switch languages – the libraries can come with you even if the language syntax doesn’t. Another benefit is that often the byte code for a VM can be translated into other languages by a tool, allowing at least some level of automated translation between languages. Maybe someone could invent a browser plugin that automatically converts any code you view in a web-page into the language of your choice – a kind of Google translate for programming languages.

What do you think? Could we agree on a virtual machine, and then get to work on picking a smaller subset of languages to program it with? Or is Uncle Bob’s idea a pipedream? Will we ever get to one common language or will God step in and mix it all up again, just to keep us humble.

Tuesday, 5 July 2011

Test Resistant Code #5–Threading

I want to wrap up my test-resistant code series with one final type of code that proves hard to test, and that is multi-threaded code. We’ll just consider two scenarios, one that proves easy to test, another that proves very complex.

Separate out thread creation

A common mistake is to include the code that creates a thread (or queues a background worker) in the same class that contains the code for the actual work to be performed by the thread. This is a violation of “separation of concerns” In the following trivial example, the function we really need to unit test, DoStuff, is private, and the public interface is not helpful for unit testing.

public void BeginDoStuff()
{
    ThreadPool.QueueUserWorkItem((o) => DoStuff("hello world"));
}

private void DoStuff(string message)
{
    Console.WriteLine(message);
}

Fixing this is not hard. We separate the concerns by making the DoStuff method a public member of a different class, leaving the original class simply to manage the asynchronous calling and reporting of results (which you may find can be refactored into a more generic threading helper class).

Locks and race conditions

But what about locking? Consider a very simple circular buffer class I wrote for NAudio. In the normal use case, one thread writes bytes to it while another reads from it. Here’s the current code for the Read and Write methods (which I’m sure could be refactored down to something much shorter):

/// <summary>
/// Write data to the buffer
/// </summary>
/// <param name="data">Data to write</param>
/// <param name="offset">Offset into data</param>
/// <param name="count">Number of bytes to write</param>
/// <returns>number of bytes written</returns>
public int Write(byte[] data, int offset, int count)
{
    lock (lockObject)
    {
        int bytesWritten = 0;
        if (count > buffer.Length - this.byteCount)
        {
            count = buffer.Length - this.byteCount;
        }
        // write to end
        int writeToEnd = Math.Min(buffer.Length - writePosition, count);
        Array.Copy(data, offset, buffer, writePosition, writeToEnd);
        writePosition += writeToEnd;
        writePosition %= buffer.Length;
        bytesWritten += writeToEnd;
        if (bytesWritten < count)
        {
            // must have wrapped round. Write to start
            Array.Copy(data, offset + bytesWritten, buffer, writePosition, count - bytesWritten);
            writePosition += (count - bytesWritten);
            bytesWritten = count;
        }
        this.byteCount += bytesWritten;
        return bytesWritten;
    }
}

/// <summary>
/// Read from the buffer
/// </summary>
/// <param name="data">Buffer to read into</param>
/// <param name="offset">Offset into read buffer</param>
/// <param name="count">Bytes to read</param>
/// <returns>Number of bytes actually read</returns>
public int Read(byte[] data, int offset, int count)
{
    lock (lockObject)
    {
        if (count > byteCount)
        {
            count = byteCount;
        }
        int bytesRead = 0;
        int readToEnd = Math.Min(buffer.Length - readPosition, count);
        Array.Copy(buffer, readPosition, data, offset, readToEnd);
        bytesRead += readToEnd;
        readPosition += readToEnd;
        readPosition %= buffer.Length;

        if (bytesRead < count)
        {
            // must have wrapped round. Read from start
            Array.Copy(buffer, readPosition, data, offset + bytesRead, count - bytesRead);
            readPosition += (count - bytesRead);
            bytesRead = count;
        }

        byteCount -= bytesRead;
        return bytesRead;
    }
}

The fact that I take a lock for the entirety of both methods makes me confident that the internal state will not get corrupted by one thread calling Write while another calls Read; the threads simply have to take it in turns. But what if I had a clever idea for optimising this code that only involved me locking for part of the time. Maybe I want to do the Array.Copy’s outside the lock since they potentially take the longest. How could I write a unit test that ensured my code remained thread-safe?

Short of firing up two threads reading and writing with random sleep times inserted here and there, I’m not sure I know how best to prove the correctness of this type of code. Locking issues and race conditions can be some of the hardest to track down bugs. I once spent a couple of weeks locating a bug that only manifest itself on a dual processor system (back in the days when those were few and far between). The code had been thoroughly reviewed by all the top developers at the company and yet no one saw the problem.

Here’s another example, based on some code I saw in a product I worked on. A method kicks off two threads to do some long-running tasks and attempts to fire a finished event when both have completed. We want to ensure that the SetupFinished event always fires, and only fires once. You might be able to spot a race condition by examining the code, but how would we write a unit test to prove we had fixed it?

private volatile bool eventHasBeenRaised;

public void Init()
{    
    ThreadPool.QueueUserWorkItem((o) => Setup1());
    ThreadPool.QueueUserWorkItem((o) => Setup2());    
}

private void Setup1()
{
    Thread.Sleep(500);
    RaiseSetupFinishedEvent();
}

private void Setup2()
{
    Thread.Sleep(500);
    RaiseSetupFinishedEvent();
}

private void RaiseSetupFinishedEvent()
{
    if (!eventHasBeenRaised)
    {
        eventHasBeenRaised = true;
        SetupFinished(this, EventArgs.Empty);
    }
}

The only tool for .NET I have heard of that might begin to address this shortcoming is Microsoft CHESS. It seems a very promising tool although it seems to have stalled somewhat – the only integration is with VS2008; VS2010 is not supported. I’d love to hear of other tools or clever techniques for unit testing multi-threaded code effectively. I haven’t delved too much into the C# 5 async stuff yet, but I’d be interested to know how well it plays with unit tests.

Monday, 4 July 2011

Visualizing TFS Repositories with Gource

I recently came across the gource project, which creates stunning visualizations of your source control history. It doesn’t include built-in support for TFS repositories, but its custom log file format makes it quite simple to work with.

Since the project I worked on recently hit 1,000,000 lines of code (yes, I know that means it needs to be mercilessly refactored), I thought I would create a gource visualization. My example code below has a regular expression to allow me to filter out the bits in source control I am interested in. It also deals with the fact that we imported our solution from SourceSafe, so it needs to get the real commit dates for early items out of the comments.

public void CreateGourceLogFile(string outputFile)
{
    TfsTeamProjectCollection tpc = new TfsTeamProjectCollection(new Uri("http://mytfsserver:8080/"));
    tpc.EnsureAuthenticated();
    VersionControlServer vcs = tpc.GetService<VersionControlServer>();
    int latestChangesetId = vcs.GetLatestChangesetId();
    Regex regex = new Regex(sourceFileRegex); // optional - a regular expression to match source files you want to include in the gource visualisation
    Regex sourceSafeImportComment = new Regex(@"^\{\d\d/\d\d/\d\d\d\d \d\d:\d\d:\d\d\}");
    int lines = 0;
    using (var writer = new StreamWriter(outputFile))
    {
        for (int changesetId = 1; changesetId < latestChangesetId; changesetId++)
        {
            var changeset = vcs.GetChangeset(changesetId);
            var devEdits = from change in changeset.Changes
                           where
                           ((change.ChangeType & ChangeType.Edit) == ChangeType.Edit
                           || (change.ChangeType & ChangeType.Add) == ChangeType.Add
                           || (change.ChangeType & ChangeType.Delete) == ChangeType.Delete)
                           && regex.IsMatch(change.Item.ServerItem)
                           select change;
            foreach (var change in devEdits)
            {
                DateTime creationDate = changeset.CreationDate;
                var commentMatch = sourceSafeImportComment.Match(changeset.Comment);
                if (commentMatch.Success)
                {
                    creationDate = DateTime.ParseExact(commentMatch.Value,"{dd/MM/yyyy HH:mm:ss}", CultureInfo.InvariantCulture);
                }

                int unixTime = (int)(creationDate - new DateTime(1970, 1, 1)).TotalSeconds;
                writer.WriteLine("{0}|{1}|{2}|{3}", unixTime, changeset.Committer,
                    GetChangeType(change.ChangeType), change.Item.ServerItem);
            }
        }
    }
}

private string GetChangeType(ChangeType changeType)
{
    if ((changeType & ChangeType.Edit) == ChangeType.Edit)
    {
        return "M";
    }
    if ((changeType & ChangeType.Add) == ChangeType.Add)
    {
        return "A";
    }
    if ((changeType & ChangeType.Delete) == ChangeType.Delete)
    {
        return "D";
    }
    throw new ArgumentException("Unsupported change type");
}

Once you have your log file created, just run gource to see an amazing replay of the history of your project:

gource custom.log

Here’s a screenshot:

Sunday, 3 July 2011

Silverlight Music Rating App for KVR One Synth Challenge

After Spotify announced that you only got 10 hours free listening a month, I went in search of some new and interesting sources of music, and one of the things I stumbled across was the monthly KVR One Synth Challenge. Basically the idea is that you have to create an entire track using a single freeware virtual instrument – a different one each month. Despite this limitation, the quality of submissions is consistently impressive.

Voting is done by the community – you have to list your top five tracks. Each month I would download all the tracks and gradually eliminate them from a playlist, making notes on what I liked about each one as I went. This made me wonder if I could create a simple rating tool in Silverlight to help me do this – storing my comments and ratings, and allowing me to easily eliminate tracks from my shortlist.

So I created a simple application called “MusicRater”. At the moment the paths for each KVR OSC contest are hard-coded and I update it each month to point to the new one, but I have plans to make it more configurable in the future. The application makes use of my Silverlight star rating control.

Here’s a screenshot:

You can try it out here, and install it as an out of browser app if that is convenient for you. As of the time of posting, the version at this address is set up to return the tracks for OSC 29, which uses a synth called String Theory, which poses an interesting challenge to the contestants since it is a physical modelling synth. If you enjoy listening, make sure you head over to the voting thread and cast your votes.

Saturday, 2 July 2011

Yahtzee Kata in IronPython

I did another simple kata in IronPython recently, to refresh my memory since I haven’t done much with Python recently. I used my AutoTest for IronPython utility again, and again found myself wanting to invent an equivalent of NUnit’s [TestCase] attribute for Python. The kata is to implement the Yahtzee scoring rules, although the specific instructions I followed describe a different scoring scheme than the most familiar one (seems to be a Scandinavian version).

import unittest

#helpers
def Count(dice, number):
    return len([y for y in dice if y == number])

def HighestRepeated(dice, minRepeats):
    unique = set(dice)
    repeats = [x for x in unique if Count(dice, x) >= minRepeats]
    return max(repeats) if repeats else 0

def OfAKind(dice, n):
    return HighestRepeated(dice,n) * n

def SumOfSingle(dice, selected):
    return sum([x for x in dice if x == selected])

#strategies
def Chance(dice):
    return sum(dice)

def Pair(dice):
    return OfAKind(dice, 2)

def ThreeOfAKind(dice):
    return OfAKind(dice, 3)

def FourOfAKind(dice):
    return OfAKind(dice, 4)
    
def SmallStraight(dice):
    return 15 if tuple(sorted(dice)) == (1,2,3,4,5) else 0

def LargeStraight(dice):
    return 20 if tuple(sorted(dice)) == (2,3,4,5,6) else 0

def Ones(dice):
    return SumOfSingle(dice,1)

def Twos(dice):
    return SumOfSingle(dice,2)

def Threes(dice):
    return SumOfSingle(dice,3)

def Fours(dice):
    return SumOfSingle(dice,4)

def Fives(dice):
    return SumOfSingle(dice,5)

def Sixes(dice):
    return SumOfSingle(dice,6)

def Yahtzee(dice):
    return 50 if len(dice) == 5 and len(set(dice)) == 1 else 0

class YahtzeeTest(unittest.TestCase):
    testCases = (
        ((1,2,3,4,5), 1, Ones),
        ((1,2,3,4,5), 2, Twos),
        ((3,2,3,4,3), 9, Threes),
        ((3,2,3,4,3), 0, Sixes),
        ((1,2,3,4,5), 0, Pair), # no pairs found
        ((1,5,3,4,5), 10, Pair), # one pair found
        ((2,2,6,6,4), 12, Pair), # picks highest
        ((2,3,1,3,3), 6, Pair), # only counts two
        ((2,2,6,6,6), 18, ThreeOfAKind), 
        ((2,2,4,6,6), 0, ThreeOfAKind), # no threes found
        ((5,5,5,5,5), 15, ThreeOfAKind), # only counts three
        ((6,2,6,6,6), 24, FourOfAKind), 
        ((2,6,4,6,6), 0, FourOfAKind), # no fours found
        ((5,5,5,5,5), 20, FourOfAKind), # only counts four
        ((1,2,5,4,3), 15, SmallStraight),
        ((1,2,5,1,3), 0, SmallStraight),
        ((6,2,5,4,3), 20, LargeStraight),
        ((1,2,5,1,3), 0, LargeStraight),
        ((5,5,5,5,5), 50, Yahtzee),
        ((1,5,5,5,5), 0, Yahtzee), 
        ((1,2,3,4,5), 15, Chance),
        )

    def testRunAll(self):
        for (dice, expected, strategy) in self.testCases:
            score = strategy(dice)
            self.assertEquals(expected, score, "got {0} expected {1}, testing with {2} on {3}".format(score, expected, strategy.__name__, dice))
        print 'ran {0} test cases'.format(len(self.testCases))
        
if __name__ == '__main__':
    unittest.main()

Friday, 1 July 2011

Blogger Users Beware - Google+ Can Delete Your Blog Images

I was given an invite to Google+ today. I thought it would be interesting to see what the fuss is about, so I signed up. It asked me if I wanted to import any pictures from Picasa. I didn’t think I had any pictures on there – I certainly never visit the site, but I clicked yes anyway just in case there was some stuff I had forgotten about on there. And indeed there was. It imported a couple of random photos I had taken of door hinges to email to my dad, plus a whole bunch of screenshots.

Obviously, that’s not the kind of stuff I want on my Google+ profile, so I deleted the photo albums from my profile page. What I hadn’t realised is that Picassa was where Windows Live Writer had been storing all the images for this blog. And instead of just deleting the albums from my Google+ profile, it also deleted them from Picassa. Not into some kind of recycle bin. Deleted permanently.

Clearly Google have committed a cardinal sin of user interface design – don’t make it easy to accidentally and irrevocably delete all your data. And the data stored on Picassa should only be deleteable from Picassa, not from another completely separate website, even if it is run by the same company.

They have also committed the cardinal sin of customer support – utter unresponsiveness. I guess they can get away with this because they are so big and I’m not paying them any money. Even if they can’t recover my data, a response of some kind would be nice. I’ve already come across several other people who’ve done similar things, leaving blogs that have been running several years completely broken.

Update: they have now responded to say:

Our team can't undelete your photos, but we'll make it clearer that your Picasa photos themselves are displayed on Google+ and not copied. It's the same backend, so photos you upload in Picasa are visible from Google+, and vice versa.

The upshot of that is that this blog is missing all its screenshots and diagrams, almost 100 of them. Some of them I might be able to recover from my PC, although my usual way of taking screenshots and adding them through Windows Live Writer only involves the clipboard so there is no physical file on my machine that I can use to recover the images with. I can only apologise, particularly if you came here to read the NAudio documentation, only to find it missing its architecture diagrams.

Thursday, 30 June 2011

Test-Resistant Code #4–Third Party Frameworks

This one might seem like a repeat of my first post in this series, where I talked about dependencies being the biggest barrier to unit testing, and the challenge of creating abstraction layers for everything just to make things testable. However, even if we manage to isolate our third party framework entirely behind an abstraction layer, it still doesn’t eliminate a few conundrums when it comes to actually testing what we have done. I’ll just focus on two example scenarios:

Magic Invocations

Writing code that consumes a third party API can be one of the most difficult developer experiences. You write the code you think ought to work, based on the somewhat unreliable documentation, and it fails with some bizarre error. You then spend the next few days trying every possible combination of parameters, ordering of method calls, and scouring the web for any help you can get. Eventually you stumble across the “magic invocation” – the perfect combination of method calls and parameters that makes it work. You might have no clue why, but you are just relieved to be finally done with this part of the development experience, and ready to move onto something else.

As an example, consider this code I wrote once that attempts to find set the recording level for a given sound card input. It’s pretty hacky stuff, with a different codepath for different versions of Windows. It “works on my machine”, but I’m sure there are some systems on which it does the wrong thing.

private void TryGetVolumeControl()
{
    int waveInDeviceNumber = waveIn.DeviceNumber;
    if (Environment.OSVersion.Version.Major >= 6) // Vista and over
    {
        var mixerLine = waveIn.GetMixerLine();
        //new MixerLine((IntPtr)waveInDeviceNumber, 0, MixerFlags.WaveIn);
        foreach (var control in mixerLine.Controls)
        {
            if (control.ControlType == MixerControlType.Volume)
            {
                this.volumeControl = control as UnsignedMixerControl;
                MicrophoneLevel = desiredVolume;
                break;
            }
        }
    }
    else
    {
        var mixer = new Mixer(waveInDeviceNumber);
        foreach (var destination in mixer.Destinations)
        {
            if (destination.ComponentType == MixerLineComponentType.DestinationWaveIn)
            {
                foreach (var source in destination.Sources)
                {
                    if (source.ComponentType == MixerLineComponentType.SourceMicrophone)
                    {
                        foreach (var control in source.Controls)
                        {
                            if (control.ControlType == MixerControlType.Volume)
                            {
                                volumeControl = control as UnsignedMixerControl;
                                MicrophoneLevel = desiredVolume;
                                break;
                            }
                        }
                    }
                }
            }
        }
    }
}

Having finally made this wretched thing work, does it matter whether I have any “unit tests” for it or not? I could write an “integration test” for this code which actually opens an audio device for recording and attempts to set the microphone level. But to know whether it worked or not would not be automatically verifiable. You would have to manually examine the recorded audio to see if the volume had been adjusted.

What about real “unit tests”? Is it possible to unit test this? Well in one sense, yes. I could wrap an abstraction layer around all the third party framework calls and the operating system version checking. Then I could test that my foreach loops are indeed picking out the “mixer control” I think they should in both cases, and that the desired level was set on that control.

But supposing I did write that unit test. What would it prove? It would prove little more than my code does what it does. It doesn’t prove that it does the right thing. The best it can do is demonstrate that the behaviour that I know “works on my machine”, is still working. In other words I could create mocks that return data that mimics my own soundcard’s controls and ensure that the algorithm always picks the same mixer control for that particular test case.

However, I this that this sort of testing fails the cost-benefit analysis. It requires a lot of abstraction layers to be created just for the purpose of creating a test that has dubious value. Tests should verify that requirements are being met, not check up on how they are implemented.

Writing Plugins

Another common scenario is when you are writing code that will be called by a third party framework – as a plugin of sorts. In this case, you are often inheriting from a base class. The public interface has already been decided for you, for better or worse. Also, depending on the framework, you may not be entirely sure what the possible inputs to your overridden functions might be, or in what order they might be called.

The approach I tend to take here is to put as little code in the derived class as possible. Instead it calls out into my own, more testable classes. This works quite well, and leaves you with the choice of how much effort you want to go to to test the derived plugin class itself, which may be best left to a proper integration test with the real system.

One place this approach doesn’t work well for me is with my own framework, NAudio. In NAudio, you often create small classes that inherit from IWaveProvider or WaveStream. These return audio out of a Read method in a byte array, which is great for the soundcard, but not at all friendly for writing unit tests against. I could again move my logic out into a separate, testable class, but this doubles my number of classes and reduces performance (which is a very important consideration in streaming audio). Add to that that audio effects and processors tend to be hard to automatically verify, and I wonder whether this is another case where the effort to write everything in a strictly TDD manner doesn’t really pay off.

Anyway, I’m nearly at the end of this series now. Just one more form of test-resistant code left to discuss, before I move on to share some ideas I’ve had on how to better automate the testing of some of this code.

Wednesday, 29 June 2011

Test-Resistant Code #3–Algorithms

This is the third part in a mini-series on code that is hard to test (and therefore hard to develop using TDD). Part 1: Test-Resistant Code and the battle for TDD, Part 2: Test-Resistant Code – Markup is Code Too

You may have seen Uncle Bob solve the prime factors kata using TDD. It is a great demonstration of the power of TDD to allow you to think through an algorithm in small steps. In a recent talk at NDC2011 entitled, “The Transformation Priority Premise” he gave some guidelines to help prevent creating bad algorithms. In other words, following the right rules you can use TDD to make quick sort, instead of bubble sort (I tried this briefly and failed miserably). It seems likely that all sorts of algorithms can be teased out with TDD (although I did read of someone trying to create a Sudoku solver this way and finding it got them nowhere).

However suitable or not TDD is for tackling those three problems (prime factors, list sorting, sudoku), they share one common attribute: they are easily verifiable algorithms. I know the expected outputs for all my test cases.

Not all algorithms are like that – we don’t always know what the correct output is. And as clever as the transformation priority premise might be for iterating towards optimal algorithms, I am not convinced that it is an appropriate attack vector for many of the algorithms we typically require in real-world projects. Let me explain with a few simple examples of algorithms I have needed for various projects.

Example 1 – Shuffle

My first example is a really basic one. I needed to shuffle a list. Can this be done with TDD? The first two cases are easy enough – we know what the result should be if we shuffle an empty list, or shuffle a list containing one item. But when I shuffle a list with two items, what should happen? Maybe I could have a unit test that kept going until it got both possibilities. I could even shuffle my list of two items 100 times and check that the number of times I got each combination was within a couple of standard deviations of the expected answer of 50. Then when I pass my list of three items in, I can do the same again, checking that all 6 possible results can occur, followed by more tests checking that the distribution of shuffles is evenly spread.

So it is TDDable then. But really, is that a sensible way to approach a task like this? The statistical analysis required to prove the algorithm requires more development effort than writing the algorithm in the first place.

But there is a deeper problem. Most sensible programmers know that the best approach to this type of problem is to use someone else’s solution. There’s a great answer on stackoverflow that I used. But that violates the principles of TDD: instead of writing small bits of code to pass simple tests, I jumped straight to the solution. Now we have a chunk of code that doesn’t have unit tests. Should we retro-fit them? We’ll come back to that.

Example 2 – Low Pass Filter

Here’s another real-world example. I needed a low pass filter that could cut all audio frequencies above 4kHz out of an audio file sampled at 16kHz. The ideal low pass filter has unity gain for all frequencies below the cut-off frequency, and completely removes all frequencies above. The only trouble is, such a filter is impossible to create – you have to compromise.

A bit of research revealed that I probably wanted a Chebyshev filter with a low pass-band ripple and a fairly steep roll-off of 18dB per octave or more. Now, can TDD help me out making this thing? Not really. First of all, it’s even harder than the shuffle algorithm to verify in code. I’d need to make various signal generators such as sine wave and white noise generators, and then create Fast Fourier Transforms to measure the frequencies of audio that had been passed through the filter. Again, far more work than actually creating the filter.

But again, what is the sane approach to a problem like this? It is to port someone else’s solution (in my case I found some C code I could use) and then perform manual testing to prove the correctness of the code.

Example 3 – Fixture Scheduler

My third example is of an algorithm I have attempted a few times. It is to generate fixture lists. In many sports leagues, each team must play each other team twice a season – once home, once away. Ideally each team should normally alternate between playing at home and then away. Two home or away games in a row is OK from time to time, three is bad, and four in a row is not acceptable. In addition it is best if the two games between the same teams are separated by at least a month or two. Finally, there may be some additional constraints. Arsenal and Tottenham cannot both play at home in the same week. Same for Liverpool and Everton, and for Man Utd and Man City.

What you end up with is a problem that has multiple valid solutions, but not all solutions are equally good. For example, it would not go down too well if one team’s fixtures went HAHAHAHA while another’s went HHAAHHAAHHAA.

So we could (and should) certainly write unit tests that check that the generated fixtures meet the requirements without breaking any of the constraints. But fine tuning this algorithm is likely to at least be a partially manual process, as we tweak various aspects of it and then compare results to see if we have improved things or not. In fact, I can imagine the final solution to this containing some randomness, and you might run the fixture generator a few times, assigning a “score” to each solution and pick the best.

Should you use TDD for algorithms?

When you can borrow a ready-made and reliable algorithm, then retro-fitting a comprehensive unit test suite to it may actually be a waste of time better spent elsewhere.

Having said that, if you can write unit tests for an algorithm you need to develop yourself, then do so. One of TDD’s key benefits is simply that of helping you come up with a good public API, so even if you can’t work out how to thoroughly validate the output, it can still help you design the interface better. Of course, a test without an assert isn’t usually much of a test at all, it’s more like a code sample. But at the very least running it assures that it doesn’t crash, and there is usually something you can check about its output. For example, the filtered audio should be the same duration as the input, and not be entirely silent.

So we could end up with tests for our algorithm that don’t fully prove that our code is correct. It means that the refactoring safety net that a thorough unit test suite offers isn’t there for us. But certain classes of algorithms (such as my shuffle and low pass filter examples) are inherently unlikely to change. Once they work, we are done. Business rules and customer specific requirements on the other hand, can change repeatedly through the life of a system. Focus on testing those thoroughly and don’t worry too much if some of your algorithms cannot be verified automatically.

Tuesday, 28 June 2011

How to Unit Test Silverlight Apps

I recently tried writing some unit tests for a Silverlight 4 project. The first barrier I ran into is that the typical way of creating unit tests – create a DLL, reference NUnit and your assembly to test, doesn’t work, since you can’t mix Silverlight assemblies with standard .NET assemblies.

The next hurdle is that fact that the Silverlight developer tools do not include a unit testing framework, and searching for Silverlight unit testing tools on the web can lead you down a few dead ends. It turns out that the thing you need is the Silverlight Toolkit. If you are just doing Silverlight 4, you only need the April 2010 toolkit.

Once this is installed, you get a template that allows you to add a “Silverlight Unit Test Application” to your project. This is a separate Silverlight application that contains a custom UI just to run unit tests.

Now you can get down to business writing your unit tests. You have to use the Microsoft unit testing attributes, i.e. [TestClass], [TestMethod], [TestInitialize]. You can also use the [Tag] attribute, which is the equivalent of NUnit’s [Category] to group your unit tests, allowing you to run a subset of tests easily.

It has a fairly nice GUI (though not without a few quirks) that displays the outcome of your tests for you, and allows you to copy the test results to your clipboard:

Unfortunately it doesn’t support Assert.Inconclusive (gets counted as a failure), but apart from that it works as expected.

Testing GUI Components

Another interesting capability that the Silverlight Unit Testing framework offers is that it can actually host your user controls and run tests against them.

Unfortunately some of the documentation that is out there is a little out of date, so I’ll run through the basics here.

You start off by creating a test class that inherits from the SilverlightTest base class. Then in a method decorated with the TestInitialize attribute, create an instance of your user control and add it to the TestPanel (a property on the base class, note that it has changed name since a lot of older web tutorials were written).

[TestInitialize]
public void SetUp()
{
    var control = new MyUserControl();
    this.TestPanel.Children.Add(control);
}

Note that if you don’t do this in the TestInitialize method and try to do it in your test method itself, the control won’t have time to actually load.

There is a nasty gotcha. If you user control uses something from a resource dictionary, then the creation of your control will fail in the TestInitialize method, but for some reason the test framework ploughs on and calls the TestMethod anyway, resulting in a confusing failure message. You need to get the contents of your app.xaml into the unit test application.

I already had my resource dictionaries split out into multiple files, which helped a lot. I added the resource xaml files to the test application as linked files, and then just referenced them as Merged dictionaries in my app.xaml:

<Application.Resources>
    <ResourceDictionary>
        <ResourceDictionary.MergedDictionaries>
            <ResourceDictionary Source="MarkRoundButton.xaml"/>
        </ResourceDictionary.MergedDictionaries>
    </ResourceDictionary>
</Application.Resources>

Now in your actual test method, you can now perform tests on your user control, knowing that it has been properly created and sized.

[TestMethod]
public void DisplayDefaultSize()
{
    Assert.IsTrue(tc.ActualWidth > 0);
}

There are also some powerful capabilities in there to allow you to run “asynchronous” tests, but I will save that for a future blog post as I have a few interesting ideas for how they could be used.

Monday, 27 June 2011

Test Resistant Code #2–Markup is Code Too

After blogging about one of the biggest obstacles to doing TDD, I thought I’d follow up with another form of test resistant code, and that is markup.

Now calling markup ‘code’ might seem odd, because we have been led to believe that these are two completely different things. After all applications = code + markup right? It’s the code that has the logic and all the important stuff to test in. Markup is just structured data.

But of course, we all know that you can have bugs in markup. 100% test coverage of your code alone does not guarantee that your customer will have a happy experience. In fact, I would go so far as to say, my experience with Silverlight and WPF programming is that the vast majority of the bugs are to do with me getting the markup wrong in some way. 100% unit coverage of my ViewModels is nice, but doesn’t really help me solve the most challenging problems I face. If I declare an animation in XAML, I have to watch it to know if I got it right. If I get it wrong, I see nothing, and I have no idea why.

Exhibit A – XAML

Here’s a bunch of XAML I wrote for a Silverlight star rating control I made recently:

<UserControl x:Class="MarkHeath.StarRating.Star"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d"
    d:DesignHeight="34" d:DesignWidth="34" Foreground="#C0C000">
    <Canvas x:Name="LayoutRoot">
        <Canvas.RenderTransform>
            <ScaleTransform x:Name="scaleTransform" />
        </Canvas.RenderTransform>
        <Path x:Name="Fill" Fill="{Binding Parent.StarFillBrush, ElementName=LayoutRoot}" 
                Data="M 2,12 l 10,0 l 5,-10 l 5,10 l 10,0 l -7,10 l 2,10 l -10,-5 l -10,5 l 2,-10 Z" />

        <Path x:Name="HalfFill" Fill="{Binding Parent.HalfFillBrush, ElementName=LayoutRoot}"                
                Data="M 2,12 l 10,0 l 5,-10 v 25 l -10,5 l 2,-10 Z M 34,34" />

        <Path x:Name="Outline" Stroke="{Binding Parent.Foreground, ElementName=LayoutRoot}"     
                StrokeThickness="{Binding Parent.StrokeThickness, ElementName=LayoutRoot}" 
                StrokeLineJoin="{Binding Parent.StrokeLineJoin, ElementName=LayoutRoot}"
                Data="M 2,12 l 10,0 l 5,-10 l 5,10 l 10,0 l -7,10 l 2,10 l -10,-5 l -10,5 l 2,-10 Z" />
        <!-- width and height of this star is 34-->
    </Canvas>
</UserControl>

Does this XAML have any ‘bugs’ in it? Well, not that I know of, but in the process of developing it, I ran into a bunch of problems. For a long time the top level element was a Grid rather than a Canvas, but strange things were happening when I resized it. I also resisted putting the ScaleTransform in there for a while, convinced I could do something with the Stretch properties on the Path objects to get the resizing to happen automatically (it was the half star that spoiled it).

Was this ‘real’ development? I think it was. XAML is after all just a way of writing creating objects and setting properties on them with a syntax that is more convenient for designer tools to work with. I could create exactly the same effect by creating a UserControl class, and setting its Content property to an instance of Canvas and so on.

Could I have developed this using TDD? Hmmm. Well you can always write a test for a piece of code. The trouble comes when you try to decide whether that test has passed or not. For this code, the only sane way of validating it was for me to visually inspect it at various sizes and see if it looked right.

Exhibit B – Build Scripts

I recently had the misfortune of having to do some work on MSBuild scripts. Don’t get me wrong, MSBuild is a powerful and feature rich build framework. It’s just that I had to resort to books and StackOverflow every time I attempted the simplest of tasks.

It is not long before it becomes clear that a build script is a program, and not simply structured data. In MSBuild, you can declare variables, call subroutines (‘Targets’), and write if statements (‘Conditions’). A build script is a program with a very important task – it creates the application you ship to customers. Can there be bugs in a build script? Yes. Can those bugs affect the customer? Yes. Would it be useful to be able to test the build script too? Yes.

Which is why felt a bit jealous when I watched a recent presentation from Shay Friedman on IronRuby, and discovered that with rake you can write your build scripts in Ruby. I’ve not learned the Ruby language myself yet, but I would love my build processes to be defined using IronPython instead of some XML monstrosity. Not only would it be far more productive, it would be testable too.

Exhibit C – P/Invoke

My final example might take you by surprise. As part of NAudio, I write a lot of P/Invoke wrappers. It is a laborious, error-prone process. A bug in one of those wrappers can cause bad things to happen, like crash your application, or even reboot your PC if your soundcard drivers are not well behaved.

Now although these wrappers are written in C#, they are not ‘code’ in the classic sense. They are just a bunch of class and method definitions. There is no logic and there are no expressions in there at all. In fact, it is the attributes that they are decorated with that do a lot of the work. Here’s a relatively straightforward example that someone else contributed recently:

[StructLayout(LayoutKind.Explicit)]
struct MmTime
{
    public const int TIME_MS = 0x0001;
    public const int TIME_SAMPLES = 0x0002;
    public const int TIME_BYTES = 0x0004;

    [FieldOffset(0)]
    public UInt32 wType;
    [FieldOffset(4)]
    public UInt32 ms;
    [FieldOffset(4)]
    public UInt32 sample;
    [FieldOffset(4)]
    public UInt32 cb;
    [FieldOffset(4)]
    public UInt32 ticks;
    [FieldOffset(4)]
    public Byte smpteHour;
    [FieldOffset(5)]
    public Byte smpteMin;
    [FieldOffset(6)]
    public Byte smpteSec;
    [FieldOffset(7)]
    public Byte smpteFrame;
    [FieldOffset(8)]
    public Byte smpteFps;
    [FieldOffset(9)]
    public Byte smpteDummy;
    [FieldOffset(10)]
    public Byte smptePad0;
    [FieldOffset(11)]
    public Byte smptePad1;
    [FieldOffset(4)]
    public UInt32 midiSongPtrPos;
}

Can classes like this be written with TDD? Well sort of, but the tests wouldn’t really test anything, except that I wrote the code I thought I should write. How would I verify it? I can only prove these wrappers really work by actually calling the real Windows APIs – i.e. by running integration tests.

Now as it happens I can (and do) have a bunch of automated integration tests for NAudio. But it gets worse. Some of these wrappers I can only properly test by running them on Windows XP and Windows 7, in a 32 bit process and a 64 bit process, and with a bunch of different soundcards configured for every possible bit depth and byte ordering. Maybe in the future, virtualization platforms and unit test frameworks will become so integrated that I can just use attributes to tell NUnit to run the test on a whole host of emulated operating systems, processor architectures, and soundcards. But even that wouldn’t get me away from the fact that I actually have to hear the sound coming out of the speakers to know for sure that my test worked.

Has TDD over-promised?

So are these examples good counter-points to TDD? Or are they things that TDD never promised to solve in the first place? Uncle Bob may be able to run his test suite and ship Fitnesse if it goes green without checking anything else at all. But I can’t see how I could ever get to that point with NAudio. Maybe the majority of code we do in markup can’t be tested without manual tests. And if they can’t be tested without manual tests, does TDD even make sense as a practice for developing them in the first place?

For TDD to truly “win the war”, it needs needs to come clean about what problems it can solve for us, and what problems are left unsolved. Some test-resistant code (such as tightly coupled code) can be made testable. Other test-resistant code will always need an amount of manual test and verification. The idea that we could get to the point of “not needing a QA department” does not seem realistic to me. TDD may mean a smaller QA department, but we cannot get away from the need for real people to verify things with their eyes and ears, and to verify them on a bunch of different hardware configurations that cannot be trivially emulated.

Friday, 24 June 2011

Test resistant code and the battle for TDD

I have been watching a number of videos from the NDC 2011 conference recently, and noticed a number of speakers expressing the sentiment that TDD has won.

By “won”, they don’t mean that everyone is using it, because true TDD practitioners are still very much in the minority, Even those who claim to be doing TDD are in reality only doing it sometimes. In fact, I would suggest that even the practice of writing unit tests for everything, let alone writing them first, is far from being the norm in the industry.

Of course, what they mean is that the argument has been won. No prominent thought-leaders are speaking out against TDD; its benefits are clear and obvious. The theory is that, it is only a matter of time before we know no other way of working.

Or is it?

There is a problem with doing TDD in languages like C# that its most vocal proponents are not talking enough about. And that is that a lot of the code we write is test resistant. By test resistant, I don’t mean “impossible to test”. I just mean that the effort required to shape it into a form that can be tested is so great that even if we are really sold on the idea of TDD, we give up in frustration once we actually try it.

I’ve got a whole list of different types of code that I consider to be “test-resistant”, but I’ll just focus in on one for the purposes of this post. And that is code that has lots of external dependencies. TDD is quite straightforward if you happen to be writing a program to calculate the prime factors of a number; you don’t have any significant external dependencies to worry about. The same is largely true if you happen to be writing a unit testing framework or an IoC container. But for many, and quite probably the majority of us, the code we write interacts with all kinds of nasty hard to test external stuff. And that stuff is what gets in our way.

External dependencies

External dependencies come in many flavours. Does your class talk to the file-system or to a database? Does it inherit from a UserControl base class? Does it create threads? Does it talk across the network? Does it use a third party library of any sort? If the answer to any of these questions is yes, the chances are your class is test-resistant.

Such classes can of course be constructed and have their methods called by an automated testing framework, but those tests will be “integration” tests. Their success depends on the environment in which they are run. Integration tests have their place, but if large parts of our codebase can only be covered by integration tests, then we have lost the benefits of TDD. We can’t quickly run the tests and prove that our system is still working.

Suppose we draw a dependency diagram of all the classes in our application arranged inside a circle. If a class has any external dependencies we’ll draw it on the edge of the circle. If a class only depends on other classes we wrote, we’ll draw it in the middle. We might end up with something looking like this:

In my diagram, the green squares in the middle are the classes that we will probably be able to unit test without too much pain. They depend only on other code we have written (or on trivial to test framework classes like String, TimeSpan, Point etc).

The orange squares represent classes that we might be able to ‘unit’ test, since file systems and databases are reasonably predictable. We could create a test database to run against, or use temporary files that are deleted after the test finishes.

The red squares represent classes that will be a real pain to test. If we must talk to a web-service, what happens when it is not available? If we are writing a GUI component, we probably have to use some kind of unwieldy automation tool to create it and simulate mouse-clicks and keyboard presses, and use bitmap comparisons to see if the right thing happened.

DIP to the rescue?

But hang on a minute, don’t we have a solution to this problem already? It’s called the “Dependency Inversion Principle”. Each and every external dependency should be hidden behind an interface. The concrete implementers of those interfaces should write the absolute minimal code to fulfil those interfaces, with no logic whatsoever.

Now suddenly all our business logic has moved inside the circle. The remaining concrete classes on the edge still need to be covered by integration tests, but we can verify all the decisions, algorithms and rules that make up our application using fast, repeatable, in-memory unit tests.

All’s well then. External dependencies are not a barrier to TDD after all. Or are they?

How many interfaces would you actually need to create to get to this utopian state where all your logic is testable? If the applications you write are anything like the ones I work on, the answer is, a lot.

IFileSystem

Let’s work through the file system as an example. We could create IFileSystem and decree that every class in our application that needs to access the disk goes via IFileSystem. What methods would it need to have? A quick scan through the application I am currently working on reveals the following dependencies on static methods in the System.IO namespace:

File.Exists
File.Delete
File.Copy
File.OpenWrite
File.Open
File.Move
File.Create
File.GetAttributes
File.SetAttributes
File.WriteAllBytes
File.ReadLines
File.ReadAllLines
Directory,Exists
Directory.CreateDirectory
Directory.GetFiles
Directory.GetDirectories
Directory.Delete
DriveInfo.GetDrives

There’s probably some others that I missed, but that doesn’t seem too bad. Sure it would take a long time to go through the entire app and make everyone use IFileSystem, but if we had used TDD, then IFileSystem could have been in there from the beginning. We could start off with just a few methods, and then add new ones in on an as-needed basis.

The trouble is, our abstraction layer needs to go deeper than just wrappers for all the static methods on System.IO. For example, Directory.GetFiles returns a FileInfo object. But the FileInfo class has all kinds of methods on it that allow external dependencies to sneak back into our classes via the back door. What’s more, it’s a sealed class, so we can’t fake it for our unit tests anyway. So now we need to create IFileInfo for our IFileSystem to return when you ask for the files in a folder. If we keep going in this direction it won’t be long before we end up writing an abstraction layer for the entire .NET framework class library.

Interface Explosion

This is a problem. You could blame Microsoft. Maybe the BCL/FCL should have come with interfaces for everything that was more than just a trivial data transfer object. That would certainly have made life easier for unit testing. But this would also add literally thousands of interfaces. And if we wanted to apply the “interface segregation principle” as well, we’d end up needing to make even more interfaces, because the ones we had were a bad fit for the classes that need to consume them.

So for us to do TDD properly in C#, we need to get used to the idea of making lots of interfaces. It will be like the good old days of C/C++ programming all over again. For every class, you also need to make a header / interface file.

Mocks to the Rescue?

Is there another way? Well I suppose we could buy those commercial tools that have the power to replace any concrete dependency with a mock object. Or maybe Microsoft’s Moles can dig us out of this hole. But are these frameworks solutions to problems we shouldn’t be dealing with in the first place?

There is of course a whole class of languages that doesn’t suffer from this problem at all. And that is dynamic languages. Mocking a dependency is trivial with a dynamically typed language. The concept of interfaces is not needed at all.

This makes me think that if TDD really is going to win, dynamically typed languages need to win. The limited class of problems that a statically typed language can protect us from can quite easily be detected with a good suite of unit tests. It leaves us in a kind of catch 22 situation. Our statically typed languages are hindering us from embracing TDD, but until we are really doing TDD, we’re not ready to let go of the statically typed safety net.

Maybe language innovations like C# 4’s dynamic or the coming compiler as a service in the next .NET will afford enough flexibility to make TDD flow more naturally. But I get the feeling that TDD still has a few battles to win before the war can truly be declared as over.

read the follow-on articles here:

Saturday, 18 June 2011

SOLID Code is Mergeable Code

Not every developer I speak to is convinced of the importance of adhering to the “SOLID” principles. They can seem a bit too abstract and “computer sciency” and disconnected from the immediate needs of delivering working software on time. “SOLID makes writing unit tests easier? So what, the customer didn’t ask for unit tests, they asked for an order billing system”.

But one of the very concrete benefits of adhering to SOLID is that it eases merging for projects that require parallel development on multiple branches. Many of the commercial products I have worked on have required old versions to be supported with bug fixes and new features for many years. It is not uncommon to have more than 10 active branches of code. As nice as it would be to simply require every customer to upgrade to the latest version, that is not possible in all commercial environments.

One customer took well over a year to roll out our product to all their numerous sites. During that time we released two new major versions of our software. Another customer requires a rigorous and lengthy approval testing phase before they will accept anything new at all. The result is that features and bug fixes often have to be merged through multiple branches, some of which have diverged significantly over the years.

So how do the SOLID principles aid merging?

The Single Responsibility Principle states that each class should only have a single reason to change. If you have a class that constantly requires changing on every branch, creating regular merge conflicts, the chances are it violates SRP. If each class has only a single responsibility there is only a merge conflict if both branches genuinely need to modify that single responsibility.

Classes that adhere to the Open Closed Principle don’t need to be changed at all (except to fix bugs). Instead, they can be extended from the outside. Two branches can therefore extend the same class in completely different ways without any merge conflict at all – each branch simply adds a new class to the project.

Designs that violate the Liskov Substitution Principle result in lots of switch and if statements littered throughout the code, that must be modified every time we introduce a new subclass into the system. In the canonical ‘shapes’ example, when the Triangle class is introduced on one branch, and the Hexagon class is created on another, the merge is trivial if the code abides by LSP. If the code doesn’t, you can expect a lot of merge conflicts as every place in the code that uses the Shape base class is likely to have been modified in both branches.

The Interface Segregation Principle is in some ways the SRP for interfaces. If you adhere to ISP, you have many small, focused interfaces instead of few large, bloated interfaces. And the more your codebase is broken down into smaller parts, the lower the chances of two branches both needing to change the same files.

Finally, the Dependency Inversion Principle might not at first glance seem relevant to merges. But every application of DIP is in fact also a separation of concerns. The concern of creating your dependencies is separated away from actually using them. So applying DIP is always a step in the direction of SRP, which means you have smaller classes with more focused responsibilities. But DIP has another power that turns out to be very relevant for merging.

Code that adheres to the Dependency Inversion Principle is much easier to unit test. And a good suite of unit tests is an invaluable tool when merging lots of changes between parallel branches. This is because it is quite easily possible for merged code to compile and yet still be broken. When merges between branches are taking place on a weekly basis, there is simply not enough time to run a full manual regression test each time. Unit tests can step into the breach and give a high level of confidence that a merge has not broken existing features. This benefit alone repays the time invested in creating unit tests.

In summary, SOLID code is mergeable code. SOLID code is also unit testable code, and merges verified by unit tests are the safest kind of merges. If your development process involves working on parallel branches, you will save yourself a lot of pain by mastering the SOLID principles, and protecting legacy features with good unit test coverage.

Thursday, 16 June 2011

Essential Developer Principles #2–Don’t Reinvent the Wheel

For some reason, developers have a tendency towards a “not invented here” mentality. Sometimes it is simply because we are unaware of any pre-existing solutions that address out problem. But very often, we suspect that it will be far too complicated to use a third-party framework, and much quicker and simpler to write our own. After all, if we write our own we can make it work exactly how we want it to.

Reinventing the Framework

The arguments for reinventing the wheel initially seem compelling, mainly because most programming problems seem a lot simpler at first than they really are.

So we head off to create our own chart library, our own deployment mechanism, our own XML parser, our own 3D rendering engine, our own unit test framework, our own database, and so on.

Months later we realise that our own custom framework is incomplete, undocumented and bug ridden. Instead of working on features that our customers have actually asked for, all our effort and attention has been diverted into maintaining these supporting libraries.

It is almost always the right choice to take advantage of a mature, well documented, existing framework, instead of deciding to create a bespoke implementation.

Repeatedly Reinventing

Another variation on the reinventing the wheel problem is when a single (often enterprise scale) application contains multiple solutions to the same problem. For example, there might be two separate bits of code devoted to reading and writing CSV files. Or several different systems for messaging between threads. This usually happens because the the logic is so tightly coupled to its context that we can’t reuse it even if we want to.

... except when ...

As with most developer principles, this one has some exceptions. I think there are three main cases in which it is OK to reinvent the wheel. These are…

1. Reinvent to learn

We don’t need another blog engine, but we do need more developers who know how to build one. Likewise, we probably don’t need yet another ORM, or yet another IoC container, or yet another MVVM or MVC framework. But the process of building one, even if it is never completed, is an invaluable learning exercise. So go ahead, invent your own version control system, encryption algorithm or operating system. Just don’t use it in your next commercial product unless you have a lot of spare time on your hands.

2. Reinvent a simpler wheel

There are times when the existing offerings are so powerful and feature-rich that it seems overkill to drag a huge framework into your application just to use a fraction of its powers. In these cases it might make sense to make your own lean, stripped down, focused component that does only what you need.

However, beware of feature creep, and beware of problems that seem simple until you begin to tackle them. A good example of creating simpler alternatives is Micro-ORMs, which are just a few hundred lines of code, stripped down to the bare essentials of what is actually needed for the task at hand.

3. Reinvent a better wheel

There are those rare people who can look at the existing frameworks, see a limitation with them, imagine a better alternative, and build it. It takes a lot of time to pull this off, so it doesn’t make sense if you only need it for one project. If however, you are building a framework to base all the applications you ever write on, there is a chance that it will pay for itself over time.

An example of this in action is FubuMVC, an alternative to ASP.NET MVC that was designed to work exactly the way its creators wanted. The key is to realise that they have gone on to build a lot of commercial applications on that framework.

In summary, don’t reinvent the wheel unless you have a really good reason to. You’ll end up wasting a lot of time and resources on development that is only tangentially related to your business goals.

Wednesday, 15 June 2011

Essential Developer Principles #1–Divide and Conquer

With this post I am starting a new series in which I intend to cover a miscellany of developer ‘principles’. They are in no particular order, and I’m not promising any great regularity but I am intending to use some of the material I present here as part of the developer training where I work, so please feel free to suggest improvements or offer counterpoints.

The essence of software development is divide and conquer. You take a large problem, and break it into smaller pieces. Each of those smaller problems is broken into further pieces until you finally get down to something you can actually solve with lines of code.

If you can’t break problems down into constituent parts, you don’t have what it takes to be a programmer. Apparently some who purport to be developers can’t even solve trivial problems like FizzBuzz. They can’t see that to solve FizzBuzz (the “big” problem), all they need to do is solve a few much simpler problems (count from 1 to 100, see if a number is divisible by three, see if a number is divisible by 5, print a message). Once you have done that, the pieces aren’t too hard to put together into a solution to the big problem.

Cutting the Cake

But being a good developer is not just about being able to decompose problems. If you need to cut a square birthday cake into four equal pieces there are several ways of doing it. You could cut it into four squares, four triangles, or four rectangles, to name just a few of the options. Which is best? Well that depends on whether there are chocolate buttons on top of the cake. You’d better make sure everyone gets an equal share of those too or there will be trouble.

A good developer doesn’t just see that a problem can be broken down; they see several ways to break it down and select the most appropriate.

For example, you can slice your ASP.NET application up by every web page having all its logic and database access in the code behind – vertical slices if you like. Or you can slice it up in a different way and have your presentation, your business logic, and your database access as horizontal slices. Both are ways of taking a big problem and cutting it into smaller pieces. Hopefully I don’t need to tell you which of those is a better choice.

Keep Cutting

Alongside the mistake of cutting a problem up in the wrong way likes the equally dangerous mistake of failing to cut the problem up small enough.

In the world of .NET you might decide that for a particular problem you need two applications – a client and a server. Then you decide that the client needs two assemblies, or DLLs. And inside each of those modules you create a couple of classes.

All this is well and good, but it means that your tree of hierarchy has only got three levels. As your implementation progresses, something has to receive all the new code you will write. Unless you are willing to break the problem up further, what will happen is that those classes on the third level will grow to contain hundreds of long and complicated methods. In other words, our classes bloat:

It doesn’t need to be this way. We can almost always divide the problem that a class solves into two or more smaller, more focussed problems. Understanding the divide and conquer principle means we keep breaking the problem down until we have classes that adhere to the “single responsibility principle” – they do just one thing. These classes will be made up of a small number of short methods. Essentially, we add extra levels to our hierarchy, each one breaking the problem down smaller until we reach an appropriately simple, understandable, and testable chunk:

I’ve not shown it on my diagram, but an important side benefit is that a lot of the small components we create when we break our problems up like this turn out to be reusable. We’ve actually made some of the other problems in our application easier to solve, by virtue of breaking things down to an appropriate level of granularity. There are more classes, but less code.

And that’s it. Divide and conquer is all it takes to be a programmer. But to be a great programmer, you need to know where to cut, and to keep on cutting until you’ve got everything into bite-sized chunks.