Tuesday, 14 September 2010

Codebase Reformation

I am currently looking into how the architecture of a very large software product (now over 1 million lines of code) can be improved to make it more maintainable and extensible into the future. Inevitably on a project of this size, many of the assumptions and expectations held at the beginning have not proved to be correct. The product has been extended in ways that no one could have foreseen. Sometimes large amounts of technical debt have been introduced as we rushed to meet a vital deadline. New technologies have emerged (it was started as a .NET 1.1 product) which are more suited to the task than anything that was around back then. Additionally, those of us on the development team have grown as programmers during that time. What looked like good code to us back then now looks decidedly poor.

So my task over the last few months has been to produce a document of recommendations about what architectural changes should be made to improve things. I have a good idea of where we are now, and where we ought to be going. But there is one thing that concerns me, and it is summed up well in the following quote:

The reformer is always right about what is wrong. He is generally wrong about what is right. —G.K. Chesterton

In other words, it is one thing to look at a codebase and observe all the things that are wrong about it. It is another thing entirely to know what the correct solution is. Experience tells us that simply adopting a new technology (“let’s use IoC containers”, “let’s use WPF”) will typically solve one set of problems but introduce another set.

The correct approach in my view is to recognise that we are trying to move between two moving targets. In other words, “where we are” is always moving, since any living codebase is being continually worked on. But “where we want to be” is also a moving target, as we grow in our understanding of what the system needs to do, and of what constitutes a well-architected system.

Bearing this in mind, it is therefore a mistake to imagine that you can, or should, attempt to “fix” the architecture of a large system in one gigantic refactoring session. There may be a case for making certain major changes to prepare the way for specific new features, or to address significant areas of technical debt, but in my view the best approach to codebase reformation is continual refactoring, allowing our vision of where we are heading to be modified as our horizons expand.

Wednesday, 14 July 2010

Running NUnit tests on .NET 4 assemblies with MSBuild Part 2

Yesterday, I posted a way to run NUnit tests on .NET 4 assemblies with MSBuild. Although it worked, I was not 100% happy with that way of going about it. Thanks to a comment from Travis Laborde, I can now present a simpler way, making use of the Exec task in MSBuild and the /framework switch on nunit-console.exe. Here’s the project file syntax:

<Target Name="Test2" DependsOnTargets="Build">
  <Exec
     Command="&quot;C:\Program Files\NUnit 2.5.5\bin\net-2.0\nunit-console.exe&quot; /framework=4.0.30319 @(TestAssembly)"
     WorkingDirectory="." />
</Target>

Tuesday, 13 July 2010

Running NUnit tests on .NET 4 assemblies with MSBuild

If you want to run NUnit tests with MSBuild, then you need to download and install the MSBuild Community tasks. This allows you to create a unit test target like this:

<ItemGroup>
  <TestAssembly Include="ClientBin\*Tests.dll" />
</ItemGroup>
<Target Name="Test" DependsOnTargets="Build">
  <NUnit Assemblies="@(TestAssembly)"
         WorkingDirectory="."
         ToolPath="C:\Program Files\NUnit 2.5.5\bin\net-2.0"
         />
</Target>
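For the NUnit task to be recognised, the project file also needs to import the community tasks targets file. A typical import looks like this (the path shown is the default install location; adjust it if yours differs):

```xml
<!-- import the MSBuild Community Tasks targets so the NUnit task is available -->
<Import Project="$(MSBuildExtensionsPath)\MSBuildCommunityTasks\MSBuild.Community.Tasks.Targets" />
```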

(Most examples don’t show the need to specify a ToolPath, but I have found I need it, perhaps because nunit-console.exe isn’t in my Path.)

The trouble is, if the assemblies containing the unit tests have been built with the .NET 4 framework, you will get the following error:

C:\Program Files\NUnit 2.5.5\bin\net-2.0\nunit-console.exe /nologo ClientBin\Nice.Inform.Client.Tests.dll
ProcessModel: Default    DomainUsage: Single
Execution Runtime: net-2.0
Unhandled Exception:
System.BadImageFormatException: Could not load file or assembly 'D:\TFS\Trial\Inform5\ClientBin\Client.Tests.dll' or one of its dependencies. This assembly is built by a runtime newer than the currently loaded runtime and cannot be loaded.
File name: 'D:\TFS\Trial\Inform5\ClientBin\Client.Tests.dll'

What is needed is to ensure that nunit-console.exe runs against the .NET 4 framework. The way I achieved this was to make a copy of the net-2.0 folder that comes with NUnit 2.5.5 and rename it to net-4.0 (n.b. make sure the lib folder comes along too as nunit-console.exe depends on its contents). Then, I edited the nunit-console.exe.config file to have the following contents (the key bit is to add the supportedRuntime setting):

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <!-- Set the level for tracing NUnit itself -->
  <!-- 0=Off 1=Error 2=Warning 3=Info 4=Debug -->
  <system.diagnostics>
    <switches>
       <add name="NTrace" value="0" />
    </switches>
  </system.diagnostics>

  <startup>
    <supportedRuntime version="v4.0"/>
  </startup>

  <runtime>
    <!-- We need this so test exceptions don't crash NUnit -->
    <legacyUnhandledExceptionPolicy enabled="1" />

    <!-- Look for addins in the addins directory for now -->
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <probing privatePath="lib;addins"/>
    </assemblyBinding>
  </runtime>
</configuration>

Now all that is needed is to update the ToolPath in the MSBuild script to point to the new location:

<Target Name="Test" DependsOnTargets="Build">
  <NUnit Assemblies="@(TestAssembly)"
         WorkingDirectory="."
         ToolPath="C:\Program Files\NUnit 2.5.5\bin\net-4.0"
         />
</Target>

I’d be interested to hear if there is an easier way of solving this problem, though, as I don’t really want to require every developer to perform these steps manually on their machine.

Thursday, 13 May 2010

Converting MP3 to WAV with NAudio

One of the more common support requests with NAudio is how to convert MP3 files to WAV. Here’s a simple function that will do just that:

public static void Mp3ToWav(string mp3File, string outputFile)
{
    using (Mp3FileReader reader = new Mp3FileReader(mp3File))
    {
        using (WaveStream pcmStream = WaveFormatConversionStream.CreatePcmStream(reader))
        {
            WaveFileWriter.CreateWaveFile(outputFile, pcmStream);
        }
    }
}

… and that’s all there is to it.

Notes:

  1. There is no need to wrap pcmStream with a BlockAlignReductionStream since we are not repositioning the MP3 file, just reading it from start to finish.
  2. It uses the ACM MP3 decoder that comes with Windows to decompress the MP3 data.
  3. To pass the MP3 data to the ACM converter, NAudio needs to do some parsing of the MP3 file itself (specifically the MP3 frames and ID3 tags). Unfortunately, there are some MP3 files that NAudio cannot parse (notably the sample ones that come with Windows 7). My attempts at tracking down the cause of this problem have so far failed: after reading the ID3v2 tag, we don’t end up in the right place in the MP3 file to read a valid MP3Frame (see MP3FileReader.cs). Please let me know if you think you have a solution to this.
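For completeness, here’s a minimal call site for the function above. The file paths are just placeholders; substitute real files on your machine:

```csharp
// hypothetical usage of the Mp3ToWav helper defined above;
// the paths are placeholders, not files shipped with NAudio
string mp3File = @"C:\Users\Public\Music\example.mp3";
string outputFile = @"C:\Users\Public\Music\example.wav";
Mp3ToWav(mp3File, outputFile);
```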

Wednesday, 12 May 2010

LINQ to General Election Part 2 – LINQ to XLS

In my previous post, I demonstrated how to run LINQ queries against the UK general election data that the Guardian newspaper has made available in JSON format. They additionally made a Google spreadsheet available which contains not only the winner in every constituency but also the number of votes for each candidate who stood.

Now it might be possible to run LINQ queries directly against the Google spreadsheet using the Google Data APIs (perhaps an experiment for another day), but I chose instead to download the spreadsheet as a Microsoft Excel spreadsheet and run my LINQ queries against that instead.

First of all, I needed a LINQ to Excel provider and found linqtoexcel. Unfortunately, it requires you to reference several DLLs, but once you have done so, it is pretty straightforward to use.

Mapping the Spreadsheet

One nice feature of linqtoexcel is that it makes it easy to map your spreadsheet data onto a strongly typed class. By default it will simply look for properties matching the column names. Here’s my election result class:

public class Result
{
    public string Seat { get; set; }
    public string Candidate { get; set; }
    public string Party { get; set; }
    public int Vote { get; set; }
    public double VotePercent { get; set; }
    public string Winner { get; set; }
}

Now I can open the XLS file using the ExcelQueryFactory class. I also need to add a custom mapping, since the title of the VotePercent column is not a valid C# identifier. Note that you don’t have to map all the fields; I have just picked out the six that I needed for my queries.

var repo = new ExcelQueryFactory();
repo.FileName = @"..\..\..\Data\General election 2010 results.xls";
repo.AddMapping<Result>(x => x.VotePercent, "%Vote");

Querying the Data

Having done this, we are now in a position to perform our queries. We can specify the worksheet name from which our data comes (in this case “FULL CONSTITUENCY DATA”). Using strongly typed classes makes the LINQ much more readable than in the JSON example.

var unluckyLosers = from p in repo.Worksheet<Result>("FULL CONSTITUENCY DATA")
                    where (p.Party != p.Winner) && (p.VotePercent > 42)
                    orderby p.VotePercent ascending
                    select p;

Console.WriteLine("Unlucky Losers:");
foreach (var winner in unluckyLosers)
{
    Console.WriteLine("{2}%: {0} ({1})", winner.Candidate, winner.Party, winner.VotePercent);
}

Unfortunately, there seems to be a bug: the orderby clause is ignored. I tried a few different sort fields, but with no success. The output from the above query is as follows:

Unlucky Losers:
44.52%: King, Nick (C)
45.51%: Connor, Rodney (Ind)
43.79%: Kelley, Claire (LD)
42.11%: Dismore, Andrew (Lab)
42.01%: Harris, Evan (LD)
42.81%: Kramer, Susan (LD)
42.55%: Throup, Maggie (C)
44.51%: Rees-Mogg, Annunziata (C)
42.36%: Stroud, Philippa (C)
42.2%: Formosa, Mark (C)
42.53%: Heathcoat-Amory, David (C)
43.08%: Tod, Martin (LD)

Patently, they have not been sorted in the requested order. I have filed a bug against linqtoexcel. Hopefully it can be resolved soon. Update: A fix was released shortly after my bug report - linqtoexcel 1.3.70 works fine.

Tuesday, 11 May 2010

LINQ to General Election Part 1 – LINQ to JSON

I discovered this week that the Guardian newspaper is making the UK General Election results available in JSON format. Their politics API gives access to all kinds of results, party and candidate information.

What I wanted was to be able to quickly and easily perform some of my own queries on the election results. Having not worked with JSON before, I needed something that could parse it and allow me to run LINQ queries on it. My search led me to Json.NET, a very easy-to-use library that allows you to parse JSON into an object that you can run LINQ queries against.

Visualising JSON

When working with JSON, you need to have some kind of tool that will let you explore the structure of the JSON data. I found a handy utility called JsonViewer that parses it into a tree format. Alternatively, you can use this online JSON formatter.

Parsing JSON

The first step is to download the JSON and parse it into a JObject:

string url = "http://www.guardian.co.uk/politics/api/general-election/2010/results/json";
WebClient wc = new WebClient();
string json = wc.DownloadString(url); 
JObject o = JObject.Parse(json);

LINQ Queries

Now it is straightforward to write queries against the object. Notice that you can use Children() to select all the children under a specific node (useful for arrays).

var unpopularWinners = from winningMp in o["results"]["called-constituencies"].Children()["result"]["winning-mp"]
    where (decimal)winningMp["votes-as-percentage"] < (100 / 3M)
    orderby (decimal)winningMp["votes-as-percentage"] ascending
    select winningMp;

To get the values out of the instances of JToken returned, you have to remember to cast them to the appropriate type (e.g. int, string, decimal):

Console.WriteLine("Unpopular Winners:");
foreach (var winner in unpopularWinners)
{
    Console.WriteLine("{0:F1}% {1} ({2})", (decimal)winner["votes-as-percentage"], (string)winner["name"], (string)winner["party"]["name"]);
}

LINQ to JSON seems to support the full range of LINQ operations, including grouping:

var seatsPerParty = from winningMp in o["results"]["called-constituencies"].Children()["result"]["winning-mp"]
    group winningMp by (string)winningMp["party"]["name"] into g
    orderby g.Count() descending
    select new { Party = g.Key, Seats = g.Count(), TotalVotes = g.Sum(c => (int)c["votes-as-quantity"]) };
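The grouped results can then be written out like any other sequence; for example (the output format here is my own choice):

```csharp
// print each party's seat count and total votes from the grouping above
Console.WriteLine("Seats Per Party:");
foreach (var party in seatsPerParty)
{
    Console.WriteLine("{0}: {1} seats, {2} votes", party.Party, party.Seats, party.TotalVotes);
}
```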

Unfortunately, the Guardian do not provide one JSON file that contains the votes for all candidates, although it looks like you can get that information if you follow another link for each one. However, they did make the full results available in a Google spreadsheet, so I plan to follow up shortly with another post on how to perform LINQ queries against Excel data.

Tuesday, 20 April 2010

Technical Debt Interest Rates

I’ve been thinking a lot about “technical debt” recently. The metaphor is a helpful one – every poor design decision, every compromise, every quick and dirty hack results in technical debt which must be repaid at some point in the future.

Or must it? The reality is that some technical debt, possibly quite a large percentage of it, never actually incurs substantial repayments. This is quite different from real-world financial debt, which must always eventually be paid in full.

Consider the following example. You rush to get feature X out the door. A customer is paying big money for it, and you end up making all sorts of compromises to get it done in time. The code is a mess, a maintenance nightmare. You dread the day when you have to revisit it to make a tweak or add another feature. But that day never comes. For whatever reason, no support cases come in, no enhancements are requested. Maybe the customer never used it at all. What has happened is that you have effectively been let off your debt. Ironically, the decision to do it the “quick way” rather than the “right way” turned out to be a smart move.

At the opposite end of the spectrum there is that code change you rushed in to meet a relatively unimportant milestone. You knew at the time it wasn’t quite the best way, but it worked, and that’s all that really matters, right? The trouble comes when on top of that rather shaky foundation is built layer upon layer of new code, all inheriting its weaknesses and design flaws. Maybe at first the interest rate on that debt is quite low – it doesn’t slow you down too much. But several years later, once it is far too late to go back on the original decision, you begin paying dearly for the technical debt you introduced.

The basic trouble is, when you introduce technical debt you typically have no idea what interest rate you are borrowing at. If you are lucky, it’s 0% – and your shortcut won’t cost you a penny. In fact, it could save the company time and money. In other cases it will be a modest 5–10%. Future development is slowed down a little by your past indiscretions, but not by much. But sometimes you take out a debt whose interest rate is astronomical (or more commonly, starts out small but grows rapidly). It is not uncommon for it to reach levels above 500%. In other words, new development is taking five times longer than it ought to, simply because of the mess that the codebase is in.

I think technical debt is inevitable. Developers, let alone management, simply do not have the foresight to see the full future implications of all their decisions. But the moment you detect that you are building heavily on top of code that contains technical debt, alarm bells should be ringing. Address it early or it will be too late.