Sound Code: March 2014

Friday, 28 March 2014

How to record and play audio at the same time with NAudio

Quite often I get questions from people who would like to play audio that they are receiving over the network, or recording from the microphone, but also want to save that audio to WAV at the same time. This is actually quite easy to achieve, so long as you think in terms of a “signal chain” (which is something I talk about a lot in my audio courses on Pluralsight).

Basically, the usual strategy I recommend for playing audio that you receive over the network or from the microphone is to put it into a BufferedWaveProvider. You fill it with (PCM) audio as it becomes available, and then in its Read method, it returns the audio, or silence if the buffer is empty.

Normally, you’d pass the BufferedWaveProvider directly to the IWavePlayer device (such as WaveOut), but to implement save to WAV, we’ll first wrap it in a new signal chain component that we’ll create for this purpose. We’ll call it “SavingWaveProvider” and it will implement IWaveProvider. In its Read method, it will read from its source wave provider (the BufferedWaveProvider in our case), and write the audio to a WAV file before passing it on.

We’ll dispose the WaveFileWriter if we read 0 bytes from the source wave provider, which should normally indicate we have reached the end of playback. But we also make the whole class implement IDisposable, because BufferedWaveProvider is set up to always return the number of bytes we asked for in Read, so it will never reach the end itself.

Here’s the code for SavingWaveProvider:

class SavingWaveProvider : IWaveProvider, IDisposable
{
    private readonly IWaveProvider sourceWaveProvider;
    private readonly WaveFileWriter writer;
    private bool isWriterDisposed;

    public SavingWaveProvider(IWaveProvider sourceWaveProvider, string wavFilePath)
    {
        this.sourceWaveProvider = sourceWaveProvider;
        writer = new WaveFileWriter(wavFilePath, sourceWaveProvider.WaveFormat);
    }

    public int Read(byte[] buffer, int offset, int count)
    {
        var read = sourceWaveProvider.Read(buffer, offset, count);
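        // only write what was actually read from the source
        if (read > 0 && !isWriterDisposed)
        {
            writer.Write(buffer, offset, read);
        }
        // the source returned no data, so playback has reached the end
        if (read == 0)
        {
            Dispose(); // auto-dispose in case users forget
        }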
        return read;
    }

    public WaveFormat WaveFormat { get { return sourceWaveProvider.WaveFormat; } }

    public void Dispose()
    {
        if (!isWriterDisposed)
        {
            isWriterDisposed = true;
            writer.Dispose();
        }
    }
}

And here’s how you use it to both play and save audio at the same time (note this is a very simplified WPF app with two buttons and no checks that you don’t press Start twice in a row etc). The key is that we pass the BufferedWaveProvider into the constructor of the SavingWaveProvider, and then pass that to waveOut.Init. Then all we need to do is make sure we dispose SavingWaveProvider so that the WAV file header gets written correctly:

public partial class MainWindow : Window
{
    private WaveIn recorder;
    private BufferedWaveProvider bufferedWaveProvider;
    private SavingWaveProvider savingWaveProvider;
    private WaveOut player;

    public MainWindow()
    {
        InitializeComponent();
    }

    private void OnStartRecordingClick(object sender, RoutedEventArgs e)
    {
        // set up the recorder
        recorder = new WaveIn();
        recorder.DataAvailable += RecorderOnDataAvailable;

        // set up our signal chain
        bufferedWaveProvider = new BufferedWaveProvider(recorder.WaveFormat);
        savingWaveProvider = new SavingWaveProvider(bufferedWaveProvider, "temp.wav");

        // set up playback
        player = new WaveOut();
        player.Init(savingWaveProvider);

        // begin playback & record
        player.Play();
        recorder.StartRecording();
    }

    private void RecorderOnDataAvailable(object sender, WaveInEventArgs waveInEventArgs)
    {
        bufferedWaveProvider.AddSamples(waveInEventArgs.Buffer, 0, waveInEventArgs.BytesRecorded);
    }

    private void OnStopRecordingClick(object sender, RoutedEventArgs e)
    {
        // stop recording
        recorder.StopRecording();
        // stop playback
        player.Stop();
        // finalise the WAV file
        savingWaveProvider.Dispose();
    }
}

This technique isn’t only for saving audio that you record or receive over the network. It’s also a great way to get a copy of the audio you just played, which is very handy when you want to troubleshoot audio issues and want to get a copy of the exact audio that was sent to the soundcard. You can even insert many of these at different places in your signal chain, to hear what the audio sounded like earlier in the signal chain.
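The pass-through idea is not specific to NAudio. As a rough, language-neutral illustration, here is a minimal Python sketch that uses only the standard library wave module. The class names BufferedSource and SavingReader are hypothetical stand-ins for BufferedWaveProvider and SavingWaveProvider, and the audio format (mono, 16-bit, 8 kHz) is just an assumption for the example:

```python
import io
import wave


class BufferedSource:
    """Hypothetical stand-in for BufferedWaveProvider."""

    def __init__(self):
        self._buffer = bytearray()

    def add_samples(self, data):
        self._buffer.extend(data)

    def read(self, count):
        chunk = bytes(self._buffer[:count])
        del self._buffer[:count]
        # Pad with zero bytes (silence for signed 16-bit PCM) so that we
        # always return exactly `count` bytes, even when the buffer is empty.
        return chunk + bytes(count - len(chunk))


class SavingReader:
    """Hypothetical pass-through stage: saves what it reads, then returns it."""

    def __init__(self, source, target, channels=1, sample_width=2, rate=8000):
        self._source = source
        self._writer = wave.open(target, "wb")
        self._writer.setnchannels(channels)
        self._writer.setsampwidth(sample_width)
        self._writer.setframerate(rate)
        self._closed = False

    def read(self, count):
        data = self._source.read(count)
        if data and not self._closed:
            self._writer.writeframes(data)
        return data

    def close(self):
        if not self._closed:
            self._closed = True
            self._writer.close()  # makes sure the WAV header holds the final length


source = BufferedSource()
wav_bytes = io.BytesIO()                  # in-memory "file" keeps the example self-contained
tap = SavingReader(source, wav_bytes)

source.add_samples(b"\x01\x00\x02\x00")   # two 16-bit samples arrive
played = tap.read(8)                      # the "player" asks for four samples
tap.close()
```

The player only ever talks to the outermost stage, so the saved file contains exactly what was handed on for playback, including the padding silence.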

Friday, 14 March 2014

Detecting Mouse Hover Over ListBox Items in WPF

I’m a big fan of using MVVM in WPF and for the most part it works great, but still I find myself getting frustrated from time to time that data binding tasks that ought to be easy seem to require tremendous feats of ingenuity as well as encyclopaedic knowledge of the somewhat arcane WPF data binding syntax. However, my experience is that you can usually achieve whatever you need to if you create an attached behaviour.

In this instance, I wanted to detect when the mouse was hovered over an item in a ListBox, so that my ViewModel could perform some custom actions. The ListBox was of course bound to items in a ViewModel, and the items had their own custom template. You may know that unfortunately you can’t bind a method on your ViewModel to an event handler, or this would be straightforward.

My solution was to create an attached behaviour that listened to both the MouseEnter and MouseLeave events for the top level element in my ListBoxItem template. Both will call the Execute method of the ICommand you are bound to. When the mouse enters the ListBoxItem, it passes true as the parameter, and when it exits, it passes false. Here’s the attached behaviour:

public static class MouseOverHelpers
{
    public static readonly DependencyProperty MouseOverCommand =
        DependencyProperty.RegisterAttached("MouseOverCommand", typeof(ICommand), typeof(MouseOverHelpers),
                                                                new PropertyMetadata(null, PropertyChangedCallback));

    private static void PropertyChangedCallback(DependencyObject dependencyObject, DependencyPropertyChangedEventArgs args)
    {
        var ui = dependencyObject as UIElement;
        if (ui == null) return;

        if (args.OldValue != null)
        {
            ui.RemoveHandler(UIElement.MouseLeaveEvent, new RoutedEventHandler(MouseLeave));
            ui.RemoveHandler(UIElement.MouseEnterEvent, new RoutedEventHandler(MouseEnter));
        }

        if (args.NewValue != null)
        {
            ui.AddHandler(UIElement.MouseLeaveEvent, new RoutedEventHandler(MouseLeave));
            ui.AddHandler(UIElement.MouseEnterEvent, new RoutedEventHandler(MouseEnter));
        }
    }

    private static void ExecuteCommand(object sender, bool parameter)
    {
        var dp = sender as DependencyObject;
        if (dp == null) return;

        var command = dp.GetValue(MouseOverCommand) as ICommand;
        if (command == null) return;

        if (command.CanExecute(parameter))
        {
            command.Execute(parameter);
        }
    }

    private static void MouseEnter(object sender, RoutedEventArgs e)
    {
        ExecuteCommand(sender, true);
    }

    private static void MouseLeave(object sender, RoutedEventArgs e)
    {
        ExecuteCommand(sender, false);
    }

    public static void SetMouseOverCommand(DependencyObject o, ICommand value)
    {
        o.SetValue(MouseOverCommand, value);
    }

    public static ICommand GetMouseOverCommand(DependencyObject o)
    {
        return o.GetValue(MouseOverCommand) as ICommand;
    }
}

And here’s how you would make use of it in a ListBoxItem template:

<ControlTemplate TargetType="ListBoxItem">
    <Border Name="Border"
           my:MouseOverHelpers.MouseOverCommand="{Binding MouseOverCommand}">
        <Image Source="{Binding ImageSource}" Width="32" Height="32" Margin="2,0,2,0"/>
    </Border>
</ControlTemplate>

This is essentially a specialised version of the generic approach described here for binding an ICommand to any event. My version simply saves you needing a separate command for MouseEnter and MouseLeave.

Friday, 7 March 2014

Python Equivalents of LINQ Methods

In my last post, I looked at how Python’s list comprehensions and generators allow you to achieve many of the same tasks that you would use LINQ for in C#. In this post, we’ll look at Python equivalents for some of the most popular LINQ extension methods. We’ll mostly be looking at Python’s built-in functions and itertools module.

For these examples, our test data will be a list of fruit. But all of these techniques work with any iterable, including the output of generator functions. Here’s our Python test data:

fruit = ['apple', 'orange', 'banana', 'pear', 
         'raspberry', 'peach', 'plum']

Which of course in C# is

var fruit = new List<string>() { "apple", "orange",
 "banana", "pear", "raspberry", "peach", "plum" };

Any & All

LINQ’s Any method allows you to test whether any of the items in a sequence fulfil a certain requirement, while All checks if all of them do. Python’s built-in functions are named the same, so it’s really straightforward. Let’s see if any of our fruit contain the letter “e”, then see if all of them do:

>>> any("e" in f for f in fruit)
True
>>> all("e" in f for f in fruit)
False

In LINQ:

fruit.Any(f => f.Contains("e"));
fruit.All(f => f.Contains("e"));

Min & Max

Again, Python has built-in functions similarly named to LINQ. Let’s find the minimum and maximum fruit lengths:

>>> max(len(f) for f in fruit)
9
>>> min(len(f) for f in fruit)
4

which are the equivalents of:

fruit.Max(f => f.Length);
fruit.Min(f => f.Length);

Take, Skip, TakeWhile & SkipWhile

LINQ’s Take and Skip methods are very useful for paging data, or limiting the amount you process, and TakeWhile and SkipWhile come in handy from time to time as well (TakeWhile can be a good way of checking for user cancellation).

Take and Skip can be implemented using the itertools islice function. We can specify an end index, or a start and end index. If the end index is None, that means keep going to the end of the iterable. I’d prefer methods actually called “skip” and “take” as I think that makes for more readable code, but they could be easily created if needed.

Here’s Take(2) and Skip(2) implemented with Python. Since islice returns an iterator rather than a list, I turn it into a list for debugging purposes:

>>> from itertools import islice
>>> list(islice(fruit, 2))
['apple', 'orange']
>>> list(islice(fruit, 2, None))
['banana', 'pear', 'raspberry', 'peach', 'plum']

islice does have the benefit though of letting you combine a skip and a take into one step rather than chaining them like you would in C#:

fruit.Skip(2).Take(2);

with islice:

>>> list(islice(fruit, 2, 4))
['banana', 'pear']
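If you do want the more readable names, thin wrappers around islice are enough. This is only a sketch; the helper names take and skip are my own and are not part of itertools:

```python
from itertools import islice


def take(iterable, n):
    """Return an iterator over at most the first n items (like LINQ's Take)."""
    return islice(iterable, n)


def skip(iterable, n):
    """Return an iterator over everything after the first n items (like LINQ's Skip)."""
    return islice(iterable, n, None)


fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

# equivalent of fruit.Skip(2).Take(2)
page = list(take(skip(fruit, 2), 2))
print(page)
```

Because both helpers return lazy iterators, they can be chained without building intermediate lists.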

The itertools module does include a “takewhile” function, and for LINQ’s SkipWhile, it’s “dropwhile”. With these functions, you might want to use Python’s lambda syntax, which is a rare example of where the Python is less succinct than C#.

>>> from itertools import takewhile
>>> list(takewhile(lambda c: len(c) < 7, fruit))
['apple', 'orange', 'banana', 'pear']
>>> from itertools import dropwhile
>>> list(dropwhile(lambda c: len(c) < 7, fruit))
['raspberry', 'peach', 'plum']

Here’s the same TakeWhile and SkipWhile in C#:

fruit.TakeWhile (f => f.Length < 7);
fruit.SkipWhile (f => f.Length < 7);

First, FirstOrDefault, & Last

With LINQ you can easily get the first item from an IEnumerable. This throws an exception if the sequence is empty, so FirstOrDefault can be used alternatively. With Python, the built-in “next” function can be used on an iterator such as a generator (but not directly on a list, which you would need to wrap with iter first). Let’s use Python to get the first fruit starting with “p” and to return a default value when our generator looking for the first fruit starting with “q” doesn’t find any elements.

>>> next(f for f in fruit if f.startswith("p"))
'pear'
>>> next((f for f in fruit if f.startswith("q")), "none")
'none'
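Since next needs an iterator, a plain list has to be wrapped with the built-in iter function first. A small sketch of the First and FirstOrDefault equivalents on a list:

```python
fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

# First(): wrap the list in iter() before calling next
first = next(iter(fruit))

# FirstOrDefault() on an empty sequence: the second argument is the default
first_or_default = next(iter([]), None)

# Without a default, an empty sequence raises StopIteration
# (roughly the counterpart of First() throwing in LINQ)
try:
    next(iter([]))
    raised = False
except StopIteration:
    raised = True
```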

There does not seem to be any built-in Python function to implement LINQ’s “Last” or “LastOrDefault” methods, but you could quite easily create one. Here’s a fairly rudimentary one:

>>> def lastOrDefault(sequence, default=None):
...     lastItem = default
...     for s in sequence:
...         lastItem = s
...     return lastItem
...
>>> lastOrDefault((f for f in fruit if f.endswith("e")))
'orange'
>>> lastOrDefault((f for f in fruit if f.startswith("x")), "no fruit found")
'no fruit found'

You could do the same if you really needed the LINQ “Single” or “SingleOrDefault” methods, which also have no direct equivalent.
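As a rough sketch of what that might look like, here are two helpers that mimic Single and SingleOrDefault. The names and the choice of ValueError are my own assumptions; nothing like this ships with Python:

```python
def single(sequence):
    """Return the only item; raise ValueError if there are none or several."""
    it = iter(sequence)
    sentinel = object()
    item = next(it, sentinel)
    if item is sentinel:
        raise ValueError("sequence contains no elements")
    if next(it, sentinel) is not sentinel:
        raise ValueError("sequence contains more than one element")
    return item


def singleOrDefault(sequence, default=None):
    """Return the only item, or default if empty; raise ValueError if several."""
    it = iter(sequence)
    sentinel = object()
    item = next(it, sentinel)
    if item is sentinel:
        return default
    if next(it, sentinel) is not sentinel:
        raise ValueError("sequence contains more than one element")
    return item


fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

only_b = single(f for f in fruit if f.startswith("b"))
missing = singleOrDefault((f for f in fruit if f.startswith("q")), "none")
```

A private sentinel object is used instead of None so that sequences which legitimately contain None are still handled correctly.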

Count

The LINQ Count extension method lets you count how many items are in a sequence. For example, how many fruit begin with “p”?

fruit.Count(f => f.StartsWith("p"))

Probably the most logical expectation would be that Python’s “len” function would do the same, but you can’t call len on a generator (it only works on objects with a known length, such as lists). There is a neat trick though you can use with the “sum” built-in function.

>>> sum(1 for f in fruit if f.startswith("p"))
3

Select & Where

We saw in the last blog post that a list comprehension already includes the capabilities of LINQ’s Select and Where, but there may be times you want them to be available as functions. Python’s “map” and “filter” functions take a function (often a lambda) and an iterable, and return an iterator (this is Python 3 only; in Python 2 they returned lists). Here’s a couple of simple examples of them in action, with the output turned into a list for debug purposes:

>>> list(map(lambda x: x.upper(), fruit))
['APPLE', 'ORANGE', 'BANANA', 'PEAR', 'RASPBERRY', 'PEACH', 'PLUM']
>>> list(filter(lambda x: "n" in x, fruit))
['orange', 'banana']

GroupBy

At first glance it might appear that the itertools groupby function behaves the same as LINQ’s GroupBy, but there is a gotcha. Python’s groupby expects the incoming data to be sorted by the key, so you have to call sorted first. This example shows us first trying to group without sorting (resulting in two “p” groups), and then doing it the right way. We’re grouping by first letter of the fruit, and I’m using a helper method to print out the contents of the grouped data:

>>> def printGroupedData(groupedData):
...     for k, v in groupedData:
...         print("Group {} {}".format(k, list(v)))
...
>>> from itertools import groupby
>>> keyFunc = lambda f: f[0]
>>> printGroupedData(groupby(fruit, keyFunc))
Group a ['apple']
Group o ['orange']
Group b ['banana']
Group p ['pear']
Group r ['raspberry']
Group p ['peach', 'plum']
>>> sortedFruit = sorted(fruit, key=keyFunc)
>>> printGroupedData(groupby(sortedFruit, keyFunc))
Group a ['apple']
Group b ['banana']
Group o ['orange']
Group p ['pear', 'peach', 'plum']
Group r ['raspberry']
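If you would rather not sort first, one alternative is to collect the groups in a dictionary, which behaves more like LINQ’s GroupBy in that groups come out in order of first appearance. This is only a sketch: the helper name groupByKey is made up, and it relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onwards:

```python
def groupByKey(sequence, keyFunc):
    """Group items by key without requiring sorted input."""
    groups = {}
    for item in sequence:
        # create the group's list on first sight of the key, then append
        groups.setdefault(keyFunc(item), []).append(item)
    return groups


fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

grouped = groupByKey(fruit, lambda f: f[0])
for k, v in grouped.items():
    print("Group {} {}".format(k, v))
```

Unlike itertools groupby, this is not lazy: it reads the whole input before returning anything.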

OrderBy

As we saw above, the “sorted” built-in function in Python can be used to order a sequence. It returns a list, but this is understandable since to implement OrderBy it must iterate through the entire sequence first. Here we sort the fruit by their string length:

>>> sorted(fruit, key=lambda x:len(x))
['pear', 'plum', 'apple', 'peach', 'orange', 'banana', 'raspberry']
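The same function can stand in for OrderByDescending and for an OrderBy followed by ThenBy. A short sketch, using the reverse argument and a tuple as the sort key:

```python
fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

# like fruit.OrderByDescending(f => f.Length)
longestFirst = sorted(fruit, key=len, reverse=True)

# like fruit.OrderBy(f => f.Length).ThenBy(f => f)
byLengthThenName = sorted(fruit, key=lambda f: (len(f), f))

print(longestFirst)
print(byLengthThenName)
```

Python’s sort is stable, so items with equal keys keep their original relative order, even when reverse=True is used.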

Distinct

As far as I can tell there isn’t a built-in function in Python to emit a distinct iterable sequence, but the easiest way is probably to just construct a set. If you wanted to create a generator function, allowing you to abort early before reaching the end of a sequence, you could create your own helper method:

def distinct(sequence):
    seen = set()
    for s in sequence:
        if s not in seen:
            seen.add(s)
            yield s
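To show the difference between the two approaches, here is a small sketch. The distinct generator is repeated so that the snippet runs on its own, and itertools cycle is used purely as an example of an endless input:

```python
from itertools import cycle, islice


def distinct(sequence):
    seen = set()
    for s in sequence:
        if s not in seen:
            seen.add(s)
            yield s


fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']
firstLetters = [f[0] for f in fruit]

# a set removes duplicates, but makes no promise about order
asSet = set(firstLetters)

# the generator keeps the order in which items were first seen
inOrder = list(distinct(firstLetters))

# being lazy, it also copes with an endless input if we stop early
firstThree = list(islice(distinct(cycle("abca")), 3))
```

Note that asking for a fourth distinct item from that endless input would never return, since only three distinct letters exist.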

Zip

The last example I’ll look at is the Zip method. In Python there is an equivalent zip function, and it is actually a little simpler as it assumes you want a tuple, rather than LINQ’s where you need to explicitly create a result selector function. It actually supports zipping more than two sequences together which is nice. As with LINQ’s Zip, the resulting sequence is the length of the shortest. Here’s a quick example of the Python zip function in action:

>>> recipes = ['pie','juice','milkshake']
>>> list(zip(fruit,recipes))
[('apple', 'pie'), ('orange', 'juice'), ('banana', 'milkshake')]
>>> list(f + " " + r for f,r in zip(fruit,recipes))
['apple pie', 'orange juice', 'banana milkshake']
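Here is a brief sketch of zipping three sequences at once. The prices list is made-up data for illustration. It also shows itertools zip_longest, which pads the shorter inputs instead of stopping at the shortest:

```python
from itertools import zip_longest

fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']
recipes = ['pie', 'juice', 'milkshake']
prices = [2.5, 1.75, 3.0]   # made-up values, purely for illustration

# three sequences in, tuples of three out; stops at the shortest input
triples = list(zip(fruit, recipes, prices))

# zip_longest keeps going and fills the gaps with a value of your choice
padded = list(zip_longest(fruit[:4], recipes, fillvalue='?'))
```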

Conclusion

As can be seen, most of the main LINQ extension methods have fairly close Python equivalents, and those that don’t could be quite easily recreated. I don’t pretend to be an expert on Python, so if I’ve missed any cool tricks, let me know in the comments.

Thursday, 6 March 2014

Python List Comprehensions and Generators for C# Developers

If you’re a C# programmer and you’ve used LINQ, you’ll know how powerful it is to allow you to manipulate sequences of data in all kinds of interesting ways, without needing to write for loops. Python has similar capabilities, using what are called “list comprehensions” and “generators”. In this post, I’ll demonstrate how they work, showing them side by side with roughly equivalent C# code.

List Comprehensions

A list comprehension in Python allows you to create a new list from an existing list (or as we shall see later, from any “iterable”).

Let’s start with a simple example at the Python REPL. Here we create a list that contains the square of each number returned by the range function (which in this case returns 0,1,2,…9):

>>> [x*x for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

This is equivalent to a C# LINQ statement that takes a range (using Enumerable.Range), selects the square (using Select), and then turns the whole thing into a list (using ToList):

Enumerable.Range(0, 10).Select(x => x*x).ToList();

Python list comprehensions also allow you to filter as you go, by inserting an “if” clause. Here, we’ll only take the squares of odd numbers:

>>> [x*x for x in range(10) if x%2]
[1, 9, 25, 49, 81]

This is equivalent to chaining a Where clause into our LINQ statement:

Enumerable.Range(0, 10).Where(x => x%2 != 0)
    .Select(x => x*x).ToList();

You can actually have two “for” clauses inside your list comprehension, so you could create some coordinates as a tuple like this:

>>> [(x,y) for x in range(4) for y in range(4)]
[(0, 0), (0, 1), (0, 2), (0, 3), 
 (1, 0), (1, 1), (1, 2), (1, 3), 
 (2, 0), (2, 1), (2, 2), (2, 3), 
 (3, 0), (3, 1), (3, 2), (3, 3)]

The same effect can be achieved using the SelectMany clause in LINQ:

Enumerable.Range(0,4).SelectMany(x => Enumerable.Range(0,4)
    .Select(y => new Tuple<int,int>(x,y))).ToList();

You can see that the LINQ gets a little cumbersome at this point, although you can use the alternative syntax:

from x in Enumerable.Range(0,4)
from y in Enumerable.Range(0,4)
select new Tuple<int,int>(x,y)

Here's another Python list comprehension with two for expressions, making a list of all the spaces on a chessboard:

>>> [x + str(y+1) for x in "ABCDEFGH" for y in range(8)]
['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 
 'B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8',
 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 
 'D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7', 'D8', 
 'E1', 'E2', 'E3', 'E4', 'E5', 'E6', 'E7', 'E8', 
 'F1', 'F2', 'F3', 'F4', 'F5', 'F6', 'F7', 'F8', 
 'G1', 'G2', 'G3', 'G4', 'G5', 'G6', 'G7', 'G8', 
 'H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'H7', 'H8']

And in C#, you'd do something like:

"ABCDEFGH".SelectMany(x => Enumerable.Range(1,8)
    .Select(y => x+y.ToString())).ToList()

Dictionaries and Sets

You don't actually have to create lists. Python lets you use a similar syntax to create a set (no duplicate elements), or a dictionary. Here we'll start with a list of fruit, then use a list comprehension to make a list of string lengths. Then we'll make a set of unique fruit lengths, and we'll finally make a dictionary keyed on fruit name, and containing the length as a value:

>>> fruit = ['apples','oranges','bananas','pears']
>>> [len(f) for f in fruit]
[6, 7, 7, 5]
>>> {len(f) for f in fruit}
{5, 6, 7}
>>> {f:len(f) for f in fruit}
{'apples': 6, 'oranges': 7, 'bananas': 7, 'pears': 5}

We can create the set of unique lengths in C# by creating a HashSet, passing in our LINQ statement to its constructor. And you can use LINQ's ToDictionary extension method to make the equivalent dictionary of strings to lengths:

var fruit = new [] { "apples", "oranges", "bananas", "pears" };
fruit.Select(f => f.Length).ToList();
new HashSet<int>(fruit.Select(f => f.Length));
fruit.ToDictionary(f => f, f => f.Length);

Generators

Python generators are essentially the same concept as a C# method that returns an IEnumerable<T>. In fact, the syntax for creating them is very similar – you just need to use the yield keyword. Here’s a generator function that returns the names of my children:

def generateChildren():
    yield "Ben"
    yield "Lily"
    yield "Joel"
    yield "Sam"
    yield "Annie"

And here’s the same thing in C#:

public IEnumerable<string> GenerateChildren() 
{
    yield return "Ben";
    yield return "Lily";
    yield return "Joel";
    yield return "Sam";
    yield return "Annie";
}

As in C#, Python generators use lazy evaluation. This means that they could return infinite sequences. And it also means that it is not until we actually evaluate them that we will get any errors. This code example:

def generateNumbers():
    yield 2/2
    yield 3/1
    yield 4/0 # will cause a ZeroDivisionError
    yield 5/-1

numbersGenerator = generateNumbers()
print("Numbers Generator", numbersGenerator)
try:
    numbers = [n for n in numbersGenerator]
    print("Numbers", numbers)
except ZeroDivisionError:
    print("oops")

Generates the following output:

Numbers Generator <generator object 
    generateNumbers at 0x0000000002ADD4C8>
oops
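The other consequence of lazy evaluation mentioned above, infinite sequences, can be sketched just as briefly. The generator below never finishes by itself, so the caller decides how much to consume (here using itertools islice); the function name is made up for the example:

```python
from itertools import islice


def generateSquares():
    """Yield 0, 1, 4, 9, ... forever."""
    n = 0
    while True:
        yield n * n
        n += 1


# nothing runs until we ask for values, and we only ask for five
firstFive = list(islice(generateSquares(), 5))
print(firstFive)
```

Calling list(generateSquares()) directly would never return, which is the Python counterpart of calling ToList on an endless IEnumerable<T>.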

Python provides a built-in function called “next” that allows you to step through the outputs from a generator one by one. Let’s try that with our children generator function:

>>> children = generateChildren()
>>> next(children)
'Ben'
>>> next(children)
'Lily'
>>> next(children)
'Joel'
>>> next(children)
'Sam'
>>> next(children)
'Annie'
>>> next(children)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

You’ll notice that calling next after we have reached the end gives us a StopIteration exception. C#’s closest equivalent to the Python next function is getting the enumerator and stepping through with MoveNext:

var children = GenerateChildren().GetEnumerator();
children.MoveNext();
Console.WriteLine(children.Current);
children.MoveNext();
Console.WriteLine(children.Current);
children.MoveNext();
Console.WriteLine(children.Current);    
children.MoveNext();
Console.WriteLine(children.Current);    
children.MoveNext();
Console.WriteLine(children.Current);
children.MoveNext();
Console.WriteLine(children.Current);

This produces the following output (the last item is repeated because we didn’t check the return value of MoveNext, which indicates whether we reached the end of the enumeration).

Ben
Lily
Joel
Sam
Annie
Annie

In practice in C# it is fairly rare to use the enumerator directly. When you have an IEnumerable<T> you typically use it in a foreach loop or with some of the LINQ extension methods.

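Python also lets us create new generators from existing generators (or from any other iterable) by using the list comprehension syntax with parentheses instead of square brackets. This is called a generator expression. For example: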

>>> (x*x for x in range(10))
<generator object <genexpr> at 0x0000000002ADD750>

This allows you to compose complex generators out of simple statements, creating a pipeline very much like you can with chained LINQ extension methods.
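A small sketch of such a pipeline, where each stage is a generator expression feeding the next and nothing is evaluated until the final list call:

```python
numbers = range(10)

odds = (x for x in numbers if x % 2)     # like .Where(x => x % 2 != 0)
squares = (x * x for x in odds)          # like .Select(x => x * x)
large = (s for s in squares if s > 10)   # like .Where(s => s > 10)

# only now do values flow through all three stages, one at a time
result = list(large)
print(result)
```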

Conclusion

As you can see, Python list comprehensions and generators provide the same power that you are used to with C# and LINQ, and with a syntax that is more compact in most cases. Look out for a follow-up post shortly where I will demonstrate how many of the standard LINQ extension methods such as Any, All, Max, Min, Take, Skip, TakeWhile, GroupBy, First, FirstOrDefault, and OrderBy can be achieved in Python.

Tuesday, 4 March 2014

How to Calculate Code Churn using TFS

Your source control repository can provide a lot of valuable insight into your project. In particular, it can highlight source files that are being edited far too often. Files that are being changed on a regular basis (or by many different developers) indicate that there might be one of these problems with your code:

lots of bugs
too many responsibilities (failure to adhere to the “Single Responsibility Principle”)
not extensible enough (failure to adhere to the “Open Closed Principle”)

Counting changes to source control files is often called “code churn”. It’s usually defined as the sum of all added, modified and deleted lines. Most source control systems will have some way of letting you get at this information, and I might post about how to extract it for other systems at a later date, but for now, here are two approaches you can take if you’re using TFS.

Using the TFS API

The TFS API allows you to get details of each changeset individually. Unfortunately the lines added, deleted and modified aren’t included (or at least I can’t find them), but on the whole you’ll find that simply counting the number of modifications to each file is good enough at identifying trouble areas. I also like to count how many changes each user has made.

Here’s a simple class that demonstrates how to get source control statistics from TFS using the API. You pass the URL of your collection (e.g. http://myserver:8080/tfs/MyCollection) into the constructor. Then to get the churn or user statistics you need to specify the path within that collection that you want to examine (e.g. $/MyProject/). You’ll see that I am filtering to include only changes to the files I am interested in (e.g. only C# files), and I perform a regex on the path of each item, so changes to the same file in different branches are counted together. You’d need to customise that for whatever branching strategy you are using. I’ve also made it cancellable, as this can take a long time to run if you have a long history.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;
using System.Threading;
using Microsoft.TeamFoundation.Client;
using Microsoft.TeamFoundation.VersionControl.Client;

namespace TfsAnalysis
{
    class TfsAnalyser
    {
        private readonly VersionControlServer vcs;

        public TfsAnalyser(string url)
        {
            var collection = ConnectToTeamProjectCollection(url, null);

            vcs = (VersionControlServer)collection.GetService(typeof(VersionControlServer));
        }

        public TfsAnalyser(string url, string user, string password, string domain)
        {

            var networkCredential = new NetworkCredential(user, password, domain);
            var collection = ConnectToTeamProjectCollection(url, networkCredential);
            
            vcs = (VersionControlServer) collection.GetService(typeof (VersionControlServer));
        }

        /// <summary>
        /// Gets User Statistics (how many changes each user has made)
        /// </summary>
        /// <param name="path">Path in the form "$/My Project/"</param>
        /// <param name="cancellationToken">Cancellation token</param>
        public IEnumerable<SourceControlStatistic> GetUserStatistics(string path, CancellationToken cancellationToken)
        {
            return GetChangesetsForProject(path, cancellationToken).GroupBy(c => c.Committer).Select(g =>
                new  SourceControlStatistic { Key = g.Key, Count = g.Count() } ).OrderByDescending(s => s.Count);
        }

        /// <summary>
        /// Gets Churn Statistics (how many times has each file been modified)
        /// </summary>
        /// <param name="path">Path in the form "$/My Project/"</param>
        /// <param name="cancellationToken">Cancellation token</param>
        public IEnumerable<SourceControlStatistic> GetChurnStatistics(string path, CancellationToken cancellationToken)
        {
            return GetChangesetsForProject(path, cancellationToken)
                .Select(GetChangesetWithChanges)
                .SelectMany(c => c.Changes) // select the actual changed files
                .Where(c => c.Item.ServerItem.Contains("/Source/")) // filter out just the files we are interested in
                .Where(c => c.Item.ServerItem.EndsWith(".cs"))
                .Where(c => ((int)c.ChangeType & (int)ChangeType.Edit) == (int)ChangeType.Edit) // don't count merges
                .Select(c => Regex.Replace(c.Item.ServerItem, @"^.+/Source/", "")) // count changes to the same file on different branches
                .GroupBy(c => c)
                .Select(g =>
                new SourceControlStatistic { Key = g.Key, Count = g.Count() }).OrderByDescending(s => s.Count); 
        }

        private Changeset GetChangesetWithChanges(Changeset c)
        {
            return vcs.GetChangeset(c.ChangesetId, includeChanges: true, includeDownloadInfo: false);
        }

        private IEnumerable<Changeset> GetChangesetsForProject(string path, CancellationToken cancellationToken)
        {
            return vcs.QueryHistory(path, RecursionType.Full).TakeWhile(changeset => !cancellationToken.IsCancellationRequested);
        }

        private TfsTeamProjectCollection ConnectToTeamProjectCollection(string url, NetworkCredential networkCredential)
        {
            var teamProjectCollection = new TfsTeamProjectCollection(new Uri(url), networkCredential);
            teamProjectCollection.EnsureAuthenticated();
            return teamProjectCollection;
        }
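    }

    // Simple result type used by the statistics methods above
    // (not shown in the original listing; its shape is inferred from how it is used)
    class SourceControlStatistic
    {
        public string Key { get; set; }
        public int Count { get; set; }
    }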
}

Having got this information, I usually write it out to a CSV file to analyse it offline. It can yield very interesting results. On one project I discovered several files that were being modified on average more than once a week over the lifetime of the project (10 years).
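As an illustration of the kind of offline analysis I mean, here is a minimal Python sketch that reads such a CSV and flags files changed more than once a week on average. The column names (Key and Count), the file names and the numbers are all made up for the example:

```python
import csv
import io

# Made-up sample data; the "Key" and "Count" column names are an assumption
csv_text = """Key,Count
Core/Engine.cs,612
UI/MainForm.cs,140
Utils/StringHelpers.cs,35
"""

weeks = 10 * 52   # a ten year project lifetime, expressed in weeks

rows = csv.DictReader(io.StringIO(csv_text))
counts = {row["Key"]: int(row["Count"]) for row in rows}

# files modified more than once a week on average, most changed first
hotspots = [name
            for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
            if n > weeks]
print(hotspots)
```

In practice you would open the real CSV file instead of the in-memory string used here.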

Using the TFS Warehouse Database

There is however a quicker, and potentially easier way to get at the churn statistics, and that is to go direct to the Tfs_Warehouse database yourself. This does mean you need admin rights to access the database. You also lose the ability to differentiate between a regular edit commit, and a merge commit (although you could use the API to get this information). However it provides counts of lines added, removed and modified, and runs much quicker. The inspiration for this technique came from the Code Churn Analyser CodePlex project, which I updated to use what seems to be the newer schema in the Tfs_Warehouse database rather than the older TfsWarehouse. I’ve also modified their SQL slightly as in our system, Changesets weren’t always connected to WorkItems. Here’s a SQL statement that grabs the code churn:

SELECT 
[FactCodeChurn].CodeChurnSK ChurnId,
DimFile.[FileName] [File],
DimFile.FilePath [FilePath],
DimFile.FileExtension [FileExtension],
DimChangeset.ChangesetID ChangesetId,
DimChangeset.ChangesetTitle ChangesetTitle,
DimPerson.PersonSK PersonId,
DimPerson.Name PersonTitle,
[FactCodeChurn].LastUpdatedDateTime Date,
[FactCodeChurn].LinesAdded LinesAdded,
[FactCodeChurn].LinesModified LinesModified,
[FactCodeChurn].LinesDeleted LinesDeleted
FROM [FactCodeChurn]
JOIN DimChangeset on FactCodeChurn.ChangesetSK = DimChangeset.ChangesetSK
JOIN DimPerson on DimChangeset.CheckedInBySK = DimPerson.PersonSK
JOIN DimFile on FactCodeChurn.FilenameSK = DimFile.FileSK
WHERE DimFile.FileExtension = '.cs'

I decided to use LINQPad to process this information, as it provides nice LINQ to SQL strongly typed objects that greatly speed up development. Once again I filtered out files I wasn’t interested in and used a regular expression to group together changes to the same file on a different branch. This one is able to count the files that have had the most different developers working on them. I use LINQPad’s “Dump” method to output the results of interest, but they could easily be output to a data file:

void Main()
{
    var churns = FactCodeChurns
        .Where(cc => cc.FilenameSKDimFile.FileExtension == ".cs")
        // strip the branch prefix so the same file on different branches is counted together
        .Select(cc => new { File = Regex.Replace(cc.FilenameSKDimFile.FilePath, @"^.+/Source( Code)?/", ""),
            User = cc.DimChangeset.CheckedInBySKDimPerson.Name,
            LinesAdded = cc.LinesAdded,
            LinesDeleted = cc.LinesDeleted,
            LinesModified = cc.LinesModified
        });
    
    var changes = new Dictionary<string, Stats>();
    var users = new Dictionary<string, HashSet<string>>();
    foreach(var churn in churns)
    {
        Stats stats;
        if(!changes.TryGetValue(churn.File, out stats))
        {
            changes[churn.File] = stats = new Stats();
        }
        stats.Usages++;
        stats.LinesAdded += churn.LinesAdded.Value;
        stats.LinesDeleted += churn.LinesDeleted.Value;
        stats.LinesModified += churn.LinesModified.Value;
        
        HashSet<string> fileUsers;
        if(!users.TryGetValue(churn.File, out fileUsers))
        {
            users[churn.File] = fileUsers = new HashSet<string>();
        }
        fileUsers.Add(churn.User);
    }
    
    changes.Where(kvp => kvp.Value.Usages > 200)
            .OrderByDescending(kvp => kvp.Value.Usages)
            .Select(kvp => new { kvp.Key, kvp.Value.Usages })
            .Dump("Usages");
    changes.Where(kvp => kvp.Value.LinesModified > 5000)
            .OrderByDescending(kvp => kvp.Value.LinesModified)
            .Select(kvp => new { kvp.Key, kvp.Value.LinesModified })
            .Dump("Lines Modified");
    changes.Where(kvp => kvp.Value.LinesAdded > 50000)
            .OrderByDescending(kvp => kvp.Value.LinesAdded)
            .Select(kvp => new { kvp.Key, kvp.Value.LinesAdded })
            .Dump("Lines Added");
    changes.Where(kvp => kvp.Value.LinesDeleted > 40000)
            .OrderByDescending(kvp => kvp.Value.LinesDeleted)
            .Select(kvp => new { kvp.Key, kvp.Value.LinesDeleted })
            .Dump("Lines Deleted");
            
    users.Where(kvp => kvp.Value.Count > 20)
        .OrderByDescending (kvp => kvp.Value.Count)
        .Select(kvp => new { kvp.Key, kvp.Value })
        .Dump("Users");    

}

class Stats
{
    public int Usages { get; set; }
    public int LinesAdded { get; set; }
    public int LinesDeleted { get; set; }
    public int LinesModified { get; set; }
}

As you can see from my code, I only dump the statistics above a certain threshold – when you’ve got tens of thousands of files and commits, you’ll want to do this.
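If you export the rows rather than processing them in LINQPad, the same aggregation is easy to reproduce elsewhere. Here is a rough Python sketch under stated assumptions: the rows, paths and user names are invented sample data, and churn is taken as lines added plus deleted plus modified, as defined earlier:

```python
import re
from collections import defaultdict

# Invented sample rows: (file path, user, lines added, lines deleted, lines modified)
rows = [
    ("$/MyProject/Main/Source/Core/Engine.cs", "alice", 10, 2, 5),
    ("$/MyProject/Branches/v2/Source Code/Core/Engine.cs", "bob", 3, 1, 4),
    ("$/MyProject/Main/Source/UI/MainForm.cs", "alice", 7, 0, 1),
]

# same pattern as the C# code: drop everything up to the source folder,
# so the same file on different branches is counted together
branchPrefix = re.compile(r"^.+/Source( Code)?/")

churn = defaultdict(int)   # file -> total lines added + deleted + modified
users = defaultdict(set)   # file -> distinct users who changed it

for path, user, added, deleted, modified in rows:
    name = branchPrefix.sub("", path)
    churn[name] += added + deleted + modified
    users[name].add(user)

for name in sorted(churn, key=churn.get, reverse=True):
    print(name, churn[name], len(users[name]))
```

As with the LINQPad version, you would want to apply a threshold before printing once there are thousands of files.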

Hope this proves useful to someone. I think Code Churn statistics are a very quick and effective way of highlighting some of the areas of code that may be suffering from too much “technical debt”.