Wednesday, 12 May 2010

LINQ to General Election Part 2 – LINQ to XLS

In my previous post, I demonstrated how to run LINQ queries against the UK general election data the Guardian newspaper in JSON format. They additionally made a Google spreadsheet available which not only has the winners in every constituency but the number of votes for each candidate that stood.

Now it might be possible to run LINQ queries directly against the Google spreadsheet using the Google Data APIs (perhaps an experiment for another day), but I chose instead to download the spreadsheet as a Microsoft Excel spreadsheet and run my LINQ queries against that instead.

First of all, I needed a LINQ to Excel provider and found linqtoexcel. Unfortunately, it requires you to reference several DLLs, but once you have done so, it is pretty straightforward to use.

Mapping the Spreadsheet

One nice feature of linqtoexcel is that it makes it easy to map your spreadsheet data onto a strongly typed class. By default it will simply look for properties matching the column names. Here’s my election result class:

public class Result
{
    public string Seat { get; set; }
    public string Candidate { get; set; }
    public string Party { get; set; }
    public int Vote { get; set; }
    public double VotePercent { get; set; }
    public string Winner { get; set; }
}

Now I can open the XLS file using the ExcelQueryFactory class and I also need to add a custom mapping since the title of the VotePercent column is not a valid C# identifier. Note that you don’t have to map all the fields. I have just picked out the six that I needed for my queries.

var repo = new ExcelQueryFactory();
repo.FileName = @"..\..\..\Data\General election 2010 results.xls";
repo.AddMapping<Result>(x => x.VotePercent, "%Vote");
Querying the Data

Having done this, we are now in a position to perform any queries. We can specify the worksheet name from which our data comes (in this case “FULL CONSTITUENCY DATA”). Having the use of strongly typed classes makes the LINQ much more readable than in the JSON example.

var unluckyLosers = from p in repo.Worksheet<Result>("FULL CONSTITUENCY DATA")
                       where (p.Party != p.Winner) && (p.VotePercent > 42)
                       orderby p.VotePercent ascending
                       select p;
Console.WriteLine("Unlucky Losers:");
foreach (var winner in unluckyLosers)
{
    Console.WriteLine(String.Format("{2}%: {0} ({1})", winner.Candidate, winner.Party, winner.VotePercent));
}

Unfortunately, there seems to be a bug in that the orderby clause seems to be ignored. I tried a few different sort fields but with no success. The output from the above query is as follows:

Unlucky Losers:
44.52%: King, Nick (C)
45.51%: Connor, Rodney (Ind)
43.79%: Kelley, Claire (LD)
42.11%: Dismore, Andrew (Lab)
42.01%: Harris, Evan (LD)
42.81%: Kramer, Susan (LD)
42.55%: Throup, Maggie (C)
44.51%: Rees-Mogg, Annunziata (C)
42.36%: Stroud, Philippa (C)
42.2%: Formosa, Mark (C)
42.53%: Heathcoat-Amory, David (C)
43.08%: Tod, Martin (LD)

Patently, they have not been sorted in the requested order. I have filed a bug against linqtoexcel. Hopefully it can be resolved soon. Update: A fix was released shortly after my bug report - linqtoexcel 1.3.70 works fine.

No comments: