I had a number of people attempt the Lunchtime LINQ Challenge I set yesterday, and so I thought I’d follow up with a post about each problem, and what my preferred approach was. With each problem I saw a wide variety of different approaches, and the majority were correct (apart from the odd off-by-one error). LINQ is very powerful, and chaining operators can let you achieve a lot, but you can risk creating incomprehensible code. So I was looking for answers that were succinct but also readable.
Problem 1: Numbering Players
1. Take the following string "Davis, Clyne, Fonte, Hooiveld, Shaw, Davis, Schneiderlin, Cork, Lallana, Rodriguez, Lambert" and give each player a shirt number, starting from 1, to create a string of the form: "1. Davis, 2. Clyne, 3. Fonte" etc
This one wasn’t too hard, and was designed to highlight the fact that the LINQ Select method lets you pass in a lambda that takes two parameters – the first is the item from the input IEnumerable, and the second is an index (zero based). Here was my solution to this problem:
String.Join(", ", "Davis, Clyne, Fonte, Hooiveld, Shaw, Davis, Schneiderlin, Cork, Lallana, Rodriguez, Lambert" .Split(',') .Select((item, index) => index+1 + "." + item) .ToArray())
Problem 2: Order by Age
2. Take the following string "Jason Puncheon, 26/06/1986; Jos Hooiveld, 22/04/1983; Kelvin Davis, 29/09/1976; Luke Shaw, 12/07/1995; Gaston Ramirez, 02/12/1990; Adam Lallana, 10/05/1988" and turn it into an IEnumerable of players in order of age (bonus to show the age in the output).
This basic problem isn’t too hard to solve, but calculating the age proved rather tricky. Various attempts were made such as dividing the days old by 365.25, but the only one that worked for all test cases came from this StackOverflow answer. The trouble is, inserting this code snippet into a LINQ statement would make it quite cumbersome, so my preference would be to create a small helper method or even an extension method:
public static int GetAge(this DateTime dateOfBirth) { DateTime today = DateTime.Today; int age = today.Year - dateOfBirth.Year; if (dateOfBirth > today.AddYears(-age)) age--; return age; }
With that extension method in place, we can create much more readable code. Note that to sort players by age properly, the OrderBy should operate on the date of birth, rather than on the age.
"Jason Puncheon, 26/06/1986; Jos Hooiveld, 22/04/1983; Kelvin Davis, 29/09/1976; Luke Shaw, 12/07/1995; Gaston Ramirez, 02/12/1990; Adam Lallana, 10/05/1988" .Split(';') .Select(s => s.Split(',')) .Select(s => new { Name = s[0].Trim(), Dob = DateTime.Parse(s[1].Trim()) }) .Select(s => new { Name = s.Name, Dob = s.Dob, Age = s.Dob.GetAge() }) .OrderByDescending (s => s.Dob)
Problem 3: Sum Timespans
3. Take the following string "4:12,2:43,3:51,4:29,3:24,3:14,4:46,3:25,4:52,3:27" which represents the durations of songs in minutes and seconds, and calculate the total duration of the whole album
The main challenges here are first turning the strings into TimeSpans, and then summing TimeSpans, since you can’t use LINQ’s Sum method on an IEnumerable<TimeSpan>. The first can be done with TimeSpan.ParseExact, or you could add an extra “0:” on the beginning to get it into the format that TimeSpan.Parse expects. Several people ignored timespans completely, and simply parsed out the minutes and seconds themselves, and added up the total number of seconds. This is OK, although not as extensible for changes to the input string format such as the introduction of millisecond components. The summing of TimeSpans can be done quite straightforwardly with the LINQ Aggregate method. Here’s what I came up with:
"4:12,2:43,3:51,4:29,3:24,3:14,4:46,3:25,4:52,3:27" .Split(',') .Select(s => TimeSpan.Parse("0:" + s)) .Aggregate ((t1, t2) => t1 + t2)
Problem 4: Generate Coordinates
4. Create an enumerable sequence of strings in the form "x,y" representing all the points on a 3x3 grid. e.g. output would be: 0,0 0,1 0,2 1,0 1,1 1,2 2,0 2,1 2,2
This one is nice and easy. I expected this one to be solved using a SelectMany, but I did get one answer that just used a Select:
Enumerable.Range(0,9) .Select(x => string.Format("{0},{1}", x / 3, x % 3))
This works fine, although it would be a little more involved to change the maximum x and y values. The answer I was expecting from most people just uses two Enumerable.Range and a SelectMany
Enumerable.Range(0,3) .SelectMany(x => Enumerable.Range(0,3) .Select(y => String.Format("{0},{1}",x,y)))
But I did get a few responses that use the alternative LINQ syntax, and although I tend to prefer the chained method calls approach, in this case I think it makes for easier to read code:
from i in Enumerable.Range(0, 3) from j in Enumerable.Range(0, 3) select String.Format("{0}, {1}", i, j);
Problem 5: Swim Length Times
5. Take the following string "00:45,01:32,02:18,03:01,03:44,04:31,05:19,06:01,06:47,07:35" which represents the times (in minutes and seconds) at which a swimmer completed each of 10 lengths. Turn this into an enumerable of timespan objects containing the time taken to swim each length (e.g. first length was 45 seconds, second was 47 seconds etc)
This one was by far the hardest to implement as a single LINQ statement, since you needed to compare the current value with the previous value to calculate the difference. Perhaps the easiest trick is to Zip the sequence with itself delayed by one:
("00:00," + splitTimes).Split(',') .Zip(splitTimes.Split(','), (s,f) => new { Start = TimeSpan.Parse("0:" + s), Finish = TimeSpan.Parse("0:" + f) }) .Select (q => q.Finish-q.Start)
The disadvantage of this approach is that you are enumerating the input sequence twice. Obviously for a string input it does not matter, but in some situations you cannot enumerate a sequence twice. I did receive two ingenious solutions that used the Aggregate function to avoid the double enumeration:
"00:45,01:32,02:18,03:01,03:44,04:31,05:19,06:01,06:47,07:35" .Split(new char[] { ','}) .Select(x => "00:"+x) .Aggregate((acc, z) => acc + "," + TimeSpan.Parse(z.Dump()).Add(new TimeSpan(0,0,- acc.Split(new char[] { ',' }) .Sum(x => (int)TimeSpan.Parse(x).TotalSeconds)))) .Split(new char[] {','}) .Select(x => TimeSpan.Parse(x))
and
.Split(',') .Aggregate("00:00:00", (lapsString, lapTime) => string.Format("{0}-00:{1},00:{1}", lapsString, lapTime), result => result.Substring(0, result.Length - 9)) .Split(',') .Select(lap => DateTime.Parse(lap.Substring(9)) - DateTime.Parse(lap.Substring(0,8)))
But to be honest, I think that this is another case for a helper extension method. Suppose we had a ZipWithSelf method that allowed you to operate on the sequence in terms of the current and previous values:
public static IEnumerable<X> ZipWithSelf<T,X>(this IEnumerable<T> inputSequence, Func<T,T,X> selector) { var last = default(T); foreach(var input in inputSequence) { yield return selector(last, input); last = input; } }
then we could make the LINQ statement read very naturally and only require a single enumeration of the sequence:
"00:45,01:32,02:18,03:01,03:44,04:31,05:19,06:01,06:47,07:35" .Split(',') .Select(x => TimeSpan.Parse("00:"+x)) .ZipWithSelf((a,b) => b - a)
Problem 6: Ranges
6. Take the following string "2,5,7-10,11,17-18" and turn it into an IEnumerable of integers: 2 5 7 8 9 10 11 17 18
This one represents a real problem I had to solve a while ago, and after a few iterations, found that it could be solved quite succinctly in LINQ. The trick is to turn it into an enumeration of start and end values. For the entries that aren’t ranges, just have the same value for start and end. Then you can do a SelectMany combined with Enumerable.Range:
"2,5,7-10,11,17-18" .Split(',') .Select(x => x.Split('-')) .Select(p => new { First = int.Parse(p[0]), Last = int.Parse(p.Last()) }) .SelectMany(r => Enumerable.Range(r.First, r.Last - r.First + 1))