On a project I am working on there is a growing number of files that are in Source Control but are not actually referenced by any .csproj files. I decided to write a quick and dirty command line program to find these files, and at the same time learn a bit of LINQ to XML.
During development I ran into a couple of tricky issues. The first was how to combine some foreach loops into a single LINQ statement, and the second was how to construct the regex for matching source files. I could probably have solved both myself with a bit of time reading books, but I decided to throw them out onto Stack Overflow, and both were answered within a couple of minutes. I have to say this site is incredible. Rather than treating it as a last resort for questions where I have run out of other resources, I now think of it as a super-knowledgeable co-worker you can ask a quick question and get a pointer in the right direction.
Here's the final code. I'm sure it could easily be turned into one nested LINQ query and improved on a little, but it does what I need. Feel free to suggest refactorings and enhancements in the comments.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml.Linq;

namespace SolutionChecker
{
    public class Program
    {
        // Matches *.cs but skips generated *.g.cs files.
        public const string SourceFilePattern = @"(?<!\.g)\.cs$";

        static void Main(string[] args)
        {
            // Use the folder given on the command line, else the folder containing this executable.
            string path = (args.Length > 0) ? args[0] : GetWorkingFolder();
            Regex regex = new Regex(SourceFilePattern);

            // Every non-generated C# file on disk under the root folder.
            var allSourceFiles = from file in Directory.GetFiles(path, "*.cs", SearchOption.AllDirectories)
                                 where regex.IsMatch(file)
                                 select file;

            // Every C# file that some project actually compiles.
            var projects = Directory.GetFiles(path, "*.csproj", SearchOption.AllDirectories);
            var activeSourceFiles = FindCSharpFiles(projects);

            // Orphans are on disk but not referenced by any project.
            var orphans = from sourceFile in allSourceFiles
                          where !activeSourceFiles.Contains(sourceFile)
                          select sourceFile;

            int count = 0;
            foreach (var orphan in orphans)
            {
                Console.WriteLine(orphan);
                count++;
            }
            Console.WriteLine("Found {0} orphans", count);
        }

        static string GetWorkingFolder()
        {
            return Path.GetDirectoryName(typeof(Program).Assembly.CodeBase.Replace("file:///", String.Empty));
        }

        static IEnumerable<string> FindCSharpFiles(IEnumerable<string> projectPaths)
        {
            // MSBuild 2003 namespace in the {namespace}localName form that XName understands.
            string xmlNamespace = "{http://schemas.microsoft.com/developer/msbuild/2003}";

            return from projectPath in projectPaths
                   let xml = XDocument.Load(projectPath)
                   let dir = Path.GetDirectoryName(projectPath)
                   from c in xml.Descendants(xmlNamespace + "Compile")
                   let inc = c.Attribute("Include").Value
                   where inc.EndsWith(".cs")
                   select Path.Combine(dir, inc);
        }
    }
}
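One refinement worth considering: the query returned by FindCSharpFiles() is lazily evaluated, so each Contains() call in Main enumerates it again and may reload the project files. Copying the results into a HashSet once avoids that repeated work. The following is only a sketch, not part of the program above. The OrphanFinder class and FindOrphans method are hypothetical names, and the case-insensitive comparison is an assumption based on Windows file paths that the original code does not make.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrphanFinder
{
    // Hypothetical helper: returns the files in allSourceFiles that are
    // not present in activeSourceFiles.
    public static List<string> FindOrphans(
        IEnumerable<string> allSourceFiles,
        IEnumerable<string> activeSourceFiles)
    {
        // Enumerate the active files exactly once into a set.
        // Assumption: paths are compared case-insensitively, as on Windows.
        var active = new HashSet<string>(activeSourceFiles, StringComparer.OrdinalIgnoreCase);

        // Each lookup is now a hash probe rather than a re-run of the query.
        return allSourceFiles.Where(file => !active.Contains(file)).ToList();
    }
}

public static class Demo
{
    public static void Main()
    {
        // Made-up paths, purely for illustration.
        var all = new[] { @"C:\src\A.cs", @"C:\src\B.cs", @"C:\src\Old.cs" };
        var active = new[] { @"C:\src\a.cs", @"C:\src\B.cs" };

        foreach (var orphan in OrphanFinder.FindOrphans(all, active))
        {
            Console.WriteLine(orphan); // prints C:\src\Old.cs
        }
    }
}
```

In the program above, the equivalent change would be to wrap the call as new HashSet<string>(FindCSharpFiles(projects)) before the orphan query runs. HashSet requires .NET 3.5 or later, the same release that introduced LINQ.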
2 comments:
Using a HashSet as the result of FindCSharpFiles() provides a significant speed improvement. Our 1000+ file project is analyzed much, much faster.
Thanks for the tip, Joel. I haven't really got into the HashSet class, as our project is still on .NET 2.0. Looks like it's a useful class.