Introduction
The attached program is a line count utility written in C#. It:
- Counts the number of lines in a .NET solution, project or individual code file.
- Works with C#, Visual Basic and C++ (.NET solutions and projects only).
- Works with both Visual Studio 2003/.NET 1.1 and Visual Studio 2005/.NET 2.0 (but needs .NET 2.0 to run).
- Provides a sortable grid of results so you can easily find your biggest projects or biggest code files.
- Caches results at all levels and provides views onto them. For a solution you can see a view of all projects and their sizes, all code files and their sizes, or a view that combines both projects and code files.
- Shows the number of blank lines at every level.
- Shows the number of lines auto-generated by code-generators at every level (e.g. layout code on forms, or typed DataSet code).
- Shows the number of lines of comments at each level.
- Counts the overall number of files in the solution or project.
- Allows the grid to be copied into the clipboard in a format that can be pasted into Excel.
- Comes in a fetching green and white colour scheme.
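To make the blank-line and comment counts above concrete, here is a minimal sketch of how a single file might be classified line by line. It is not the attached program's code: the names LineCounts and SimpleLineCounter are hypothetical, and the sketch assumes that only single-line comments (// in C#, ' in Visual Basic) are recognised, ignoring block comments and VB's REM keyword.

```csharp
using System;
using System.IO;

// Hypothetical types for illustration; not taken from the attached program.
public struct LineCounts
{
    public int Total;
    public int Blank;
    public int Comment;

    // Lines that are neither blank nor a single-line comment.
    public int Code { get { return Total - Blank - Comment; } }
}

public static class SimpleLineCounter
{
    // Classifies each line as blank, comment or code.
    // Assumption: only whole-line "//" (C#) or "'" (VB) comments are detected.
    public static LineCounts Count(TextReader reader, bool isVisualBasic)
    {
        LineCounts counts = new LineCounts();
        string commentPrefix = isVisualBasic ? "'" : "//";
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            counts.Total++;
            string trimmed = line.Trim();
            if (trimmed.Length == 0)
                counts.Blank++;
            else if (trimmed.StartsWith(commentPrefix, StringComparison.Ordinal))
                counts.Comment++;
        }
        return counts;
    }
}
```

Calling SimpleLineCounter.Count with a StreamReader over a .cs file and isVisualBasic set to false gives the counts for that file. Project and solution totals could then be obtained by summing the per-file results.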
Background
I was recently asked how many lines of code there are in our current C# project, and how that compared with another similar project. The 'other' project is much bigger in terms of resources (numbers of developers), although it's been running for slightly less time than our project. Our project has had two or three developers working on it for about a year.
I looked around for a line count utility on the Internet, but couldn't really find anything I liked the look of. So I upgraded an old VB6 line count utility I wrote several years ago. I used the VB6 to VB.NET upgrade wizard initially. It still amazes me that the upgrade wizard works at all, but in this case I got a VB.NET project (with VB6-style code) that compiled immediately. With a little work I got it counting code in individual C# projects.
This program told me we had about 180,000 lines of code in our entire C# solution. If you do the maths on that it comes out at about 1500 lines of code per developer per week, or over 300 lines per day.
300 lines per day per developer of production code seemed very high, so I decided I needed a tool that could analyze the data in a little more detail. This program is the result of that. Below I will discuss why our developers (myself included) are nowhere near as productive as the initial analysis suggests.
Using the Code
At start up, the application opens a dialog to allow the user to select the solution, project or code file (*.vb, *.cs) that the program will run on initially. Once a file is selected, the application calculates the line counts for that item and displays the results as below. Here a solution file has been selected and both project files and individual code files are being shown in the resulting grid:
As usual, the grid can be sorted by clicking the column headers. Here it has been sorted by the number of lines in individual code files.
Additional functionality is available on both the traditional menus and a context menu. These can be used to hide the code files and show only project files, with one line in the grid per project file (by clearing the check mark alongside 'Show Code Files'):
For a simpler view at code file level, the application can also be used to show code files only (by checking 'Show Code Files' and clearing 'Show Project Files'). The breakdown columns (numbers of blank lines, code designer lines and comments) can also be hidden using the 'Show Breakdown' menu option:
The other functionality on the menus is pretty self-explanatory.
If you want to copy the grid into Excel, you can simply select the entire grid (Ctrl-A), copy to the clipboard (Ctrl-C) and then launch Excel and paste (Ctrl-V). In a later version of the application, I will add a menu option to do all this.
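The paste into Excel works because Excel accepts tab-separated text from the clipboard, with one grid row per line. A menu option of the kind described could build that text itself. The sketch below shows only the text-building step. GridExport and ToTabSeparated are hypothetical names, and representing the rows as string arrays is an assumption, not how the application stores its results.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of the attached program.
public static class GridExport
{
    // Builds tab-separated text: cells separated by tabs, rows ended by CR LF.
    public static string ToTabSeparated(IEnumerable<string[]> rows)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string[] row in rows)
        {
            for (int i = 0; i < row.Length; i++)
            {
                if (i > 0) sb.Append('\t');
                // Tabs and line breaks inside a cell would shift the columns,
                // so they are replaced with spaces.
                sb.Append(row[i].Replace('\t', ' ').Replace('\r', ' ').Replace('\n', ' '));
            }
            sb.Append("\r\n");
        }
        return sb.ToString();
    }
}
```

In a Windows Forms application the resulting string could be passed to Clipboard.SetText, which must be called from an STA thread.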
Points of Interest
Design
The design of this application is relatively straightforward. However, I have put some more detail on the design in an extended version of this article, which is available on my blog.
Issues
There are some issues around the counting of auto-generated code with this application, particularly with Visual Studio 2003 projects. In Visual Studio 2005, we have auto-generated code neatly split into partial 'designer' files, which makes it much easier to identify and count. For Visual Studio 2003, I have tried to identify the auto-generated code regions, but have been forced to do this by looking for the #Region or #region strings that precede these regions. This probably isn't the most accurate method of identifying this code.
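As an illustration of the region-based approach, the sketch below counts the lines inside any region whose name contains the text "generated code", as in the "Windows Form Designer generated code" region that Visual Studio 2003 emits. It is a guess at the general technique, not the application's actual logic. GeneratedRegionCounter is a hypothetical name and the marker text is an assumption.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not the attached program's code.
public static class GeneratedRegionCounter
{
    // Assumption: generated regions are named with this text.
    const string Marker = "generated code";

    // Counts lines from a matching #region / #Region directive up to and
    // including its closing #endregion / #End Region, nested regions included.
    public static int CountGeneratedLines(TextReader reader)
    {
        int generated = 0;
        int depth = 0; // 0 means we are outside a generated region
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string t = line.TrimStart();
            bool opens = t.StartsWith("#region", StringComparison.OrdinalIgnoreCase);
            bool closes = t.StartsWith("#endregion", StringComparison.OrdinalIgnoreCase)
                       || t.StartsWith("#End Region", StringComparison.OrdinalIgnoreCase);

            if (depth == 0)
            {
                if (opens && t.IndexOf(Marker, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    depth = 1;
                    generated++; // count the directive line itself
                }
                continue;
            }

            generated++;
            if (opens) depth++;       // a nested region inside generated code
            else if (closes) depth--; // leaves the region when depth returns to 0
        }
        return generated;
    }
}
```

The weakness mentioned above is visible here. A developer can rename or remove the region, or add hand-written code inside it, and generated files that have no such region are not detected at all.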
Analysis
The Line Count program showed us that whilst our project does have 180,000 lines of code, 100,000 of them are auto-generated by Microsoft's code generators.
Of the 100,000 auto-generated lines, 73,000 are in our data access component. Our application is a low-volume but reasonably complex product, and for ease of development we have extensively used typed DataSets to get our data out of our database. Those 73,000 lines of code are mainly in these typed DataSets. In addition, 22,000 of the 100,000 auto-generated lines are in our presentation layer. As you'd expect, these are mainly auto-generated layout code for our forms and user controls.
So we're down to 80,000 lines of code written by developers. Of this, a further 10,000 lines are blank, and another 10,000 are comments, leaving about 60,000 lines. Even this exaggerates the size of the actual application code, as our unit test project alone has 16,000 lines of code.
I expect these numbers are fairly typical of enterprise .NET applications. I'd be interested in some statistics from other projects.
As for the 'other' project I mentioned above, that has 50,000 lines of code, of which 10,000 are auto-generated, 5,000 are blank and 6,000 are comments (and it has no unit tests).
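The subtraction behind these figures can be written down as a small sketch. ProjectStats is a hypothetical type, not part of the application. The sketch assumes that the blank and comment counts refer to hand-written lines only, which the text states for our project but not for the 'other' one.

```csharp
// Hypothetical type for illustration; not part of the attached program.
public struct ProjectStats
{
    public int Total;
    public int Generated;
    public int Blank;
    public int Comment;

    public ProjectStats(int total, int generated, int blank, int comment)
    {
        Total = total;
        Generated = generated;
        Blank = blank;
        Comment = comment;
    }

    // Lines written by developers, including blank lines and comments.
    public int HandWritten { get { return Total - Generated; } }

    // Assumption: Blank and Comment count hand-written lines only.
    public int HandWrittenCode { get { return Total - Generated - Blank - Comment; } }
}
```

With the numbers quoted above, new ProjectStats(180000, 100000, 10000, 10000) gives 80,000 hand-written lines and 60,000 hand-written code lines for our project. new ProjectStats(50000, 10000, 5000, 6000) gives 40,000 and 29,000 for the 'other' project.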
Conclusion
In the end, all this goes to back up something that all developers know instinctively: using lines of code as a metric for the 'size' of an application really doesn't make much sense. Maybe that's why I couldn't find a decent line count program in the first place.
However, counting lines can provide some interesting analysis. We can see at a glance which are our biggest classes, and these are clearly candidates for refactoring. Also, if you look closely at the screenshots, you can see that we probably have too much logic in our presentation layer compared to our model layer (middle tier business layer). We knew that already, but the line count statistics bring it home.
As mentioned above, an extended version of this article is available on my blog here.