Introduction
I present to you a tool that is capable of... Counting Lines!
"Wow!" you say in amazement as you stagger back, trying to regain your balance. That's right, my friends: I am afraid there is nothing groundbreaking here today.
That said, I had to create this tool because I could not easily find anything else out there that did what I was after. So if you bear with me, you might find this code useful.
There are two parts to this article you may find interesting. The first is the DirectoryLineCounter. This is the heart of the article: a simple class that recursively counts the lines in a subset of the files under a given directory.
The second interesting part of the download is the application that uses the DirectoryLineCounter. This application allowed us to rapidly work out just how much code was contained in the various sections of our repository.
This information was useful to us in identifying where people were writing the most code in our scientific framework. We were hoping to see that most of the coding effort was going into the science itself, but instead we saw that the applications (GUIs) that use the science framework accounted for the most lines of code.
Line count engine
We were initially using a simple line counter (grep/script) to give us the total number of lines in our entire repository, but this didn't tell us which areas of the repository contained the most code. So we created this simple class, which recurses into the directories and reports the information back in a structured way.
The DirectoryLineCounter class has two static arrays, DirectoryIgnoreNames and FileSearchPatterns. These are the directories to ignore (e.g. bin, debug, .cvs, .svn ...) and the file types to count (e.g. *.cs, *.h, *.vb ...). Having them as static fields was fine for our use because DirectoryLineCounter was only ever run with the intention of summarizing one directory (and its subdirectories) in a single run. It would be a simple change to make them member fields and pass them as parameters to the recursive runs.
Once the DirectoryIgnoreNames and FileSearchPatterns fields have been set, the DirectoryLineCounter can produce some useful results when you call the countLines() method. Once complete, the DirectoryLineCounter will contain two counts: one for the lines of code found in the directory it was pointing at (DirectoryLines), and another for the total lines found in all subdirectories (SubDirectoriesTotalLines). The DirectoryLineCounter also contains a list of DirectoryLineCounter objects that represent all the subdirectories of the initial directory; this array is the SubDirectoryCounters field.
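To make the shape of the class concrete, here is a minimal sketch in Python. It is not the code from the download; Python is used only for illustration. The snake_case names mirror the fields described above, while the total_lines property, the case-insensitive ignore check, and the default ignore and pattern lists are assumptions made for the example.

```python
import fnmatch
import os


class DirectoryLineCounter:
    # Class-level settings shared by every counter, mirroring the static arrays.
    # The default values are illustrative, not the author's.
    directory_ignore_names = ["bin", "debug", ".cvs", ".svn"]
    file_search_patterns = ["*.cs", "*.h", "*.vb"]

    def __init__(self, path):
        self.path = path
        self.directory_lines = 0              # lines in files directly in this directory
        self.sub_directories_total_lines = 0  # lines in everything below it
        self.sub_directory_counters = []      # one counter per counted subdirectory

    @property
    def total_lines(self):
        # Convenience helper (an assumption, not named in the article).
        return self.directory_lines + self.sub_directories_total_lines

    def _matches(self, name):
        return any(fnmatch.fnmatch(name, p) for p in self.file_search_patterns)

    def count_lines(self):
        with os.scandir(self.path) as entries:
            items = sorted(entries, key=lambda e: e.name)
        for entry in items:
            if entry.is_dir(follow_symlinks=False):
                # Assumption: ignore names are compared case-insensitively.
                if entry.name.lower() in self.directory_ignore_names:
                    continue
                child = DirectoryLineCounter(entry.path)
                child.count_lines()
                self.sub_directory_counters.append(child)
                self.sub_directories_total_lines += child.total_lines
            elif entry.is_file() and self._matches(entry.name):
                with open(entry.path, errors="replace") as handle:
                    self.directory_lines += sum(1 for _ in handle)
```

After count_lines() returns, walking sub_directory_counters gives the per-directory breakdown that a flat grep total cannot.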
The last thing that might need explaining is the FilesCompleted event. This event is fired whenever a DirectoryLineCounter has finished counting all the files in its directory, and it passes back the number of files just completed. This was useful for showing the user how far along the process was.
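The event can be approximated in Python with a list of callbacks. The sketch below is illustrative only: ProgressCounter is a hypothetical class, sharing one handler list with the child counters is an assumption about how progress reaches the subscriber, and the file-pattern filtering is left out to keep the example short.

```python
import os


class ProgressCounter:
    """Hypothetical counter that reports progress through callbacks."""

    def __init__(self, path, handlers=None):
        self.path = path
        # Callables invoked as handler(sender, files_done); shared with children.
        self.files_completed = handlers if handlers is not None else []
        self.directory_lines = 0

    def count_lines(self):
        with os.scandir(self.path) as entries:
            items = sorted(entries, key=lambda e: e.name)
        files = [e for e in items if e.is_file()]
        for entry in files:
            with open(entry.path, errors="replace") as handle:
                self.directory_lines += sum(1 for _ in handle)
        # Fire once per directory, after all of its files have been counted.
        for handler in self.files_completed:
            handler(self, len(files))
        # Children reuse the same handler list, so subscribers hear about them too.
        for entry in items:
            if entry.is_dir(follow_symlinks=False):
                ProgressCounter(entry.path, self.files_completed).count_lines()
```

A progress display would append a callable to files_completed before calling count_lines() and add each reported file count to a running total.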
The DirectoryLineCounter was put into a separate project (LineCountEngine) so that it would be easy to build several applications on top of it. I intended to write a command line application that would also use the LineCountEngine, but this is not likely to happen given current time constraints.
As for performance, I have no idea what counts as good, but I can tell you it takes about 10 seconds to summarize our repository of around 650,000 lines. This is not a problem for us.
Line counter application
The line counter application simply makes use of the LineCountEngine. You tell it where to start and press go; the application then builds a directory tree from the information the DirectoryLineCounter returns.
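Turning the counter results into a tree is a short recursive walk. The Python sketch below stands in for the GUI tree with indented text rows. CounterResult and render_tree are hypothetical names, and the line counts in the usage lines are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CounterResult:
    """Hypothetical stand-in for a finished DirectoryLineCounter."""
    name: str
    directory_lines: int
    sub_directories_total_lines: int = 0
    sub_directory_counters: List["CounterResult"] = field(default_factory=list)


def render_tree(node, depth=0):
    """Return one text row per directory, indented by its depth."""
    total = node.directory_lines + node.sub_directories_total_lines
    rows = ["{}{} (own: {}, total: {})".format(
        "  " * depth, node.name, node.directory_lines, total)]
    for child in node.sub_directory_counters:
        rows.extend(render_tree(child, depth + 1))
    return rows


# Made-up numbers, for illustration only.
gui = CounterResult("gui", 300)
science = CounterResult("science", 120)
repo = CounterResult("repo", 10, 420, [gui, science])
print("\n".join(render_tree(repo)))
# repo (own: 10, total: 430)
#   gui (own: 300, total: 300)
#   science (own: 120, total: 120)
```

A GUI version would add a tree node where this sketch appends a row, using the same recursion.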
Thanks to Julijan Sribar for the use of his PieChart component. You can see his article here.