Introduction
One of the key principles of Agile software development is to commit small, incremental changes to the software frequently. From a source control perspective, large chunks of code checked in at irregular intervals are a concern. Whether all developers are equally loaded should also be taken into account. The Sparsity measurement is inspired by these factors. This article discusses the definition of Sparsity, its implementation in the context of Team Foundation Server (TFS), and its application. In addition, the article proposes the code churn matrix from which Sparsity is calculated.
Background
Sparsity attempts to measure the frequency of code commits along two dimensions: the developers working on the project and the days (dates) within the span of a Sprint. Sparsity is based on the code churn of the development team, and the code churn data is summarized in a code churn matrix. The code churn matrix is a two-dimensional array with the developers forming the rows and the dates forming the columns, and Sparsity is essentially a measurement of the sparsity of this matrix: the percentage of “dot” elements (that is, elements with zero commits) in the code churn matrix.
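The definition can be written compactly. The symbols are notation introduced here for illustration: m is the number of developers, n the number of dates, and c_ij the number of commits by developer i on date j.

```latex
S = 100 \times \frac{\left|\{(i,j) : c_{ij} = 0,\ 1 \le i \le m,\ 1 \le j \le n\}\right|}{m \cdot n}
```

Under this definition, S = 0 means every developer committed on every date, and S = 100 means nobody committed at all.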
Code Churn Matrix
| | Date 1 | Date 2 | … | Date n |
|---|---|---|---|---|
| Developer 1 | 2 | 0 | | 3 |
| Developer 2 | 0 | 0 | | 5 |
| … | | | | |
| Developer m | 1 | 2 | | 3 |
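To make the measurement concrete, here is a minimal sketch that computes the percentage of zero cells for a small matrix. The class name SparsityExample and the 3 x 4 data are hypothetical and chosen only for illustration; they are not taken from the attached source code.

```csharp
using System;

public static class SparsityExample
{
    // Percentage of zero cells in a developers-by-dates commit matrix.
    public static double Percent(int[,] churn)
    {
        if (churn.Length == 0)
        {
            throw new ArgumentException("Matrix must not be empty.", nameof(churn));
        }

        int zeros = 0;
        foreach (int commits in churn) // visits every cell of the 2D array
        {
            if (commits == 0) zeros++;
        }
        return 100.0 * zeros / churn.Length;
    }

    public static void Main()
    {
        // Hypothetical data: 3 developers (rows) over 4 days (columns).
        int[,] churn =
        {
            { 2, 0, 1, 3 },
            { 0, 0, 0, 5 },
            { 1, 2, 0, 3 },
        };
        Console.WriteLine(Percent(churn)); // 5 zero cells out of 12
    }
}
```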
A sample implementation is provided that uses Team Foundation Server as the source control system. The example program uses the C# Team Foundation Server API to collect the user statistics, that is, which developer checked in how many changesets (the unit of commit in TFS) on which dates. Once these statistics are gathered, another piece of code consolidates them into the code churn matrix and calculates the Sparsity.
Using the Code
The basic infrastructure for collecting the user statistics from TFS is borrowed from the work of Mark Heath. The TfsAnalyser class contains the method GetUserStatistics, which gets the user statistics between a start and an end date for the specified source folder. The user statistics are modelled by the class SourceControlStatistic, which captures the user name, the change date (the code check-in date) and the number of commits on that date.
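For orientation, the statistic record might look roughly like the sketch below. The property names Key, ChangeDate and Count follow how the class is used elsewhere in this article; the property types are assumptions, and the class in the attached source code may differ.

```csharp
using System;

// Hypothetical shape of the statistic record (types are assumptions).
public class SourceControlStatistic
{
    public string Key { get; set; }          // user name, e.g. @"DOMAIN\developer1"
    public DateTime ChangeDate { get; set; } // date of the check-in
    public int Count { get; set; }           // number of changesets on that date
}
```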
public IEnumerable<SourceControlStatistic>
    GetChurnStatistics(string path, CancellationToken cancellationToken)
{
    return GetChangesetsForProject(path, cancellationToken)
        .Select(GetChangesetWithChanges)
        .SelectMany(c => c.Changes)
        // Keep only items under /Source/, excluding test automation code.
        .Where(c => c.Item.ServerItem.Contains("/Source/")
            && !c.Item.ServerItem.Contains("TestAutomation"))
        // Keep only C# and XAML files.
        .Where(c => c.Item.ServerItem.EndsWith(".cs") ||
            c.Item.ServerItem.EndsWith(".xaml"))
        // Keep only changes whose type includes Edit.
        .Where(c => ((int)c.ChangeType & (int)ChangeType.Edit) == (int)ChangeType.Edit)
        // Group by the path relative to /Source/ and count the edits per file.
        .Select(c => Regex.Replace(c.Item.ServerItem, @"^.+/Source/", string.Empty))
        .GroupBy(c => c)
        .Select(g => new SourceControlStatistic { Key = g.Key, Count = g.Count() })
        .OrderByDescending(s => s.Count);
}
Note the following piece of code, which keeps only source code modifications. You may want to modify the filtering based on your needs; for example, you may want to count commits of config, XML, or image files. You may also want to monitor other folders in addition to the root "Source" folder.
    .Where(c => c.Item.ServerItem.Contains("/Source/")
        && !c.Item.ServerItem.Contains("TestAutomation"))
    .Where(c => c.Item.ServerItem.EndsWith(".cs") ||
        c.Item.ServerItem.EndsWith(".xaml"))
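If you expect to adjust the filter often, one option is to move the rules into a small predicate driven by lists. The following is a sketch only: the class SourceFilter, its member names, and the extra "/Config/" folder and ".config"/".xml" extensions are hypothetical examples, not part of the attached source code.

```csharp
using System;
using System.Linq;

public static class SourceFilter
{
    // Hypothetical settings; adjust to the folders and file types you want to count.
    private static readonly string[] IncludedFolders = { "/Source/", "/Config/" };
    private static readonly string[] ExcludedFragments = { "TestAutomation" };
    private static readonly string[] Extensions = { ".cs", ".xaml", ".config", ".xml" };

    // True when the server path lies in an included folder, contains no
    // excluded fragment, and ends with one of the tracked extensions.
    public static bool IsTracked(string serverItem)
    {
        return IncludedFolders.Any(f => serverItem.Contains(f))
            && !ExcludedFragments.Any(f => serverItem.Contains(f))
            && Extensions.Any(e => serverItem.EndsWith(e, StringComparison.OrdinalIgnoreCase));
    }
}
```

With such a helper, the two path-based Where clauses could be reduced to a single .Where(c => SourceFilter.IsTracked(c.Item.ServerItem)).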
Next comes the calculation of the code churn matrix and, from it, the Sparsity. These tasks are handled by the class FactGenerator. The overloaded method GenerateCodeChurnFact(DateTime start, DateTime end) calls the GetUserStatistics method of the TfsAnalyser class to gather the user statistics.
var userStat = tfsAnalyzer.GetUserStatistics(SourcePath, new CancellationToken(), start, end);
Then it initializes the empty code churn matrix with the developers as the rows and change dates as the columns.
var changedates = Enumerable
    .Range(0, 1 + end.Subtract(start).Days)
    .Select(offset => start.AddDays(offset));
var names = UserNames.Names;
var churnMatrixMap = new Dictionary<Tuple<string, DateTime>, int>();
foreach (var changedate in changedates)
{
    Array.ForEach(names, n =>
        churnMatrixMap.Add(new Tuple<string, DateTime>(n, changedate), 0));
}
The developer names are provided by the helper class UserNames. You need to modify the class to record the names of the developers in your project in the Names field. Enter the TFS user names without the domain qualifier. For example, if your domain is DOMAIN, enter only developer1 in place of DOMAIN\developer1.
public static readonly string[] Names =
{
    "developer1",
    "developer2",
    "developern",
};
The following code loads the empty code churn matrix with the actual data from the gathered user statistics. It loops through the user statistics and looks up an entry in the code churn matrix using the user name and the change date as the composite key. When an entry is found, it is incremented by the change count.
foreach (var sourceControlStatistics in userStat)
{
    foreach (var sourcecontrolStatistic in sourceControlStatistics)
    {
        // Strip the domain qualifier so the name matches the entries in UserNames.Names.
        var userName = sourcecontrolStatistic.Key.Replace(@"DOMAIN\", string.Empty);
        var changedate = sourcecontrolStatistic.ChangeDate;
        var csCount = sourcecontrolStatistic.Count;
        var key = new Tuple<string, DateTime>(userName, changedate);
        // Users or dates that are not in the initialized matrix are ignored.
        if (churnMatrixMap.ContainsKey(key))
        {
            churnMatrixMap[key] += csCount;
        }
    }
}
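One thing to watch in the loop above: the lookup succeeds only when the change date equals the key's date exactly, including the time of day. Whether ChangeDate already carries a date-only value is not shown here, so the sketch below assumes it may not and normalizes both sides with DateTime.Date. The class ChurnMatrixExample and the (user, change date, count) tuples standing in for the statistics are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChurnMatrixExample
{
    public static Dictionary<Tuple<string, DateTime>, int> Build(
        string[] names, DateTime start, DateTime end,
        IEnumerable<Tuple<string, DateTime, int>> stats)
    {
        // Empty matrix: one zero cell per (developer, calendar day).
        var matrix = new Dictionary<Tuple<string, DateTime>, int>();
        int days = 1 + (end.Date - start.Date).Days;
        foreach (var day in Enumerable.Range(0, days).Select(o => start.Date.AddDays(o)))
        {
            foreach (var name in names)
            {
                matrix[Tuple.Create(name, day)] = 0;
            }
        }

        // Accumulate counts; .Date drops the time of day so keys match the cells.
        foreach (var stat in stats)
        {
            var user = stat.Item1.Replace(@"DOMAIN\", string.Empty);
            var key = Tuple.Create(user, stat.Item2.Date);
            if (matrix.ContainsKey(key))
            {
                matrix[key] += stat.Item3;
            }
        }
        return matrix;
    }
}
```

Tuple<string, DateTime> compares its components by value, which is what makes it usable as a composite dictionary key here.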
Finally, the percentage of zero elements in the code churn matrix is returned as the Sparsity measurement:
return (churnMatrixMap.Count(m => m.Value == 0)*100)/churnMatrixMap.Count;
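Note that the expression above uses integer arithmetic, so the fractional part is truncated (for example, 1 zero cell out of 3 yields 33). If you prefer a fractional result, a floating-point variant could look like the sketch below. The class SparsityPrecision is hypothetical, and returning 0 for an empty matrix is an assumption made here to avoid a division by zero.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SparsityPrecision
{
    // Same ratio as the integer expression, but computed in floating point.
    public static double Percent<TKey>(IDictionary<TKey, int> cells)
    {
        if (cells.Count == 0)
        {
            return 0.0; // assumption: an empty matrix is reported as 0
        }
        return 100.0 * cells.Count(c => c.Value == 0) / cells.Count;
    }
}
```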
In the attached source code, you need to modify the following constants in the FactGenerator class to use it in your project:
private const string TfsProjectUrl = "http://MyServer:port/MyCollection";
private const string SourcePath = "$/MySource";
The overloaded method GenerateCodeChurnFact(DateTime start, DateTime end, int daySpan) drives the other overload discussed above by splitting the span between the start and end dates into slices of daySpan days. Typically, the day span is your sprint duration (for example, 30 days), and the start and end dates are those of a release or increment. The data is written to a CSV file named CodeChurnUserFact_Sparsity.csv in the debug/ output folder of the program. The following is a hypothetical output. In this example, the team had low productivity in terms of Sparsity (a high Sparsity value, that is, many developer-days without commits) in the first two sprints and gradually improved in the last two sprints.
| DaySlice | Sparsity |
|---|---|
| 1 | 75 |
| 2 | 60 |
| 3 | 50 |
| 4 | 40 |
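The splitting of a release window into sprint-sized slices can be sketched as follows. This is an illustration under assumptions, not the code from the attached source: the class DateSlicer is hypothetical, slices are treated as inclusive date ranges, and the last slice is clipped to the end date.

```csharp
using System;
using System.Collections.Generic;

public static class DateSlicer
{
    // Splits [start, end] into consecutive inclusive windows of at most daySpan days.
    public static IEnumerable<Tuple<DateTime, DateTime>> Slice(
        DateTime start, DateTime end, int daySpan)
    {
        if (daySpan <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(daySpan));
        }

        for (var sliceStart = start.Date; sliceStart <= end.Date; sliceStart = sliceStart.AddDays(daySpan))
        {
            var sliceEnd = sliceStart.AddDays(daySpan - 1);
            // Clip the final window so it never runs past the end date.
            yield return Tuple.Create(sliceStart, sliceEnd < end.Date ? sliceEnd : end.Date);
        }
    }
}
```

Each resulting pair could then be passed to the two-argument GenerateCodeChurnFact overload to produce one row of the output.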
Points of Interest
Future work could look at the distribution of the non-zero elements of the code churn matrix and possibly the size of the code commits. This would help answer questions such as whether the developers are uniformly loaded, whether that holds on each day, and how much code is committed.
History
- 17th January, 2016: Initial version