
Sparsity - A Novel Productivity Measurement in Agile Software Development

Prajnan Das


16 Jan 2016

Sparsity is the percentage of "dot" elements (elements with zero code check-ins) in a code churn matrix (developers × dates in a sprint).

Download source - 43 KB

Introduction

One of the key principles behind Agile software development is to commit small, incremental changes to the software frequently. When checking code into the source control system, large chunks of code and irregular commits are therefore a concern. Another consideration is whether all developers are equally loaded. The Sparsity measurement is inspired by these factors. This article defines Sparsity, describes an implementation for Team Foundation Server (TFS) and discusses its application. In addition, the article proposes the code churn matrix on which the Sparsity calculation is based.

Background

Sparsity attempts to measure the frequency of code commits along two dimensions: the developers working on the project and the days (dates) within a sprint. It is based on the code churn of the development team, which is summarized in a code churn matrix: a two-dimensional array with one row per developer and one column per date, where each element is the number of commits that developer made on that date. Sparsity measures how sparse this matrix is: it is the percentage of "dot" elements (elements with zero commits) in the code churn matrix.

Code Churn Matrix

             Date 1   Date 2   …   Date n
Developer 1  2        0        …   3
Developer 2  0        0        …   5
…            …        …        …   …
Developer m  1        2        …   3
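The same definition can be written as a formula. The notation (m developers, n dates, and c_ij for the number of changesets developer i checked in on date j) is introduced here for convenience and does not appear in the attached code. The second line works through a hypothetical 2 × 4 matrix with rows (2, 0, 1, 3) and (0, 0, 4, 5), which contains three zero elements:

```latex
\mathrm{Sparsity} = \frac{100 \cdot \bigl|\{(i, j) : c_{ij} = 0\}\bigr|}{m \, n},
\qquad 1 \le i \le m,\ 1 \le j \le n

\mathrm{Sparsity}_{\text{example}} = \frac{100 \cdot 3}{2 \cdot 4} = 37.5
```

A matrix with at least one check-in on every developer-day has a Sparsity of 0, and a matrix with no check-ins at all has a Sparsity of 100.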

A sample implementation is provided that uses Team Foundation Server as the source control system. The example program uses the C# Team Foundation Server API to collect user statistics, that is, how many changesets (the unit of commit in TFS) each developer checked in on each date. Once these statistics are gathered, another piece of code consolidates them into the code churn matrix and calculates the Sparsity.

Using the Code

The basic infrastructure for collecting user statistics from TFS is borrowed from the work of Mark Heath. The TfsAnalyser class contains the method GetUserStatistics, which gets the user statistics for the specified source folder between a start date and an end date. The user statistics are modelled by the class SourceControlStatistic, which captures the user name, the change date (the code check-in date) and the number of commits on that date. For reference, the listing below shows the GetChurnStatistics method from the same infrastructure, which counts how many times each file has been modified and contains the file filtering discussed next.

/// <summary>
/// Gets churn statistics (how many times each file has been modified)
/// </summary>
/// <param name="path">Path in the form "$/My Project/"</param>
/// <param name="cancellationToken">Cancellation token</param>
public IEnumerable<SourceControlStatistic> GetChurnStatistics(
    string path, CancellationToken cancellationToken)
{
    return GetChangesetsForProject(path, cancellationToken)
        .Select(GetChangesetWithChanges)
        // select the actual changed files
        .SelectMany(c => c.Changes)
        // filter out just the files we are interested in
        .Where(c => c.Item.ServerItem.Contains("/Source/")
                 && !c.Item.ServerItem.Contains("TestAutomation"))
        // count only source code modifications
        .Where(c => c.Item.ServerItem.EndsWith(".cs")
                 || c.Item.ServerItem.EndsWith(".xaml"))
        // don't count merges: keep only changes whose ChangeType includes Edit
        .Where(c => ((int)c.ChangeType & (int)ChangeType.Edit) == (int)ChangeType.Edit)
        // strip the branch prefix so changes to the same file on different branches are counted together
        .Select(c => Regex.Replace(c.Item.ServerItem, @"^.+/Source/", string.Empty))
        .GroupBy(c => c)
        .Select(g => new SourceControlStatistic { Key = g.Key, Count = g.Count() })
        .OrderByDescending(s => s.Count);
}
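The article does not list GetUserStatistics itself. As a hedged sketch of the aggregation it has to perform, the per-user, per-date counts captured by SourceControlStatistic can be obtained by grouping changesets by committer and calendar day. The changeset records below are hypothetical (committer, creation time) tuples rather than TFS objects, and the result is a flat list of (Key, ChangeDate, Count) tuples mirroring the SourceControlStatistic fields; the attached code may structure this differently (the population loop later in the article iterates a nested sequence).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical changeset records: (committer, creation time).
// Plain tuples stand in for TFS changesets so the sketch runs on its own.
var changesets = new List<(string Committer, DateTime CreationDate)>
{
    (@"DOMAIN\developer1", new DateTime(2016, 1, 4, 9, 15, 0)),
    (@"DOMAIN\developer1", new DateTime(2016, 1, 4, 17, 40, 0)),
    (@"DOMAIN\developer2", new DateTime(2016, 1, 5, 11, 5, 0)),
};

// Group by (user, calendar day) and count the changesets in each group.
// Each result mirrors a SourceControlStatistic: user name (Key),
// change date (ChangeDate) and number of commits on that date (Count).
var userStats = changesets
    .GroupBy(c => (User: c.Committer, Day: c.CreationDate.Date))
    .Select(g => (Key: g.Key.User, ChangeDate: g.Key.Day, Count: g.Count()))
    .OrderBy(s => s.ChangeDate)
    .ThenBy(s => s.Key)
    .ToList();

foreach (var stat in userStats)
{
    Console.WriteLine($"{stat.Key}\t{stat.ChangeDate:yyyy-MM-dd}\t{stat.Count}");
}
```

Truncating CreationDate to its Date part is the key step in this sketch: the churn matrix columns are whole days, so the two check-ins at 09:15 and 17:40 on 4 January end up in one entry with a count of 2.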

Note the following filtering steps, which keep only source code modifications. You may want to adapt the filtering to your needs; for example, you may also want to count commits of config, XML or image files. You may also want to monitor other folders in addition to the root "Source" folder.

// filter out just the files we are interested in
.Where(c => c.Item.ServerItem.Contains("/Source/")
         && !c.Item.ServerItem.Contains("TestAutomation"))
// count only source code modifications
.Where(c => c.Item.ServerItem.EndsWith(".cs")
         || c.Item.ServerItem.EndsWith(".xaml"))

Next comes calculating the code churn matrix and deriving the Sparsity from it. These tasks are handled by the class FactGenerator. The overloaded method GenerateCodeChurnFact(DateTime start, DateTime end) calls the GetUserStatistics method of the TfsAnalyser class to gather the user statistics.

//// Get the code churn of the users between the start and end date
var userStat = tfsAnalyzer.GetUserStatistics(SourcePath, CancellationToken.None, start, end);

Then it initializes the empty code churn matrix with the developers as the rows and change dates as the columns.

//// Generate the dates at 1-day intervals between the start and end date (inclusive)
var changedates = Enumerable
    .Range(0, 1 + end.Subtract(start).Days)
    .Select(offset => start.AddDays(offset));

//// all developers in the project
var names = UserNames.Names;

//// initialize the empty churn matrix:
//// users X dates, where each element is the number of code churns (changesets)
var churnMatrixMap = new Dictionary<Tuple<string, DateTime>, int>();

foreach (var changedate in changedates)
{
    Array.ForEach(names, n =>
        churnMatrixMap.Add(new Tuple<string, DateTime>(n, changedate), 0));
}

The developer names are provided by the helper class UserNames. Modify its Names field to list the developers in your project. Enter the TFS user names without the domain qualifier; for example, if your domain is DOMAIN, enter developer1 rather than DOMAIN\developer1.

/// <summary>
/// Enter the names of the developers working in your project
/// </summary>
public static readonly string[] Names =
{
    "developer1",
    "developer2",
    "developern",
};

The following code loads the actual data into the empty code churn matrix by querying the gathered user statistics. It loops through the user statistics and, using the user name and the change date as a composite key, increments the matching matrix entry by the change count. Statistics for users not listed in UserNames.Names, or for dates outside the generated range, have no matching entry and are skipped.

//// populate the churn matrix by parsing the user statistics gathered between the start and end date
foreach (var sourceControlStatistics in userStat)
{
    foreach (var sourcecontrolStatistic in sourceControlStatistics)
    {
        //// replace "DOMAIN\" with your domain name
        var userName = sourcecontrolStatistic.Key.Replace(@"DOMAIN\", string.Empty); //// user name
        var changedate = sourcecontrolStatistic.ChangeDate; //// change date
        var csCount = sourcecontrolStatistic.Count; //// # of code churns (changesets)
        var key = new Tuple<string, DateTime>(userName, changedate);
        if (churnMatrixMap.ContainsKey(key)) //// if the entry exists in the churn matrix, update the count
        {
            churnMatrixMap[key] += csCount;
        }
    }
}
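The loop above strips a hard-coded "DOMAIN\" prefix, which has to be edited for every installation. A small helper that removes whatever domain prefix is present avoids that edit. This is a sketch only: StripDomain is a hypothetical name, and CONTOSO is an example domain.

```csharp
using System;

// Hypothetical helper: removes a "DOMAIN\" prefix of any domain name so the
// result can be matched against UserNames.Names. Names without a backslash
// are returned unchanged.
static string StripDomain(string accountName)
{
    int backslash = accountName.LastIndexOf('\\');
    return backslash >= 0 ? accountName.Substring(backslash + 1) : accountName;
}

Console.WriteLine(StripDomain(@"CONTOSO\developer1")); // developer1
Console.WriteLine(StripDomain("developer2"));          // developer2
```

In the population loop, the userName line would then read `var userName = StripDomain(sourcecontrolStatistic.Key);`.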

Finally, the percentage of zero elements in the code churn matrix is returned as the Sparsity measurement:

//// calculate sparsity by counting all elements = 0 in the churn matrix
return (churnMatrixMap.Count(m => m.Value == 0)*100)/churnMatrixMap.Count;
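Because this return statement multiplies by 100 and then divides two integers, the result is truncated to a whole percentage: 3 zero elements out of 8 gives 37 rather than 37.5. If fractional values are wanted, the division can be done in floating point. The sketch below assumes a matrix keyed by a (User, Date) value tuple instead of Tuple<string, DateTime>; ComputeSparsity is a hypothetical helper name and the matrix contents are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: Sparsity as a double, i.e. the percentage of
// zero elements in the churn matrix, without integer truncation.
static double ComputeSparsity(IReadOnlyDictionary<(string User, DateTime Date), int> churnMatrix)
{
    if (churnMatrix.Count == 0)
    {
        throw new ArgumentException("The churn matrix is empty.", nameof(churnMatrix));
    }
    int zeroElements = churnMatrix.Count(cell => cell.Value == 0);
    return 100.0 * zeroElements / churnMatrix.Count;
}

// Hypothetical 2 developers x 4 days matrix containing 3 zero elements.
var day = new DateTime(2016, 1, 4);
string[] names = { "developer1", "developer2" };
int[][] counts = { new[] { 2, 0, 1, 3 }, new[] { 0, 0, 4, 5 } };
var matrix = new Dictionary<(string User, DateTime Date), int>();
for (int i = 0; i < names.Length; i++)
{
    for (int j = 0; j < counts[i].Length; j++)
    {
        matrix[(names[i], day.AddDays(j))] = counts[i][j];
    }
}

Console.WriteLine(ComputeSparsity(matrix));
```

For this matrix ComputeSparsity returns 37.5. The empty-matrix guard also replaces the DivideByZeroException that the integer version throws when UserNames.Names is empty.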

In the attached source code, you would need to modify the following constants in the FactGenerator class to use it in your project:

/// <summary>
/// the url to the collection that you are interested in
/// </summary>
private const string TfsProjectUrl = "http://MyServer:port/MyCollection";

/// <summary>
/// the path within the collection that you want to examine
/// </summary>
private const string SourcePath = "$/MySource";

The overloaded method GenerateCodeChurnFact(DateTime start, DateTime end, int daySpan) drives the version discussed above: it splits the period between the start and end date into slices of daySpan days and calculates the Sparsity for each slice. Typically, the day span is your sprint duration (for example, 30 days), and the start and end dates are those of a release or increment. The data is written to a CSV file named CodeChurnUserFact_Sparsity.csv in the program's debug output folder. The following is a hypothetical output. In this example, the team had low productivity in terms of Sparsity (a high percentage of developer-days without check-ins) in the first two sprints and gradually improved over the last two sprints.

DaySlice	Sparsity
1	75
2	60
3	50
4	40
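The article does not show how the date span is split into slices. The sketch below is one plausible approach, producing consecutive sprint-length slices that could each be passed to GenerateCodeChurnFact(start, end). SliceDates is a hypothetical helper, and cutting the last slice short at the release end date is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits the inclusive range [start, end] into consecutive
// slices of daySpan days. The last slice is cut short so it never runs past end.
static IEnumerable<(int Slice, DateTime Start, DateTime End)> SliceDates(
    DateTime start, DateTime end, int daySpan)
{
    if (daySpan <= 0)
    {
        throw new ArgumentOutOfRangeException(nameof(daySpan), "daySpan must be positive.");
    }

    int slice = 1;
    for (var sliceStart = start.Date; sliceStart <= end.Date; sliceStart = sliceStart.AddDays(daySpan))
    {
        var sliceEnd = sliceStart.AddDays(daySpan - 1);
        if (sliceEnd > end.Date)
        {
            sliceEnd = end.Date;
        }
        yield return (slice++, sliceStart, sliceEnd);
    }
}

// Example: a release from 1 January to 29 April 2016 with 30-day sprints.
var releaseSlices = SliceDates(new DateTime(2016, 1, 1), new DateTime(2016, 4, 29), 30).ToList();
foreach (var (number, first, last) in releaseSlices)
{
    // Each slice could be passed to GenerateCodeChurnFact(first, last).
    Console.WriteLine($"{number}: {first:yyyy-MM-dd} to {last:yyyy-MM-dd}");
}
```

This example yields four slices (1 to 30 January, 31 January to 29 February, 1 to 30 March, 31 March to 29 April), which would correspond to four DaySlice rows like those in the table above.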

Points of Interest

Future work could look at the distribution of the non-zero elements of the code churn matrix, and possibly at the size of the code commits. This would answer questions such as whether the developers are uniformly loaded, including on each day, and how much code each commit contains.
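As a possible starting point for that analysis, the sketch below summarizes each row of a churn matrix: the number of days on which a developer checked in at all, and the total number of changesets. The matrix is a hypothetical jagged layout (one int[] of daily counts per developer), not the dictionary used by FactGenerator, and the values are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical churn matrix: one row of per-date changeset counts per developer.
var churnRows = new Dictionary<string, int[]>
{
    ["developer1"] = new[] { 2, 0, 1, 3 },
    ["developer2"] = new[] { 0, 0, 4, 5 },
};

// Per developer: days with at least one check-in, and total changesets.
// Few active days combined with a high total suggests large, infrequent check-ins.
var load = churnRows
    .Select(r => (Developer: r.Key,
                  ActiveDays: r.Value.Count(c => c > 0),
                  Total: r.Value.Sum()))
    .ToList();

foreach (var (developer, activeDays, total) in load)
{
    Console.WriteLine($"{developer}: active on {activeDays} of {churnRows[developer].Length} days, {total} changesets");
}
```

Here developer1 is active on 3 of 4 days with 6 changesets, while developer2 is active on 2 of 4 days with 9 changesets, which hints at a less even load for developer2.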

History

17th January, 2016: Initial version

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
