Introduction
The goal of this tip is to present three different ways of sorting data list extracted from a text file. In addition, it will calculate time it takes to sort data using these different methods and determine the fastest method.
Background
When I first got this requirement, I quickly wrote a method to sort using DataTable
. We have an application that lets user run SQL query similar to Query Designer without ever having to write SQL. It exposes the result list within the application to modify/reformat the list and save it as an extract as specified by the user. But the method I wrote wasn't working within the application because it was written some time back and DataTable
namespace wasn't included. I had to use some basic C# features that would allow my code module to compile/execute within this application. It aroused my interest in figuring out different sorting methods this application allowed. There are various other ways to do this but my focus is DataTable
, Linq, and Tuple
and determine which one was the fastest.
Using the Code
Please download the Checks.txt file to this directory below or change the location of the file within the code.
const string f = @"C:\Users\Public\Documents\Checks.txt";
All the sorted files will be saved to this same location including Test Results file.
There are 7 methods including main
.
public void ProcessExtractUsingDataTable()
This uses data table to sort list from text file and calculates completion time.
public void ProcessExtractUsingLinq()
This uses linq to sort list from text file and calculates completion time.
public void ProcessExtractUsingTuple()
This uses tuple
to sort list from text file and calculates completion time.
public static void ShowResults()
This displays results in console, saves result to text file and calls method to save sorted lists.
public static void SaveSortedExtract()
This saves sorted lists to different text files.
public static int CompareBankids()
This compares tuple
list to sort by bank ids.
static void Main(string[] args)
{
Program p = new Program();
const string f = @"C:\Users\Public\Documents\Checks.txt";
string newFileName = Path.ChangeExtension(f, null);
string[] contents = File.ReadAllLines(f);
if (contents.Length <= 1)
throw new Exception(String.Format("File {0} doesn't have enough rows", f));
string HeaderRow = contents[0];
List<string> arRetVal = contents.Skip<string>(1).ToList();
Here in our main
method, I’m reading the text file and adding them to List<string>
. I’m separating the header row before adding just the data to our list for sorting.
public void ProcessExtractUsingDataTable(List<string> FileRows, string HeaderRow, string newFileName)
This method takes in List<string>
passed from our main
method call.
funcName = System.Reflection.MethodBase.GetCurrentMethod().Name;
I’m using System.Reflection
namespace to get the Base Method Name so I can save each sorted list to separate text file.
This is where I’m starting my timer to calculate total time it takes to create table and sort.
var timer = System.Diagnostics.Stopwatch.StartNew();
DataTable table = new DataTable();
table.Columns.Add("BankID", typeof(long));
table.Columns.Add("DDA", typeof(long));
table.Columns.Add("DepositDate", typeof(string));
table.Columns.Add("TotalAmount", typeof(double));
table.Columns.Add("TotalChecks", typeof(double));
foreach (string rowText in FileRows)
{
string[] strArray = rowText.Split(',');
table.Rows.Add(strArray);
}
DataView dv = table.DefaultView;
dv.Sort = "BankID";
DataTable sortedDT = dv.ToTable();
timer.Stop();
Once DataTable
is sorted, I stopped the timer to calculate elapsed time.
Apparently, it requires bit more lines of code to put it back to List<string>
.
StringBuilder RowAppend = new StringBuilder();
List<string> arRetVal = new List<string>();
foreach (DataRow row in sortedDT.Rows)
{
foreach (DataColumn Col in sortedDT.Columns)
{
RowAppend.Append(row[Col].ToString() + ",").ToString();
}
string test = RowAppend.ToString();
test = test.Remove(test.Length - 1);
arRetVal.Add(test);
RowAppend.Clear();
}
ShowResults(arRetVal, HeaderRow, newFileName,
funcName, timer);
}
Towards the end, I’m passing the sorted list to ShowResults
that will be saved to a text file.
public void ProcessExtractUsingLinq(List<string> FileRows, string HeaderRow, string newFileName)
{
funcName = System.Reflection.MethodBase.GetCurrentMethod().Name;
var timer = System.Diagnostics.Stopwatch.StartNew();
List<string> arRetVal = FileRows.ToDictionary
(
itm => itm.Substring(0, itm.IndexOf(','))
)
.OrderBy(kvp => Convert.ToInt32(kvp.Key))
.Select(kvp => kvp.Value).ToList();
timer.Stop();
ShowResults(arRetVal, HeaderRow, newFileName, funcName, timer);
In this ProcessExtractUsingLinq
method, I’m using extension method from an IEnumerable
to ToDictionary
, sort the BankId
in ascending order and convert it back to list. Overall, I like the Linq process better, because the code is clean and concise. But, it was bit confusing to me at first when I was learning to understand Linq and I’m sure others feel the same way.
Now, this method of sorting using Tuple
is interesting. I didn’t even know about tuple
before my colleague Charlie mentioned it while we were talking. Tuple
is used specifically for sorting operations. It is an ordered list of elements and you’ll have to specify the type of each item in the type parameter list.
If you want to learn more about Tuple
, here are some links:
public void ProcessExtractUsingTuple(List<string> FileRows, string HeaderRow, string newFileName)
{
funcName = System.Reflection.MethodBase.GetCurrentMethod().Name;
var timer = System.Diagnostics.Stopwatch.StartNew();
List<string> arRetVal = new List<string>();
List<Tuple<long, long, string, double, int>> rows =
new List<Tuple<long, long, string, double, int>>();
Here, I’m creating a list of type Tuple
that has 5 items.
foreach (string rowText in FileRows)
{
string[] strArray = rowText.Split(',');
rows.Add(new Tuple<long, long, string, double, int>
(Convert.ToInt64(strArray[0]), Convert.ToInt64(strArray[1]),
strArray[2].ToString(), Convert.ToDouble(strArray[3]), Convert.ToInt32(strArray[4])));
}
Once all the rows from the list are added to the tuple list, Comparision(T)
overloaded sorting method.
rows.Sort(CompareBankids);
timer.Stop();
foreach (Tuple<long, long, string, double, int> row in rows)
{
arRetVal.Add(string.Concat(row.Item1, ",",
row.Item2.ToString(), ",", row.Item3.ToString(), ",", row.Item4, ",", row.Item5));
}
ShowResults(arRetVal, HeaderRow, newFileName, funcName,
timer);
}
public static int CompareBankids(Tuple<long, long, string, double, int> x,
Tuple<long, long, string, double, int> y)
{
return x.Item1 < y.Item1 ? -1 : (x.Item1 > y.Item1 ? 1 : 0);
}
Conclusion
Although this application may not be completely accurate for calculating actual sorting time, I just wanted to present something quick and dirty way to calculate time it takes for each of these 3 methods to complete sorting. Furthermore, .NET uses Quicksort algorithm using Pivot to do its sorting so calculation for actual sorting will probably take the same amount of time. Within the method, the calculation accounts for memory allocation, resources, execution time, etc, which is different for each method. It is obvious that DataTable
is the slowest of the 3 because it is using the most resources. Tuple
method was the fastest of all three.
Method: ProcessExtractUsingLinq took 00:00:0047033 to complete sort.
---------------------------------------------------------------------------
Method: ProcessExtractUsingDataTable took 00:00:0223600 to complete sort.
---------------------------------------------------------------------------
Method: ProcessExtractUsingTuple took 00:00:0020698 to complete sort.
---------------------------------------------------------------------------
For further reading, here is a great article from dotnetperls website on Tuple
vs. KeyValuePair
performance. It also concludes that Tuple
is faster.
History
- 14th February, 2015: Suggestions from @xs2008 applied
- 9th February, 2015: Initial version