Part 1 - Using SQLXAgent
Part 2 - Architecture and Design Decisions
Part 3 - The CSV and Excel Importer Code (this article)
Part 4 - Job Scheduling Code
Part 5 - How Packages Are Run
Part 6 - Interesting Coding
Introduction
This article is the third in a larger series of articles describing the SQLXAgent utility, specifically discussing the code that handles importing of Excel and CSV files. There won't be very many screen shots - just a bunch of code snippets and boring narrative. If you don't care about the subject matter of this article part, feel free to ignore it and go find something else to do. Other coding challenges are discussed in part 4 of this article series.
Note: - The only differences between the code in the files and the code in this article is related to the elimination (via line formatting) of line-length-induced horizontal scroll bars.
Throughout this code, you may notice calls to unfamiliar methods (for example, string.ReplaceNonAlphaNumeric()
). I make heavy use of extension methods, and that's probably what you're seeing. If you're really curious, you can find them in the ObjectExtensions folder within the SQLXCommon project.
NOTE: Code snippets presented in the article may or may not reflect the absolutely latest and greatest version of the code. In the event that it does not exactly match the actual code, it will fairly close. Such is the nature of noticing issues while writing an article.
General Overview
The classes described here are located in the SQLXPkgBase assembly. The class inheritance and dependency hierarchy is illustrated below.
I wanted to bring up the fact that the CSV importer is loosely based on the article I wrote in 2016 - CSV File Parser . I say "loosely based" because writing this code forced me to re-evaluate the way I approached making the imported data accessible to the parent application/assembly.
The Code
In this article we'll be discussing the classes that are primarily involved in the importation of data. They are:
- ImporterBase - The absolute (abstract) base class for CSV and Excel importer classes.
- CSVImportBase - The (abstract) base class for CSV importer classes
- ExcelImportBase - The (abstract) base class for Excel importer classes
- HintItem and HintItemList - describes imported columns
ImporterBase
This is an abstract class that contains most of the properties and functionality for every CSV or Excel importer. Most of the methods are protected so they can be used by the inheriting classes. The inheriting classes are responsible for the actual loading of the raw data, and creating a DataTable
object on which this class performs its processing. Once the desired file is loaded, the inheriting class calls the ProcessData()
method.
</summary>
protected virtual void ProcessData()
{
this.RemoveEmptyRows();
this.NormalizeNames();
this.CreateColumnHints();
this.EstablishStringLengths();
this.BuildPrepQuery();
this.BuildCreateTableQuery();
this.BuildInsertQuery();
}
The methods are described below. I retained the comments in an (probably futile) attempt to keep the narrative text to a minimum.
- RemoveEmptyRows() - If the
DeleteEmptyRows
property is true, any row in the data table object that have all empty/null fields is removed from the data table.
protected virtual void RemoveEmptyRows()
{
this.RemovedRowCount = 0;
if (this.DeleteEmptyRows)
{
for (int i = this.ImportedData.Rows.Count-1; i >= 0; i--)
{
DataRow row = this.ImportedData.Rows[i];
if (row.ItemArray.All(x=>x is DBNull || string.IsNullOrEmpty(x.ToString())))
{
this.ImportedData.Rows.Remove(row);
this.RemovedRowCount++;
}
}
}
}
- NormalizeNames() - This method will normalize column names and the table name. If there is no header row in the source data file, columns will be automatically created using the "Column_n" format, where "n" is the ordinal position of the column within the source data. If the table name was not specifically specified by the programmer, the name of the table into which the data is imported is determined by the file name.
protected virtual void NormalizeNames()
{
if (this.FirstRowIsHeader)
{
DataRow headerRow = this.ImportedData.Rows[0];
for (int i = 0; i < headerRow.ItemArray.Count(); i++)
{
DataColumn excelColumn = this.ImportedData.Columns[i];
string rowColumn = (string)headerRow.ItemArray[i];
excelColumn.ColumnName = (string.IsNullOrEmpty(rowColumn)
? excelColumn.ColumnName
: rowColumn
).Trim().ReplaceNonAlphaNumeric('_');
}
this.ImportedData.Rows.RemoveAt(0);
}
string newTableName = (string.IsNullOrEmpty(this.ImportedTableName))
? string.Concat("Imported_",
Path.GetFileName(this.FileName).Trim(),
((this.FileType == ImportFileType.EXCEL)
? "_"
: ""),
((this.FileType == ImportFileType.EXCEL)
? this.SheetName.Trim()
: "")).ReplaceNonAlphaNumeric('_')
: this.ImportedTableName;
this.ImportedData.TableName = string.Format("Imported_{0}", newTableName);
}
- CreateColumnHints() - This method is responsible for auto-building a "hints" list, which is used by the SQL query code to create the import table with field names and appropriate data types. If the inheriting class has already created a hints list (ostensibly via an overridden version of this method), this method simply exits without doing anything. The column hint related classes are described in the next section.
protected virtual void CreateColumnHints(bool fromCSV=false)
{
if (this.ColumnHints == null || this.ColumnHints.Count == 0)
{
this.ColumnHints = new ItemHints();
for (int i = 0; i < this.ImportedData.Columns.Count; i++)
{
DataColumn col = this.ImportedData.Columns[i];
if (!string.IsNullOrEmpty(col.ColumnName))
{
HintItem hint = new HintItem(){ Name = col.ColumnName, ItemType=null};
foreach(DataRow row in this.ImportedData.Rows)
{
hint.DetermineType(row[col], fromCSV);
if (hint.ItemType.Name.IsLike("String"))
{
break;
}
}
this.ColumnHints.Add(hint);
}
}
}
}
- EstablishStringLengths() - This method tries to establish the max length of
string
type columns. It uses the VarcharLengthMultiple
property to determine the max length. So, using the default multiple of 255 characters, and the max length of all of the values for a given column was 300 characters, the resulting max string size for the column would be 510 characters.
protected virtual void EstablishStringLengths()
{
if (this.ColumnHints != null && this.ColumnHints.Count > 0)
{
for (int i = 1; i < this.ImportedData.Columns.Count; i++)
{
if (this.ImportedData.Columns[i].DataType.Name.IsLike("string"))
{
int maxSize = 0;
foreach(DataRow row in this.ImportedData.Rows)
{
maxSize = Math.Max(maxSize, (row.ItemArray[i] is DBNull)
? 0
: ((string)(row.ItemArray[i])).Length);
}
long rem = 0;
int factor = (int)(Math.DivRem((long)maxSize, (long)this.VarcharLengthMultiple, out rem));
factor += ((rem > 0) ? 1 : 0);
HintItem hint = this.ColumnHints.FirstOrDefault
(x=>x.Name.ToUpper() == this.ImportedData.Columns[i].ColumnName.ToUpper());
if (hint != null)
{
hint.VarcharSize = this.VarcharLengthMultiple * factor;
}
}
}
}
}
- BuildPrepQuery() - This method constructs the database preparation query, or in other words, whether to drop the import table completely or to just truncate it.
protected virtual void BuildPrepQuery()
{
switch (this.PrepAction)
{
case TablePrepAction.Drop :
{
this.PrepQuery = string.Format("DROP TABLE [{0}].[dbo].[{1}];",
this.InitialCatalog,
this.ImportedData.TableName);
}
break;
case TablePrepAction.Truncate :
{
this.PrepQuery = string.Format("TRUNCATE TABLE [{0}].[dbo].[{1}];",
this.InitialCatalog,
this.ImportedData.TableName);
}
break;
}
}
- BuildCreateTableQuery() - This method constructs the query that creates the table. It uses the column hints list discussed above to determine column names and data types. There are two overloads of this method. One creates a table with ALL of the columns in the imported data, and the other creates a table based on the columns specified in a
SqlParameter
array.
protected virtual void BuildCreateTableQuery()
{
StringBuilder query = new StringBuilder();
StringBuilder fields = new StringBuilder();
query.AppendFormat("IF (OBJECT_ID('{0}','U') IS NULL ",
this.ImportedData.TableName).AppendLine();
query.AppendFormat(" CREATE TABLE {0}.dbo.{1}",
this.InitialCatalog,
this.ImportedData.TableName).AppendLine();
query.AppendLine("(");
for (int i = 0; i < ImportedData.Columns.Count; i++)
{
string colName = Convert.ToString(this.ImportedData.Columns[i]);
if (!string.IsNullOrEmpty(colName))
{
HintItem hint = this.ColumnHints.FirstOrDefault(x=>x.Name == colName);
string typeName = (hint == null)
? this.GetDbDataType(this.ImportedData.Columns[i])
: this.GetDbDataType(hint.ItemType.Name);
fields.AppendFormat("{0}[{1}] {2} NULL",
(fields.Length>0) ? "," : "",
colName,
typeName).AppendLine();
}
}
query.Append(fields);
query.Append(");");
this.CreateQuery = query.ToString();
}
protected virtual void BuildCreateTableQuery(SqlParameter[] parameters)
{
StringBuilder query = new StringBuilder();
StringBuilder fields = new StringBuilder();
query.AppendFormat("IF (OBJECT_ID('{0}','U') IS NULL ",
this.ImportedData.TableName).AppendLine();
query.AppendFormat(" CREATE TABLE {0}.dbo.{1}",
this.InitialCatalog,
this.ImportedData.TableName).AppendLine();
query.AppendLine("(");
List<hintitem> hints = this.ColumnHints.Where
(a => parameters.Any(b => b.ParameterName == a.DbValue)).ToList();
foreach(HintItem hint in hints)
{
string typeName = this.GetDbDataType(hint.ItemType.Name);
fields.AppendFormat("{0}{1} {2} NULL",
(fields.Length>0) ? "," : "",
hint.DbName,
typeName).AppendLine();
}
query.Append(fields);
query.Append(");");
this.CreateQuery = query.ToString();
}
- BuildInsertQuery() - This method constructs the insert query that inserts the imported data into the import table. Like the "create table" method above, there are two overloads, one that inserts ALL columns, and one that inserts columns specified in a
SqlParameter
array.
protected virtual void BuildInsertQuery()
{
StringBuilder query = new StringBuilder();
StringBuilder fields = new StringBuilder();
StringBuilder values = new StringBuilder();
foreach(HintItem hint in this.ColumnHints)
{
fields.AppendFormat("{0}{1}",
(fields.Length>0) ? "," : "",
hint.DbName).AppendLine();
values.AppendFormat ("{0}{1}",
(fields.Length>0) ? "," : "",
hint.DbValue).AppendLine();
}
query.AppendFormat("INSERT INTO [{0}].[dbo].[{1}]",
this.InitialCatalog,
this.ImportedData.TableName).AppendLine();
query.AppendLine("(");
query.Append(fields);
query.AppendLine(")").AppendLine("VALUES").AppendLine("(");
query.Append(values);
query.AppendLine(");");
this.InsertQuery = query.ToString();
}
protected virtual void BuildInsertQuery(SqlParameter[] parameters)
{
StringBuilder query = new StringBuilder();
StringBuilder fields = new StringBuilder();
StringBuilder values = new StringBuilder();
List<hintitem> hints = this.ColumnHints.Where
(a => parameters.Any(b => b.ParameterName == a.DbValue)).ToList();
foreach(HintItem hint in hints)
{
fields.AppendFormat("{0}{1}", (fields.Length>0)?",":"", hint.DbName).AppendLine();
values.AppendFormat ("{0}{1}", (fields.Length>0)?",":"", hint.DbValue).AppendLine();
}
query.AppendFormat("INSERT INTO [{0}].[dbo].[{1}]",
this.InitialCatalog,
this.ImportedData.TableName).AppendLine();
query.AppendLine("(");
query.Append(fields);
query.AppendLine(")").AppendLine("VALUES").AppendLine("(");
query.Append(values);
query.AppendLine(");");
this.InsertQuery = query.ToString();
}
The remaining methods in the class are helper methods that determine what the database-equivalant data type is for the hinted columns, and actually calling the SQL queries that perform the database functionality.
HintItem and HintItemList
The HintItem
class allows the importer class to automatically establish appropriate data types for imported columns. The most interesting part of the class is the DetermineType()
method.
If the type is not already determined to be a string, we go ahead and try to parse it into a reasonable type. As you can see, whether we're importing an Excel or CSV file plays a part in type determination. Also, this code only supports one integer type (Int32
). This is probably a deficiency that will eventually come back to bite me in the ass, but it should be easy to address if the need arises.
public void DetermineType(object value, bool fromCSV=false)
{
if (this.ItemType == null || !this.ItemType.Name.IsLike("String"))
{
Type theType = this.ItemType;
if (value is string && !fromCSV)
{
theType = typeof(String);
}
else
{
string valueStr = value.ToString();
DateTime parsedDateTime;
Int32 parsedInt32;
double parsedDouble;
bool parsedBool;
if (DateTime.TryParseExact(valueStr, DateFormatStrings.Formats, CultureInfo.CurrentUICulture, DateTimeStyles.None, out parsedDateTime))
{
theType = typeof(DateTime);
}
else if (Int32.TryParse(valueStr, out parsedInt32))
{
if (theType == null || !theType.Name.IsLike("Double"))
{
theType = typeof(Int32);
}
}
else if (double.TryParse(valueStr, out parsedDouble))
{
theType = typeof(Double);
}
else if (bool.TryParse(valueStr, out parsedBool))
{
theType = typeof(Boolean);
}
else
{
theType = typeof(String);
}
}
this.ItemType = theType;
}
}
ExcelImportBase
The original implementation of this class contained everything in the ImportBase
class, because I was only going to do an importer for Excel.
(Blunt honesty section) - Then, I figured I should do an importer for CSV files as well, and thought, "I'll just refer to the CSV file parser article I wrote last year, and let the programmer be a programmer and mold that article's code to his needs". After writing the Excel importer class, I figured that a certain group of programmers would be offended by my lazy attitude (and the fact that they would have to do some actual "work"), so I decided to mold that article myself (described in the next section). That's why we now have three abstract classes instead of just one.
This class is also abstract, meaning you have to inherit it with another class that YOU have to write. Since this is all about supporting programmer written packages for SQLXAgent, I'm assuming that's why you're going to be doing this, but it can certainly be used in other projects that you might dream up.
Since most of the code required is in the ImportBase
class, there's very little that had to be done in this class. The most interesting part of this class is that I used a NuGet-available library, called ExcelDataReader (and its companion extension library, ExcelDataReader.DataSet), to facilitate the actual load-the-file requirement. This library reads an Excel file into a DataTable
object. But that's all it does. It does not appear to make any attempt to determine appropriate data types, and if I recall, everything is stored in the table as an object
. This was entirely unhelpful, and led me to develop the column hints code discussed above. The following is the entire contents of the class.
public abstract class ExcelImportBase : ImporterBase
{
public ExcelImportBase(string filename, string sheetName, string connectionString):base(filename, sheetName, connectionString)
{
this.FileType = ImportFileType.EXCEL;
this.FileName = filename;
this.SheetName = sheetName;
this.Init();
}
~ExcelImportBase()
{
}
protected override void Init()
{
base.Init();
this.FileType = ImportFileType.EXCEL;
}
public override void Import()
{
if (this.CanImport)
{
this.ImportedData = this.ReadSheet(this.FileName, this.SheetName);
base.ProcessData();
this.UpdateDatabase();
}
}
protected virtual DataTable ReadSheet(string filename, string sheetName)
{
DataSet ds = null;
DataTable data = null;
try
{
using (System.IO.FileStream stream = File.Open(filename, FileMode.Open, FileAccess.Read))
{
using (var reader = ExcelDataReader.ExcelReaderFactory.CreateReader(stream))
{
do
{
if (reader.Name.IsLike(sheetName))
{
ds = reader.AsDataSet();
}
} while (reader.NextResult());
}
if (ds != null)
{
data = ds.Tables[0];
}
}
}
catch (Exception ex)
{
throw new SQLXAgentException(SQLXExceptionEnum.ExcelDataReaderError, ex);
}
return data;
}
protected abstract override SqlParameter[] GetRowParameters(int rowIndex);
}
The only abstract component of the class is the GetRowParameters()
method. This is where you will specify which columns to actually import. Overriding this method with an empty method should cause all columns to be imported.
CSVImportBase
This class closely resembles my original CSV Parser article, and was modified to fit the new application-specific paradigm.
ATTENTION NEW PROGRAMMERS - this is how you make use of CodeProject articles written by someone else. You inspect the code to evaluate its fitness for use in your project, and if necessary, make changes that adapt it to your own usage. A well-written article will be all the guidance you need. Even if it's NOT well-written, take the time to evaluate the code, add necessary comments, and go on with your life. You'll be a better programmer simply by making a little extra effort on your own. And now, back to our story...
I had to retain the code responsible for loading and parsing of the data, but adapted the code to store the results in a DataTable
object for use by the ImportBase
class. Because there was so much involved in loading the file (I'm relying on my own code instead of having a library to do it for me), this class is noticeably larger than the ExcelImportBase
class, and involved additional ExcelImportBase
class, and involved additional CSV-only properties and methods (that were in the article from which I got the code). Instead of boring you with crap I wrote in the old article, I'll simply mention the new method I had to write to store the imported data into a DataTable
object.
protected virtual void AddToDataTable(string[] parts)
{
if (this.ImportedData.Columns.Count == 0)
{
for (int i = 0; i < parts.Length; i++)
{
this.ImportedData.Columns.Add(string.Format("Column_{0}", i+1), typeof(object));
}
}
DataRow row = this.ImportedData.NewRow();
for (int i = 0; i < parts.Length; i++)
{
string part = parts[i];
if (!string.IsNullOrEmpty(this.CurrencySymbol))
{
part = (part.StartsWith(this.CurrencySymbol) || part.EndsWith(this.CurrencySymbol)) ? part.Replace(this.CurrencySymbol, "").Replace(",", "") : part;
}
row[i] = parts[i].Replace("\"","").Trim();
}
this.ImportedData.Rows.Add(row);
}
Once the data table is populated, the ImportBase
code kicks in to perform the rest of the import functionality.
Writing Your Own Importer
Since the three import classes discussed above are abstract, this means you are required to write your own importer class in order to perform an import operation. Since so much of the work is already done for you, this requires trivial effort on the part of the programmer.
Even though the two following classes are nearly identical, in the spirit of completeness, I included the all the code for each class.
Sample Excel Importer
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Data.SqlClient;
using System.Data;
using SQLXCommon;
using SQLXPkgBase;
namespace PkgSampleImportExcel2
{
public class SampleExcelImporter : ExcelImportBase
{
public SampleExcelImporter(string filename, string sheetName, string connectionstring)
: base(filename, sheetName, connectionstring)
{
this.Init();
}
protected override void Init()
{
base.Init();
}
protected override void CreateColumnHints(bool fromCSV=false)
{
base.CreateColumnHints(fromCSV);
}
protected override SqlParameter[] GetRowParameters(int rowIndex)
{
int id;
string comment;
DateTime theDate;
double floatValue;
int intValue;
string ipAddr;
id = (int) this.GetColumnValue(rowIndex, "ID");
comment = (string) this.GetColumnValue(rowIndex, "Comment");
theDate = (DateTime)this.GetColumnValue(rowIndex, "TheDate");
floatValue = (double) this.GetColumnValue(rowIndex, "FloatValue");
intValue = (int) this.GetColumnValue(rowIndex, "IntValue");
ipAddr = (string) this.GetColumnValue(rowIndex, "IPAddress");
SqlParameter[] parameters = new SqlParameter[]
{
new SqlParameter(this.ColumnHints.First(x=>x.Name == "ID" ).DbValue, id),
new SqlParameter(this.ColumnHints.First(x=>x.Name == "Comment" ).DbValue, comment),
new SqlParameter(this.ColumnHints.First(x=>x.Name == "TheDate" ).DbValue, theDate),
new SqlParameter(this.ColumnHints.First(x=>x.Name == "FloatValue").DbValue, floatValue),
new SqlParameter(this.ColumnHints.First(x=>x.Name == "IntValue" ).DbValue, intValue),
new SqlParameter(this.ColumnHints.First(x=>x.Name == "IPAddress" ).DbValue, ipAddr),
};
return parameters;
}
}
}
Sample CSV Importer
using SQLXPkgBase;
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace PkgSampleImportCSV
{
public class SampleCSVImporter : CsvImportBase
{
public SampleCSVImporter(string filename, string connectionstring)
: base(filename, connectionstring)
{
this.Init();
}
protected override void Init()
{
base.Init();
}
protected override void CreateColumnHints(bool fromCSV=false)
{
base.CreateColumnHints(fromCSV);
}
protected override SqlParameter[] GetRowParameters(int rowIndex)
{
int id;
string comment;
DateTime theDate;
double floatValue;
int intValue;
string ipAddr;
id = (int) this.GetColumnValue(rowIndex, "ID");
comment = (string) this.GetColumnValue(rowIndex, "Comment");
theDate = (DateTime)this.GetColumnValue(rowIndex, "TheDate");
floatValue = (double) this.GetColumnValue(rowIndex, "FloatValue");
intValue = (int) this.GetColumnValue(rowIndex, "IntValue");
ipAddr = (string) this.GetColumnValue(rowIndex, "IPAddress");
SqlParameter[] parameters = new SqlParameter[]
{
new SqlParameter(this.ColumnHints.First(x=>x.Name == "ID" ).DbValue, id),
new SqlParameter(this.ColumnHints.First(x=>x.Name == "Comment" ).DbValue, comment),
new SqlParameter(this.ColumnHints.First(x=>x.Name == "TheDate" ).DbValue, theDate),
new SqlParameter(this.ColumnHints.First(x=>x.Name == "FloatValue").DbValue, floatValue),
new SqlParameter(this.ColumnHints.First(x=>x.Name == "IntValue" ).DbValue, intValue),
new SqlParameter(this.ColumnHints.First(x=>x.Name == "IPAddress" ).DbValue, ipAddr),
};
return parameters;
}
}
}
How to Include in Your Own Projects
I suspect that some of you will decide that you would like to utilize this code in your own projects. To that end, this section is for you. To use these files in your own projects, the absolute easiest way forward is as follows.
In Windows Explorer:
- Copy the SQLXPkgBase folder to your solution folder.
- Copy the SQLXCommon folder to your solution folder.
- Rename the folders and files (optional) to your own naming scheme/preferences
- Copy the 3rdParty folder to your solution folder.
- The only thing you need from the 3rdParty folder (for this task) are the
ExcelDataReader*.*
files.
In Visual Studio:
- Add the Common and PkgBase projects to your solution.
- Resolve broken references in the (renamed) PkgBase project. The ExcelDataReader and SQLXCommon references are the ones that will be broken).
- Resolve naming issues. Most of the problems will arise if you renamed the namespace in the Common project. Change all references to
SQLXCommon
to whatever you called your common namespace.
- You can safely exclude the
SQLXAgentPkgBase
class from the (renamed) PkgBase project.
- There are other ways to prune the code to its bare minimum, but I leave that was an exercise for the programmer.
In Closing
As I cited above, I only wrote the Excel importer because it made sense, given the purpose of a SQLXAgent package, and the fact that a lot of your typical DTSX packages in an enterprise environment involves importing data from an Excel file. (More brutal honesty) On the other hand, I wrote the CSV importer code simply to keep the ungrateful bastards out there in the audience from harping on me for providing what they might deem as an "incomplete solution". I make absolutely no apologies for my attitude in this regard. My general feeling is that if you want to be coddled and taken by the hand, go back to pre-school.
History
- 29 Sep 2017 - Initial publication.