Introduction
The class needs a table on a SQL Server somewhere to import into, a connection to that database, a file to load, the names of the fields to load the data into, and the positions in the file to load them from. I set this up so that everything can come from application settings, and put together a quick WPF file-browser front end to demonstrate how to use the class.
Background
Everywhere I have worked, it has been important to be able to load fixed-width files from mainframe reports, coming from data sources that can't be, or at least aren't, exposed through web services, ODBC, or other means. Every time there is a new version of Visual Studio, this gets a little easier to write. This is my latest class to do the import.
Using the Code
The simplest way to use my BulkLoad class is to make setting strings for what needs to be loaded and pass them to the BulkLoadFromStrings function. In this example, I have set everything up as app settings.
This is what the config file looks like:
<configuration>
  <configSections>
    <sectionGroup name="userSettings"
        type="System.Configuration.UserSettingsGroup, System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">
      <section name="Bulk_Import.Properties.Settings"
          type="System.Configuration.ClientSettingsSection, System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"
          requirePermission="false" allowExeDefinition="MachineToLocalUser" />
    </sectionGroup>
  </configSections>
  <userSettings>
    <Bulk_Import.Properties.Settings>
      <setting name="TableName" serializeAs="String">
        <value>drcAlden</value>
      </setting>
      <setting name="FieldNamesCSV" serializeAs="String">
        <value>CLSNumber,YrPeriod,Vendor,Department,BoxCode,StoreNumber,StoreName,
TypeScan,UPC , Manufac,Description,Qty ,Base ,Credit ,GSNumber</value>
      </setting>
      <setting name="DefaultDir" serializeAs="String">
        <value>C:\</value>
      </setting>
      <setting name="DefaultFile" serializeAs="String">
        <value>FixedWidthFile.txt</value>
      </setting>
      <setting name="FormatCSV" serializeAs="String">
        <value>4-8 ,10-14 ,15-22 ,23-26 ,27-36 ,38-48 ,48-77 ,
78-80 ,80-94 ,94-103 ,103-123 ,124-133 ,133-146 ,147-160,161-170</value>
      </setting>
    </Bulk_Import.Properties.Settings>
  </userSettings>
  <connectionStrings>
    <add name="DamageReclaimEntities"
        connectionString="metadata=res://*/Model1.csdl|res://*/Model1.ssdl|res://*/Model1.msl;provider=System.Data.SqlClient;provider connection string=&quot;Data Source=Server;Initial Catalog=Database;Integrated Security=True;MultipleActiveResultSets=False&quot;"
        providerName="System.Data.EntityClient" />
    <add name="Bulk_Import.Properties.Settings.DamageReclaimConnectionString"
        connectionString="Data Source=DBServer;Initial Catalog=Database;Integrated Security=True"
        providerName="System.Data.SqlClient" />
  </connectionStrings>
</configuration>
The app settings were just added through the project's Settings designer (right-click the project and choose Properties), so don't worry about having to retype a bunch of perfectly valid XML unless you are into that sort of thing.
The BulkLoadFromStrings function basically calls the class's parse routines on these strings and uses the results to load the file.
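The WPF front end itself isn't listed in this article, so here is a hypothetical call site showing how the settings above might be fed into the class. The button handler and the txtFileName control are illustrative inventions; the Settings property names come straight from the config file:

// Illustrative only: a button handler in the WPF front end that feeds
// the app settings into BulkLoadFromStrings. The handler name and the
// txtFileName control are assumptions; the Settings names match the
// config file above.
private void btnImport_Click(object sender, RoutedEventArgs e)
{
    Bulk_Import.Properties.Settings settings =
        Bulk_Import.Properties.Settings.Default;
    BulkLoad loader = new BulkLoad();
    int iLoaded = loader.BulkLoadFromStrings(
        settings.TableName,
        settings.FieldNamesCSV,
        settings.FormatCSV,
        txtFileName.Text,          // file picked in the browser front end
        settings.DamageReclaimConnectionString);
    MessageBox.Show(iLoaded.ToString() + " records loaded.");
}

The function itself: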
public int BulkLoadFromStrings ( string TableName_in
                               , string FieldNamesCSV_in
                               , string FieldPositionCSV_in
                               , string FileNameAndPath_in
                               , string SQLConnectionString_in
                               )
{
    int iRecordsLoadedOut = 0;
    BulkLoadFields blfldsToLoad;
    BulkLoadTable bltblToLoad;
    SqlConnection con = null;
    Stream strmToLoad = null;
    DataTable dtToLoad = null;
    int iRecordsPreLoaded = 0;

    // Parse the CSV settings into the field structure, then build a
    // DataTable (plus bulk copy mappings) from the destination table.
    con = MakeConnection(SQLConnectionString_in);
    blfldsToLoad = new BulkLoadFields(FieldNamesCSV_in, FieldPositionCSV_in);
    bltblToLoad = MakeBulkLoadTable(TableName_in, con, blfldsToLoad.FieldNames);

    // Read the fixed-width file into the table, then bulk copy it up.
    strmToLoad = FileToStream(FileNameAndPath_in);
    dtToLoad = bltblToLoad.Table;
    iRecordsPreLoaded = LoadTable(ref dtToLoad, blfldsToLoad.FieldPositions,
        strmToLoad);
    strmToLoad.Close();
    strmToLoad = null;
    bltblToLoad.Table = dtToLoad;
    iRecordsLoadedOut = DoBulkLoad(bltblToLoad, con);

    // AddExportDate records which file was loaded where; it is not
    // shown in this article.
    AddExportDate(FileNameAndPath_in, con.Database.ToString(),
        con.DataSource.ToString());
    return iRecordsLoadedOut;
}
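MakeConnection and FileToStream are small helpers that aren't listed in the article; these are minimal sketches of the simplest implementations that would satisfy the calls above:

// Guesses at the two helpers called above; the real implementations
// are not shown in the article.
private SqlConnection MakeConnection(string SQLConnectionString_in)
{
    return new SqlConnection(SQLConnectionString_in);
}

private Stream FileToStream(string FileNameAndPath_in)
{
    // Open the flat file read-only so the import doesn't lock it
    // against other readers.
    return new FileStream(FileNameAndPath_in, FileMode.Open,
        FileAccess.Read, FileShare.Read);
}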
There are two structures that the class uses to pass things around. The BulkLoadFields structure is where the collection of field positions and names is stored, and the BulkLoadTable is simply a DataTable paired with the SqlBulkCopy column mappings set up for each of its columns. These mappings turned out to be extremely important to getting the bulk insert to work.
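The article doesn't list the structures themselves, so here is a minimal sketch consistent with how the rest of the code uses them. The FieldPosition type name is an assumption (the original generic parameter isn't shown), and the range parsing is inferred from the FormatCSV setting shown earlier:

// A minimal sketch of the carrier structures, inferred from how the
// rest of the code uses them; requires System.Data,
// System.Data.SqlClient, and System.Collections.Generic.
// FieldPosition is a stand-in name.
public struct FieldPosition
{
    public int Start;
    public int End;
}

public struct BulkLoadFields
{
    public string[] FieldNames;
    public SortedList<string, FieldPosition> FieldPositions;

    public BulkLoadFields(string FieldNamesCSV, string FieldPositionCSV)
    {
        // Field names and "start-end" ranges line up by position,
        // mirroring the FieldNamesCSV and FormatCSV settings.
        FieldNames = FieldNamesCSV.Split(',');
        FieldPositions = new SortedList<string, FieldPosition>();
        string[] aRanges = FieldPositionCSV.Split(',');
        for (int i = 0; i < FieldNames.Length; i++)
        {
            string[] aEnds = aRanges[i].Split('-');
            FieldPosition fp;
            fp.Start = int.Parse(aEnds[0].Trim());
            fp.End = int.Parse(aEnds[1].Trim());
            FieldPositions.Add(FieldNames[i].Trim(), fp);
        }
    }
}

public struct BulkLoadTable
{
    public DataTable Table;
    public SqlBulkCopyColumnMappingCollection Maps;

    public BulkLoadTable(DataTable dt, SqlBulkCopyColumnMappingCollection maps)
    {
        Table = dt;
        Maps = maps;
    }
}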
Because the SqlBulkCopy object seemed to be so sensitive to data type and data size differences, the best solution seemed to be to create the table to load based on an existing table. I saw other examples in which data readers did something similar, and it seems like there should be a better way (or at least one less specific to SQL Server) to get a table definition, but selecting the definition from SQL Server's system tables has worked well for me.
public BulkLoadTable MakeBulkLoadTable ( string TableName
                                       , System.Data.SqlClient.SqlConnection con
                                       , string[] aFilesColumns
                                       )
{
    BulkLoadTable bltOut = new BulkLoadTable();
    DataTable dtOut = null;
    SqlBulkCopyColumnMappingCollection sbcMapsOut = null;
    System.Data.SqlClient.SqlBulkCopy sbc = null;
    string strColumnName = string.Empty;
    string strColumnType = string.Empty;
    SqlDataReader sdr = null;
    SqlDbType FieldType = SqlDbType.Variant;
    SqlCommand sqlCmd = null;
    int iCharacterMaximumLength = 0;
    StringBuilder sbInsertFields = null;
    StringBuilder sbSelectFields = null;

    // Pull the destination table's column definitions out of the
    // information schema. The table name goes in as a parameter so it
    // is not concatenated into the SQL text.
    sqlCmd = new SqlCommand(
        "select Column_Name, Data_Type, " +
        "ISNULL(Character_Maximum_Length, 0) Max " +
        "from information_schema.columns " +
        "where table_name = @TableName", con);
    sqlCmd.CommandType = CommandType.Text;
    sqlCmd.Parameters.AddWithValue("@TableName", TableName);
    if (con.State != ConnectionState.Open)
    {
        con.Open();
    }
    sdr = sqlCmd.ExecuteReader(CommandBehavior.CloseConnection);

    // Build a bracketed, comma-separated column list from the field
    // names that came out of the settings.
    sbSelectFields = new StringBuilder();
    foreach (string strColName in aFilesColumns)
    {
        if (sbSelectFields.Length != 0)
        {
            sbSelectFields.Append(",");
        }
        sbSelectFields.Append("[");
        sbSelectFields.Append(strColName.Trim());
        sbSelectFields.Append("]");
    }

    // "where 1 = -1" returns no rows but hands the DataTable the
    // destination table's schema when the adapter fills it.
    SqlDataAdapter sdaForWrite = new SqlDataAdapter(
        "Select " + sbSelectFields.ToString() + " from " +
        TableName + " where 1 = -1", con);
    sbc = new System.Data.SqlClient.SqlBulkCopy(con);
    sbcMapsOut = sbc.ColumnMappings;
    sbInsertFields = new StringBuilder("INSERT INTO ");
    sbInsertFields.Append(TableName);
    sbInsertFields.Append(" (");
    sbInsertFields.Append(sbSelectFields.ToString());
    sbInsertFields.Append(") VALUES (");
    sbInsertFields.Append(sbSelectFields.Replace("[", "@").Replace("]", ""));
    sbInsertFields.Append(")");
    sdaForWrite.InsertCommand = new SqlCommand();
    sdaForWrite.InsertCommand.CommandText = sbInsertFields.ToString();
    sbInsertFields = null;

    // For every column the file supplies, translate the
    // information_schema type name into the matching SqlDbType, add an
    // insert parameter, and add a bulk copy column mapping.
    while (sdr.Read())
    {
        strColumnName = sdr["Column_Name"].ToString().Trim();
        if (HasColumn(aFilesColumns, strColumnName))
        {
            strColumnType = sdr["Data_Type"].ToString();
            iCharacterMaximumLength = (int)sdr["Max"];
            switch (strColumnType)
            {
                case "bigint":
                    FieldType = SqlDbType.BigInt;
                    break;
                case "int":
                    FieldType = SqlDbType.Int;
                    break;
                case "smallint":
                    FieldType = SqlDbType.SmallInt;
                    break;
                case "tinyint":
                    FieldType = SqlDbType.TinyInt;
                    break;
                case "bit":
                    FieldType = SqlDbType.Bit;
                    break;
                case "decimal":
                    FieldType = SqlDbType.Decimal;
                    break;
                case "money":
                    FieldType = SqlDbType.Money;
                    break;
                case "smallmoney":
                    FieldType = SqlDbType.SmallMoney;
                    break;
                case "nchar":
                    FieldType = SqlDbType.NChar;
                    break;
                case "ntext":
                    FieldType = SqlDbType.NText;
                    break;
                case "nvarchar":
                    FieldType = SqlDbType.NVarChar;
                    break;
                case "datetime":
                    FieldType = SqlDbType.DateTime;
                    break;
                case "smalldatetime":
                    FieldType = SqlDbType.SmallDateTime;
                    break;
                case "char":
                    FieldType = SqlDbType.Char;
                    break;
                case "varchar":
                    FieldType = SqlDbType.VarChar;
                    break;
            }
            sdaForWrite.InsertCommand.Parameters.Add(
                "@" + strColumnName, FieldType,
                iCharacterMaximumLength, strColumnName);
            sbcMapsOut.Add(new SqlBulkCopyColumnMapping(
                strColumnName, strColumnName));
        }
    }
    sdr.Close();
    sdr = null;

    DataSet ds = new DataSet();
    sdaForWrite.Fill(ds, TableName);
    dtOut = ds.Tables[TableName];
    bltOut = new BulkLoadTable(dtOut, sbcMapsOut);
    return bltOut;
}
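The HasColumn helper used above isn't listed in the article either; it presumably just checks the settings-supplied column list, something like:

// A guess at the HasColumn helper: returns true when the
// information_schema column name appears in the list of field names
// that came from the settings.
private bool HasColumn(string[] aFilesColumns, string ColumnName)
{
    foreach (string strColName in aFilesColumns)
    {
        if (string.Compare(strColName.Trim(), ColumnName,
                StringComparison.OrdinalIgnoreCase) == 0)
        {
            return true;
        }
    }
    return false;
}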
With a table and fields defined, the next step is to load the data from the flat file. Having already loaded the file into a stream, I basically split it up by the end-of-line character and then loop through the resulting array of strings to parse out the fields from each line. (The position type in the signature below appears as FieldPosition, the stand-in name from the sketch earlier.)
public int LoadTable ( ref DataTable TableToLoad
                     , SortedList<string, FieldPosition> FieldPositions_in
                     , Stream FiletoLoad
                     )
{
    int iRecordsLoadedOut = 0;
    DataRow dr = null;
    string[] astrFile = null;
    string strFromFile = string.Empty;
    string strFld = string.Empty;
    string strFile = string.Empty;
    int iStart = 0;
    int iEnd = 0;
    int iFileLen = 0;
    StreamReader sr = null;

    if (FiletoLoad != null && FiletoLoad.Length > 0)
    {
        iFileLen = (int)FiletoLoad.Length;
        sr = new StreamReader(FiletoLoad, System.Text.Encoding.UTF8,
            false, iFileLen);
        strFile = sr.ReadToEnd();
        sr.Close();
        sr = null;
        if (strFile.Contains("\r"))
        {
            // Strip the line feeds and split on the carriage returns.
            // String.Replace returns a new string, so the result has to
            // be assigned back.
            strFile = strFile.Replace("\n", "");
            astrFile = strFile.Split(Char.Parse("\r"));
        }
        if (astrFile != null)
        {
            foreach (string strLine in astrFile)
            {
                if (strLine.Length > 0)
                {
                    dr = TableToLoad.NewRow();
                    foreach (DataColumn col in TableToLoad.Columns)
                    {
                        strFld = col.ColumnName.Trim();
                        dr[strFld] = DBNull.Value;
                        iStart = FieldPositions_in[strFld].Start;
                        iEnd = FieldPositions_in[strFld].End;
                        // Only cut the field out when the line is long
                        // enough to contain it.
                        if ((iStart <= strLine.Length) &&
                            (iEnd <= strLine.Length))
                        {
                            strFromFile = strLine.Substring(
                                iStart, iEnd - iStart).Trim();
                            // Skip blanks and lone decimal points. There
                            // is no need to escape single quotes here;
                            // the value reaches the server through
                            // SqlBulkCopy, not through dynamic SQL.
                            if (strFromFile.Length > 0 && strFromFile != ".")
                            {
                                dr[strFld] = strFromFile;
                            }
                        }
                    }
                    TableToLoad.Rows.Add(dr);
                    iRecordsLoadedOut++;
                }
            }
        }
    }
    return iRecordsLoadedOut;
}
The DataTable that is loaded by LoadTable is then passed back into the BulkLoadTable structure so that its columns stay married up to the corresponding SqlBulkCopyColumnMappingCollection, and after WriteToServer is called, the records appear in the new table.
public int DoBulkLoad(BulkLoadTable bct_in, SqlConnection Con_in)
{
    int iRecordsLoadedOut = 0;
    SqlBulkCopy sbc = null;
    SqlBulkCopyColumnMappingCollection sbcMaps = null;
    DataTable dtToLoad = null;
    string strTableName = null;

    dtToLoad = bct_in.Table;
    strTableName = dtToLoad.TableName;
    sbc = new SqlBulkCopy(Con_in);

    // Copy over the column mappings that were built alongside the
    // table; without them, SqlBulkCopy matches columns by ordinal and
    // fails on any type or position mismatch.
    sbcMaps = bct_in.Maps;
    foreach (SqlBulkCopyColumnMapping map in sbcMaps)
    {
        sbc.ColumnMappings.Add(map);
    }
    sbc.BatchSize = dtToLoad.Rows.Count;
    sbc.DestinationTableName = strTableName;
    sbc.BulkCopyTimeout = 0;
    if (Con_in.State != ConnectionState.Open)
    {
        Con_in.Open();
    }
    sbc.WriteToServer(dtToLoad, DataRowState.Added);
    if (Con_in.State != ConnectionState.Closed)
    {
        Con_in.Close();
    }
    iRecordsLoadedOut = dtToLoad.Rows.Count;
    return iRecordsLoadedOut;
}
Points of Interest
I broke the load function out of the rest of this process so that if there are special validations and/or transformations that need to happen on individual fields, they can be done right after the data is loaded into the ADO.NET DataTable but before it is sent to the server. Also, at some point, it might make sense to override the load or make-table functions for special tables or other file formats, as sketched below.
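A minimal sketch of that idea, assuming LoadTable were made virtual; the derived class and the Description column here are invented for the example:

// Purely illustrative: a derived loader that upper-cases one field
// after the base parse. Assumes LoadTable is declared virtual; the
// class name and column name are invented.
public class UpperCasingBulkLoad : BulkLoad
{
    public override int LoadTable ( ref DataTable TableToLoad
                                  , SortedList<string, FieldPosition> FieldPositions_in
                                  , Stream FiletoLoad
                                  )
    {
        int iLoaded = base.LoadTable(ref TableToLoad,
            FieldPositions_in, FiletoLoad);
        foreach (DataRow dr in TableToLoad.Rows)
        {
            if (dr["Description"] != DBNull.Value)
            {
                dr["Description"] = dr["Description"].ToString().ToUpper();
            }
        }
        return iLoaded;
    }
}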
Though this example walks through parsing all the settings from strings, there is no reason another app couldn't load the field positions structure differently, or use an FTP socket stream to get the file instead of reading it from disk; a sketch of the FTP idea follows.
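Here the stream handed to LoadTable comes from FtpWebRequest instead of FileToStream. The host, path, and credentials are placeholders, and dtToLoad and blfldsToLoad are the variables from BulkLoadFromStrings:

// A sketch of feeding LoadTable from FTP instead of the local disk.
// The host, path, and credentials are placeholders. The response is
// buffered into a MemoryStream because LoadTable asks the stream for
// its Length, which a raw network stream cannot supply.
FtpWebRequest req = (FtpWebRequest)WebRequest.Create(
    "ftp://mainframe.example.com/reports/FixedWidthFile.txt");
req.Method = WebRequestMethods.Ftp.DownloadFile;
req.Credentials = new NetworkCredential("user", "password");

MemoryStream strmToLoad = new MemoryStream();
using (FtpWebResponse resp = (FtpWebResponse)req.GetResponse())
using (Stream ftpStream = resp.GetResponseStream())
{
    byte[] buffer = new byte[8192];
    int iRead;
    while ((iRead = ftpStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        strmToLoad.Write(buffer, 0, iRead);
    }
}
strmToLoad.Position = 0;
// strmToLoad can now stand in for the FileToStream result.
int iLoaded = LoadTable(ref dtToLoad, blfldsToLoad.FieldPositions, strmToLoad);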
Hopefully, in a future ADO.NET release, this will get easier and tables will get better at knowing their own schema information.
History
- 3rd March, 2009: Initial post