This article show you how to dynamically build SQL insert statements from a set of data tables inside a data set and run them in parametrized batches to load the data into MySql Server tables with a huge performance than running it row by row.
Introduction
The aim of this code is to dynamically build SQL insert
statements from a set of data tables inside a data set and run them in parametrized batches (by hundreds, thousands, etc.) in order to load the data into a MySql Server tables with a huge performance than running it row by row.
Destination tables of course, must have the same metadata as the source system where the dataset has been populated.
Background
I have developed this code to optimize and improve our current integration software core.
This software integrates and syncs data from different ERP systems and different e-commerce platforms, so its constantly moving data from (usually MS SQL Server and Oracle databases) to MySQL servers databases.
As the software is being constantly parametrized with new requirements or new ERP systems with different structures, data access and destination requires new entities, in addition to the existing ones, so we need a routine to easily adapt to these new columns, tables or systems and bulk load this data into a middle staging area in order to sync it with the e-commerce platform.
Using the Code
The first step is to populate the set of DataTables
with data and add them into the DataSet
.
So, supposing DataTables
are already populated.
GetDataSet()
is a functiĆ³n where you can add all the DataTables
you want to load into MySQL and returns them in a DataSet
.
Each DataTable
will correspond to a entire table definition a subset of columns (defined by the query that populates the dataset
of the source database.
public DataSet GetDataSet()
{
DataSet ds = new DataSet();
DataTable dtProducts = new DataTable();
DataTable dtCustomers = new DataTable();
dtProducts .TableName = "Products";
dtCustomers .TableName = "Customers";
ds.Tables.Add(dtProducts );
ds.Tables.Add(dtCustomers );
return ds;
}
BuildInsertQueriesByBatch()
is the main function of the article. It is responsible for building and concatenating the rows from each DataTable
.
- First parameter is the
DataTable
with the rows to load into MySql. Batch
integer parameter is the number of rows we want to load for each batch, and the key of this function, so depending on its value, the process will get its better performance, so if the DataSet
has 2000 rows and the value of the batch
var is 200... the process will gather these 200 rows and insert them with one instruction, inserting the 2000 rows in 10 instructions reducing the total time tremendously.
The batch
number, must be adjusted depending on some factors, like the length of each row, type of data it has (if there are long strings, batch
number should be shorter), speed of the destination server, etc.
public string BuildInsertQueriesByBatch(DataTable dt, int batch)
{
string sReturnSql = "";
string sTableName = "";
int nPosition = 0;
int nRows = 0;
sTableName = dt.TableName;
sReturnSql += "TRUNCATE TABLE " + sTableName + ";";
foreach (DataRow dr in dt.Rows)
{
sReturnSql += "INSERT INTO " + sTableName + "(";
nPosition = 0;
foreach (DataColumn column in dt.Columns)
{
if (nPosition == 0)
{
sReturnSql += "" + column.ColumnName + " ";
nPosition += 1;
}
else
{
sReturnSql += "," + column.ColumnName + " ";
}
}
sReturnSql += ")";
sReturnSql += "VALUES(";
nPosition = 0;
foreach (DataColumn column in dt.Columns)
{
if (nPosition == 0)
{
sReturnSql += this.DefineInsertType
(column.DataType.ToString(), dr[column.ColumnName].ToString());
nPosition += 1;
}
else
{
sReturnSql += "," + this.DefineInsertType
(column.DataType.ToString(), dr[column.ColumnName].ToString());
}
}
sReturnSql += ");";
nRows++;
if (nRows == batch)
{
ExecuteRequest(sReturnSql);
sReturnSql = "";
nRows = 0;
}
}
ExecuteRequest(sReturnSql);
nRows = 0;
return sReturnSql;
}
DefineInsertType()
function formats the different data types into the right format in order to add them into the insert
statement, based on the MySQL standards.
private object DefineInsertType(string sType, string sReceived)
{
object oReturn;
oReturn = null;
sReceived = this.FormatValue(sReceived.ToString());
if (sType.ToLower().IndexOf("int") != -1)
{
if (sReceived == null || sReceived.ToString() == string.Empty)
{
oReturn = 1;
}
else
{
oReturn = sReceived;
}
}
if (sType.ToLower().IndexOf("system.byte") != -1)
{
if (sReceived == null || sReceived.ToString() == string.Empty)
{
oReturn = 1;
}
else
{
oReturn = sReceived;
}
}
if (sType.ToLower().IndexOf("char") != -1)
{
oReturn = "'" + sReceived + "'";
}
if (sType.ToLower().IndexOf("string") != -1)
{
oReturn = "'" + sReceived + "'";
}
if (sType.ToLower().IndexOf("datetime") != -1)
{
oReturn = "'" + sReceived + "'";
}
if (sType.ToLower().IndexOf("decimal") != -1)
{
oReturn = sReceived.ToString().Replace(",", ".");
}
if (sType.ToLower().IndexOf("uniqueidentifier") != -1)
{
oReturn = "'" + sReceived.ToString().Replace(".", ",") + "'";
}
if (sType.ToLower().IndexOf("image") != -1)
{
oReturn = "'" + sReceived + "'";
}
if (sType.ToLower().IndexOf("money") != -1)
{
if (sReceived == null || sReceived.ToString() == string.Empty)
{
oReturn = 0;
}
else
{
oReturn = sReceived;
}
}
if (sType.ToLower().IndexOf("text") != -1)
{
if (sReceived == null || sReceived.ToString() == string.Empty)
{
oReturn = "''";
}
else
{
oReturn = "'" + sReceived.ToString().Replace("'", "''") + "'";
}
}
if (sType.ToLower().IndexOf("bool") != -1)
{
oReturn = "'" + sReceived + "'";
}
if (sType.ToLower().IndexOf("bit") != -1)
{
oReturn = "'" + sReceived + "'";
}
if (sType.ToLower().IndexOf("uniqueidentifier") != -1)
{
oReturn = "'" + sReceived + "'";
}
if (string.IsNullOrEmpty(sType))
{
oReturn = "'" + sReceived + "'";
}
return oReturn;
}
ExecuteRequest()
function executes the number of insert
statement gathered by the batch var
value.
To run this function, we use the Connector/Net driver for MySQL available on this link:
public int ExecuteRequest(string sRequest)
{
MySql.Data.MySqlClient.MySqlConnection oMySqlConnection;
MySql.Data.MySqlClient.MySqlCommand oSqlCommand;
int i = -1;
oSqlCommand = new MySql.Data.MySqlClient.MySqlCommand();
oMySqlConnection = new MySql.Data.MySqlClient.MySqlConnection("MySqlConnectionString");
try
{
oSqlCommand.Connection = oMySqlConnection;
oSqlCommand.CommandType = System.Data.CommandType.Text;
oSqlCommand.CommandText = sRequest;
if (oMySqlConnection.State == System.Data.ConnectionState.Closed)
oMySqlConnection.Open();
oSqlCommand.ExecuteNonQuery();
oSqlCommand.Connection.Close();
oMySqlConnection.Close();
i = 1;
}
catch (System.Data.SqlClient.SqlException ex)
{
i = -1;
oMySqlConnection.Close();
}
return i;
}
So having all these required functions with the logic to generate the multiple insert
statements, we just need to call the main function BuildInsertQueriesByBatch()
with something like this:
public void ExecuteExportNew()
{
DataSet ds = new DataSet();
ds = GetDataSet();
BuildInsertQueriesByBatch(ds.Tables["Products"], 200);
BuildInsertQueriesByBatch(ds.Tables["Customers"], 100);
}
We call the BuildInsertQueriesByBatch()
function twice, one for each DataTable
. If we had more, we will need to add an instruction for each table from the DataSet
, with the DataTable
name and the desired batch
number value.
Using the Code
Just include the functions in your program and call the functions.
History
- 11th July, 2016: Initial version