Introduction
SQL Server Express Edition is a free version DBMS which does not have Data Synchronization (DTS) feature. These days, most of the low-budget Window application projects are developed with SQL Express edition where this limitation obstructs the user to move further when their application requires DTS. This article provides various methods to resolve this problem.
Method-I: Data Synchronization using TableDiff Utility
We can use TableDiff utility to generate a Transact-SQL script (containing delete
/insert
/update
statements) to fix discrepancies at the destination server to bring the source and destination tables into convergence. Since this utility compares one table at a time, we need to call it in a loop in case we have N number of tables to synchronize. Within the loop, we have to accumulate (i.e., append) the generated Transact-SQL script into a local file (say CompleteFixSQL.sql). At end of the loop, we have a complete script file that needs to be executed at destination server. sqlcmd utility can be used to run that script file (CompleteFixSQL.sql) into the destination server to bring the source and destination tables into convergence.
Implementation of Method-I
We have implemented this method using the following DOS batch files:
GenerateFixSQLScriptMain.bat: This main DOS shell program will loop through a list of tables (to be synchronized) and for each table it will call GenerateFixSQLScriptSub.bat to generate the FixSQL script and append them as a single CompleteFixSQL.sql.
GenerateFixSQLScriptSub.bat: This DOS shell program will compare the source and destination table and generate the FIXSQL script file using TableDiff utility.
Offline_Synchronization.bat: This DOS shell program will run the CompleteFixSQL.sql (generated by the above GenerateFixSQLScriptMain.bat) script using SQLCMD command line utility.
Call_DTS.bat: This DOS shell program will call GenerateFixSQLScriptMain.bat & Offline_Synchronization.bat to generate the script & perform data synchronization respectively.
Extract the above zip file in a folder. Modify the Call_DTS.bat batch file based on your settings (i.e., change the input parameters such as server name, database name, user name, password, comma separated list of tables to be synchronized). Then execute it in DOS prompt.
The program flow of these DOS batch files are not explained in this article as our main intention is to explain other methods (Method II & III).
Drawbacks of Method-I
- Both
TableDiff
& sqlcmd
utilities are external applications that need to be called from client application code. TableDiff
needs to be called N times if we have N number of tables to be synchronized which incurs I/O overhead. Sqlcmd
utility executes statements that are contained in the CompleteFixSQL
script one by one in a sequence manner which is a time consuming process in case we have large data to be synchronized at destination. TableDiff
utility has some limitations. It would not generate FIX script file for LOB datatypes such as text
, ntext
& image
.
Method-II: Data Synchronization using Join Mechanism
A Stored Procedure (SP) with SQL join
mechanism can be used to compare the difference between source and destination tables and then synchronize the destination tables with source table data. To bring the source and destination tables into convergence,
- Find the records that need to be deleted from Destination database table
- Find the records that need to be inserted into Destination database table
- Find the records that need to be updated in the Destination database table
Subsequently, we have to execute delete
, insert
and update
statements in the destination database for the records that are found in the above steps 1, 2 & 3 respectively.
Step 1: Records to be Deleted from Destination Database Table
- Select the records that do not exist in Source database table, but exist in the Destination database table
- Then delete them from the Destination database table
delete from
DestinationDB.dbo.TableName DestinationDBTable
where
not exists
(select
1
from
SourceDB.dbo.TableName SourceDBTable
where
SourceDBTable.PrimaryColumnName1 = DestinationDBTable.PrimaryColumnName1 and
SourceDBTable.PrimaryColumnName2 = DestinationDBTable.PrimaryColumnName2 and
...
SourceDBTable.PrimaryColumnNameN = DestinationDBTable.PrimaryColumnNameN
)
If the table contains an Identity
column, then we can simply use that column rather than primary key column in the join
condition of the WHERE
clause. This will reduce the size of the join
condition especially when the table having composite primary keys and an Identity
column. Also it will improve the performance of the delete
statement.
delete from
DestinationDB.dbo.TableName DestinationDBTable
where
not exists
(select
1
from
SourceDB.dbo.TableName SourceDBTable
where
SourceDBTable.IdentityColumn = DestinationDBTable.IdentityColumn
)
Step 2: Records to be Inserted into Destination Database Table
- Select the records that exist in the Source database table, but do not exist in the Destination database table.
- Then insert them into the Destination database table.
insert into
DestinationDB.dbo.TableName DestinationDBTable
(ColumnList)
select
SourceDBTable.Columnlist
from
SourceDB.dbo.TableName SourceDBTable
where
not exists
(select
1
from
DestinationDB.dbo.TableName DestinationDBTable
where
DestinationDBTable.PrimaryColumnName1 = SourceDBTable.PrimaryColumnName1 and
DestinationDBTable.PrimaryColumnName2 = SourceDBTable.PrimaryColumnName2 and
...
DestinationDBTable.PrimaryColumnNameN = SourceDBTable.PrimaryColumnNameN
)
Column with TimeStamp
datatype should be excluded from the Column list of the above insert
statement as we cannot explicitly set values for TimeStamp
column.
As specified in the Step 1, we can use identity column rather than primary key column as follows:
insert into
DestinationDB.dbo.TableName DestinationDBTable
(ColumnList)
select
SourceDBTable.Columnlist
from
SourceDB.dbo.TableName SourceDBTable
where
not exists
(select
1
from
DestinationDB.dbo.TableName DestinationDBTable
where
DestinationDBTable.IdentityColumn = SourceDBTable. IdentityColumn
)
If the table having identity column then the above insert
statement must be enclosed by the “set identity_insert
on/off” as follows:
set identity_insert TableName On
... above insert statement
set identity_insert TableName off
Step 3: Records to be Updated in the Destination Database Table
- Select the records that are differ from Source & Destination database table.
- Then update them in the Destination database table with the source database table data.
update
TableName
set
ColumnName1 = SourceDBTable.ColumnName1,
ColumnName2 = SourceDBTable.ColumnName2,
...
ColumnNameN = SourceDBTable.ColumnNameN
from
DestinationDB.dbo.TableName DestinationDBTable,
(
select
max(TableName) as TableName, columnlist
from
(
select
'SourceTableName' as TableName, columnlist
from
SourceTableName
union all
select
'DestinationiTableName' as TableName, columnlist
from
DestinationTableName
) AliasName
group by
columnlist
having
count(*) = 1
and max(TableName) = 'SourceTableName'
) SourceDBTable
where
SourceDBTable.PrimaryColumnName1 = DestinationDBTable.PrimaryColumnName1 and
SourceDBTable.PrimaryColumnName2 = DestinationDBTable.PrimaryColumnName2 and
...
SourceDBTable.PrimaryColumnNameN = DestinationDBTable.PrimaryColumnNameN
Column with TimeStamp
datatype should be excluded from the SET
clause of the above update
statement as we cannot explicitly set values for TimeStamp
column.
Column with LOB
datatypes (Text
, nText
& Image
) should be converted to respective Large Value datatypes[Varchar(max)
, nVarchar(max)
& varbinary(max)
] from the queries that are combined by UNION ALL
of the above statement. This is because the UNION ALL
causes the sorting mechanism which prohibits LOB datatypes.
As specified in Steps 1 & 2, we can use identity column rather than primary key column in the WHERE
clause of the update
statement.
Implementation of Method-II
The above core logic is implemented to synchronize one table at a time for a given list of tables. For each destination table, its records will be deleted, inserted and updated by comparing them with the associated source table. This logic works fine as long as there will be no relationship between tables, i.e, no table should be enforced with foreign key constraints. In other words, this logic would not perform DTS for tables with FK references. To overcome this issue, we have simply disabled the FK constraints of all the tables to be synchronized before performing DTS. Then they will be enabled back after DTS. The NOCHECK
/CHECK
clause of the ALTER TABLE
statement is handy to achieve this solution.
The attached zip file ‘Data Synchronization Stored Procedures and Views Part_II.zip’ contains the following view and stored procedures (SP) to implement the DTS:
v_DTS_ColumnInformation
stp_DTS_GetCommaSeperatedColumnString
stp_DTS_GetIdentityOrPrimaryKeyColumnDetails
stp_DTS_SetDestinationColumnWithSourceColumnString
stp_DTS_DataSynchronization
v_DTS_ForeignKey
stp_DTS_EnableDisableForeignKeyConstraint
v_DTS_ColumnInformation
: This view will be used to populate the column details such as Data Type, Primary Key, Null constraint, Identity property, Column size constraint (Length for character datatype, Precision and scale of number datatype).
stp_DTS_GetCommaSeperatedColumnString
: This SP generates various strings comma separated column string for a given table.
stp_DTS_GetIdentityOrPrimaryKeyColumnDetails
: This SP generates various strings for an Identity or Primary key columns of a given table.
stp_DTS_SetDestinationColumnWithSourceColumnString
: This SP generates the SET clause for the update
statement described in the step 3.
stp_DTS_DataSynchronization
: This is the main SP will be used to synchronize the destination tables with the source tables’ data.
v_DTS_ForeignKey
: A view is used to retrieve the foreign key details(both reference and referenced table) of a table.
stp_DTS_EnableDisableForeignKeyConstraint
: A stored procedure is used to switch on/off (i.e., enable/disable) of all foreign key constraints of a given list of tables.
Details of the parameters used in all of the above procedures are described along with the header of each SP.
Advantages of Method-II
- The stored procedure is already compiled and stored within the destination database. Client application code is just needed to call this SP using the connection string.
- Both table Compare & Synchronization will be done at a single query (one per each
delete
, insert
& update
) - Records are processed (deleted, inserted & updated) in bulk manner.
- LOB datatypes are supported.
Method-III: Data Synchronization using Join Mechanism without Enable/Disable Foreign Key Constraints
Method-III will elaborate some of the minor performance issues found in Method-II. Also, it provides a solution to fix them and improve the performance of data synchronization process as well.
Problem# 1. SP Recompilation and its performance bottleneck: During Data synchronization(DTS), records will be inserted/deleted/updated in the specified list of tables (specified in the comma separated table list input parameter) at destination database. This record manipulation is happened in a sequence and if the tables are referenced each other via FK then it will end up with FK error. This is because, before inserting records into child table, their parent table must be inserted with respective records. Similarly, before deleting records from the parent table, their child table must be deleted with respective records. To overcome this ‘Chicken and Egg’ problem, in Method-II DTS version, we have disabled the FK references of all the specified tables before starting synchronization. Then it will be enabled at end of synchronization process.
Right after synchronization, if a Stored Procedure (SP) is executed and if that SP dependence with any of the synchronized table then SQL Server optimizer forces that SP to recompile. This is because the dependent table’s schema has changed as part of enable/disable FK during synchronization. So after synchronization, all SPs’ that are dependent to the synchronized tables will be forced to recompile before they are being executed. This SP recompilation degrades the performance.
Solution: The solution to avoid SP recompilation is eliminating the mechanism of enable/disable foreign keys introduced in Method-II DTS version. Then how do we overcome the FK errors as a result of record manipulation on tables with FK constraints. We must rearrange the given tables in a order in which they will never end-up with ‘Chicken and Egg’ scenario. For example, we have a list of table T7, T2, T1, T4, T5, T3 and T6. They are depending on each other as follows:
Child Table | Parent Table |
T7 | T1 |
T1 | T5 |
T5 | None |
T2 | T1 |
T3 | None |
T6 | None |
T4 | T3 |
We have to construct a hierarchical tree of given list of tables based on their child-parent relationship as follows:
Table | Child To Parent Level |
T7 | 1 |
T2 | 1 |
T6 | 1 |
T4 | 1 |
T1 | 2 |
T3 | 2 |
T5 | 3 |
We can now delete the records from the tables in the ascending order (Child To Parent Level) of above hierarchical tree, while records will be inserted into the tables in the descending order. Update can happen at any order. This will eliminate the possible FK conflict errors during DTS process.
Problem# 2. UNIONALL & GROUPBY mechanism and their performance bottleneck: In the previous Method-II version of DTS, INSERT
/DELETE
record manipulation handled by primary key comparison while UPDATE
handled by UNION ALL
mechanism. If we have large number of records that need to be synchronized and if most of them require UPDATE
s, then the system might require a large memory space to perform UNION ALL
of source/destination records. The subsequent GROUP BY
on all columns worsens the situation further. This degraded the performance of DTS process.
Solution: The solution to replace UNIONALL & GROUPBY mechanism requires a new column TimeStamp in each table (that need to be synchronized) to hold the date & time of record modification. Though the column name is “Time Stamp”, it is a regular “DateTime
” data type column and not the “TimeStamp
” data type. Whenever a record gets modified in the source database, respective record TimeStamp
column has to be updated with current date & time. User has to modify their application code/SP to perform this TimeStamp
updation. During DTS, any target record with less TimeStamp
(i.e., date time) values will be updated with their respective source record values.
However, this enhancement is the optional one and user can still go with UNIONALL
& GROUPBY
mechanism. This is because some users do not want to introduce new “Time Stamp” column in their existing tables to avoid code rework.
Implementation of Method-III
The attached zip file ‘Data Synchronization using Join mechanism without Enable Disable Foreign Key Constraints.zip’ contains the following latest view and stored procedures (SP) to implement the enhanced version of Data Synchronization:
v_DTS_ColumnInformation
– No change stp_DTS_GetCommaSeperatedColumnString
– No change stp_DTS_GetIdentityOrPrimaryKeyColumnDetails
- No change stp_DTS_SetDestinationColumnWithSourceColumnString
- No change stp_DTS_DataSynchronization
– Modified, see the program’s modification history for more details v_DTS_ForeignKey
– No Change <s>stp_DTS_EnableDisableForeignKeyConstraint</s>
– Obsolete fnTableHierarchy
– New stp_DTS_ArrangeTablesInTheirOrderOfRelationships
– New
fnTableHierarchy
: For a given table, this function returns all its dependent tables in their order of “Child To Parent” relationship using recursive CTE (Common Table Expression) mechanism.
stp_DTS_ArrangeTablesInTheirOrderOfRelationships
: This SP returns data set that contains a list of tables arranged in a specified order (Child To Parent Or Parent To Child Or No Order) for a given comma separated list of tables.
Conclusion
In this article, we have seen three different methods to perform Data Synchronization in SQL Express edition database. Among them, Method-I has more disadvantages. Though Method-II can be used to perform DTS, it has its own performance bottlenecks. Method-III has resolved those issues found in Method-II. However, user can choose either Method-II or Method-III based on their requirements.
Assumption
- Both source and destination tables’ schema are identical.
- Both source and destination data sources are different.
- Destination server has a linked server with source server in case both are remotely connected.
- All the SPs and views listed above are stored & compiled in the destination database.
- The main SP
stp_DTS_DataSynchronization
will be executed at destination database. - User calls the main SP
stp_DTS_DataSynchronization
with valid parameters.
Limitation
- Column with
Timestamp
datatype excluded from the data synchronization.
Features of Future Version
- It will be extended to provide Log of the data synchronization process. So that user can know, how many records are deleted/inserted/updated and what are they.
Hint: The OUTPUT
clause can be used to achieve this.
References
- The shortest, fastest, and easiest way to compare two tables in SQL Server: <place w:st="on">
UNION
(http://weblogs.sqlteam.com/jeffs/archive/2004/11/10/2737.aspx) TableDiff
Utility (http://technet.microsoft.com/en-us/library/ms162843.aspx) How to find Tables with Foreign Key Constraints in database (http://blog.sqlauthority.com/2007/09/04/sql-server-2005-find-tables-with-foreign-key-constraint-in-database/)
Acknowledgement
My sincere thanks to all the experts who participated and spent their valuable time discussing the technique of Table Comparison in the Jeff's SQL Server Blog (http://weblogs.sqlteam.com/jeffs/archive/2004/11/10/2737.aspx).
- Jeff: For his
UNION ALL
method to compare tables - Click: For his
NOT EXISTS
method to compare tables - David L. Penton: For his explanation of the issues found in
NOT EXISTS
method - John: For his powerful coding to generate comma separated list with a single
SELECT
- Pinal Dave: For his script to find a foreign key constraints in a database