Data Synchronization in SQL Express

Ganesan Krishnan

4.44/5 (23 votes)

14 Oct 2010CPOL12 min read

9.8K

Data Synchronization in SQL Express

Introduction

SQL Server Express Edition is a free version DBMS which does not have Data Synchronization (DTS) feature. These days, most of the low-budget Window application projects are developed with SQL Express edition where this limitation obstructs the user to move further when their application requires DTS. This article provides various methods to resolve this problem.

Method-I: Data Synchronization using TableDiff Utility

We can use TableDiff utility to generate a Transact-SQL script (containing delete/insert/update statements) to fix discrepancies at the destination server to bring the source and destination tables into convergence. Since this utility compares one table at a time, we need to call it in a loop in case we have N number of tables to synchronize. Within the loop, we have to accumulate (i.e., append) the generated Transact-SQL script into a local file (say CompleteFixSQL.sql). At end of the loop, we have a complete script file that needs to be executed at destination server. sqlcmd utility can be used to run that script file (CompleteFixSQL.sql) into the destination server to bring the source and destination tables into convergence.

Implementation of Method-I

Download Data Synchronization using TableDiff utility - 32.19 KB

We have implemented this method using the following DOS batch files:

GenerateFixSQLScriptMain.bat: This main DOS shell program will loop through a list of tables (to be synchronized) and for each table it will call GenerateFixSQLScriptSub.bat to generate the FixSQL script and append them as a single CompleteFixSQL.sql.

GenerateFixSQLScriptSub.bat: This DOS shell program will compare the source and destination table and generate the FIXSQL script file using TableDiff utility.

Offline_Synchronization.bat: This DOS shell program will run the CompleteFixSQL.sql (generated by the above GenerateFixSQLScriptMain.bat) script using SQLCMD command line utility.

Call_DTS.bat: This DOS shell program will call GenerateFixSQLScriptMain.bat & Offline_Synchronization.bat to generate the script & perform data synchronization respectively.

Extract the above zip file in a folder. Modify the Call_DTS.bat batch file based on your settings (i.e., change the input parameters such as server name, database name, user name, password, comma separated list of tables to be synchronized). Then execute it in DOS prompt.

The program flow of these DOS batch files are not explained in this article as our main intention is to explain other methods (Method II & III).

Drawbacks of Method-I

Both TableDiff & sqlcmd utilities are external applications that need to be called from client application code. TableDiff needs to be called N times if we have N number of tables to be synchronized which incurs I/O overhead.
Sqlcmd utility executes statements that are contained in the CompleteFixSQL script one by one in a sequence manner which is a time consuming process in case we have large data to be synchronized at destination.
TableDiff utility has some limitations. It would not generate FIX script file for LOB datatypes such as text, ntext & image.

Method-II: Data Synchronization using Join Mechanism

A Stored Procedure (SP) with SQL join mechanism can be used to compare the difference between source and destination tables and then synchronize the destination tables with source table data. To bring the source and destination tables into convergence,

Find the records that need to be deleted from Destination database table
Find the records that need to be inserted into Destination database table
Find the records that need to be updated in the Destination database table

Subsequently, we have to execute delete, insert and update statements in the destination database for the records that are found in the above steps 1, 2 & 3 respectively.

Step 1: Records to be Deleted from Destination Database Table

Select the records that do not exist in Source database table, but exist in the Destination database table
Then delete them from the Destination database table

SQL

delete from
    DestinationDB.dbo.TableName DestinationDBTable
where
    not exists
    (select
        1
     from
        SourceDB.dbo.TableName SourceDBTable
     where
        SourceDBTable.PrimaryColumnName1  = DestinationDBTable.PrimaryColumnName1 and
        SourceDBTable.PrimaryColumnName2  = DestinationDBTable.PrimaryColumnName2 and
        ...
        SourceDBTable.PrimaryColumnNameN  = DestinationDBTable.PrimaryColumnNameN
    )

If the table contains an Identity column, then we can simply use that column rather than primary key column in the join condition of the WHERE clause. This will reduce the size of the join condition especially when the table having composite primary keys and an Identity column. Also it will improve the performance of the delete statement.

SQL

delete from
    DestinationDB.dbo.TableName DestinationDBTable
where
    not exists
    (select
        1
     from
        SourceDB.dbo.TableName SourceDBTable
     where
        SourceDBTable.IdentityColumn  = DestinationDBTable.IdentityColumn
    )

Step 2: Records to be Inserted into Destination Database Table

Select the records that exist in the Source database table, but do not exist in the Destination database table.
Then insert them into the Destination database table.

SQL

insert into
    DestinationDB.dbo.TableName DestinationDBTable
    (ColumnList)
select
    SourceDBTable.Columnlist
from
    SourceDB.dbo.TableName SourceDBTable
where
    not exists
    (select
        1
     from
        DestinationDB.dbo.TableName DestinationDBTable
     where
        DestinationDBTable.PrimaryColumnName1 = SourceDBTable.PrimaryColumnName1 and
        DestinationDBTable.PrimaryColumnName2 = SourceDBTable.PrimaryColumnName2 and
        ...
        DestinationDBTable.PrimaryColumnNameN = SourceDBTable.PrimaryColumnNameN
    )

Column with TimeStamp datatype should be excluded from the Column list of the above insert statement as we cannot explicitly set values for TimeStamp column.

As specified in the Step 1, we can use identity column rather than primary key column as follows:

SQL

insert into
    DestinationDB.dbo.TableName DestinationDBTable
    (ColumnList)
select
    SourceDBTable.Columnlist
from
    SourceDB.dbo.TableName SourceDBTable
where
    not exists
    (select
        1
     from
        DestinationDB.dbo.TableName DestinationDBTable
     where
        DestinationDBTable.IdentityColumn = SourceDBTable. IdentityColumn
    )

If the table having identity column then the above insert statement must be enclosed by the “set identity_insert on/off” as follows:

SQL

set identity_insert TableName On
... above insert statement 
set identity_insert TableName off

Step 3: Records to be Updated in the Destination Database Table

Select the records that are differ from Source & Destination database table.
Then update them in the Destination database table with the source database table data.

SQL

update
    TableName
set
    ColumnName1 = SourceDBTable.ColumnName1,
    ColumnName2 = SourceDBTable.ColumnName2,
    ...
    ColumnNameN = SourceDBTable.ColumnNameN
from
    DestinationDB.dbo.TableName DestinationDBTable,
    (
    select
        max(TableName) as TableName, columnlist
    from
        (
        select
            'SourceTableName' as TableName, columnlist
        from
            SourceTableName
        union all
        select
            'DestinationiTableName' as TableName, columnlist
        from
            DestinationTableName
        ) AliasName
    group by
        columnlist
    having
        count(*) = 1
        and max(TableName) = 'SourceTableName'
    ) SourceDBTable
where
    SourceDBTable.PrimaryColumnName1  = DestinationDBTable.PrimaryColumnName1 and
    SourceDBTable.PrimaryColumnName2  = DestinationDBTable.PrimaryColumnName2 and
    ...
    SourceDBTable.PrimaryColumnNameN  = DestinationDBTable.PrimaryColumnNameN

Column with TimeStamp datatype should be excluded from the SET clause of the above update statement as we cannot explicitly set values for TimeStamp column.

Column with LOB datatypes (Text, nText & Image) should be converted to respective Large Value datatypes[Varchar(max), nVarchar(max) & varbinary(max)] from the queries that are combined by UNION ALL of the above statement. This is because the UNION ALL causes the sorting mechanism which prohibits LOB datatypes.

As specified in Steps 1 & 2, we can use identity column rather than primary key column in the WHERE clause of the update statement.

Implementation of Method-II

Download Data Synchronization Stored_Procedures and views - Part II - 15.98 KB

The above core logic is implemented to synchronize one table at a time for a given list of tables. For each destination table, its records will be deleted, inserted and updated by comparing them with the associated source table. This logic works fine as long as there will be no relationship between tables, i.e, no table should be enforced with foreign key constraints. In other words, this logic would not perform DTS for tables with FK references. To overcome this issue, we have simply disabled the FK constraints of all the tables to be synchronized before performing DTS. Then they will be enabled back after DTS. The NOCHECK/CHECK clause of the ALTER TABLE statement is handy to achieve this solution.

The attached zip file ‘Data Synchronization Stored Procedures and Views Part_II.zip’ contains the following view and stored procedures (SP) to implement the DTS:

v_DTS_ColumnInformation
stp_DTS_GetCommaSeperatedColumnString
stp_DTS_GetIdentityOrPrimaryKeyColumnDetails
stp_DTS_SetDestinationColumnWithSourceColumnString
stp_DTS_DataSynchronization
v_DTS_ForeignKey
stp_DTS_EnableDisableForeignKeyConstraint

v_DTS_ColumnInformation: This view will be used to populate the column details such as Data Type, Primary Key, Null constraint, Identity property, Column size constraint (Length for character datatype, Precision and scale of number datatype).

stp_DTS_GetCommaSeperatedColumnString: This SP generates various strings comma separated column string for a given table.

stp_DTS_GetIdentityOrPrimaryKeyColumnDetails: This SP generates various strings for an Identity or Primary key columns of a given table.

stp_DTS_SetDestinationColumnWithSourceColumnString: This SP generates the SET clause for the update statement described in the step 3.

stp_DTS_DataSynchronization: This is the main SP will be used to synchronize the destination tables with the source tables’ data.

v_DTS_ForeignKey: A view is used to retrieve the foreign key details(both reference and referenced table) of a table.

stp_DTS_EnableDisableForeignKeyConstraint: A stored procedure is used to switch on/off (i.e., enable/disable) of all foreign key constraints of a given list of tables.

Details of the parameters used in all of the above procedures are described along with the header of each SP.

Advantages of Method-II

The stored procedure is already compiled and stored within the destination database. Client application code is just needed to call this SP using the connection string.
Both table Compare & Synchronization will be done at a single query (one per each delete, insert & update)
Records are processed (deleted, inserted & updated) in bulk manner.
LOB datatypes are supported.

Method-III: Data Synchronization using Join Mechanism without Enable/Disable Foreign Key Constraints

Method-III will elaborate some of the minor performance issues found in Method-II. Also, it provides a solution to fix them and improve the performance of data synchronization process as well.

Problem# 1. SP Recompilation and its performance bottleneck: During Data synchronization(DTS), records will be inserted/deleted/updated in the specified list of tables (specified in the comma separated table list input parameter) at destination database. This record manipulation is happened in a sequence and if the tables are referenced each other via FK then it will end up with FK error. This is because, before inserting records into child table, their parent table must be inserted with respective records. Similarly, before deleting records from the parent table, their child table must be deleted with respective records. To overcome this ‘Chicken and Egg’ problem, in Method-II DTS version, we have disabled the FK references of all the specified tables before starting synchronization. Then it will be enabled at end of synchronization process.

Right after synchronization, if a Stored Procedure (SP) is executed and if that SP dependence with any of the synchronized table then SQL Server optimizer forces that SP to recompile. This is because the dependent table’s schema has changed as part of enable/disable FK during synchronization. So after synchronization, all SPs’ that are dependent to the synchronized tables will be forced to recompile before they are being executed. This SP recompilation degrades the performance.

Solution: The solution to avoid SP recompilation is eliminating the mechanism of enable/disable foreign keys introduced in Method-II DTS version. Then how do we overcome the FK errors as a result of record manipulation on tables with FK constraints. We must rearrange the given tables in a order in which they will never end-up with ‘Chicken and Egg’ scenario. For example, we have a list of table T7, T2, T1, T4, T5, T3 and T6. They are depending on each other as follows:

Child Table	Parent Table
T7	T1
T1	T5
T5	None
T2	T1
T3	None
T6	None
T4	T3

We have to construct a hierarchical tree of given list of tables based on their child-parent relationship as follows:

Table	Child To Parent Level
T7	1
T2	1
T6	1
T4	1
T1	2
T3	2
T5	3

We can now delete the records from the tables in the ascending order (Child To Parent Level) of above hierarchical tree, while records will be inserted into the tables in the descending order. Update can happen at any order. This will eliminate the possible FK conflict errors during DTS process.

Problem# 2. UNIONALL & GROUPBY mechanism and their performance bottleneck: In the previous Method-II version of DTS, INSERT/DELETE record manipulation handled by primary key comparison while UPDATE handled by UNION ALL mechanism. If we have large number of records that need to be synchronized and if most of them require UPDATEs, then the system might require a large memory space to perform UNION ALL of source/destination records. The subsequent GROUP BY on all columns worsens the situation further. This degraded the performance of DTS process.

Solution: The solution to replace UNIONALL & GROUPBY mechanism requires a new column TimeStamp in each table (that need to be synchronized) to hold the date & time of record modification. Though the column name is “Time Stamp”, it is a regular “DateTime” data type column and not the “TimeStamp” data type. Whenever a record gets modified in the source database, respective record TimeStamp column has to be updated with current date & time. User has to modify their application code/SP to perform this TimeStamp updation. During DTS, any target record with less TimeStamp (i.e., date time) values will be updated with their respective source record values.

However, this enhancement is the optional one and user can still go with UNIONALL & GROUPBY mechanism. This is because some users do not want to introduce new “Time Stamp” column in their existing tables to avoid code rework.

Implementation of Method-III

The attached zip file ‘Data Synchronization using Join mechanism without Enable Disable Foreign Key Constraints.zip’ contains the following latest view and stored procedures (SP) to implement the enhanced version of Data Synchronization:

v_DTS_ColumnInformation – No change
stp_DTS_GetCommaSeperatedColumnString – No change
stp_DTS_GetIdentityOrPrimaryKeyColumnDetails - No change
stp_DTS_SetDestinationColumnWithSourceColumnString - No change
stp_DTS_DataSynchronization – Modified, see the program’s modification history for more details
v_DTS_ForeignKey – No Change
~~<s>stp_DTS_EnableDisableForeignKeyConstraint</s>~~ – Obsolete
fnTableHierarchy – New
stp_DTS_ArrangeTablesInTheirOrderOfRelationships – New

fnTableHierarchy: For a given table, this function returns all its dependent tables in their order of “Child To Parent” relationship using recursive CTE (Common Table Expression) mechanism.

stp_DTS_ArrangeTablesInTheirOrderOfRelationships: This SP returns data set that contains a list of tables arranged in a specified order (Child To Parent Or Parent To Child Or No Order) for a given comma separated list of tables.

Conclusion

In this article, we have seen three different methods to perform Data Synchronization in SQL Express edition database. Among them, Method-I has more disadvantages. Though Method-II can be used to perform DTS, it has its own performance bottlenecks. Method-III has resolved those issues found in Method-II. However, user can choose either Method-II or Method-III based on their requirements.

Assumption

Both source and destination tables’ schema are identical.
Both source and destination data sources are different.
Destination server has a linked server with source server in case both are remotely connected.
All the SPs and views listed above are stored & compiled in the destination database.
The main SP stp_DTS_DataSynchronization will be executed at destination database.
User calls the main SP stp_DTS_DataSynchronization with valid parameters.

Limitation

Column with Timestamp datatype excluded from the data synchronization.

Features of Future Version

It will be extended to provide Log of the data synchronization process. So that user can know, how many records are deleted/inserted/updated and what are they.
Hint: The OUTPUT clause can be used to achieve this.

References

The shortest, fastest, and easiest way to compare two tables in SQL Server: <place w:st="on">UNION (http://weblogs.sqlteam.com/jeffs/archive/2004/11/10/2737.aspx)
TableDiff Utility (http://technet.microsoft.com/en-us/library/ms162843.aspx)
How to find Tables with Foreign Key Constraints in database (http://blog.sqlauthority.com/2007/09/04/sql-server-2005-find-tables-with-foreign-key-constraint-in-database/)

Acknowledgement

My sincere thanks to all the experts who participated and spent their valuable time discussing the technique of Table Comparison in the Jeff's SQL Server Blog (http://weblogs.sqlteam.com/jeffs/archive/2004/11/10/2737.aspx).

Jeff: For his UNION ALL method to compare tables
Click: For his NOT EXISTS method to compare tables
David L. Penton: For his explanation of the issues found in NOT EXISTS method
John: For his powerful coding to generate comma separated list with a single SELECT
Pinal Dave: For his script to find a foreign key constraints in a database

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)