Merge Data in SQL Server

Nitin Singh India

4.91/5 (7 votes)

20 Aug 2013CPOL2 min read

13.6K

Merge data in SQL Server.

In the world of distributed computing, the data is stored in multiple servers and updates are sent to master database in the form of XML or Excel files. If the application offers several places to update the same data, like a special web form along with SharePoint/InfoPath for simple update, the data needs synchronization.

Also, with the SharePoint, users can download data as Excel file and then work in offline mode. Later, they submit the data to the application by uploading the modified Excel file with some new and some edited records.

The application now needs to merge the records by inserting new and updating the modified records. If user wants to delete records, then the missing records in the main table will be marked as deleted, so this will require complete lock on the table else other’s record will be removed.

So we need a way to update such changes to our database.

The solution depends in the version of SQL Server which we are using. If we have 2008 or later, we are in luck, as that gives us the MERGE statement which, like its name, merges the changes with perfect performance. However, if we have 2005 or lower; or if its running in lower Compatibility Mode to support legacy servers ( :( :( :( ), we need to explicitly write insert/update statements which are crude and causes bad performance.

Let's illustrate through code.

SQL

-- This table holds the master data
create table Customer_Request  (
 ReqID  integer not null ,
 ReqDescription  varchar (100),
 ReqStatus  varchar(10)
 )
 
--- This table holds intermediate data, loaded through Excel / BCP
create table Customer_Request_Staging  (
 ReqID  integer not null ,
 ReqDescription  varchar (100),
 ReqStatus  varchar(10)
 )
 
-- Initial data of main table
insert into Customer_Request values(1, 'Laptop 4589', 'Pending')
insert into Customer_Request values(2, 'Laptop 4543', 'Pending')

-- create data in staging, having an update and another new record
insert into Customer_Request_Staging values(1, 'Laptop 4589', 'Completed')
insert into Customer_Request_Staging values(3, 'Laptop 4549', 'Pending')

This is how the datasets look now

Main data

Staging data

So we have the initial data setup. Now we create a CTE to go over the staging data and collect relevant records and then do the merging.

SQL

-- Select staging data in a CTE, optionally including special criteria or functions 
WITH stagingData (ID, [Description], [Status])
AS
(
--- read records from the temp table populated from excel
    select ReqId, ReqDescription, ReqStatus from Customer_Request_Staging  
)

MERGE Customer_Request m
USING stagingData s
ON s.ID = m.ReqId  --- match condition: on id of table
WHEN MATCHED then 
--- fields to update. If we wanted to send a list of items to be deleted, 
----  this will be replaced by a delete statement
 UPDATE    
  SET ReqDescription = s.Description, 
  ReqStatus = s.Status
--- we got some new records, so insert them in table
WHEN NOT MATCHED THEN 
 INSERT 
 (ReqId, ReqDescription,  ReqStatus)
 VALUES(s.ID, s.Description, s.Status)
 ;

This is how the final data look. It has the 1st record updated with the 'completed' status and a new record inserted from the staging table.

However what will happen if we don't have the richness of MERGE method. The real power of this gets illustrated when we go in the past to figure out the alternate explicit manner of insert/update.

SQL

-- Insert functionality
 SELECT ReqId, ReqDescription, ReqStatus INTO #tmpRequests_insert
 FROM Customer_Request_Staging
 WHERE ReqID NOT IN (
 SELECT DISTINCT(ReqID) FROM Customer_Request (NOLOCK) )

INSERT INTO Customer_Request 
 (ReqID, ReqDescription, ReqStatus)
 SELECT ReqID, ReqDescription, ReqStatus FROM #tmpRequests_insert
 

-- Update functionality
 SELECT ReqId, ReqDescription, ReqStatus INTO #tmpRequests_update
 FROM Customer_Request_Staging
 WHERE ReqID IN (
 SELECT DISTINCT(ReqID) FROM Customer_Request (NOLOCK) )
 
UPDATE Customer_Request
 SET ReqDescription = tmp.ReqDescription,
  ReqStatus = tmp.ReqStatus
 FROM Customer_Request main INNER JOIN #tmpRequests_update tmp 
  ON main.ReqID = tmp.ReqID

Note the insert and update temp tables only differ by "NOT IN" and "IN" for the inner select. While the insert statement does a direct pick from temp table, the update statement does a inner join to get the actual records to be updated.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)