In the world of distributed computing, the data is stored in multiple servers and updates are sent to master database in the form of XML or Excel files. If the application offers several places to update the same data, like a special web form along with SharePoint/InfoPath for simple update, the data needs synchronization.
Also, with the SharePoint, users can download data as Excel file and then work in offline mode. Later, they submit the data to the application by uploading the modified Excel file with some new and some edited records.
The application now needs to merge the records by inserting new and updating the modified records. If user wants to delete records, then the missing records in the main table will be marked as deleted, so this will require complete lock on the table else other’s record will be removed.
So we need a way to update such changes to our database.
The solution depends in the version of SQL Server which we are using. If we have 2008 or later, we are in luck, as that gives us the
MERGE
statement which, like its name, merges the changes with perfect performance. However, if we have 2005 or lower; or if its running in lower Compatibility Mode to support legacy servers ( :( :( :( ), we need to explicitly write insert/update statements which are crude and causes bad performance.
Let's illustrate through code.
create table Customer_Request (
ReqID integer not null ,
ReqDescription varchar (100),
ReqStatus varchar(10)
)
create table Customer_Request_Staging (
ReqID integer not null ,
ReqDescription varchar (100),
ReqStatus varchar(10)
)
insert into Customer_Request values(1, 'Laptop 4589', 'Pending')
insert into Customer_Request values(2, 'Laptop 4543', 'Pending')
insert into Customer_Request_Staging values(1, 'Laptop 4589', 'Completed')
insert into Customer_Request_Staging values(3, 'Laptop 4549', 'Pending')
This is how the datasets look now
Main data
| Staging data
|
So we have the initial data setup. Now we create a CTE to go over the staging data and collect
relevant records and then do the merging.
WITH stagingData (ID, [Description], [Status])
AS
(
select ReqId, ReqDescription, ReqStatus from Customer_Request_Staging
)
MERGE Customer_Request m
USING stagingData s
ON s.ID = m.ReqId
WHEN MATCHED then
UPDATE
SET ReqDescription = s.Description,
ReqStatus = s.Status
WHEN NOT MATCHED THEN
INSERT
(ReqId, ReqDescription, ReqStatus)
VALUES(s.ID, s.Description, s.Status)
;
This is how the final data look. It has the 1st record updated with the 'completed' status and a new record inserted from the staging table.
However what will happen if we don't have the richness of MERGE method. The real power of this gets illustrated when we go in the past to figure out
the alternate explicit manner of insert/update.
SELECT ReqId, ReqDescription, ReqStatus INTO #tmpRequests_insert
FROM Customer_Request_Staging
WHERE ReqID NOT IN (
SELECT DISTINCT(ReqID) FROM Customer_Request (NOLOCK) )
INSERT INTO Customer_Request
(ReqID, ReqDescription, ReqStatus)
SELECT ReqID, ReqDescription, ReqStatus FROM #tmpRequests_insert
SELECT ReqId, ReqDescription, ReqStatus INTO #tmpRequests_update
FROM Customer_Request_Staging
WHERE ReqID IN (
SELECT DISTINCT(ReqID) FROM Customer_Request (NOLOCK) )
UPDATE Customer_Request
SET ReqDescription = tmp.ReqDescription,
ReqStatus = tmp.ReqStatus
FROM Customer_Request main INNER JOIN #tmpRequests_update tmp
ON main.ReqID = tmp.ReqID
Note the insert and update temp tables only differ by "NOT IN" and "IN" for the inner select. While the insert statement does a direct pick from temp table, the update statement does a inner join to get
the actual records to be updated.
More links ...