Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server

Challenges in Data Migration Testing

4.67/5 (2 votes)
12 Jan 2016CPOL5 min read 15.4K  
Challenges faced in data migration testing

Introduction

We had this opportunity to work on our very first database migration testing project a few months ago. The objective was to ensure the correctness of data migrated from a source system with MS SQL as its database to a target system with the same database. During the testing, we got this chance to articulate our SQL query writing skills. We want to share our learning and experience in this post.

Background

What is Data Migration?

Data Migration is the process of transferring data between source and target system. It includes data cleaning, validation and data quality. People often use term "Data conversion" as a synonym of "Data Migration", but we should know that conversion is an integral part of Migration.

Challenges Faced During the Testing Cycle

  1. The first and biggest challenge was Data Mapping. There was no one to one mapping document from source database tables to target database tables. We created an initial mapping document by studying the source and target database table structure to start with and got it approved by the concerned team. This was helpful in the next phase of testing.
  2. There were some structure changes between the source and target database tables which increased the testing challenge. For example, the source table columns were directly mapped to the primary keys from master tables with common data types such as varchar or int whereas in the target database, along with common data types, they also had auto generated GUIDs for each row of table.
  3. The next challenge was to compare the raw data from source database against the views in target database. In the source database, most information can be fetched by putting simple joins between 2 or 3 tables, but in target database, this was not the case. Hence, testers were required to get the required data from views rather than raw data.

Common Issues in Migration Testing

  1. Change in datatype.
  2. Migration of values depending on other fields/tables present in the source database.
  3. Data in the source table may be present as a character, but in the target table the mapped column is an integer.
  4. Concatenation of two or more columns.

Using the Code

Approach for Testing Data Integrity

To check the data integrity between source and target, there were different sets of SQL scripts created.

For example, it was required to validate the data integrity between 2 SQL tables, one from source db and another from target db. This was achieved by using the temp tables and full outer join.

-Temp table to get data from target database-

SQL
drop table #mosoblink
select convert(varchar, CheckinDateTime,101) as dt, PartyID, RoleId, count(*) as Ct
into #mosoblink
from dbo.Checkin
where convert(varchar, CheckinDateTime,101) = '10/19/2015'
group by convert(varchar, CheckinDateTime,101), PartyID, RoleID
order by 1

-Temp table to get data from source database-

SQL
drop table #eClubblink
select convert(varchar(10), sdtTime,101) as dt, a.vcMemberID, _
	replace(vcVisibleMemberID,'P','') as MemberID, count(*) as ct
into #eClubblink
from tblCheckins a
inner join tblMembers b on a.vcMemberID = b.vcMemberID
where convert(varchar, sdttime,101) = '10/19/2015'
group by convert(varchar(10), sdtTime,101), a.vcMemberID, replace(vcVisibleMemberID,'P','')

-Full outer join to compare the data rowwise-

SQL
select a.*, b.*
from #mosoblink a full outer join #eClubblink b on a.RoleID = b.MemberID
where coalesce(a.ct,0) <> coalesce(b.ct,0)

Sample Result Set from the Above Script

dt PartyID RoleId Ct dt vcMemberID MemberID ct
NULL NULL NULL NULL 11/7/2015 1030025722 1030025722 1

Difference in the count of rows in source and database was the result set of the above script. Tester could easily know the mismatch and was able to reduce the testing time.

Challenges Faced While Following the Above Approach

The above approach worked in case where we were having only one source system. The above queries picked up the data from source and target tables, dropped them into temp tables and compared them using full outer join on the basis of Member ID (which is unique in every case) which was migrated as Role ID in target table. The real benefit of going with this approach came into the picture when we had to migrate the data from more than one source system to a common system.

As initially we migrated data from a single source and now as we moved ahead in the roll out process, we found a glitch in this approach. We were having 2 source systems in different locations, i.e., in UK and US and each of them having its separate server. The member ID generated in each source system was unique on its own but only within the source system and our application was configured separately for each server and there were no sync jobs to sync the data between these two servers. But our target system was a common system having a single server handling the data for each source system in one.

As we moved ahead and migrated the data into target tables and began testing, we found that about 20-25% of the records were having discrepancies in their check-in counts. Some records were having check-in counts more than the expected and some of them were not having a single check-in record. This all happened because the Member ID which was unique within the system was overlapping with the Member ID of the other source system.

For example: 1020000909, the member id of a record from US source system (which was unique within the US source system) was overlapping with the record of UK source system having the same member ID. As these Member IDs were auto-generated at run-time and due to the difference of servers at different locations and no syncing activity going on between them created these kind of records.

Points of Interest

Solution Provided for the Issue Faced

After having several meetings to discuss the solution for the above issue, the team decided to modify the Role ID and Member ID in the target and source tables by appending the source system id in the Member ID of one of the source system’s records. We picked out the records which were having the IDs overlapped and appended source system ID in them. We used this approach against the UK source system.

For example: The record having member ID 1020000909 in UK was modified to 1020000909-4 using 4 as the source system ID for UK on both sides leading it not to mismatch the record in US source system with the same Member ID, i.e., on Source as well as in target (inside Role ID) table, this solution removed the discrepancy of overlapped record and now the check-in count for each record was matching as every record had its unique role ID.

History

  • 12th January, 2016: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)