Introduction
This article provides the ways to test large size data feed files using tools.
Feed Types
There were two types of feeds.
DELTA (incremental data)
FULL LOAD
Staging Tables/Intermediate Tables
Staging tables are intermediate tables used to store the data before generating actual feed files and later this data can be used for validation and verification purpose.
Tools
- MS Excel - Get External Data from Text / Import Data from a text file tool
- MS Access - Get External Data from Text / Import Data from a text file tool
- MS-SQL Server
- Microsoft DOS findstr command
Background
Data Feed file testing involves manual process. But we can use some tool to improve our testing and increase performance.
Normally, most of the QA testers follow Microsoft Excel and import the feed files (.txt/.csv format) data using Get External Data from Text / Import Data from a text file tool. But Excel has its own limitations that each sheet can have a maximum of 10 million records. Data needs to be imported into multiple Excel sheets when the feed file has more than 10 million records. In this case, validation and verification of the imported data will be tedious job for Software Developers/QA Testers.
Excel - Get External Data from Text / Import Data from a Text File Tool
Many testes are using this tool for data validation and verification:
- This is a very good tool for small/medium size.
- This tool has its advantages and some limitations.
- Once data is loaded into sheets, we can utilize filter/sorting options for verification & validating the data.
Data Validation & Verification
Once data is imported, filters can be applied to view required records OR know whether records exist for the selected criteria. Refer to the below screens for filtering the data.
Provide record selection criteria using Number Filters:
Provide record selection criteria using Text Filters.
Limitations
- Memory issues may occur as large amount of data needs to be loaded into temp/cache before displaying into sheet.
- Maximum of 1 million records can be imported in a sheet- 10, 48,576 records approximately (including header)
- ”Start Import at Row” option can be used in tool to import next set of data into another sheet, but too much manual process involved.
Access - Get External Data from Text / Import Data from a Text File Tool
- Select Blank Database / Blank Web Database template
- Select External Data from main menu and Text File – Import data from Text file
- Specify the source of the data
- Select format type, Delimiter, First Row contains Fields names
- Select required field options (including Primary Key Options)
Data Validation & Verification
Apply filters to validate the data for selected criteria and get results, refer to the below screens to filter imported data.
Select the imported table and click open to view the data.
Apply Number Filter to filter specific department data. Similarly, we can apply Text filters to view the data for the selected criteria.
Filtering the data for DEPT ID 10
.
Limitations
- Need to install licenced version of Access
- Memory issues may occur as data needs to be loaded into temp/cache before loading into access database
SQL Server Tools
BULK INSERT (Transact-SQL)
- Imports a data file into a database table or view in a user-specified format in SQL Server
Example: BULK INSERT tmpStList FROM 'c:\TxtFile2.txt' WITH (FIELDTERMINATOR = ',')
OPENROWSET
Limitations
- User should have permissions to use these tools and create table in SQL server.
- Need to write queries against the table which was created as part of this process for data verification and validation
Data Import Tool
Data Validation & Verification
Once data imported to tables, use SQL queries to validate the data based on the required criteria.
Example: SELECT * FROM Sample WHERE DEPT ID=130
Microsoft DOS findstr Command
Searches for patterns of text in files using regular expressions.
Findstr
is capable of finding the exact text you are looking for in any ASCII file or files. However, sometimes you have only part of the information that you want to match, or you want to find a wider range of information. In such cases, findstr
has the powerful capability to search for patterns of text using regular expressions.
The findstr
command is short for find string and is a command used in MS-DOS to locate files containing a specific string of plain text.
Command parameters (options) can be utilized to get the required output at a faster rate.
Availability
The findstr.exe command is an external command that is available in the below Microsoft operating systems.
Advantages
- No need to load the feed file data into another tool for verifications/validation.
- No permission issues
- Simple and easy to use
- Faster than any other tools which we discussed earlier
- No software licence issues
Data Validation & Verification
Sample.txt
#1 : Record count validation.
Findstr /r /n “^” sample.txt
Iterates through all the lines (records) and creates a unique sequences number/Row number for each row. Row number includes header row as well. Please use total row count -1 to know the actual record count.
In the below example, the total rows show 9 (including header row), but the actual record count is 8.
#2 : To search for a specific string
Findstr /c:” 130” sample.txt
This command helps to get all the rows from sample.txt file where the string 130
is found.
#3 : Export search results to a text file / csv – automated process. We can create a batch file for this task.
Findstr /c:”130” sample.txt > results.txt
This command helps us to get all the rows into results.txt from sample.txt file where the string 130 is found.
Results.txt
Use the below command to search multiple strings in a given file:
findstr "110" sample.txt | findstr "HILL STREET"
This command helps us to get all the records/rows from sample.txt file where the string 130 OR HILL STREET
found.
#4 : We can also write a batch file to automate the process.
References
Points of Interest
This DOS findstr
command is very fast for data verification and validation. You will definitely find fruitful results using this command.
History
- December 14, 2015: Posted and updated