Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / DevOps / testing

Large Size Data Feed File Testing (.txt/.csv) - Tools

4.75/5 (3 votes)
15 Dec 2015CPOL5 min read 16.2K  
Large size data feed files testing tools

Introduction

This article provides the ways to test large size data feed files using tools.

Feed Types

There were two types of feeds.

DELTA (incremental data)

  • A delta feed contains all of the changed items/data from previous load.

FULL LOAD

  • A full load feed contains all the data.

Staging Tables/Intermediate Tables

Staging tables are intermediate tables used to store the data before generating actual feed files and later this data can be used for validation and verification purpose.

Tools

  1. MS Excel - Get External Data from Text / Import Data from a text file tool
  2. MS Access - Get External Data from Text / Import Data from a text file tool
  3. MS-SQL Server
  4. Microsoft DOS findstr command

Background

Data Feed file testing involves manual process. But we can use some tool to improve our testing and increase performance.

Normally, most of the QA testers follow Microsoft Excel and import the feed files (.txt/.csv format) data using Get External Data from Text / Import Data from a text file tool. But Excel has its own limitations that each sheet can have a maximum of 10 million records. Data needs to be imported into multiple Excel sheets when the feed file has more than 10 million records. In this case, validation and verification of the imported data will be tedious job for Software Developers/QA Testers.

Excel - Get External Data from Text / Import Data from a Text File Tool

Many testes are using this tool for data validation and verification:

  • This is a very good tool for small/medium size.
  • This tool has its advantages and some limitations.
  • Once data is loaded into sheets, we can utilize filter/sorting options for verification & validating the data.

Image 1

Data Validation & Verification

Once data is imported, filters can be applied to view required records OR know whether records exist for the selected criteria. Refer to the below screens for filtering the data.

Image 2

Provide record selection criteria using Number Filters:

Image 3

Provide record selection criteria using Text Filters.

Image 4

Limitations
  • Memory issues may occur as large amount of data needs to be loaded into temp/cache before displaying into sheet.
  • Maximum of 1 million records can be imported in a sheet- 10, 48,576 records approximately (including header)
  • ”Start Import at Row” option can be used in tool to import next set of data into another sheet, but too much manual process involved.

Access - Get External Data from Text / Import Data from a Text File Tool

  • Select Blank Database / Blank Web Database template
  • Select External Data from main menu and Text File – Import data from Text file
  • Specify the source of the data

Image 5

Image 6

  • Select format type, Delimiter, First Row contains Fields names

Image 7

Image 8

  • Select required field options (including Primary Key Options)
Data Validation & Verification

Apply filters to validate the data for selected criteria and get results, refer to the below screens to filter imported data.

Select the imported table and click open to view the data.

Image 9

Apply Number Filter to filter specific department data. Similarly, we can apply Text filters to view the data for the selected criteria.

Image 10

Filtering the data for DEPT ID 10.

Image 11

Image 12

Limitations
  • Need to install licenced version of Access
  • Memory issues may occur as data needs to be loaded into temp/cache before loading into access database

SQL Server Tools

BULK INSERT (Transact-SQL)
  • Imports a data file into a database table or view in a user-specified format in SQL Server

Example: BULK INSERT tmpStList FROM 'c:\TxtFile2.txt' WITH (FIELDTERMINATOR = ',')

OPENROWSET
Limitations
  • User should have permissions to use these tools and create table in SQL server.
  • Need to write queries against the table which was created as part of this process for data verification and validation
Data Import Tool

Image 13

Image 14

Image 15

Data Validation & Verification

Once data imported to tables, use SQL queries to validate the data based on the required criteria.

Example: SELECT * FROM Sample WHERE DEPT ID=130

Microsoft DOS findstr Command

Searches for patterns of text in files using regular expressions.

Findstr is capable of finding the exact text you are looking for in any ASCII file or files. However, sometimes you have only part of the information that you want to match, or you want to find a wider range of information. In such cases, findstr has the powerful capability to search for patterns of text using regular expressions.

The findstr command is short for find string and is a command used in MS-DOS to locate files containing a specific string of plain text.

Command parameters (options) can be utilized to get the required output at a faster rate.

Availability

The findstr.exe command is an external command that is available in the below Microsoft operating systems.

Advantages

  • No need to load the feed file data into another tool for verifications/validation.
  • No permission issues
  • Simple and easy to use
  • Faster than any other tools which we discussed earlier
  • No software licence issues

Data Validation & Verification

Sample.txt

Image 16

#1 : Record count validation.

  • Findstr /r /n “^” sample.txt

Iterates through all the lines (records) and creates a unique sequences number/Row number for each row. Row number includes header row as well. Please use total row count -1 to know the actual record count.

In the below example, the total rows show 9 (including header row), but the actual record count is 8.

Image 17

#2 : To search for a specific string

  • Findstr /c:” 130” sample.txt

This command helps to get all the rows from sample.txt file where the string 130 is found.

Image 18

#3 : Export search results to a text file / csv – automated process. We can create a batch file for this task.

  • Findstr /c:”130” sample.txt > results.txt

This command helps us to get all the rows into results.txt from sample.txt file where the string 130 is found.

Image 19

Results.txt

Image 20

Use the below command to search multiple strings in a given file:

findstr "110" sample.txt | findstr "HILL STREET"

This command helps us to get all the records/rows from sample.txt file where the string 130 OR HILL STREET found.

Image 21

Image 22

#4 : We can also write a batch file to automate the process.

References

Points of Interest

This DOS findstr command is very fast for data verification and validation. You will definitely find fruitful results using this command.

History

  • December 14, 2015: Posted and updated

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)