Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server / SQL-Server-2016

Building a Basic Test Data Generator for Books in SQL

4.77/5 (10 votes)
13 Nov 2017CPOL10 min read 21.7K   426  
This article helps beginners to generate over million records of random test data for a database object like books in SQL.

Introduction

In this article, I am going to walk you through the steps of creating test data for a database object such as books in SQL.

The idea behind this article is to develop a basic understanding of generating test data for an object of interest for development testing or solving a problem.

Another objective of this article is to provide a high-level (basic) view of how test data generators work.

Background

If you have just begun your SQL journey and are keen to get a starter, then this article is a good match for your requirements.

This article can help you to get your hands on a very basic test data generator open for further exploration assuming you have basic know how of SQL.

We often come accross situations where we need to generate test data. There are many ways to generate test data including using 3rd party tools and pre-written test script.

We are going to explore the article further just like a real world scenario in which you are given a problem and you have to come up with a solution at your earliest convenience.

Problem

A Bookstore Solution saves information about books in a database object (table) called Book. The staff member (end user of the system) adds information about books belonging to different categories in the table Book.

The table also contains price and stock count of the books along with an Id for unique identification of books. The title of the book is also kept in the same table as shown below:

Image 1

The above table is now object under test and the test team requires it to be filled up to some extent as many times as they like so that they perform their tests on the table.

Using the system to populate the table is a time consuming process as it requires the tester to do the following:

  1. Log into the Sytem
  2. Select "Add New Book" option
  3. Fill the form with relavant fields
  4. Save the form
  5. View the form

Image 2

Since this is not a UI test (where testers test the graphical interface as well and in that case it has to be done either through the above mentioned way or by an automated way) so it does not matter how the table is populated.

They also want books titles to be tied to their categories while price and stock can have any random value.

Solution

Now the ball's in your court assuming you have just started your SQL career (but have strong SQL background). Let us very carefully pick the first step towards the solution.

Preliminary Analysis

The requirements are a bit vague so it is worth doing preliminary analyses.

After doing some preliminary analyses followed by information gathering, the requirements are refined now as follows:

  1. (one thing is for sure) A SQL stored procedure is required to populate Book table with the following fields:
    1. BookId (auto generated)
    2. Title
    3. Category
    4. Price
    5. Stock
  2. The Title of the book(s) should be tied to the Category, that means if we have a book called "SQL Adventures", it should fall under the category "Development" rather than "Adventure" and so on.

Image 3

Now that it is understood what exactly is required, why not jump straight to the coding part then, as I don't want to distract the beginners by following the whole set of standard processes rather, I would like to focus on coding logic and its implementation to get to the solution.

Development (Code and Logic)

Creating Data Structure (Book Table)

The first thing we need to do in order to meet the requirements is to create the desired table in SQL as follows:

SQL
-- Create book table
CREATE TABLE Book
(BookId INT PRIMARY KEY IDENTITY(1,1),
Title VARCHAR(100),
Category VARCHAR(30),
Price DECIMAL(10,2),
Stock INT);
GO

Now test run the table which will have no data as follows:

SQL
-- Check the Book table structure
SELECT BookId, Title, Category, Price, Stock FROM Book

Image 4

Populating Data Structure (Book Table) with Test Data

The next step is to populate the table Book with test data which should be random with the exception of titles to be tied to categories.

In order to populate the table, we need to divide the problem into two parts:

  1. Columns Price and Stock that must contain numbers (INT and DECIMAL)
  2. Columns Title and Category that must contain strings (VARCHAR)

Generating Test Data for Stock

Now generating random Stock values in the range from 1 to 10 can be achieved by using RAND() function which generates random numbers as follows:

SQL
-- Generating Test Data for Stock in the range 1-10
SELECT CONVERT(INT,RAND()*10+1) as Stock

Generating Test Data for Price

Now we need to create Price test data with a wider range between 1 and 50 which can be achieved by altering the above line of code replacing 10 with 50 as follows:

SQL
-- Generating Test Data for Price in the range 1-50
SELECT CONVERT(INT,RAND()*50+1) as Price

Combining Test Data Scripts for both Stock and Price

Combining both scripts as follows:

SQL
-- Generating Test Data for Stock from 1-10 and Price from 1-50

SELECT (SELECT CONVERT(INT,RAND()*10+1)) as [Stock 1-10],
(SELECT CONVERT(INT,RAND()*50+1)) AS [Price 1-50]

Running the above scripts twice for both Stock and Price show the following results:

Image 5

Image 6

One thing worth mentioning is that the scripts will still work if we exclude the inner SELECT keywords but for the ease of quick testing, I am leaving them as it is.

Deriving Logic of Generating Test Data for Numbers

Now is a good time to derive general formula of generating numbers test data whether it be for count of Stock or Price of a book or any other column which is filled with numbers.

If SELECT CONVERT(INT,RAND()*10+1) get us random numbers from 1 to 10,

then it can be generalised as follows:

SQL
SELECT CONVERT(INT,RAND()*MAX+1)

Same way in order to generate numbers from 1-100 to be used as test data then keeping the above formula in mind we can simply write the following piece of code in SQL:

SQL
-- Generating number from 1 to 100
SELECT CONVERT(INT,RAND()*100+1)

Generating Test Data for Categories (Using CHOOSE)

Now a question comes into mind that "Can we use the above mentioned formula to generate test data for categories?" the answer is "Yes".

If we are able to convert the categories names into numbers, then we can use the test data generator for numbers.

Let us first quickly define some categories for the books:

  1. Adventure
  2. Science
  3. Development

Now consider each category has a code and in the above example, 1 is for Adventure, 2 is for Science and 3 is for Development.

We can easily use the general formula to select any number from 1 to 3 as follows:

SQL
-- Generating code from 1 to 3
SELECT CONVERT(INT,RAND()*3+1)

Next, based on the randonly chosen value, we have to associate it with a category which can be achived by using CHOOSE() Function which returns item associated with the number passed to it as shown below:

SQL
-- Passing 1 in CHOOSE get us First Choice
-- Passing 2 in CHOOSE get us Second Choice
SELECT CHOOSE(2,'First Choice','Second Choice')

Image 7

Similarly, in our case, we are going to use CHOOSE function to supply randomly chosen number out of 3 and then based on that number, CHOOSE is going to return Category exists at that position as follows:

SQL
-- Declare variable to hold randomly selected category out of 3
DECLARE @CategoryCode INT=(SELECT RAND()*3+1),@Category VARCHAR(30)
 
-- Initialise variable with random category
SET @Category=(SELECT CHOOSE(@CategoryCode,'Adventure','Science','Development'))

-- Check chosen category
SELECT @CategoryCode as [CategoryCode 1-3],@Category as [Category 1-3]

Image 8

Generating Test Data for Title Tied to Category (Using CASE)

This bit is a bit tricky. We have to have some predefined titles to be tied with categories.

For example, if I say the book "a" belongs to "Adventure" Category and book "b" belongs to "Science", then whenever we have to provide test data for "Adventure" category, it should be "a" not "b".

This can be achived by using CASE statement as follows:

SQL
DECLARE @Category VARCHAR(40)='Adventure' -- Set Adventure Category
DECLARE @I INT =1 -- Helping Number

SELECT CASE @Category -- Check Category
WHEN 'Adventure' THEN CONCAT('An Adventure City ',@I) -- Create Adventure Book Title
WHEN 'Science' THEN CONCAT('The Amazing World of Science Part-',@I) -- Create Science Book Title
WHEN 'Development' _
   THEN CONCAT('The Art and Science of SQL Chapter-',@I) -- Create Development Book Title
END

Image 9

One thing worth noting is a helping number that is used to create as many as books as possible but no two books have the same name. For example, the first book in the Adventure Category is "An Adventure City 1" and the next book might be "An Adventure City 10" and so on.

Final Script - Basic Test Data Generator for Books

Finally, see below the whole test data generator script in action (please refer to the download souce code section for the latest version of the code):

SQL
-- (1) Create book table with Title
CREATE TABLE Book
(BookId INT PRIMARY KEY IDENTITY(1,1),
Title VARCHAR(100),
Category VARCHAR(30),
Price DECIMAL(10,2),
Stock INT);
GO

-- (2) Declare Counter (to repeat steps to generate test data), Category Code, Category and Title
DECLARE @Counter INT=1
DECLARE @CatogeryCode INT
DECLARE @Category VARCHAR(30)
DECLARE @Title VARCHAR(100)

WHILE @Counter<30
BEGIN
-- (3) Initialize CategoryCode variable with randomly selected category out of 3
SET @CatogeryCode =(SELECT RAND()*3+1)
 
-- (4) Initialize Category based on above generated CategoryCode
SET @Category=(SELECT CHOOSE(@CatogeryCode,'Adventure','Science','Development'))

-- (5) Initialize Title of the Book based on above generated Category
SET @Title=(SELECT CASE @Category
WHEN 'Adventure' THEN CONCAT('An Adventure City ',@Counter)
WHEN 'Science' THEN CONCAT('The Amazing World of Science Part-',@Counter)
WHEN 'Development' THEN CONCAT('The Art and Science of SQL Chapter-',@Counter)
END)
 
-- (5) Add data to the table Book
INSERT INTO dbo.Book
        ( Title,Category, Price, Stock )
VALUES
(@Title -- Randomly selected Title based on selected category
,@Category, -- Randomly selected category
(SELECT CONVERT(INT,RAND()*50+1)),(SELECT CONVERT(INT,RAND()*10+1))) -- Randomly selected price and stock

SET @Counter+=1

END
 
-- (6) View table
SELECT B.BookId ,
    B.Title,
       B.Category ,
       B.Price ,
       B.Stock FROM dbo.Book B
 
-- (7) Cleanup script (drop table)
DROP TABLE dbo.Book

The output is as follows:

Image 10

Running the script for the second time is going to give us different results:

Image 11

As said, no two outputs of the script are going to be the same so the random test data generator script is ready now.

Finally Turning Script into Stored Procedure

The script when turned into a stored procedure is going to become very handy for the test team since it is a matter of calling the procedure for them to get the Book table populated.

Let us now quickly focus on the things we need, in order to turn the script into procedure.

In order to see the full potential of the script (keeping in mind the requirements), we need to do the following things:

  1. A Database (BooksDemo) to be created for demo purposes
  2. A table (Book) to be created inside the database
  3. A stored procedure to create test data for books
  4. The stored procedure when called populates the table Book as a result
  5. The stored procedure should be able to accept total number of test data records to be created

Now inside the stored procedure, a temporary table needs to be created same as book table. The trick is to simply add # with the name and you are done.

SQL
-- Table Book 
CREATE TABLE Book
(BookId INT PRIMARY KEY IDENTITY(1,1),
Title VARCHAR(100),
Category VARCHAR(30),
Price DECIMAL(10,2),
Stock INT)

-- Temporary Table Book
CREATE TABLE #Book
(BookId INT PRIMARY KEY IDENTITY(1,1),
Title VARCHAR(100),
Category VARCHAR(30),
Price DECIMAL(10,2),
Stock INT)

Now the temporary table inside the procedure gets populated first and returns its results to the table Book outside plus few more tweaks (to be done) inside the procedure such as disabling identity inserts to populate id field.

SQL
-- (3) Generate Test Data for Books and put it into table Book
INSERT INTO Book
(
BookId,
Title,
Category ,
Price ,
Stock)
EXEC GenerateBooksTestData @Total

The code above is for demo purposes only and does not run unless the procedure and table being called is created.

Please refer to the download source code section for runnable version of the above code.

Generating Over 1 Million Records 

Once the stored procedure has been created based on the script, next step was to test run the procedure to generate a relatively large data set.

The stored procedure successfully generated over one million records in 1 minute and 14 seconds (on a dev machine) simply by initialising @Total variable with the desired value as shown below:

Image 12

In short, simply turning the script into the stored procedure, you have made your solution a lot more flexible and reliable which is also capable of generating over one million test data records.

Points of Interest

If you are interested to generate large data sets please initialise @Total variable in the stored procedure (please refer to the download code section) with a big number and check how far you can go.

The script generates a different result set each time it runs that makes it a bit difficult to unit test.

Another big challenge would be creating a unit test for such a stored procedure that generates random data because we are not sure what exactly is going to be the output so the expected value cannot be easily compared with the actual value, one of the requirements of unit testing.

Another interesting way of using this script is to consider populating two different book tables and then comparing them for testing comparative analysis.

It is also worth improving the random number generator formula further to be able to generate numbers in any range rather than starting from 1.

It is possible that any of the above discussed ideas may form the bases for another article based on this article.

History

  • 2017-10-21 Version 1.0 Complete
  • 2017-10-22 Version 1.1 Improved source code and article
  • 2017-11-02 Version 1.2 Script was turned into stored procedure (source code and article update)
  • 2017-11-09 Version 1.3 Generating over one million records plus some Typos fixed (article update)
  • 2017-11-13 Version 1.4 The random number mininum value has to be 1 (article and source code update)
  • 2017-11-14 Version 1.5 Minor fixes (making sure random Price is between 1 and 50) and added another Point of Interest (article update)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)