Thinking in SQL - Generating Random Numbers

Christian Graus

4.80/5 (2 votes)

23 Jan 2014CPOL6 min read

28.5K

194

A discussion of ways to create random number sequences in SQL Server

Download Thinking_in_SQL_-_Random_Numbers.zip - 1.1 KB

Introduction

Welcome back. I confess, I am considering dropping the episode numbers, as each article is becoming less sequential and more self contained. I intend to keep writing SQL articles and using the overarching title, just to group them together, and still create the concept of a series that can be worked through as if it were a book, to allow people wanting to learn to stick with one set of information and not just read random articles of varying style, quality and accuracy.

I've also decided to rename the series to 'Thinking in SQL', because SQL does require a different mindset, and that's what all my articles are trying to encourage.

Databases are generally used to store data, in a structured format, so we can rely on them to provide us with the same information every time we make the same request. Nevertheless, sometimes we do want to generate random numbers inside SQL Server, not least when we're trying to populate test data. So, today's article is about how generating random numbers works in SQL Server. As always, it's more interesting and has more points to consider than you might first think.

Generating a random number

SQL

select rand()

That's all you need to do, to generate a random number in SQL. This creates a value between 0 and 1, so you can create a range up to xx by multiplying by xx, and you can convert to int to make it a whole number, if you require it. The conversion is also implicit if you just use it somewhere that a whole number is expected.

Generating a random number sequence

Now run this to get a set of five random numbers:

SQL

select rand() union all 
select rand() union all 
select rand() union all 
select rand() union all 
select rand()

Pretty straightforward, right ? But, it does not scale well if you wanted to create a LOT of them. Enter the tally table. Here is SQL to use a tally table to generate 100 random numbers.

SQL

WITH 
  E1(N) AS (
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
           ),                          -- 1*10^1 or 10 rows
  E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows
	select rand() from e2 
;

What happened ? SQL Server has optimised our query and decided to call rand() once. It doesn't realise that it returns a different value every time, and so assumes that it's being efficient for us. This is, in general, a good thing, but in this case, it's an issue.

The first way we'll try to abstract out the random number generation, is with a view. Create this:

SQL

CREATE VIEW vwRand
AS
SELECT RAND() as Rand

go

And then run it like this:

SQL

WITH 
  E1(N) AS (
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
           ),                          -- 1*10^1 or 10 rows
  E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows
	select rand() from e2, vwRand 
;

This is a cross join, but because vwRand has one value in it, we still get 100 values back. Sadly, they are all the same. So, what can we run in SQL that knows not to be optimised away ? The answer is, a user defined function.

SQL

CREATE FUNCTION RandNumber()
RETURNS float
AS
  BEGIN
  RETURN (SELECT rand() as  RandNumber)
  END

go

This is not allowed. Rand is considered a 'side effecting function' ( perhaps it increments a sequence, all 'random' sequences being not really random at all ). So, we need to do it in two steps, as follows:

SQL

-- Function to create random number by selecting from view
CREATE FUNCTION Random()
RETURNS float
AS
  BEGIN
  RETURN (SELECT Rand FROM vwRand)
  END

go

So, our view is still creating the random number, but a function is causing the view to be invoked for every row. This is a good lesson in terms of the efficiency of functions in general, but in our case, it's perfect.

SQL

WITH 
  E1(N) AS (
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
           ),                          -- 1*10^1 or 10 rows
  E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows
	select dbo.random() from e2
;

This finally gives us 100 random numbers. Of course, they are all between 0 and 1. So long as we have this wrapped up at all, may as well make it easy to specify what our range is. Here is a function to do this:

SQL

CREATE FUNCTION RandomNumber(@max int)
RETURNS float
AS
  BEGIN
  RETURN (SELECT Rand * @max FROM vwRand)
  END

go

And now you can call this to get a sequence of random values up to the specified maximum

SQL

WITH 
  E1(N) AS (
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
           ),                          -- 1*10^1 or 10 rows
  E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows
	select dbo.randomNumber(40) from e2
;
go

If we wanted to get random values that are ints, we could cast to int, or we can take a different approach. This code takes the random number, moves the decimal place across, then uses the % operator ( which is mod ) to strip down to the remainder from the value. As this will return a range of 0 to n-1, we add one to the result.

SQL

CREATE FUNCTION RandomInt(@max int)
RETURNS int
AS
  BEGIN
  RETURN (SELECT convert(int, Rand * 100000)% @max FROM vwRand) + 1
  END
go

And here's the code to call it.

SQL

WITH 
  E1(N) AS (
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
           ),                          -- 1*10^1 or 10 rows
  E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows
	select dbo.randomInt(50) from e2
;

Benchmarking our results

The big question with any random sequence is, how random is it ? We're going to use SQL to try to find out. Our strategy will be to create a small range of numbers, count them and see if the spread is reasonably even. If we asked for a value between 0 and 5, and it was 4 60% of the time, that would not indicate a reasonable spread of values...

SQL

WITH 
  E1(N) AS (
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
           ),                          -- 1*10^1 or 10 rows
  E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
  rand as(select dbo.randomInt(5) as r from e2)
  select r, count(r) from rand group by r order by count(r) desc
;

This will return a result along the lines of:

Which is a reasonable sort of spread ( if they were all 20, that would not be random, either ).

Another way to create random numbers

Another way to create random numbers, is to start with a GUID. A GUID is, by definition, random. The following code casts a GUID to a binary value of 6 bytes. We then use the mod operator to clamp down to the range we want, maximum possible values exceed 2 billion. This also converts the value to an int, but it could be negative, so we use ABS to strip the negative sign, and again add one to clamp to our range (as above)

SQL

SELECT ABS(CAST(NEWID() AS binary(6)) %10) + 1

The next question is, will this work in inline code ? Run this and you will see it does work, NewId() is called for every row.

SQL

WITH 
  E1(N) AS (
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
           ),                          -- 1*10^1 or 10 rows
  E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows
	select ABS(CAST(NEWID() AS binary(6)) %10) + 1  from e2 
;

However, it's enough of a mouthful to make sense to wrap it in a view and a function. We need a function in order to be able to pass in the maximum value we want our range to be in, otherwise we could just use a view. A function makes more sense though, as we are after a single value. It is worth mentioning that using functions will slow down your SQL, so you should make an intelligent choice based on your needs.

SQL

create VIEW vwRandFromId
AS
	select abs(convert(int, CAST(NEWID() AS binary(6)))) as rand
go 

CREATE FUNCTION RandomIntFromId(@max int)
RETURNS int
AS
  BEGIN
  RETURN (SELECT Rand% @max FROM vwRandFromId)
  END

Now we can do this where ever we need to:

SQL

WITH 
  E1(N) AS (
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
           ),                          -- 1*10^1 or 10 rows
  E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows
	select dbo.RandomIntFromId(500) from e2 
;

Once again, let's see how random this really is. We're not using functionality designed for randomness, so this is worth looking in to.

SQL

WITH 
  E1(N) AS (
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
           ),                          -- 1*10^1 or 10 rows
  E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
  rand as(select dbo.RandomIntFromId(5) as r from e2)
  select r, count(r) from rand group by r order by count(r) desc
;

My results were as follows:

SQL

That seems like a decent spread to me, more than good enough to use instead of the build in function, if we wanted to.

Conclusion

Today we looked in to two ways to create random numbers in SQL Server, wrote functions to call to return random values within a range ( although we didn't work on having a minimum value that's not zero, that's a simple exercise if you want to do it ), and checked our code to make sure that the spread of values was reasonable. Random values are not something you need often in SQL, but when you do, it's good to know how to create them efficiently and cleanly.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)