Introduction
Welcome back. I confess, I am considering dropping the episode numbers, as each article is becoming less sequential
and more self contained. I intend to keep writing SQL articles and using the overarching title, just to group them
together, and still create the concept of a series that can be worked through as if it were a book, to allow people
wanting to learn to stick with one set of information and not just read random articles of varying style, quality and
accuracy.
I've also decided to rename the series to 'Thinking in SQL', because SQL does require a different mindset, and that's what all my articles are trying to encourage.
Databases are generally used to store data, in a structured format, so we can rely on them to provide us with the same
information every time we make the same request. Nevertheless, sometimes we do want to generate random numbers inside SQL
Server, not least when we're trying to populate test data. So, today's article is about how generating random numbers
works in SQL Server. As always, it's more interesting and has more points to consider than you might first think.
Generating a random number
select rand()
That's all you need to do, to generate a random number in SQL. This creates a value between 0 and 1, so you can create
a range up to xx by multiplying by xx, and you can convert to int to make it a whole number, if you require it. The
conversion is also implicit if you just use it somewhere that a whole number is expected.
Generating a random number sequence
Now run this to get a set of five random numbers:
select rand() union all
select rand() union all
select rand() union all
select rand() union all
select rand()
Pretty straightforward, right ? But, it does not scale well if you wanted to create a LOT of them. Enter the tally table.
Here is SQL to use a tally table to generate 100 random numbers.
WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b)
select rand() from e2
;
What happened ? SQL Server has optimised our query and decided to call rand() once. It doesn't realise that it returns
a different value every time, and so assumes that it's being efficient for us. This is, in general, a good thing, but in
this case, it's an issue.
The first way we'll try to abstract out the random number generation, is with a view. Create this:
CREATE VIEW vwRand
AS
SELECT RAND() as Rand
go
And then run it like this:
WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b)
select rand() from e2, vwRand
;
This is a cross join, but because vwRand has one value in it, we still get 100 values back. Sadly, they are all the
same. So, what can we run in SQL that knows not to be optimised away ? The answer is, a user defined function.
CREATE FUNCTION RandNumber()
RETURNS float
AS
BEGIN
RETURN (SELECT rand() as RandNumber)
END
go
This is not allowed. Rand is considered a 'side effecting function' ( perhaps it increments a sequence, all 'random'
sequences being not really random at all ). So, we need to do it in two steps, as follows:
CREATE FUNCTION Random()
RETURNS float
AS
BEGIN
RETURN (SELECT Rand FROM vwRand)
END
go
So, our view is still creating the random number, but a function is causing the view to be invoked for every row. This
is a good lesson in terms of the efficiency of functions in general, but in our case, it's perfect.
WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b)
select dbo.random() from e2
;
This finally gives us 100 random numbers. Of course, they are all between 0 and 1. So long as we have this wrapped up
at all, may as well make it easy to specify what our range is. Here is a function to do this:
CREATE FUNCTION RandomNumber(@max int)
RETURNS float
AS
BEGIN
RETURN (SELECT Rand * @max FROM vwRand)
END
go
And now you can call this to get a sequence of random values up to the specified maximum
WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b)
select dbo.randomNumber(40) from e2
;
go
If we wanted to get random values that are ints, we could cast to int, or we can take a different approach. This code
takes the random number, moves the decimal place across, then uses the % operator ( which is mod ) to strip down to the
remainder from the value. As this will return a range of 0 to n-1, we add one to the result.
CREATE FUNCTION RandomInt(@max int)
RETURNS int
AS
BEGIN
RETURN (SELECT convert(int, Rand * 100000)% @max FROM vwRand) + 1
END
go
And here's the code to call it.
WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b)
select dbo.randomInt(50) from e2
;
Benchmarking our results
The big question with any random sequence is, how random is it ? We're going to use SQL to try to find out. Our
strategy will be to create a small range of numbers, count them and see if the spread is reasonably even. If we asked for
a value between 0 and 5, and it was 4 60% of the time, that would not indicate a reasonable spread of values...
WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b),
rand as(select dbo.randomInt(5) as r from e2)
select r, count(r) from rand group by r order by count(r) desc
;
This will return a result along the lines of:
2 26
3 21
1 19
4 18
5 16
Which is a reasonable sort of spread ( if they were all 20, that would not be random, either ).
Another way to create random numbers
Another way to create random numbers, is to start with a GUID. A GUID is, by definition, random. The following code
casts a GUID to a binary value of 6 bytes. We then use the mod operator to clamp down to the range we want, maximum
possible values exceed 2 billion. This also converts the value to an int, but it could be negative, so we use ABS to
strip the negative sign, and again add one to clamp to our range (as above)
SELECT ABS(CAST(NEWID() AS binary(6)) %10) + 1
The next question is, will this work in inline code ? Run this and you will see it does work, NewId() is called for
every row.
WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b)
select ABS(CAST(NEWID() AS binary(6)) %10) + 1 from e2
;
However, it's enough of a mouthful to make sense to wrap it in a view and a function. We need a function in order to
be able to pass in the maximum value we want our range to be in, otherwise we could just use a view. A function makes
more sense though, as we are after a single value. It is worth mentioning that using functions will slow down your SQL, so you should make an intelligent choice based on your needs.
create VIEW vwRandFromId
AS
select abs(convert(int, CAST(NEWID() AS binary(6)))) as rand
go
CREATE FUNCTION RandomIntFromId(@max int)
RETURNS int
AS
BEGIN
RETURN (SELECT Rand% @max FROM vwRandFromId)
END
Now we can do this where ever we need to:
WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b)
select dbo.RandomIntFromId(500) from e2
;
Once again, let's see how random this really is. We're not using functionality designed for randomness, so this is
worth looking in to.
WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b),
rand as(select dbo.RandomIntFromId(5) as r from e2)
select r, count(r) from rand group by r order by count(r) desc
;
My results were as follows:
0 28
1 22
3 17
4 17
2 16
That seems like a decent spread to me, more than good enough to use instead of the build in function, if we wanted to.
Conclusion
Today we looked in to two ways to create random numbers in SQL Server, wrote functions to call to return random values
within a range ( although we didn't work on having a minimum value that's not zero, that's a simple exercise if you want
to do it ), and checked our code to make sure that the spread of values was reasonable. Random values are not something
you need often in SQL, but when you do, it's good to know how to create them efficiently and cleanly.