Generating desired amount of rows in Oracle using single statement

Wendelius

5.00/5 (1 vote)

31 Aug 2014CPOL4 min read

17.4K

Two ways of generating desired amount of rows in Oracle by using a single SQL statement.

Download OraInfiniteSql.zip - 550 B

Introduction

Sometimes you need to generate an exact amount of rows for an SQL statement. There are several ways of doing this like creating a table containing sufficient amount of rows, creating a function which returns the desired set and so on. This tip shows how you can use either CONNECT BY or recursive CTE (Common Table Expression) to fullfil the requirement.

Alternative 1: CONNECT BY

CONNECT BY clause is used to create hierarchical queries. Generating desired amount of rows can be considered as a hierarchical query but without actual connection between the parent and te child records (basically just an endless loop).

So to generate 10 rows, you can use a query like

SQL

----------------------------------
-- Generate 10 rows
----------------------------------
SELECT Level 
FROM Dual 
CONNECT BY Level <= 10;

The Level pseudocolumn returns the current depth for recursion so it is used to restrict the amount of rows returned. The result of the query above is:

LEVEL
------
1
2
3
4
5
6
7
8
9
10

Well, that was simple.

Alternative 2: CTE

You may want to use a common-table expression to do the recursion. For example, the CONNECT BY is Oracle specific and if you need a statement structure more close to standard SQL then CTE would be an option.

The query for the 10 rows using CTE could look like this:

SQL

----------------------------------
-- CTE: Generate 10 rows
----------------------------------
WITH InfiniteRows (RowNumber) AS (
   -- Anchor member definition
   SELECT RowNum AS RowNumber
   FROM Dual
   UNION ALL
   -- Recursive member definition
   SELECT a.RowNumber + 1    AS RowNumber
   FROM   InfiniteRows a
   WHERE  a.RowNumber < 10
)
-- Statement that executes the CTE
SELECT RowNumber
FROM   InfiniteRows;

Some explanation for the statement. The anchor statement produces one row having 1 as row number. The recursive part uses the CTE as the source table thus receiving rows from previous iteration as a result set. Now since this is a UNION the number of rows is growing by one on each iteration.

The restricting WHERE clause is placed in the recursive member definition in order to stop the recursion when the desired amount of rows has been generated.

So why the WHERE clause is inside the CTE, why not on the outside where the CTE is queried? This is because CTE works as a inline view or a temporary table. Only after the set is created the result is passed to the executing statement. Moving the condition outside would cause an infinite loop. Lets try:

SQL

----------------------------------
-- CTE: Generate 10 rows
-- Wrong place for the condition
----------------------------------
WITH InfiniteRows (RowNumber) AS (
   -- Anchor member definition
   SELECT RowNum AS RowNumber
   FROM Dual
   UNION ALL
   -- Recursive member definition
   SELECT a.RowNumber + 1    AS RowNumber
   FROM   InfiniteRows a
) 
-- Statement that executes the CTE
SELECT RowNumber
FROM   InfiniteRows
WHERE  RowNumber < 10;

Oracle notices the cycle so running the statement above would cause an error:

ORA-32044: cycle detected while executing recursive WITH query

Of course you could define a CYCLE option in the statement but it wouldn't make sense to allow the database to execute the infinite loop.

Further investigations

Large amount of rows

So what happens if we query for example 10'000'000 rows using the CONNECT BY.

SQL

----------------------------------
-- Try to generate 10'000'000 rows
----------------------------------
SELECT Level 
FROM Dual 
CONNECT BY Level <= 10000000;

Depending on your system settings this may or may not execute succesfully. With my initial 11 XE installation an error is returned when the amount of rows exceeds 2'766'500. The error is:

ORA-30009: Not enough memory for CONNECT BY operation

The explanation for the error is

Quote: Oracle documentation

ORA-30009: Not enough memory for string operation
Cause: The memory size was not sufficient to process all the levels of the hierarchy specified by the query.
Action: In WORKAREA_SIZE_POLICY=AUTO mode, set PGA_AGGREGATE_TARGET to a reasonably larger value. Or, in WORKAREA_SIZE_POLICY=MANUAL mode, set SORT_AREA_SIZE to a reasonably larger value.

Since the WORKAREA_SIZE_POLICY in my system is set to MANUAL, I'll increase the sort area size for this session:

SQL

----------------------------------
-- Increase the sort area size 
----------------------------------
ALTER SESSION SET SORT_AREA_SIZE = 500000000;

Now running the same query should work fine.

Well what about the CTE version. If I run the following query with initial system settings:

SQL

----------------------------------
-- CTE: Generate 10'000'000 rows
----------------------------------
WITH InfiniteRows (RowNumber) AS (
   -- Anchor member definition
   SELECT RowNum AS RowNumber
   FROM Dual
   UNION ALL
   -- Recursive member definition
   SELECT a.RowNumber + 1    AS RowNumber
   FROM   InfiniteRows a
   WHERE  a.RowNumber < 10000000
)
-- Statement that executes the CTE
SELECT RowNumber
FROM   InfiniteRows;

The query returns the correct result without any error messages.

However, the execution time was huge compared to the CONNECT BY query.

Some performance indicators

As said, there is a big difference between CONNECT BY version and CTE version. With small amounts of data the difference is unnoticiable but with large amount of rows the time consumed by CTE can be significant. To get the idea, some performance indicators from my tests:

	CONNECT BY	CTE
Runtime memory	1'608	2'700
Disk reads	0	92
Direct writes	0	28'637
Buffer gets	0	103'201'301
User I/O wait time (µs)	0	1'734'483
CPU time (µs)	17'908'915	448'050'470
Elapsed (µs)	18'715'008	465'691'358

So as you see the main difference is how the query is actually executed. As said earlier the CTE is interpreted as an inline view or temporary table. This is clearly seen in the direct writes thus affecting the execution time. Since CONNECT BY is an in memory operation the execution statistics are obviously far better. On the other hand handling lots of rows may be difficult with CONNECT BY.

References

Some references that may be useful:

History

31^st August, 2014: Created.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)