Introduction
Sometimes you need to generate an exact amount of rows for an SQL statement. There are several ways of doing this like creating a table containing sufficient amount of rows, creating a function which returns the desired set and so on. This tip shows how you can use either CONNECT BY
or recursive CTE (Common Table Expression) to fullfil the requirement.
Alternative 1: CONNECT BY
CONNECT BY
clause is used to create hierarchical queries. Generating desired amount of rows can be considered as a hierarchical query but without actual connection between the parent and te child records (basically just an endless loop).
So to generate 10 rows, you can use a query like
SELECT Level
FROM Dual
CONNECT BY Level <= 10;
The Level
pseudocolumn returns the current depth for recursion so it is used to restrict the amount of rows returned. The result of the query above is:
LEVEL
------
1
2
3
4
5
6
7
8
9
10
Well, that was simple.
Alternative 2: CTE
You may want to use a common-table expression to do the recursion. For example, the CONNECT BY
is Oracle specific and if you need a statement structure more close to standard SQL then CTE would be an option.
The query for the 10 rows using CTE could look like this:
WITH InfiniteRows (RowNumber) AS (
SELECT RowNum AS RowNumber
FROM Dual
UNION ALL
SELECT a.RowNumber + 1 AS RowNumber
FROM InfiniteRows a
WHERE a.RowNumber < 10
)
SELECT RowNumber
FROM InfiniteRows;
Some explanation for the statement. The anchor statement produces one row having 1 as row number. The recursive part uses the CTE as the source table thus receiving rows from previous iteration as a result set. Now since this is a UNION
the number of rows is growing by one on each iteration.
The restricting WHERE
clause is placed in the recursive member definition in order to stop the recursion when the desired amount of rows has been generated.
So why the WHERE
clause is inside the CTE, why not on the outside where the CTE is queried? This is because CTE works as a inline view or a temporary table. Only after the set is created the result is passed to the executing statement. Moving the condition outside would cause an infinite loop. Lets try:
WITH InfiniteRows (RowNumber) AS (
SELECT RowNum AS RowNumber
FROM Dual
UNION ALL
SELECT a.RowNumber + 1 AS RowNumber
FROM InfiniteRows a
)
SELECT RowNumber
FROM InfiniteRows
WHERE RowNumber < 10;
Oracle notices the cycle so running the statement above would cause an error:
ORA-32044: cycle detected while executing recursive WITH query
Of course you could define a CYCLE
option in the statement but it wouldn't make sense to allow the database to execute the infinite loop.
Further investigations
Large amount of rows
So what happens if we query for example 10'000'000 rows using the CONNECT BY
.
SELECT Level
FROM Dual
CONNECT BY Level <= 10000000;
Depending on your system settings this may or may not execute succesfully. With my initial 11 XE installation an error is returned when the amount of rows exceeds 2'766'500. The error is:
ORA-30009: Not enough memory for CONNECT BY operation
The explanation for the error is
Quote: Oracle documentation
ORA-30009: Not enough memory for string operation
Cause: The memory size was not sufficient to process all the levels of the hierarchy specified by the query.
Action: In WORKAREA_SIZE_POLICY=AUTO mode, set PGA_AGGREGATE_TARGET to a reasonably larger value. Or, in WORKAREA_SIZE_POLICY=MANUAL mode, set SORT_AREA_SIZE to a reasonably larger value.
Since the WORKAREA_SIZE_POLICY
in my system is set to MANUAL
, I'll increase the sort area size for this session:
ALTER SESSION SET SORT_AREA_SIZE = 500000000;
Now running the same query should work fine.
Well what about the CTE version. If I run the following query with initial system settings:
WITH InfiniteRows (RowNumber) AS (
SELECT RowNum AS RowNumber
FROM Dual
UNION ALL
SELECT a.RowNumber + 1 AS RowNumber
FROM InfiniteRows a
WHERE a.RowNumber < 10000000
)
SELECT RowNumber
FROM InfiniteRows;
The query returns the correct result without any error messages.
However, the execution time was huge compared to the CONNECT BY
query.
Some performance indicators
As said, there is a big difference between CONNECT BY
version and CTE version. With small amounts of data the difference is unnoticiable but with large amount of rows the time consumed by CTE can be significant. To get the idea, some performance indicators from my tests:
| CONNECT BY | CTE |
Runtime memory | 1'608 | 2'700 |
Disk reads | 0 | 92 |
Direct writes | 0 | 28'637 |
Buffer gets | 0 | 103'201'301 |
User I/O wait time (µs) | 0 | 1'734'483 |
CPU time (µs) | 17'908'915 | 448'050'470 |
Elapsed (µs) | 18'715'008 | 465'691'358 |
So as you see the main difference is how the query is actually executed. As said earlier the CTE is interpreted as an inline view or temporary table. This is clearly seen in the direct writes thus affecting the execution time. Since CONNECT BY
is an in memory operation the execution statistics are obviously far better. On the other hand handling lots of rows may be difficult with CONNECT BY
.
References
Some references that may be useful:
History
- 31st August, 2014: Created.