Ah, the WITH
statement. A (mostly) loved holdover from such languages as Python and VB(A) that still finds some questionable use today in JavaScript. It was easy to use and could save quite a bit of coding time. I do, at times, wish it was available in Java. But this blog is not about that WITH
, nor am I pining for Java to take on some old VB traits (GoTo
, anyone?).
I recently learned that WITH
’s twin, if only in name, lives on as a SQL clause. Even though most of my time is spent doing application development, I’m no stranger to database development. I do enjoy diving into a DBMS and whipping up some nice SQL code, stored procedures and the like. But WITH
was a clause I’d done without because I had no idea it existed in the SQL world.
Recently, a coworker asked me to look at a very frustrating DB2 statement that wasn’t compiling or executing as it should. It seemed that there were problems buried somewhere deep in the layers of nested statements required because temporary tables were not going to be something we could use here. After admitting defeat, we called in an architect who suggested the WITH
clause. What? Neither of us had heard of a WITH
clause for SQL. With my curiosity peaked, I decided to investigate this WITH
clause further to see what I had been missing out on.
It turns out that WITH
can be pretty useful in the right situations, simplifying code and providing a more efficient way to access subqueries and temporary table structures. I found that it was adopted in the SQL:1999 standard as a way to write recursive queries and create named subquery blocks.
Some of the major DBMS players jumped on this right away, but with slightly different implementations. Oracle chose not to implement the recursive functionality of the WITH
clause because they already had this functionality using the PL/SQL CONNECT BY
clause:
SELECT product_id, product_name, parent_id
FROM tbl_products
START WITH product_id=50
CONNECT BY parent_id = PRIOR product_id;
Using the WITH
clause in DB2 or SQL Server to achieve the same result looks like this:
WITH recursiveProducts
(product_id, product_name, category_id) AS
(SELECT parent.product_id,
parent.product_name,
parent.parent_id
FROM tbl_products parent
WHERE parent.product_id=50
UNION ALL
SELECT child.product_id,
child.product_name,
child.parent_id
FROM recursiveProducts parent, tbl_products child
WHERE child.parent_id = parent.product_id)
SELECT product_id, product_name, product_id
FROM recursiveProducts;
Obviously not as clean or simple as Oracle’s implementation, but it works.
PostgreSQL requires the use of an additional keyword RECURSIVE
to make this statement work:
WITH RECURSIVE recursiveProducts
(product_id, product_name, category_id) AS
(SELECT parent.product_id,
parent.product_name,
parent.parent_id
FROM tbl_products parent
WHERE parent.product_id=50
UNION ALL
SELECT child.product_id,
child.product_name,
child.parent_id
FROM recursiveProducts parent, tbl_products child
WHERE child.parent_id = parent.product_id)
SELECT product_id, product_name, product_id
FROM recursiveProducts;
Sadly, some other “major” DBMSs (I’m looking at you MySQL) do not support the WITH
function or recursive queries at all, so at least we’ve got something here.
While knowing how to write a recursive SQL statement is a great thing, it's not something I’ve really had a need for very often. The other function of the WITH
clause, however, is something I could see myself using somewhat often now that I’m familiar with it. The ability to use WITH
to create named subquery blocks, or even pseudo temp tables, is supported by most (but not all – still looking at you, MySQL), such as Oracle, PostgreSQL, DB2 and SQL Server. As an added bonus, this also becomes a more efficient way to reuse the same subquery.
The optimizer in Oracle recognizes that the statement in the WITH
block only needs to be executed once, as opposed to each time it's used in the statement. For example, this statement, not using WITH
, will execute the same subquery twice:
SELECT product_id, (SELECT AVG(current_inventory) FROM product)
FROM product
WHERE current_inventory > (SELECT AVG(current_inventory)
FROM product);
Using WITH
, we get the same results, but Oracle knows to only execute the subquery once.
WITH product Avg AS (SELECT AVG(current_inventory) avg_inventory
FROM product)
SELECT product_id, (SELECT avg_inventory FROM productAVG)
FROM product
WHERE current_inventory > (SELECTavg_inventory
FROM productAVG);
DB2 handles things a little differently, using the WITH
clause as a way to create a pseudo temporary table, saving the overhead of creating and dropping a regular view that would only be used one time. Additionally, during statement preparation, DB2 does not need to access the catalog for the view, saving additional overhead.
Without WITH
, we get a pretty inefficient statement that creates a true temporary table. Not only do we need to define the table and its columns, it still needs to be populated with data. This can be an expensive set of operations depending on the size of your temporary table, the amount of data being stored and how many temporary tables you’ll end up using.
declare global temporary table product Avg
(avg_inventory numeric(8,2))
on commit preserve rows not logged ;
insert into SESSION.productAVG (avg_inventory)
SELECT AVG(current_inventory) avg_inventory
FROM product
SELECT prd.product_id, avg.avg_inventory
FROM product prd, SESSION.productAVG avg
WHERE prd.current_inventory > avg.avg_inventory
By using the WITH
clause instead, we’re not accessing the catalog nor are we having to use the SESSION
schema. Plus we don’t have to remember to drop the table when we’re done. This table structure falls out of scope and is gone forever once the statement is executed. Because DB2 treats this as a standard temporary table, you’re able to use it in exactly the same way.
WITH productAVG as (SELECT AVG(current_inventory) avg_inventory
FROM product)
SELECT prd.product_id, pAvg.avg_inventory
FROM product prd,.productAvg pAvg
WHERE prd.current_inventory > pAvg.avg_inventory
I’ve just outlined the basic uses here. There are many ways in which these can be used as building blocks for simplifying potentially complex SQL statements, all while keeping code as efficient as possible. I am excited to find excuses to put this clause to use now that I know it exists. The nightmares of SQL horrors past may never have occurred had I learned of this earlier.
A word of caution, though: I do encourage anyone looking to make use of the WITH
clause to take the time to research if it is supported (seriously MySQL, I do actually like you) and how it is implemented for the DBMS you are using. In my limited research, I have found that each platform handles WITH
a bit differently, from limiting functionality to differences in how the underlying structures are defined and created by WITH
. Good luck!
– Scott Peters, asktheteam@keyholesoftware.com