Introduction
Before we start discussing what a subquery is, we need to have a sample database schema.
Diagram was created with Vertabelo
What is a subquery?
A subquery is a SELECT
statement with another SQL statement, like in the example below:
select *
from product
where id in
(select product_id
from provider_offer
where provider_id = 156);
Subqueries are further classified as either a correlated subquery or a nested subquery. They are usually constructed in such a way to return:
- a table:
select max(average.average_price)
from (select product_category, avg(price) as average_price
from product
group by product_category) average;
- or a value :
select id
from purchase
where value > (select avg(value)
from purchase )
Nested Subqueries
Nested subqueries are subqueries that don't rely on an outer query. In other words, both queries in a nested subquery may be run as separate queries.
This type of subquery could be used almost everywhere, but it usually takes one of these formats:
SELECT
FROM
WHERE [NOT] IN (subquery)
Select *
from client
where city in (select city
from provider);
The example subquery returns all clients that are from the same city as the product providers.
The IN
operator checks if the value is within the table and retrieves the matching rows.
Correlated Subqueries
Subqueries are correlated when the inner and outer queries are interdependent, that is, when the outer query is a query that contains a subquery and the subquery itself is an inner query. Users that know programming concepts may compare it to a nested loop structure.
Let's start with a simple example. The inner query calculates the average value and returns it. In the outer query’s where
clause, we filter only those purchases which have a value greater than the inner query’s returned value.
Subquery Correlated in Where Clause
select id
from purchase p1
where date > '2013-07-15'
and value > (select avg(value)
from purchase p2
where p1.date = p2.date)
The query returns purchases after 15/07/2014
with a total price greater than the average value from the same day.
The equivalent example, but with joining tables.
Select p1.id
from purchase p1, purchase p2
where p1.date = p2.date
and p1.date > '2013-07-15'
group by p1.id
having p1.value > avg(p2.value);
This example can also be written as a select
statement with a subquery correlated in a from
clause.
The subquery returns the table that contains the average value for each purchase for each day. We join this result with the Purchase
table on column 'date
' to check the condition date > '15/07/2014'
.
select id
from purchase, (select date, avg(value) as average_value
from purchase
where date > '2013-07-15'
group by date) average
where purchase.date = average.date
and purchase.date > '2013-07-15'
and purchase.value > average.average_value;
Usually, this kind of subquery should be avoided because indexes can't be used on a temporary table in memory.
When a subquery is used, the query optimizer performs additional steps before the results from the subquery are used. If a query that contains a subquery can be written using a join
, it should be done this way. Join
s usually allow the query optimizer to retrieve the data in a more efficient way.
You can find an extended article as well as more examples here.