Correlated subqueries
So far the subquery ran a single time, independently. A correlated subquery is different: it references a column from the outer query, so it is evaluated once for each row of the outside. It is like a loop: for each category, it runs the inner one using that category.
-- Categories whose most expensive product is over 100
SELECT name
FROM categories c
WHERE 100 < (
SELECT MAX(p.price)
FROM products p
WHERE p.category_id = c.id -- <- reference to the outer row "c"
)
ORDER BY name;
The key is the alias: c is defined outside and used inside. That is what
"correlates" both queries.
EXISTS
EXISTS takes a subquery and returns true if that subquery
produces at least one row (it doesn't matter which row: only whether it exists or not). It is used almost
always correlated, to ask "is there any related row?":
-- Categories that HAVE at least one product
SELECT name
FROM categories c
WHERE EXISTS (
SELECT 1 FROM products p WHERE p.category_id = c.id
)
ORDER BY name;
By convention you write SELECT 1 inside EXISTS: since only
existence matters, the specific columns are irrelevant.
Its negation, NOT EXISTS, finds the opposite: categories without products.
SELECT name
FROM categories c
WHERE NOT EXISTS (
SELECT 1 FROM products p WHERE p.category_id = c.id
)
ORDER BY name;
EXISTS vs IN: they often give the same result, but
EXISTSis usually clearer and more robust when there is correlation or possibleNULL, and it can stop as soon as it finds the first match.
Examples
EXISTS: categories with available stock
SELECT name
FROM categories c
WHERE EXISTS (
SELECT 1 FROM products p
WHERE p.category_id = c.id AND p.stock > 0
)
ORDER BY name;
NOT EXISTS: categories with no product in stock
SELECT name
FROM categories c
WHERE NOT EXISTS (
SELECT 1 FROM products p
WHERE p.category_id = c.id AND p.stock > 0
)
ORDER BY name;