EXISTS and correlated subqueries — Subqueries and CTEs

Correlated subqueries

So far the subquery ran a single time, independently. A correlated subquery is different: it references a column from the outer query, so it is evaluated once for each row of the outside. It is like a loop: for each category, it runs the inner one using that category.

-- Categories whose most expensive product is over 100
SELECT name
FROM categories c
WHERE 100 < (
  SELECT MAX(p.price)
  FROM products p
  WHERE p.category_id = c.id   -- <- reference to the outer row "c"
)
ORDER BY name;

The key is the alias: c is defined outside and used inside. That is what "correlates" both queries.

EXISTS

EXISTS takes a subquery and returns true if that subquery produces at least one row (it doesn't matter which row: only whether it exists or not). It is used almost always correlated, to ask "is there any related row?":

-- Categories that HAVE at least one product
SELECT name
FROM categories c
WHERE EXISTS (
  SELECT 1 FROM products p WHERE p.category_id = c.id
)
ORDER BY name;

By convention you write SELECT 1 inside EXISTS: since only existence matters, the specific columns are irrelevant.

Its negation, NOT EXISTS, finds the opposite: categories without products.

SELECT name
FROM categories c
WHERE NOT EXISTS (
  SELECT 1 FROM products p WHERE p.category_id = c.id
)
ORDER BY name;

EXISTS vs IN: they often give the same result, but EXISTS is usually clearer and more robust when there is correlation or possible NULL, and it can stop as soon as it finds the first match.

Examples

EXISTS: categories with available stock

SELECT name
FROM categories c
WHERE EXISTS (
  SELECT 1 FROM products p
  WHERE p.category_id = c.id AND p.stock > 0
)
ORDER BY name;

NOT EXISTS: categories with no product in stock

SELECT name
FROM categories c
WHERE NOT EXISTS (
  SELECT 1 FROM products p
  WHERE p.category_id = c.id AND p.stock > 0
)
ORDER BY name;