DevPath · Learn to code ESPTEN

Performance, transactions and NoSQL

Indexes: speeding up searches

The problem: the full table scan

When you run a query with a filter, by default the database does a full table scan: it goes through all the rows one by one to check which ones meet the condition.

SELECT * FROM orders WHERE customer_id = 11;

With 10 rows you won't notice. With 10 million rows, scanning them all on every query is unfeasible. The cost grows linearly: O(n).

The solution: an index

An index is an auxiliary, sorted data structure that the database maintains separately from the table. It works like the alphabetical index of a book: instead of reading the whole book looking for a word, you go to the (sorted) index and jump straight to the page.

Internally, most engines (SQLite, PostgreSQL, MySQL) use a B+ tree (B-tree): a balanced tree where each level divides the search space. Looking up a value costs O(log n): with 10 million rows, instead of 10,000,000 comparisons, ~24 are enough.

Rows (n) Scan O(n) Index O(log n)
1,000 1,000 ~10
1,000,000 1,000,000 ~20
1,000,000,000 1,000,000,000 ~30

Create an index

CREATE INDEX idx_orders_customer ON orders (customer_id);

From that moment on, queries that filter (WHERE), sort (ORDER BY) or group by customer_id can use the index. The primary key (PRIMARY KEY) is indexed automatically; that's why searching by id is already fast without doing anything.

An index can span several columns (composite index). The order matters: idx (country, status) works to filter by country or by country AND status, but not to filter only by status.

CREATE INDEX idx_orders_country_status ON orders (country, status);

When should you create an index?

Create indexes on the columns that appear frequently in:

The cost: they aren't free

Indexes aren't free magic. They have a price:

Rule of thumb: index what you query a lot, not everything. An unused index is pure cost with no benefit.

See what the planner does: EXPLAIN QUERY PLAN

To find out whether a query uses an index or scans the table, prepend EXPLAIN QUERY PLAN (SQLite syntax; in PostgreSQL/MySQL it's EXPLAIN):

EXPLAIN QUERY PLAN
SELECT * FROM orders WHERE customer_id = 11;

It's the #1 tool for diagnosing slow queries: first measure, don't guess.

Examples

Create an index and query using it

CREATE INDEX idx_orders_customer ON orders (customer_id);

SELECT id, total
FROM orders
WHERE customer_id = 11
ORDER BY id;

Diagnose the execution plan

EXPLAIN QUERY PLAN
SELECT * FROM orders WHERE customer_id = 11;
Put this into practice

DevPath is a hands-on course: you read the theory here; in the app you put it into practice with exercises that really run, offline.

Start free in the app →
Transactions and ACID →