Why normalize?
Normalization is the process of organizing columns and tables to eliminate redundancy (repeated data) and avoid anomalies:
- Insertion anomaly: you can't store one piece of data without inventing another. If a product's category lives in the products table, you can't register a new category until a product of that category exists.
- Update anomaly: if a piece of data is repeated across many rows, when you change it you have to update them all; if you miss one, they become inconsistent.
- Deletion anomaly: when deleting a row you lose information you didn't want to lose (deleting a customer's last order would also delete their data).
You advance through cumulative normal forms: to be in 3NF you must first satisfy 1NF and 2NF.
First normal form (1NF)
A table is in 1NF if:
- Each cell contains an atomic value (no lists nor multiple values).
- There are no repeating groups of columns (
phone1,phone2, ...).
| ❌ Not 1NF | ✅ 1NF |
|---|---|
phones = '600, 611' |
one row per phone in a separate table |
Second normal form (2NF)
A table is in 2NF if it is in 1NF and, in addition, no non-key attribute depends only on part of a composite primary key (there are no partial dependencies).
It is only relevant when the PK is composite. Example: in
order_line(order_id, product_id, product_name), the
product_name depends only on product_id, not on the full key →
it violates 2NF. The solution: move product_name to the products table.
Third normal form (3NF)
A table is in 3NF if it is in 2NF and no non-key attribute depends on another non-key attribute (there are no transitive dependencies).
Example that violates 3NF:
| order_id | customer_id | customer_city |
|---|
Here customer_city depends on customer_id (not on the PK order_id).
If Ana changes city you have to update all her orders. The solution is
to separate: the city lives in the customers table, and orders only stores
customer_id.
In short, 3NF = "every non-key attribute depends on the key, on the whole key and on nothing but the key". It's the reasonable goal for most application schemas.