Why does NoSQL exist?
Relational databases (SQL) are excellent for structured data
with clear relationships and strong ACID guarantees. But when an application grows
to a huge scale (millions of users, very heterogeneous data or data with
a changing structure), the rigid model of tables and JOINs can become
a bottleneck.
NoSQL ("not only SQL") groups databases that give up part of the
relational model (fixed schema, JOIN, strict ACID) in exchange for
flexibility and horizontal scalability. It's not "better" than SQL: it's a
different tool for different problems.
The four NoSQL families
1. Documents (MongoDB, CouchDB)
They store JSON-like documents with a flexible structure: each document can have different fields. Ideal when the data is nested or the schema evolves.
// A document in MongoDB
{
  _id: "u123",
  name: "Ada",
  addresses: [ { city: "Madrid" }, { city: "London" } ]
}
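To make the flexible-structure point concrete, here is a minimal sketch in plain JavaScript, with no database driver involved. The `users` array stands in for a collection, the second document is invented for illustration, and `findByCity` is a hypothetical helper rather than a MongoDB API.

```javascript
// A plain array stands in for a document collection (assumption: no real database).
const users = [
  { _id: "u123", name: "Ada", addresses: [{ city: "Madrid" }, { city: "London" }] },
  // This document has no addresses but adds a field the first one lacks.
  { _id: "u124", name: "Linus", nickname: "lt" },
];

// Hypothetical helper: return the documents with at least one address in `city`.
function findByCity(collection, city) {
  return collection.filter((doc) =>
    (doc.addresses ?? []).some((address) => address.city === city)
  );
}

console.log(findByCity(users, "London").map((doc) => doc.name)); // [ 'Ada' ]
```

Both documents live in the same collection even though their fields differ, so the query code has to tolerate a missing `addresses` field.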
2. Key-value (Redis, DynamoDB)
The simplest model: a giant key → value dictionary, with very fast access
by key. Perfect for caches, sessions and counters.
SET session:ab12 "{user: 'Ada'}"
GET session:ab12
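The same idea can be sketched in JavaScript with a `Map` acting as the dictionary. This is an in-memory stand-in, not a Redis client: the `KeyValueStore` class and its methods are hypothetical names, with `incr` loosely modeled on the behavior of Redis's `INCR` command (a missing key counts as 0).

```javascript
// In-memory stand-in for a key-value store (assumption: a Map, not a real Redis client).
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    // Return null for a missing key.
    return this.data.has(key) ? this.data.get(key) : null;
  }
  // Counter: treat a missing key as 0, add 1 and return the new value.
  incr(key) {
    const next = Number(this.get(key) ?? 0) + 1;
    this.data.set(key, String(next));
    return next;
  }
}

const store = new KeyValueStore();
store.set("session:ab12", JSON.stringify({ user: "Ada" }));
console.log(JSON.parse(store.get("session:ab12")).user); // Ada
console.log(store.incr("visits:home")); // 1
console.log(store.incr("visits:home")); // 2
```

Every operation addresses exactly one key, which is what keeps this model simple and fast. The store never looks inside the value.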
3. Graphs (Neo4j)
They model nodes and relationships as first-class citizens. They shine when the connections are what matters: social networks ("friends of friends"), recommendations, fraud detection.
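The "friends of friends" query can be sketched as a two-hop traversal over an adjacency list. This is plain JavaScript rather than a graph database query language. The names in `friendships` are invented sample data and `friendsOfFriends` is a hypothetical helper.

```javascript
// Adjacency list: each person maps to the set of their direct friends
// (assumption: friendships are symmetric and the data is hypothetical).
const friendships = new Map([
  ["Ada", new Set(["Grace", "Alan"])],
  ["Grace", new Set(["Ada", "Edsger"])],
  ["Alan", new Set(["Ada", "Edsger", "Barbara"])],
  ["Edsger", new Set(["Grace", "Alan"])],
  ["Barbara", new Set(["Alan"])],
]);

// People two hops away: friends of my friends who are not me
// and not already my direct friends.
function friendsOfFriends(graph, person) {
  const direct = graph.get(person) ?? new Set();
  const result = new Set();
  for (const friend of direct) {
    for (const candidate of graph.get(friend) ?? []) {
      if (candidate !== person && !direct.has(candidate)) {
        result.add(candidate);
      }
    }
  }
  return [...result].sort();
}

console.log(friendsOfFriends(friendships, "Ada")); // [ 'Barbara', 'Edsger' ]
```

The traversal only follows edges starting from one node, which is the access pattern graph databases are built around. In a relational schema the same question would typically need a self-join on a friendships table.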
4. Wide-column / column family (Cassandra, HBase)
They store each row under a key and group its columns into column families; different rows can hold different columns. Optimized for massive writes and for reading enormous datasets by key (big data).
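In Cassandra and HBase, the column-family model is usually described as rows addressed by a key, whose columns are grouped into families and may differ from row to row. A rough in-memory sketch of that layout follows. The sensor data, `putCell` and `getFamily` are all hypothetical, and real systems add sorting, timestamps and on-disk storage that are not modeled here.

```javascript
// Hypothetical wide-column table: row key -> column family -> { column: value }.
const table = new Map();

// Write one cell; rows and families are created on first use.
function putCell(rowKey, family, column, value) {
  if (!table.has(rowKey)) table.set(rowKey, {});
  const row = table.get(rowKey);
  row[family] ??= {};
  row[family][column] = value;
}

// Read a single column family without touching the row's other families.
function getFamily(rowKey, family) {
  return table.get(rowKey)?.[family] ?? {};
}

putCell("sensor-1", "readings", "2024-01-01T00:00", 21.5);
putCell("sensor-1", "readings", "2024-01-01T00:05", 21.7);
putCell("sensor-1", "meta", "location", "roof");
putCell("sensor-2", "readings", "2024-01-01T00:00", 19.0); // no "meta" family at all

console.log(Object.keys(getFamily("sensor-1", "readings")).length); // 2
console.log(getFamily("sensor-2", "meta")); // {}
```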
SQL vs NoSQL: when to use each
| | SQL (relational) | NoSQL |
|---|---|---|
| Schema | fixed, defined in advance | flexible or schemaless |
| Relationships | native JOIN | denormalization / nested |
| Guarantees | strong ACID | often eventual consistency |
| Scaling | mainly vertical (a more powerful machine) | horizontal (more machines) |
| When | related data, critical integrity (banking, ERP) | massive scale, flexible data, high availability |
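The "Relationships" row is the easiest one to see in code. The sketch below contrasts a JOIN-style lookup across two arrays with a nested (denormalized) document. The arrays stand in for tables, `ordersWithUserName` is a hypothetical helper, and the data is illustrative.

```javascript
// Relational style: two "tables" linked by a foreign key (hypothetical data).
const usersTable = [{ id: "u42", name: "Grace" }];
const ordersTable = [
  { id: 1, userId: "u42", total: 120 },
  { id: 8, userId: "u42", total: 150 },
];

// Equivalent of a JOIN: look up the matching user row at read time.
function ordersWithUserName(users, orders) {
  return orders.map((order) => ({
    orderId: order.id,
    total: order.total,
    name: users.find((user) => user.id === order.userId)?.name ?? null,
  }));
}

// Document style: the orders are nested (denormalized) inside the user.
const userDocument = {
  _id: "u42",
  name: "Grace",
  orders: [
    { id: 1, total: 120 },
    { id: 8, total: 150 },
  ],
};

const joinedTotal = ordersWithUserName(usersTable, ordersTable)
  .reduce((sum, row) => sum + row.total, 0);
const nestedTotal = userDocument.orders.reduce((sum, o) => sum + o.total, 0);
console.log(joinedTotal, nestedTotal); // 270 270
```

Both shapes answer the same question. The relational one pays for the lookup on every read, while the nested one reads a single document but has to be kept in sync if the same order data is copied elsewhere.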
The CAP theorem
The CAP theorem (Brewer) describes a fundamental limit of distributed systems (data spread across several nodes). In the face of a network failure, you can only guarantee two of these three properties:
- C — Consistency. Every read returns the most recent data; all nodes see the same thing.
- A — Availability. Every request receives a response (without errors), even if it's not the newest data.
- P — Partition tolerance. The system keeps working even if the network between nodes is cut.
The key: in a real distributed system, network partitions happen, so P is mandatory. The real choice is between CP and AP:
- CP (consistency + partition tolerance): in the face of a cut, it prefers to reject requests rather than return old data. E.g.: banks.
- AP (availability + partition tolerance): in the face of a cut, it responds anyway even if the data may be somewhat outdated (eventual consistency). E.g.: social media feeds, shopping carts.
CAP doesn't say "freely choose 2 of 3": in practice P is imposed, and you decide between consistency and availability when there is a partition.
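The CP versus AP choice can be illustrated with a toy simulation of one replica that gets cut off from the primary. This is a deliberately simplified model, not how any particular database implements it. The `Replica` class, its `mode` flag and the `simulate` function are all hypothetical.

```javascript
// A replica that may be cut off from the primary (hypothetical model).
class Replica {
  constructor(mode) {
    this.mode = mode; // "CP" or "AP"
    this.value = null;
    this.partitioned = false;
  }
  // Called by the primary; the update is lost while the network is cut.
  replicate(value) {
    if (!this.partitioned) this.value = value;
  }
  read() {
    if (this.partitioned && this.mode === "CP") {
      // CP: refuse to answer rather than risk returning stale data.
      throw new Error("unavailable: cannot confirm latest value");
    }
    // AP (or no partition): answer with whatever this node has.
    return this.value;
  }
}

function simulate(mode) {
  const replica = new Replica(mode);
  replica.replicate("balance=100");
  replica.partitioned = true;      // the network is cut
  replica.replicate("balance=50"); // this write never reaches the replica
  try {
    return replica.read();
  } catch (error) {
    return error.message;
  }
}

console.log(simulate("CP")); // unavailable: cannot confirm latest value
console.log(simulate("AP")); // balance=100
```

During the partition the CP replica gives up availability, while the AP replica stays available and serves a value that is no longer the latest one.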
Scaling: replication and sharding
To handle more load and data there are two complementary strategies:
Replication
Keep copies of the same data on several servers.
- Improves availability (if a node goes down, another responds) and reads (you spread queries across replicas).
- Common primary-replica pattern: writes go to the primary and are propagated to the replicas (which serve reads).
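The primary-replica pattern described above can be sketched in memory as follows. `ReplicatedStore` is a hypothetical class, and it copies writes to the replicas synchronously for simplicity. Real systems often replicate asynchronously, which is where stale reads come from.

```javascript
// Hypothetical primary-replica cluster held in memory.
class ReplicatedStore {
  constructor(replicaCount) {
    this.primary = new Map();
    this.replicas = Array.from({ length: replicaCount }, () => new Map());
    this.nextReplica = 0;
  }
  // Writes go to the primary, then are copied to every replica.
  // (Assumption: synchronous propagation, unlike many real deployments.)
  write(key, value) {
    this.primary.set(key, value);
    for (const replica of this.replicas) replica.set(key, value);
  }
  // Reads rotate across replicas (round-robin) to spread the load.
  read(key) {
    const index = this.nextReplica;
    this.nextReplica = (this.nextReplica + 1) % this.replicas.length;
    return { servedBy: index, value: this.replicas[index].get(key) };
  }
}

const cluster = new ReplicatedStore(2);
cluster.write("user:u42", "Grace");
console.log(cluster.read("user:u42")); // { servedBy: 0, value: 'Grace' }
console.log(cluster.read("user:u42")); // { servedBy: 1, value: 'Grace' }
```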
Sharding (horizontal partitioning)
Split the data across several servers: each shard stores a subset of the rows (e.g. users A–M in shard 1, N–Z in shard 2), according to a partition key (shard key).
- Improves writes and capacity: no single machine stores everything.
- More complex: queries that cross shards and keeping the balance between them are hard. Choosing the shard key well is critical.
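A minimal sketch of range-based routing, using the A–M / N–Z split from the example above. The `shards` array, `shardFor`, `insertUser` and `findByCountry` are hypothetical names, and the user data is invented for illustration.

```javascript
// Two shards split by the first letter of the user name (the shard key).
const shards = [
  { name: "shard 1", from: "A", to: "M", rows: [] },
  { name: "shard 2", from: "N", to: "Z", rows: [] },
];

// Route a shard key to the shard whose range contains its first letter.
function shardFor(userName) {
  const initial = userName[0].toUpperCase();
  const shard = shards.find((s) => initial >= s.from && initial <= s.to);
  if (!shard) throw new Error(`no shard for key "${userName}"`);
  return shard;
}

function insertUser(user) {
  shardFor(user.name).rows.push(user);
}

// A query that is not filtered by the shard key must visit every shard.
function findByCountry(country) {
  return shards.flatMap((s) => s.rows.filter((u) => u.country === country));
}

insertUser({ name: "Ada", country: "UK" });
insertUser({ name: "Grace", country: "US" });
insertUser({ name: "Tim", country: "UK" });

console.log(shards.map((s) => s.rows.length)); // [ 2, 1 ]
console.log(findByCountry("UK").map((u) => u.name)); // [ 'Ada', 'Tim' ]
```

A lookup by name touches one shard, while `findByCountry` has to fan out to all of them. Range keys like this one can also leave shards unevenly loaded, which is one reason hashing the shard key is a common alternative.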
Summary: replication copies the same data (high availability and reads); sharding splits different data (scales writes and volume). At large scale they are combined: each shard is, in addition, replicated.
Examples
JSON document (MongoDB style)
// In document NoSQL there are no fixed tables:
const user = {
  _id: "u42",
  name: "Grace",
  orders: [
    { id: 1, total: 120 },
    { id: 8, total: 150 },
  ],
};
console.log(user.orders.length); // 2