NoSQL, the CAP theorem and scaling — Performance, transactions and NoSQL

Why does NoSQL exist?

Relational databases (SQL) are excellent for structured data with clear relationships and strong ACID guarantees. But when an application grows to a huge scale (millions of users, very heterogeneous data or data with a changing structure), the rigid model of tables and JOINs can become a bottleneck.

NoSQL ("not only SQL") groups databases that give up part of the relational model (fixed schema, JOIN, strict ACID) in exchange for flexibility and horizontal scalability. It's not "better" than SQL: it's a different tool for different problems.

The four NoSQL families

1. Documents (MongoDB, CouchDB)

They store JSON-like documents with a flexible structure: each document can have different fields. Ideal when the data is nested or the schema evolves.

// A document in MongoDB
{
  _id: "u123",
  name: "Ada",
  addresses: [ { city: "Madrid" }, { city: "London" } ]
}
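
To make the flexible structure concrete, here is a minimal sketch in plain JavaScript with no database driver. It assumes a hypothetical in-memory `users` array standing in for a collection; the second document and the `usersInCity` helper are illustrative, not part of any MongoDB API.

```javascript
// Hypothetical in-memory "collection": documents need not share the same fields.
const users = [
  { _id: "u123", name: "Ada", addresses: [{ city: "Madrid" }, { city: "London" }] },
  { _id: "u124", name: "Linus", email: "linus@example.com" }, // no addresses field
];

// Find users with at least one address in a given city, tolerating missing fields.
function usersInCity(collection, city) {
  return collection.filter((doc) =>
    (doc.addresses ?? []).some((address) => address.city === city)
  );
}

const inLondon = usersInCity(users, "London").map((doc) => doc.name);
console.log(inLondon); // [ 'Ada' ]
```

Because no schema forces every document to have `addresses`, the reading code has to handle its absence, which is the usual trade-off of a flexible structure.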

2. Key-value (Redis, DynamoDB)

The simplest model: a giant key → value dictionary. Lookups by key are very fast, but you generally cannot query by the contents of the value. Perfect for caches, sessions and counters.

SET session:ab12 '{"user":"Ada"}'
GET session:ab12
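
The same idea can be sketched in JavaScript as a small in-memory store with expiry, which is how session caches are typically used. This is an illustration only, not Redis: the `KeyValueStore` class, its `ttlMs` parameter and the injectable clock are assumptions made for the example.

```javascript
// Minimal in-memory key-value store with optional expiry (illustrative, not Redis).
class KeyValueStore {
  constructor(now = () => Date.now()) {
    this.entries = new Map(); // key -> { value, expiresAt }
    this.now = now;           // injectable clock, an assumption to ease testing
  }

  set(key, value, ttlMs = Infinity) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry === undefined) return null;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries on read
      return null;
    }
    return entry.value;
  }
}

const sessions = new KeyValueStore();
sessions.set("session:ab12", JSON.stringify({ user: "Ada" }), 30 * 60 * 1000);
console.log(sessions.get("session:ab12")); // {"user":"Ada"}
console.log(sessions.get("session:zz99")); // null
```

The value is an opaque string to the store: it can return it by key, but it cannot search inside it.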

3. Graphs (Neo4j)

They model nodes and relationships as first-class citizens. They shine when what matters is the connections: social networks ("friends of friends"), recommendations, fraud detection.
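
As a rough illustration of the "friends of friends" case, the sketch below walks a tiny graph held in memory as an adjacency list. In a graph database such as Neo4j this traversal would be written in its query language (Cypher) instead; the people and the `friendsOfFriends` helper here are hypothetical.

```javascript
// Hypothetical social graph as an adjacency list: person -> set of direct friends.
const friends = new Map([
  ["ada", new Set(["grace", "linus"])],
  ["grace", new Set(["ada", "alan"])],
  ["linus", new Set(["ada", "alan", "margaret"])],
  ["alan", new Set(["grace", "linus"])],
  ["margaret", new Set(["linus"])],
]);

// People exactly two hops away: friends of friends who are not already friends.
function friendsOfFriends(graph, person) {
  const direct = graph.get(person) ?? new Set();
  const result = new Set();
  for (const friend of direct) {
    for (const candidate of graph.get(friend) ?? []) {
      if (candidate !== person && !direct.has(candidate)) result.add(candidate);
    }
  }
  return [...result].sort();
}

console.log(friendsOfFriends(friends, "ada")); // [ 'alan', 'margaret' ]
```

In a relational model the same question needs a self-join on a friendship table for every extra hop, which is why deep traversals are the typical argument for a graph store.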

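4. Wide-column / column family (Cassandra, HBase)

They organize data into rows addressed by a key, where each row holds a flexible, possibly very large set of columns grouped into column families. Optimized for massive write throughput and key-based reads over enormous datasets (big data). They should not be confused with column-oriented analytical databases, which physically store each column separately to speed up aggregations.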
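
One way to picture the column-family model used by stores such as Cassandra or HBase is as a nested map: row key, then column family, then individual columns. The sketch below is a simplification under that assumption; the `put` and `getFamily` helpers and the sensor data are hypothetical and do not mirror any real client API.

```javascript
// Illustrative wide-column layout: row key -> column family -> sparse columns.
const table = new Map();

function put(rowKey, family, column, value) {
  if (!table.has(rowKey)) table.set(rowKey, new Map());
  const row = table.get(rowKey);
  if (!row.has(family)) row.set(family, new Map());
  row.get(family).set(column, value);
}

// Read every column of one family for one row; missing data yields an empty object.
function getFamily(rowKey, family) {
  const columns = table.get(rowKey)?.get(family);
  return columns ? Object.fromEntries(columns) : {};
}

put("sensor-1", "readings", "2024-01-01T10:00", 21.5);
put("sensor-1", "readings", "2024-01-01T10:05", 21.7);
put("sensor-1", "meta", "location", "Madrid");
put("sensor-2", "meta", "location", "London"); // no readings yet: rows are sparse

console.log(getFamily("sensor-1", "readings"));
console.log(getFamily("sensor-2", "readings")); // {}
```

Each write only appends a column to a row located by its key, which hints at why this layout suits write-heavy workloads such as time series.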

SQL vs NoSQL: when to use each

Aspect	SQL (relational)	NoSQL
Schema	fixed, defined in advance	flexible or schemaless
Relationships	native `JOIN`	denormalization / nested data
Guarantees	strong ACID	often eventual consistency
Scaling	mainly vertical (a more powerful machine)	horizontal (more machines)
When to use	related data, critical integrity (banking, ERP)	massive scale, flexible data, high availability
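
The "Relationships" row is perhaps the easiest to see in code. The sketch below contrasts the two shapes using plain JavaScript arrays and objects; the data and the `joinUserOrders` helper are hypothetical and only mimic what a database would do.

```javascript
// Relational shape: two "tables" linked by a foreign key (hypothetical data).
const users = [{ id: 1, name: "Ada" }, { id: 2, name: "Grace" }];
const orders = [
  { id: 10, userId: 1, total: 120 },
  { id: 11, userId: 1, total: 30 },
  { id: 12, userId: 2, total: 150 },
];

// What a JOIN does conceptually: match each user to the orders that reference it.
function joinUserOrders(userId) {
  const user = users.find((u) => u.id === userId);
  return { ...user, orders: orders.filter((o) => o.userId === userId) };
}

// Document shape: the same information stored already nested (denormalized).
const userDocument = {
  _id: 1,
  name: "Ada",
  orders: [{ id: 10, total: 120 }, { id: 11, total: 30 }],
};

console.log(joinUserOrders(1).orders.length); // 2
console.log(userDocument.orders.length);      // 2, with no join at read time
```

The nested form reads in one step, but data shared between documents has to be duplicated and kept in sync by the application.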

The CAP theorem

The CAP theorem (Brewer) describes a fundamental limit of distributed systems (data spread across several nodes). In the face of a network failure, you can only guarantee two of these three properties:

C — Consistency. Every read returns the most recent data; all nodes see the same thing.
A — Availability. Every request receives a response (without errors), even if it's not the newest data.
P — Partition tolerance. The system keeps working even if the network between nodes is cut.

The key: in a real distributed system, network partitions happen, so P is mandatory. The real choice is between CP and AP:

CP (consistency + partition tolerance): in the face of a cut, it prefers to reject requests rather than return old data. E.g.: banks.
AP (availability + partition tolerance): in the face of a cut, it responds anyway even if the data may be somewhat outdated (eventual consistency). E.g.: social media feeds, shopping carts.

CAP doesn't say "freely choose 2 of 3": in practice P is imposed, and you decide between consistency and availability when there is a partition.
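
The CP versus AP decision can be illustrated with a toy replica that behaves differently once it is cut off from its peers. This is a deliberately simplified sketch: the `Replica` class, its `mode` labels and the `partitioned` flag are assumptions for the example, and real systems detect partitions through timeouts and quorums rather than a boolean.

```javascript
// Toy replica: holds a value and knows whether it can reach the other replicas.
class Replica {
  constructor(mode) {
    this.mode = mode;         // "CP" or "AP" (labels used only in this sketch)
    this.value = "v1";
    this.partitioned = false; // true while cut off from its peers
  }

  read() {
    if (this.partitioned && this.mode === "CP") {
      // CP: cannot confirm the value is current, so refuse to answer.
      return { ok: false, error: "unavailable during partition" };
    }
    // AP (or no partition): answer with the local copy, which may be stale.
    return { ok: true, value: this.value };
  }
}

const cp = new Replica("CP");
const ap = new Replica("AP");
cp.partitioned = true;
ap.partitioned = true;
// Meanwhile the other side of the partition may accept a newer write
// that neither of these replicas can see.

console.log(cp.read()); // { ok: false, error: 'unavailable during partition' }
console.log(ap.read()); // { ok: true, value: 'v1' }
```

Without a partition both replicas answer identically, which is the point: the trade-off only appears while the network is cut.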

Scaling: replication and sharding

To handle more load and data there are two complementary strategies:

Replication

Keep copies of the same data on several servers.

Improves availability (if a node goes down, another responds) and read throughput (you spread queries across replicas).
Common primary-replica pattern: writes go to the primary and are propagated to the replicas (which serve reads). When propagation is asynchronous, a replica can briefly return stale data.
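
A minimal sketch of the primary-replica pattern, assuming asynchronous propagation that is triggered by hand so the lag is visible. The `Primary` class and its `replicate` method are hypothetical; real databases ship a replication log continuously in the background.

```javascript
// Toy primary-replica setup with asynchronous propagation (illustrative only).
class Primary {
  constructor(replicas) {
    this.data = new Map();
    this.replicas = replicas;
    this.pending = []; // writes not yet shipped to the replicas
  }

  write(key, value) {
    this.data.set(key, value);
    this.pending.push([key, value]);
  }

  // Ship pending writes to every replica, in the order they were made.
  replicate() {
    for (const [key, value] of this.pending) {
      for (const replica of this.replicas) replica.set(key, value);
    }
    this.pending = [];
  }
}

const replicaA = new Map();
const replicaB = new Map();
const primary = new Primary([replicaA, replicaB]);

primary.write("user:1", "Ada");
console.log(replicaA.get("user:1")); // undefined: the replica lags behind
primary.replicate();
console.log(replicaB.get("user:1")); // Ada
```

The window between `write` and `replicate` is the replication lag during which a read from a replica can miss the latest write.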

Sharding (horizontal partitioning)

Split the data across several servers: each shard stores a subset of the rows (e.g. users A–M in shard 1, N–Z in shard 2), according to a partition key (shard key).

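Improves write throughput and storage capacity: no single machine stores everything.
More complex: queries that span several shards and keeping the data evenly balanced between them are hard. Choosing the shard key well is critical, because a poor key concentrates traffic on a few shards (hot spots).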
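
As an alternative to the alphabetical ranges above, many systems route by a hash of the shard key so that keys spread more evenly. The sketch below assumes three in-memory shards and an FNV-1a style hash computed over UTF-16 code units; `hashKey`, `shardFor`, `put` and `get` are hypothetical helpers, not the API of any particular database.

```javascript
// Simple deterministic string hash (FNV-1a style, 32-bit); real systems use their own.
function hashKey(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the result an unsigned 32-bit integer
  }
  return hash;
}

// Map a shard key to a shard index in the range [0, shardCount).
function shardFor(shardKey, shardCount) {
  return hashKey(shardKey) % shardCount;
}

const shards = [new Map(), new Map(), new Map()];

function put(userId, record) {
  shards[shardFor(userId, shards.length)].set(userId, record);
}

function get(userId) {
  return shards[shardFor(userId, shards.length)].get(userId);
}

put("u42", { name: "Grace" });
put("u123", { name: "Ada" });
console.log(get("u42").name);           // Grace
console.log(shards.map((s) => s.size)); // how the two records were distributed
```

Plain modulo routing has a known weakness: changing the number of shards remaps most keys, which is why production systems often use consistent hashing or fixed virtual partitions instead.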

Summary: replication copies the same data (high availability and reads); sharding splits different data (scales writes and volume). At large scale they are combined: each shard is, in addition, replicated.

Examples

JSON document (MongoDB style)

// In a document database there are no fixed tables; related data is nested:
const user = {
  _id: "u42",
  name: "Grace",
  orders: [
    { id: 1, total: 120 },
    { id: 8, total: 150 },
  ],
};
console.log(user.orders.length); // 2