DevPath · Learn to code

Performance, transactions and NoSQL

NoSQL, the CAP theorem and scaling

Why does NoSQL exist?

Relational databases (SQL) are excellent for structured data with clear relationships and strong ACID guarantees. But when an application grows to a huge scale (millions of users, very heterogeneous data or data with a changing structure), the rigid model of tables and JOINs can become a bottleneck.

NoSQL ("not only SQL") groups databases that give up part of the relational model (fixed schema, JOIN, strict ACID) in exchange for flexibility and horizontal scalability. It's not "better" than SQL: it's a different tool for different problems.

The four NoSQL families

1. Documents (MongoDB, CouchDB)

They store JSON-like documents with a flexible structure: documents in the same collection can have different fields. Ideal when the data is nested or the schema evolves over time.

// A document in MongoDB
{
  _id: "u123",
  name: "Ada",
  addresses: [ { city: "Madrid" }, { city: "London" } ]
}
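Here is a minimal plain-JavaScript sketch of what a flexible schema means in practice. It assumes an ordinary array stands in for a collection, so no database driver is involved. The second document has no `addresses` field at all, which is why the hypothetical helper `citiesOf` has to tolerate missing fields.

```javascript
// Plain JavaScript sketch: an array stands in for a MongoDB collection.
// Documents in the same collection do not need the same fields.
const users = [
  { _id: "u123", name: "Ada", addresses: [{ city: "Madrid" }, { city: "London" }] },
  { _id: "u124", name: "Alan", email: "alan@example.com" }, // no addresses field
];

// Hypothetical helper: read a nested field, tolerating documents that lack it.
function citiesOf(doc) {
  return (doc.addresses ?? []).map((a) => a.city);
}

// Find users with an address in London: the data is nested, no JOIN needed.
const inLondon = users.filter((u) => citiesOf(u).includes("London"));
console.log(inLondon.map((u) => u.name)); // [ 'Ada' ]
```

In MongoDB itself, the equivalent filter would use dot notation on the nested field, for example `db.users.find({ "addresses.city": "London" })`.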

2. Key-value (Redis, DynamoDB)

The simplest model: a giant dictionary key → value. Ultra-fast access by key. Perfect for caches, sessions and counters.

SET session:ab12 "{user: 'Ada'}"
GET session:ab12
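To illustrate why the key → value model suits sessions and counters, here is a minimal in-memory sketch built on a JavaScript `Map`. The class `KeyValueStore` and its methods are hypothetical: they only imitate the idea of `SET`/`GET` with an expiry time and an `INCR`-style counter, and are not the Redis client API.

```javascript
// Hypothetical in-memory key-value store (not Redis itself):
// a Map from keys to values, with an optional time-to-live per key.
class KeyValueStore {
  constructor(now = () => Date.now()) {
    this.data = new Map(); // key -> { value, expiresAt }
    this.now = now;        // injectable clock, handy for testing expiry
  }

  set(key, value, ttlMs) {
    const expiresAt = ttlMs === undefined ? Infinity : this.now() + ttlMs;
    this.data.set(key, { value, expiresAt });
  }

  get(key) {
    const entry = this.data.get(key);
    if (!entry) return null;
    if (this.now() >= entry.expiresAt) {
      this.data.delete(key); // expired keys are evicted lazily, on read
      return null;
    }
    return entry.value;
  }

  // Counter: a missing key counts as 0; an existing expiry time is kept.
  incr(key) {
    const next = Number(this.get(key) ?? 0) + 1;
    const entry = this.data.get(key); // undefined if missing or just expired
    this.data.set(key, { value: String(next), expiresAt: entry ? entry.expiresAt : Infinity });
    return next;
  }
}

let t = 0; // fake clock in milliseconds
const store = new KeyValueStore(() => t);
store.set("session:ab12", JSON.stringify({ user: "Ada" }), 1000);
console.log(JSON.parse(store.get("session:ab12")).user); // Ada
t = 1500;
console.log(store.get("session:ab12")); // null
console.log(store.incr("page:views"), store.incr("page:views")); // 1 2
```

Real Redis provides these features natively, for example `SET key value EX 3600` for a key that expires after an hour, and `INCR` for atomic counters.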

3. Graphs (Neo4j)

They model nodes and relationships as first-class citizens. They shine when what matters are the connections: social networks ("friends of friends"), recommendations, fraud detection.
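A graph database such as Neo4j stores the relationships themselves and follows them directly (queries are written declaratively in its language, Cypher). The plain-JavaScript sketch below only imitates the "friends of friends" idea with an in-memory adjacency list; the people and the function `friendsOfFriends` are hypothetical sample data and code.

```javascript
// A social graph as an adjacency list: person -> set of direct friends.
const friends = new Map([
  ["Ada", new Set(["Grace", "Alan"])],
  ["Grace", new Set(["Ada", "Linus"])],
  ["Alan", new Set(["Ada", "Linus", "Barbara"])],
  ["Linus", new Set(["Grace", "Alan"])],
  ["Barbara", new Set(["Alan"])],
]);

// People exactly two hops away who are not already direct friends.
function friendsOfFriends(graph, person) {
  const direct = graph.get(person) ?? new Set();
  const result = new Set();
  for (const friend of direct) {
    for (const fof of graph.get(friend) ?? []) {
      if (fof !== person && !direct.has(fof)) result.add(fof);
    }
  }
  return [...result].sort();
}

console.log(friendsOfFriends(friends, "Ada")); // [ 'Barbara', 'Linus' ]
```

In a relational schema the same question needs a self-JOIN per hop on a friendships table; in a graph model each hop is just following an edge.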

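4. Wide-column / column family (Cassandra, HBase)

They group columns into column families: each row is identified by a key and can hold a large, variable set of columns, and rows are spread across nodes according to that key. Optimized for massive write volumes over very large data sets (big data).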

SQL vs NoSQL: when to use each

| Aspect | SQL (relational) | NoSQL |
|---|---|---|
| Schema | fixed, defined in advance | flexible or schemaless |
| Relationships | native JOIN | denormalization / nested data |
| Guarantees | strong ACID | often eventual consistency |
| Scaling | vertical (a more powerful machine) | horizontal (more machines) |
| Best for | related data, critical integrity (banking, ERP) | massive scale, flexible data, high availability |
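The "Relationships" row of the table is easiest to see side by side. This is a plain-JavaScript sketch with hypothetical data: arrays stand in for two relational tables joined by `userId`, and for a single document with the orders nested inside it.

```javascript
// Relational style: two "tables", related through userId and joined at read time.
const usersTable = [{ id: 42, name: "Grace" }];
const ordersTable = [
  { id: 1, userId: 42, total: 120 },
  { id: 8, userId: 42, total: 150 },
];

// Hand-rolled equivalent of: SELECT ... FROM users JOIN orders ON orders.userId = users.id
const joined = usersTable.map((u) => ({
  ...u,
  orders: ordersTable.filter((o) => o.userId === u.id),
}));

// Document style (denormalized): the orders live inside the user,
// so one read returns everything with no JOIN.
const userDoc = {
  _id: 42,
  name: "Grace",
  orders: [{ id: 1, total: 120 }, { id: 8, total: 150 }],
};

console.log(joined[0].orders.length, userDoc.orders.length); // 2 2
```

The trade-off: if the same data is copied into many documents, every copy must be updated when it changes, which the relational model avoids by storing each fact once.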

The CAP theorem

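The CAP theorem (Brewer) describes a fundamental limit of distributed systems (data spread across several nodes): such a system cannot guarantee all three of the following properties at the same time:

- C (Consistency): every read returns the most recent write, or an error.
- A (Availability): every request to a node that is still running receives a non-error response.
- P (Partition tolerance): the system keeps working even when the network drops or delays messages between nodes.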

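The key point: in a real distributed system, network partitions do happen, so P is mandatory. The real choice is between CP and AP:

- CP: during a partition, the system refuses some requests (returning an error or timing out) rather than risk returning stale data. HBase is commonly given as an example.
- AP: during a partition, every node keeps answering, even if some answers are stale; the copies are reconciled once the network recovers (eventual consistency). Cassandra with its default settings is commonly given as an example.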

CAP doesn't mean "freely choose 2 of 3": in practice P is not optional, so what you decide is whether to favor consistency or availability while a partition lasts.
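The difference between the two choices shows up in what a cut-off replica does when asked for data. The toy model below is a hypothetical sketch, not how any particular database is implemented: two replicas hold one value, a partition stops writes on replica A from reaching replica B, and the `mode` parameter decides whether B refuses to answer (CP) or answers with possibly stale data (AP).

```javascript
// Toy model: two replicas of a single value.
function makeCluster(mode) {
  const replicas = { A: "v1", B: "v1" };
  let partitioned = false;
  return {
    partition() { partitioned = true; },
    heal() { partitioned = false; replicas.B = replicas.A; }, // B catches up
    write(value) {
      replicas.A = value;                   // clients write through replica A
      if (!partitioned) replicas.B = value; // copied only while the network works
    },
    readFromB() {
      if (partitioned && mode === "CP") {
        // CP: B cannot confirm it has the latest value, so it returns an error.
        return { ok: false, error: "unavailable during partition" };
      }
      // AP: B always answers, even if its copy is stale.
      return { ok: true, value: replicas.B };
    },
  };
}

for (const mode of ["CP", "AP"]) {
  const cluster = makeCluster(mode);
  cluster.partition();
  cluster.write("v2");
  console.log(mode, cluster.readFromB());
}
// CP { ok: false, error: 'unavailable during partition' }
// AP { ok: true, value: 'v1' }
```

After `heal()` the AP replica catches up and returns `v2`; that delayed convergence is what eventual consistency means.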

Scaling: replication and sharding

To handle more load and data there are two complementary strategies:

Replication

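Keep copies of the same data on several servers. Typically one node (the primary) accepts the writes and copies them to the others (the replicas), which can also serve reads; if the primary fails, a replica can be promoted to take its place.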
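A minimal in-memory sketch of a primary-replica setup, assuming a single primary that receives every write and copies it to all replicas, with reads spread across the replicas in round-robin order. The class `ReplicatedStore` is hypothetical and leaves out failover and network delays.

```javascript
// Hypothetical primary-replica sketch (requires replicaCount >= 1).
class ReplicatedStore {
  constructor(replicaCount) {
    this.primary = new Map();
    // Each replica is a full copy of the primary's data.
    this.replicas = Array.from({ length: replicaCount }, () => new Map());
    this.nextReplica = 0;
  }

  // All writes go to the primary, which copies them to every replica.
  // Copied synchronously here for simplicity; many real systems replicate asynchronously.
  write(key, value) {
    this.primary.set(key, value);
    for (const replica of this.replicas) replica.set(key, value);
  }

  // Reads rotate across the replicas, spreading the read load.
  read(key) {
    const replica = this.replicas[this.nextReplica];
    this.nextReplica = (this.nextReplica + 1) % this.replicas.length;
    return replica.get(key);
  }
}

const db = new ReplicatedStore(2);
db.write("user:42", "Grace");
console.log(db.read("user:42"), db.read("user:42")); // Grace Grace
```

Because every replica holds the full data set, adding replicas raises read capacity and availability, but every write still has to be applied on every copy, so replication alone does not scale writes.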

Sharding (horizontal partitioning)

Split the data across several servers: each shard stores a subset of the rows (e.g. users A–M in shard 1, N–Z in shard 2), according to a partition key (shard key).
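The routing rule from the example (A–M in shard 1, N–Z in shard 2) can be sketched in a few lines of JavaScript. The shard key here is assumed to be the first letter of the user's name; `shards`, `shardFor`, `insertUser` and `findUser` are hypothetical names, and plain arrays stand in for the shard servers.

```javascript
// Range sharding: the shard key is the first letter of the name.
// Letters A–M go to shard 1, N–Z to shard 2; other characters fall on
// whichever side of "M" they sort.
const shards = { 1: [], 2: [] };

function shardFor(name) {
  const first = name[0].toUpperCase();
  return first <= "M" ? 1 : 2;
}

function insertUser(user) {
  shards[shardFor(user.name)].push(user); // each row lives on exactly one shard
}

function findUser(name) {
  // Only the shard that owns the key is searched, not the whole data set.
  return shards[shardFor(name)].find((u) => u.name === name);
}

for (const name of ["Ada", "Grace", "Niklaus", "Tim"]) insertUser({ name });
console.log(shards[1].map((u) => u.name), shards[2].map((u) => u.name));
// [ 'Ada', 'Grace' ] [ 'Niklaus', 'Tim' ]
```

A range key like this keeps neighbouring keys together but can leave shards unbalanced if many keys start with the same letters; many systems therefore hash the shard key instead (for example `hash(key) % numberOfShards`).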

Summary: replication copies the same data (for high availability and read capacity); sharding splits the data into different subsets (to scale writes and storage volume). At large scale they are combined: each shard is also replicated.

Examples

JSON document (MongoDB style)

// In document NoSQL there are no fixed tables:
const user = {
  _id: "u42",
  name: "Grace",
  orders: [
    { id: 1, total: 120 },
    { id: 8, total: 150 },
  ],
};
console.log(user.orders.length); // 2
