When one machine is not enough
There comes a point where your service receives more traffic than a single instance can handle. There are two ways to grow.
Vertical vs. horizontal
- Vertical scaling (scale up): a bigger machine (more CPU, more RAM). Simple, no code changes, but it has a ceiling (there is no infinite machine) and it is a single point of failure: if it goes down, everything goes down.
- Horizontal scaling (scale out): more identical machines in parallel, and you spread the traffic across them. It has no practical ceiling and tolerates failures (if one instance goes down, the rest keep going). In return, it requires designing the application for it.
In modern practice horizontal scaling is usually preferred, and cloud platforms are built around it.
The load balancer
A load balancer is the piece that sits in front of your instances and distributes requests among them (round-robin, least connections, etc.). The client talks to a single address; the balancer decides which instance responds. It also polls each instance's health check and stops sending traffic to any instance that fails to respond.
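The health-check behaviour described above can be sketched as a round-robin that skips instances marked as down. This is only an illustration: `createHealthAwareBalancer`, `markDown` and `markUp` are hypothetical names, and in a real balancer a periodic health check, not manual calls, would flip the status.

```javascript
// Hypothetical sketch: round-robin that skips instances marked unhealthy.
function createHealthAwareBalancer(instances) {
  const healthy = new Set(instances); // instances currently passing health checks
  let i = 0;

  return {
    // A periodic health check would call these in a real system (assumption).
    markDown(instance) {
      healthy.delete(instance);
    },
    markUp(instance) {
      if (instances.includes(instance)) healthy.add(instance);
    },

    // Returns the next healthy instance, or null if none is available.
    next() {
      for (let tries = 0; tries < instances.length; tries++) {
        const candidate = instances[i % instances.length];
        i++;
        if (healthy.has(candidate)) return candidate;
      }
      return null;
    },
  };
}

const balancer = createHealthAwareBalancer(["A", "B", "C"]);
balancer.markDown("B"); // simulate a failed health check on B
console.log(
  [balancer.next(), balancer.next(), balancer.next(), balancer.next()].join(" ")
); // A C A C
```

The client never notices that B is gone: it still talks to one address, and the rotation simply closes around the instances that remain.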
The key to horizontal: being stateless
To distribute requests freely across interchangeable instances, each instance must be stateless: it keeps nothing in local memory that a later request will need. Why? Because the balancer may send a user's first request to instance A and the second to instance B. If the session lives in A's memory, B knows nothing about it and the user "loses" their session.
The solution is to move that state out of the instances and into a store they all share: the database, Redis for sessions and cache, a file service. The instances become disposable: you can create, destroy or replace any of them without losing anything. That is exactly what makes autoscaling possible.
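To make the difference concrete, here is a minimal sketch of the same two-request scenario, first with sessions in local memory and then with a shared store. The names (`createStatefulInstance`, `createStatelessInstance`, `sharedSessionStore`) are hypothetical, and a plain `Map` stands in for Redis or the database only so the sketch runs without dependencies.

```javascript
// Stand-in for an external shared store (assumption: in practice Redis or a database).
const sharedSessionStore = new Map();

// Stateful: the session lives in the instance's own memory.
function createStatefulInstance(name) {
  const localSessions = new Map();
  return {
    login(sessionId, user) {
      localSessions.set(sessionId, user);
    },
    whoAmI(sessionId) {
      return localSessions.get(sessionId) ?? `${name}: no session`;
    },
  };
}

// Stateless: the instance keeps nothing; reads and writes go to the shared store.
function createStatelessInstance(name, store) {
  return {
    login(sessionId, user) {
      store.set(sessionId, user);
    },
    whoAmI(sessionId) {
      return store.get(sessionId) ?? `${name}: no session`;
    },
  };
}

// Request 1 (login) lands on A, request 2 lands on B.
const statefulA = createStatefulInstance("A");
const statefulB = createStatefulInstance("B");
statefulA.login("s1", "ana");
console.log(statefulB.whoAmI("s1")); // B: no session

const statelessA = createStatelessInstance("A", sharedSessionStore);
const statelessB = createStatelessInstance("B", sharedSessionStore);
statelessA.login("s1", "ana");
console.log(statelessB.whoAmI("s1")); // ana
```

In the stateless version, destroying instance A after the login changes nothing for the user, because A never held the session in the first place.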
Autoscaling and load testing
- Autoscaling: the system adds or removes instances automatically based on demand (e.g. "if average CPU exceeds 70% for 5 min, add one"). You pay for what you use and absorb the spikes without manual intervention. It works reliably only if the services are stateless, because instances are created and destroyed at any moment.
- Load testing: before production, you simulate growing traffic to find the breaking point and validate that autoscaling reacts. Better to discover the limit in a rehearsal than at the real launch.
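The autoscaling rule quoted above can be sketched as a small decision function. This is an illustration rather than any platform's real policy: `desiredInstances` is a hypothetical name, the 70% scale-out threshold comes from the example in the text, and the scale-in threshold and the `min`/`max` limits are assumptions added so the sketch can also shrink safely.

```javascript
// Hypothetical policy: decide the target instance count from recent CPU samples.
// cpuSamples: average CPU percentages over the evaluation window (e.g. one per minute).
function desiredInstances(current, cpuSamples, policy = {}) {
  // scaleInBelow, min and max are assumed values, not taken from the text.
  const { scaleOutAbove = 70, scaleInBelow = 30, min = 1, max = 10 } = policy;
  if (cpuSamples.length === 0) return current; // no data, no change

  const avg = cpuSamples.reduce((sum, v) => sum + v, 0) / cpuSamples.length;

  let target = current;
  if (avg > scaleOutAbove) target = current + 1; // sustained high load: add one
  else if (avg < scaleInBelow) target = current - 1; // sustained low load: remove one

  // Never go below the minimum or above the maximum.
  return Math.min(max, Math.max(min, target));
}

console.log(desiredInstances(2, [75, 80, 72, 78, 90])); // 3
console.log(desiredInstances(2, [20, 25, 15, 10, 22])); // 1
console.log(desiredInstances(1, [20, 25, 15, 10, 22])); // 1
console.log(desiredInstances(2, [50, 60, 55, 65, 58])); // 2
```

A load test is the natural place to exercise a rule like this: feed it the CPU readings recorded while traffic ramps up and check that the instance count follows.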
Mental summary: stateless lets you scale horizontally, the balancer spreads the load and autoscaling adjusts the number of instances to demand. And you measure it all with the metrics from the first pillar.
Examples
Round-robin balancing: distribute by turns across instances
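// Returns a function that hands out instances in turn, wrapping around at the end.
function createBalancer(instances) {
  let i = 0; // index of the next instance to use
  return function next() {
    const chosen = instances[i];
    i = (i + 1) % instances.length; // wrap around so the counter never grows unbounded
    return chosen;
  };
}

const lb = createBalancer(["A", "B", "C"]);
console.log([lb(), lb(), lb(), lb()].join(" ")); // A B C A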
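Least connections: pick the instance with the fewest open requests
The other strategy named in the load balancer section can be sketched in the same style. This is a minimal illustration, not a production algorithm: `createLeastConnectionsBalancer`, `acquire` and `release` are hypothetical names, and ties are broken by list order as an assumption.

```javascript
// Hypothetical sketch: track open requests per instance and pick the least busy one.
function createLeastConnectionsBalancer(instances) {
  const active = new Map(instances.map((name) => [name, 0])); // name -> open requests

  return {
    // Choose the instance with the fewest open requests and count the new one.
    acquire() {
      let chosen = null;
      for (const [name, count] of active) {
        if (chosen === null || count < active.get(chosen)) chosen = name;
      }
      if (chosen === null) return null; // no instances configured
      active.set(chosen, active.get(chosen) + 1);
      return chosen;
    },

    // Call when a request finishes so the instance becomes eligible again.
    release(name) {
      if (!active.has(name)) return;
      active.set(name, Math.max(0, active.get(name) - 1));
    },
  };
}

const lc = createLeastConnectionsBalancer(["A", "B", "C"]);
const first = lc.acquire(); // all tied at 0, so the first in the list wins
const second = lc.acquire(); // A now has 1 open request, so B is chosen
lc.release(first); // A finishes its request
const third = lc.acquire(); // A and C are tied at 0, A comes first
console.log(first, second, third); // A B A
```

Unlike round-robin, this strategy reacts to slow requests: an instance stuck on long work keeps a high count and receives less new traffic.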