What Scalability Really Means
Scalability is about graceful growth
Scalability is the ability of a system to handle increased load (more users, more requests, more data) by adding resources, without performance collapsing or the architecture needing a rewrite. As a rule of thumb, a system scales well when doubling its resources roughly doubles its capacity, so that cost grows close to linearly with load.
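One way to make "near-linear" concrete is to compare measured throughput against what perfectly linear scaling would have delivered. The sketch below is illustrative only: the function name `scaling_efficiency` and the sample numbers are assumptions, not measurements from any real system.

```python
def scaling_efficiency(base_resources, base_throughput, resources, throughput):
    """Fraction of ideal linear scaling actually achieved (1.0 = perfectly linear)."""
    if min(base_resources, base_throughput, resources, throughput) <= 0:
        raise ValueError("all inputs must be positive")
    # Throughput we would expect if capacity grew in proportion to resources.
    ideal = base_throughput * (resources / base_resources)
    return throughput / ideal


# Hypothetical numbers: 4 servers handle 2,000 req/s; 8 servers handle 3,600 req/s.
eff = scaling_efficiency(4, 2000, 8, 3600)
print(f"{eff:.2f}")  # 0.90
```

A value close to 1.0 means added resources translate almost fully into added capacity. A value that keeps falling as resources grow usually points to a shared bottleneck that the extra machines are all waiting on.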
It is useful to separate three related ideas that are often conflated in interviews:
- Performance — how fast a single request is served (latency) when the system is lightly loaded.
- Scalability — how well throughput holds up as load grows. A fast system that falls over at 100 users is not scalable.
- Capacity — the maximum sustainable load before latency degrades past your service-level objective (SLO).
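The three ideas can be read off a single load test. The following is a minimal sketch, assuming hypothetical measurements of p99 latency at several offered loads and a hypothetical 100 ms SLO; `capacity_under_slo` is an illustrative helper, not a standard API.

```python
from typing import Optional, Sequence, Tuple


def capacity_under_slo(
    samples: Sequence[Tuple[float, float]], slo_ms: float
) -> Optional[float]:
    """Return the highest tested load that meets the SLO, or None if none does.

    samples: (offered load in req/s, observed p99 latency in ms) pairs.
    Scanning stops at the first load level that violates the SLO, so a
    lucky measurement at a higher load is not counted.
    """
    capacity = None
    for load, latency_ms in sorted(samples):
        if latency_ms > slo_ms:
            break
        capacity = load
    return capacity


# Hypothetical load-test results: (req/s, p99 latency in ms).
samples = [(100, 20), (200, 22), (400, 30), (800, 95), (1600, 900)]
print(capacity_under_slo(samples, slo_ms=100))  # 800
```

In this made-up data set, performance is the 20 ms measured at the lightest load, scalability is how slowly latency rises between 100 and 800 req/s, and capacity is the 800 req/s at which the system still meets the SLO.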
Two directions to grow
There are fundamentally two ways to add capacity: make one machine bigger (vertical scaling, or scaling up) or add more machines (horizontal scaling, or scaling out). Almost every other technique in this module — load balancers, stateless services, read replicas, sharding, caching — exists to make horizontal scaling practical, because a single machine eventually hits a hard physical ceiling.
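A back-of-the-envelope comparison of the two directions might look like the sketch below. Every number is hypothetical, the helper names `can_scale_up` and `nodes_needed` are illustrative, and the calculation assumes load spreads evenly across nodes and capacity adds up linearly, which real systems only approximate.

```python
import math


def usable_rps(node_rps: float, headroom: float) -> float:
    """Throughput one node may serve while keeping `headroom` (0 to <1) spare."""
    if node_rps <= 0 or not 0 <= headroom < 1:
        raise ValueError("node_rps must be positive and headroom in [0, 1)")
    return node_rps * (1 - headroom)


def can_scale_up(target_rps: float, largest_machine_rps: float,
                 headroom: float = 0.25) -> bool:
    """Vertical scaling: does the biggest available machine cover the target?"""
    return target_rps <= usable_rps(largest_machine_rps, headroom)


def nodes_needed(target_rps: float, per_node_rps: float,
                 headroom: float = 0.25) -> int:
    """Horizontal scaling: identical nodes needed, assuming even load spread."""
    return max(1, math.ceil(target_rps / usable_rps(per_node_rps, headroom)))


# Hypothetical figures: 50,000 req/s target, 4,000 req/s per commodity node,
# 32,000 req/s on the largest single machine available.
print(can_scale_up(50_000, 32_000))  # False
print(nodes_needed(50_000, 4_000))   # 17
```

Here the largest single machine cannot carry the target load, while 17 smaller nodes can, provided the service is built so that requests can land on any of them.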
The interview goal is not to memorize techniques but to reason about where load goes, where it piles up, and how to spread it out.