Understanding Scaling in System Design
At its core, scaling is about increasing your system’s capacity so it can handle more traffic without slowing down or breaking.
A simple way to picture this is a car wash station on a busy day. There’s a long line of cars, but only one person washing them. You have two obvious ways to improve things:
- Make that one person faster – give them better tools or machines so they can wash cars more quickly.
- Add more people and bays – open more washing stations and have several people working at the same time.
In system design, these two options map to three main strategies: vertical scaling, horizontal scaling, and a hybrid of the two called diagonal scaling.
1. Vertical Scaling (Scaling Up)
Vertical scaling means increasing the capacity of a single machine. For an application server, that usually looks like:
- Upgrading the CPU
- Adding more RAM
- Moving to faster disks (for example, NVMe SSDs)
Pros
- Simple to implement – You usually don’t need to change your application code or architecture.
- Low latency inside the box – Everything runs on one machine, so there’s no network hop between services on different nodes.
Cons
- Hard limit – Every machine has a maximum size. Eventually you hit the point where you can’t scale up any further.
- Single point of failure – If that one powerful server goes down, your entire system goes offline.
Vertical scaling is often enough for smaller apps, internal tools, or early-stage products.
2. Horizontal Scaling (Scaling Out)
Horizontal scaling means adding more machines instead of making a single one bigger. Instead of one huge server, you might run many smaller ones behind a load balancer.
Pros
- High availability – If one node fails, others can keep serving traffic.
- Better long‑term scalability – You can keep adding nodes as your user base grows.
Cons
- More moving parts – You need a load balancer, health checks, and a strategy for keeping data consistent across nodes.
- Network overhead – Services now talk over the network, which adds a bit of latency compared to in‑process calls.
Horizontal scaling is the backbone of most modern, high‑traffic systems.
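The load balancer mentioned above is the piece that makes scaling out transparent to clients. As a minimal sketch (the backend addresses are hypothetical placeholders, and a real deployment would pull them from health-checked service discovery), round-robin selection across nodes looks like this in Go:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Balancer distributes requests across a fixed set of backends
// using simple round-robin selection.
type Balancer struct {
	backends []string
	next     atomic.Uint64 // monotonically increasing request counter
}

// Pick returns the next backend in round-robin order.
// The atomic counter makes it safe to call from many goroutines.
func (b *Balancer) Pick() string {
	n := b.next.Add(1)
	return b.backends[(n-1)%uint64(len(b.backends))]
}

func main() {
	// Hypothetical node addresses for illustration only.
	lb := &Balancer{backends: []string{
		"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080",
	}}
	for i := 0; i < 4; i++ {
		fmt.Println(lb.Pick()) // cycles through the three nodes, then wraps
	}
}
```

If one node fails its health check, it is simply removed from the `backends` slice and traffic keeps flowing to the rest, which is exactly the availability benefit listed above.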
3. Diagonal Scaling: The Hybrid Approach
Diagonal scaling combines both approaches. You:
- Scale each node up to a cost‑effective size (enough CPU/RAM to handle your baseline traffic).
- Scale out horizontally by adding more of those “right‑sized” nodes when load increases.
In cloud environments (AWS, Azure, GCP), this often looks like:
- Picking an instance type that balances cost and performance for your normal load.
- Using auto‑scaling groups to add or remove instances based on CPU, requests per second, or custom metrics.
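The scale-out decision behind those auto-scaling groups can be sketched with the target-tracking formula that Kubernetes' Horizontal Pod Autoscaler also uses: desired = ceil(current × currentMetric / targetMetric), clamped to a minimum and maximum node count. The numbers below are illustrative, not a recommendation:

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas computes how many right-sized nodes we need so that
// the observed metric (e.g. average CPU utilization) lands near target.
// The result is clamped to [min, max] to bound cost and guarantee a floor.
func desiredReplicas(current int, currentMetric, targetMetric float64, min, max int) int {
	d := int(math.Ceil(float64(current) * currentMetric / targetMetric))
	if d < min {
		d = min
	}
	if d > max {
		d = max
	}
	return d
}

func main() {
	// 4 nodes averaging 90% CPU against a 60% target: scale out to 6.
	fmt.Println(desiredReplicas(4, 0.90, 0.60, 2, 10)) // 6
	// Load drops to 20%: scale in, but never below the floor of 2 nodes.
	fmt.Println(desiredReplicas(4, 0.20, 0.60, 2, 10)) // 2
}
```

Diagonal scaling is this loop plus one prior decision: choosing a per-node size where `targetMetric` is comfortably sustainable, so each added node buys a predictable slice of capacity.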
Why use diagonal scaling?
It gives you a good balance between:
- Cost‑efficiency – You avoid paying for extremely overpowered single machines.
- Operational simplicity – You don’t manage hundreds of tiny instances if you don’t need to.
- Resilience – You still get the fault tolerance and flexibility of horizontal scaling.
Summary Table

| Strategy | How it works | Pros | Cons | Best for |
| --- | --- | --- | --- | --- |
| Vertical (up) | Bigger single machine: more CPU/RAM, faster disks | Simple; no architecture changes; low in-box latency | Hard upper limit; single point of failure | Small apps, internal tools, early-stage products |
| Horizontal (out) | More machines behind a load balancer | High availability; long-term scalability | More moving parts; network overhead | High-traffic production systems |
| Diagonal (hybrid) | Right-sized nodes, added or removed as load changes | Cost-efficient; resilient; operationally manageable | Requires right-sizing instances and tuning auto-scaling policies | Cloud deployments with variable traffic |
Conclusion
The “right” scaling strategy depends on your stage and requirements:
- For smaller apps or internal tools, vertical scaling is usually the fastest, simplest option.
- For production systems with real traffic and strict uptime requirements, horizontal or diagonal scaling is almost always required.
In the kinds of high‑throughput microservice architectures I build with Golang, diagonal scaling tends to be the sweet spot: each service instance is sized sensibly, and the platform can spin up more copies when traffic spikes, keeping the system both responsive and resilient.