Understanding Scaling in System Design
At its core, scaling is about increasing your system’s capacity so it can handle more traffic without slowing down or breaking.
A simple way to picture this is a car wash station on a busy day. There’s a long line of cars, but only one person washing them. You have two obvious ways to improve things:
- Make that one person faster – give them better tools or machines so they can wash cars more quickly.
- Add more people and bays – open more washing stations and have several people working at the same time.
In system design, these two options map to three main strategies: vertical scaling, horizontal scaling, and a hybrid of the two called diagonal scaling.
1. Vertical Scaling (Scaling Up)
Vertical scaling means increasing the capacity of a single machine. For an application server, that usually looks like:
- Upgrading the CPU
- Adding more RAM
- Moving to faster disks (for example, NVMe SSDs)
Pros
- Simple to implement – You usually don’t need to change your application code or architecture.
- Low latency inside the box – Everything runs on one machine, so there’s no network hop between services on different nodes.
Cons
- Hard limit – Every machine has a maximum size. Eventually you hit the point where you can’t scale up any further.
- Single point of failure – If that one powerful server goes down, your entire system goes offline.
Vertical scaling is often enough for smaller apps, internal tools, or early-stage products.
2. Horizontal Scaling (Scaling Out)
Horizontal scaling means adding more machines instead of making a single one bigger. Instead of one huge server, you might run many smaller ones behind a load balancer.
Pros
- High availability – If one node fails, others can keep serving traffic.
- Better long‑term scalability – You can keep adding nodes as your user base grows.
Cons
- More moving parts – You need a load balancer, health checks, and a strategy for keeping data consistent across nodes.
- Network overhead – Services now talk over the network, which adds a bit of latency compared to in‑process calls.
Horizontal scaling is the backbone of most modern, high‑traffic systems.
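The load balancer mentioned above is the piece that makes scaling out transparent to clients. As a minimal sketch (the backend addresses are hypothetical placeholders, and a real deployment would pull them from health-checked service discovery), round-robin selection across nodes looks like this in Go:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Balancer distributes requests across a fixed set of backends
// using simple round-robin selection.
type Balancer struct {
	backends []string
	next     atomic.Uint64 // monotonically increasing request counter
}

// Pick returns the next backend in round-robin order.
// The atomic counter makes it safe to call from many goroutines.
func (b *Balancer) Pick() string {
	n := b.next.Add(1)
	return b.backends[(n-1)%uint64(len(b.backends))]
}

func main() {
	// Hypothetical node addresses for illustration only.
	lb := &Balancer{backends: []string{
		"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080",
	}}
	for i := 0; i < 4; i++ {
		fmt.Println(lb.Pick()) // cycles through the three nodes, then wraps
	}
}
```

If one node fails its health check, it is simply removed from the `backends` slice and traffic keeps flowing to the rest, which is exactly the availability benefit listed above.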
3. Diagonal Scaling: The Hybrid Approach
Diagonal scaling combines both approaches. You:
- Scale each node up to a cost‑effective size (enough CPU/RAM to handle your baseline traffic).
- Scale out horizontally by adding more of those “right‑sized” nodes when load increases.
In cloud environments (AWS, Azure, GCP), this often looks like:
- Picking an instance type that balances cost and performance for your normal load.
- Using auto‑scaling groups to add or remove instances based on CPU, requests per second, or custom metrics.
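The scale-out decision behind those auto-scaling groups can be sketched with the target-tracking formula that Kubernetes' Horizontal Pod Autoscaler also uses: desired = ceil(current × currentMetric / targetMetric), clamped to a minimum and maximum node count. The numbers below are illustrative, not a recommendation:

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas computes how many right-sized nodes we need so that
// the observed metric (e.g. average CPU utilization) lands near target.
// The result is clamped to [min, max] to bound cost and guarantee a floor.
func desiredReplicas(current int, currentMetric, targetMetric float64, min, max int) int {
	d := int(math.Ceil(float64(current) * currentMetric / targetMetric))
	if d < min {
		d = min
	}
	if d > max {
		d = max
	}
	return d
}

func main() {
	// 4 nodes averaging 90% CPU against a 60% target: scale out to 6.
	fmt.Println(desiredReplicas(4, 0.90, 0.60, 2, 10)) // 6
	// Load drops to 20%: scale in, but never below the floor of 2 nodes.
	fmt.Println(desiredReplicas(4, 0.20, 0.60, 2, 10)) // 2
}
```

Diagonal scaling is this loop plus one prior decision: choosing a per-node size where `targetMetric` is comfortably sustainable, so each added node buys a predictable slice of capacity.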
Why use diagonal scaling?
It gives you a good balance between:
- Cost‑efficiency – You avoid paying for extremely overpowered single machines.
- Operational simplicity – You don’t manage hundreds of tiny instances if you don’t need to.
- Resilience – You still get the fault tolerance and flexibility of horizontal scaling.
Summary Table

| Strategy | How it works | Pros | Cons | Best for |
| --- | --- | --- | --- | --- |
| Vertical (up) | Bigger single machine: more CPU/RAM, faster disks | Simple; no architecture changes; low in-box latency | Hard upper limit; single point of failure | Small apps, internal tools, early-stage products |
| Horizontal (out) | More machines behind a load balancer | High availability; long-term scalability | More moving parts; network overhead | High-traffic production systems |
| Diagonal (hybrid) | Right-sized nodes, added or removed as load changes | Cost-efficient; resilient; operationally manageable | Requires right-sizing instances and tuning auto-scaling policies | Cloud deployments with variable traffic |
Conclusion
The “right” scaling strategy depends on your stage and requirements:
- For smaller apps or internal tools, vertical scaling is usually the fastest, simplest option.
- For production systems with real traffic and strict uptime requirements, horizontal or diagonal scaling is almost always required.
In the kinds of high‑throughput microservice architectures I build with Golang, diagonal scaling tends to be the sweet spot: each service instance is sized sensibly, and the platform can spin up more copies when traffic spikes, keeping the system both responsive and resilient.