Scaling
Scaling is the process of adjusting server capacity so the system can handle large amounts of traffic.
There are two major ways to handle scaling:
- Vertical Scaling
- Horizontal Scaling
Either approach can handle the traffic, but knowing when to use which kind of scaling saves cost and time and keeps the system fast.
Example
- On normal days, Amazon has typical visitor rates, which the existing system can handle.
- The difficulty arises during a sale or special event, when a surge of traffic arrives that the normal server configuration cannot handle.
Options you have:
- Let the site go down (no sales, disappointed customers).
- Increase the server's compute power by allocating more resources before the sale.
- Add more small servers and distribute chunks of users across them, balancing the traffic.
- The second option is called Vertical Scaling.
- The third option is called Horizontal Scaling.
Vertical Scaling
Vertical Scaling means increasing the compute power, RAM, or storage allocated to existing servers/VMs.
When to use it?
- When traffic is not growing exponentially
- Just a bit more than usual → better to scale vertically by upgrading your VM specs.
- When using monolithic architecture
- Frontend, backend, and all modules live in a single codebase and deploy as one unit → logical to scale up a single instance.
- When you have SQL databases
- SQL DBs are hard to split across multiple instances. Scaling vertically avoids the latency of cross-instance queries.
- Short-term solution
- Quick way to handle load without changing architecture.
Pros
- Easy to implement (just upgrade hardware).
- No changes needed in application code.
Cons
- Expensive at high scale.
- Limited by maximum hardware capacity.
- Single point of failure (if server goes down → everything stops).
Horizontal Scaling
Horizontal scaling means adding more servers to handle increased traffic.
Example:
- You have 100,000 customers hitting the system.
- Instead of one big server, you create 10 small servers.
- Each server handles ~10,000 users.
- A load balancer (NGINX) maps requests to servers.
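A minimal sketch of how such a mapping could work, assuming user IDs are used as the routing key; the server names below are hypothetical:

```python
import hashlib

# Hypothetical pool of 10 application servers behind the load balancer.
SERVERS = [f"app-{i}.internal:8080" for i in range(10)]

def route_request(user_id: str) -> str:
    """Map a user to one of the servers by hashing the user ID."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

print(route_request("customer-42"))  # the same user always lands on the same server
```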
When to use it?
- When traffic grows exponentially.
- When your DB is NoSQL (NoSQL databases are designed to scale out across nodes).
- When your product uses a Microservices Architecture.
- When cost matters: many small servers are often more cost-effective than one very large machine.
Pros
- Virtually unlimited scaling.
- Fault tolerance and high availability.
- Can be automated with cloud services (AWS Auto Scaling, GCP Managed Instance Groups, Azure VM Scale Sets).
Cons
- More complex (needs load balancers, distributed DB, synchronization).
- Application may need redesign for distributed architecture.
Techniques for Horizontal Scaling
1. Sharding
- Split the database into multiple parts (shards), each holding a subset of the data.
- Example:
- Single DB → Split into 5 smaller DB instances.
- Each instance holds part of the data and handles part of the traffic.
- Works well with NoSQL + Microservices.
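A minimal sketch of shard selection, assuming users are spread across 5 shards by a numeric user ID; the connection strings are hypothetical:

```python
# Hypothetical connection strings for the 5 database shards.
SHARDS = [f"db-shard-{i}.internal:5432" for i in range(5)]

def shard_for(user_id: int) -> str:
    """Pick the shard that owns this user: user_id modulo the shard count."""
    return SHARDS[user_id % len(SHARDS)]

print(shard_for(12345))  # db-shard-0.internal:5432
```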
2. Replication
- Create replicas (copies) of the main server.
- If the main server is overloaded, replica servers handle requests (typically reads).
- Replicas may lag slightly behind the main server, but they prevent downtime.
- Updates can be synced with cron jobs or replication services.
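A minimal sketch of read/write splitting under this setup, assuming one primary and two replicas with hypothetical addresses: writes go to the primary, reads are spread across the replicas.

```python
import random

PRIMARY = "db-primary.internal:5432"   # hypothetical main server
REPLICAS = [
    "db-replica-1.internal:5432",      # hypothetical replicas; may lag slightly
    "db-replica-2.internal:5432",
]

def pick_node(is_write: bool) -> str:
    """Send writes to the primary; spread reads across the replicas."""
    return PRIMARY if is_write else random.choice(REPLICAS)

print(pick_node(is_write=True))   # db-primary.internal:5432
print(pick_node(is_write=False))  # one of the replicas
```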
3. Sharding + Replication
- Combine both approaches.
- Shard the data, and replicate each shard → best performance & reliability.
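Combining the two sketches above: first pick the shard that owns the data, then pick a node within that shard (all addresses hypothetical).

```python
import random

# Hypothetical layout: 5 shards, each with one primary and one replica.
SHARDS = [
    {"primary": f"shard-{i}-primary.internal",
     "replicas": [f"shard-{i}-replica.internal"]}
    for i in range(5)
]

def node_for(user_id: int, is_write: bool) -> str:
    """Pick the owning shard, then route writes to its primary and reads to a replica."""
    shard = SHARDS[user_id % len(SHARDS)]
    return shard["primary"] if is_write else random.choice(shard["replicas"])

print(node_for(42, is_write=False))  # shard-2-replica.internal
```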
4. Load Balancing
- Use a Load Balancer (like NGINX).
- Distributes requests across servers based on load.
- Ensures no single server is overloaded.
Example:
- 100,000 customers.
- 10 servers → each handles ~10,000 customers.
- Load balancer uses a hash function or round-robin logic to distribute traffic.
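A minimal sketch of the round-robin option, reusing the same hypothetical pool of 10 servers; each incoming request simply goes to the next server in the cycle.

```python
import itertools

# The same hypothetical pool of 10 application servers.
SERVERS = [f"app-{i}.internal:8080" for i in range(10)]
rotation = itertools.cycle(SERVERS)

def route_round_robin() -> str:
    """Hand each incoming request to the next server in the rotation."""
    return next(rotation)

for _ in range(3):
    print(route_round_robin())  # app-0, app-1, app-2, ...
```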
Combine Both Approaches
In real-world systems, vertical and horizontal scaling are combined.
- Vertical scaling gives quick upgrades.
- Horizontal scaling gives long-term flexibility and high availability.