Scaling
Scaling is the process of adjusting server capacity so the system can handle large amounts of traffic.
There are two major ways to handle scaling:
- Vertical Scaling
- Horizontal Scaling
Either approach can handle the traffic, but knowing when to use which kind of scaling saves cost and time and keeps the system fast.
Example
- On normal days, Amazon has typical visitor rates, which the existing system can handle.
- The difficulty arises during a sale or special event, when a surge of traffic arrives that the normal server configuration cannot handle.
Options you have:
- Let the site go down (no sales, disappointed customers).
- Increase the server's compute power by allocating more resources before the sale.
- Add more small servers and distribute chunks of users across them, balancing the traffic.
- The second option is called Vertical Scaling.
- The third option is called Horizontal Scaling.
Vertical Scaling
Vertical Scaling means increasing the compute power, RAM, or storage allocated to existing servers/VMs.
When to use it?
- When traffic is not growing exponentially
- Just a bit more than usual → better to scale vertically by upgrading your VM specs.
- When using monolithic architecture
- Frontend, backend, and all modules live in a single codebase and deploy as one unit → logical to scale up a single instance.
- When you have SQL databases
- SQL DBs are hard to split across multiple instances. Scaling vertically avoids the latency of cross-instance queries.
- Short-term solution
- Quick way to handle load without changing architecture.
Pros
- Easy to implement (just upgrade hardware).
- No changes needed in application code.
Cons
- Expensive at high scale.
- Limited by maximum hardware capacity.
- Single point of failure (if server goes down → everything stops).
Horizontal Scaling
Horizontal scaling means adding more servers to handle increased traffic.
Example:
- You have 100,000 customers hitting the system.
- Instead of one big server, you create 10 small servers.
- Each server handles ~10,000 users.
- A load balancer (NGINX) maps requests to servers.
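A minimal sketch of how such a mapping could work, assuming user IDs are used as the routing key; the server names below are hypothetical:

```python
import hashlib

# Hypothetical pool of 10 application servers behind the load balancer.
SERVERS = [f"app-{i}.internal:8080" for i in range(10)]

def route_request(user_id: str) -> str:
    """Map a user to one of the servers by hashing the user ID."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

print(route_request("customer-42"))  # the same user always lands on the same server
```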
When to use it?
- When traffic grows exponentially.
- When your DB is NoSQL (NoSQL databases are designed to scale out across nodes).
- When your product uses a Microservices Architecture.
- When cost matters: many small servers are often more cost-effective than one very large machine.
Pros
- Virtually unlimited scaling.
- Fault tolerance and high availability.
- Can be automated with cloud services (AWS Auto Scaling, GCP Managed Instance Groups, Azure VM Scale Sets).
Cons
- More complex (needs load balancers, distributed DB, synchronization).
- Application may need redesign for distributed architecture.
Techniques for Horizontal Scaling
1. Sharding
- Split the database into multiple parts (shards), each holding a subset of the data.
- Example:
- Single DB → Split into 5 smaller DB instances.
- Each instance holds part of the data and handles part of the traffic.
- Works well with NoSQL + Microservices.
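A minimal sketch of shard selection, assuming users are spread across 5 shards by a numeric user ID; the connection strings are hypothetical:

```python
# Hypothetical connection strings for the 5 database shards.
SHARDS = [f"db-shard-{i}.internal:5432" for i in range(5)]

def shard_for(user_id: int) -> str:
    """Pick the shard that owns this user: user_id modulo the shard count."""
    return SHARDS[user_id % len(SHARDS)]

print(shard_for(12345))  # db-shard-0.internal:5432
```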
2. Replication
- Create replicas (copies) of the main server.
- If the main server is overloaded, replica servers handle requests (typically reads).
- Replicas may lag slightly behind the main server, but they prevent downtime.
- Updates can be synced with cron jobs or replication services.
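A minimal sketch of read/write splitting under this setup, assuming one primary and two replicas with hypothetical addresses: writes go to the primary, reads are spread across the replicas.

```python
import random

PRIMARY = "db-primary.internal:5432"   # hypothetical main server
REPLICAS = [
    "db-replica-1.internal:5432",      # hypothetical replicas; may lag slightly
    "db-replica-2.internal:5432",
]

def pick_node(is_write: bool) -> str:
    """Send writes to the primary; spread reads across the replicas."""
    return PRIMARY if is_write else random.choice(REPLICAS)

print(pick_node(is_write=True))   # db-primary.internal:5432
print(pick_node(is_write=False))  # one of the replicas
```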
3. Sharding + Replication
- Combine both approaches.
- Shard the data, and replicate each shard → best performance & reliability.
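Combining the two sketches above: first pick the shard that owns the data, then pick a node within that shard (all addresses hypothetical).

```python
import random

# Hypothetical layout: 5 shards, each with one primary and one replica.
SHARDS = [
    {"primary": f"shard-{i}-primary.internal",
     "replicas": [f"shard-{i}-replica.internal"]}
    for i in range(5)
]

def node_for(user_id: int, is_write: bool) -> str:
    """Pick the owning shard, then route writes to its primary and reads to a replica."""
    shard = SHARDS[user_id % len(SHARDS)]
    return shard["primary"] if is_write else random.choice(shard["replicas"])

print(node_for(42, is_write=False))  # shard-2-replica.internal
```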
4. Load Balancing
- Use a Load Balancer (like NGINX).
- Distributes requests across servers based on load.
- Ensures no single server is overloaded.
Example:
- 100,000 customers.
- 10 servers → each handles ~10,000 customers.
- Load balancer uses a hash function or round-robin logic to distribute traffic.
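A minimal sketch of the round-robin option, reusing the same hypothetical pool of 10 servers; each incoming request simply goes to the next server in the cycle.

```python
import itertools

# The same hypothetical pool of 10 application servers.
SERVERS = [f"app-{i}.internal:8080" for i in range(10)]
rotation = itertools.cycle(SERVERS)

def route_round_robin() -> str:
    """Hand each incoming request to the next server in the rotation."""
    return next(rotation)

for _ in range(3):
    print(route_round_robin())  # app-0, app-1, app-2, ...
```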
Combine Both Approaches
In real-world systems, vertical and horizontal scaling are combined.
- Vertical scaling gives quick upgrades.
- Horizontal scaling gives long-term flexibility and high availability.