Load Balancing

Load balancing refers to efficiently distributing incoming network traffic across a group of backend servers, also known as a server farm or server pool.

Modern websites and applications handle high volumes of traffic and serve numerous client requests simultaneously.

Load balancing helps meet these requests and keeps the website and application response fast and reliable.


Load balancing performs these critical tasks:

  • Manages traffic spikes and prevents any single server from being overloaded
  • Minimizes user request response time
  • Ensures performance and reliability of computing resources, both physical and virtual
  • Adds redundancy and resilience to computing environments


How does a Load Balancer work

  • A client, such as an application or browser, sends a request and tries to connect to a server.
  • A load balancer receives the request and, based on its configured algorithm, routes it to one of the servers in a server group (or farm).
  • The server receives the connection request and responds to the client via the load balancer.
  • The load balancer receives the response and matches the IP of the client with that of the selected server. It then forwards the packet with the response.
  • Where applicable, the load balancer handles SSL offload, which is the process of decrypting data encrypted with the Secure Sockets Layer protocol, so that servers don’t have to do it.
  • The process repeats until the session is over.
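The steps above can be sketched in a few lines of Python. The server addresses, the `forward` placeholder, and the random server choice are all hypothetical stand-ins; a real load balancer would make an actual network hop and use one of the algorithms described in the next section.

```python
import random

SERVER_POOL = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backends

def choose_server(pool):
    # Stand-in for any routing algorithm (Round Robin, Least Connections, ...).
    return random.choice(pool)

def forward(server, payload):
    # Placeholder for the real network hop to the chosen backend.
    return f"response from {server} for {payload!r}"

def handle_request(client_ip, payload):
    """Sketch of one pass through the request/response cycle above."""
    server = choose_server(SERVER_POOL)   # load balancer picks a backend
    response = forward(server, payload)   # backend handles the request
    return client_ip, response            # response returns to the client via the LB
```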

Load Balancing Algorithms


Different load balancing algorithms offer different benefits and complexity, depending on the use case. The most common load balancing algorithms are:

Round Robin

Distributes requests sequentially: each new request goes to the next server in line, and the chosen server moves to the back of the queue. The Round Robin algorithm suits pools of servers with equal specifications, but it doesn't consider the load already present on each server.
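A minimal Round Robin sketch, using Python's `itertools.cycle` to rotate through a hypothetical pool of equally capable servers:

```python
from itertools import cycle

# Hypothetical pool of servers with equal specifications.
servers = ["server-a", "server-b", "server-c"]

# cycle() yields the servers in order, wrapping back to the start --
# exactly the Round Robin rotation described above.
rotation = cycle(servers)

def next_server():
    """Return the next server in the Round Robin rotation."""
    return next(rotation)
```

Note that nothing here inspects how busy a server already is, which is precisely the limitation mentioned above.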

Least Connections

The Least Connections algorithm sends each new request to the server with the fewest active connections. The least connections method is used when the server pool carries many unevenly distributed persistent connections.
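A minimal sketch of Least Connections, assuming the balancer tracks a per-server active-connection count (the server names and counts below are hypothetical):

```python
# Hypothetical map of server name -> number of active connections.
active_connections = {"server-a": 12, "server-b": 3, "server-c": 7}

def least_connections():
    """Pick the server currently holding the fewest active connections."""
    return min(active_connections, key=active_connections.get)

def route_request():
    server = least_connections()
    active_connections[server] += 1  # the new request now counts as active
    return server
```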

Least Response Time

Least Response Time load balancing distributes requests to the server with the fewest active connections and with the fastest average response time to a health monitoring request. The response speed indicates how loaded the server is.
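One simple way to combine the two criteria is to compare servers first by active connections, then by health-check response time as the tie-breaker. The per-server stats below are hypothetical:

```python
# Hypothetical per-server stats:
# (active connections, avg response time in ms to a health-check probe).
stats = {
    "server-a": (4, 120.0),
    "server-b": (4, 35.0),
    "server-c": (9, 20.0),
}

def least_response_time():
    """Prefer the fewest active connections; break ties by the
    fastest average response to the health monitor."""
    # Tuples compare element by element, so connections win first,
    # then response time decides.
    return min(stats, key=lambda server: stats[server])
```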

Custom Load

The Custom Load algorithm queries individual servers for their current load via SNMP (Simple Network Management Protocol). The administrator defines which load metrics the load balancer takes into account when routing requests (e.g., CPU and memory usage, and response time).
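A sketch of the scoring step, assuming the per-server metrics have already been collected (in practice via SNMP) and combined under administrator-chosen weights. The metric values and weights below are illustrative, not a real configuration:

```python
# Hypothetical metrics an administrator might collect (in practice via SNMP).
metrics = {
    "server-a": {"cpu": 0.80, "memory": 0.60, "response_ms": 140},
    "server-b": {"cpu": 0.30, "memory": 0.40, "response_ms": 60},
}

# Administrator-defined weights for each metric (assumed values).
weights = {"cpu": 0.5, "memory": 0.3, "response_ms": 0.2}

def load_score(m):
    """Weighted load score; lower means less loaded."""
    return (weights["cpu"] * m["cpu"]
            + weights["memory"] * m["memory"]
            + weights["response_ms"] * (m["response_ms"] / 1000))

def custom_load_pick():
    """Route to the server with the lowest weighted load score."""
    return min(metrics, key=lambda server: load_score(metrics[server]))
```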


Hash

The Hash algorithm determines where to distribute requests based on a designated key, such as the client IP address, port number, or the request URL. The Hash method is used for applications that rely on user-specific stored information, for example, carts on e-commerce websites.
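A minimal hash-routing sketch: hashing the client IP means the same client always lands on the same server, which is what keeps user-specific state like a shopping cart on one backend. The server names are hypothetical:

```python
import hashlib

servers = ["server-a", "server-b", "server-c"]

def pick_server(key: str) -> str:
    """Map a designated key (e.g. the client IP) to a server deterministically."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

Because the mapping is purely a function of the key, no per-session state needs to be stored on the load balancer itself; the trade-off is that adding or removing a server reshuffles most keys (consistent hashing mitigates this).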
