How to Utilize Scalability and Load Balancing
2022 saw a reported 6.4 billion smartphone users worldwide, and that number is increasing daily. To meet usage growth, developers must ensure their applications can handle increasing workloads while continuing to deliver unmatched user experiences.
Scalable applications allow developers to prepare in advance for any workload spikes. Choosing when and how to scale is essential to avoid wasting resources.
Before we get into the intricacies of scalable applications, let’s look at what scalability means in application development and why it's important to prioritize scaling.
What is scalability?
Scalability is your system’s ability to efficiently handle increasing or decreasing numbers of users, transactions, or data volume. Building scalable applications offers both technical and business benefits:
Save time and resources by incorporating functions that adjust capacity to demand in real time.
Establish yourself as a reliable solutions provider by ensuring minimal downtime during demand spikes.
Reduce costs by paying only for the resources you use.
Types of scaling
Developers can choose from two scaling methods: vertical and horizontal scaling.
Vertical scaling (or scaling up) increases the capacity of a single server or resource to handle expanding workloads—for example, upgrading the server's CPU for more processing power. Commonly used by small to midsize dev teams, vertical scaling offers benefits that include:
An easy-to-implement upgrade
Reduced costs since you’re only upgrading specific components
No added responsibilities that come with installing a completely new system
But vertical scaling also brings with it certain limitations:
Scalability is limited by how much processing power your existing system can support.
Relying solely on vertical scaling can create a single point of failure, risking longer downtime and permanent data loss.
You may end up paying for unused resources when traffic drops.
Horizontal scaling (or scaling out) adds more servers or nodes to the existing system to meet increasing demands—for example, adding a server to a database cluster. Combining the power of multiple machines lets dev teams:
Scale up or down seamlessly without worrying about downtime.
Manage costs more efficiently because you only pay for what you’re using.
Improve fault tolerance because other systems remain active even if one goes down.
But horizontal scaling also introduces some complications:
More servers and nodes mean complex day-to-day server management.
Increased risk of attack because more servers equal a larger attack surface.
The installation process, which includes load balancers and virtual instances, is more complex.
Choosing the right scaling approach depends on your current and future business needs. Most dev teams prefer horizontal scaling because it offers near-unlimited scaling potential and stays reliable even during unexpected traffic spikes. However, you have to distribute the workload evenly across your servers, which is where tools like load balancers come into play.
What is load balancing?
Load balancing is a performance optimization method that distributes incoming users, transactions, and data across your server pool. With load balancing, you can:
Check if servers can handle the load before pushing traffic their way.
Run an on-demand infrastructure that automatically adds or removes servers.
Route traffic from the client to the server while protecting data from external attacks.
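The health-check behavior described above can be sketched in a few lines. This is a minimal illustration, not a real load balancer API: the server names and the mark/route methods are assumptions made for the example.

```python
import random

class LoadBalancer:
    """Minimal sketch: route requests only to backends that
    passed their most recent health check."""

    def __init__(self, servers):
        # Track each backend and whether its last health check passed.
        self.health = {server: True for server in servers}

    def mark(self, server, healthy):
        # Record the result of a periodic health check.
        self.health[server] = healthy

    def route(self, request):
        # Consider only servers that passed their last health check.
        healthy = [s for s, ok in self.health.items() if ok]
        if not healthy:
            raise RuntimeError("no healthy backends available")
        return random.choice(healthy)

lb = LoadBalancer(["app-1", "app-2", "app-3"])
lb.mark("app-2", False)      # app-2 failed its health check
target = lb.route("GET /")   # never routes to app-2
```

Real load balancers run these health checks continuously in the background and re-admit a server once it starts responding again.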
Types of load balancing
There are four main categories of load balancing, depending on the type of traffic they handle:
Application load balancing checks incoming requests to identify which server is best suited to handle the request.
Network load balancing stores IP addresses and other network information to track incoming traffic sources. This helps reduce latency when routing to the optimal server.
Global server load balancing manages server loads in multiple data centers and third-party cloud providers around the globe.
Domain Name System (DNS) load balancing ensures high availability by balancing network requests across resources on your domain (website, email system, etc.).
Load balancers identify in real time which servers can handle the client request while ensuring no single server is overwhelmed. Servers can be on-premises, in a data center, or in a virtual instance.
Development teams use load balancers to manage the flow of information between the servers and an endpoint (PC, laptop, mobile, etc.). Without a load balancer, traffic may be unintentionally routed to a single server, overwhelming the system and leaving your application unresponsive.
There are two types of load balancers:
Hardware load balancers are physical appliances added to your data center to securely route traffic to different servers.
Software load balancers are applications installed in your data centers that perform all load-balancing functions.
Handling traffic spikes and application growth with load balancers
Traffic spikes are sudden surges in incoming requests, users, or data in an application. Several factors can cause traffic spikes, including seasonal events, marketing campaigns, and product launches. If your system can’t handle traffic spikes, it creates negative user experiences that lead directly to user drop-off, missed sales, and lost conversions and engagement.
Load balancers act as the middleman enabling effective workload distribution among servers. During heavy workloads, load balancers route traffic to servers with fewer connections or dynamically add servers in real time. When workloads are lower, they remove servers to reduce overall costs.
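The add-and-remove decision above amounts to keeping just enough servers to stay under each server's capacity. Here's a minimal sketch of that logic; the function name, thresholds, and per-server capacity are illustrative assumptions, not any provider's actual autoscaling policy.

```python
def desired_server_count(active_connections, per_server_capacity,
                         min_servers=1, max_servers=10):
    """Sketch of a scale-out/scale-in decision: provision enough
    servers to keep each below its connection capacity, clamped
    to a configured minimum and maximum."""
    # Ceiling division: round up so no server exceeds its capacity.
    needed = -(-active_connections // per_server_capacity)
    return max(min_servers, min(max_servers, needed))

# During a spike: 950 connections at 100 per server -> scale out to 10.
print(desired_server_count(950, 100))   # 10
# When traffic drops, servers are released to cut costs.
print(desired_server_count(120, 100))   # 2
```

In practice, autoscalers also smooth these decisions over time (cooldown periods) so that brief fluctuations don't cause servers to be added and removed repeatedly.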
Load balancing algorithms
Load balancers use a predefined logic called a load balancing algorithm to redirect traffic between servers. Load balancing algorithms follow two distinct approaches:
Dynamic load balancing algorithms distribute the workload based on current traffic conditions. Workload is sent to the server with the fewest active connections, the lowest response time, or the least resource usage.
Static load balancing algorithms send equal traffic to each server in a predefined order or at random. You can prioritize specific servers over others based on their capacity or follow a simple round robin logic.
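The two approaches can be contrasted in a short sketch. The server names and connection counts below are illustrative assumptions made for the example.

```python
import itertools

servers = ["app-1", "app-2", "app-3"]

# Static: round robin cycles through servers in a fixed order,
# ignoring how busy each one currently is.
rr = itertools.cycle(servers)
static_picks = [next(rr) for _ in range(4)]
print(static_picks)  # ['app-1', 'app-2', 'app-3', 'app-1']

# Dynamic: least connections picks whichever server has the
# fewest active connections right now.
connections = {"app-1": 12, "app-2": 3, "app-3": 7}
dynamic_pick = min(connections, key=connections.get)
print(dynamic_pick)  # app-2
```

Static algorithms are simpler and predictable; dynamic algorithms need live server metrics but adapt better when requests vary widely in cost.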
Case study: Pokémon Go and load balancing
Niantic, the company behind Pokémon Go, initially used Google's Network Load Balancer (NLB) but encountered issues such as slow Secure Sockets Layer (SSL) termination. Migrating to Google Cloud Load Balancing (GCLB) became a priority after a SYN flood attack. During the migration, however, Niantic had to adjust for traffic surges again, as client demand was 200% higher than observed, leading to a cascading failure scenario.
Google's Traffic Site Reliability Engineers (SREs) took steps to resolve the issue by isolating Google Front Ends (GFEs) and limiting traffic. Both companies made system changes and emphasized measuring workload as close to clients as possible for future scalability.
Better experiences with smarter scaling
Scalability ensures optimal application performance while adapting to changing workloads. Between the two available scaling methods, developers prefer horizontal scaling because it offers greater scaling potential without eating into your budget. Horizontal scaling relies on load balancers and load balancing algorithms to ensure even workload distribution and optimal server performance.
Implementing horizontal scaling capabilities can become complex when you’re adding more servers. It makes sense to partner with a third-party infrastructure provider to manage the nitty-gritty while your development teams focus on application development. PubNub provides an API platform with real-time infrastructure to host your live chats, events, and other real-time applications. With over 15 points of presence worldwide supporting 800 million monthly active users and 99.999% reliability, you’ll never have to worry about outages, concurrency limits, or any latency issues caused by traffic spikes while using PubNub.