KEY TAKEAWAYS
In this guide, we discuss WebSockets and other types of client-server connections. Learn how they work and where each is best applied.
In today’s extremely connected and constantly online world, we expect to have information instantly. Think of all the applications that we use to send messages or to receive live, up-to-date notifications in a single day. WebSockets are one of many different tools for building web applications that provide instant, real-time updates and communication.
What are WebSockets used for? The WebSocket Protocol establishes full-duplex, bidirectional communication between a client and server over a single connection. This two-way flow sets WebSockets apart from HTTP’s request/response model, and it means they can transfer data quickly and with little overhead. While there are many great uses for WebSockets, there are also environments where a different approach, like long polling, works better.
In this guide, we will explain what WebSockets are, and detail some of the benefits of using them for your real-time application. We will go over the best use cases to implement WebSockets, and discuss other options that you may want to use instead. By the end of this piece, you will have a clearer understanding of what WebSockets are used for and whether or not WebSockets will work for your application’s specific needs.
Drawbacks of WebSockets
While WebSockets sound like a fantastic way to approach real-time communication, it’s important to note some significant challenges that come with them.
The protocol includes no built-in mechanism for reconnecting when a connection is lost, or for load balancing.
Many proxy servers still do not offer support for WebSockets.
Unlike HTTP, WebSockets do not support caching.
It is still necessary to have fallback options, like HTTP streaming or long polling, in environments where WebSockets may not be supported.
Open source resources, like Socket.io, are not great for large-scale operations or quick growth.
Features like Presence do not work very well over WebSocket connections, because it is hard to detect disconnections.
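Because the protocol itself does not reconnect, that logic usually falls to client code. The following is a minimal sketch of one common approach, exponential backoff with a cap. The helper name `backoff_delays` and its default values are assumptions made for illustration and are not part of any WebSocket library.

```python
import itertools


def backoff_delays(base=1.0, factor=2.0, cap=30.0):
    """Yield reconnect delays in seconds, growing by `factor` up to `cap`.

    Hypothetical helper for illustration; the defaults are arbitrary.
    """
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# The first six waits a client would apply between reconnect attempts.
schedule = list(itertools.islice(backoff_delays(), 6))
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

In practice a client would likely add random jitter to each delay so that many clients do not reconnect at the same instant after an outage.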
Background - web servers, HTTP, and polling
To understand the WebSocket API, it is also important to understand the foundation it was built on – HTTP (Hypertext Transfer Protocol) and its request/response model. HTTP is an application layer protocol, and it is the basis for all web-based communication and data transfers.
When using HTTP, clients (such as web browsers) send requests to servers, and the servers send back messages known as responses. The web as we know it today was built on this basic client-server cycle, although there have been many additions and updates to HTTP to make it more interactive. Several versions of HTTP are in use today, including HTTP/1.1 and HTTP/2, and both can be secured with TLS, a combination known as HTTPS.
Basic HTTP requests work well for many use cases, such as when someone needs to search on a web page and receive back relevant, non-time-sensitive information on the subject. However, it is not always best suited for web applications that require real-time communication, or for data that needs to update quickly with minimal latency.
By default in HTTP/1.0, and whenever a connection is not kept alive, each new request from the client opens a new connection. Even when a connection is reused, every request and response carries its own headers. This is inefficient because it spends bandwidth on recurring non-payload data and increases latency between data transfers.
Additionally, HTTP requests can only flow in one direction—from the client side. There is traditionally no mechanism for the server to initiate communication with the client. The server is unable to send data to the client unless the client requests it first. This can create issues for use cases where messaging needs to go out in real time from the server side.
One of the first solutions for receiving regular data updates was HTTP polling. Polling is a technique where the client repeatedly sends requests to the server, typically at a fixed interval, and checks each response for an update. For example, all modern web browsers support XMLHttpRequest, one of the original browser APIs used to poll servers.
These earlier solutions were still not ideal for efficient real-time communication. Short polling is resource-intensive because every request re-sends non-payload data that must be parsed again, including the HTTP headers, the URL, and other repetitive information.
The next logical step to improve latency was HTTP long polling. When long polling, the client sends a request to the server, and that connection remains open until the server has new data. The server sends the response with the relevant information, and then the client immediately opens another request, which is held again until the next update. The connection is held open up to a maximum duration, given here as 280 seconds, before the client automatically sends another request. This method effectively emulates an HTTP server push.
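The hold-and-reissue cycle can be sketched without any network code. The example below is an in-process simulation rather than a real HTTP client or server: a queue stands in for the server’s source of new data, and the names `handle_long_poll`, `long_poll_client`, and `publisher`, along with the short timeouts, are assumptions chosen for illustration.

```python
import queue
import threading
import time

updates = queue.Queue()  # stands in for the server's source of new data


def handle_long_poll(hold_seconds):
    """Simulated server handler: hold the request until data arrives or time runs out."""
    try:
        return updates.get(timeout=hold_seconds)
    except queue.Empty:
        return None  # timed out with nothing to report


def long_poll_client(max_updates, hold_seconds=0.5):
    """Simulated client: issue the next request as soon as each response returns."""
    received = []
    while len(received) < max_updates:
        response = handle_long_poll(hold_seconds)
        if response is not None:
            received.append(response)
    return received


def publisher():
    """Produce three updates, spaced 0.1 seconds apart."""
    for i in range(3):
        time.sleep(0.1)
        updates.put(f"update-{i}")


threading.Thread(target=publisher, daemon=True).start()
received = long_poll_client(max_updates=3)
print(received)  # ['update-0', 'update-1', 'update-2']
```

Each update reaches the client as soon as it is published, because a request is already waiting on the server side when the data arrives.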
Long polling provides fast communication in many environments and is widely used, often as an alternative to true push-based methods like WebSocket connections or Server-Sent Events (SSE). Long polling can seem intensive on the server side, as it requires continuous resources to hold a connection open, but it uses far fewer resources than repeatedly sending short polling requests.
What are WebSockets used for?
WebSockets were designed to deliver real-time results efficiently. They work by establishing a persistent, full-duplex connection between a client and server. This reduces unnecessary network traffic, as data can immediately travel both ways through a single open connection, which provides speed and real-time capability on the web. WebSockets also enable servers to keep track of clients and “push” data to them as needed, which was not possible using only the HTTP request/response model.
WebSocket connections enable streaming of text strings and binary data via messages. Each message is sent as one or more frames, and each frame consists of a small header (2 to 14 bytes) followed by the payload data. Very little non-payload data gets sent across the existing network connection this way, which helps to reduce latency and overhead, especially when compared to HTTP request and streaming models.
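To make the low per-message overhead concrete, here is a minimal sketch of how a single text message is framed under RFC 6455. It builds only an unmasked, unfragmented frame of the kind a server sends; frames sent from client to server must additionally be masked, which is omitted here. The function name `encode_text_frame` is an illustrative assumption.

```python
import struct


def encode_text_frame(text):
    """Build one unmasked, final (FIN=1) text frame, as a server would send it."""
    payload = text.encode("utf-8")
    header = bytes([0x81])  # FIN bit set, opcode 0x1 (text)
    length = len(payload)
    if length <= 125:
        header += bytes([length])  # length fits in the 7-bit field
    elif length <= 0xFFFF:
        header += bytes([126]) + struct.pack("!H", length)  # 16-bit extended length
    else:
        header += bytes([127]) + struct.pack("!Q", length)  # 64-bit extended length
    return header + payload


frame = encode_text_frame("Hello")
print(frame.hex())  # 810548656c6c6f
```

The five-byte payload travels with only two bytes of framing, which is the overhead advantage over resending full HTTP headers on every exchange.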
Google Chrome was the first browser to include support for WebSockets, in 2009. RFC 6455, The WebSocket Protocol, was officially published in 2011. The WebSocket Protocol is standardized by the IETF and the WebSocket API by the W3C, and support across browsers is very common.
How WebSocket connections work
Before a client and server can exchange data, they must use TCP (Transmission Control Protocol) to establish a connection. WebSockets effectively run as a thin messaging layer on top of TCP.
A WebSocket connection begins with an opening handshake carried in an HTTP request/response pair over that TCP connection. The client uses an HTTP/1.1 mechanism called the Upgrade header to ask the server to switch the connection from HTTP to WebSockets. During the handshake, the client and server can also negotiate which subprotocol will be used for their subsequent interactions. Once the handshake completes, the connection runs on the WebSocket protocol.
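As part of the handshake, the server shows that it understood the upgrade request by deriving a Sec-WebSocket-Accept value from the Sec-WebSocket-Key header the client sent, as specified in RFC 6455. A minimal sketch of that derivation follows. The function name `accept_key` is an illustrative assumption, while the GUID constant and the sample key come from the RFC itself.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key):
    """Return the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from the example handshake in RFC 6455.
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The client checks this value before treating the connection as upgraded, which guards against responses from servers or intermediaries that do not actually speak WebSocket.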
It is important to note that when running on the WebSocket protocol layer, WebSockets require a uniform resource identifier (URI) to use a “ws:” or “wss:” scheme, similar to how HTTP URLs will always use a “http:” or “https:” scheme.
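A client can check the scheme before attempting to connect. The sketch below uses Python’s standard `urllib.parse` module. The helper name `parse_ws_uri` and the `example.com` addresses are assumptions for illustration, while the default ports (80 for “ws:” and 443 for “wss:”) follow RFC 6455.

```python
from urllib.parse import urlsplit

# Default ports per RFC 6455: ws uses 80, wss (TLS) uses 443.
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def parse_ws_uri(uri):
    """Return (host, port, secure) for a ws: or wss: URI, or raise ValueError."""
    parts = urlsplit(uri)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not a WebSocket URI: {uri!r}")
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    secure = parts.scheme == "wss"
    return parts.hostname, port, secure


print(parse_ws_uri("wss://example.com/chat"))  # ('example.com', 443, True)
```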
Reasons to consider WebSockets for real-time communication
WebSockets provide real-time updates and open lines of communication.
WebSockets are HTML5 compliant, and offer backwards compatibility with older HTML documents. They are supported by all modern web browsers, including Google Chrome, Mozilla Firefox, Apple Safari, and more.
WebSockets are also compatible across platforms—Android, iOS, web, and desktop apps.
A single server can have multiple WebSocket connections open simultaneously, and can even have multiple connections with the same client, which opens the door for scalability.
WebSockets can stream through many proxies and firewalls.
PubNub’s thoughts on WebSockets vs. Long Polling
PubNub takes a protocol-agnostic stance, but in our current operations we have found that long polling is actually the best bet for most use cases. This is partly because of the maintenance and upkeep required to scale WebSockets, and potential issues that can arise when you cannot easily identify a disconnection. WebSockets are a great tool, but long polling works reliably in every situation.
PubNub uses long polling to ensure reliability, security, and scalability in all networking environments, not just most. Long polling can be just as efficient as WebSockets in many real-world, real-time implementations. In fact, we have developed a method for efficient long polling, written in C and with multiple kernel optimizations for scale.
PubNub is a real-time communications platform that provides the foundation for authentic virtual experiences, like live updates, in-app chat, push notifications, and more. The building block structure of our platform allows for extra features like Presence, operational dashboards, or geolocation to be incorporated. PubNub also makes it extremely easy to scale, especially compared to socket frameworks like Socket.io or SockJS.
To conclude, WebSockets are a very useful protocol for building real-time functionality across web, mobile, and desktop variants, but they are not a one-size-fits-all approach. WebSockets are just one tool in a larger arsenal for developing real-time, communication-based applications. It is possible to build on the basic WebSocket protocol and incorporate other methods, like SSE or long polling, to construct an even better, more scalable real-time application. The problem is that the shortcomings of WebSockets can be difficult to manage if you are not already an expert in building real-time systems.
Using PubNub saves significant development time and maintenance costs, speeding up time to market and reducing the complexity of what your engineering team will need to develop, manage and grow.
If you are interested in using PubNub to build and power your real-time application, please contact our team.