I’ve had a fair number of conversations with our customers to develop PubNub case studies, and I’ve deduced a common theme embedded in each of their stories. Most of them have first experimented with building out a real-time data stream network infrastructure in-house, and have run into a series of common roadblocks in doing so. This blog post will categorize the issues you may run into when building a real-time data stream infrastructure on your own.
Defining Build vs Buy
Before we dive into this blog post, it’s important to first define what it means to build and what it means to buy.
Build: Building a data stream network on your own means you’re taking a protocol, tool, or framework that already exists in an open source community (like Socket.IO, SignalR, etc.), then installing and orchestrating that service on physical or virtualized hardware while maintaining platform enhancements yourself.
Buy: Buying means using a real-time data stream network service provider, which sets up and maintains the servers, installs and updates the software, and handles scaling on your behalf.
Why you might want to build a WebSocket solution yourself
We love, respect and embrace the spirit of DIY innovation. After all, that’s how PubNub was born. Installing and operating a socket server for the first time is fun and fulfilling. However, as scale increases, the DIY solution typically fails due to orchestration challenges.
The purpose of this blog post is not to discourage building a real-time data stream network on your own. Rather, it’s to point out some of the issues you may face if you elect to build it yourself, and share how many developers have gotten past those issues by using a real-time data stream network service provider.
Often, one of the first things on your mind when deciding whether to build or buy is upfront cost: how much will it cost to build it yourself from scratch versus buying it from a real-time data stream service provider? As you well know, though, it’s not just the initial cost that matters, but the total cost down the road. Here are a few things to consider when estimating those initial and down-the-road costs.
Problems you may face when building it yourself
There’s not much code you have to write to get a real-time system set up other than some ops code (real-time open source protocols and frameworks are pretty streamlined these days). When you start with an open source protocol or framework, you can carry out basic real-time functionality on your socket-type broadcast server. These functions include publish/subscribe, unicast and broadcast messaging.
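To make those three functions concrete, here is a minimal sketch of a toy in-memory broker. It is only an illustration under stated assumptions: the `Broker` class and its method names are hypothetical and are not the API of Socket.IO, SignalR, or PubNub, and there is no networking involved.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict

# A handler is any callable that accepts the message text.
Handler = Callable[[str], None]


class Broker:
    """Toy in-memory broker (hypothetical, illustration only)."""

    def __init__(self) -> None:
        # channel name -> {client id -> handler}
        self._channels: DefaultDict[str, Dict[str, Handler]] = defaultdict(dict)

    def subscribe(self, channel: str, client_id: str, handler: Handler) -> None:
        """Register a client's handler on a channel."""
        self._channels[channel][client_id] = handler

    def publish(self, channel: str, message: str) -> int:
        """Deliver to every subscriber of one channel; return the delivery count."""
        handlers = list(self._channels.get(channel, {}).values())
        for handler in handlers:
            handler(message)
        return len(handlers)

    def unicast(self, channel: str, client_id: str, message: str) -> bool:
        """Deliver to a single subscriber; return False if it is not subscribed."""
        handler = self._channels.get(channel, {}).get(client_id)
        if handler is None:
            return False
        handler(message)
        return True

    def broadcast(self, message: str) -> int:
        """Deliver to every channel. A client subscribed to several
        channels receives the message once per channel."""
        return sum(self.publish(name, message) for name in list(self._channels))
```

Getting this far really is the easy part. Everything this sketch leaves out (connections, reconnects, multiple servers, security) is where the costs discussed below come from.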
But building out a fully functional data stream network on your own isn’t as easy or cheap as you may think. There are a number of considerations and costs that need to be taken into account when building it yourself. It’s important to weigh the total cost of ownership for your solution.
- Engineering & Operations Setup Time: Time is money, yet time is an often overlooked cost of building a real-time infrastructure on your own. The days you and your development team put into building infrastructure add up quickly. It’s hard labor, and you’re the one lifting the shovel and scooping out the dirt.
- Maintenance and Orchestration for WebSocket Servers: Maintaining and orchestrating a data stream network is an entire job in itself. This includes making sure software and versions are always current, compatible and tested with the hardware you’re running on. Socket-based services require heavy orchestration behind the scenes at scale, and they require a dedicated individual or team to ensure your network is reliable, secure and up-to-date.
- WebSocket Server Costs: Once you begin building a user base and are no longer running your app on your local network, you’ll need hosted servers from a provider such as Amazon, SoftLayer, Joyent, Azure or Rackspace. This means that early in your build process, you’re already spending money on servers. As you scale out your infrastructure, server costs and maintenance requirements grow along with it.
- Security and WebSocket SSL: Security is paramount no matter what type of real-time app you’re developing. Message, transaction and data streaming information must be kept private from end-to-end. To ensure this, your real-time infrastructure must include various encryption and access control options to manage all stream data.
- Deep Technical Expertise: Building, maintaining and scaling a data stream network is a behemoth job in itself that doesn’t just require time and money, but a deep understanding of real-time architecture as well. And as you add more features and functionality, the laundry list of technical requirements will continue to add up.
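One rough way to weigh these items against each other is to put them into a simple total-cost-of-ownership calculation. The sketch below is an assumption-laden illustration, not a pricing model: the `BuildCosts` fields and both functions are hypothetical, the formula is deliberately linear, and it ignores the way server and maintenance costs grow as you scale. Plug in your own figures.

```python
from dataclasses import dataclass


@dataclass
class BuildCosts:
    """Hypothetical inputs for a build-it-yourself estimate."""
    setup_engineer_days: float       # one-time engineering and ops setup
    engineer_day_rate: float         # cost of one engineer-day
    monthly_server_cost: float       # hosting bill per month
    monthly_maintenance_days: float  # engineer-days spent on upkeep per month


def build_tco(costs: BuildCosts, months: int) -> float:
    """One-time setup cost plus recurring server and maintenance costs."""
    setup = costs.setup_engineer_days * costs.engineer_day_rate
    monthly = (costs.monthly_server_cost
               + costs.monthly_maintenance_days * costs.engineer_day_rate)
    return setup + monthly * months


def buy_tco(monthly_fee: float, integration_days: float,
            engineer_day_rate: float, months: int) -> float:
    """One-time integration effort plus the provider's recurring fee."""
    return integration_days * engineer_day_rate + monthly_fee * months
```

Comparing `build_tco(...)` and `buy_tco(...)` over the same number of months gives a first-order picture. The security and expertise items above are harder to quantify and are not captured here.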
Advantages of buying a WebSocket solution on a Global Data Stream Network
If a primary concern of buying a service is upfront cost, it shouldn’t be. Many real-time service providers offer a free sandbox developer tier where you can build and test your real-time app free of charge, for as long as you need to.
When you decide to move your app from the lab and deploy it into production, scale and orchestration need to be taken into consideration. When you use a data stream network service provider, they handle this in all its complexity so you don’t have to. It’s their job to set up and maintain servers, install and update software and versions, upgrade features and add new service capabilities, and perform any and all necessary maintenance.
You may also run into the issue of picking the correct framework or protocol for your exact need. A data stream service provider will often offer SDKs for whatever platform or device you want your app to run on. These SDKs are typically open source (under the MIT license, for example) and can be modified as needed. They are also monitored and updated continuously.
Reliability and scalability
Reliability and scalability are critical. A reliable data stream network is essential for a consistent, fast real-time user experience. When deploying and scaling apps to thousands of simultaneous users, that means a global presence, with data centers spread across different regions of the world. If you’re targeting a single region exclusively, one data center may be able to handle that user base. Otherwise, it’s important to make sure the network can scale easily once launched.
Redundancy is another key component. If you rely on a single data center and it goes down, your data stream network goes down with it. Data stream network service providers offer globally redundant networks, so if one data center degrades, real-time data has already been replicated to data centers in other regions and messages can still be delivered.
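From the client’s point of view, regional redundancy often boils down to trying another region when the first one fails. Here is a minimal sketch of that idea, assuming each region is represented by a hypothetical `send` callable that raises `ConnectionError` when the region is unreachable. The function name and the region labels are made up for illustration, and real providers typically handle this routing for you.

```python
from typing import Callable, List, Sequence, Tuple

# (region name, function that sends a message to that region)
Region = Tuple[str, Callable[[str], None]]


def publish_with_failover(message: str, regions: Sequence[Region]) -> str:
    """Try each region in order; return the name of the first that accepts."""
    errors: List[str] = []
    for name, send in regions:
        try:
            send(message)
            return name
        except ConnectionError as exc:
            # Remember the failure and fall through to the next region.
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all regions failed: " + "; ".join(errors))
```

Note that this only covers rerouting. It does nothing about replicating the messages themselves across regions, which is the harder part of building redundancy on your own.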