Building a Server Health Monitoring Tool with PubNub
This is a guest post from Brandon Scott, a software and web developer who currently attends Bournemouth University in Bournemouth, England.
For our project, we built a comprehensive server health monitoring suite with a backend service (Cadence) and a client-side app (Pulse). We didn’t want to just check whether the server was alive, but also to get useful statistics for reporting, specifically to check whether the system was under heavy load. Most importantly, the system had to be extensible and platform-agnostic so that it was accessible to as many people as possible on an array of devices. We also wanted real-time reporting, as there’s little point in a server health monitoring tool that tells users a server is under abnormally heavy load, or offline, five minutes after the fact. Reporting had to be instant. The entire Cadence project is open source, and all the code is available at the bottom of this blog post.
We designed a three-pronged system called Cadence, centered around an extensible, RESTful API. Cadence acted as the backend service, offering up information to client requests as needed and acting as an intermediary between the server and the user. The API was joined by two other server-side applications: a Server Host and the Availability Service. The host is installed on each server that is to be monitored and reports directly to the API via a ‘Pulse’ mechanism at a regular, pre-defined interval. The Availability Service runs on a dedicated server and monitors all connected Cadence Server Hosts.
This is where PubNub came in. During our research period in March, we were looking for technologies that could provide the type of real-time updates that Cadence needed. After trying a few of these services, such as Azure Service Bus, we found that they just didn’t fit well with our product: interoperability between languages was virtually nonexistent, with .NET showing a response as one thing and Java as another. The primary focus, again, was extensibility. I remembered a local hack day I went to at the end of 2013 (HackBMTH), where my friend and University colleague, Chris Franklin, was playing around with a service called PubNub.
Integrating PubNub into Cadence
PubNub was perfect for Cadence. It uses persistent socket connections to stream data to subscribed channels. It’s slick, and with the company’s extensive support it was simple to integrate into our product. There were libraries for a vast array of languages, including Python and C#, which allowed us to provide monitoring data for OS X, Linux and Windows.
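To give a feel for what a Server Host does on each beat, here is a minimal Python sketch using only the standard library. It is not the actual Cadence host: the field names, the `build_pulse` and `run` helpers, and the default interval are all assumptions made for illustration, and `send` stands in for whatever transport is used (a PubNub publish call or an HTTP request to the API).

```python
import json
import os
import platform
import time


def build_pulse(server_id: str) -> dict:
    """Collect a small snapshot of host statistics (field names are hypothetical)."""
    pulse = {
        "server_id": server_id,
        "timestamp": int(time.time()),
        "platform": platform.system(),  # e.g. "Linux", "Darwin" or "Windows"
        "cpu_count": os.cpu_count(),
    }
    try:
        # The 1-minute load average is only available on Unix-like systems.
        pulse["load_avg_1m"] = os.getloadavg()[0]
    except (AttributeError, OSError):
        pass  # Windows, or the load average could not be read
    return pulse


def run(server_id: str, send, interval_seconds: float = 5.0, beats: int = 3) -> None:
    """Send `beats` pulses, one every `interval_seconds`, through `send`."""
    for _ in range(beats):
        send(json.dumps(build_pulse(server_id)))
        time.sleep(interval_seconds)
```

Calling `run("web-01", print)` would print three JSON pulses five seconds apart. Keeping the transport behind `send` means the same loop works whichever channel the data is sent over.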
Moreover, we had access to additional features like Presence. Presence formed the basis for notifying the API when a server had gone offline: it let us watch subscriptions on every channel to see whether a server had joined, disconnected gracefully or timed out (an immediate shutdown or power outage). The Availability Service listened for these time-out events and notified users who had subscribed to SMS or voice call notifications using Twilio. The features from these two awesome services transformed Cadence from an on-demand, interaction-driven application into a complete, proactive solution that alerts users to problems as they happen.
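The join / leave / time-out distinction can be sketched as a small handler. PubNub presence events carry an `action` field (such as `join`, `leave` or `timeout`) and the `uuid` of the client concerned. Everything else here is an assumption for illustration: the status labels, the `handle_presence` name, and the `alert` callback, which stands in for the Twilio notification step. This is not the Availability Service's actual code.

```python
from typing import Callable, Dict

# Map presence actions to a server status. The labels are hypothetical.
STATUS_BY_ACTION = {
    "join": "online",
    "leave": "offline (graceful)",
    "timeout": "offline (unexpected)",
}


def handle_presence(
    event: dict,
    statuses: Dict[str, str],
    alert: Callable[[str], None],
) -> None:
    """Update `statuses` from one presence event and alert on time-outs."""
    action = event.get("action")
    server = event.get("uuid")
    status = STATUS_BY_ACTION.get(action)
    if status is None or server is None:
        return  # ignore actions we do not track and malformed events
    statuses[server] = status
    if action == "timeout":
        # A time-out suggests a crash or power loss rather than a clean shutdown.
        alert(f"Server {server} stopped responding")
```

Only the time-out path raises an alert, so a planned shutdown that disconnects gracefully would not page anyone.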
The client-side app that surfaces Cadence data is called Pulse. For the project, and as of writing, Pulse is available both as an Android client and as a website. Pulse works on an opt-in user subscription model: users choose via the web panel which server groups to subscribe to, and those groups show up in the Android app. The user’s subscriptions are managed via Cadence authentication, which supplies the client with an identifier for future API calls to other resource controllers. The Pulse app also allows users to view usage history, showing how statistics have changed over the past hour, day or week. This lets users review the history of their servers for periods when they were not actively watching the application.
The UI/UX for Pulse was really important to us; we wanted to convey the information to the user in the most understandable and intuitive way. Each server screen displays three primary bubbles showing the main server statistics, which are color coded and updated in real time so users can see at a glance what the server load is. PubNub provided a foundation for Pulse in that it is incredibly responsive. The interface updates less than a second after the Server Host reports data back to the Cadence API, giving users a really fluid and interactive experience.
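As an illustration of the color coding, a statistic expressed as a percentage could be mapped to a bubble color with a function like the one below. The thresholds, the color names and the `bubble_color` helper are assumptions for this sketch; the values Pulse actually uses are not given here.

```python
def bubble_color(percent: float, warn_at: float = 60.0, critical_at: float = 85.0) -> str:
    """Return a color name for a usage percentage (thresholds are hypothetical)."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"percent must be between 0 and 100, got {percent}")
    if percent >= critical_at:
        return "red"
    if percent >= warn_at:
        return "amber"
    return "green"
```

Because the function is pure, it can be re-run on every incoming update to recolor a bubble in place.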
Extensibility & Accessibility
As I’ve said a couple of times, extensibility was a really important characteristic of the Pulse and Cadence solution. Why? It supports accessibility on all platforms, for all users on all devices. The API can be accessed from any client that can send an HTTP request and parse a JSON response. Attributes can be added to Cadence over time to report statistics that might be more relevant to one user base than another.
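To show how little a client needs, here is a minimal Python sketch that fetches and parses a JSON response using only the standard library. The response fields (`name`, `cpu`, `online`) and the helper names are hypothetical; the real Cadence API schema and URLs are not shown in this post, so treat this as the general shape of a client rather than working integration code.

```python
import json
import urllib.request


def parse_status(body: str) -> dict:
    """Turn a JSON response body into a normalized status record.

    The field names are assumptions, not the real Cadence schema.
    """
    data = json.loads(body)
    return {
        "name": data["name"],
        "cpu_percent": float(data["cpu"]),
        "online": bool(data["online"]),
    }


def fetch_status(url: str, timeout: float = 5.0) -> dict:
    """Send a GET request to `url` and parse the JSON it returns."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_status(response.read().decode("utf-8"))
```

Anything that can do the equivalent of these two steps, whether an Android app, a web panel or a shell script, can act as a client.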
Cadence is an open source project that gives users the ability to report, store and expose their data on their own terms. The user owns their data, not a third-party service. It’s important that we take care of our own privacy and that sensitive data isn’t subject to marketing algorithms. All of the code we developed has been released as open source, with all the GitHub repos made public at the end of the project:
- Cadence API
- Cadence Server Host (C#)
- Cadence Server Host (Python)
- Cadence Availability Service
- Pulse (Android App)
- Pulse (Web Panel)
About the project
As part of a University module entitled ‘Project Management & Teamworking’, students worked in groups of seven to plan, design and implement a server monitoring application over a ten-day period. The focus of this exercise was to learn about the challenges, issues and obstacles involved in working in a team, and how some methodologies, such as Scrum, can prove invaluable. Despite this being one of the more theoretical modules on our Software Engineering course, we had full creative control over what products we made, with a basic example being a simple server/client ‘ping’ application.