What is a Data Firehose API?

4 min read | Joe Hanson on Nov 14, 2014

This blog post gives an overview of a data firehose API, how it works, use cases, and how to build one yourself from any data source.

Data is always changing. To cope with ever-changing data, we need a way to keep streaming it, as it updates, to client applications, dashboards, Internet of Things devices, and more. A firehose, if you will.

And that’s exactly what a data firehose API does. The firehose API is a steady stream of all available data from a source in real time: a giant spigot that delivers data to any number of subscribers at a time. The stream is constant, delivering new and updated data as it happens.

The amount of data in the firehose can vary with spikes and lows, but nonetheless, the data continues to flow through the firehose until it is crunched. Once crunched, that data can be visualized, published, graphed; really anything you want to do with it, all in real time.
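To make the idea concrete, here is a minimal in-memory sketch of a firehose in Python. It is not PubNub's API: the `Firehose` and `RunningAverage` names and the message shape are hypothetical, chosen only for illustration. It shows one source delivering every message to every subscriber, with one subscriber "crunching" the stream into a running average.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


class Firehose:
    """Hypothetical in-memory firehose: every message goes to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Message], None]] = []

    def subscribe(self, callback: Callable[[Message], None]) -> None:
        # Register a callable that will receive each published message.
        self._subscribers.append(callback)

    def publish(self, message: Message) -> None:
        # Deliver the message to all current subscribers, in subscription order.
        for callback in list(self._subscribers):
            callback(message)


class RunningAverage:
    """A subscriber that crunches the stream into a running average."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def __call__(self, message: Message) -> None:
        self.count += 1
        self.total += message["value"]

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


hose = Firehose()
received: List[Message] = []   # subscriber 1: keeps the raw messages
averager = RunningAverage()    # subscriber 2: crunches them
hose.subscribe(received.append)
hose.subscribe(averager)

for value in (10.0, 20.0, 30.0):
    hose.publish({"value": value})
```

A real firehose would deliver messages over the network rather than through direct function calls, but the shape is the same: publishers push, and every subscriber sees every update as it happens.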

Real World Firehose API Examples

The best-known firehose API is the Twitter firehose, an API that delivers 100% of tweets to end users in real time. At an average of 5,700 tweets per second, the Twitter firehose represents the massive volume of data that can stream over a data firehose.

Access to the full Twitter firehose has been granted to only a few companies (GNIP and DataSift, to name two). However, Twitter also has a Streaming API that streams a sample of all tweets, which is still substantial given the scale of Twitter’s user base.

Other than Twitter, there are thousands of other use cases for the data firehose. Pretty much any source of data can be turned into a stream, including:

  • Weather and temperature data
  • Stock quote prices
  • Public transportation time and location data
  • RSS and blog feeds
  • Multiplayer game player position and state
  • Internet of Things sensor network data

Luckily for you, many data providers offer an API for their data that is open to developers. We’ve seen thousands of web and mobile applications tap into these APIs (for example, Weather Underground’s weather API). However, they don’t stream this data in real time. Updates are sometimes pushed minutes after they actually happen, or the apps are static. And what good is data if it’s outdated?

That’s where firehosing this data comes into play, and we can see why it’s so powerful.

A Historical Look at Data Streaming and Why It Matters

Believe it or not, we haven’t always been able to stream media. HTTP was originally designed to transfer only text. However, in the 1990s, we saw the rise of the layman’s internet, and with it, the demand for data transfer capabilities.

In 1992, the multipart content-type was introduced, and we could now transfer audio, images, and video. But we were still missing something: we couldn’t stream that data. As a result, the Real Time Streaming Protocol (RTSP) was born in 1996. And the rest is history!

Today, users expect not just their apps, but their data, to be real-time. One of the ways we stream and syndicate big data is with the firehose. We now have the ability to stream massive volumes of data in real time, and to build in analytics and syndication of those streams. Developers can slice and dice data streams as they want, giving them fine-grained control over who receives which streams, down to the individual subscriber.

It’s fast, it’s efficient, and it’s powerful.

Syndicating Streams

The ability to syndicate a data firehose is just as powerful as streaming it. Imagine your main data firehose feeding smaller hoses, each going to an individual subscriber. Subscribers can specify which streams they want, or developers can control who receives which streams, at any time.

This enables developers to build accurate, robust real-time applications like stock quote applications, real-time statistic dashboards, or content curation platforms. We can now stream any amount of data in real time, to any number of subscribers.
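As a rough sketch of syndication, the Python below splits one main stream into smaller per-subscriber streams using filter functions. Everything here is an assumption made for illustration: the `SyndicatedFirehose` class, the predicate-based routing, and the made-up stock symbols are not taken from PubNub or any real service.

```python
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]
Predicate = Callable[[Message], bool]
Callback = Callable[[Message], None]


class SyndicatedFirehose:
    """Hypothetical main hose that feeds smaller, filtered hoses."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Predicate, Callback]] = []

    def subscribe(self, predicate: Predicate, callback: Callback) -> None:
        # Each subscriber declares which messages it wants via a predicate.
        self._routes.append((predicate, callback))

    def publish(self, message: Message) -> int:
        # Deliver only to subscribers whose predicate accepts the message.
        delivered = 0
        for predicate, callback in self._routes:
            if predicate(message):
                callback(message)
                delivered += 1
        return delivered


hose = SyndicatedFirehose()
watchlist_quotes: List[Message] = []  # wants two (made-up) symbols only
all_quotes: List[Message] = []        # wants the full stream

hose.subscribe(lambda m: m["symbol"] in {"AAA", "BBB"}, watchlist_quotes.append)
hose.subscribe(lambda m: True, all_quotes.append)

for symbol, price in [("AAA", 10.0), ("ZZZ", 5.0), ("BBB", 7.5)]:
    hose.publish({"symbol": symbol, "price": price})
```

Keeping the filter on the publishing side means a subscriber never receives data it did not ask for, which is one way to get per-subscriber control over who receives what.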

Turning a Data Source Into a Firehose API

This is the fun part for developers. You can take essentially any source of data with an API and turn it into a data firehose. Step 1 is to connect to the PubNub Data Stream Network (you’ll first need to sign up for a PubNub account). Step 2 is to retrieve the data from the data API of your choice. And Step 3 is to publish the data.
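The retrieve-and-publish loop in Steps 2 and 3 might look something like the Python sketch below. It deliberately avoids any real SDK: `run_firehose`, the `fetch` and `publish` callables, and the `demo_firehose` channel name are all hypothetical stand-ins, and publishing only when a reading changes is a design assumption rather than something the steps require. In practice, `publish` would wrap your SDK's publish call after Step 1's connection is set up, and `fetch` would call the data API of your choice.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]


def run_firehose(
    fetch: Callable[[], Message],
    publish: Callable[[str, Message], None],
    polls: int,
    channel: str = "demo_firehose",   # hypothetical channel name
    interval_seconds: float = 0.0,
) -> int:
    """Poll a data source and publish each new reading to a channel.

    Returns the number of messages published.
    """
    last = None
    published = 0
    for _ in range(polls):
        reading = fetch()               # Step 2: retrieve the data
        if reading != last:             # assumption: skip unchanged readings
            publish(channel, reading)   # Step 3: publish the data
            published += 1
            last = reading
        if interval_seconds > 0:
            time.sleep(interval_seconds)
    return published


# Stand-ins so the sketch runs without a network connection or an account.
fake_readings = iter([{"temp_f": 70}, {"temp_f": 70}, {"temp_f": 71}])
sent: List[Tuple[str, Message]] = []

count = run_firehose(
    fetch=lambda: next(fake_readings),
    publish=lambda ch, msg: sent.append((ch, msg)),
    polls=3,
)
```

Passing `fetch` and `publish` in as arguments keeps the loop independent of any particular data source or messaging SDK, so either side can be swapped out without touching the other.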

We’ve got a great tutorial on Turning Any Data Source Into a Firehose API. Our tutorial uses Python, but you can use JavaScript, Ruby, or any of our other 50+ SDKs.

Firehose API In Action: Demo Firehose Data Streams

In addition to our tutorial on building your own firehose API, we’ve created a bunch of demo data streams for your consumption. Some of them use real data, and a couple use simulation data (plug in the API of your choice).

We’ve got Twitter, weather, Wikipedia edits, and more, all of which can be seen on our firehose API demos page. In the coming months we’ll be rolling out more Data Streams. If you’re interested in having yours listed on our site, or have ideas for the types of live data you’d like to see made publicly available, please drop us a line.
