on Nov 14, 2014
This blog post gives an overview of the data firehose API: what it is, how it works, common use cases, and how to build one yourself from any data source.

Data is always changing. To keep up, we need a way to stream data as it updates to client applications, dashboards, Internet of Things devices, and more. A firehose, if you will.

And that’s exactly what a data firehose API does. A firehose API is a steady stream of all available data from a source, delivered in real time: a giant spigot that pushes data to any number of subscribers at once. The stream is constant, delivering new and updated data as it happens.

The volume of data in the firehose can spike and dip, but either way, data keeps flowing until it is crunched. Once crunched, that data can be visualized, published, graphed, or put to just about any other use, all in real time.


Real World Firehose API Examples

The best-known firehose API is the Twitter firehose, an API that delivers 100% of tweets to end users in real time. At an average of 5,700 tweets per second, the Twitter firehose illustrates the massive volume of data that can stream over a data firehose.

Access to the Twitter firehose in its entirety has been granted to only a handful of companies (GNIP and DataSift, to name a couple). However, Twitter also has a Streaming API that delivers a sample of all tweets, which is still substantial given the massive scale of Twitter's user base.

Other than Twitter, there are thousands of other use cases for the data firehose. Pretty much any source of data can be turned into a stream, including:

  • Weather and temperature data
  • Stock quote prices
  • Public transportation time and location data
  • RSS and blog feeds
  • Multiplayer game player position and state
  • Internet of Things sensor network data

Luckily for you, many data providers have built developer APIs that are open to developers. We’ve seen thousands of web and mobile applications tap into these APIs (for example, Weather Underground’s weather API); however, those apps don’t stream the data in real time. Updates are sometimes pushed minutes after they actually happen, or the data is simply static. And what good is data if it’s outdated?

That’s where firehosing this data comes into play, and we can see why it’s so powerful.

A Historical Look at Data Streaming and Why It Matters

Believe it or not, we haven’t always been able to stream media. HTTP was originally designed to transfer only text. However, the 1990s saw the rise of the layman’s internet, and with it, the demand to transfer more than just text.

In 1992, the multipart content type was introduced, letting us transfer audio, images, and video. But we were still missing something: we couldn’t stream that data. As a result, the Real Time Streaming Protocol (RTSP) was born in 1996. And the rest is history!

Today, users expect not just their apps but their data to be real time. The firehose is one of the ways we stream and syndicate big data: we can now stream massive volumes of data in real time and build analytics and syndication on top of those streams. Developers can slice and dice data streams however they want, giving them fine-grained control over who receives which streams, down to the individual subscriber.

It’s fast, it’s efficient, and it’s powerful.

Syndicating Streams

The ability to syndicate a data firehose is just as powerful as streaming it. Imagine your main data firehose branching into smaller hoses, one for each individual subscriber. Subscribers can choose which streams they receive, or developers can control the routing, at any time.

This enables developers to build accurate, robust real-time applications like stock tickers, live statistics dashboards, or content curation platforms. We can now stream any amount of data, in real time, to any number of subscribers.
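One way to picture syndication is a router that fans the main firehose out into smaller, filtered streams. Here is a toy in-memory sketch of that pattern (the class and method names are my own, not from any particular SDK; a real deployment would route over the network):

```python
from collections import defaultdict

class FirehoseRouter:
    """Toy fan-out router: one inbound firehose, many filtered sub-streams."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> [callback]

    def subscribe(self, channel, callback):
        """Register a callback for one channel, or '*' for everything."""
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Deliver to everyone on this channel, plus wildcard listeners.
        for cb in self._subscribers[channel] + self._subscribers["*"]:
            cb(channel, message)

router = FirehoseRouter()
stocks, everything = [], []
router.subscribe("stocks", lambda ch, msg: stocks.append(msg))
router.subscribe("*", lambda ch, msg: everything.append((ch, msg)))

router.publish("stocks", {"symbol": "AAPL", "price": 109.01})
router.publish("weather", {"station": "KSFO", "temp_c": 18.3})
# `stocks` sees only the stock tick; `everything` sees both messages.
```

The wildcard subscriber is the full firehose; the channel subscribers are the smaller, syndicated hoses carved out of it.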

Turning a Data Source Into a Firehose API

This is the fun part for developers. You can take essentially any source of data with an API and turn it into a data firehose:

  1. Connect to the PubNub Data Stream Network (you’ll first need to sign up for a PubNub account).
  2. Retrieve the data from the data API of your choice.
  3. Publish the data.
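The retrieve-and-publish loop at the heart of those steps can be sketched in plain Python. Here `fetch` and `publish` are injected so the sketch stays self-contained and runnable; in a real app, `publish` would wrap your PubNub SDK publish call (step 1, connecting with your account keys, happens before this loop), and the channel name below is made up:

```python
import random

def run_firehose(fetch, publish, channel, iterations):
    """Pull from a data API and push each message to a channel."""
    for _ in range(iterations):
        message = fetch()          # step 2: retrieve from the data API
        publish(channel, message)  # step 3: publish to subscribers

# Stand-ins for a real data API and a real publish call:
def fetch_latest():
    return {"temp_c": round(random.uniform(-5.0, 35.0), 1)}

sent = []
run_firehose(fetch_latest, lambda ch, msg: sent.append((ch, msg)),
             channel="weather-firehose", iterations=3)
print(len(sent))  # 3 messages published
```

In production this loop would run continuously rather than for a fixed number of iterations, pushing each new reading to the stream the moment it arrives.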

We’ve got a great tutorial on Turning Any Data Source Into a Firehose API. Our tutorial uses Python, but you can use JavaScript, Ruby, or any of our other 50+ SDKs.

Firehose API In Action: Demo Firehose Data Streams

In addition to our tutorial on building your own firehose API, we’ve created a number of demo data streams for your consumption. Some of them use real data, and a couple use simulated data (plug in the API of your choice).

We’ve got Twitter, weather, Wikipedia edits, and more, all of which can be seen on our firehose API demos page. In the coming months we’ll be rolling out more data streams. If you’re interested in having yours listed on our site, or would like to suggest what types of live data you’d like made publicly available, please drop us a line.
