The firehose API is a steady stream of all available data from a source in real time – a giant spigot that delivers data to any number of subscribers at a time. The stream is constant, delivering new, updated data as it happens.
The volume of data in the firehose rises and falls, but the data keeps flowing to every recipient. Once received, that data can be visualized, published, graphed, or processed in any other way you like, all in real time.
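To make the "one source, any number of subscribers" idea concrete, here is a minimal in-process sketch. The `Firehose` class and its `subscribe` and `publish` methods are hypothetical names invented for illustration, not part of any real firehose API, and a production firehose would deliver messages over a network rather than through direct function calls.

```python
from typing import Any, Callable, List


class Firehose:
    """Hypothetical in-memory firehose: every message goes to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Any], None]] = []

    def subscribe(self, callback: Callable[[Any], None]) -> None:
        # Register a function to be called with each new message.
        self._subscribers.append(callback)

    def publish(self, message: Any) -> None:
        # Fan the message out to all current subscribers, in subscription order.
        for callback in list(self._subscribers):
            callback(message)


firehose = Firehose()

graph_points: List[float] = []  # stands in for a live graph
headlines: List[str] = []       # stands in for a publishing pipeline

firehose.subscribe(graph_points.append)
firehose.subscribe(lambda reading: headlines.append(f"new reading: {reading}"))

# Simulated temperature readings arriving over time.
for reading in [21.5, 21.7, 22.0]:
    firehose.publish(reading)
```

Each subscriber decides independently what to do with the data: one collects points for a graph while the other formats the same readings for publication.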
The best-known firehose API is the Twitter firehose, an API that delivers 100% of tweets to end users in real time. At an average of 5,700 tweets per second, the Twitter firehose shows the massive volume of data that can stream over a data firehose.
Access to the full Twitter firehose has been granted to only a few companies (GNIP and DataSift, to name two). However, Twitter also has a Streaming API that streams a sample of all tweets, which is still substantial given the massive scale of Twitter's user base.
Beyond Twitter, there are thousands of use cases for a data firehose. Almost any source of data can be turned into a stream, including:
Weather and temperature data
Stock quote prices
Public transportation time and location data
RSS and blog feeds
Multiplayer game player position and state
Internet of Things sensor network data
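As a rough illustration of turning a data source into a stream, the sketch below polls a source and emits a value only when it changes. The `stream_changes` function and the simulated readings are assumptions made for this example; a real source such as a weather service or a sensor would be read over a network, usually with a delay between polls.

```python
from typing import Callable, Iterator, List


def stream_changes(read_source: Callable[[], object], polls: int) -> Iterator[object]:
    """Hypothetical helper: poll `read_source` and yield only new values."""
    last = object()  # sentinel that never equals a real reading
    for _ in range(polls):
        value = read_source()
        if value != last:
            # The source has new data, so push it downstream.
            last = value
            yield value
        # A real poller would typically sleep here before reading again.


# Simulated temperature sensor that reports the same value several times in a row.
readings = iter([71, 71, 72, 72, 72, 70])
updates: List[object] = list(stream_changes(lambda: next(readings), polls=6))
```

Here the six polled readings collapse to the three updates that actually carry new information, which is the behavior a subscriber expects from a stream.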
The ability to syndicate a data firehose is just as powerful as streaming it. This involves splitting the main data firehose into smaller hoses, each with its own subscribers. Subscribers can specify which streams they receive, or developers can control who receives which streams, at any time.
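Syndication as described above can be sketched as a set of named hoses, each defined by a filter over the main stream. The `Syndicator` class, the hose names, and the message fields (`kind`, `temp_f`, `symbol`, `price`) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Message = Dict[str, object]


class Syndicator:
    """Hypothetical splitter: routes each firehose message to matching hoses."""

    def __init__(self) -> None:
        self._hoses: Dict[str, Callable[[Message], bool]] = {}
        self._subscribers: DefaultDict[str, List[Callable[[Message], None]]] = defaultdict(list)

    def add_hose(self, name: str, predicate: Callable[[Message], bool]) -> None:
        # A hose is a named slice of the firehose, defined by a filter.
        self._hoses[name] = predicate

    def subscribe(self, hose: str, callback: Callable[[Message], None]) -> None:
        if hose not in self._hoses:
            raise KeyError(f"unknown hose: {hose}")
        self._subscribers[hose].append(callback)

    def publish(self, message: Message) -> None:
        # Deliver the message to the subscribers of every hose whose filter matches.
        for name, predicate in self._hoses.items():
            if predicate(message):
                for callback in list(self._subscribers[name]):
                    callback(message)


syndicator = Syndicator()
syndicator.add_hose("weather", lambda m: m["kind"] == "weather")
syndicator.add_hose("stocks", lambda m: m["kind"] == "stock")

weather_feed: List[Message] = []
stock_feed: List[Message] = []
syndicator.subscribe("weather", weather_feed.append)
syndicator.subscribe("stocks", stock_feed.append)

for message in [
    {"kind": "weather", "temp_f": 71},
    {"kind": "stock", "symbol": "ACME", "price": 12.5},
    {"kind": "weather", "temp_f": 72},
]:
    syndicator.publish(message)
```

Because the filters live in one place, a developer can control who receives which stream simply by choosing which hose each subscriber is attached to.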
In addition to our how-to on building your own real-time data streaming application, we have a demo that shows how to consume a firehose API. Some of the demo streams use real data from Twitter and Wikipedia, and a couple use simulated data.