Social news websites have changed the way we find, read, and share content. However, these websites are static, and fresh content is not presented as it happens. I set out to change that, and I picked Hacker News as my news source. By combining the power of PubNub’s global data stream network with a little RSS scraping, nobody will ever have to miss a new Hacker News article again.
I built a news and social content feed that updates automatically in real time as new content is posted to Hacker News. The same approach can be applied to pretty much any social news website (all you need is an RSS feed). Content that would otherwise be static is now pushed to the browser in real time.
The first task is to grab the RSS feed from Hacker News. There are plenty of ways to do this, and you can quickly write your own RSS scraper if you want, but I decided to use Python and feedparser. With a quick “pip install feedparser” we have our RSS.
With no customization, you’ll get every single post in the feed. I decided the most interesting information was each post’s rank, its title, the link to the article, and the comments link.
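As a rough illustration of the extraction step, here is a minimal sketch rather than the original script. It assumes the feed lives at https://news.ycombinator.com/rss, that each entry exposes title, link, and comments fields, and that a post's rank is simply its position in the feed. The names extract_posts and fetch_posts are hypothetical.

```python
HN_RSS_URL = "https://news.ycombinator.com/rss"  # assumed feed location


def extract_posts(entries):
    """Keep only rank, title, article link, and comments link for each entry.

    Rank is assumed to be the entry's 1-based position in the feed.
    """
    posts = []
    for rank, entry in enumerate(entries, start=1):
        posts.append({
            "rank": rank,
            "title": entry.get("title", ""),
            "link": entry.get("link", ""),
            "comments": entry.get("comments", ""),
        })
    return posts


def fetch_posts(url=HN_RSS_URL):
    """Download and parse the feed, then trim each entry down."""
    # Imported here so extract_posts can be used without feedparser installed.
    import feedparser

    feed = feedparser.parse(url)
    return extract_posts(feed.entries)
```

Calling fetch_posts() returns a list of small dictionaries, which is a convenient shape to publish later on.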
Python Command Line
The script uses Python’s argparse module, which gives you robust command line options with very little code.
Run python hn.py --help to see descriptions of all the options from the command line. The options let you specify how often you want to poll Hacker News for changes, and whether you want the entire page after every change to the site or just the new posts that appear. For instance, you could poll every five seconds and get the entire page each time.
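The two modes described above (the entire page versus only the new posts) could be sketched as follows. This is an assumption about how the choice might work, not the original code: select_posts is a hypothetical helper, and posts are assumed to be dictionaries identified by their link.

```python
def select_posts(previous, current, full_page=False):
    """Decide what to send after a poll.

    previous and current are lists of post dicts from two consecutive polls.
    Returns an empty list when nothing changed.
    """
    if previous == current:
        return []
    if full_page:
        # Any change at all triggers a resend of the whole page.
        return list(current)
    # Otherwise send only posts whose link was not seen in the previous poll.
    seen_links = {post["link"] for post in previous}
    return [post for post in current if post["link"] not in seen_links]
```

In the new-posts mode a pure reordering of existing posts sends nothing, while in the full-page mode it resends everything.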
Argparse also supplies defaults, so running python hn.py with no arguments uses the default settings.
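The original commands are not shown here, so the following is only a sketch of what such a command line could look like. The flag names --interval and --full-page and the 30-second default are assumptions made for illustration, not the script's real options.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical options for the scraper."""
    parser = argparse.ArgumentParser(
        prog="hn.py",
        description="Poll Hacker News and push changes in real time.",
    )
    parser.add_argument(
        "--interval",
        type=float,
        default=30.0,  # assumed default, in seconds
        help="seconds to wait between polls (default: %(default)s)",
    )
    parser.add_argument(
        "--full-page",
        action="store_true",
        help="send the entire page on every change instead of only new posts",
    )
    return parser


# Poll every five seconds and send the entire page.
five_second_full_page = build_parser().parse_args(["--interval", "5", "--full-page"])

# No arguments at all: argparse falls back to the defaults.
defaults = build_parser().parse_args([])
```

With these assumed flags, the five-second example would be run as python hn.py --interval 5 --full-page, and argparse generates the --help text from the help strings.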
Now that we have the information that is important to us, and know how to run the scraper locally, it’s time to send it global. PubNub provides a simple API for publishing messages. After a quick “pip install pubnub” we can publish our information from Hacker News.
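A minimal sketch of the publishing step, assuming a recent version of the PubNub Python SDK, the public "demo" keys, and posts shaped as plain dictionaries. make_client and publish_posts are hypothetical helper names, and the user_id value is arbitrary.

```python
CHANNEL = "hacker-news"


def make_client(publish_key="demo", subscribe_key="demo", user_id="hn-scraper"):
    """Create a PubNub client (assumes a recent SDK version and demo keys)."""
    # Imported here so publish_posts can be exercised without the SDK installed.
    from pubnub.pnconfiguration import PNConfiguration
    from pubnub.pubnub import PubNub

    config = PNConfiguration()
    config.publish_key = publish_key
    config.subscribe_key = subscribe_key
    config.user_id = user_id
    return PubNub(config)


def publish_posts(client, posts, channel=CHANNEL):
    """Publish each post dict as its own message; return how many were sent."""
    for post in posts:
        # sync() blocks until the publish call has completed.
        client.publish().channel(channel).message(post).sync()
    return len(posts)
```

A typical call would be publish_posts(make_client(), posts), where posts is whatever the scraper decided to send after a poll.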
Now it’s up to you. PubNub offers over 50 different SDKs, so take your pick. To consume the information, simply subscribe to the channel (in our case “hacker-news”) and you’re off. There are publicly available publish and subscribe keys to use for demos.
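Any SDK will do for the consuming side. As one possibility, here is a sketch of a subscriber using the Python SDK. It assumes a recent SDK version, the public "demo" subscribe key, and messages that are dictionaries with rank, title, link, and comments keys. format_post and subscribe_and_print are hypothetical names.

```python
def format_post(post):
    """Render one received post as a single line of text."""
    return "{}. {} | {} | comments: {}".format(
        post.get("rank", "?"),
        post.get("title", ""),
        post.get("link", ""),
        post.get("comments", ""),
    )


def subscribe_and_print(subscribe_key="demo", channel="hacker-news"):
    """Subscribe to the channel and print every post as it arrives."""
    # Imported here so format_post can be used without the SDK installed.
    from pubnub.callbacks import SubscribeCallback
    from pubnub.pnconfiguration import PNConfiguration
    from pubnub.pubnub import PubNub

    class PrintListener(SubscribeCallback):
        def message(self, pubnub, event):
            # event.message holds the payload that was published.
            print(format_post(event.message))

    config = PNConfiguration()
    config.subscribe_key = subscribe_key
    config.user_id = "hn-reader"  # arbitrary identifier for this client
    client = PubNub(config)
    client.add_listener(PrintListener())
    client.subscribe().channels(channel).execute()
    return client
```

In a browser you would do the equivalent with the JavaScript SDK and update the page inside the message handler instead of printing.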
And the final result should look something like this: