Countering Malicious Inputs: HTML Content XSS Protection

Michael Carroll on May 31, 2017

Real-time Countering: HTML Content XSS Protection

This is a tutorial on implementing XSS protection and HTML cleanup on real-time messages streaming via PubNub using the Neutrino HTML Sanitizer block.

What is HTML Cleanup / XSS Protection?

HTML cleanup and cross-site scripting (XSS) protection refer to taking a piece of text input and removing any markup that could be problematic, such as embedded JavaScript or unnecessary formatting tags. This counters a number of issues, including invalid or incomplete content, malicious input, international text encodings, and language-specific content.
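To make the idea concrete, here is a minimal sketch of one basic defense, HTML entity escaping, in plain JavaScript. It is an illustration only: `escapeHtml` is a hypothetical helper, and it is not what the Neutrino HTML Clean API does internally (the API is described here as removing problematic tags rather than escaping them).

```javascript
// Minimal sketch: HTML entity escaping, one common way to neutralize markup.
// Hypothetical helper for illustration; not the Neutrino algorithm.
function escapeHtml(text) {
  // Replace '&' first so the entities added below are not escaped twice.
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const attack = '<img src=x onerror="alert(1)">Hello';
console.log(escapeHtml(attack));
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt;Hello
```

Escaped this way, a browser displays the markup as literal text instead of running the `onerror` handler.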

XSS protection can be useful in a variety of situations, such as customer service, article and text post analysis, marketing research, health applications, and any application that needs to be resilient to malicious or invalid input.

As voice and language-based techniques gain popularity, user behavior and expectations are shifting from web-based to voice-based experiences (including text analysis) in many everyday settings; tools such as the HTML Clean API allow applications to distill HTML content into its simplest plain-text form, suitable for voice output.

What is the Neutrino HTML Sanitizer block?

Neutrino's HTML Clean API sanitizes untrusted HTML from user-supplied content (or content from external sources) to ensure that it is safe and to prevent cross-site scripting (XSS) attacks.

The Neutrino HTML Sanitizer block filters HTML from real-time messages to prevent security exploits, or reformats the text into whatever form you need. For example, if you're building a real-time forum, the HTML Sanitizer block can reformat user submissions so that they only include bold and italics, and scrub out any security issues that may be hidden in the HTML.
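The bold-and-italics forum example amounts to an allowlist. As a toy sketch of that idea (hypothetical helper names, not Neutrino's algorithm, and not a replacement for a maintained sanitizer), one conservative approach is to escape everything and then restore only the exact, attribute-free tags on the list:

```javascript
// Toy allowlist sketch: only bare <b>, </b>, <i>, </i> survive.
// Hypothetical helpers; not Neutrino's algorithm and not a full sanitizer.
const ALLOWED_TAGS = ['b', 'i'];

// Escape every character that could start markup or break out of an attribute.
function escapeAll(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

function allowBoldItalic(html) {
  let out = escapeAll(html);
  for (const tag of ALLOWED_TAGS) {
    // Restore only the exact escaped forms "&lt;b&gt;" and "&lt;/b&gt;", so a
    // tag carrying attributes (e.g. <i onclick="...">) stays escaped as text.
    out = out
      .split(`&lt;${tag}&gt;`).join(`<${tag}>`)
      .split(`&lt;/${tag}&gt;`).join(`</${tag}>`);
  }
  return out;
}

console.log(allowBoldItalic('<b>hi</b><script>x</script><i onclick="y">z</i>'));
// <b>hi</b>&lt;script&gt;x&lt;/script&gt;&lt;i onclick=&quot;y&quot;&gt;z</i>
```

Note the stray `</i>` left in the output: this sketch blocks script and attributes but does not balance tags, which is one of the many jobs a real sanitizer also handles.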

Tutorial Overview

In this article, we dive into a simple example of how to enable HTML cleanup and cross-site scripting (XSS) protection in a real-time Angular 2 web application.

As we prepare to explore our sample Angular 2 web application with HTML sanitization features, let’s check out the underlying Neutrino API.

Neutrino API

Automated text analysis services are quite challenging to build and test on your own; they require substantial effort and engineering resources to maintain across a diverse array of content markup systems and user languages. Fortunately, the Neutrino APIs make it easy to add straightforward HTML cleanup functionality (and more!) to your applications.

Looking closer at the APIs, HTML cleanup is just the beginning. There are a lot of API methods available for things like email and phone validation, profanity filtering, code highlighting, currency conversion, geolocation, IP blacklisting and more. It really is a powerful tool for augmenting your applications with a wide range of utility features.

In this article though, we’ll keep it simple and just implement a basic XSS filter for user-provided HTML content.

Obtaining your PubNub Developer Keys

The first things you’ll need before you can create a real-time application with PubNub are publish and subscribe keys. Just in case you haven’t already, you can create an account, get your keys and be ready to use the PubNub network in less than 60 seconds using the handy signup form.

Once you do that, you’ll see that the publish and subscribe keys look like UUIDs with “pub-c-” and “sub-c-” prefixes, respectively. Keep those handy – you’ll need to plug them in when initializing the PubNub object in your HTML5 app below.

About the PubNub JavaScript SDK

PubNub plays together really well with JavaScript because the PubNub JavaScript SDK is extremely robust and has been battle-tested over the years across a huge number of mobile and backend installations. The SDK is currently on its 4th major release, which features a number of improvements such as isomorphic JavaScript, new network components, unified message/presence/status notifiers, and much more.

NOTE: In this article, we use the PubNub Angular 2 SDK, so our UI code can use the PubNub JavaScript v4 API syntax!

The PubNub JavaScript SDK is distributed via Bower or the PubNub CDN (for Web) and NPM (for Node), so it’s easy to integrate with your application using the native mechanism for your platform. In our case, it’s as easy as including a CDN link in a script tag.

That note about API versions bears repeating: the user interfaces in this series of articles use the v4 API (since they use the new Angular2 SDK, which runs on v4). Please stay alert when jumping between different versions of JS code!

Getting Started with Neutrino APIs

To get started, you'll need a Neutrino API account to take advantage of the HTML Clean API.

Go to the Neutrino signup form and sign up for a free trial.
Make note of the API credentials (user name and client API token).

Setting Up the Block

With PubNub BLOCKS, it’s really easy to create code to run in the network. Here’s how to make it happen:

Go to the application instance on the PubNub Admin Dashboard.

Create a new block.

Paste in the block code from the next section and update the credentials with the Neutrino credentials from the previous steps above.

Start the block, and test it using the “publish message” button and payload on the left-hand side of the screen. That’s all it takes to create your serverless code running in the cloud!

Diving into the Code – The Block

You’ll want to grab the 29 lines of BLOCK JavaScript and save them to a file, say, pubnub_neutrino_block.js. It’s available as a Gist on GitHub for your convenience.

First up, we declare our dependencies on xhr (for HTTP requests) and codec/query_string (for encoding the request parameters), and create a function to handle incoming messages.

export default (request) => {
    const xhr = require('xhr');
    const query = require('codec/query_string');

Next, we set up variables for accessing the service: the user name and client API token from the previous steps, the API URL, and the output type.

    const userId = 'YOUR_USER';
    const apiKey = 'YOUR_API_KEY';
    const outputType = 'plain-text';
    let apiUrl = 'https://neutrinoapi.com/html-clean';

Next, we set up the HTTP parameters for the HTML Clean API request. By default, we submit the data with a GET request, authenticate it with the user name and client API token, and pass the payload attribute of the message as the content.

    let params = {
        'user-id': userId,
        'api-key': apiKey,
        'content': request.message.payload,
        'output-type': outputType
    };

Next, we create the URL from the given parameters.

    apiUrl = apiUrl + '?' + query.stringify(params);

Finally, we call the HTML Clean endpoint with the given data, decorate the message with clean_html and original_html values containing the cleaned and original HTML content, and catch any errors and log them to the BLOCKS console. Pretty easy!

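    return xhr.fetch(apiUrl).then((response) => {
        // Keep the cleaned output and the original input side by side
        request.message.clean_html = response.body;
        request.message.original_html = request.message.payload;
        delete request.message.payload;
        return request.ok();
    }).catch((err) => {
        console.error(err);
        // If the cleanup call fails, block the message rather than
        // letting unsanitized HTML through to subscribers
        return request.abort();
    });
};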
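To see what request the BLOCK actually sends, the URL construction can be reproduced outside BLOCKS. This sketch assumes the standard `URLSearchParams` class (available in browsers and Node.js) as a stand-in for the BLOCKS-only `codec/query_string` module, so the exact encoding may differ slightly; `buildCleanUrl` is a hypothetical helper name.

```javascript
// Sketch: rebuild the BLOCK's Neutrino HTML Clean request URL outside BLOCKS.
// Assumption: URLSearchParams stands in for BLOCKS' codec/query_string module.
function buildCleanUrl(userId, apiKey, content, outputType = 'plain-text') {
  const params = new URLSearchParams({
    'user-id': userId,
    'api-key': apiKey,
    'content': content,
    'output-type': outputType
  });
  // toString() percent-encodes values, e.g. "<" becomes "%3C".
  return 'https://neutrinoapi.com/html-clean?' + params.toString();
}

console.log(buildCleanUrl('YOUR_USER', 'YOUR_API_KEY', '<b>hi</b>'));
// https://neutrinoapi.com/html-clean?user-id=YOUR_USER&api-key=YOUR_API_KEY&content=%3Cb%3Ehi%3C%2Fb%3E&output-type=plain-text
```

Because the content travels in the query string, very long messages may run into URL length limits on a GET request.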

All in all, it doesn’t take a lot of code to add XSS protection to our application. We like that!
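The bookkeeping the handler performs on a successful response (copy the payload to original_html, add clean_html, drop payload) can also be written as a pure function, which makes it easy to unit-test outside BLOCKS. `decorateMessage` is a hypothetical helper; unlike the BLOCK, which edits `request.message` in place, it returns a new object.

```javascript
// Hypothetical pure version of the BLOCK's success path: returns a new message
// with clean_html and original_html set and the raw payload field removed.
function decorateMessage(message, cleanedText) {
  // Pull payload out; every other field on the message is carried over as-is.
  const { payload, ...rest } = message;
  return { ...rest, clean_html: cleanedText, original_html: payload };
}

const incoming = { payload: '<b>Hello</b><script>alert(1)</script>' };
// 'Hello' stands in for whatever text the API returns; it is not real API output.
console.log(decorateMessage(incoming, 'Hello'));
```

To try the real BLOCK from the dashboard, a test message shaped like `{"payload": "<b>Hello</b><script>alert(1)</script>"}` matches what the handler reads from `request.message.payload`.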

OK, let’s move on to the UI!

Diving into the Code – The User Interface

You’ll want to grab these 94 lines of HTML & JavaScript and save them to a file, say, pubnub_neutrino_ui.html.

The first thing you should do after saving the code is to replace two values in the JavaScript:

YOUR_PUB_KEY: with the PubNub publish key mentioned above.
YOUR_SUB_KEY: with the PubNub subscribe key mentioned above.

If you don’t, the UI will not be able to communicate with anything and probably clutter your console log with entirely too many errors.
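A small fail-fast check can catch forgotten placeholders before the PubNub client is initialized. This is a hypothetical helper, not part of the PubNub SDK; it relies only on the “pub-c-” and “sub-c-” prefixes mentioned earlier, so it checks the format of the keys, not whether they are valid for your account.

```javascript
// Hypothetical fail-fast check for the publish/subscribe keys.
// Checks format only (the prefix), not whether the keys belong to an account.
function checkPubNubKeys(publishKey, subscribeKey) {
  const expectations = [
    ['publishKey', publishKey, 'pub-c-'],
    ['subscribeKey', subscribeKey, 'sub-c-']
  ];
  const problems = [];
  for (const [name, value, prefix] of expectations) {
    // Unreplaced placeholders such as 'YOUR_PUB_KEY' fail this test as well.
    if (typeof value !== 'string' || !value.startsWith(prefix)) {
      problems.push(`${name} should start with "${prefix}" (got ${JSON.stringify(value)})`);
    }
  }
  return problems;
}

console.log(checkPubNubKeys('YOUR_PUB_KEY', 'YOUR_SUB_KEY'));
```

Calling it just before `pubnubService.init(...)` and logging any returned problems gives one clear message instead of a stream of network errors.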

For your convenience, this code is also available as a Gist on GitHub, and a Codepen as well. Enjoy!

Dependencies

First up, we have the JavaScript code & CSS dependencies of our application.

<!DOCTYPE html>
<html>
  <head>
    <title>Angular 2</title>
    <script src="https://unpkg.com/core-js@2.4.1/client/shim.min.js"></script>
    <script src="https://unpkg.com/zone.js@0.7.2/dist/zone.js"></script>
    <script src="https://unpkg.com/reflect-metadata@0.1.9/Reflect.js"></script>
    <script src="https://unpkg.com/rxjs@5.0.1/bundles/Rx.js"></script>
    <!-- Angular is pinned to the 2.x release line this example targets; unversioned URLs resolve to the latest release, which no longer ships these UMD bundles or the ES5 .Class() API used below -->
    <script src="https://unpkg.com/@angular/core@2.4.10/bundles/core.umd.js"></script>
    <script src="https://unpkg.com/@angular/common@2.4.10/bundles/common.umd.js"></script>
    <script src="https://unpkg.com/@angular/compiler@2.4.10/bundles/compiler.umd.js"></script>
    <script src="https://unpkg.com/@angular/platform-browser@2.4.10/bundles/platform-browser.umd.js"></script>
    <script src="https://unpkg.com/@angular/forms@2.4.10/bundles/forms.umd.js"></script>
    <script src="https://unpkg.com/@angular/platform-browser-dynamic@2.4.10/bundles/platform-browser-dynamic.umd.js"></script>
    <script src="https://unpkg.com/pubnub@4.3.3/dist/web/pubnub.js"></script>
    <script src="https://unpkg.com/pubnub-angular2@1.0.0-beta.8/dist/pubnub-angular2.js"></script>
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" />
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap-theme.min.css" />
  </head>

For folks who have done front-end implementation with Angular2 before, these should be the usual suspects:

CoreJS ES6 Shim, Zone.JS, Metadata Reflection, and RxJS: dependencies of Angular2.
Angular2: core, common, compiler, platform-browser, forms, and dynamic platform browser modules.
PubNub JavaScript client: to connect to our data stream integration channel.
PubNub Angular2 JavaScript client: provides PubNub services in Angular2 quite nicely indeed.

In addition, we bring in the CSS features:

Bootstrap: in this app, we use it just for vanilla UI presentation.

Overall, we were pretty pleased that we could build a nifty UI with so few dependencies. And with that… on to the UI!

The User Interface

Here’s what we intend the UI to look like:

The UI is pretty straightforward – everything is inside a main-component tag that is managed by a single component that we’ll set up in the Angular2 code.

<body>
  <main-component>
    Loading...
  </main-component>

Let’s skip forward and show that Angular2 component template. The h3 heading should be pretty self-explanatory. We provide a simple input and a button that send the user’s content to the PubNub channel via the publish() action.

<div class="container">
    <pre>
    NOTE: make sure to update the PubNub keys below with your keys,
    and ensure that the BLOCK settings are configured properly!
    </pre>
    <h3>MyApp BLOCKS HTML Sanitizer Integration</h3>
    <br />
    Text:
    <br />
    <input type="text" [(ngModel)]="toSend" placeholder="HTML message" />
    <input type="button" (click)="publish()" value="Send!" />
    <hr/>
    <br/>
    <br/>
    <ul>
      <li *ngFor="let item of messages.slice().reverse()">
        <div>Clean: {{item.message.clean_html}}</div>
        <div>Original: <i>{{JSON.stringify(item.message.original_html)}}</i></div>
      </li>
    </ul>
</div>

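The component UI consists of a simple list of messages, shown newest first. We iterate over the messages in the component using a trusty ngFor. Each message includes the sanitized as well as the original content. Angular’s {{ }} interpolation renders values as text rather than parsing them as HTML, and we JSON-stringify the original content so it appears as a quoted string with its markup visible.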
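The template's list logic can be mirrored in plain JavaScript to check the ordering and the message shape it relies on. `toDisplayRows` is a hypothetical helper, assuming each received item carries the `{ message: { clean_html, original_html } }` shape that the template reads:

```javascript
// Hypothetical helper mirroring the template: newest message first, each row
// holding the cleaned text and a JSON-quoted copy of the original input.
function toDisplayRows(messages) {
  // slice() copies the array, so reverse() does not reorder the shared collection.
  return messages.slice().reverse().map((item) => ({
    clean: item.message.clean_html,
    original: JSON.stringify(item.message.original_html)
  }));
}

const received = [
  { message: { clean_html: 'first', original_html: '<b>first</b>' } },
  { message: { clean_html: 'second', original_html: '<i>second</i>' } }
];
console.log(toDisplayRows(received));
```

Copying before reversing matters because the same array keeps receiving new messages; reversing it in place would scramble the order on every change detection pass.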

And that’s it – a functioning real-time UI in just a handful of code (thanks, Angular2)!

The Angular 2 Code

Right on! Now we’re ready to dive into the Angular2 code. It’s not a ton of JavaScript, so this should hopefully be pretty straightforward.

The first lines we encounter set up our application (with a necessary dependency on the PubNub Angular2 service) and a single component (which we dub main-component).

<script>
var app = window.app = {};
app.main_component = ng.core.Component({
    selector: 'main-component',
    template: `...see previous...`

The component has a constructor that takes care of initializing the PubNub service, and configuring the channel name. NOTE: make sure this matches the channel specified by your BLOCK configuration and the BLOCK itself!

}).Class({
    constructor: [PubNubAngular, function(pubnubService){
        var self = this;
        self.pubnubService = pubnubService;
        self.JSON = JSON;
        self.channelName = 'neutrino-channel';
        self.messages = [];
        self.toSend = "";
        pubnubService.init({
            publishKey:   'YOUR_PUB_KEY',
            subscribeKey: 'YOUR_SUB_KEY',
            ssl:true
        });

We subscribe to the relevant channel and create a dynamic attribute for the messages collection.

pubnubService.subscribe({channels: [self.channelName], triggerEvents: true});
self.messages = pubnubService.getMessage(this.channelName,function(msg){
  // no handler necessary, dynamic collection of msg objects
});

We also create a publish() event handler that performs the action of publishing the message containing the content to the PubNub channel.

    }],
    publish: function(){
        this.pubnubService.publish({ channel: this.channelName, message: {payload:this.toSend} });
        this.toSend = "";
    }
});

Now that we have a new component, we can create a main module for the Angular2 app that uses it. This is pretty standard boilerplate that configures dependencies on the Browser and Forms modules and the PubNubAngular service.

app.main_module = ng.core.NgModule({
    imports: [ng.platformBrowser.BrowserModule, ng.forms.FormsModule],
    declarations: [app.main_component],
    providers: [PubNubAngular],
    bootstrap: [app.main_component]
}).Class({
    constructor: function(){}
});

Finally, we bind the application bootstrap initialization to the browser DOM content loaded event.

document.addEventListener('DOMContentLoaded', function(){
    ng.platformBrowserDynamic.platformBrowserDynamic().bootstrapModule(app.main_module);
});

We mustn’t forget to close out the HTML tags accordingly.

</script>
</body>
</html>

Not too shabby for about 94 lines of HTML & JavaScript!
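The component's publish() handler wraps the text in a payload field because that is the attribute the BLOCK reads. A hypothetical helper that builds those publish arguments, and additionally skips blank input (a design choice the original component does not make), might look like this:

```javascript
// Hypothetical helper: builds the arguments for pubnubService.publish(...).
// Returns null for blank input so the caller can skip publishing.
function buildPublishParams(channel, text) {
  if (typeof text !== 'string' || text.trim() === '') {
    return null;
  }
  // The BLOCK reads request.message.payload, so the text goes under "payload".
  return { channel: channel, message: { payload: text } };
}

console.log(buildPublishParams('neutrino-channel', '<b>hi</b>'));
console.log(buildPublishParams('neutrino-channel', '   '));
```

Inside the component, publish() could then call it and only invoke `this.pubnubService.publish(params)` when `params` is not null.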

Additional Features

There are a few other endpoint groups worth mentioning in the Neutrino API; detailed documentation for each is available on the Neutrino API website.

Data Tools: services for content cleanup, entity validation, etc.
Telephony: services for working with phone numbers.
Geolocation: working with geographic locations and IP addresses.
Security & Networking: IP blocklists, host reputation, and IP probing.
Images: image processing and watermarking.

All in all, we found it pretty easy to get started with HTML cleanup using the API, and we look forward to using more of the deeper integration features!

Conclusion

Thank you so much for joining us in the XSS protection article of our BLOCKS and web services series! Hopefully it’s been a useful experience learning about content-enabled technologies. In future articles, we’ll dive deeper into additional web service APIs and use cases for other nifty services in real-time web applications.

Stay tuned, and please reach out anytime if you feel especially inspired or need any help!