Build a Smart Chat App that Categorizes Topics with Amazon Comprehend and ChatEngine

What’s the difference between a chatroom and a forum? Both help people collaborate, but chatrooms are realtime whereas forums are not. What if you could take the best features of each and blend them together?

How would we do that? It is easy to run a chat room that is online 24/7, with users hopping in and out of discussions, so that it serves as both a chatroom and a forum. But there is a problem. Because a forum is mostly offline conversation, it leaves a short conversation trail that is easy to follow. Chatrooms, on the other hand, leave a very long conversation trail, thanks to the realtime interactions between users. How can we make sense of that?

Making Sense of Conversations Through Natural Language Processing

The answer lies in the common purpose that brings users to the chatroom. What if the chatroom were intelligent enough to let a user meander through the conversation and discover topics of interest related to that purpose?

Amazon Comprehend is a Natural Language Processing (NLP) service with features for making sense of conversational text. The service is backed by a machine-learning platform that continually learns and improves from a variety of information sources. If we plug the chatroom’s conversation stream into Amazon Comprehend, we can get some interesting insights.

A conversation can lead to topics, information, or even questions around a subject. By leveraging the features of Amazon Comprehend, we can build a chatroom that intelligently segments the conversation and can spin off side conversations, just like a forum.

A Conversation-Aware Chat Room

Before we dive into the tutorial, let’s apply our application to a recent world event.

“Zika virus has spread around the world in recent times, like a wildfire. Many countries have reported incidents of Zika virus infections. Deaths have been reported too. Governments are issuing travel advisories to citizens across airports, train stations, and all other transit points. Health organizations and medical research firms are trying to understand the impact of this virus and preparing remedial measures.”

Now imagine this. Sensing the alarming rate of spread of the Zika virus, one non-profit organization decides to start a public forum for issuing advisory messages on Zika virus, based on the latest reports. They also plan to host a chat room where users can join and listen in to the advisory alerts and also interact with other users.

Bingo! We have the perfect setting for deploying a conversation-aware chat room: one where a moderator can periodically publish advisory messages and users can share information about their local situation relating to Zika virus. What can we do to create forum topics out of this chat room?

There are many options, and again the common purpose dictates the choice. If users want to look for chat messages about a specific location where Zika virus has had an impact, then the chat room can list all the locations that are mentioned in chat messages. This way, users can engage in an offline conversation specific to that location.

Tag Heatmaps

Have you ever seen a tag heat map? Well, here is what we mean by it.

It is not exactly plotted on a map so we could likewise call it a heat tag.

That’s how we can add some intelligence to the Zika advisory chatroom. Most participants would be interested to know about the spread of Zika virus in and around their location. Moreover, a location may be mentioned multiple times, based on numerous incidents reported from there. Based on this information, we can build a heat map of location names, to indicate the intensity of Zika virus impact. The chatroom conversation ultimately drives all this.

Chat Room App with Location Tag Heatmap

Let’s see how the tag heat map builds up along with the chat conversation. Here is a sample chat session where a moderator user, by the name of “Zika News,” delivers a few advisory messages on Zika virus to a set of users.

Looking at the tag heat map, the users can filter the conversation based on the location they are interested in. Now we can spin off location-specific topic forums, which can be spawned as different chat rooms or forums.
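As a rough sketch of that filtering step, the snippet below keeps only the messages that mention a chosen location. The function name `filterByLocation` and the `sender`/`text` message shape are assumptions made for illustration, and the match is a plain case-insensitive substring test rather than anything the app itself implements.

```javascript
// Hypothetical helper: keep only messages whose text mentions the location.
// Uses a simple case-insensitive substring match, so "peru" would also
// match "peruse"; a real app may want stricter matching.
function filterByLocation(messages, location) {
    const needle = location.toLowerCase();
    return messages.filter((message) =>
        message.text.toLowerCase().includes(needle)
    );
}

// Hand-written sample conversation (not taken from the app's script)
const messages = [
    { sender: 'zika-news', text: 'New cases reported in Miami this week.' },
    { sender: 'Peter', text: 'Is it safe to travel to Brazil?' },
    { sender: 'zika-news', text: 'Travel advisory issued for miami airports.' }
];

const miamiThread = filterByLocation(messages, 'Miami');
console.log(miamiThread.map((message) => message.text));
```

The filtered list could then seed a location-specific chat room or forum thread.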

The possibilities are many, but for this demo, let’s keep the focus limited to generating the tags alone. And for that, we are going to use the Amazon Comprehend service. You can check out the source code of this chatroom app on GitHub.

But before you jump to the code-level details, let’s have a look at the components of this app and their major functionality.

The ChatEngine Framework

PubNub ChatEngine already has the required features to build such an app. It not only provides a simple chat UI for getting started but also takes care of the hard part: the backend heavy lifting that ensures chat messages are exchanged in realtime, by leveraging the PubNub Data Stream Network.

Start building your own chat app by following the ChatEngine Quickstart Guide and get the sample UI code generated for the app. However, to demonstrate this chat app, we have already customized the code and have defined a set of users along with a pre-scripted chat conversation to mimic the Zika advisory service.

There are three types of users in this chat app:

  • Moderator – A user who logs in to the chat room as moderator and also acts as the official communicator for all advisory messages related to Zika.
  • Dummy Users – A set of users who log in to the chat room at random times but only listen.
  • Human User – This is the user that logs in when you launch the app UI. This user is represented by the name “Peter.”

The moderator’s actions are scripted through a Node.js script. When the moderator logs in, the script starts reading a text file of pre-scripted lines about Zika virus and publishes them one at a time, waiting roughly 15 to 25 seconds before each line.

// Read and publish the Zika news from file
const fs = require('fs');
const es = require('event-stream');

// Assumed helper (not shown in the original excerpt):
// random integer between min and max, inclusive
const randomInt = (min, max) => Math.floor(Math.random() * (max - min + 1)) + min;

let lineNr = 0;

// homeChat is the ChatEngine chat that the moderator has already joined
const s = fs.createReadStream('eventJournal.txt')
    .pipe(es.split())
    .pipe(es.mapSync(function (line) {

        // Pause the read stream until this line has been published
        s.pause();

        lineNr += 1;

        // Publish the line after a random delay of 15 to 25 seconds
        setTimeout(function (data) {

            homeChat.emit('message', {
                "comprehend": {
                    "sender": "zika-news",
                    "text": data
                }
            });

            // Resume the read stream to pick up the next line
            s.resume();

        }, randomInt(15, 25) * 1000, line);
    })
    .on('error', function (err) {
        console.log('Error while reading file.', err);
    })
    .on('end', function () {

        console.log('Read entire file.');

        // Say goodbye five seconds after the file has been read
        setTimeout(function () {
            homeChat.emit('message', {
                "comprehend": {
                    "sender": "zika-news",
                    "text": "That's it for today, goodbye folks"
                }
            });
        }, 5000);

        // Leave once, five seconds later, so the goodbye has time to go out
        setTimeout(function () {
            homeChat.leave();
        }, 10000);
    })
);

Every time the human user or the moderator sends a chat message, it is wrapped in a JSON payload and sent to a PubNub Function.

homeChat.emit('message', {
    "comprehend": {
        "sender": "zika-news",
        "text": data
    }
});

The PubNub Function then forwards the message to all chat clients, along with the count of every location mentioned so far in the conversation. The chat UI uses these counts to set the color intensity of each tag in the heat map.

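// Update Tags with Heatmap
const updateTags = (locCount) => {

    console.log(locCount);

    Object.keys(locCount).forEach((loc) => {

        // Build a selector-safe element id from the location name
        const locId = loc.replace(/[^a-zA-Z0-9_-]/g, '-');
        let $tag = $('#' + locId);

        // Create the tag the first time a location is seen;
        // .text() keeps the location name from being parsed as HTML
        if ($tag.length === 0) {
            $tag = $('<div class="tags"></div>').attr('id', locId).text(loc);
            $('#tagcont').append($tag);
        }

        // Color intensity reflects how often the location was mentioned
        $tag.css('background-color', colorScale(locCount[loc]));
    });

};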
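The `colorScale` helper is not shown in this post. As a hypothetical stand-in, a mention count could be mapped to a color by linear interpolation between a "cold" and a "hot" color. The cap of 10 mentions and the two RGB values below are assumptions chosen for illustration, not values taken from the app.

```javascript
// Hypothetical stand-in for the app's colorScale helper.
// Maps a mention count to a CSS color between pale yellow and red.
const MAX_COUNT = 10; // assumed count at which a tag is fully "hot"

function colorScale(count) {
    // Clamp to [0, 1] so unusually high counts stay on the scale
    const t = Math.min(Math.max(count / MAX_COUNT, 0), 1);

    // Interpolate each RGB channel between the cold and hot colors
    const cold = [255, 245, 200];
    const hot = [215, 25, 28];
    const rgb = cold.map((channel, i) => Math.round(channel + (hot[i] - channel) * t));

    return 'rgb(' + rgb.join(', ') + ')';
}

console.log(colorScale(1), colorScale(5), colorScale(10));
```

Clamping keeps a heavily mentioned location from producing an invalid color once its count passes the assumed cap.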

PubNub Functions

By default, creating a ChatEngine app instance also creates a PubNub Function to orchestrate the chat messages.

However, in this chat app, we need additional processing to extract the location information from each chat message. For this purpose, we have to use another PubNub Function.

Amazon Comprehend

Amazon Comprehend does the real trick of finding the location names within a text. Among the many natural language processing features it supports, entity detection can really help decipher the meaning of a text. Comprehend supports different types of entities, such as dates, quantities, persons, and locations.

For this app, location is the only entity that matters. Every chat message is intercepted by the PubNub Function and sent to the Comprehend API for analysis. The API returns the entities it detected, and the Function keeps a running count of how many times each location has been mentioned.

To invoke the Amazon Comprehend API, we can make a call from within the PubNub Function code as follows.

// Fetch both AWS credentials from the PubNub Functions vault
return Promise.all([
    vault.get('AWS_access_key'),
    vault.get('AWS_secret_key')
]).then(([AWS_access_key, AWS_secret_key]) => {
    const awsCreds = { accessKeyId: AWS_access_key, secretAccessKey: AWS_secret_key };
    const entityOpts = {
        path: '/',
        service: 'comprehend',
        region: 'us-east-2',
        host: 'comprehend.us-east-2.amazonaws.com',
        headers: {
            'Content-Type': 'application/x-amz-json-1.1',
            // For batches, use 'Comprehend_20171127.BatchDetectEntities'
            // with a "TextList" array in the body instead of "Text"
            'X-Amz-Target': 'Comprehend_20171127.DetectEntities'
        },
        // JSON.stringify escapes quotes and newlines in the chat text
        body: JSON.stringify({ Text: payload.comprehend.text, LanguageCode: 'en' })
    };

    // signAWS (defined elsewhere in the Function) is expected to add
    // the AWS signature headers to entityOpts in place
    signAWS(entityOpts, awsCreds);

    const entityHttpOptions = {
        method: 'POST',
        body: entityOpts.body,
        headers: entityOpts.headers
    };

    return kvstore.get('locations').then((storedCount) => {
        const locationCount = storedCount || {};

        return xhr.fetch('https://' + entityOpts.host, entityHttpOptions)
            .then((response) => {
                const topics = JSON.parse(response.body);
                console.log(topics);

                // Count every LOCATION entity, case-insensitively
                (topics.Entities || []).forEach((element) => {
                    if (element.Type === 'LOCATION') {
                        const locationName = element.Text.toLowerCase();
                        locationCount[locationName] = (locationCount[locationName] || 0) + 1;
                    }
                });

                // Persist the running totals, then forward the message
                return kvstore.set('locations', locationCount).then(() => {
                    request.message.locations = locationCount;
                    console.log(locationCount);
                    return request.ok();
                });
            })
            .catch((error) => {
                // On any failure, let the chat message through unmodified
                console.log(error);
                return request.ok();
            });
    });
});
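To make the counting step easier to follow, here is a minimal sketch that isolates it as a pure function and runs it on a sample response. The function name `countLocations` is an assumption, and the sample object is hand-written to mirror the shape of a DetectEntities response (an `Entities` array whose items carry `Text`, `Type`, `Score`, `BeginOffset`, and `EndOffset`); it is not real API output.

```javascript
// Hypothetical helper: tally LOCATION entities from a DetectEntities-style
// response on top of the counts collected so far.
function countLocations(response, previousCounts) {
    const counts = Object.assign({}, previousCounts);

    (response.Entities || []).forEach((entity) => {
        if (entity.Type === 'LOCATION') {
            // Lower-case the name so "Brazil" and "brazil" share one tag
            const name = entity.Text.toLowerCase();
            counts[name] = (counts[name] || 0) + 1;
        }
    });

    return counts;
}

// Hand-written sample in the shape of a DetectEntities response
const sampleResponse = {
    Entities: [
        { Text: 'Brazil', Type: 'LOCATION', Score: 0.99, BeginOffset: 22, EndOffset: 28 },
        { Text: 'March', Type: 'DATE', Score: 0.97, BeginOffset: 32, EndOffset: 37 },
        { Text: 'brazil', Type: 'LOCATION', Score: 0.95, BeginOffset: 60, EndOffset: 66 }
    ]
};

const totals = countLocations(sampleResponse, { miami: 2 });
console.log(totals);
```

Keeping the tally separate from the network call makes it possible to test the heat map logic without calling AWS.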

Build Your Tag Heatmap

Now that you understand the various components that make this app work, it’s time to head over to the README file and start putting the pieces together.

Follow the README instructions in sequence to set up

Follow the app deployment steps to run the chat session.

The chat messages sent by the Moderator bot are contained in this text file. You can change the content of this file to frame your own messages (each one delimited by a newline). You should try it out and also check how the tag heatmaps form as the conversation progresses.

Beyond Topics & Conversations

Amazon Comprehend also supports a topic modeling feature that can suggest the topics emerging from a body of text. It is well suited to analyzing large volumes of historical conversations, where it draws on semantic information rather than simple keyword matches.

If you can extract locations from chat communication, then why not try out the topic modeling API to extract information from some historical conversation logs? You could scrape the Wikipedia page on Zika virus and run it through topic modeling to see what it churns out.

As we embrace more chat interfaces across diverse applications, it will be interesting to see how we can make more sense of the tons of text messages generated by these apps. This demo was an attempt in that direction, and we hope you enjoyed it. Please drop in your comments if you have any ideas around this demo. We will be eager to hear your feedback.
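Unlike entity detection, topic modeling in Comprehend runs as an asynchronous job over documents stored in Amazon S3. The sketch below only builds the request for the StartTopicsDetectionJob operation, reusing the header style of the DetectEntities call in this post. The function name, S3 paths, and role ARN are placeholders invented for illustration, and the request would still need to be signed and sent just like the DetectEntities request.

```javascript
// Hypothetical helper: build (but do not send) a StartTopicsDetectionJob request.
function buildTopicsJobRequest(inputS3Uri, outputS3Uri, roleArn, numberOfTopics) {
    return {
        headers: {
            'Content-Type': 'application/x-amz-json-1.1',
            'X-Amz-Target': 'Comprehend_20171127.StartTopicsDetectionJob'
        },
        body: JSON.stringify({
            // ONE_DOC_PER_LINE treats each line of the input as a document,
            // which suits a chat log with one message per line
            InputDataConfig: { S3Uri: inputS3Uri, InputFormat: 'ONE_DOC_PER_LINE' },
            OutputDataConfig: { S3Uri: outputS3Uri },
            DataAccessRoleArn: roleArn,
            NumberOfTopics: numberOfTopics
        })
    };
}

// Placeholder values only; substitute your own bucket and IAM role
const topicsRequest = buildTopicsJobRequest(
    's3://example-bucket/chat-logs/',
    's3://example-bucket/topics-output/',
    'arn:aws:iam::123456789012:role/example-comprehend-role',
    10
);
console.log(topicsRequest.body);
```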
