Chat Moderation with OpenAI

Darryn Campbell on Jul 2, 2024

Any application with in-app chat needs some way to regulate and moderate the messages that users exchange. Human moderators cannot feasibly review all inappropriate content, so the moderation system must be automatic. And because users frequently try to circumvent moderation, machine learning, generative AI, and large language models (LLMs), including GPT models such as GPT-3 and GPT-4, are popular ways to moderate content.

Moderation is a complex topic, and PubNub offers various solutions to meet all of our developers’ use cases.

PubNub Functions can intercept and modify messages before they reach their destination. You can apply custom logic within a Function, including calling an external REST API, allowing you to use any external service for message moderation. This approach is used in this article to integrate with OpenAI.
PubNub Functions offer custom integrations that support content moderation and sentiment analysis, including Lasso Moderation, Tisane, a RegEx-based profanity filter, Lexalytics, and Community Sift.
PubNub’s BizOps Workspace can monitor and moderate conversations, including the ability to edit and delete messages.

The OpenAI Moderation Endpoint

This article looks at OpenAI’s Moderation API, a REST API that uses artificial intelligence (AI) to determine whether the provided text contains potentially harmful terms. The API is intended to let developers filter or remove harmful content and, at the time of writing, it is provided free of charge, though it only supports English.

The model behind the Moderation API will categorize the provided text as follows (taken from the API documentation):

Hate: Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is harassment.
Hate / Threatening: Hateful content that also includes violence or serious harm towards the targeted group based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.
Harassment: Content that expresses, incites, or promotes harassing language towards any target.
Harassment / Threatening: Harassment content that also includes violence or serious harm towards any target.
Self-Harm: Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
Self-Harm / Intent: Content where the speaker expresses that they are engaging or intend to engage in acts of self-harm, such as suicide, cutting, and eating disorders.
Self-Harm / Instructions: Content that encourages performing acts of self-harm, such as suicide, cutting, and eating disorders, or that gives instructions or advice on how to commit such acts.
Sexual: Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
Sexual / Minors: Sexual content that includes an individual who is under 18 years old.
Violence: Content that depicts death, violence, or physical injury.
Violence / Graphic: Content that depicts death, violence, or physical injury in graphic detail.

Results are provided within a JSON structure as follows (again, taken from the API documentation):
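As a rough guide to that structure, here is a minimal TypeScript sketch. The field names (`id`, `model`, `results`, `flagged`, `categories`, `category_scores`) follow OpenAI's documented response format; the sample values and the `flaggedCategories` helper are illustrative assumptions, not real API output.

```typescript
// Shape of a Moderation API response, sketched from OpenAI's documented format.
// Category keys use the API's spelling, e.g. "hate/threatening" or "self-harm/intent".
interface ModerationResult {
  flagged: boolean;                        // true if any category is flagged
  categories: Record<string, boolean>;     // per-category verdict
  category_scores: Record<string, number>; // per-category score between 0 and 1
}

interface ModerationResponse {
  id: string;
  model: string;
  results: ModerationResult[];             // one entry per input
}

// Illustrative values only (not real API output); most categories omitted for brevity.
const sampleResult: ModerationResult = {
  flagged: true,
  categories: { "harassment": true, "harassment/threatening": false, "violence": false },
  category_scores: { "harassment": 0.91, "harassment/threatening": 0.12, "violence": 0.03 },
};

const sampleResponse: ModerationResponse = {
  id: "modr-EXAMPLE",
  model: "example-moderation-model",
  results: [sampleResult],
};

// Hypothetical helper: list the names of the categories a result was flagged for.
function flaggedCategories(result: ModerationResult): string[] {
  return Object.keys(result.categories).filter((name) => result.categories[name]);
}

console.log(sampleResponse.results.map(flaggedCategories));
```

A client would usually only need `flagged` to decide whether to hide a message, with `categories` available if it wants to explain why.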

Calling the OpenAI Moderation API from PubNub

Integrating the Moderation API into any PubNub application is straightforward with PubNub Functions, as the step-by-step tutorial below shows.

Functions allow you to capture real-time events happening on the PubNub platform, such as messages being sent and received; you can then write custom serverless code within those functions to modify, re-route, augment, or filter messages as needed.

You will need to use the “Before Publish or Fire” event type; this function type will be invoked before the message is delivered and must finish executing before the message is released to be delivered to its recipients. The PubNub documentation provides more background and detail, but in summary: “Before Publish or Fire” is a synchronous call that can alter a message or its payload.
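To make the synchronous contract concrete, here is a minimal sketch of a handler that either alters the message and releases it, or blocks it. The `PublishRequest` interface, the `Outcome` type, the `makeRequest` factory, and the `text` field are simplified stand-ins (assumptions) so the example can run outside PubNub; in a real Function the request object is supplied by the PubNub runtime.

```typescript
// Simplified stand-ins for the runtime's request object (assumptions, not the real types).
type Outcome = { status: "released" | "blocked"; message: Record<string, unknown> };

interface PublishRequest {
  message: Record<string, unknown>;
  ok(): Promise<Outcome>;    // release the (possibly modified) message
  abort(): Promise<Outcome>; // block the message
}

// Hypothetical factory that mimics the runtime for local experiments.
function makeRequest(message: Record<string, unknown>): PublishRequest {
  const request: PublishRequest = {
    message,
    ok: async () => ({ status: "released" as const, message: request.message }),
    abort: async () => ({ status: "blocked" as const, message: request.message }),
  };
  return request;
}

// The handler finishes before delivery, so any change made to request.message
// is what recipients receive.
async function beforePublish(request: PublishRequest): Promise<Outcome> {
  const text = request.message["text"];
  if (typeof text !== "string") {
    return request.abort();              // nothing sensible to deliver
  }
  request.message["text"] = text.trim(); // alter the payload in flight
  return request.ok();
}
```

Calling `beforePublish(makeRequest({ text: "  hi  " }))` resolves with the trimmed message marked as released, while a message without a `text` string is blocked.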

Create the PubNub Function

Log into the PubNub admin portal and select the application and keyset for the app you want to moderate.
Select ‘Functions’, which can be found under the ‘Build’ tab.
Select ‘+ CREATE NEW MODULE’ and give the module a name and description.
Select ‘+ CREATE NEW FUNCTION’ and give the function a name.
For the event type, select ‘Before Publish or Fire’.
For the Channel name, enter * (this demo will use *, but your application may choose to specify only the channels here that you want to moderate)

Having created the PubNub function, you need to provide your OpenAI API key as a secret.

Select ‘MY SECRETS’ and create a new key with the name ‘OPENAI_API_KEY’.
Generate an OpenAI API key and ensure that the key has access to the Moderation API.
Provide the generated API key to the PubNub function secret you just created.

The body of the PubNub function will look as follows:
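The following is a hedged sketch of the moderation step rather than the exact function body. In a real PubNub Function the code would be JavaScript, with the secret read through the platform's `vault` module and the HTTP request made through its `xhr` module; here both are injected as `getSecret` and `httpPost` (hypothetical names) so the logic can run anywhere. The message field `text` and the appended key `openAiModeration` are also assumptions.

```typescript
const MODERATION_URL = "https://api.openai.com/v1/moderations";

// Injected dependencies (hypothetical names) standing in for PubNub's vault and xhr modules.
interface Deps {
  getSecret(name: string): Promise<string>;
  httpPost(url: string, headers: Record<string, string>, body: string): Promise<string>;
}

type ChatMessage = { text: string; [key: string]: unknown };

// Sends the message text to the Moderation API and returns a copy of the message
// with the moderation result appended under a new key.
async function moderate(message: ChatMessage, deps: Deps): Promise<ChatMessage> {
  try {
    const apiKey = await deps.getSecret("OPENAI_API_KEY");
    const responseBody = await deps.httpPost(
      MODERATION_URL,
      { "Content-Type": "application/json", "Authorization": `Bearer ${apiKey}` },
      JSON.stringify({ input: message.text })
    );
    // The API returns one entry in `results` per input; a single string was sent.
    const parsed = JSON.parse(responseBody) as { results?: unknown[] };
    const first = parsed.results ? parsed.results[0] : undefined;
    return { ...message, openAiModeration: first ?? null };
  } catch {
    // Assumption: fail open, delivering the message unannotated if the call fails.
    return message;
  }
}
```

Inside a "Before Publish or Fire" handler, the result of `moderate` would replace `request.message` before the handler returns `request.ok()`. Failing open is a design choice; an application with stricter requirements might block the message instead.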

The function itself is quite straightforward:

For each message received:

Pass it to the OpenAI Moderation API
Append the returned moderation object as a new key on the Message (JSON) object

Save your function and make sure your module is started.

Latency

The PubNub function you have just created will be executed synchronously every time a message is sent, and that message will not be delivered until the function has finished executing. Since the function contains a call to an external API, the delivery latency will depend on how fast the API call to OpenAI returns, which is outside of PubNub’s control and could be quite high.

There are several ways to mitigate any degradation in the user experience. Most deployments provide immediate feedback to the sender that the message was sent and then rely on read receipts to indicate that the message is delivered (or reported).
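The "immediate feedback" approach can be pictured as a small piece of client-side state: a message is shown as sent the moment the user submits it and is upgraded to delivered when a receipt arrives. The sketch below is framework-agnostic, and `DeliveryTracker` and its method names are hypothetical rather than part of any PubNub SDK.

```typescript
type DeliveryStatus = "sent" | "delivered";

// Hypothetical client-side tracker for the status shown next to each outgoing message.
class DeliveryTracker {
  private statuses = new Map<string, DeliveryStatus>();

  // Called as soon as the user sends, before moderation has finished.
  markSent(messageId: string): void {
    this.statuses.set(messageId, "sent");
  }

  // Called when a receipt arrives, i.e. after the Function has released the message.
  // Receipts for unknown messages are ignored.
  markDelivered(messageId: string): void {
    if (this.statuses.has(messageId)) {
      this.statuses.set(messageId, "delivered");
    }
  }

  statusOf(messageId: string): DeliveryStatus | undefined {
    return this.statuses.get(messageId);
  }
}
```

With this split, the sender sees their message immediately, and any moderation delay only affects when the status changes from sent to delivered.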

Update the Client Application

Let’s consider what would be required to handle the moderation payload within your application using the Chat Demo, which is a React application that uses the PubNub Chat SDK to show most of the features of a typical chat app.

Set up an attribute to track whether or not a potentially harmful message should be displayed:
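In a React component such a flag would typically be held in component state. As a framework-agnostic stand-in, the sketch below keeps it in a closure; `createVisibilityFlag` and its method names are hypothetical.

```typescript
// Visibility flag for one message, hidden by default. A plain closure is used here
// as a stand-in for component state.
function createVisibilityFlag(initiallyShown: boolean = false) {
  let shown = initiallyShown;
  return {
    isShown: (): boolean => shown,
    reveal: (): void => { shown = true; },  // e.g. when the user chooses to view the message
    hide: (): void => { shown = false; },
  };
}
```

Defaulting to hidden means a flagged message is only displayed after an explicit user action.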

Then add logic to hide a potentially harmful message by default, in this case within message.tsx:
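The display decision itself can be sketched as a pure function, independent of React. The `openAiModeration` key, the `ModeratedMessage` shape, and the placeholder wording are assumptions for illustration; the actual Chat Demo code may differ.

```typescript
interface ModeratedMessage {
  text: string;
  // Assumed key under which the PubNub Function attached the moderation result.
  openAiModeration?: { flagged: boolean } | null;
}

const HIDDEN_PLACEHOLDER = "This message may contain harmful content.";

// Returns the text to render: flagged messages are replaced by a placeholder
// unless the user has chosen to reveal them.
function textToRender(message: ModeratedMessage, showHarmful: boolean): string {
  const flagged = message.openAiModeration?.flagged === true;
  return flagged && !showHarmful ? HIDDEN_PLACEHOLDER : message.text;
}
```

Messages without a moderation result are rendered as normal, so the client keeps working even if the Function is stopped or the moderation call failed.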

Note that these changes are not present in the hosted version of the Chat Demo, but the README contains full instructions for building and running it yourself with your own keyset.

Wrap up

And there you have it, a quick and easy (and free) way to add both moderation and sentiment analysis to your application using OpenAI.

To learn more about integrating OpenAI with PubNub, check out these other resources:

OpenAI GPT API Integration with Functions
Build a Chatbot with PubNub and ChatGPT (Adding a Chatbot to our PubNub showcase)
Enhance a Geo App with PubNub & Chat GPT / OpenAI

Feel free to reach out to the DevRel team at devrel@pubnub.com or contact our Support team for help with any aspect of your PubNub development.