How to Add Video to Your Chat Application

Mathew Jenkinson on Jun 29, 2020

Remote interactions call for video chat 

Adding video to your existing chat service is a great way to expand your platform and increase the ways your users can connect with one another.

With the new necessity (and explosive growth) of remote interactions, real-time communications are in demand. In healthcare, HIPAA-compliant video calling lets doctors check up on patients organically. Video calls also bring humanity back to remote work and customer support, and streaming video is crucial for digital conferences and live events.

By offering integrations with A/V providers, PubNub makes it easy to enhance your chat app with voice and video calling.

Even though an A/V service solves the hardest parts of voice and video calling, orchestrating video calls between many users can be somewhat complex.

This article walks through some of the architectural considerations behind building a video chat app. We’ll cover:

  1. The advantages of using a video API and a chat API together

  2. How to set up 1-1 and 1-many video chat using PubNub

  3. Advanced video chat features you can build on top of PubNub 

By the end, you’ll come away with a clear sense of how a video chat app can come together, and some of the key ways in which you can deliver advanced functionality to your users.

Use a video API and a chat API together

First, let’s break down how a chat service and a video service interact, and explore why this lets you avoid handling WebRTC yourself.

It’s important to understand that video and chat are two separate beasts, and there is no single system that can do both video and chat and do them both well. Single, hybrid systems end up as a mess of compromises between two different systems.

For example, many video services support text-based messaging, but do so by repurposing the signaling layer of the video call’s connection. While this lets them support text chats during the session, these chats are usually: 

  1. Not savable.

  2. Not timestamped, so their context is lost.

  3. Not encryptable.

  4. Lacking in features such as language translation or complex functions such as ‘@’ mentions.

In other words, hybrid chat systems only offer barebones functionality. This type of chat can be sufficient if your primary driver is cost and your app is video-first. But if your offering is chat-first, and if that chat needs to scale reliably, a hybrid system just won’t be enough.

If you’re focused on providing users with a robust chat-first experience, it makes more sense to use PubNub’s chat API, and integrate it with a video API. This approach brings together two specialized systems, which gives you the best of both worlds. 

You don’t have to spend time or money setting up and maintaining the signaling infrastructure for your own WebRTC implementation, and you don’t have to worry about manually extending or working around subpar chat frameworks.

Set up peer-to-peer video chat between two users

Let’s take a bird’s-eye view of orchestrating a video call between two users, using PubNub and a video service.

Take a look at this diagram:

[Diagram: Set up video chat with PubNub]

In this example, the ‘server’ is the customer’s application server.

As you can see, there is a lot to do just to get User A and User B to connect with video:

  1. User A initiates a video request, and that request is validated by the application server.

  2. User A receives the Video Session Object.

  3. At the same time, PubNub fulfills a crucial role by seamlessly passing the video invite from the application server to User B.

  4. Once the invite is accepted, the Video Session Object is passed directly to User B, and the video call begins. 
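
The four steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not PubNub’s or any video provider’s API: `AppServer`, `create_video_session`, the `publish` callable, the `invites.<user>` channel name, and the message fields are all hypothetical. In a real app, `publish` would wrap your PubNub SDK’s publish call, and the session object would come from your video provider.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

# Hypothetical publish signature: (channel, message) -> None.
Publish = Callable[[str, dict], None]


@dataclass
class VideoSession:
    """Stand-in for the session object a video provider would return."""
    session_id: str
    participants: Tuple[str, ...]


def create_video_session(*participants: str) -> VideoSession:
    # Assumption: a real server would call the video provider here.
    return VideoSession(session_id=uuid.uuid4().hex, participants=participants)


class AppServer:
    def __init__(self, publish: Publish, allowed_users: Set[str]):
        self.publish = publish
        self.allowed_users = allowed_users
        self.pending: Dict[str, VideoSession] = {}

    def request_call(self, caller: str, callee: str) -> VideoSession:
        # Step 1: validate the request on the application server.
        if caller not in self.allowed_users or callee not in self.allowed_users:
            raise PermissionError("caller or callee may not start a call")
        # Step 2: create the session; the caller receives it as the return value.
        session = create_video_session(caller, callee)
        self.pending[session.session_id] = session
        # Step 3: pass the invite to the callee over the messaging layer.
        self.publish(
            f"invites.{callee}",
            {"type": "video_invite", "from": caller, "session_id": session.session_id},
        )
        return session

    def accept_invite(self, callee: str, session_id: str) -> VideoSession:
        # Step 4: on acceptance, hand the session object to the callee.
        session = self.pending.get(session_id)
        if session is None or callee not in session.participants:
            raise PermissionError("no pending invite for this user")
        del self.pending[session_id]
        return session
```

Injecting `publish` keeps the orchestration logic independent of the messaging SDK, so it can be exercised in tests with a plain function that records what was sent.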

PubNub plays a key role by reliably passing requests and invitations along, within 250ms of the initial invite. This creates an instantaneous experience for your users, and reliably smooths out any lag they would experience otherwise. 

What about one-to-many video chat?

The diagram above shows how you would put together a straightforward peer-to-peer video chat. But with PubNub, one-to-many video calling uses exactly the same principles. 

With PubNub, publishing messages to multiple users at once, even at large scales, is simple. So, our basic pattern can be extended to support group video invitations with relative ease. 
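
One way to picture that extension is a helper that sends the same invite to every participant. This is a sketch under assumptions: `invite_group` is a hypothetical function, `publish` stands in for your PubNub publish call, and the per-user `invites.<user>` channels are an illustrative naming scheme rather than a PubNub convention. Publishing once to a shared channel that all invitees subscribe to would work too.

```python
from typing import Callable, Iterable, List

# Hypothetical publish signature: (channel, message) -> None.
Publish = Callable[[str, dict], None]


def invite_group(
    publish: Publish, host: str, invitees: Iterable[str], session_id: str
) -> List[str]:
    """Send one video invite per invitee and return who was notified."""
    notified: List[str] = []
    # dict.fromkeys removes duplicates while keeping the original order.
    for user in dict.fromkeys(invitees):
        if user == host:
            continue  # the host already holds the session
        publish(
            f"invites.{user}",
            {"type": "video_invite", "from": host, "session_id": session_id},
        )
        notified.append(user)
    return notified
```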

When you consider use-cases for large audiences, such as conferences, webinars, or company all-hands, the value of reliable, easy invitations becomes clear. 

Set up video and text chat together

The flexibility of PubNub APIs means that it’s easy to build apps that combine video calling and text chat, no matter your use case.

When it comes to chat architecture, PubNub gives you flexible channel topology. This means that channels never have to have a defined type, and can contain any number of users. In a video chat app, this lets you:

  • Build call-specific chat rooms automatically.

  • Allow users to initiate direct messages with one another during the video call. 

  • Make these chats as design- and feature-rich as you like.

Ultimately, you have full control over how chat channels are created. Plus, the messages sent over these channels are secure and separate from the media stream, so they’ll only ever be seen by their intended recipients.
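
Because channels have no fixed type, the structure lives in how your app names them. The helpers below sketch one possible naming scheme. The `call.` and `dm.` prefixes and both function names are assumptions for illustration, not something PubNub prescribes, and the sketch assumes IDs contain no dots.

```python
def call_room_channel(session_id: str) -> str:
    """Channel for the text chat that accompanies one video call."""
    return f"call.{session_id}.chat"


def direct_message_channel(user_a: str, user_b: str) -> str:
    """Channel for a private chat between two users.

    Sorting the IDs means both users derive the same channel name,
    regardless of who starts the conversation.
    """
    first, second = sorted((user_a, user_b))
    return f"dm.{first}.{second}"
```

With a scheme like this, the app server can create the call’s chat room as soon as the video session exists, and either participant can open a direct message without a lookup.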

Additionally, you can store user and channel metadata within PubNub. This makes it easy to notify friends of an event when it begins, prevent banned users from joining calls, or create persistent video chat rooms. This flexibility is ultimately where PubNub is superior to any video service that bolts a messaging bus on the side.


Video chat applications are easy with PubNub

Beyond simplifying the initial handshake between callers and offering secure group chat, PubNub can enhance your video calls by making it easy to create features useful for both end-users and your development team. 

A few core benefits include:

  1. Security and validation: Route invitations and chats through Functions to ensure that banned users cannot access the service or message another user.

  2. Privacy and message management: PubNub messages are secure and not part of the media stream. Rather than risk exposing private messages, you can make sure chat channels are truly secure and separate from video.

  3. Logging and event clarity: Log call events to keep minutes, comply with audits, or for post-hoc analysis.
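
The first and third points can share a single checkpoint: every invite or chat message passes through one function that decides whether to forward it and records the decision. PubNub Functions themselves are written in JavaScript, so the Python below only illustrates the logic. `gate_message`, `banned_users`, `audit_log`, and the message fields are hypothetical names.

```python
import time
from typing import List, Optional, Set


def gate_message(
    message: dict,
    banned_users: Set[str],
    audit_log: List[dict],
    now: Optional[float] = None,
) -> bool:
    """Return True if the message may be forwarded, and log the decision."""
    sender = message.get("from")
    # Messages without a sender, or from a banned sender, are blocked.
    allowed = sender is not None and sender not in banned_users
    audit_log.append(
        {
            "at": time.time() if now is None else now,
            "type": message.get("type"),
            "from": sender,
            "allowed": allowed,
        }
    )
    return allowed
```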

Beyond these inherent benefits, you can build enhanced video-calling features that help your application exceed user expectations. With PubNub, you can build things like:

  1. Closed captioning. Using Functions and an integration like Amazon Transcribe, you can produce dynamic captions during live calls.

  2. Live whiteboards. For eLearning and collaboration, let multiple users draw at the same time by publishing draw paths, brush size, and color. 

  3. Screensharing. You can send a shared screen as video, then use a similar method to live whiteboards to allow users to annotate their shared feed.

  4. Post-call alerts and notifications. Send an NPS survey after customer service calls, or let doctors send appointment summaries after video consultations.  

  5. Enhanced support tickets. Log call info and include it in tickets, giving your agents contextually relevant information.
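
For the live whiteboard, each stroke can travel as a small JSON message carrying the draw path, brush size, and color. The `Stroke` type and its field names below are assumptions made for this sketch. Messaging services generally cap message size, so long paths may need to be split across several messages.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class Stroke:
    user: str
    color: str                       # for example, a hex string
    brush_size: int                  # in pixels
    path: List[Tuple[float, float]]  # (x, y) points in canvas coordinates


def encode_stroke(stroke: Stroke) -> str:
    """Serialize a stroke to compact JSON for publishing."""
    return json.dumps(asdict(stroke), separators=(",", ":"))


def decode_stroke(payload: str) -> Stroke:
    """Rebuild a stroke from a received JSON payload."""
    data = json.loads(payload)
    # JSON has no tuple type, so points arrive as lists; convert them back.
    data["path"] = [tuple(point) for point in data["path"]]
    return Stroke(**data)
```

Each client publishes its own strokes and replays the strokes it receives, which is how several users can draw on the same canvas at once.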


Building a chat application on PubNub means you’re building with flexibility and extensibility, especially as your application scales. Mobile push notifications, integrations with third-party services, and serverless functions all come built-in.

All of this also means that integrating a video service with PubNub is simple, reliable, and gives you full control over the way users, chatrooms, chat, and video interact. 

Additionally, by leveraging key features of PubNub with your video implementation, you can build features that give your users unique, surprising, and engaging ways to interact with one another.

With A/V integrations, developing omnichannel communications has never been easier.