Enable Text-To-Speech in Real time Apps in 70 Lines of Code

Michael Carroll on Oct 26, 2016

TTS for Real-time Applications in 70 Lines of Code

Hello, and welcome back to our series on voice activation in real-time web applications. In this article, we walk through a simple example of how to enable text-to-speech in a real-time AngularJS web application with just 70 lines of HTML and JavaScript. The application lets the user type in a message, which is then sent to everyone viewing the same page and spoken aloud in their browsers. It also gives us a foundation for building more advanced features later.

Before we explore our sample AngularJS web application with text-to-speech features, let's check out the underlying Speech APIs in the Chrome desktop and Android browsers.

Emerging Web Speech APIs

Native solutions work well in a single technology stack such as iOS or Android, but they require substantial effort and engineering resources to maintain across a diverse array of targeted platforms. In the meantime, there is one promising set of technologies that is worth highlighting to enable rapid prototyping across desktop and Android environments. The W3C has an emerging standard for Speech Synthesis and Speech Recognition.

The Chrome desktop and Android browsers (hopefully iOS mobile soon!) include support for these speech APIs. With them, it's possible to create applications with voice control, dictation and text-to-speech. Although they don't allow the same level of user interface polish, integration and control as native APIs, we really enjoy using them in cases where rapid development is necessary and web applications suffice.
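Since support varies by browser, it can help to check for the speech synthesis API before relying on it. The sketch below is one way to do that; canSpeak is a hypothetical helper name of our own, not something the Web Speech API or PubNub provides.

```javascript
// Hypothetical helper (not part of the original app): reports whether the
// given global object exposes the speech synthesis pieces used in this article.
function canSpeak(globalObj) {
  return !!globalObj &&
    typeof globalObj.SpeechSynthesisUtterance === 'function' &&
    !!globalObj.speechSynthesis &&
    typeof globalObj.speechSynthesis.speak === 'function';
}

// In a browser you would pass window; the typeof guard keeps this snippet
// from throwing when it is loaded somewhere without a window object.
if (typeof window !== 'undefined') {
  console.log('Text-to-speech available:', canSpeak(window));
}
```

An app could use the result to hide the "speak again" links, or show a short notice, in browsers that lack support.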

Since you're reading this at PubNub, we'll presume you have a real-time application use case in mind, such as ride-hailing, messaging, IoT or other. In the sections below, we'll dive into the text-to-speech use case, saving dictation and voice control for future articles.

Getting Started with PubNub

The first things you'll need before you can create a real-time application with PubNub are publish and subscribe keys (you probably already have these if you followed the steps in a previous article). If you haven't done so yet, you can create an account, get your keys and be ready to use the PubNub network in less than 60 seconds.

That note about API versions bears repeating: the user interfaces in this series of articles use the v3 API (since they need the PubNub AngularJS SDK, which still runs on v3). We expect the AngularJS SDK to be v4-compatible soon. In the meantime, please stay alert when jumping between different versions of JS code!

Diving into the Code

First up, we have the JavaScript and CSS dependencies of our application.

<!doctype html>
<html>
<head>
  <script src="https://cdn.pubnub.com/pubnub-3.15.1.min.js"></script>
  <script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.5.6/angular.min.js"></script>
  <script src="https://cdn.pubnub.com/sdk/pubnub-angular/pubnub-angular-3.2.1.min.js"></script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/underscore.js/1.8.3/underscore-min.js"></script>
  <link rel="stylesheet" href="https://netdna.bootstrapcdn.com/bootstrap/3.0.2/css/bootstrap.min.css" />
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.6.3/css/font-awesome.min.css" />
</head>
<body>

For folks who have done front-end implementation with AngularJS before, these should be the usual suspects:

PubNub JavaScript client: to connect to our data stream integration channel.
AngularJS: were you expecting a niftier front-end framework? Impossible!
PubNub Angular JavaScript client: provides PubNub services in AngularJS quite nicely indeed.
Underscore.js: we could avoid using Underscore.js, but then our code would be less awesome.

In addition, we bring in two CSS libraries:

Bootstrap: in this app, we use it just for vanilla UI presentation.
Font-Awesome: we love Font Awesome because it lets us use icon-font glyphs instead of image-based icons. Pretty sweet!

Overall, we were pretty pleased that we could build a nifty UI with so few dependencies. And with that… on to the UI!

The User Interface

Here's what we intend the UI to look like:

The UI is pretty straightforward – everything is inside a div tag that is managed by a single controller that we'll set up in the AngularJS code. That h3 heading should be pretty self-explanatory.

<div class="container" ng-app="PubNubAngularApp" ng-controller="MySpeechCtrl">
<pre>NOTE: make sure to update the PubNub keys below with your keys!</pre>
<h3>MyText to Speech</h3>

We provide a simple text input for a message to send to the PubNub channel as well as a button to perform the publish() action.

<input ng-model="toSend" />
<input type="button" ng-click="publish()" value="Send!" />

Our UI consists of a simple list of messages. We iterate over the messages in the controller scope using a trusty ng-repeat. Each message includes a link to allow the user to speak the text again.

<ul>
  <li ng-repeat="message in messages track by $index">
    {{message.data}}
    <a ng-click="sayIt(message.data)">(speak again)</a>
  </li>
</ul>
</div>

And that's it: a functioning real-time UI in just a handful of lines of code (thanks, AngularJS)!

The AngularJS Code

Right on! Now we're ready to dive into the AngularJS code. It's not a ton of JavaScript, so this should hopefully be pretty straightforward.

The first lines we encounter set up our application (with a necessary dependency on the PubNub AngularJS service) and a single controller (which we dub MySpeechCtrl). Both of these names correspond to the ng-app and ng-controller attributes from the preceding UI code.

<script>
angular.module('PubNubAngularApp', ["pubnub.angular.service"])
.controller('MySpeechCtrl', function($rootScope, $scope, Pubnub) {

Next up, we initialize a bunch of values. First is an array of message objects which starts out with a single message for testing. After that, we set up the msgChannel as the channel name where we will send and receive real-time structured data messages.

  $scope.messages     = [{data:"testing 1 2 3"}];
  $scope.msgChannel   = 'MySpeech';

We initialize the Pubnub object with the PubNub publish and subscribe keys mentioned above, and set a flag on $rootScope to make sure the initialization only occurs once. NOTE: this uses the v3 API syntax.

  if (!$rootScope.initialized) {
    Pubnub.init({
      publish_key: 'YOUR_PUB_KEY',
      subscribe_key: 'YOUR_SUB_KEY',
      ssl:true
    });
    $rootScope.initialized = true;
  }

The next thing we'll need is a real-time message callback called msgCallback; it takes care of all the real-time messages we need to handle from PubNub. In our case, we have only one scenario: an incoming message containing text to speak. We push the message object onto the scope array and pass its text to the sayIt() function for text-to-speech conversion (we'll cover that later). The push() operation is wrapped in a $scope.$apply() call so that AngularJS notices the change, which arrives asynchronously outside its digest cycle.

  var msgCallback = function(payload) {
    $scope.$apply(function() {
      $scope.messages.push(payload);
    });
    $scope.sayIt(payload.data);
  };
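Anyone with the keys can publish to the channel, so the callback may occasionally receive something other than the {data: "..."} objects this app sends. Here is a minimal sketch of a defensive check; speakableText is a hypothetical helper of our own and is not part of the original 70 lines.

```javascript
// Hypothetical helper: returns the text to speak, or null when the payload
// does not look like the {data: "..."} objects this app publishes.
function speakableText(payload) {
  if (!payload || typeof payload.data !== 'string') {
    return null;
  }
  var text = payload.data.trim();
  return text.length > 0 ? text : null;
}
```

Inside msgCallback, the idea would be to call speakableText(payload) first and return early when it yields null, so that malformed messages are neither listed nor spoken.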

The publish() function takes the contents of the text input, publishes it as a structured data object to the PubNub channel, and resets the text box to empty.

  $scope.publish = function() {
    Pubnub.publish({
      channel: $scope.msgChannel,
      message: {data:$scope.toSend}
    });
    $scope.toSend = "";
  };
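One thing to watch for: if the user clicks Send with an empty text box, the function above still publishes a message. The sketch below shows one way to skip that case. publishIfNotEmpty is a hypothetical wrapper of our own, and client stands for any object with a publish({channel, message}) method, such as the Pubnub service used here.

```javascript
// Hypothetical wrapper around the publish step. `client` is any object with
// a publish({channel, message}) method, such as the Pubnub service.
function publishIfNotEmpty(client, channel, input) {
  var text = typeof input === 'string' ? input.trim() : '';
  if (text === '') {
    return false; // nothing to send, so skip the network call
  }
  client.publish({ channel: channel, message: { data: text } });
  return true;
}
```

In the controller, $scope.publish could call publishIfNotEmpty(Pubnub, $scope.msgChannel, $scope.toSend) and clear the text box only when it returns true.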

In the main body of the controller, we subscribe() to the message channel (using the JavaScript v3 API syntax) and bind the events to the callback function we just created.

  Pubnub.subscribe({ channel: $scope.msgChannel, message: msgCallback });

Lastly, we define the sayIt() function, which takes a text string and passes it to the Text-To-Speech engine. So easy!

  $scope.sayIt = function (theText) {
    window.speechSynthesis.speak(new SpeechSynthesisUtterance(theText));
  };
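The one-liner above assumes the browser supports speech synthesis. As a hedged alternative, here is a sketch that degrades quietly when it doesn't. makeSayIt is a hypothetical factory of our own; it takes the synthesizer and the utterance constructor as arguments so it can be exercised without a browser.

```javascript
// Hypothetical factory: builds a sayIt function from an injected synthesizer
// and utterance constructor (window.speechSynthesis and
// window.SpeechSynthesisUtterance in a supporting browser).
function makeSayIt(synth, Utterance) {
  return function sayIt(theText) {
    if (!synth || typeof Utterance !== 'function') {
      return false; // speech synthesis is unavailable here
    }
    synth.speak(new Utterance(theText));
    return true;
  };
}
```

In the controller, this could be wired up as $scope.sayIt = makeSayIt(window.speechSynthesis, window.SpeechSynthesisUtterance); in a browser without the API both arguments are undefined and sayIt simply returns false.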

We mustn't forget to close out the HTML tags accordingly.

});
</script>
</body>
</html>

Not too shabby for about seventy lines of HTML & JavaScript! For your convenience, this code is also available as a Gist on GitHub, and a Codepen as well. Enjoy!

Additional Features

There are a few other features worth mentioning in the Web Speech API. Somewhat disappointingly, we weren't able to get voice selection to work – hopefully this will be fixed in future Chrome releases.
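We can't say for certain what went wrong with voice selection, but one possible cause is that Chrome loads its voice list asynchronously, so speechSynthesis.getVoices() can return an empty list until the voiceschanged event has fired. The sketch below waits for that event before picking a voice. Both whenVoicesReady and pickVoice are hypothetical helpers of our own, and treating this as the cause is an assumption on our part.

```javascript
// Hypothetical helper: calls onReady(voices) once the synthesizer reports a
// non-empty voice list. `synth` is expected to behave like
// window.speechSynthesis (getVoices() plus an onvoiceschanged handler slot).
function whenVoicesReady(synth, onReady) {
  var voices = synth.getVoices();
  if (voices.length > 0) {
    onReady(voices);
    return;
  }
  synth.onvoiceschanged = function () {
    var loaded = synth.getVoices();
    if (loaded.length > 0) {
      synth.onvoiceschanged = null; // stop listening once voices are loaded
      onReady(loaded);
    }
  };
}

// Hypothetical helper: picks the first voice whose lang matches exactly,
// or returns null if none does.
function pickVoice(voices, lang) {
  for (var i = 0; i < voices.length; i++) {
    if (voices[i].lang === lang) {
      return voices[i];
    }
  }
  return null;
}
```

With these in place, a caller could run whenVoicesReady(window.speechSynthesis, ...) and, inside the callback, assign the result of pickVoice(voices, 'en-GB') to the voice property of a SpeechSynthesisUtterance when it is not null.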

var msg = new SpeechSynthesisUtterance("hello there");
msg.volume = 1; // 0 to 1
msg.rate = 1; // 0.1 to 10
msg.pitch = 2; // 0 to 2
msg.lang = 'en-US';
window.speechSynthesis.speak(msg); // queue the configured utterance

Despite that one issue, there are several adjustable parameters that did work for the text-to-speech engine:

Volume is the relative volume of the speech on a scale of 0 (soft) to 1 (loud).
Rate is the rate of speech, from 0.1 (10%) to 10 (10x).
Pitch is the pitch of the spoken text, from 0 (lower) to 2 (higher).
Lang is the language of the utterance, given as a BCP 47 language tag (such as en-US).
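If these settings come from user input, it may be worth keeping them inside the ranges listed above. Here is a minimal sketch; clamp and normalizeSpeechOptions are hypothetical helpers of our own, and the 'en-US' fallback is just an assumption for this example.

```javascript
// Hypothetical helper: keeps a numeric setting inside [min, max] and falls
// back to a default when the value is missing or not a number.
function clamp(value, min, max, fallback) {
  if (typeof value !== 'number' || isNaN(value)) {
    return fallback;
  }
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: returns a settings object that is safe to copy onto
// a SpeechSynthesisUtterance.
function normalizeSpeechOptions(options) {
  var opts = options || {};
  return {
    volume: clamp(opts.volume, 0, 1, 1),
    rate: clamp(opts.rate, 0.1, 10, 1),
    pitch: clamp(opts.pitch, 0, 2, 1),
    lang: typeof opts.lang === 'string' ? opts.lang : 'en-US' // assumed default
  };
}
```

The returned volume, rate, pitch and lang values can then be assigned to the matching properties of the utterance before it is passed to speak().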

Conclusion

Thank you so much for joining us in the Text-To-Speech article of our Voice Activation series! Hopefully it's been a useful experience learning about voice-enabled technologies. In future articles, we'll dive deeper into the Speech APIs and use cases for dictation and voice commands in real-time web applications.

Stay tuned, and please reach out anytime if you feel especially inspired or need any help!