Step 4: How Twitr_janus speaks using text-to-speech (in Processing)
Speaking open data
An essential point I was trying to test with Twitr_janus was whether I could get a puppet to speak open data over the web. Initially the intention was that this would just be tweets from the @twitr_janus account on Twitter, which is how Twitr_janus got its name.
Twitr_janus did indeed successfully speak tweets, stripped from the Twitter API with Processing.
Making Twitr_janus speak tweets was done without using API keys, by parsing the raw API string rather than referencing fields properly. This was to avoid having to register as a Twitter developer, etc. This crude method had some limitations. For example, tweets containing control characters confused the parsing scripts, leading to messages being truncated when decoded.
The parsing model worked much better with Google spreadsheets, where the raw data could be appended with extra stop data to help the parsing process, using expressions in the spreadsheet fields. Google spreadsheet data was not only easy to use for speech, it could also easily be used for eyeball control. Because the Google spreadsheet method is the easier and more versatile of the two approaches, this is what is described below.
How data is sent, coded and decoded, step by step
It helps to think through the flow of data...
Before starting on this I found it helpful to scribble down a flow diagram to get a feel for the building blocks needed. The mouth and TTS represent the function of text-to-speech conversion.
This is not a technical drawing!
Part 1 - Entering data in the Google spreadsheet
There were three pieces of data that needed to be sent from the spreadsheet, to be decoded by Processing. These were the two variables eyeballUpDown_stop (column F) and eyeballLeftRight_stop (column G), which are coded positioning data. Later, once decoded, they would be used to drive servos with an Arduino attached to the puppet head. The third piece of data was text_stop, which was processed further in Processing to create the text-to-speech.
In the final version only two pieces of data were sent: the speech data and a single eyeball data value. This may cause some confusion when interpreting the code! (eyeballUpDown was used, though not renamed.)
A single eyeball position variable could be used instead of two because the data being sent simply represented one of 25 positions. Although two control values are needed by the Arduino to position the eyeballs (one for the up/down servo, one for the left/right servo) the single variable sent was used to access corresponding pairs of values, stored in an array inside the Arduino sketch.
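The single-value lookup can be sketched as below. This is only an illustration in Java (the real table lived in the Arduino sketch), and the 5 x 5 grid layout, the servo angle range and all of the names are assumptions, not the values used in the puppet.

```java
// Sketch only: grid layout and servo angles are assumed, not taken from the puppet.
class EyeballPositions {
    static final int GRID = 5;        // 5 x 5 = 25 preset positions
    static final int MIN_ANGLE = 60;  // hypothetical servo limits in degrees
    static final int MAX_ANGLE = 120;

    // Builds a table of {upDown, leftRight} servo angles, one row per preset.
    static int[][] buildTable() {
        int[][] table = new int[GRID * GRID][2];
        int step = (MAX_ANGLE - MIN_ANGLE) / (GRID - 1);
        for (int i = 0; i < table.length; i++) {
            table[i][0] = MIN_ANGLE + (i / GRID) * step; // grid row -> up/down servo
            table[i][1] = MIN_ANGLE + (i % GRID) * step; // grid column -> left/right servo
        }
        return table;
    }

    // Converts the 1-based position number sent by the form into a servo pair.
    static int[] lookup(int position) {
        if (position < 1 || position > GRID * GRID) {
            throw new IllegalArgumentException("position must be 1..25");
        }
        return buildTable()[position - 1];
    }

    public static void main(String[] args) {
        int[] centre = lookup(13); // middle of the grid
        System.out.println(centre[0] + ", " + centre[1]);
    }
}
```

Sending one number instead of two keeps the form and the feed simpler, at the cost of only being able to reach the preset positions.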
In the cells, you can see that the data has been preceded by the ¬ character. This is added to whatever data is entered manually, using a concatenating cell expression. It is used as a stop character to delimit the data strings later. These characters will show up in the RSS feed and the Processing script uses them to tell where one piece of data stops and the next starts. (control-character delimitation)
Initially data was entered manually into fields in the spreadsheet, as below. This is not ideal, as you have to know the exact position values to send, which is hard to remember and easy to mess up...
To avoid manually entering data into the spreadsheet, the built-in Google form was used. This is available for any Google spreadsheet.
Which looks like this...
However, the standard Google form still needed the eyeball numeric positioning values to be entered exactly, so it needed to be modified.
To create a more useful form with easy control over preset values for the eyeball variables the basic html of the form was transferred to a web page (an html widget on a NetVibes page), where it could be pimped up a bit.
The form in Netvibes as it looked to the puppet operator
The free text inputs were swapped for radio button inputs with preset values and corresponding human-readable position text.
The main thing was to still use the original Google field names so that the data options would all be fired into the same cell in the spreadsheet when the new form was submitted
You can see this in the html view of the form below. All the options follow the same pattern:
- xxx is a control value that will send data that corresponds to the physical eyeball position "positionxxx"
- The value of xxx is actually a reference number to a specific element of an array. There are 25 different preset positions, hence a radio button is needed for each integer value between 1 and 25, each one used to reference one of the array values
- positionxxx is a plain english description to show the operator, to allow them to choose a target eyeball position
- "entry.1.single" is the Google field name that must be kept the same, so it will put the value xxx into the correct cell in the spreadsheet. This is the same for each radio button, because the different values are effectively choice of values to put in that one field
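Because all 25 options follow the pattern just described, the markup can be generated rather than typed by hand. The following is a minimal Java sketch, not the form that was actually used: the field name "entry.1.single" is the one quoted above, but the "positionN" labels and the exact tag layout are placeholders (the real form showed human-readable descriptions).

```java
// Sketch only: generates one radio input per preset eyeball position.
class RadioButtonMarkup {
    static final String FIELD_NAME = "entry.1.single"; // Google field name quoted in the text

    // One radio button: same field name every time, different value and label.
    static String radio(int value, String label) {
        return "<input type=\"radio\" name=\"" + FIELD_NAME
                + "\" value=\"" + value + "\"/> " + label;
    }

    // All options from 1 to count, one per line.
    static String allRadios(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= count; i++) {
            sb.append(radio(i, "position" + i)).append("<br/>\n"); // placeholder label
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(allRadios(25));
    }
}
```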
By reworking the form, a more visual interface was created, so it was easier to see where the eyeballs would move, whilst still allowing speech text to be entered.
The other important line, also taken from the original Google form html, is this one:
It's the form submit action, and must be kept the same.
This technique of pimping a simple Google form has some advantages:
- It allows you other possibilities like adding continuous sliders using the HTML 5 feature <input type="range"/>
- you can create a method of injecting data into the spreadsheet without any form of API key. You just need to know how to tweak html form controls and values.
- you can input the data in one field, but pull the data out from another field that uses the input data, but modified in some way to extend its versatility, as required
- one drawback: on submitting the form, Google will take you back to the original form, not your pimped form, so you need to do a page refresh after each submission to reload your form
The form in Netvibes - html source code (image below)
Part 2 - getting the data out at the other end of the web
The data entry method described above represents the first link in the rather rubbish data flow diagram shown at the top of the page (although it shows Twitter as the data source, not Google). The data entry step happens on a control device, used by the operator remotely from the Twitr_janus puppet head. It is, in effect, the primary control interface.
At the other end of the web, Twitr_janus' head was connected to a separate computer running its Processing brain sketch. This was polling an RSS data feed from the spreadsheet. To get this feed, the spreadsheet had to be published. When you publish a Google spreadsheet, it is given a public RSS feed with a dedicated URL. This is used later in the Processing script to parse out the data. The URL for a Google spreadsheet RSS feed will look like this...
And the output looks like this...
In the RSS output, the stop character ¬ is clearly visible (second from last line, before the fieldnames "eyeballupdownstop", "eyeballleftrightstop" and "textstop", and the corresponding values of 13, 22 and "Hello my name is...").
The Processing sketch that is Twitr_janus' brain is polling this URL repeatedly, and uses the ¬ character to strip out the data...
Here is the code that parses the Google spreadsheet feed to extract the control data and passes it into an array. It looks for the ¬ character first, then the < character
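String[] texty = loadStrings(gssApiString); // the feed comes back as an array of lines
String[] texty2 = split(texty[0], '¬'); // pulling out data with stop character (assumes the feed arrives on the first line)
String[] texty3 = split(texty2[texty2.length - 1], '<'); // get rid of trailing text after < (assumes the speech text follows the last ¬)
gssText = texty3[0];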
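The same split-and-trim idea can be tried outside Processing. The sketch below is plain Java with a made-up one-line feed. The surrounding tag and the sample sentence are assumptions modelled on the description above, and it takes the speech text to be whatever follows the last ¬ up to the next < character.

```java
// Sketch only: extracts the speech text from a feed string using the stop character.
class StopCharParser {
    static final String STOP = "\u00AC"; // the ¬ stop character

    // Returns the text after the last stop character, cut at the next '<'.
    static String extractText(String feed) {
        int start = feed.lastIndexOf(STOP);
        if (start < 0) {
            return ""; // no stop character found: nothing to speak
        }
        String tail = feed.substring(start + 1);
        int end = tail.indexOf('<');
        return (end < 0 ? tail : tail.substring(0, end)).trim();
    }

    public static void main(String[] args) {
        // Hypothetical feed fragment, not copied from a real Google feed.
        String feed = "<content>eyeballupdownstop: \u00AC13, eyeballleftrightstop: \u00AC22, "
                + "textstop: \u00ACHello my name is Twitr_janus</content>";
        System.out.println(extractText(feed));
    }
}
```

As with the tweet parsing, any < character inside the spoken text itself would cut the message short.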
This is then checked in Processing against the last received data. If it is different, then a new instruction has been received.
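A minimal Java sketch of that comparison is shown here. The class and method names are hypothetical and the real sketch may have handled it differently.

```java
// Sketch only: remembers the last text received and flags anything different.
class NewMessageDetector {
    private String lastText = "";

    // Returns true only when the text differs from the previous poll.
    boolean isNew(String text) {
        if (text == null || text.isEmpty() || text.equals(lastText)) {
            return false; // nothing received, or the same instruction as last time
        }
        lastText = text;
        return true;
    }

    public static void main(String[] args) {
        NewMessageDetector detector = new NewMessageDetector();
        String[] polls = {"Hello", "Hello", "Goodbye"};
        for (String poll : polls) {
            System.out.println(poll + " -> " + detector.isNew(poll));
        }
    }
}
```

One consequence of comparing against the last message is that sending the same sentence twice in a row will only be spoken once.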
Part 3 - turning the data into speech
Any new data is passed to a Text-To-Speech library, and comes out spoken in a fairly crude, raspy computer-generated voice.
Credit where it's due.
The library is GURU TTS, available from http://www.local-guru.net/projects/ttslib/ttslib-0.3.zip
A big shout out to the person who made this. The blog from which this was downloaded is a bit flaky and it's not that clear who the author actually was, but it appears to be someone called Nikolaus Gradwohl. I hope that's right!
The guru tts library was downloaded and had to be installed into the Processing folders, so it could be imported into the Processing sketch.
The blog it features on is here:
This in short, is what enables Twitr_janus to talk by speaking data.
Part 4 - making the jaw move in time to the speech
The sound output from the generated speech needed to make the jaw move. That is, it needed to be lip-synced. An output lead was connected to the computer, and this was passed through a simple audio amplifier circuit, salvaged from an old computer speaker (this is shown here to the right of the Chelsea Buns).
This gave a large enough sound wave to detect reliably (about ±3 V) ...
To make Twitr_janus' jaw move in time to its speech, the audio output from the Processing text-to-speech needed to be lip-synched to the jaw mechanism.
The basic idea is that the Arduino script repeatedly checks the audio for peaks, and uses these to trigger the motor on and off. This is illustrated (rather roughly) below...
The amplified laptop audio output signal was fed directly to an analog input of the Arduino board. On the Arduino, a control sketch repeatedly checked the peak voltage: the Arduino converts the analog input into a number, which the sketch checked against a preset peak threshold value.
If the signal rose above the peak, the Arduino triggered a relay circuit to power on a 12V car door actuator (a linear motor). If the voltage dropped below the peak it would cut the power. This gave a jerky motion based on the peaks of the speech.
In the Arduino sketch, the code looked like this...
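// @@@@@@@ this function is used if you are using raw audio output from an analog amplifier into the Analog pin 0
void checkAudioThreshold() { // function name assumed: the original opening line is missing
  valueAnalogIn = analogRead(analogInput); // This is checking for output above a threshold voltage to trigger jaw signal
  digitalWrite(jawRelayPin, valueAnalogIn > peakThreshold ? HIGH : LOW); // assumed names: relay on above the peak, off below
}// @@ end threshold checking //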
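The threshold logic can also be tried away from the board. The sketch below is a desktop simulation in Java, not Arduino code: the threshold value and the sample readings are invented, and it assumes readings on the 0 to 1023 scale that analogRead returns on a standard 10-bit Arduino.

```java
// Sketch only: simulates the jaw relay decision for a series of audio readings.
class JawThreshold {
    static final int PEAK_THRESHOLD = 600; // hypothetical value on the 0..1023 scale

    // The jaw opens only while the reading is above the threshold.
    static boolean jawOpen(int reading) {
        return reading > PEAK_THRESHOLD;
    }

    // Applies the threshold to each reading in turn.
    static boolean[] jawStates(int[] readings) {
        boolean[] states = new boolean[readings.length];
        for (int i = 0; i < readings.length; i++) {
            states[i] = jawOpen(readings[i]);
        }
        return states;
    }

    public static void main(String[] args) {
        int[] readings = {512, 580, 710, 905, 640, 530}; // invented samples
        for (boolean open : jawStates(readings)) {
            System.out.println(open ? "relay ON (jaw opens)" : "relay OFF (jaw closes)");
        }
    }
}
```

A simple on/off test like this is what produces the jerky motion described above, since the jaw has only two states.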
Here you can see the hinged mouth of the puppet, to which the car door actuator was attached...
For a detailed look at how the Processing brain works, you can read command by command descriptions on this post on my Making Weird Stuff blog: makingweirdstuff.blogspot.co.uk/2012/08/twitrjanus-is-now-speaking-data-sent.html
Although this description applies to a Google spreadsheet RSS feed as a data source, the same principle applies to a string obtained by calling the Twitter API.