Step 6: Write Software
Primary Components
CMU Sphinx is an open-source speech recognition project maintained by Carnegie Mellon University. The system consists of two parts: the recognizer code, and the files containing the voice model and the language model. The library code was easy to compile for Android, and CMU Sphinx's creators have posted a great example. One can teach CMU Sphinx one's own pronunciation: all it takes is to record 20 sentences and run the resulting files through a supplied tool. This can significantly increase recognition quality. What is more, one can build a language model, which tells the recognizer what words and phrases to expect. In my case the primary phrase was "call name", where name is one of the names from my address book. Having such a model also increases recognition quality.
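To make the "call name" idea concrete, here is a minimal sketch that builds a JSGF grammar from a list of contact names. It is only an illustration: the text above does not say which model format was actually used (CMU Sphinx also accepts statistical language models), the function name `build_call_grammar` and the sample names are hypothetical, and every word in the grammar would still need an entry in the recognizer's pronunciation dictionary.

```python
def build_call_grammar(names):
    """Return JSGF text accepting the phrase "call <name>" for the given names.

    Assumption: names are plain words; they are lower-cased and
    whitespace-normalized here, and blank entries are skipped.
    """
    cleaned = [" ".join(name.lower().split()) for name in names if name.strip()]
    if not cleaned:
        raise ValueError("at least one contact name is required")
    # In JSGF a word sequence binds tighter than "|", so multi-word
    # names such as "bob smith" are valid alternatives as written.
    alternatives = " | ".join(cleaned)
    return (
        "#JSGF V1.0;\n"
        "grammar call;\n"
        f"public <command> = call ( {alternatives} );\n"
    )


if __name__ == "__main__":
    # Hypothetical address book entries.
    print(build_call_grammar(["Alice", "Bob Smith"]))
```

Restricting the recognizer to a small grammar like this is one way to get the quality gain described above, since it only has to choose among a handful of known phrases.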
One might ask: why not use Google Voice? Unfortunately, it is really bad at understanding my pronunciation, and it is also not very good at recognizing names.
One might ask: why not use a special microcontroller? I certainly considered this approach. One solution I found was Sensory. Unfortunately, it looked too expensive: it seemed that I would have to do the same amount of work as with CMU Sphinx and would get comparable quality, but I would also have to pay for the chip.
"No speech generator": I became convinced of this after trying several different generators. All the text-to-speech engines produced a very unnatural voice. So I had to ask a human to record every phrase that my phone can possibly say. What is more, I had her read each phrase several times. During playback I pick a random version of the phrase; this creates a strong illusion of a real human on the other end.
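The random-version trick can be sketched in a few lines. This is an illustration only, not the code from the project: the phrase keys, the file names and the helper `pick_recording` are all made up for the example.

```python
import random

# Hypothetical catalogue: each phrase maps to several recorded takes.
RECORDINGS = {
    "number_please": ["number_please_1.wav", "number_please_2.wav", "number_please_3.wav"],
    "sorry": ["sorry_1.wav", "sorry_2.wav"],
    "call_placed": ["call_placed_1.wav", "call_placed_2.wav"],
}


def pick_recording(phrase, recordings=RECORDINGS, rng=random):
    """Return one randomly chosen take of the requested phrase."""
    takes = recordings[phrase]  # raises KeyError for an unknown phrase
    return rng.choice(takes)


if __name__ == "__main__":
    # Each run may print a different file name.
    print(pick_recording("number_please"))
```

Passing `rng` as a parameter is just a convenience for testing with a seeded `random.Random`; a plain `random.choice` would do the same job.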
PJSIP is an open-source implementation of the SIP stack. In other words, it is an open VoIP library. I didn't have much trouble with it: I downloaded it, compiled it and used it. CSipSimple is a big open-source project that also uses it. This project was very helpful, as it contained some great usage examples.
One might ask: why not use Skype? This was my original idea, and I subscribed to the Skype Developer Program. Unfortunately, reading the license agreement revealed that the Skype SDK cannot be installed on any device running Android.
One might ask: why not use the SIP stack that is built into Android? Unfortunately, the stack was added only in Android 2.3, and the Archos 28 runs 2.2.
Workflow
When the telephone is off the hook:
1. Wait one second
2. Say "Number, please!"
3. Start voice recognition
4. If recognized "call name", go to the next step; otherwise say "Sorry, I didn't get that" and go to 3
5. Say "Calling name..."
6. Start voice recognition
7. If recognized "no" or "stop", go to 2; otherwise go to the next step
8. Place a VoIP call
9. Say "Call placed"
10. Wait until the call is terminated
11. Say "Call terminated"
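The workflow above can be sketched as a small control loop. This is a hedged illustration in Python rather than the actual Android code: `say`, `recognize`, `place_call` and `wait_for_hangup` are hypothetical callbacks standing in for audio playback, CMU Sphinx and PJSIP, and `recognize` is assumed to return the recognized text as a string (empty if nothing was heard).

```python
import time


def run_session(say, recognize, place_call, wait_for_hangup, sleep=time.sleep):
    """Run one off-hook session; step numbers refer to the workflow above."""
    sleep(1)                                    # step 1: wait one second
    while True:
        say("Number, please!")                  # step 2
        name = None
        while name is None:
            heard = recognize()                 # step 3
            if heard.startswith("call "):
                # Keep whatever follows "call"; an empty name counts as a miss.
                name = heard[len("call "):].strip() or None
            if name is None:
                say("Sorry, I didn't get that")  # step 4: retry from step 3
        say(f"Calling {name}...")               # step 5
        if recognize() in ("no", "stop"):       # steps 6-7: caller cancels
            continue                            # back to step 2
        break
    place_call(name)                            # step 8
    say("Call placed")                          # step 9
    wait_for_hangup()                           # step 10
    say("Call terminated")                      # step 11
    return name
```

Injecting the callbacks keeps the loop testable without a microphone or a network; a scripted `recognize` that yields "call alice", then "no", then "call bob" exercises both the retry and the cancel paths.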
Android App Format
The phone application is actually a background service. There is also a lightweight user application that displays the current status. The service starts on startup or when the user app is launched.
Where to Find Source
All the code that I wrote can be found on Google Code. You will also need to download and compile PJSIP and CMU Sphinx.