You speak, and your device will listen... even if your device is halfway across the world!
This instructable will teach you how to use a voice recognition system based on the Intel RealSense camera to send RESTful commands over the internet to other devices, thus enabling you to use only your voice to control those devices.
Step 1: Devices Required
This instructable is based on the Intel RealSense Camera F200 and the Intel Galileo Gen 2 platform.
The Intel RealSense Camera F200 is a gesture-based Human-Computer Interaction platform. It is a bundle of consumer-grade 3D cameras and microphones with a powerful machine perception library to support third-party software developers.
The Intel Galileo Gen 2 board will be used to simulate a device connected over the internet, listening for the RESTful commands. To visualize the execution of the commands, we will also use a Grove LCD Display from the Grove Starter Kit Plus - Intel IoT Edition for Intel Galileo Gen 2 Developer Kit.
Observation: At the time of this writing, the camera was not available for regular purchase, but a developer registered in the Intel Developer Program was able to place an order.
Step 2: Configuring the Intel RealSense Camera F200
To set up the camera, you will need a computer with a 3rd generation Intel Core processor and a USB 3 port. Otherwise, the camera will not be recognized and the installation step will fail.
The Intel Developer Zone has a page with a step-by-step tutorial for installing the camera:
On that page you can download the latest driver, SDK, and firmware update tool.
The main challenge in setting up this camera is not the installation itself, which is easy to follow through the wizards provided, but ensuring that the camera continues to work after your first reboot. There are known issues with the USB drivers for Windows 8 that make the camera appear to connect and disconnect erratically. Although it is not a recommended fix, the link below discusses the problem and provides a Windows 7 USB 3.0 stack to replace the Windows 8 stack:
If you don't want to mess with your USB 3.0 driver, you can run a special shutdown command from cmd.exe whenever the camera behaves erratically: shutdown /s /t 0
Step 3: Galileo Hardware & Software Setup
To test our command-recognition system, we developed a small Galileo-based web server that receives requests and does your bidding!
We assume that you have already prepared your Galileo for programming. There are several tutorials here on Instructables to help you get started with Edison and Galileo boards. It is advisable to first test your board with a simple template, such as the "hello world" of electronics: the blinking LED.
Since we wanted to build a platform that executes voice commands sent over the internet, we built a simple display to show every command received, using the Seeed Studio Grove LCD Display. If you later want to add any behavior to the command server through sensors and actuators, it's easy to hack. Remember that the display has to be connected to an I2C port.
To control the Seeed Studio Grove LCD Display and to create the web server, we chose a framework called Cylon (http://cylonjs.com/). It is listed in the package.json file of our project and will be installed automatically when you upload the code to the Galileo.
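To make the server's role concrete, here is a minimal Python sketch of a stand-in for the command server. It is not the Cylon-based project code: it only mimics the URL layout used by the commands listed in Step 4 and prints each command where the real project would write to the LCD. The names `parse_command`, `CommandHandler` and `serve`, the port 3000, and the choice to accept both GET and POST are all assumptions made for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import urlparse

# URL prefix taken from the command list in Step 4.
COMMAND_PREFIX = "/api/robots/JohnnyTwo/commands/"


def parse_command(path: str) -> Optional[str]:
    """Return the command name in a request path, or None if it does not match."""
    route = urlparse(path).path  # drops any query string
    if not route.startswith(COMMAND_PREFIX):
        return None
    name = route[len(COMMAND_PREFIX):].strip("/")
    # Reject empty names and nested paths such as "a/b".
    if not name or "/" in name:
        return None
    return name


class CommandHandler(BaseHTTPRequestHandler):
    """Answers every command request and shows the command name."""

    def _handle(self) -> None:
        command = parse_command(self.path)
        if command is None:
            self.send_error(404, "Unknown route")
            return
        # The real project writes to the Grove LCD here; this sketch prints.
        print(f"Command received: {command}")
        body = command.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    # Accept both verbs, since the HTTP method used by the client is an assumption.
    do_GET = _handle
    do_POST = _handle


def serve(port: int = 3000) -> None:
    """Run the stand-in server until interrupted (port 3000 is an assumption)."""
    HTTPServer(("0.0.0.0", port), CommandHandler).serve_forever()
```

Calling `serve()` on any machine in your network gives you a target for testing the command recognizer before the Galileo is wired up.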
Step 4: The Command Recognizer: Discussion & Source Code
The command recognizer is a .NET application built to provide support for voice recognition. It is based on the Intel RealSense Camera SDK, which can recognize words and phrases in many different languages. English is the default, so we installed the Portuguese package as well and used both languages in the application.
The program works by recognizing spoken words and phrases and comparing them to a dictionary of commands. If there is a match to any command, that command is executed. Commands are RESTful requests to a device chosen in the application's interface, and they are described in a JSON file contained in the "release" and "debug" directories of the .NET project. You can edit those files to add, edit or remove commands from the list. The commands used for testing purposes, in both English and Portuguese, are:
"forward" : "http://IP:PORT/api/robots/JohnnyTwo/commands/forward"
"left" : "http://IP:PORT/api/robots/JohnnyTwo/commands/turnLeft"
"right" : "http://IP:PORT/api/robots/JohnnyTwo/commands/turnRight"
"stop" : "http://IP:PORT/api/robots/JohnnyTwo/commands/brake"
"frente" : "http://IP:PORT/api/robots/JohnnyTwo/commands/forward"
"atrás" : "http://IP:PORT/api/robots/JohnnyTwo/commands/backward"
"direita" : "http://IP:PORT/api/robots/JohnnyTwo/commands/turnRight"
"parar" : "http://IP:PORT/api/robots/JohnnyTwo/commands/brake"
The left column contains the commands that can be recognized, and the right column contains the RESTful requests that the application will make. IP and PORT are replaced by the information the user provides after starting the application.
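The lookup described above can be sketched in a few lines of Python. This is an illustration only, not the .NET application's code: the exact layout of the project's JSON file, the helper names `load_commands` and `lookup`, and the case-insensitive matching are assumptions.

```python
import json
from typing import Dict, Optional

# Two entries from the list above, in a "phrase": "URL template" shape.
# The exact layout of the project's JSON file is an assumption.
COMMANDS_JSON = """
{
  "forward": "http://IP:PORT/api/robots/JohnnyTwo/commands/forward",
  "parar": "http://IP:PORT/api/robots/JohnnyTwo/commands/brake"
}
"""


def load_commands(text: str, ip: str, port: int) -> Dict[str, str]:
    """Parse the command file and fill in the user's IP address and port."""
    templates = json.loads(text)
    return {
        # Replace "IP:PORT" as one unit so no other part of the URL is touched.
        phrase.lower(): url.replace("IP:PORT", f"{ip}:{port}")
        for phrase, url in templates.items()
    }


def lookup(commands: Dict[str, str], spoken: str) -> Optional[str]:
    """Return the request URL for a recognized phrase, or None if it is not a command."""
    return commands.get(spoken.strip().lower())


commands = load_commands(COMMANDS_JSON, "192.168.0.10", 3000)
print(lookup(commands, "forward"))
print(lookup(commands, "hello"))
```

The address 192.168.0.10 and port 3000 are placeholder values; a phrase that is not in the table simply yields no request.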
Step 5: Using the Application
One important thing you need to know: you do not need the Intel RealSense Camera to use the program. It was designed to accept any audio source stream available on the device. We just need the camera's SDK!
Before starting, you need to:
- Provide the IP address and the port of the device that will serve the requests
- Choose the audio source stream
- Choose the language in which the commands will be given
After this initial configuration, just press the start button and you are done: you can now give commands to your device!
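As a rough picture of what happens once the application is started, the Python sketch below validates the settings and turns a recognized phrase into a request. Audio capture and speech recognition are handled by the RealSense SDK and are not shown. The `Settings` class, the "en"/"pt" language codes, the per-language split of the command table, and the use of HTTP GET are assumptions for illustration, not details taken from the application.

```python
import ipaddress
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Settings:
    ip: str
    port: int
    language: str  # hypothetical codes: "en" or "pt"

    def __post_init__(self) -> None:
        ipaddress.ip_address(self.ip)  # raises ValueError on a malformed address
        if not 1 <= self.port <= 65535:
            raise ValueError(f"port out of range: {self.port}")
        if self.language not in COMMANDS:
            raise ValueError(f"unsupported language: {self.language}")


# Phrase -> command name, taken from the list in Step 4.
COMMANDS: Dict[str, Dict[str, str]] = {
    "en": {"forward": "forward", "left": "turnLeft",
           "right": "turnRight", "stop": "brake"},
    "pt": {"frente": "forward", "atrás": "backward",
           "direita": "turnRight", "parar": "brake"},
}


def http_get(url: str) -> int:
    """Send a GET request and return the status code (GET is an assumption)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def dispatch(phrase: str, settings: Settings,
             send: Callable[[str], int] = http_get) -> Optional[int]:
    """Send the request for a recognized phrase; return None if it is not a command."""
    action = COMMANDS[settings.language].get(phrase.strip().lower())
    if action is None:
        return None
    url = (f"http://{settings.ip}:{settings.port}"
           f"/api/robots/JohnnyTwo/commands/{action}")
    return send(url)
```

Passing the sender as a parameter keeps the sketch easy to try without a device: supply a function that records the URL instead of making a network call.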