Introduction: Manipit - IRONMAN JARVIS-like Hand Motion Tracking With Painted Gloves

About: I'm working at Stanford University as a visiting scholar.

***NOTE (May 12, 2015)***
If you can't watch the video from the embedded file, you can watch it here!
********************************


Have you watched the movie "Iron Man"?
In the movie, Tony manipulates virtual objects with his hand motions.
It's really cool! So why not make it myself?
That is what I've done! The name of my project is "Manipit" (Manipulate It).

This project is a hand motion tracking system that uses the Xtion PRO LIVE (a depth camera similar to the Kinect) and painted gloves. Because it can recognize your hand pose (3D position, 3D orientation, and grasping), you can input commands with your hand motions. With Manipit, for example, you can fiddle with 3D computer graphics on a display and/or manipulate a robot arm intuitively. It's quite similar to the interface you see in the Iron Man clip.


Manipit requires the user to wear painted gloves. Fortunately, no sensors are required on the gloves. From the Xtion's images, the system detects the position and area of each distinct color, and Manipit recognizes the hand pose with a neural network, the artificial intelligence technique at the core of this software.

NOTE:
Manipit consists of 2 parts.
One part is hand tracking, which detects where the hand is.
The other part is posture recognition, which estimates the posture of the hand.
Since the former is not so difficult, I'll focus on the latter from Step 1 to Step 4.

Step 1: Make a Colored Glove

What is the best way to determine a hand pose (particularly, its orientation) with an RGB camera?
One of my first thoughts was to use a painted glove.
Intuitively, it should give a lot of information about the hand pose, right?

For the painted glove, I used a white glove and 6 different fabric paints, which can be found at hobby shops. This glove is used for hand pose recognition, so I painted each part a distinct color:

  • Thumb - Yellow
  • Index finger - Green
  • Middle finger - Purple
  • Ring finger - Pink
  • Pinky finger - Light blue
  • Palm - Blue

During the painting process, it is important that you wear the glove while the paint dries. So I recommend that, before starting, you pick a favorite movie (e.g., Iron Man) to watch while you wait.

Now I have a painted glove, and I'd like to estimate the hand pose by utilizing these distinct colors. But ... how? The answer is described in the next step!

Step 2: Implement Neural Network

How can we estimate a hand pose by utilizing these colors?

I think a neural network should work well! It is a technique from artificial intelligence (AI). I won't go into all the details, but I will give an overview below.

A neural network is a powerful method for deducing one kind of information (the hand pose) from another (each color's position, area, and so on). A neural network consists of "neurons", which are represented by circles. The neurons are organized into layers, and the layers are connected to each other.

The configuration of the neural network is as follows:

  • The input layer (the first layer) has 19 neurons:
    6 - center position of each color region
    6 - area of each color region
    6 - aspect ratio of each color region
    1 - aspect ratio of the whole hand region
  • The hidden layer (the second layer) has 80 ~ 250 neurons.
    I decided the number experimentally.
  • The output layer (the last layer) has 8 neurons:
    3 - variables representing the orientation (Euler angles)
    5 - angles, one for each finger

If you'd like to see the details of the internals, I uploaded the code here (GitHub).
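To picture this configuration in code, here is a minimal sketch using OpenCV 2.4's CvANN_MLP class (a hidden layer of 150 is just one value from the 80 ~ 250 range above; this is an illustration, not Manipit's actual sources, which are on GitHub):

    #include <opencv2/core/core.hpp>
    #include <opencv2/ml/ml.hpp>

    int main()
    {
        // 19 inputs -> 150 hidden neurons -> 8 outputs
        // (3 Euler angles + 5 finger angles)
        cv::Mat layers = (cv::Mat_<int>(1, 3) << 19, 150, 8);

        CvANN_MLP net;
        net.create(layers, CvANN_MLP::SIGMOID_SYM, 1.0, 1.0);  // symmetric sigmoid activation
        return 0;
    }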

Actually, a neural network cannot be used out of the box. It has a huge number of parameters, and we have to tune all of them properly! But don't worry, we have an amazing algorithm to do so. It's called back propagation, and it requires "sample data" to tune the parameters automatically.

What are sample data? How can we produce them? Go to the next step!

Step 3: Make a 3D Model of Painted Glove

What are sample data?
To tune the parameters of the neural network automatically, we have to teach the network like this:
"If you see an RGB hand image like this, the orientation is such and such ..."
"And if you see an image like that, the orientation is such and such ..."
In other words, we need sample sets which consist of RGB hand images and the corresponding hand pose information.
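To make that concrete, one sample could be stored like this (a hypothetical layout matching the 19 inputs and 8 outputs of the network from Step 2; Manipit's real data format may differ):

    // Hypothetical container for one training sample (not Manipit's actual format).
    struct TrainingSample {
        // 19 input features extracted from one glove image:
        // per-color center, area, and aspect ratio for the 6 painted regions,
        // plus the aspect ratio of the whole hand region.
        float input[19];
        // 8 target values: 3 Euler angles for the orientation and 5 finger
        // angles, known exactly because we choose the pose of the 3D model
        // ourselves before rendering each image.
        float target[8];
    };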

So let's make a 3D model of the painted glove!
I used Blender, an awesome free 3D modeling tool, to build the model.
A hand is not rigid, so I built a deformable model (see picture). This kind of model is called a skinned mesh.

What is the 3D model for?
It's for generating the sample data used to tune the neural network's parameters automatically.
Let's try that next!

Step 4: Tune Up Neural Network's Parameters

Here, I use the sample data (see picture) that I made in the previous step
to tune the parameters of the neural network.
Once the parameters are tuned, the neural network can estimate the hand pose from the color information!

After building the 3D model of the hand, I implemented a C++ parser for the model and collected about 3000 images. Then I successfully used the back propagation algorithm to tune the parameters.
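As a sketch of what that tuning looks like with OpenCV 2.4's backprop trainer (the learning rate, momentum, and termination criteria below are illustrative values, not Manipit's actual settings):

    #include <opencv2/core/core.hpp>
    #include <opencv2/ml/ml.hpp>

    // Tune the network from Step 2 on the rendered samples.
    // samples: ~3000x19 CV_32F matrix, one feature vector per rendered image
    // targets: ~3000x8  CV_32F matrix of pose labels (3 Euler + 5 finger angles)
    void tuneNetwork(CvANN_MLP& net, const cv::Mat& samples, const cv::Mat& targets)
    {
        CvANN_MLP_TrainParams params(
            cvTermCriteria(CV_TERMCRIT_ITER | CV_TERMCRIT_EPS, 5000, 1e-6),
            CvANN_MLP_TrainParams::BACKPROP,
            0.01,   // learning rate
            0.05);  // momentum
        net.train(samples, targets, cv::Mat(), cv::Mat(), params);
    }

    // After training, estimating a pose for a live frame is a single call:
    //   cv::Mat pose;                  // receives the 8 outputs
    //   net.predict(features, pose);   // features is a 1x19 CV_32F row vector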

Now I have most of what is required.
The last step is to track my actual hand and manipulate the CG hand on the display.

Step 5: Combining Hand Tracking With Neural Network Based Hand Recognition

This is the last step.
In order to recognize the hand pose, the system first needs to detect my hand position.
Therefore, I implemented a hand tracking system which can do the following:

  • Subtract the background image
  • Detect the 3D hand position using the motion tracking library NITE (this library cannot detect a hand pose)
  • Extract each color region via image processing (a sketch of this step follows the list)
  • Use the Xtion PRO LIVE sensor (see picture)
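For the color-region step, here is a minimal sketch of how one region could be measured with OpenCV (assuming OpenCV 2.4; the HSV thresholds would have to be calibrated per paint color, and this is an illustration rather than Manipit's exact code):

    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>
    #include <vector>

    // One color region's contribution to the 19-D input vector.
    struct ColorFeature {
        cv::Point2f center;  // centroid of the color blob
        float area;          // blob area in pixels
        float aspect;        // bounding-box width / height
    };

    // Find the largest blob of one paint color and measure it.
    ColorFeature extractColorRegion(const cv::Mat& bgr,
                                    const cv::Scalar& hsvLo,
                                    const cv::Scalar& hsvHi)
    {
        cv::Mat hsv, mask;
        cv::cvtColor(bgr, hsv, CV_BGR2HSV);    // HSV is more robust to lighting
        cv::inRange(hsv, hsvLo, hsvHi, mask);  // binary mask of this color

        std::vector<std::vector<cv::Point> > contours;
        cv::findContours(mask, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);

        ColorFeature f = { cv::Point2f(-1.f, -1.f), 0.f, 0.f };
        double best = 0.0;
        for (size_t i = 0; i < contours.size(); ++i) {
            double a = cv::contourArea(contours[i]);
            if (a <= best) continue;           // keep only the largest blob
            best = a;
            cv::Moments m = cv::moments(contours[i]);
            cv::Rect box = cv::boundingRect(contours[i]);
            f.center = cv::Point2f(float(m.m10 / m.m00), float(m.m01 / m.m00));
            f.area   = float(a);
            f.aspect = float(box.width) / float(box.height);
        }
        return f;  // repeated for all 6 colors, this fills the 19-D input
    }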


By combining the tracking system with the fully tuned neural network,
Manipit, a hand motion tracking system, is now complete.
How does it work? See the video.
The CG hands on the display are synchronized with my actual hands!

Even while I'm grasping, Manipit can still recognize my hand pose.
I think this is the big difference from existing products and research.

Step 6: For More Details ...

Manipit was developed on Ubuntu 14.04 with ROS, which is middleware for robot researchers.

I used:

  • OpenGL to display the virtual world
  • OpenCV for image processing
  • OpenNI to interface with the Xtion PRO LIVE
  • NITE to detect the 3D hand position


Currently, I'm trying to improve Manipit so that it works without gloves.
You can follow my activities on GitHub.

Music: http://www.bensound.com/royalty-free-music

Thank you!
