By Rashmi Mudduluru and Nicole Riley for CSE 599 Prototyping Interactive Systems
For this project we built a program that scores how well a user performs a dance gesture against recordings of experts performing the same gesture. The project has two parts: 1) the wearable physical-computing component used to sense gestures, and 2) the software that displays live gestures, segments them, records them for training, scores them, and shows the segmented gesture alongside an aggregate expert gesture.
The dance gestures in this project all come from the music video for Single Ladies by Beyonce, and we chose 5 of them that are identifiable with one hand:
Pulling (Beyonce punches at the ground in rhythm)
Don't Pay Attention (Beyonce puts her arm across her body, then flicks her wrist up to the upper right to the rhythm of "don't pay him any attention")
Flip (Beyonce flips her wrist in rhythm at the side of her body)
Clap (overhead clapping to the rhythm of the song)
Elephant Arm (Beyonce punches toward the ground, puts her left arm across her right, then wiggles the right arm)
The dance moves being performed are shown below:
Don't Pay Attention:
In the initial training to create expert gestures, Rashmi and I served as the experts. We watched a variety of videos to learn the dance moves, and once we felt comfortable with them, we recorded 5 gestures of each type for both of us. The gestures of each type were done alongside the Beyonce Single Ladies video (we did the gesture in the rhythm she does in the video).
Tutorial videos we used to learn the moves (note: we use direct links so that you start at the right timestamp; Instructables cannot embed a video that starts anywhere but the beginning):
Pulling (stop at 11:18)
Don't Pay Attention (stop at 2:55):
Flip (stop at 13:14):
Clap (stop at 3:45):
Elephant arm (stop at 4:23):
Single Ladies videos used in training (again, direct links so that you start at the right timestamp):
Pulling (stop at 0:45)
Don't Pay Attention (stop at 1:15)
Flip (stop at 1:55)
Clap (stop at 2:36)
Elephant Arm (end at 2:55)
Arduino Leonardo (for prototyping)
Breadboard (for prototyping)
Jumper wires for the breadboard (for prototyping)
Wires for soldering
A cloth or glove
USB extension cord
Step 1: Hardware: Arduino Circuit and Wearables
To create a gesture recognizer, we needed to wire the Arduino to an accelerometer. We began by doing this on a breadboard with the Arduino Leonardo.
To wire up the accelerometer, we followed Adafruit's wiring diagram: https://learn.adafruit.com/adafruit-analog-acceler...
We tried multiple ways of placing the board on a user's wrist so that the accelerometer remained stable and didn't change orientation across users. Tying the board to someone's wrist wasn't optimal: it restricted some of the dance moves and caused the accelerometer to tilt. After some experimentation, we found that placing the accelerometer on the back of the fingers and wrapping it with a rubber band was the most stable and gave us consistent recordings.
We then soldered an Arduino Feather (which takes up much less space) and the accelerometer to a perf board. We tried sewing this board to a glove so that it would be easily wearable, but even with this setup the perf board wasn't stable enough. We finally used the same idea we had with the breadboard: tying the board to the user's fingers. This setup was steady, and the smaller size of the Feather made it more comfortable to wear.
Step 2: Software: Arduino, Recording, and Live Classification
Our software consists of two components: training and live gesture matching.
For both components, we uploaded this Arduino code (credit to Jon Froehlich) to either the Feather or the Leonardo.
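On the laptop side, the accelerometer samples arrive over serial as plain text lines. Below is a minimal Python sketch of the parsing step, assuming a hypothetical comma-separated format like "timestamp, x, y, z" (the exact output of the Arduino sketch may differ):

```python
def parse_accel_line(line):
    """Parse one line of serial output into (timestamp, x, y, z).

    Assumes the Arduino prints four comma-separated integers, e.g.
    "12345, 512, 498, 630". Returns None for malformed lines, which
    are common right after opening the serial port mid-transmission.
    """
    parts = line.strip().split(",")
    if len(parts) != 4:
        return None
    try:
        values = [int(p) for p in parts]
    except ValueError:
        return None
    return tuple(values)
```

Dropping malformed lines rather than raising keeps the live display running even when the serial stream starts mid-line.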
1) Training

For the training part (expert pre-recorded gestures), we recorded 5 trials of each dance move using a Processing sketch that uses the spacebar to demarcate the beginning and end of a gesture; we built this by editing live gesture recording code from Jon Froehlich. Both of us recorded on whichever prototype was the most recent at the time. It was important to record with a USB extension cord so there was enough room for the dance gesture. We initially tried classifying the gestures by training a linear SVM with recursive feature elimination (RFE). While this worked well on our individual gesture sets, the same features performed poorly across users (i.e., the most discriminative features varied widely between Nicole's and Rashmi's gestures).
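To give a flavor of the feature vectors fed to the SVM, here is a small Python sketch of per-axis summary features for one segmented gesture. The specific features shown (mean, standard deviation, min, and max per axis, plus the same for acceleration magnitude) are illustrative, not our exact pipeline:

```python
import math

def extract_features(xs, ys, zs):
    """Compute simple summary features for one segmented gesture.

    Returns a 16-element vector: [mean, std, min, max] for the
    x, y, and z axes, plus the same four statistics for the
    acceleration magnitude sqrt(x^2 + y^2 + z^2).
    """
    def stats(vals):
        n = len(vals)
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        return [mean, math.sqrt(var), min(vals), max(vals)]

    mags = [math.sqrt(x * x + y * y + z * z)
            for x, y, z in zip(xs, ys, zs)]
    return stats(xs) + stats(ys) + stats(zs) + stats(mags)
```

RFE would then repeatedly drop the feature with the smallest SVM weight; in our experience the surviving features differed between the two of us, which is what made cross-user classification unreliable.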
2) Live Gestures
Our initial plan was to classify the live gesture and also score how well it matched the expert's. However, because we observed a lot of variance in gestures across users, we couldn't identify a set of features that was accurate for all users. So instead of classifying and then matching dance gestures, we fell back to scoring a dance move against the expert's. In this setting, we display the next dance move the user should perform. The live gesture recognizer segments a gesture whenever any significant movement is made, and we shape-match that segment against the next expected gesture, comparing the aggregate mean of the preprocessed (smoothed) x, y, z, and magnitude values. Since we observed variance between our own recorded gestures (Nicole's and Rashmi's), we shape-match the live gesture separately against the two aggregates and report the lower (better-matching) of the two scores. For shape matching we use Euclidean distance. In our testing, Dynamic Time Warping produced the most accurate results but caused the program to hang when used in real time.
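The scoring pipeline described above can be sketched in Python. This is a simplified illustration rather than our actual Processing code: moving-average smoothing, linear resampling so gestures of different durations line up point-by-point, Euclidean scoring against an expert aggregate, and the dynamic time warping variant that proved too slow for real-time use:

```python
import math

def smooth(signal, window=5):
    """Moving-average smoothing, as applied to x, y, z, and magnitude."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def resample(signal, n):
    """Linearly resample to n points so live and expert gestures of
    different lengths can be compared point-by-point."""
    if len(signal) == 1:
        return [signal[0]] * n
    out = []
    for i in range(n):
        t = i * (len(signal) - 1) / (n - 1)
        j = int(t)
        frac = t - j
        nxt = signal[min(j + 1, len(signal) - 1)]
        out.append(signal[j] * (1 - frac) + frac * nxt)
    return out

def euclidean_score(live, expert, n=50):
    """Lower is better: Euclidean distance between the smoothed,
    resampled live gesture and the expert aggregate."""
    a = resample(smooth(live), n)
    b = resample(smooth(expert), n)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.
    More accurate in our testing, but too slow for real time."""
    INF = float("inf")
    d = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]
```

To score a live gesture against both experts and keep the better match, one would take something like `min(euclidean_score(live, nicole_avg), euclidean_score(live, rashmi_avg))` (the aggregate names here are hypothetical).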
In the future, we would like to explore ways to match gestures against the rhythm of the music, which should greatly improve precision. A better classification approach that accounts for variance across users would also improve the project, but we didn't have time to dive deeper into this.