Introduction: Gesture-Based Photo Booth Tool
In this project, we built a gesture-based photo booth tool. Through a variety of gestures, the user can draw on or filter their webcam image. With our tool, you can swipe through filters, including grayscale and edge detection among others. You can also push a button on our controller to draw abstract art on top of your image. Finally, you can either save your image or undo all changes through circle and X gestures.
In this tutorial, we will go through the steps to build an Arduino-based controller, design an enclosure for the controller, record gesture data, train an SVM on the recorded data, and implement the photo booth-like interface.
Supplies
Arduino Leonardo
Perf Board
ADXL335 3-axis accelerometer
Tactile push switch
Solid core wire
3D CAD tool and 3D printer
Laptop with webcam
Step 1: Arduino Circuit Design
Our circuit diagram is pictured here. We soldered the accelerometer to a small perf board and soldered wires to the tactile push switch so we could raise the button to the top of the controller enclosure. We followed this tutorial for wiring our accelerometer: https://learn.adafruit.com/adafruit-analog-accelerometer-breakouts/arduino-wiring, and this tutorial for our button wiring: https://learn.adafruit.com/adafruit-arduino-lesson-6-digital-inputs/overview.
Step 2: Enclosure Design
We designed our enclosure in Autodesk Fusion 360, following this video for our snap-fit enclosure: https://youtu.be/VVmOtM60VWw . Our enclosure has finger grips, a USB slot, a hole to fit the tactile push switch, and a snap lid. You can see our design files here: https://gitlab.cs.washington.edu/eafurst/599h_final_project/tree/master/EnclosureDesign.
We printed on an Ultimaker 3 with a 0.2 mm layer height and 20% infill. We also printed supports with the body of the case. Because we used a coarser layer height, we allowed for some extra wiggle room in our design (i.e., a slightly larger USB slot and a larger hole for the push switch).
Step 3: Gesture Recording
Using the controller constructed in steps 1 and 2, we recorded training data using simple Arduino code and Processing. We programmed our Arduino to send the Arduino time stamp and the x, y, and z values for the accelerometer over the Serial port. For gesture recording, we ignore the button portion of our controller.
The Processing program records 10 samples of each gesture and writes the Processing time stamp, Arduino time stamp, and x, y, and z accelerometer values to a file. It creates one large full stream CSV file and individual gesture files for each sample of each gesture. The gestures are segmented using cues from the space bar; that is, the user presses the space bar to both begin and end a gesture recording.
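Our recorder is written in Processing (linked below), but if you want to prototype the logging side in Python, a minimal sketch like the following reads the same serial stream and writes the same row format with pyserial. The port name and baud rate are assumptions, and it stops on Ctrl+C rather than segmenting with the space bar.

```python
# gesture_logger.py - minimal Python stand-in for our Processing recorder.
# Assumes the Arduino prints "arduino_ms,x,y,z" lines; the port and baud rate
# below are examples and will differ on your machine.
import csv
import time

import serial  # pip install pyserial

PORT = "/dev/ttyACM0"  # e.g. "COM3" on Windows
BAUD = 9600

with serial.Serial(PORT, BAUD, timeout=1) as ser, \
        open("full_stream.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["host_ms", "arduino_ms", "x", "y", "z"])
    try:
        while True:
            line = ser.readline().decode("ascii", errors="ignore").strip()
            parts = line.split(",")
            if len(parts) != 4:
                continue  # skip empty or malformed lines
            writer.writerow([int(time.time() * 1000)] + parts)
    except KeyboardInterrupt:
        pass  # Ctrl+C stops logging; our Processing sketch segments with the space bar
```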
The gestures we allow in our tool are:
- swipe right, to swipe through filters
- clockwise or counter-clockwise circle, to capture and save an image
- mid-air X, to undo all changes and revert to the original image
Our Arduino and Processing code is available here: https://gitlab.cs.washington.edu/eafurst/599h_final_project/tree/master/GestureRecorder
Step 4: Training the SVM
We chose to use Python to implement the remainder of our photo booth tool. We leveraged several Python libraries to simplify training and testing a model for gesture classification.
We chose to train an SVM using the default RBF kernel for classification. An overview of the sklearn SVM library can be found here: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html.
We selected 19 features from our recorded gesture data for training and classification, and trained on a gesture set containing 10 trials of each gesture. We preprocessed the data, including detrending and smoothing, to reduce noise in our model, and we calculated the magnitude of each signal when loading in the gesture CSV files. We also trained a scaler to scale the features in our model. Once our model was trained, we saved the model and feature scaler to a file for use in our main photo booth program.
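A minimal sketch of that pipeline (scale, fit an RBF SVM, save both) looks like the following. It assumes `X` is an (n_samples, 19) matrix of per-gesture feature vectors (extraction is sketched after the feature list below) and `y` the matching labels; the file names are placeholders rather than the ones in our repo.

```python
# train_svm.py - sketch of the scaler + RBF SVM training and save step.
# X/y file names are placeholders; build X from the 19 features listed below.
import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.load("features.npy")       # shape: (n_gestures, 19)
y = np.load("labels.npy")         # gesture name per row

scaler = StandardScaler().fit(X)  # learn per-feature mean and std
clf = SVC(kernel="rbf")           # default RBF kernel
clf.fit(scaler.transform(X), y)

# Persist both pieces so the photo booth program can reuse them.
joblib.dump(scaler, "scaler.joblib")
joblib.dump(clf, "svm_model.joblib")
```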
The features we selected include:
- number of zero crossings
- number of peaks
- average peak height
- max x, y, z, and mag values
- min x, y, z, and mag values
- mean x, y, z, and mag values
- standard deviation of x, y, z, and mag values
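As a rough illustration, these 19 features can be computed along the following lines with NumPy and SciPy. The smoothing window and the choice to count zero crossings and peaks on the detrended, smoothed magnitude signal are assumptions; our notebook (linked below) has the exact implementation.

```python
# features.py - sketch of the 19-feature extraction for one segmented gesture.
import numpy as np
from scipy.signal import detrend, find_peaks


def smooth(sig, window=10):
    """Simple moving-average smoothing (window size is an assumption)."""
    kernel = np.ones(window) / window
    return np.convolve(sig, kernel, mode="same")


def extract_features(x, y, z):
    """x, y, z: 1-D NumPy arrays for one gesture; returns a 19-element vector."""
    mag = np.sqrt(x ** 2 + y ** 2 + z ** 2)
    x, y, z, mag = [smooth(detrend(s)) for s in (x, y, z, mag)]

    # Zero crossings and peaks, computed here on the processed magnitude signal.
    zero_crossings = np.count_nonzero(np.diff(np.sign(mag)))
    peaks, props = find_peaks(mag, height=0)
    avg_peak_height = props["peak_heights"].mean() if len(peaks) else 0.0

    feats = [zero_crossings, len(peaks), avg_peak_height]
    for stat in (np.max, np.min, np.mean, np.std):
        feats.extend(stat(s) for s in (x, y, z, mag))  # 4 stats x 4 signals
    return np.array(feats)                             # 3 + 16 = 19 features
```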
Our code for exploring our data and training our SVM can be found here: https://gitlab.cs.washington.edu/eafurst/599h_final_project/blob/master/GestureRecognizer/trainSVMGestureRecognizer.ipynb
Step 5: Real Time Classification
For real-time classification, we program our Arduino to send a button flag (0 or 1), the Arduino time stamp, and the x, y, and z accelerometer values over the Serial port. Our Python classification program reads lines from Serial and looks for gestures. To segment the stream and find gestures, we use threshold values: to detect the beginning of an event, we use a min-max threshold of 90, and for a continuing event, a min-max threshold of 25. If an event is detected, the event signal is saved and passed to our classifier.
We then process the signal as we did when training our SVM: we calculate magnitude, smooth, and detrend. We then calculate the 19 selected features for the event signal and scale them using the scaler saved from training. These features are passed to the saved SVM, which returns a predicted label. The predicted label is then used to control the photo booth program described in the next step.
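A sketch of this kind of min-max segmentation is below; the 90 and 25 thresholds are the values we used, while the window length is an assumption you will likely need to tune.

```python
# segmenter.py - sketch of min-max threshold event segmentation.
from collections import deque

import numpy as np

WINDOW = 20              # recent magnitude samples to inspect (assumed size)
START_THRESHOLD = 90     # min-max spread that starts an event
CONTINUE_THRESHOLD = 25  # min-max spread that keeps an event going


class EventSegmenter:
    def __init__(self):
        self.window = deque(maxlen=WINDOW)
        self.event = []
        self.in_event = False

    def add_sample(self, mag):
        """Feed one magnitude sample; return the full event signal when it ends."""
        self.window.append(mag)
        spread = max(self.window) - min(self.window)

        if not self.in_event and spread > START_THRESHOLD:
            self.in_event = True
            self.event = list(self.window)      # keep the ramp-up samples
        elif self.in_event:
            if spread > CONTINUE_THRESHOLD:
                self.event.append(mag)
            else:
                self.in_event = False
                return np.array(self.event)     # event over; hand to the classifier
        return None
```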
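The classification step then reduces to loading the saved scaler and model and predicting; a sketch with placeholder file names is below.

```python
# classify.py - sketch: scale one 19-element feature vector and predict its
# gesture label. File names are placeholders for the artifacts saved in Step 4.
import joblib
import numpy as np

scaler = joblib.load("scaler.joblib")
clf = joblib.load("svm_model.joblib")


def classify_event(features):
    """features: 19-element vector from extract_features() in Step 4."""
    feats = np.asarray(features).reshape(1, -1)     # a single sample
    return clf.predict(scaler.transform(feats))[0]  # e.g. a swipe/circle/X label
```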
You can see our code for event segmentation and classification here: https://gitlab.cs.washington.edu/eafurst/599h_final_project/blob/master/GestureRecognizer/gesture_rec.py
Step 6: Photo Booth Implementation
We used OpenCV and Python to implement our photo booth program. OpenCV provides an easy interface for using an attached webcam, as well as fast implementations of some filters.
We use the gesture interface described above to navigate through the features of the photo booth application:
- Swipe Right - holding the box and swiping right flips through the different filter options, one by one
- Button Press - pressing the button while moving the box draws a design of circles on the screen
- Circle - drawing a circle (in either direction) triggers the countdown to take a photo
- X - drawing an 'X' symbol reverts the photo booth filters and erases any existing drawings
The 'Swipe Right' mode tabs through different filters, all implemented in OpenCV: grayscale, edge detection, cartoonize, and colormap.
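Grayscale, edge detection, and colormap map directly onto single OpenCV calls, and a cartoonize effect is commonly built from a bilateral filter plus adaptive-threshold edges. The sketch below is illustrative; the parameter values are assumptions, and our exact settings are in the repo.

```python
# filters.py - sketch of the photo booth filters in OpenCV (parameters are
# illustrative, not necessarily the values we used).
import cv2


def grayscale(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)   # keep 3 channels for display


def edges(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.cvtColor(cv2.Canny(gray, 100, 200), cv2.COLOR_GRAY2BGR)


def colormap(frame):
    return cv2.applyColorMap(frame, cv2.COLORMAP_JET)


def cartoonize(frame):
    color = cv2.bilateralFilter(frame, 9, 75, 75)   # flatten colors
    gray = cv2.medianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 7)
    mask = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY, 9, 2)
    return cv2.bitwise_and(color, color, mask=mask)  # keep color, cut dark edges
```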
The paintbrush mode is activated by pressing the button and waving the controller. We map the accelerometer's X and Y values to 2D coordinates in the screen space of the photo booth; while this mapping is not precise, it produces a nice effect.
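A sketch of that mapping is below; the 0-1023 input range assumes raw 10-bit analogRead values, and the circle size and color are arbitrary choices.

```python
# drawing.py - sketch of mapping accelerometer readings to screen coordinates
# for the paintbrush effect. The 0-1023 range assumes raw 10-bit analog values.
import cv2
import numpy as np


def accel_to_point(ax, ay, frame_w, frame_h, accel_range=(0, 1023)):
    lo, hi = accel_range
    px = int(np.interp(ax, [lo, hi], [0, frame_w - 1]))
    py = int(np.interp(ay, [lo, hi], [0, frame_h - 1]))
    return px, py


def draw_circle(canvas, ax, ay):
    """Stamp a filled circle on a canvas the same size as the webcam frame."""
    h, w = canvas.shape[:2]
    cv2.circle(canvas, accel_to_point(ax, ay, w, h), 15, (255, 0, 255), -1)
```

The canvas can then be blended onto each new webcam frame (for example with cv2.add), so the drawing persists until the X gesture erases it.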
Our implementation code has more details: https://gitlab.cs.washington.edu/eafurst/599h_final_project/blob/master/GestureRecognizer/accel_rec.py
Step 7: Potential Obstacles / Challenges
If you choose to work with or extend this design, you may run into some challenges. Here we describe ones we encountered and how we overcame them:
- Slow filter implementations - we run the webcam at 30 FPS, the rate at which it captures frames. This means any filters you choose to implement also need to run at that rate or faster. We wanted to include some blurring filters or more complex painterly effects, but didn't get them to run fast enough.
- Matching accelerometer sample rates with webcam frame rates - to avoid complicated multiprocessing code, we kept all the logic for gesture recognition and the photo booth application in the same loop. However, the accelerometer sampled much faster than the webcam frame rate, and serial data would back up while we processed the filter frames. To resolve this, we added a buffer that reads up to 10 serial lines at a time (see `accel_rec.py:466 - 471`); a rough sketch appears after this list. You may need to adjust these values depending on the accelerometer or Arduino used.
- Recorded gesture set - ensure that you have recorded at least 10 trials of each gesture you want to include in your application. It's also important to make your gesture recordings under position and environment conditions similar to the intended use. In our case, we made our gesture recordings before printing the enclosure and didn't initially account for the small movements and hand positioning imposed by the enclosure. If you encounter this issue, you will likely need to retrain your model, as we did.
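The snippet below sketches the buffered read from the second bullet above; it approximates the referenced lines, and the port settings and MAX_LINES value are assumptions to adjust for your setup.

```python
# Sketch of the buffered serial read; an approximation of accel_rec.py's logic.
import serial  # pip install pyserial

ser = serial.Serial("/dev/ttyACM0", 9600, timeout=0.01)
MAX_LINES = 10  # drain at most this many lines per photo booth frame


def read_pending_samples():
    """Read up to MAX_LINES buffered serial lines so data doesn't back up."""
    samples = []
    while ser.in_waiting and len(samples) < MAX_LINES:
        line = ser.readline().decode("ascii", errors="ignore").strip()
        if line:
            samples.append(line.split(","))  # button flag, arduino_ms, x, y, z
    return samples
```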