Introduction: Memememe#selfie

without conversation, there is no self-and-other

- Paul Pangaro

If conversation and identity arise together, as proposed by Gordon Pask’s Conversation Theory, what are the consequences (or limitations) of communicating without an other? Can we just as easily construct our identity and our environment while talking to ourselves? What is the sound of a one-sided conversation?

memememe#selfie is a sculpture that explores some of these questions, in the context of our contemporary preoccupation with trending, liking, following, etc. It might be the first instance of artificial unintelligence.

It’s also the first step of a longer project being developed in collaboration with Radamés Ajna, called memememe, about physical communication protocols and machine body language.

Step 1: Android + OpenCV

The first step in getting phones to recognize themselves is to install OpenCV on them.

Good news!!! The OpenCV people have a framework that simplifies writing, compiling and running OpenCV 2.4 apps for Android devices. You no longer have to download OpenCV source and (cross-)compile it like it’s 2010.

Instead, this is what we did:

1. Installed the Android SDK, NDK, and Eclipse environments and plug-ins. There are lots of steps here, but also lots of resources online: the official Android SDK installation guide, the Android NDK installation guide, and the OpenCV intro to Android setup guide.

I wanted to try out the new Android Studio IDE, but since it didn’t support NDK development at the time, I decided to stick with Eclipse and used the Manual environment setup section of the OpenCV guide.

2. Installed the OpenCV SDK. Downloaded the OpenCV for Android library and examples from SourceForge, and followed the steps in this OpenCV setup guide.

3. Installed the OpenCV Manager app from the Play Store. The OpenCV Manager checks for OpenCV dynamic libraries already installed on a phone and downloads new ones when needed. More info is available on this OpenCV page.

4. RTFM (or in this case, a short guide). This is a pretty good tutorial for getting a project started: it shows how to initialize OpenCV and get frames from the camera.

After this, we tested the face detection example on some still images and on ourselves.

Money! Even in the dark! On an old phone with Android 2.3.4!
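For reference, the per-frame part of that face detection test boils down to something like the sketch below. This is a trimmed-down version of the OpenCV4Android sample’s camera callback, not our exact code; the classifier field and the 20%-of-frame minimum detection size are assumptions:

import org.opencv.android.CameraBridgeViewBase;
import org.opencv.android.CameraBridgeViewBase.CvCameraViewFrame;
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Rect;
import org.opencv.core.Scalar;
import org.opencv.core.Size;
import org.opencv.objdetect.CascadeClassifier;

// Minimal sketch of the per-frame detection callback, in the style of the
// OpenCV4Android (2.4) face-detection sample. mFaceCascade is assumed to be
// loaded from the bundled cascade XML inside the BaseLoaderCallback, once the
// OpenCV Manager reports that the native library is available.
public class DetectionSketch implements CameraBridgeViewBase.CvCameraViewListener2 {

    private CascadeClassifier mFaceCascade;   // loaded after OpenCV initializes
    private int mMinFaceSize = 0;             // recomputed from the frame height

    public void onCameraViewStarted(int width, int height) {
        // ignore detections smaller than 20% of the frame height
        mMinFaceSize = Math.round(height * 0.2f);
    }

    public void onCameraViewStopped() { }

    public Mat onCameraFrame(CvCameraViewFrame inputFrame) {
        Mat rgba = inputFrame.rgba();
        Mat gray = inputFrame.gray();

        MatOfRect faces = new MatOfRect();
        if (mFaceCascade != null) {
            mFaceCascade.detectMultiScale(gray, faces, 1.1, 2, 2,
                    new Size(mMinFaceSize, mMinFaceSize), new Size());
        }

        // draw a rectangle around every detection
        for (Rect face : faces.toArray()) {
            Core.rectangle(rgba, face.tl(), face.br(), new Scalar(0, 255, 0, 255), 3);
        }
        return rgba;
    }
}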

Step 2: Haar Cascades -- an Intro

For better or worse, most cell phones and digital cameras today can detect human faces, and, as seen in the previous step, it doesn’t take too much effort to get simple face detection code running on an Android phone (or any other platform), using OpenCV.

This is all thanks to the Viola-Jones algorithm for face detection, using Haar-based cascade classifiers. There is lots of information about this online, but a very nice explanation can be found on the OpenCV website.

It’s basically a machine learning algorithm that uses a bunch of images of faces and non-faces to train a classifier that can later be used to detect faces in realtime.

The algorithm implemented in OpenCV can also be used to detect other things, as long as you have the right classifiers. My OpenCV distribution came with classifiers for eyes, upper body, hands, frontal face and profile face. While looking for information about how to train them, I found classifiers for bananas, pens and iPhones.

Actually, that last link is for more than just iPhones. In his Mirror Test project, Jeff Thompson is actually training computers to recognize themselves in a “non-utilitarian, flawed poetic/technological act”.

This is similar to what we want, but since we have a very specific phone to detect, we decided to train our own classifier.

The OpenCV tutorial for Training Cascade Classifiers is a pretty good place to start. It explains the 2 binary utilities used in the process (opencv_createsamples and opencv_traincascade), and all of their command line arguments and options, but it doesn’t really give an example of a flow to follow, nor does it explain all the possible uses for the opencv_createsamples utility.

On the other hand, Naotoshi Seo’s tutorial is actually quite thorough and explains the 4 different uses for the opencv_createsamples utility. Thorsten Ball wrote a tutorial using Naotoshi Seo’s scripts to train a classifier to detect bananas, but it requires running some perl scripts and compiling some C++… too much work…

Jeff also has some nice notes about how he prepared his data, and a script for automatically iterating over a couple of options for the 2 utilities.

The way we did it was inspired by all of these tutorials, with some minor modifications and optimizations.

Step 3: Haar Cascades -- Gathering the Negatives

This is where we gather about 1000 images of non-phones. Some people use video for this… we followed Jeff and took them from this repository, with this command:

cd negativeImageDirectory
wget -nd -r -A "neg0*.jpg" http://tutorial-haartraining.googlecode.com/svn/trunk/data/negatives/

Creating a collection file for these is pretty easy using the following command:

cd negativeImageDirectory
ls -1 *.jpg > negatives.txt

Step 4: Haar Cascades -- Gathering the Positives

This is where we gather about 1000 images of our phone. Some people use video, some people use scripts… we used scripts.

1. Pictures

This is where we take pictures of our phone. We don’t need 1000 of them. Somewhere between 15 and 20 should be enough. This is what our images looked like:

Since our object is pretty black, we used a white background, and took high-contrast pictures in order to make the next step easier. Also, the pictures don’t have to be large because OpenCV will shrink them anyway: ours were 1024×773.


2. Process

This is where we use a Processing script to read the images and mark where the object is. Since we used high-contrast and a white background, it’s pretty easy to get an initial guess by just keeping track of the min/max x- and y- positions of dark pixels. What is important here is to make sure that the aspect ratio of all of the marked objects is the same. In our case, this was 1:1, and the script makes sure all the marked images follow that:

In addition to cropping the image, the Processing script also spits out a text file with information about where the object is in the original image. This is what Naotoshi calls a description file format.
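As an illustration, a minimal Processing sketch of that idea could look like the one below. This is not the actual memememe script; the filenames and the darkness threshold are placeholders.

// Minimal sketch: find the bounding box of dark pixels in one high-contrast
// picture, pad it to a 1:1 aspect ratio, save the crop, and write one line in
// the description file format ("<image> <count> <x> <y> <width> <height>").
// The filenames and the threshold are assumptions, not the real script's values.

int DARK = 100;  // pixels darker than this count as part of the phone

void setup() {
  PImage img = loadImage("phone00.jpg");
  img.loadPixels();

  int minX = img.width, minY = img.height, maxX = 0, maxY = 0;
  for (int j = 0; j < img.height; j++) {
    for (int i = 0; i < img.width; i++) {
      if (brightness(img.pixels[j * img.width + i]) < DARK) {
        minX = min(minX, i);
        maxX = max(maxX, i);
        minY = min(minY, j);
        maxY = max(maxY, j);
      }
    }
  }

  // grow the shorter side so the marked region is square (1:1)
  int w = maxX - minX, h = maxY - minY;
  int side = max(w, h);
  int x = constrain(minX - (side - w) / 2, 0, img.width - side);
  int y = constrain(minY - (side - h) / 2, 0, img.height - side);

  // e.g. "phone00.jpg 1 312 145 420 420"
  saveStrings("phone00.txt", new String[] { "phone00.jpg 1 " + x + " " + y + " " + side + " " + side });

  // also save the cropped object itself, to feed opencv_createsamples -img
  img.get(x, y, side, side).save("cropped00.jpg");
  exit();
}

The real script does this for every picture and treats the dark-pixel bounds only as an initial guess.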


3. Make 100s

This is where we venture from Naotoshi… First, we run the following command for each of our cropped images:

opencv_createsamples -img cropped00.jpg \
-bg negativeImageDirectory/negatives.txt \
-info sampleImageDirectory/cropped00.txt \
-num 128 -maxxangle 0.0 -maxyangle 0.0 -maxzangle 0.3 \
-bgcolor 255 -bgthresh 8 -w 48 -h 48

Here, cropped00.jpg is one of the cropped images from the Processing script, negatives.txt is the collection file for the negative images, and cropped00.txt is where the opencv_createsamples utility will write its output description file.

This will generate 128 images by placing a slightly rotated and slightly brighter/darker version of cropped00.jpg on top of a randomly selected negative image. And because we used a white background when we took our pictures, specifying 255 as the -bgcolor makes the white on the cropped image transparent, giving us 128 images like these:

Running this command also generates a description file with information about where the cell phone is in each of the 128 images.


4. Make 1000s

If we had 15 pictures, running the previous step on each of them would have produced 1920 pictures of cell phones floating in random places. What we have to do now is collect all of them into a single .vec file before we can run the training utility.

First, we collect all 15 description files into one, by running this command:

cd sampleImageDirectory
cat cropped*.txt > positives.txt

Then, we can combine all of them into a single .vec file using this command:

opencv_createsamples -info sampleImageDirectory/positives.txt \
-bg negativeImageDirectory/negatives.txt \
-vec cropped.vec \
-num 1920 -w 48 -h 48

This crops the 1920 samples of the cell phone listed in positives.txt (each rotated slightly differently and placed on a different background), resizes them to 48×48, and packs them into a single cropped.vec file.

Step 5: Haar Cascades -- Training the Cascade

This is where we train a Haar Cascade Classifier using another OpenCV utility. Armed with about 1000 negative images and 2000 positive images, we can run this command to start training:

opencv_traincascade -data outputDirectory -vec cropped.vec \
-bg negativeImageDirectory/negatives.txt \
-numPos 1000 -numNeg 600 -numStages 20 \
-precalcValBufSize 1024 -precalcIdxBufSize 1024 \
-featureType HAAR \
-minHitRate 0.995 -maxFalseAlarmRate 0.5 \
-w 48 -h 48

Most of these are the default values; one notable exception is the memory usage, which we increased from 512 MB to 2 GB via the two -precalc buffer sizes. Another thing to note: -numPos and -numNeg should be less than the total number of images actually available and described in the description/collection files. We found this out by trial and error; the opencv_traincascade utility seems to slowly increase the number of “consumed” images as it goes through the training stages, in order to meet the -minHitRate and -maxFalseAlarmRate, and when there are not enough images left to consume, it crashes. For example, we specified -numPos 1000 for our runs, but by stage 10 it was “consuming” 1030 images.

If all goes well, a cascade.xml file should show up in the outputDirectory after a couple of hours (or days).

We wrote a script that automates most of this process.

With these settings, it took our training about 24 hours to complete. While waiting for the 20 stages to finish, the same opencv_traincascade command can be run in parallel to create a partial cascade file from the stages that are already complete. For example, the following command will generate a cascade from the first 10 stages of classifiers in the output directory:

opencv_traincascade -data outputDirectory -vec cropped.vec \
-bg negativeImageDirectory/negatives.txt \
-numPos 1000 -numNeg 600 -numStages 10 \
-precalcValBufSize 1024 -precalcIdxBufSize 1024 \
-featureType HAAR \
-minHitRate 0.995 -maxFalseAlarmRate 0.5 \
-w 48 -h 48

It’s basically the same command, but with -numStages set to 10.

Step 6: Haar Cascades -- Testing

Some initial tests were done on a laptop computer, using ofxCv in openFrameworks.

Then, some more tests were done using an Android phone.
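For reference, getting the trained cascade.xml onto the phone follows the same pattern as the OpenCV face-detection sample: CascadeClassifier can’t read directly from Android resources, so the file is first copied out to internal storage. A minimal sketch (the R.raw.cascade resource name is an assumption):

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;

import org.opencv.objdetect.CascadeClassifier;

import android.content.Context;

public class PhoneCascadeLoader {

    // Copies the trained cascade.xml out of res/raw and loads it.
    // R.raw.cascade is assumed to be the file produced by opencv_traincascade.
    public static CascadeClassifier load(Context context) throws Exception {
        InputStream is = context.getResources().openRawResource(R.raw.cascade);
        File cascadeFile = new File(context.getDir("cascade", Context.MODE_PRIVATE), "cascade.xml");

        FileOutputStream os = new FileOutputStream(cascadeFile);
        byte[] buffer = new byte[4096];
        int bytesRead;
        while ((bytesRead = is.read(buffer)) != -1) {
            os.write(buffer, 0, bytesRead);
        }
        is.close();
        os.close();

        CascadeClassifier cascade = new CascadeClassifier(cascadeFile.getAbsolutePath());
        return cascade.empty() ? null : cascade;  // empty() means the load failed
    }
}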

Hooray !!! It recognizes itself and its friends !!!

Step 7: memememe#selfie

Now the only thing missing is some logic to differentiate between detecting a phone (any phone), or detecting itself. We did this by simply flashing some stuff on the screen, and trying to detect the glow in the camera image.
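The gist of that check, as a rough sketch (not the actual memememe code; the threshold and names here are placeholders): flash the screen, and see whether the detected phone rectangle in the camera frame gets brighter at the same moment.

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Rect;

public class SelfCheck {

    // brightness jump (0-255 grayscale) that counts as "that glow is mine"
    private static final double GLOW_THRESHOLD = 40.0;

    private double baselineBrightness = -1;

    // Call with the grayscale frame and the rect returned by detectMultiScale:
    // once while the screen is dark, then again while it is flashing.
    public boolean glowDetected(Mat gray, Rect phone, boolean screenIsFlashing) {
        double brightness = Core.mean(gray.submat(phone)).val[0];

        if (!screenIsFlashing) {
            baselineBrightness = brightness;   // how bright the rect is normally
            return false;
        }
        // it's "me" if the rect got noticeably brighter while the screen flashed
        return baselineBrightness >= 0 && (brightness - baselineBrightness) > GLOW_THRESHOLD;
    }
}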

Once the phone detects itself, it takes a selfie and posts it to its tumblr.

The code for this is on GitHub.

The tumblr is here.

More info about the project.

And, some relevant blog posts, and a list of resources.