Introduction: Eye Tracking Animatronic Face

This is the final project for my UIC ME411 Mechatronics course: an interactive animatronic face! This build achieves realistic motion with full-range eye tracking, automatic blinking, and a moving mouth. It's designed to be responsive, beginning its simulated speech only after recognizing a thumbs-up hand gesture.

Supplies

General Electronics:

  1. Webcam for computer vision - Can use any webcam or integrated camera
  2. Laptop/Computer to use Arduino IDE + Python
  3. Power Supply (I managed without one, but this caused the microcontroller to brown out. I worked around it in software by resetting the serial port connection whenever data transfer stopped, but this is not the smart way to do it)


Eye Mechanism/Mouth:

  1. Servo Motors x7 - a less expensive variety works fine
  2. Breadboard
  3. M-M Jumper Wires
  4. ELEGOO UNO R3 Board + Cable
  5. Steel Wire
  6. Various building materials: Hot Glue Gun, Popsicle Sticks, Balsa Wood, Cardboard
  7. Access to 3D printer (see Step 1)


Costume Items:

  1. Wig
  2. Glasses
  3. 3D Printed Nose & Ears

Project code provided at the end

Step 1: Building the Eyes Mechanism

Follow the fantastic tutorial by Instructables creator MorganManly.

Step 2: Computer Vision

This system integrates OpenCV (Python) for computer vision processing with microcontroller control over serial communication. It implements spatial mapping logic to translate the target's pixel coordinates captured by the camera (0-640 in X and 0-480 in Y for a 640x480 frame) into servo commands constrained to the physical limits of the eye's motion (left, right, up, and down).
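
To see the mapping in isolation, here is a minimal Python sketch of the idea, using a hypothetical pixel_to_angle() helper, example servo limits, and a 640x480 frame. In the final build this conversion actually happens on the Arduino with map() and constrain() (see Step 7), and the Python side simply streams the raw "x,y" coordinates over serial.

def pixel_to_angle(value, in_min, in_max, out_min, out_max):
    # Linearly interpolate a pixel coordinate into a servo angle,
    # then clamp it so the servo never exceeds its physical limits.
    span_in = in_max - in_min
    span_out = out_max - out_min
    angle = out_min + (value - in_min) * span_out / span_in
    low, high = min(out_min, out_max), max(out_min, out_max)
    return max(min(angle, high), low)

# Example: face centered at x=320 in a 640-pixel-wide frame,
# with example horizontal servo limits of 60-110 degrees.
print(pixel_to_angle(320, 100, 540, 60, 110))  # ~85 degrees (roughly centered)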

Step 3: Building the Frame

A robust structural frame was fabricated using accessible materials and secured with hot glue. This framework was designed to precisely mount and stabilize the critical components, including the eye mechanism, camera module, and decorative costume elements. The frame was then painted white to look fine as wine.

Step 4: Include Facial Identifiers

The final hardware modifications were implemented to achieve a complete facial structure, adding a dynamic mouth mechanism, a nose, and ears. These components used 3D-printable models graciously provided by the open-source community:

  1. Nose Model: 3DTux (Printables)
  2. Ear Model: Peter Farell (Printables)



Step 5: Integrate Gesture Controls

The final development phase focused on integrating gesture control into the computer vision system. This was achieved by leveraging Google's open-source MediaPipe libraries to analyze hand position and orientation in real time. The gesture detection was then synced to mouth motion in order to play audio and simulate speaking.

Step 6: Python Code

import cv2
import serial
import time
import mediapipe as mp
import pygame
import os

# --- CONFIGURATION ---
ARDUINO_PORT = 'COM3' # Ensure this matches your port
BAUD_RATE = 9600
BUFFER_SECONDS = 1.0
AUDIO_FILE = 'success.mp3'
AUDIO_COOLDOWN = 12.0

print(f"Attempting to connect to {ARDUINO_PORT}...")


# --- SETUP SERIAL FUNCTION ---
def connect_arduino():
    try:
        ser = serial.Serial(ARDUINO_PORT, BAUD_RATE, timeout=1)
        time.sleep(2)  # Allow Arduino to reset
        print(f"SUCCESS: Connected to Arduino on {ARDUINO_PORT}")
        return ser
    except Exception as e:
        print(f"ERROR: Could not connect to Arduino. {e}")
        return None


arduino = connect_arduino()


# --- HELPER: ROBUST SERIAL WRITE ---
def send_command(command_bytes):
    global arduino
    if arduino is None:
        # Try to reconnect occasionally if lost
        arduino = connect_arduino()
        if arduino is None: return

    try:
        arduino.write(command_bytes)
    except (serial.SerialException, OSError, PermissionError) as e:
        print(f"CONNECTION LOST: {e}")
        print("Attempting to reconnect...")
        try:
            arduino.close()
        except:
            pass
        arduino = None
        # Immediate retry
        arduino = connect_arduino()


# --- SETUP AUDIO ---
try:
    pygame.mixer.init()
    if os.path.exists(AUDIO_FILE):
        sound_effect = pygame.mixer.Sound(AUDIO_FILE)
        print("Audio system initialized.")
    else:
        print(f"WARNING: Audio file '{AUDIO_FILE}' not found.")
        sound_effect = None
except Exception as e:
    print(f"Audio Error: {e}")
    sound_effect = None

# --- SETUP COMPUTER VISION ---
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
hands = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=1,
    min_detection_confidence=0.7,
    min_tracking_confidence=0.5
)

cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

# --- LOGIC VARIABLES ---
last_seen_time = time.time()
last_audio_time = 0


def is_thumbs_up(hand_landmarks):
    thumb_tip = hand_landmarks.landmark[4]
    thumb_ip = hand_landmarks.landmark[3]
    index_tip = hand_landmarks.landmark[8]
    index_pip = hand_landmarks.landmark[6]
    middle_tip = hand_landmarks.landmark[12]
    middle_pip = hand_landmarks.landmark[10]
    ring_tip = hand_landmarks.landmark[16]
    ring_pip = hand_landmarks.landmark[14]
    pinky_tip = hand_landmarks.landmark[20]
    pinky_pip = hand_landmarks.landmark[18]

    thumb_is_up = thumb_tip.y < thumb_ip.y
    index_folded = index_tip.y > index_pip.y
    middle_folded = middle_tip.y > middle_pip.y
    ring_folded = ring_tip.y > ring_pip.y
    pinky_folded = pinky_tip.y > pinky_pip.y

    return thumb_is_up and index_folded and middle_folded and ring_folded and pinky_folded


print("Tracking started. Press 'q' to quit.")

while True:
    ret, frame = cap.read()
    if not ret: break

    frame = cv2.flip(frame, 1)
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # --- 1. FACE TRACKING ---
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=8, minSize=(60, 60))

    if len(faces) > 0:
        last_seen_time = time.time()
        target_face = sorted(faces, key=lambda f: f[2] * f[3], reverse=True)[0]
        (x, y, w, h) = target_face
        center_x = x + (w // 2)
        center_y = y + (h // 2)

        data = f"{center_x},{center_y}\n"
        send_command(data.encode('utf-8'))

        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, "LOCKED", (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    else:
        time_since_loss = time.time() - last_seen_time
        if time_since_loss > BUFFER_SECONDS:
            send_command(b"CLOSE\n")
            cv2.putText(frame, "SLEEPING", (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
        else:
            remaining = round(BUFFER_SECONDS - time_since_loss, 1)
            cv2.putText(frame, f"SEARCHING... ({remaining}s)", (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 165, 255), 2)

    # --- 2. HAND TRACKING ---
    result = hands.process(rgb_frame)
    if result.multi_hand_landmarks:
        for hand_landmarks in result.multi_hand_landmarks:
            mp_drawing.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
            if is_thumbs_up(hand_landmarks):
                cv2.putText(frame, "THUMBS UP!", (50, 100), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

                current_time = time.time()
                if sound_effect and (current_time - last_audio_time > AUDIO_COOLDOWN):
                    try:
                        sound_effect.play()
                        print("Audio Playing...")
                        # Send SPEAK command safely
                        send_command(b"SPEAK\n")
                        print("Sent SPEAK signal")
                        last_audio_time = current_time
                    except Exception as e:
                        print(f"Play Error: {e}")

    cv2.imshow('Animatronic Eye Tracking', frame)
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
if arduino:
    arduino.close()
cv2.destroyAllWindows()

Step 7: Arduino Code

#include <Servo.h>

// --- PIN ASSIGNMENTS ---
// Define which Arduino digital pin controls each servo motor.
const int L_BLINK_PIN = 3;
const int L_VERT_PIN = 11;
const int L_HORI_PIN = 10;
const int R_BLINK_PIN = 5;
const int R_VERT_PIN = 9;
const int R_HORI_PIN = 6;
const int MOUTH_PIN = 12;

// --- LEFT EYE SERVO POSITIONS (Angles in degrees) ---
const int L_BLINK_CLOSED = 50;
const int L_BLINK_OPEN = 140;
const int L_VERT_DOWN = 60;
const int L_VERT_CENTER = 90;
const int L_VERT_UP = 120;
const int L_HORI_LEFT = 60;
const int L_HORI_CENTER = 100;
const int L_HORI_RIGHT = 110;

// --- RIGHT EYE SERVO POSITIONS (Angles in degrees) ---
const int R_BLINK_CLOSED = 145;
const int R_BLINK_OPEN = 70;
const int R_VERT_DOWN = 40;
const int R_VERT_CENTER = 80;
const int R_VERT_UP = 100;
const int R_HORI_LEFT = 40;
const int R_HORI_CENTER = 85;
const int R_HORI_RIGHT = 110;

// --- MOUTH SERVO POSITIONS (Angles in degrees) ---
const int MOUTH_CLOSED = 89;
const int MOUTH_OPEN = 100;

// --- SERVO OBJECTS ---
// Create a Servo object for each physical servo motor.
Servo lBlinkServo;
Servo lVertServo;
Servo lHoriServo;
Servo rBlinkServo;
Servo rVertServo;
Servo rHoriServo;
Servo mouthServo;

// --- BLINKING VARIABLES ---
// Variables for natural, non-blocking blinking timing.
unsigned long lastBlinkTime = 0;
int blinkInterval = 3000;

// --- MOUTH ANIMATION VARIABLES ---
// Variables to control the mouth movement timing based on serial command.
unsigned long speakStartTime = 0; // Tracks when mouth movement should begin
unsigned long speakEndTime = 0; // Tracks when mouth movement should end
unsigned long lastMouthToggleTime = 0;
bool isMouthOpen = false;

void setup() {
  Serial.begin(9600);
  // Attach all eye servos to their assigned pins
  lBlinkServo.attach(L_BLINK_PIN);
  lVertServo.attach(L_VERT_PIN);
  lHoriServo.attach(L_HORI_PIN);
  rBlinkServo.attach(R_BLINK_PIN);
  rVertServo.attach(R_VERT_PIN);
  rHoriServo.attach(R_HORI_PIN);
  // Attach the mouth servo
  mouthServo.attach(MOUTH_PIN);

  // Initialize the system to the default, "awake" state
  openEyes();
  lookCenter();
  mouthServo.write(MOUTH_CLOSED);
}

void loop() {
  // Check for incoming serial data from the computer vision script
  if (Serial.available() > 0) {
    String data = Serial.readStringUntil('\n');
    data.trim();

    // --- COMMAND: SLEEP ---
    if (data == "CLOSE") {
      closeEyes();
    }
    // --- COMMAND: SPEAK (Trigger mouth movement) ---
    else if (data == "SPEAK") {
      triggerSpeaking();
    }
    // --- COMMAND: TRACK (Receiving X,Y coordinates) ---
    else {
      // Force eyes open if the vision system sends a tracking command
      openEyes();
      // Parse the incoming string for X and Y values (e.g., "320,240")
      int commaIndex = data.indexOf(',');
      if (commaIndex > 0) {
        int x = data.substring(0, commaIndex).toInt();
        int y = data.substring(commaIndex + 1).toInt();
        moveEyes(x, y);
      }
    }
  }

  // --- MOUTH ANIMATION (Runs continuously without blocking) ---
  updateMouth();

  // --- NATURAL BLINKING (Runs continuously without blocking) ---
  if (lBlinkServo.read() == L_BLINK_OPEN) {
    if (millis() - lastBlinkTime > blinkInterval) {
      performQuickBlink();
      lastBlinkTime = millis();
      // Randomize the next blink interval for a natural look
      blinkInterval = random(2000, 5000);
    }
  }
}

// ------------------------------------------------------------------
// --- CORE ACTION FUNCTIONS ---
// ------------------------------------------------------------------

void closeEyes() {
  lBlinkServo.write(L_BLINK_CLOSED);
  rBlinkServo.write(R_BLINK_CLOSED);
}

void openEyes() {
  lBlinkServo.write(L_BLINK_OPEN);
  rBlinkServo.write(R_BLINK_OPEN);
}

void performQuickBlink() {
  lBlinkServo.write(L_BLINK_CLOSED);
  rBlinkServo.write(R_BLINK_CLOSED);
  delay(150); // Quick shut duration
  lBlinkServo.write(L_BLINK_OPEN);
  rBlinkServo.write(R_BLINK_OPEN);
}

// Initiates the timing for the mouth movement sequence
void triggerSpeaking() {
  unsigned long currentMillis = millis();
  // Start the actual mouth toggling after a short pause (2.5 seconds)
  speakStartTime = currentMillis + 2500;
  // Set the total duration of the mouth movement (10 seconds after start)
  speakEndTime = speakStartTime + 10000;
}

// Toggles the mouth open/closed state during the speaking interval
void updateMouth() {
  unsigned long currentMillis = millis();

  // Check if we are within the designated speaking window
  if (currentMillis >= speakStartTime && currentMillis < speakEndTime) {
    // Toggle the mouth position every 200ms for animation
    if (currentMillis - lastMouthToggleTime > 200) {
      lastMouthToggleTime = currentMillis;
      isMouthOpen = !isMouthOpen;
      if (isMouthOpen) {
        mouthServo.write(MOUTH_OPEN);
      } else {
        mouthServo.write(MOUTH_CLOSED);
      }
    }
  } else {
    // If outside the speaking window (waiting or finished), ensure it's closed
    mouthServo.write(MOUTH_CLOSED);
    isMouthOpen = false;
  }
}

// Resets both eyes to the default center position
void lookCenter() {
  lVertServo.write(L_VERT_CENTER);
  lHoriServo.write(L_HORI_CENTER);
  rVertServo.write(R_VERT_CENTER);
  rHoriServo.write(R_HORI_CENTER);
}

// Maps X,Y pixel coordinates (from camera) to servo angles (physical limits)
void moveEyes(int x, int y) {
  // Map X-coordinate (horizontal) to left and right eye servo angles
  int lHoriAngle = map(x, 100, 540, L_HORI_LEFT, L_HORI_RIGHT);
  int rHoriAngle = map(x, 100, 540, R_HORI_LEFT, R_HORI_RIGHT);
  // Map Y-coordinate (vertical) to up and down servo angles
  int lVertAngle = map(y, 0, 480, L_VERT_UP, L_VERT_DOWN);
  int rVertAngle = map(y, 0, 480, R_VERT_UP, R_VERT_DOWN);

  // Constrain the calculated angles to the physical limits defined by the constants
  lHoriAngle = constrain(lHoriAngle, L_HORI_LEFT, L_HORI_RIGHT);
  rHoriAngle = constrain(rHoriAngle, R_HORI_LEFT, R_HORI_RIGHT);
  lVertAngle = constrain(lVertAngle, L_VERT_DOWN, L_VERT_UP);
  rVertAngle = constrain(rVertAngle, R_VERT_DOWN, R_VERT_UP);

  // Send the new positions to the servos
  lHoriServo.write(lHoriAngle);
  rHoriServo.write(rHoriAngle);
  lVertServo.write(lVertAngle);
  rVertServo.write(rVertAngle);
}

Step 8: Reflection

The most significant revelation during this project was the maturity and completeness of open-source documentation and resources available for computer vision. Contrary to my initial assumptions, where I expected computer vision integration to be the major bottleneck, the mechanical construction of the eye mechanism emerged as the primary engineering challenge. The assembly demanded precise manipulation of steel wire, and the absence of appropriate tools required... creative solutions.

Two core concepts can be pursued for future development:

  1. Gaze Tracking and Convergence: The current system lacks depth perception, as the eyes remain fixed even when a target approaches closely. A valuable feature would be to enable the eyes to converge (cross inward) as a detected target moves closer to the camera, creating the effect of realistic focusing (see the sketch after this list).
  2. System Consolidation: The current architecture is distributed across several components (an Arduino-based board, a webcam, and a host laptop). Scaling down the entire system by combining the processing and control onto a single embedded platform, such as a Raspberry Pi or a Jetson Nano, would allow the entire project to be packaged into a highly integrated and compact unit.
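
As a rough illustration of the first idea, the apparent size of the detected face could serve as a crude proximity cue: the larger the bounding box, the closer the target, and the further inward each eye should rotate. Below is a minimal Python sketch under that assumption; convergence_offset(), the 60% width threshold, and the 8-degree maximum are all hypothetical values, not part of the current build.

def convergence_offset(face_width_px, frame_width_px=640, max_offset_deg=8):
    # Approximate proximity from how much of the frame width the face occupies.
    # A face filling ~60% of the frame width is treated as "very close".
    closeness = min(face_width_px / (0.6 * frame_width_px), 1.0)
    return max_offset_deg * closeness  # degrees to rotate each eye inward

# Example: a 250-pixel-wide face in a 640-pixel frame gives ~5 degrees.
# The left eye would rotate right by this offset and the right eye left by it,
# added on top of the normal tracking angles, to simulate focusing on a near target.
offset = convergence_offset(250)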