Introduction: Eye Tracking Animatronic Face

This is the final project for my UIC ME411 Mechatronics course: an interactive animatronic face! This build achieves realistic motion with full-range eye tracking, automatic blinking, and a moving mouth. It's designed to be responsive, beginning its simulated speech only after recognizing a thumbs-up hand gesture.

Supplies

General Electronics:

  1. Webcam for computer vision - Can use any webcam or integrated camera
  2. Laptop/Computer to use Arduino IDE + Python
  3. Power Supply (I managed without one, but this caused the microcontroller to brown out. I worked around it in software by resetting the serial port connection whenever data transfer stopped, but this is not the smart way to do it)


Eye Mechanism/Mouth:

  1. Servo Motors x7 - a less expensive variety works fine
  2. Breadboard
  3. M-M Jumper Wires
  4. ELEGOO UNO R3 Board + Cable
  5. Steel Wire
  6. Various building materials: Hot Glue Gun, Popsicle Sticks, Balsa Wood, Cardboard
  7. Access to 3D printer (see Step 1)


Costume Items:

  1. Wig
  2. Glasses
  3. 3D Printed Nose & Ears

Project code provided at the end

Step 1: Building the Eyes Mechanism

Follow the fantastic tutorial by Instructables creator MorganManly.

Step 2: Computer Vision

This system integrates OpenCV (Python) for computer vision processing with microcontroller control over serial communication. It implements spatial mapping logic to translate the target's pixel coordinates captured by the camera (0-640 in X and 0-480 in Y for a 640x480 frame) into servo commands constrained to the physical limits of the eye's motion (left, right, up, and down).
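
To see the mapping in isolation, here is a minimal Python sketch of the idea, using a hypothetical pixel_to_angle() helper, example servo limits, and a 640x480 frame. In the final build this conversion actually happens on the Arduino with map() and constrain() (see Step 7), and the Python side simply streams the raw "x,y" coordinates over serial.

def pixel_to_angle(value, in_min, in_max, out_min, out_max):
    # Linearly interpolate a pixel coordinate into a servo angle,
    # then clamp it so the servo never exceeds its physical limits.
    span_in = in_max - in_min
    span_out = out_max - out_min
    angle = out_min + (value - in_min) * span_out / span_in
    low, high = min(out_min, out_max), max(out_min, out_max)
    return max(min(angle, high), low)

# Example: face centered at x=320 in a 640-pixel-wide frame,
# with example horizontal servo limits of 60-110 degrees.
print(pixel_to_angle(320, 100, 540, 60, 110))  # ~85 degrees (roughly centered)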

Step 3: Building the Frame

A robust structural frame was fabricated using accessible materials and secured with hot glue. This framework was designed to precisely mount and stabilize the critical components, including the eye mechanism, camera module, and decorative costume elements. The frame was then painted white to look fine as wine.

Step 4: Include Facial Identifiers

The final hardware modifications were implemented to achieve a complete facial structure, adding a dynamic mouth mechanism, a nose, and ears. These components used 3D-printable models graciously provided by the open-source community:

  1. Nose Model: 3DTux (Printables)
  2. Ear Model: Peter Farell (Printables)



Step 5: Integrate Gesture Controls

The final development phase focused on integrating gesture control into the computer vision system. This was achieved by leveraging Google's open-source MediaPipe libraries to analyze hand position and orientation in real time. The gesture detection was then synced to mouth motion in order to play audio and simulate speaking.

Step 6: Python Code

import cv2
import serial
import time
import mediapipe as mp
import pygame
import os

# --- CONFIGURATION ---
ARDUINO_PORT = 'COM3' # Ensure this matches your port
BAUD_RATE = 9600
BUFFER_SECONDS = 1.0
AUDIO_FILE = 'success.mp3'
AUDIO_COOLDOWN = 12.0

print(f"Attempting to connect to {ARDUINO_PORT}...")


# --- SETUP SERIAL FUNCTION ---
def connect_arduino():
    try:
        ser = serial.Serial(ARDUINO_PORT, BAUD_RATE, timeout=1)
        time.sleep(2)  # Allow Arduino to reset
        print(f"SUCCESS: Connected to Arduino on {ARDUINO_PORT}")
        return ser
    except Exception as e:
        print(f"ERROR: Could not connect to Arduino. {e}")
        return None


arduino = connect_arduino()


# --- HELPER: ROBUST SERIAL WRITE ---
def send_command(command_bytes):
    global arduino
    if arduino is None:
        # Try to reconnect occasionally if lost
        arduino = connect_arduino()
        if arduino is None: return

    try:
        arduino.write(command_bytes)
    except (serial.SerialException, OSError, PermissionError) as e:
        print(f"CONNECTION LOST: {e}")
        print("Attempting to reconnect...")
        try:
            arduino.close()
        except:
            pass
        arduino = None
        # Immediate retry
        arduino = connect_arduino()


# --- SETUP AUDIO ---
try:
    pygame.mixer.init()
    if os.path.exists(AUDIO_FILE):
        sound_effect = pygame.mixer.Sound(AUDIO_FILE)
        print("Audio system initialized.")
    else:
        print(f"WARNING: Audio file '{AUDIO_FILE}' not found.")
        sound_effect = None
except Exception as e:
    print(f"Audio Error: {e}")
    sound_effect = None

# --- SETUP COMPUTER VISION ---
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
hands = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=1,
    min_detection_confidence=0.7,
    min_tracking_confidence=0.5
)

cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

# --- LOGIC VARIABLES ---
last_seen_time = time.time()
last_audio_time = 0


def is_thumbs_up(hand_landmarks):
    thumb_tip = hand_landmarks.landmark[4]
    thumb_ip = hand_landmarks.landmark[3]
    index_tip = hand_landmarks.landmark[8]
    index_pip = hand_landmarks.landmark[6]
    middle_tip = hand_landmarks.landmark[12]
    middle_pip = hand_landmarks.landmark[10]
    ring_tip = hand_landmarks.landmark[16]
    ring_pip = hand_landmarks.landmark[14]
    pinky_tip = hand_landmarks.landmark[20]
    pinky_pip = hand_landmarks.landmark[18]

    thumb_is_up = thumb_tip.y < thumb_ip.y
    index_folded = index_tip.y > index_pip.y
    middle_folded = middle_tip.y > middle_pip.y
    ring_folded = ring_tip.y > ring_pip.y
    pinky_folded = pinky_tip.y > pinky_pip.y

    return thumb_is_up and index_folded and middle_folded and ring_folded and pinky_folded


print("Tracking started. Press 'q' to quit.")

while True:
    ret, frame = cap.read()
    if not ret: break

    frame = cv2.flip(frame, 1)
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # --- 1. FACE TRACKING ---
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=8, minSize=(60, 60))

    if len(faces) > 0:
        last_seen_time = time.time()
        target_face = sorted(faces, key=lambda f: f[2] * f[3], reverse=True)[0]
        (x, y, w, h) = target_face
        center_x = x + (w // 2)
        center_y = y + (h // 2)

        data = f"{center_x},{center_y}\n"
        send_command(data.encode('utf-8'))

        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, "LOCKED", (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    else:
        time_since_loss = time.time() - last_seen_time
        if time_since_loss > BUFFER_SECONDS:
            send_command(b"CLOSE\n")
            cv2.putText(frame, "SLEEPING", (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
        else:
            remaining = round(BUFFER_SECONDS - time_since_loss, 1)
            cv2.putText(frame, f"SEARCHING... ({remaining}s)", (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 165, 255), 2)

    # --- 2. HAND TRACKING ---
    result = hands.process(rgb_frame)
    if result.multi_hand_landmarks:
        for hand_landmarks in result.multi_hand_landmarks:
            mp_drawing.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
            if is_thumbs_up(hand_landmarks):
                cv2.putText(frame, "THUMBS UP!", (50, 100), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

                current_time = time.time()
                if sound_effect and (current_time - last_audio_time > AUDIO_COOLDOWN):
                    try:
                        sound_effect.play()
                        print("Audio Playing...")
                        # Send SPEAK command safely
                        send_command(b"SPEAK\n")
                        print("Sent SPEAK signal")
                        last_audio_time = current_time
                    except Exception as e:
                        print(f"Play Error: {e}")

    cv2.imshow('Animatronic Eye Tracking', frame)
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
if arduino:
    arduino.close()
cv2.destroyAllWindows()

Step 7: Arduino Code

#include <Servo.h>

// --- PIN ASSIGNMENTS ---
// Define which Arduino digital pin controls each servo motor.
const int L_BLINK_PIN = 3;
const int L_VERT_PIN = 11;
const int L_HORI_PIN = 10;
const int R_BLINK_PIN = 5;
const int R_VERT_PIN = 9;
const int R_HORI_PIN = 6;
const int MOUTH_PIN = 12;

// --- LEFT EYE SERVO POSITIONS (Angles in degrees) ---
const int L_BLINK_CLOSED = 50;
const int L_BLINK_OPEN = 140;
const int L_VERT_DOWN = 60;
const int L_VERT_CENTER = 90;
const int L_VERT_UP = 120;
const int L_HORI_LEFT = 60;
const int L_HORI_CENTER = 100;
const int L_HORI_RIGHT = 110;

// --- RIGHT EYE SERVO POSITIONS (Angles in degrees) ---
const int R_BLINK_CLOSED = 145;
const int R_BLINK_OPEN = 70;
const int R_VERT_DOWN = 40;
const int R_VERT_CENTER = 80;
const int R_VERT_UP = 100;
const int R_HORI_LEFT = 40;
const int R_HORI_CENTER = 85;
const int R_HORI_RIGHT = 110;

// --- MOUTH SERVO POSITIONS (Angles in degrees) ---
const int MOUTH_CLOSED = 89;
const int MOUTH_OPEN = 100;

// --- SERVO OBJECTS ---
// Create a Servo object for each physical servo motor.
Servo lBlinkServo;
Servo lVertServo;
Servo lHoriServo;
Servo rBlinkServo;
Servo rVertServo;
Servo rHoriServo;
Servo mouthServo;

// --- BLINKING VARIABLES ---
// Variables for natural, non-blocking blinking timing.
unsigned long lastBlinkTime = 0;
int blinkInterval = 3000;

// --- MOUTH ANIMATION VARIABLES ---
// Variables to control the mouth movement timing based on serial command.
unsigned long speakStartTime = 0; // Tracks when mouth movement should begin
unsigned long speakEndTime = 0; // Tracks when mouth movement should end
unsigned long lastMouthToggleTime = 0;
bool isMouthOpen = false;

void setup() {
  Serial.begin(9600);
  // Attach all eye servos to their assigned pins
  lBlinkServo.attach(L_BLINK_PIN);
  lVertServo.attach(L_VERT_PIN);
  lHoriServo.attach(L_HORI_PIN);
  rBlinkServo.attach(R_BLINK_PIN);
  rVertServo.attach(R_VERT_PIN);
  rHoriServo.attach(R_HORI_PIN);
  // Attach the mouth servo
  mouthServo.attach(MOUTH_PIN);

  // Initialize the system to the default, "awake" state
  openEyes();
  lookCenter();
  mouthServo.write(MOUTH_CLOSED);
}

void loop() {
  // Check for incoming serial data from the computer vision script
  if (Serial.available() > 0) {
    String data = Serial.readStringUntil('\n');
    data.trim();

    // --- COMMAND: SLEEP ---
    if (data == "CLOSE") {
      closeEyes();
    }
    // --- COMMAND: SPEAK (Trigger mouth movement) ---
    else if (data == "SPEAK") {
      triggerSpeaking();
    }
    // --- COMMAND: TRACK (Receiving X,Y coordinates) ---
    else {
      // Force eyes open if the vision system sends a tracking command
      openEyes();
      // Parse the incoming string for X and Y values (e.g., "320,240")
      int commaIndex = data.indexOf(',');
      if (commaIndex > 0) {
        int x = data.substring(0, commaIndex).toInt();
        int y = data.substring(commaIndex + 1).toInt();
        moveEyes(x, y);
      }
    }
  }

  // --- MOUTH ANIMATION (Runs continuously without blocking) ---
  updateMouth();

  // --- NATURAL BLINKING (Runs continuously without blocking) ---
  if (lBlinkServo.read() == L_BLINK_OPEN) {
    if (millis() - lastBlinkTime > blinkInterval) {
      performQuickBlink();
      lastBlinkTime = millis();
      // Randomize the next blink interval for a natural look
      blinkInterval = random(2000, 5000);
    }
  }
}

// ------------------------------------------------------------------
// --- CORE ACTION FUNCTIONS ---
// ------------------------------------------------------------------

void closeEyes() {
  lBlinkServo.write(L_BLINK_CLOSED);
  rBlinkServo.write(R_BLINK_CLOSED);
}

void openEyes() {
  lBlinkServo.write(L_BLINK_OPEN);
  rBlinkServo.write(R_BLINK_OPEN);
}

void performQuickBlink() {
  lBlinkServo.write(L_BLINK_CLOSED);
  rBlinkServo.write(R_BLINK_CLOSED);
  delay(150); // Quick shut duration
  lBlinkServo.write(L_BLINK_OPEN);
  rBlinkServo.write(R_BLINK_OPEN);
}

// Initiates the timing for the mouth movement sequence
void triggerSpeaking() {
  unsigned long currentMillis = millis();
  // Start the actual mouth toggling after a short pause (2.5 seconds)
  speakStartTime = currentMillis + 2500;
  // Set the total duration of the mouth movement (10 seconds after start)
  speakEndTime = speakStartTime + 10000;
}

// Toggles the mouth open/closed state during the speaking interval
void updateMouth() {
  unsigned long currentMillis = millis();

  // Check if we are within the designated speaking window
  if (currentMillis >= speakStartTime && currentMillis < speakEndTime) {
    // Toggle the mouth position every 200ms for animation
    if (currentMillis - lastMouthToggleTime > 200) {
      lastMouthToggleTime = currentMillis;
      isMouthOpen = !isMouthOpen;
      if (isMouthOpen) {
        mouthServo.write(MOUTH_OPEN);
      } else {
        mouthServo.write(MOUTH_CLOSED);
      }
    }
  } else {
    // If outside the speaking window (waiting or finished), ensure it's closed
    mouthServo.write(MOUTH_CLOSED);
    isMouthOpen = false;
  }
}

// Resets both eyes to the default center position
void lookCenter() {
  lVertServo.write(L_VERT_CENTER);
  lHoriServo.write(L_HORI_CENTER);
  rVertServo.write(R_VERT_CENTER);
  rHoriServo.write(R_HORI_CENTER);
}

// Maps X,Y pixel coordinates (from camera) to servo angles (physical limits)
void moveEyes(int x, int y) {
  // Map X-coordinate (horizontal) to left and right eye servo angles
  int lHoriAngle = map(x, 100, 540, L_HORI_LEFT, L_HORI_RIGHT);
  int rHoriAngle = map(x, 100, 540, R_HORI_LEFT, R_HORI_RIGHT);
  // Map Y-coordinate (vertical) to up and down servo angles
  int lVertAngle = map(y, 0, 480, L_VERT_UP, L_VERT_DOWN);
  int rVertAngle = map(y, 0, 480, R_VERT_UP, R_VERT_DOWN);

  // Constrain the calculated angles to the physical limits defined by the constants
  lHoriAngle = constrain(lHoriAngle, L_HORI_LEFT, L_HORI_RIGHT);
  rHoriAngle = constrain(rHoriAngle, R_HORI_LEFT, R_HORI_RIGHT);
  lVertAngle = constrain(lVertAngle, L_VERT_DOWN, L_VERT_UP);
  rVertAngle = constrain(rVertAngle, R_VERT_DOWN, R_VERT_UP);

  // Send the new positions to the servos
  lHoriServo.write(lHoriAngle);
  rHoriServo.write(rHoriAngle);
  lVertServo.write(lVertAngle);
  rVertServo.write(rVertAngle);
}

Step 8: Reflection

The most significant revelation during this project was the maturity and completeness of open-source documentation and resources available for computer vision. Contrary to my initial assumptions, where I expected computer vision integration to be the major bottleneck, the mechanical construction of the eye mechanism emerged as the primary engineering challenge. The assembly demanded precise manipulation of steel wire, and the absence of appropriate tools required... creative solutions.

Two core concepts can be pursued for future development:

  1. Gaze Tracking and Convergence: The current system lacks depth perception, as the eyes remain fixed even when a target approaches closely. A valuable feature would be to enable the eyes to converge (cross inward) as a detected target moves closer to the camera, creating the effect of realistic focusing (see the sketch after this list).
  2. System Consolidation: The current architecture is distributed across several components (an Arduino-based board, a webcam, and a host laptop). Scaling down the entire system by combining the processing and control onto a single embedded platform, such as a Raspberry Pi or a Jetson Nano, would allow the entire project to be packaged into a highly integrated and compact unit.
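
As a rough illustration of the first idea, the apparent size of the detected face could serve as a crude proximity cue: the larger the bounding box, the closer the target, and the further inward each eye should rotate. Below is a minimal Python sketch under that assumption; convergence_offset(), the 60% width threshold, and the 8-degree maximum are all hypothetical values, not part of the current build.

def convergence_offset(face_width_px, frame_width_px=640, max_offset_deg=8):
    # Approximate proximity from how much of the frame width the face occupies.
    # A face filling ~60% of the frame width is treated as "very close".
    closeness = min(face_width_px / (0.6 * frame_width_px), 1.0)
    return max_offset_deg * closeness  # degrees to rotate each eye inward

# Example: a 250-pixel-wide face in a 640-pixel frame gives ~5 degrees.
# The left eye would rotate right by this offset and the right eye left by it,
# added on top of the normal tracking angles, to simulate focusing on a near target.
offset = convergence_offset(250)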