Introduction: DuvelBot - ESP32-CAM Beer Serving Robot

Following a hard day's work, nothing comes close to sipping your favorite beer on the couch. In my case, that's the Belgian blond ale "Duvel". However, after all but collapsing, I am confronted with a most serious problem: the fridge containing my Duvel is an unbridgeable 20 feet removed from said couch.

While some light coercion from my side might move an occasional teenaged fridge scavenger to pour out my week's allowance of Duvel, the task of actually delivering it to its nearly exhausted progenitor is obviously one step too far.

Time to break out the soldering iron and keyboard...

DuvelBot is a no-frills driving webcam, based on the AI-Thinker ESP32-CAM, that you can control from your smartphone, browser or tablet.

It's easy to adapt or expand this platform to less alcoholic uses (think SpouseSpy, NeighbourWatch, KittyCam...).

I built this robot mainly to learn a bit about web programming and IoT, about which I knew nothing. So at the end of this Instructable there is an elaborate explanation of how that works.

Many parts of this Instructable are based on the excellent explanations found at Random Nerd Tutorials, so please go give them a visit!


What you need:

The parts list isn't carved in stone and many parts can be obtained in a ton of different versions and from many different places. I procured most from AliExpress. Like Machete said: improvise.



  • A fisheye camera with longer flex than the standard OV2640 camera provided with the ESP32-CAM module,
  • WiFi antenna with suitably long cable and Ultra Miniature Coax Connector, like this. The ESP32-CAM has an onboard antenna and the housing is plastic, so an antenna is not really needed, however I thought it looked cool, so...
  • Inkjet printable sticker paper for the top cover design.

The usual hardware tools: soldering iron, drills, screwdrivers, pliers...

Step 1: Building the Robot Platform

The schematic:

The schematic is nothing special. The ESP32-CAM controls the motors via the L298N motor driver board, which has two channels. The motors on each side are wired in parallel, and each side occupies one channel. Four small 10...100nF ceramic capacitors close to the motor pins are, as always, advisable to counter RF interference. Also, a large electrolytic cap (2200...4700uF) on the supply of the motor board as shown in the schematic, while not strictly needed, can limit the supply voltage ripple a bit (if you want to see a horror movie, then probe Vbat with an oscilloscope while the motors are active).

Note that the ENABLE pins of both motor channels are driven by the same pulse-width modulated (PWM) pin of the ESP32 (IO12). This is because the ESP32-CAM module doesn't have a ton of GPIOs (module's schematic included for reference). The robot's LEDs are driven by IO4, which also drives the onboard flash LED, so remove Q1 to prevent the flash LED from lighting up inside a closed housing.

The programming button, on/off switch, charging connector and programming connector are accessible underneath the robot. I could have done a much better job for the programming connector (3.5mm jack?), but the beer couldn't wait anymore. Over-the-air (OTA) updates would also be nice to set up.

To put the robot in programming mode, press the programming button (this pulls IO0 low) and then switch it on.

Important: to charge the NiMH batteries of the robot, use a lab supply set (unloaded) to about 14V and current limited to 250mA. The voltage will adapt to the voltage of the batteries. Disconnect if the robot feels hot or the battery voltage reaches about 12.5V. An obvious improvement here would be to integrate a proper battery charger, but that's outside the scope of this Instructable.

The hardware:

Please also see the notes in the pictures. The housing is mounted on the robot base using 4 M4 bolts and self-locking nuts. Note the rubber tubing used as distance spacers. Hopefully, this also gives some suspension to the Duvel, should the ride prove bumpy. The ESP32-CAM module and L298N motor board are mounted in the housing using plastic sticky feet (not sure of the right name in English), to prevent having to drill extra holes. Also the ESP32 is mounted on its own perfboard and pluggable pinheaders. This makes it easy to swap out the ESP32.

Don't forget: if you are going with an external WiFi antenna instead of the built-in one, then also solder the antenna-selection jumper on the underside of the ESP32-CAM board.

Print out the top logo in the file DuvelBot.svg on inkjet sticker paper (or design your own), and you're ready to go!

Step 2: Program the Robot

It is advisable to program the robot before you close it, to make sure everything works and no magic smoke appears.

You need the following software tools:

  • The Arduino IDE,
  • The ESP32 libraries, SPIFFS (SPI flash file system), ESPAsyncWebServer library.

The latter can be installed by following this randomnerdtutorial up to and including the section "organizing your files". I really could not explain it any better.

The code:

My code can be found at:

  • An Arduino sketch DuvelBot.ino,
  • A data subfolder which holds the files that are going to be uploaded to the ESP flash using SPIFFS. This folder contains the webpage that the ESP will serve (index.html), a logo image that is part of the webpage (duvel.png) and a cascading style sheet or CSS file (style.css).

To program the robot:

  • Connect the USB-TTL converter as shown in the schematic,
  • File -> Open -> go to the folder that contains DuvelBot.ino,
  • Change your network credentials in the sketch:
const char* ssid = "yourNetworkSSIDHere";
const char* password = "yourPasswordHere";
  • Tools -> Board -> "AI Thinker ESP32-CAM" and select the appropriate serial port for your pc (Tools -> Port -> something like /dev/ttyUSB0 or COM4),
  • Open the serial monitor in the Arduino IDE,
  • While pressing the PROG button (that pulls IO0 low), switch on the robot,
  • Check on the serial monitor that the ESP32 is ready for download,
  • Close the serial monitor (otherwise the SPIFFS upload fails),
  • Tools -> "ESP32 Sketch Data Upload" and wait for it to finish,
  • Switch off and on again holding the PROG button to return to programming mode,
  • Press the "Upload" arrow to program the sketch and wait for it to finish,
  • Open the serial monitor and reset the ESP32 by switching off/on,
  • Once it has booted, note down the IP address shown in the serial monitor and disconnect the robot from the USB-TTL converter,
  • Open a browser at this ip address. You should see the interface as in the picture.
  • Optional: reserve a fixed IP address for the MAC address of the ESP32 in your router (how to do this depends on the router).

That's it! Read on if you want to know how it works...

Step 3: How It Works

Now we come to the interesting part: how does it all work together?

I will try to explain it but please keep in mind Kajnjaps is not a web programming specialist. In fact, learning a bit of web programming was the whole premise of building DuvelBot. If I make obvious mistakes, please leave a comment!

OK, after the ESP32 is switched on, setup() as usual initializes the GPIOs and associates them with PWM timers for motor and LED control. See here for more on the motor control; it's pretty standard.

Then the camera is configured. I deliberately kept the resolution quite low (VGA or 640x480) to avoid sluggish response. Note the AI-Thinker ESP32-CAM board has a serial ram chip (PSRAM) that it uses to store camera frames of larger resolution:

if(psramFound())
  {
    Serial.println("PSRAM found.");
    config.frame_size = FRAMESIZE_VGA;
    config.jpeg_quality = 12;
    config.fb_count = 2; //number of framebuffers
  }
  else
  {
    Serial.println("no PSRAM found.");
    config.frame_size = FRAMESIZE_QVGA;
    config.jpeg_quality = 12;
    config.fb_count = 1;
  }

Then the SPI flash file system (SPIFFS) is initialized:

  //initialize SPIFFS
  if(!SPIFFS.begin(true))
  {
    Serial.println("An Error has occurred while mounting SPIFFS!");
    return;
  }

SPIFFS acts like a little filesystem on the ESP32. Here it is used to store three files: the webpage itself (index.html), a cascading style sheet (style.css), and a PNG logo image (duvel.png). These files will be served by the ESP32 to whoever connects to it as a client. While it is possible and easy to serve the entire webpage from within the sketch by doing a server.send(...), very similar to doing a Serial.println() on a large text string, it is easier to just serve a file instead, as this also works for images and other non-text data.

Next the ESP32 connects to your router (don't forget to set your credentials before uploading):

//change credentials of your router here
const char* ssid = "yourNetworkSSIDHere";
const char* password = "yourPasswordHere";

  //connect to WiFi
  Serial.print("Connecting to WiFi");
  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED)
  {
    Serial.print('.');
    delay(500);
  }
  //now connected to the router: ESP32 now has ip address
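One thing to be aware of: the loop above never gives up, so with wrong credentials the robot just keeps printing dots. A bounded retry is one possible alternative. The sketch below is plain C++ that runs on a PC, not code from DuvelBot.ino; `waitForConnection` and its parameters are hypothetical names chosen for illustration.

```cpp
#include <functional>

// Polls isConnected up to maxAttempts times, calling wait between polls.
// Returns true as soon as the connection is up, false if the attempts run out.
bool waitForConnection(const std::function<bool()>& isConnected,
                       const std::function<void()>& wait,
                       int maxAttempts)
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (isConnected()) return true;
        wait();   // on the robot this would be the delay between polls
    }
    return isConnected();   // one last check after the final wait
}
```

On the ESP32, `isConnected` could wrap the `WiFi.status() == WL_CONNECTED` test and `wait` could wrap `delay(500)`, both taken from the loop above. What to do when it returns false (restart, fall back to access point mode, blink an LED) is a separate design decision.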

To actually do something useful, we start an asynchronous webserver:

//create an AsyncWebServer object on port 80
AsyncWebServer server(80);

server.begin(); //start listening for connections

Now, if you type in the ip address that was assigned to the ESP32 by the router in a browser's address bar, the ESP32 gets a request. This means it should respond to the client (you, or your browser) by serving it something, for example a webpage.

The ESP32 knows how to respond, because in setup the responses to all possible allowed requests have been registered using server.on(). For example, the main webpage or index (/) is handled like this:

  server.on("/", HTTP_GET, [](AsyncWebServerRequest *request){
    Serial.println(" / request received!");
    request->send(SPIFFS, "/index.html", String(), false, processor);
  });

So if the client connects, the ESP32 responds by sending the file index.html from the SPIFFS filesystem. The parameter processor is the name of a function that preprocesses the html and replaces any special tags:

// Replaces placeholders in the html like %DATA%
// with the variables you want to show
// <p>Data: <strong>%DATA%</strong></p>
String processor(const String& var)
{
  if(var == "DATA"){
    //Serial.println("in processor!");
    return String(dutyCycleNow);
  }
  return String();
}
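To make the placeholder idea concrete, here is a minimal sketch in plain C++ (it runs on a PC, no ESP32 needed) of how such a substitution could work. It is not the ESPAsyncWebServer implementation: `renderTemplate` and `demoProcessor` are hypothetical helpers, and the value 128 is just an assumed example for dutyCycleNow. Only the %NAME% convention and the callback idea come from the code above.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: replaces every %NAME% in html with processor("NAME").
// A lone '%' without a closing partner is copied through unchanged.
std::string renderTemplate(const std::string& html,
                           const std::function<std::string(const std::string&)>& processor)
{
    std::string out;
    std::size_t pos = 0;
    while (pos < html.size()) {
        std::size_t open = html.find('%', pos);
        if (open == std::string::npos) {           // no more placeholders
            out.append(html, pos, std::string::npos);
            break;
        }
        std::size_t close = html.find('%', open + 1);
        if (close == std::string::npos) {          // unmatched '%': keep the rest verbatim
            out.append(html, pos, std::string::npos);
            break;
        }
        out.append(html, pos, open - pos);         // text before the placeholder
        out += processor(html.substr(open + 1, close - open - 1));
        pos = close + 1;
    }
    return out;
}

// Stand-in for the sketch's processor(): only "DATA" is known.
std::string demoProcessor(const std::string& var)
{
    if (var == "DATA") return "128";   // assumed example value for dutyCycleNow
    return std::string();
}
```

Calling `renderTemplate("<p>Data: <strong>%DATA%</strong></p>", demoProcessor)` substitutes the tag with the processor's return value. Note that in this simplified version any two literal percent signs in a page would be treated as one placeholder, so pages with percentages in them need care.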

Now, let's dissect the webpage index.html itself. In general there are always three parts:

  1. html code: what elements should be shown (buttons/text/sliders/images etc.),
  2. style code, either in a separate .css file or in a <style>...</style> section: what the elements should look like,
  3. javascript, in a <script>...</script> section: how the webpage should act.

Once index.html loads into the browser (which knows it is html because of the DOCTYPE line), it runs into this line:

 <link rel="stylesheet" type="text/css" href="style.css">

That is a request for a css style sheet. The location of this sheet is given in href="...". So what does your browser do? Right, it launches another request to the server, this time for style.css. The server captures this request, because it was registered:

  server.on("/style.css", HTTP_GET, [](AsyncWebServerRequest *request){
    Serial.println(" css request received");
    request->send(SPIFFS, "/style.css", "text/css");
  });

Neat huh? Incidentally, it could have been href="/some/file/on/the/other/side/of/the/moon", for all your browser cared. It would go fetch that file just as happily. I won't explain about the stylesheet since it just controls appearances so it's not really interesting here, but if you want to learn more, check this tutorial out.

How does the DuvelBot logo appear? In index.html we have:

<p><img src="duvel"  class="logo"></p>

to which the ESP32 responds with:

  server.on("/duvel", HTTP_GET, [](AsyncWebServerRequest *request){
    Serial.println("duvel logo request received!");
    request->send(SPIFFS, "/duvel.png", "image/png");
  });

..another SPIFFS file, this time a complete image, as indicated by "image/png" in the response.
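The pattern used three times so far (register a path, attach a handler) can be mimicked on a PC with a small lookup table. The following is a simplified model in plain C++, not how ESPAsyncWebServer works internally: `MiniRouter`, `on`, `handle` and `makeDemoRouter` are hypothetical names, the returned strings are only labels, and the trailing `*` wildcard rule is an assumption modelled on the "/LED/*" registration that appears further on.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Simplified model of a route table: the first matching registration wins.
class MiniRouter {
public:
    using Handler = std::function<std::string(const std::string& url)>;

    // Register a handler for an exact path, or for a prefix if the pattern ends in '*'.
    void on(const std::string& pattern, Handler handler) {
        routes_.emplace_back(pattern, std::move(handler));
    }

    // Returns the handler's response, or a 404 text if nothing matches.
    std::string handle(const std::string& url) const {
        for (const auto& route : routes_) {
            if (matches(route.first, url)) return route.second(url);
        }
        return "404 Not Found";
    }

private:
    static bool matches(const std::string& pattern, const std::string& url) {
        if (!pattern.empty() && pattern.back() == '*') {
            // compare everything before the '*' as a prefix of the url
            const std::size_t n = pattern.size() - 1;
            return url.compare(0, n, pattern, 0, n) == 0;
        }
        return pattern == url;
    }

    std::vector<std::pair<std::string, Handler>> routes_;
};

// Builds a router with the three file routes described above.
MiniRouter makeDemoRouter() {
    MiniRouter router;
    router.on("/",          [](const std::string&) { return std::string("index.html"); });
    router.on("/style.css", [](const std::string&) { return std::string("style.css (text/css)"); });
    router.on("/duvel",     [](const std::string&) { return std::string("duvel.png (image/png)"); });
    return router;
}
```

The point of the model is that the browser only ever sends a path; which file or action hides behind that path is entirely up to the registrations on the server side.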

Now we come to the really interesting part: the code for the buttons. Let's focus on the FORWARD button:

 <button class="button" onmousedown="toggleCheckbox('forward');" ontouchstart="toggleCheckbox('forward');" onmouseup="toggleCheckbox('stop');" ontouchend="toggleCheckbox('stop');">FORWARD</button>

The class="..." name is only a name to link it to the stylesheet to customize the size, color, etc. The important parts are onmousedown="toggleCheckbox('forward')" and onmouseup="toggleCheckbox('stop')". These constitute the actions of the button (ontouchstart/ontouchend do the same, but for touchscreens/phones). Here, the button action calls a function toggleCheckbox(x) in the javascript section:

function toggleCheckbox(x){
     var xhr = new XMLHttpRequest();
     xhr.open("GET", "/" + x, true);
     xhr.send();
     //could do something with the response too when ready, but we don't
   }

So pressing the forward button immediately results in toggleCheckbox('forward') getting called. This function then launches an XMLHttpRequest "GET" of the location "/forward", which acts just as if you had typed that location into your browser's address bar. Once this request arrives at the ESP32, it is handled by:

  server.on("/forward", HTTP_GET, [](AsyncWebServerRequest *request){
    Serial.println("received /forward");
    actionNow = FORWARD;
    request->send(200, "text/plain", "OK forward.");
  });

Now the ESP32 simply replies with a text "OK forward.". Note that toggleCheckbox() doesn't do anything with (or wait on) this response; however, it could, as shown later in the camera code.

During this response, the program itself only sets a variable, actionNow = FORWARD, in reaction to the button press. In the main loop of the program, this variable is monitored with the goal of ramping the PWM of the motors up or down. The logic is: as long as we have an action that is not STOP, ramp up the motors in that direction until a certain number (dutyCycleMax) is reached. Then hold that speed as long as actionNow hasn't changed:

void loop()
{
  currentMillis = millis();
  if (currentMillis - previousMillis >= dutyCycleStepDelay)
  {
    // save the last time you executed the loop
    previousMillis = currentMillis;

    //mainloop is responsible for ramping up/down the motors
    if(actionNow != previousAction)
    {
      //ramp down, then stop, then change action and ramp up
      dutyCycleNow = dutyCycleNow-dutyCycleStep;
      if (dutyCycleNow <= 0)
      { //if after ramping down dc is 0, set to the new direction,start at min dutycycle
        setDir(actionNow);
        previousAction = actionNow;
        dutyCycleNow = dutyCycleMin;
      }
    }
    else //actionNow == previousAction --> ramp up,except when direction is STOP
    {
      if (actionNow != STOP)
      {
        dutyCycleNow = dutyCycleNow+dutyCycleStep;
        if (dutyCycleNow > dutyCycleMax) dutyCycleNow = dutyCycleMax;
      }
      else dutyCycleNow = 0;
    }
    ledcWrite(pwmChannel,dutyCycleNow); //adjust the motor dutycycle
  }
}

This slowly increases the speed of the motors, instead of just launching at full speed and spilling the precious precious Duvel. An obvious improvement would be to move this code to a timer interrupt routine, but it works as is.
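If you want to play with the ramping behaviour without risking any Duvel, the same logic can be stepped through on a PC. The sketch below is plain C++ that copies the decisions of loop() into a function called once per tick. The names `RampState`, `rampTick` and `runTicks` are hypothetical, the `direction` field merely stands in for whatever setDir() does, and the three constants are assumed example values (the real sketch defines its own).

```cpp
enum Action { STOP, FORWARD, BACKWARD };

struct RampState {
    Action actionNow = STOP;        // what the web handler last requested
    Action previousAction = STOP;   // what the motors are currently doing
    Action direction = STOP;        // stands in for the effect of setDir()
    int dutyCycleNow = 0;           // current PWM duty cycle
};

// Assumed example values, not taken from DuvelBot.ino.
const int dutyCycleMin = 150;
const int dutyCycleMax = 255;
const int dutyCycleStep = 5;

// One pass of the ramping logic, i.e. what loop() does every dutyCycleStepDelay ms.
void rampTick(RampState& s)
{
    if (s.actionNow != s.previousAction) {
        // a new action was requested: ramp down first
        s.dutyCycleNow -= dutyCycleStep;
        if (s.dutyCycleNow <= 0) {
            // fully ramped down: switch direction and restart at the minimum
            s.direction = s.actionNow;
            s.previousAction = s.actionNow;
            s.dutyCycleNow = dutyCycleMin;
        }
    } else if (s.actionNow != STOP) {
        // same action as before: ramp up until the maximum is reached
        s.dutyCycleNow += dutyCycleStep;
        if (s.dutyCycleNow > dutyCycleMax) s.dutyCycleNow = dutyCycleMax;
    } else {
        s.dutyCycleNow = 0;
    }
}

// Helper for experiments: run n ticks and return the final duty cycle.
int runTicks(RampState& s, int n)
{
    for (int i = 0; i < n; ++i) rampTick(s);
    return s.dutyCycleNow;
}
```

With these example numbers, going from standstill to full speed takes 22 ticks and ramping down from full speed takes 51. Stepping through it also shows a detail of the logic: when the new action is STOP, the duty cycle is set to dutyCycleMin for a single tick before it drops to 0. Presumably this is harmless because setDir(STOP) switches the motors off, but that is an assumption about code not shown here.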

Now if we release the forward button, your browser calls toggleCheckbox('stop'), resulting in a request to GET /stop. The ESP32 sets actionNow to STOP (and responds with "OK stop."), which prompts the main loop to spin down the motors.

What about the LEDs? Same mechanism, but now we have a slider:

<input id="slide" type="range" min="0" max="100" step="1" value="0" class="slider"><strong><output id="sliderAmount"></output></strong>

In the javascript, the setting of the slider is monitored, such that on every change a call to get "/LED/xxx" happens, where xxx is the brightness value that the LEDs should be set at:

 var slide = document.getElementById('slide'),
    sliderDiv = document.getElementById("sliderAmount");
    slide.onchange = function() {
      var xhr = new XMLHttpRequest();
      xhr.open("GET", "/LED/" + this.value, true);
      xhr.send();
      sliderDiv.innerHTML = this.value;
      }

Note that we used document.getElementById('slide') to get the slider object itself, which was declared with id="slide", and that the value is output to a text element with id="sliderAmount" on every change.

The handler in the sketch catches all brightness requests by using "/LED/*" in the handler registration. Then the last part (a number) is split off and converted to an int:

  server.on("/LED/*", HTTP_GET, [](AsyncWebServerRequest *request){
    Serial.println("led request received!");
    setLedBrightness((request->url()).substring(5).toInt());
    request->send(200, "text/plain", "OK Leds.");
  });
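Why substring(5)? The prefix "/LED/" is exactly five characters long, so everything from index 5 onwards is the number. The sketch below redoes that parsing step in plain C++ so it can be tried on a PC. `parseLedBrightness` is a hypothetical name, and the clamping to the slider range 0..100 is an addition for illustration, not something the handler above does.

```cpp
#include <cstdlib>
#include <string>

// Extracts the brightness from a URL such as "/LED/42".
// Like substring(5).toInt() in the handler, non-numeric input yields 0.
// Clamping to 0..100 (the slider range) is an addition, not part of the original.
int parseLedBrightness(const std::string& url)
{
    const std::string prefix = "/LED/";
    if (url.compare(0, prefix.size(), prefix) != 0) return 0;   // not an LED request
    long value = std::strtol(url.c_str() + prefix.size(), nullptr, 10);
    if (value < 0) return 0;
    if (value > 100) return 100;
    return static_cast<int>(value);
}
```

Since anyone on the network can type "/LED/9999" into an address bar, limiting the value on the server side rather than trusting the slider seems like a reasonable precaution.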

Similar to what is described above, the radio buttons control variables that set the PWM defaults, such that DuvelBot can drive slowly to you with the beer, careful not to spill that liquid gold, and fast back to the kitchen to fetch some more.

...So how does the camera image get updated without you having to refresh the page? For that we use a technique called AJAX (Asynchronous JavaScript and XML). The problem is that normally a client-server connection follows a fixed procedure: client (browser) makes request, server (ESP32) responds, case closed. Done. Nothing happens anymore. If only somehow we could trick the browser into regularly requesting updates from the ESP32...and that's exactly what we will do with this piece of javascript:

 setInterval(function(){
      var xhttp = new XMLHttpRequest();
      xhttp.open("GET", "/CAMERA", true);
      xhttp.responseType = "blob";
      xhttp.timeout = 500;
      xhttp.ontimeout = function(){};
      xhttp.onload = function(e){
        if (this.readyState == 4 && this.status == 200)
        {
         var urlCreator = window.URL || window.webkitURL;
         var imageUrl = urlCreator.createObjectURL(this.response); //create an object URL from the blob
         var img = document.querySelector("#camimage");
         img.onload = function(){ urlCreator.revokeObjectURL(imageUrl); }; //release the blob once it is displayed
         img.src = imageUrl;
        }
      };
      xhttp.send();
    }, 250);

setInterval takes as parameter a function and executes it every so often (here once per 250ms resulting in 4 frames/second). The function that is executed makes a request for a binary "blob" at the address /CAMERA. This is handled by the ESP32-CAM in the sketch as (from Randomnerdtutorials):

  server.on("/CAMERA", HTTP_GET, [](AsyncWebServerRequest *request){
    Serial.println("camera request received!");
    camera_fb_t * fb = NULL;
    //esp_err_t res = ESP_OK;
    size_t _jpg_buf_len = 0;
    uint8_t * _jpg_buf = NULL;
    //capture a frame
    fb = esp_camera_fb_get();
    if (!fb) {Serial.println("Frame buffer could not be acquired");return;}

    if(fb->format != PIXFORMAT_JPEG){ //normally already in this format from config
      bool jpeg_converted = frame2jpg(fb, 80, &_jpg_buf, &_jpg_buf_len);
      esp_camera_fb_return(fb);
      fb = NULL;
      if(!jpeg_converted){Serial.println("JPEG compression failed");return;}
    }
    else{
      _jpg_buf_len = fb->len;
      _jpg_buf = fb->buf;
    }
    //Serial.println(_jpg_buf_len);
    //send the formatted image
    request->send_P(200,"image/jpg", _jpg_buf, _jpg_buf_len);

    //cleanup
    if(fb){
      esp_camera_fb_return(fb);
      fb = NULL;
      _jpg_buf = NULL;
    } else if(_jpg_buf){
      free(_jpg_buf);
      _jpg_buf = NULL;
    }
  });

The important parts are getting the frame with fb = esp_camera_fb_get(), converting it to a jpg (for the AI-Thinker it is already in that format), and sending the jpeg out: request->send_P(200,"image/jpg", _jpg_buf, _jpg_buf_len).
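When debugging this route, a quick sanity check is whether the buffer really holds a JPEG: JPEG data always starts with the two bytes 0xFF 0xD8 (the start-of-image marker). A minimal check in plain C++ could look like the sketch below. `looksLikeJpeg` is a hypothetical helper and not part of the camera driver or of DuvelBot.ino.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true if the buffer starts with the JPEG start-of-image marker (0xFF 0xD8).
// This only inspects the first two bytes; it does not validate the whole image.
bool looksLikeJpeg(const std::uint8_t* buf, std::size_t len)
{
    return buf != nullptr && len >= 2 && buf[0] == 0xFF && buf[1] == 0xD8;
}
```

In the handler it could be called with _jpg_buf and _jpg_buf_len just before sending, printing a warning on the serial monitor if the check fails.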

The javascript function then waits for this image to arrive. Then it just takes a bit of work to convert the received "blob" into a URL that can be used as a source to update the image with id="camimage" in the html page.

Phew, we're done!

Step 4: Ideas & Leftovers

The goal of this project for me was to learn just enough web programming to interface hardware to the web. Several extensions to this project are possible. Here are a few ideas:

  • Implement 'real' camera streaming as explained here and here and move it to a 2nd server as explained here on the same ESP32, but on the other CPU core, then import the camera stream into the html served by the 1st server using an <iframe>...</iframe>. This should result in faster camera updates.
  • Use access point (AP) mode so the robot is more standalone as explained here.
  • Expand with battery voltage measurement, deep-sleep capabilities etc. This is a bit difficult at the moment because the AI-Thinker ESP32-CAM doesn't have many GPIOs; it would need expansion via UART, for example with a secondary Arduino.
  • Convert to a cat-seeking robot that ejects cat treats from time to time on paw press of a big button, stream tons of nice cat pics during the day...

Please comment if you liked or have questions and thanks for reading!

Participated in the Arduino Contest 2020