Introduction: Play Video With ESP32
This Instructables shows how to play video and audio with an ESP32.
Step 1: ESP32 Features & Limitations
Features
- 4 SPI buses, 2 of which are available for user code: SPI2 and SPI3, also called HSPI and VSPI. Both can run at up to 80 MHz. In theory that is enough to push 320x240 16-bit color pixels to an SPI LCD at 60 fps, but this figure does not count the time needed to read and decode the video data.
- 1-bit / 4-bit SD bus that can connect to an SD card using the native SD protocol
- I2S audio output through the internal DAC
- Over 100 KB of RAM available for video and audio buffers
- Enough processing power to decode JPEG (to play Motion JPEG) and LZW data compression (to play Animated GIF)
- The dual-core version can split reading data from the SD card, decoding, and pushing to the SPI LCD into parallel tasks, which boosts playback performance
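The 60 fps figure in the first bullet can be checked with a quick calculation. The sketch below uses Python only as a calculator; the function name is made up for illustration, and it assumes an ideal bus that moves 1 bit per clock with no command or protocol overhead.

```python
def spi_bits_per_second(width, height, bits_per_pixel, fps):
    """Raw pixel bit rate needed to refresh a full frame at the given fps."""
    return width * height * bits_per_pixel * fps

SPI_CLOCK_HZ = 80_000_000  # maximum SPI clock quoted above, 1 bit per clock assumed

needed = spi_bits_per_second(320, 240, 16, 60)
print(needed)                  # 73728000
print(needed <= SPI_CLOCK_HZ)  # True
```

So the raw pixel stream just fits under 80 MHz, which leaves very little headroom for anything else.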
Limitations
- Not enough internal RAM for a double frame buffer at 320x240 in 16-bit color, which limits the multi-task design. External PSRAM can ease this a bit, though it is slower than internal RAM
- Not enough processing power to decode MP4 video
- Not all ESP32 versions have 2 cores; the multi-task sample only benefits the dual-core versions
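To see why the double frame buffer does not fit, here is a rough size calculation. It is only a sketch with a made-up helper name; the 220x176 size is the display resolution used later in this project.

```python
def frame_buffer_bytes(width, height, bits_per_pixel=16):
    """Bytes needed to hold one full frame in RAM."""
    return width * height * bits_per_pixel // 8

single = frame_buffer_bytes(320, 240)
print(single)                         # 153600 bytes for one 320x240 frame
print(2 * single)                     # 307200 bytes for a double buffer
print(frame_buffer_bytes(220, 176))   # 77440 bytes for one 220x176 frame
```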
Step 2: Video Format
RGB565
Also called 16-bit color, this is a raw data format commonly used for communication between an MCU and a color display. Each pixel is represented by a 16-bit value: the first 5 bits are the red value, the next 6 bits are green, and the last 5 bits are blue. A 16-bit value can express 65536 colors, so it is also called 64K colors. A 1-minute 320x240 video at 30 fps therefore takes 16 * 320 * 240 * 30 * 60 = 2211840000 bits = 276480000 bytes, or over 260 MB.
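The bit layout and the size estimate can be sketched in a few lines of Python. The function names are my own for illustration; the packing simply drops the low bits of each 8-bit channel.

```python
def rgb888_to_rgb565(r, g, b):
    """Pack 8-bit R, G, B into one 16-bit value laid out as RRRRRGGG GGGBBBBB."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def raw_video_bytes(width, height, fps, seconds, bits_per_pixel=16):
    """Size of uncompressed video in bytes."""
    return width * height * bits_per_pixel * fps * seconds // 8

print(hex(rgb888_to_rgb565(255, 0, 0)))   # 0xf800 (pure red)
print(hex(rgb888_to_rgb565(0, 255, 0)))   # 0x7e0  (pure green)
print(hex(rgb888_to_rgb565(0, 0, 255)))   # 0x1f   (pure blue)
print(raw_video_bytes(320, 240, 30, 60))  # 276480000
```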
Animated GIF
This file format has been common on the web since the 1990s. It limits each frame to at most 256 colors and does not store again the pixels that have the same color as in the previous frame. This can reduce the file size a lot, especially when little changes between animation frames. LZW compression was designed to be decodable by 1990s computers, so the ESP32 also has enough processing power to decode it in real time.
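The "do not store unchanged pixels" idea can be illustrated with a toy example. This is only a sketch of the concept, not how any particular GIF encoder works: frames are plain lists of palette indexes, and index 255 is a hypothetical value reserved to mean "same as previous frame".

```python
TRANSPARENT = 255  # hypothetical palette index meaning "keep the previous pixel"

def delta_frame(prev, curr, transparent=TRANSPARENT):
    """Replace every pixel that did not change with the transparent index."""
    return [transparent if p == c else c for p, c in zip(prev, curr)]

prev = [1, 1, 2, 2, 3, 3]
curr = [1, 1, 2, 7, 3, 3]
print(delta_frame(prev, curr))  # [255, 255, 255, 7, 255, 255]
```

Long runs of the same index like this are exactly what LZW compresses well.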
Motion JPEG
Also called M-JPEG or MJPEG, this is a common video compression format for video capture hardware with limited processing power. It is simply a concatenation of still JPEG frames. Compared with MPEG or MP4, Motion JPEG does not need the computationally intensive technique of interframe prediction; every frame is independent. So it requires fewer resources to encode and decode.
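Because the file is just JPEG frames back to back, a player can find each frame by looking for the JPEG start and end markers. Below is a naive Python sketch of that idea. It assumes the frames carry no embedded thumbnails (which would add nested markers), and it is not the code used by the sample sketches in this project.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker

def split_mjpeg(data):
    """Return the list of JPEG frames found in a raw MJPEG byte string."""
    frames = []
    pos = 0
    while True:
        start = data.find(SOI, pos)
        if start < 0:
            break
        end = data.find(EOI, start + 2)
        if end < 0:
            break  # last frame is truncated, ignore it
        frames.append(data[start:end + 2])
        pos = end + 2
    return frames

# Two fake "frames" with placeholder payloads, not real JPEG data
stream = SOI + b"frame-one" + EOI + SOI + b"frame-two" + EOI
print(len(split_mjpeg(stream)))  # 2
```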
Ref.:
https://en.wikipedia.org/wiki/List_of_monochrome_a...
Step 3: Audio Format
PCM
A raw data format for digital audio. The ESP32 I2S DAC output takes 16-bit samples, so each 16-bit value represents one sample of the analog signal. Most video and music audio uses a sample rate of 44100 Hz, which means 44100 samples per second. So 1 minute of mono PCM raw data takes 16 * 44100 * 60 = 42336000 bits = 5292000 bytes, or over 5 MB. Stereo audio doubles that to over 10 MB.
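The same size estimate as a small Python helper (the function name is made up for illustration):

```python
def pcm_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of raw PCM audio in bytes."""
    return sample_rate_hz * bits_per_sample * channels * seconds // 8

print(pcm_bytes(44100, 16, 1, 60))  # 5292000  (1 minute, mono)
print(pcm_bytes(44100, 16, 2, 60))  # 10584000 (1 minute, stereo)
```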
MP3
MPEG-1 Audio Layer III (MP3) is a compressed audio format widely used for music since the 1990s. It can reduce the file size dramatically, to under one-tenth of the raw PCM format.
Step 4: Format Conversion
This project uses FFmpeg to convert the video into formats the ESP32 can read.
Please download and install FFmpeg from the official site if you have not done so yet: https://ffmpeg.org
Convert to PCM audio
ffmpeg -i input.mp4 -f u16le -acodec pcm_u16le -ar 44100 -ac 1 44100_u16le.pcm
Convert to MP3 audio
ffmpeg -i input.mp4 -ar 44100 -ac 1 -q:a 9 44100.mp3
Convert to RGB565
ffmpeg -i input.mp4 -vf "fps=9,scale=-1:176:flags=lanczos,crop=220:in_h:(in_w-220)/2:0" -c:v rawvideo -pix_fmt rgb565be 220_9fps.rgb
Convert to Animated GIF
ffmpeg -i input.mp4 -vf "fps=15,scale=-1:176:flags=lanczos,crop=220:in_h:(in_w-220)/2:0,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" -loop -1 220_15fps.gif
Convert to Motion JPEG
ffmpeg -i input.mp4 -vf "fps=30,scale=-1:176:flags=lanczos,crop=220:in_h:(in_w-220)/2:0" -q:v 9 220_30fps.mjpeg
Note:
- Animated GIFs converted by FFmpeg can be optimized further with web tools; search for "GIF optimizer" to find one.
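The video filters above first scale the input to a height of 176 while keeping the aspect ratio (scale=-1:176), then crop a 220-pixel-wide window from the center. The Python sketch below mimics that geometry so you can check whether your input is wide enough. The function name is made up, and FFmpeg's own rounding may differ by a pixel.

```python
def scale_then_crop(in_w, in_h, out_w=220, out_h=176):
    """Approximate scale=-1:176 followed by crop=220:in_h:(in_w-220)/2:0."""
    scaled_w = round(in_w * out_h / in_h)  # width after aspect-preserving scale
    if scaled_w < out_w:
        raise ValueError("scaled frame is narrower than the crop width")
    crop_x = (scaled_w - out_w) // 2       # left edge of the centered crop
    return scaled_w, crop_x

print(scale_then_crop(1280, 720))  # (313, 46)
print(scale_then_crop(640, 480))   # (235, 7)
```

A portrait video would raise the error here, because after scaling to 176 pixels high it is narrower than 220 pixels.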
Step 5: Hardware Preparation
ESP32 Dev Board
Any dual-core ESP32 dev board should be OK; this time I am using a TTGO ESP32-Micro.
Color Display
Any color display that Arduino_GFX supports should be OK; this time I am using an ILI9225 breakout board with an SD card slot.
You can find the list of color displays Arduino_GFX supports on GitHub:
https://github.com/moononournation/Arduino_GFX
SD Card
Any SD card should be OK; this time I am using a SanDisk "normal speed" 8 GB micro SD card with an SD adaptor.
Audio
If you only want to use headphones, simply connect the headphone pins to pin 26 and GND to listen to the audio. Or you can use a tiny amplifier to play the audio through a speaker.
Others
Some breadboards and breadboard wires
Step 6: SD Interface
The ILI9225 LCD breakout board also includes breakout pins for an SD card slot. It can be used on an SPI bus or a 1-bit SD bus. As mentioned in my previous Instructables, I prefer the 1-bit SD bus, so this project is based on it.
Step 7: Put It Together
The above pictures show the testing platform I am using in this project. The white breadboard is 3D printed; you can download and print it from Thingiverse: https://www.thingiverse.com/thing:4552162
The actual connection depends on which hardware you have in hand.
Here is the connection summary:
ESP32 Vcc -> LCD Vcc
ESP32 GND -> LCD GND
ESP32 GPIO 2 -> SD D0/MISO -> 1k resistor -> Vcc
ESP32 GPIO 14 -> SD CLK
ESP32 GPIO 15 -> SD CMD/MOSI
ESP32 GPIO 18 -> LCD SCK
ESP32 GPIO 19 -> LCD MISO
ESP32 GPIO 22 -> LCD LED
ESP32 GPIO 23 -> LCD MOSI
ESP32 GPIO 27 -> LCD DC/RS
ESP32 GPIO 33 -> LCD RST
Step 8: Program
Arduino IDE
Download and install the Arduino IDE if you have not done so yet:
https://www.arduino.cc/en/main/software
ESP32 Support
Follow the installation instructions to add ESP32 support if you have not done so yet:
https://github.com/espressif/arduino-esp32
Arduino_GFX Library
Download the latest Arduino_GFX library: (press "Clone or Download" -> "Download ZIP")
https://github.com/moononournation/Arduino_GFX
Import the library into the Arduino IDE. (Arduino IDE "Sketch" Menu -> "Include Library" -> "Add .ZIP Library" -> select downloaded ZIP file)
ESP8266Audio
Download the latest ESP8266Audio library: (press "Clone or Download" -> "Download ZIP")
https://github.com/earlephilhower/ESP8266Audio
Import the library into the Arduino IDE. (Arduino IDE "Sketch" Menu -> "Include Library" -> "Add .ZIP Library" -> select downloaded ZIP file)
RGB565_video Sample Code
Download the latest RGB565_video sample code: (press "Clone or Download" -> "Download ZIP")
https://github.com/moononournation/RGB565_video
SD Card Data
Copy the converted files to the SD card and insert it into the LCD card slot
Compile & Upload
- Open SDMMC_MJPEG_video_PCM_audio_dualSPI_multitask.ino in the Arduino IDE
- If you are not using an ILI9225, change the new class code (around line 35) to the correct class name
- Press the Arduino IDE "Upload" button
- If the upload fails, try detaching the connection between ESP32 GPIO 2 and SD D0/MISO
- If the orientation is not correct, change the "rotation" value (0-3) in the new class code
- If the program runs well, you can try the other samples that start with SDMMC_*
- If you do not have an SD card slot or do not have FFmpeg installed, you can still try the SPIFFS_* examples
Step 9: Benchmark
Here is the performance summary for different video (220x176) and audio (44100 Hz) formats:
Format | Frames per second (fps) |
MJPEG + PCM | 30 |
GIF + PCM | 15 |
RGB565 + PCM | 9 |
MJPEG + MP3 | 24 |
Note:
- MJPEG + PCM can reach a higher fps, but playing above 30 fps on such a tiny screen is unnecessary
- RGB565 needs no decoding, but the data size is so large that much of the time goes to loading data from the SD card. A 4-bit SD bus and a faster SD card can improve this a little (a wild guess: around 12 fps)
- The MP3 decoding process is not yet optimized; currently core 0 is dedicated to MP3 decoding and core 1 to playing video
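To get a feel for why RGB565 is limited by SD card loading, here is a rough estimate of the data rate it needs at 220x176 with mono 16-bit PCM audio. It is only a sketch with made-up helper names, and it ignores file system and bus overhead.

```python
def video_bytes_per_second(width, height, fps, bits_per_pixel=16):
    """Raw video data rate in bytes per second."""
    return width * height * bits_per_pixel * fps // 8

def pcm_bytes_per_second(sample_rate_hz=44100, bits_per_sample=16, channels=1):
    """Raw PCM audio data rate in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8

audio = pcm_bytes_per_second()
print(audio)                                        # 88200
print(video_bytes_per_second(220, 176, 9) + audio)  # 785160  (measured 9 fps)
print(video_bytes_per_second(220, 176, 12) + audio) # 1017480 (guessed 12 fps)
```

So every extra frame per second of RGB565 costs another 77440 bytes per second of SD card reading.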
Step 10: Happy Playing!
Now you can play video and audio with your ESP32, which unlocks many possibilities!
I think I will make a tiny vintage TV later...