Introduction: WIDI - Wireless HDMI Using Zybo (Zynq Development Board)
Have you ever wished that you could connect your TV to a PC or laptop as an external monitor, but didn't want all those pesky cords in the way? If so, this tutorial is just for you! While there are some products out there that achieve this goal, a DIY project is much more satisfying and potentially cheaper.
This concept is different from products like Chromecast, as it is intended to take the place of an HDMI cord connecting to a monitor rather than act as a streaming device.
Our project was created as a final project for a Real Time Operating Systems course at California Polytechnic State University, San Luis Obispo.
The goal of the project is to use two Digilent Zybo boards as the wireless communication interface between an HDMI transmitting device (PC, Blu-ray player, etc.) and an HDMI receiving device (desktop monitor, projector, TV, etc.).
One Digilent Zybo will be connected via HDMI to the transmitting device, and the other will be connected via HDMI to the receiving device.
The wireless communication uses a wireless local area network dedicated to the transmitter and receiver, without being routed through a home router or other such device. The wireless module used for this project is the TP-Link WR802N nanorouter; one operates as an access point to establish the network and the other as a client to connect to it. Each nanorouter is connected via Ethernet cable to one of the Zybo boards. When connected through these routers, the devices communicate via TCP as though they were connected with a single Ethernet cable (meaning the only configuration needed to establish a connection is the IP address of the client).
While the goal of the project was to stream 1280x720 video at 60 Hz, this was not achievable due to the bandwidth limitations of the wireless network and the lack of real-time video compression to reduce the amount of data to send. Instead, this project serves as a framework for future development toward that goal; the current implementation streams HDMI data at a severely reduced frame rate.
Project Requirements:
2x Digilent Zybo Development Boards (must have at least one HDMI port)
2x HDMI cables
2x micro USB cables (to connect the Zybo boards to a PC for development)
2x TP-Link WR802N nanorouters (including an additional 2x micro USB cables and wall outlet power adapters)
2x ethernet cables
***Note: This tutorial assumes familiarity with the Vivado design suite and experience creating a new project and block design.***
Step 1: Configure Zynq Programmable Logic for Transmitter
Our approach to developing the programmable logic of the transmitter was to perform an HDMI-to-HDMI pass-through from PC to monitor using two Video Direct Memory Access (VDMA) blocks, one for write and one for read.
Both are selected for free-running, 3 frame-buffer mode (0-1-2). Since the video core is optimized for 60 frames per second, the VDMA will write to or read from a new frame every 16.67 ms, in this order: 0, 1, 2, 0, 1, 2, 0, 1, 2. The DDR memory locations for each frame are different for the two VDMAs because they are no longer synchronized with each other. Instead, a hardware timer (TTC1), configured for 60 Hz, is used to synchronize the movement of data between the two sets of memory locations.
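As a rough illustration, here is a minimal sketch of how the 60 Hz TTC interval timer might be configured with the standard Xilinx ttcps driver (the device ID and instance names are placeholders; the interrupt controller hookup and the handler itself are omitted here and covered in the VDMA driver section):

```c
#include "xparameters.h"
#include "xstatus.h"
#include "xttcps.h"

/* Placeholder device ID -- check xparameters.h for your BSP. */
#define FRAME_TIMER_ID   XPAR_XTTCPS_1_DEVICE_ID   /* TTC1 */
#define FRAME_RATE_HZ    60

static XTtcPs FrameTimer;

int frame_timer_init(void)
{
    XInterval interval;
    u8 prescaler;

    XTtcPs_Config *cfg = XTtcPs_LookupConfig(FRAME_TIMER_ID);
    if (cfg == NULL ||
        XTtcPs_CfgInitialize(&FrameTimer, cfg, cfg->BaseAddress) != XST_SUCCESS)
        return XST_FAILURE;

    /* Interval mode: fire an interrupt every 1/60 s (16.67 ms). */
    XTtcPs_SetOptions(&FrameTimer, XTTCPS_OPTION_INTERVAL_MODE |
                                   XTTCPS_OPTION_WAVE_DISABLE);
    XTtcPs_CalcIntervalFromFreq(&FrameTimer, FRAME_RATE_HZ, &interval, &prescaler);
    XTtcPs_SetInterval(&FrameTimer, interval);
    XTtcPs_SetPrescaler(&FrameTimer, prescaler);

    /* The interval interrupt must also be connected to the GIC (not shown). */
    XTtcPs_EnableInterrupts(&FrameTimer, XTTCPS_IXR_INTERVAL_MASK);
    XTtcPs_Start(&FrameTimer);
    return XST_SUCCESS;
}
```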
The image above shows the 3 frames, their dimensions and the amount of memory each requires (to the right of the frame). If we assign the write VDMA to these memory locations, then we can assign the read VDMA memory locations beyond this set, say starting at 0x0B000000. Each frame is made up of 1280*720 pixels, and each pixel is made up of 8 bits each of red, green and blue for a total of 24 bits. This means a frame occupies 1280*720*3 bytes (2.76 MB).
The timer IRQ, which is described in the VDMA driver setup section, handles copying data between the two VDMAs' memory locations. Each VDMA provides a pointer to the current frame being written to or read from. The frame is represented by a particular gray code, which is converted to a frame index in software. The gray code definitions for a 3 frame-buffer configuration can be found in appendix C of the AXI VDMA Product Guide.
This allows us to copy the contents being written to memory without reading from a frame currently being written to.
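To make the gray code handling concrete, here is a minimal sketch of the conversion and the frame address arithmetic (the write-buffer base address, GPIO instance and channel are illustrative placeholders; the authoritative gray code table is in appendix C of the AXI VDMA Product Guide):

```c
#include "xil_types.h"
#include "xgpio.h"

/* Frame geometry: 1280 x 720 pixels, 3 bytes (RGB) per pixel = ~2.76 MB per frame. */
#define FRAME_WIDTH    1280
#define FRAME_HEIGHT   720
#define FRAME_SIZE     (FRAME_WIDTH * FRAME_HEIGHT * 3)

/* Placeholder write-buffer base; the read VDMA frames start at 0x0B000000 as noted above. */
#define WR_FRAME_BASE  0x0A000000u
#define RD_FRAME_BASE  0x0B000000u

/* Standard gray-to-binary decode, usable for the VDMA's gray-coded frame pointer. */
static u32 gray_to_binary(u32 gray)
{
    u32 bin = gray;
    while (gray >>= 1)
        bin ^= gray;
    return bin;
}

/* Example: read the write VDMA's frame pointer through an AXI GPIO channel and
 * return the DDR base address of the frame currently being written. */
u32 current_write_frame_addr(XGpio *gpio)
{
    u32 frame = gray_to_binary(XGpio_DiscreteRead(gpio, 1)) % 3;   /* 0, 1 or 2 */
    return WR_FRAME_BASE + frame * FRAME_SIZE;
}
```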
***Note that the read VDMA is not used when sending data across the wireless network. Its only purpose is to verify that memory is being copied correctly from the write VDMA; when transmitting wirelessly, the read VDMA should be disabled.***
Here are the steps to creating the Transmitter Design Block:
- When creating a new project, it is a good idea to assign a chip or board to the project. This link describes how to add new board files to the Vivado directory and associate the correct board with your project. It will come in handy when adding the Processing System block and transitioning from hardware to software (SDK side).
- Add the following blocks:
- dvi2rgb
- Video In to AXI4-Stream
- Timing Controller
- AXI4-Stream to Video Out
- rgb2dvi
- AXI VDMA x2
- AXI GPIO x2
- Clock Wizard
- Constant
- Zynq Processing System
- When adding the Processing System, click "Run Block Automation" from the top green colored bar and make sure the "Apply Board Preset" option is selected. Leave everything else default.
- Images of each block configuration window can be found in the images above. If you don't see an image for a particular window, just leave it as default.
- Begin Configuring the Zynq Processing system:
- In PS-PL Configuration --> AXI Non Secure Enablement --> GP Master AXI Interface, enable the M AXI GP0 interface
- In PS-PL Configuration --> HP Slave AXI Interface, enable both HP0 and HP1
- In MIO Configuration --> I/O Peripherals, make sure ENET0 is enabled; then under Application Processor Unit, enable Timer0
- In Clock Configuration --> PL Fabric Clocks, enable FCLK_CLK0 and set it to 100 MHz
- Click OK
- Before clicking "Run Connection Automation," be sure to connect the video blocks as seen in the TX block design image above. You will want to rename the constant to VDD and set its value to 1.
- Make the HDMI TMDS clock and data pins external on the rgb2dvi and dvi2rgb blocks.
- Create an input and an output port for the hot plug detect signal (HPD) and connect them together; these are defined in the constraints file.
- The pixel clock is recovered from the TMDS_Clk_p, which is created in the constraints file. This will be 74.25 MHz in accordance with 720p resolution. It is important to connect the pixel clock (from the dvi2rgb block) to the following pins:
- vid_io_in_clk (vid in to axi stream block)
- vid_io_out_clk (axi stream to vid out block)
- clk (Timing Controller)
- PixelClk (rgb2dvi)
- ***Note: Currently, in order to activate the pixel clock recovery, the HDMI rx and tx connectors must be plugged into an active source/sink. One way around this is to separate the video rx and tx blocks into different clock domains (in other words, generate a new 74.25 MHz clock to feed to the tx block).***
- Next, set up the clock wizard so that you have a 100 MHz input (global buffer source) and 3 output clocks: 50 MHz (AXI-Lite clock), 150 MHz (AXI4-Stream clock) and 200 MHz (dvi2rgb RefClk pin).
- Connect the FCLK_CLK0 processing system pin to the clock wizard input
- At this point click "Run Connection Automation" from the green bar at the top of the design window. It is a good idea to do this for one block at a time and follow the TX block design image above.
- The tool will attempt to add the AXI Interconnect, which acts as the master/slave interconnect for the blocks that use the AXI-Lite bus (VDMAs and GPIOs).
- It will also add AXI SmartConnect, which acts as the master/slave interconnect for the AXI4-Stream and High Performance processor interfaces used by the VDMA (Stream to Memory Map and vice versa).
- The tool will also add a Processor System Reset. Make sure this is only connected to the VDMAs, GPIOs and processor-related blocks. Do not connect it to any video blocks (e.g. dvi2rgb, timing controller, vid to stream, etc.).
- Once connection automation has been completed, verify that the connections match those in the TX block design image. You'll notice an extra System ILA block that has not been mentioned. This is for debugging only and is not needed for now. It uses the 150M Processor Reset, so that's not needed either. Anywhere you see small green "bugs" on buses, that is because of the ILA and can be ignored.
- The final step is to right-click on the block design in the project sources tree and select "Create HDL Wrapper." If you plan on adding logic to the wrapper, be aware that it will be overwritten every time this option is selected.
- See the VDMA Driver Setup section for details on the SDK side.
Clocks and Resets
I've found that one of the most important aspects of any programmable logic project is careful consideration of clock domains and reset signals. If those are properly configured, you have a good shot at getting your design to work.
Pixel Clock and Timing Locked
In order to verify that certain signals are active, it is a good idea to tie these signals to LEDs (clocks, resets, locks etc). Two signals that I found helpful to track on the transmitter board were the pixel clock and the "locked" signal on the AXI4-Stream to video out block, which tells you that the video timing has been synchronized with the timing controller and the video source data. I've added some logic to the design block wrapper that tracks the pixel clock using the PixelClkLocked signal on the dvi2rgb block as a reset. I've attached the file as hdmi_wrapper.v here. The constraints file is also attached here.
Step 2: Configure Zynq Programmable Logic for Receiver
The Programmable Logic block for the receiver is simpler. The key difference, other than the missing HDMI input blocks, is the absence of a recovered pixel clock. For that reason we have to generate our own from the clock wizard. This design should be done in a separate project from the transmitter. For our purposes, the receiver project targeted the Zybo Z7-20 board while the transmitter targeted the Z7-10. The FPGAs on the two boards are different, so be careful to select the correct part for each project.
Here are the steps to creating the Receiver Design Block:
- Add the following ip blocks to your design:
- Timing Controller
- AXI4-Stream to Video Out
- RGB to DVI
- AXI VDMA
- AXI GPIO
- Processing System
- Clock Wizard
- Constant (VDD set to 1)
- Follow the same pattern for configuring these blocks as the Transmitter. Images for the notable differences in configuration have been included here. The others remain the same as the Transmitter.
- Configure the VDMA for this design as read channel only. Disable the write channel.
- The clock wizard should be configured for the following outputs:
- clk_out1: 75 MHz (pixel clock)
- clk_out2: 150 MHz (stream clock)
- clk_out3: 50 MHz (axi-lite clock)
- Connect up the video blocks as shown in the RX block design image.
- Then run the connection automation, which will add the AXI Interconnect, AXI SmartConnect and System Reset blocks and attempt to make the appropriate connections. Go slowly here to make sure it doesn't perform unwanted connections.
- Make the HDMI TMDS clock and data pins external on the rgb2dvi block
- No need for hot plug signal on this design.
Step 3: Setup VDMA Driver
Setup for the different blocks that are configured via the AXI-Lite interface is best done by using the demo projects included with the BSP as a reference. After exporting the design hardware and launching the SDK from Vivado, you'll want to add a new board support package and include the lwip202 library in the BSP settings window. Open the system.mss file from the BSP and you'll see the peripheral drivers present in your block design. The "Import Examples" option lets you import demo projects that utilize these peripherals and thus show you how to configure them in software using the available Xilinx drivers (see attached image).
This was the method used for configuring the VDMA, Timer & Interrupt, and the GPIO. The source code for both transmit and receive has been included here. The differences are almost exclusively in main.c.
***NOTE: Since the system is not fully functional at the time of writing this tutorial, the source code in this section does not include the wireless network code. Several bugs need to be addressed as a result of combining the video core transmit/receive projects with the network transmit/receive projects. Therefore this tutorial treats them separately for the time being.***
TX Interrupt Handler Function (IRQHandler)
This function reads the gray codes provided by both the read and write VDMAs via the GPIO blocks. The gray codes are converted to decimal and used for selecting the frame base memory location of the current frame. The frame copied is the previous frame to the one being written to by the VDMA (e.g. if the VDMA is writing to frame 2, we copy frame 1; if writing to frame 0, we wrap and read from frame 2).
The function only captures every 6th frame, reducing the frame rate from 60 Hz to 10 Hz. The nominal upper bound of the network is 300 Mbps; at 10 frames per second, a bandwidth of roughly 221.2 Mbps is required (1280 x 720 pixels x 3 bytes x 8 bits x 10 frames per second).
Commenting/un-commenting two lines in this function allows the user to switch to HDMI pass-through mode for debugging/test purposes (the code is commented to indicate the appropriate lines). It currently copies the frame to a memory location used by the Ethernet code.
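The attached source contains the full handler; the following condensed sketch (building on the placeholder definitions from the earlier snippet, with hypothetical GpioWrite and ETH_TX_BUFFER names) shows the shape of the logic described above:

```c
#include <string.h>   /* memcpy */

#define FRAME_DECIMATION  6            /* keep every 6th frame: 60 Hz -> 10 Hz */
#define ETH_TX_BUFFER     0x0C000000u  /* placeholder buffer consumed by the Ethernet code */

extern XGpio GpioWrite;                /* GPIO exposing the write VDMA frame pointer */
static int frame_count = 0;

void IRQHandler(void *CallBackRef)
{
    /* (Timer interrupt acknowledgement omitted for brevity.) */

    /* Frame currently being written, decoded from the VDMA's gray code. */
    u32 wr_frame   = gray_to_binary(XGpio_DiscreteRead(&GpioWrite, 1)) % 3;
    u32 prev_frame = (wr_frame + 2) % 3;   /* previous frame: safe to copy */

    if (++frame_count < FRAME_DECIMATION)
        return;                            /* skip 5 of every 6 frames */
    frame_count = 0;

    /* Copy the completed frame to the buffer the Ethernet code sends from.
     * (For HDMI pass-through testing, copy into the read VDMA's frame instead.) */
    memcpy((void *)ETH_TX_BUFFER,
           (const void *)(WR_FRAME_BASE + prev_frame * FRAME_SIZE),
           FRAME_SIZE);
}
```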
RX Interrupt Handler Function (IRQHandler)
This function is very similar to the TX function, but it copies from a two-buffer FIFO that the Ethernet code writes incoming data to. The Ethernet code indicates which buffer of the FIFO is currently being written; data is copied from the opposite buffer. The data is copied into the frame directly behind the one being read by the VDMA to avoid tearing.
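Again as a condensed sketch (not the attached source; GpioRead, the network buffer addresses and net_buf_in_use are illustrative placeholders, and the other definitions come from the earlier snippets):

```c
#include <string.h>   /* memcpy */

extern XGpio GpioRead;                /* GPIO exposing the read VDMA frame pointer */
extern volatile int net_buf_in_use;   /* set by the Ethernet code: 0 or 1 */

/* Placeholder addresses of the two-buffer network FIFO. */
#define NET_BUF_BASE(i)  (0x0D000000u + (u32)(i) * FRAME_SIZE)

void RxIRQHandler(void *CallBackRef)
{
    u32 rd_frame  = gray_to_binary(XGpio_DiscreteRead(&GpioRead, 1)) % 3;
    u32 dst_frame = (rd_frame + 2) % 3;   /* frame directly behind the one being read */
    int src_buf   = net_buf_in_use ^ 1;   /* the buffer the Ethernet code is NOT filling */

    memcpy((void *)(RD_FRAME_BASE + dst_frame * FRAME_SIZE),
           (const void *)NET_BUF_BASE(src_buf),
           FRAME_SIZE);
}
```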
Step 4: Setup Nanorouter Network
In order to create a network using the TP-Link nanorouters, power them on individually and connect to each device's default Wi-Fi SSID. More info on the configuration settings for this particular device can be found in the device user manual.
Set up one of the devices as an access point; this will act as the primary connection for the network. Make sure to name the network and make note of the name, and disable DHCP (we do not want the router to assign IP addresses dynamically; we want the transmitter and receiver Zybo boards to set their own IP addresses so they are consistent). After configuring, make sure the device reboots and establishes this network.
Set up the other device as a client, and make sure it connects to the network SSID you set up with the first nanorouter. Once again, make sure that DHCP is disabled for the client.
Once the client has finished and rebooted, it should connect to the access point nanorouter (if it doesn't, there is likely an issue in your configuration of one of the devices). You will notice that the LED light on the client will be solid once it has connected to the access point.
The access point nanorouter LED will likely continue flashing at this point; this is okay! The flashing light means it is not connected to another device through its Ethernet port, and once it is connected to a configured Zybo the LED will remain solid, indicating a successful network connection.
Now that the nanorouters are set up, we have a wireless network to communicate through. An important note is that configuring the nanorouters as access point and client lets the transmitting Zybo board communicate with the receiving Zybo board as though the two were connected by a single Ethernet cable. This makes the network setup less difficult, as the alternative would likely involve configuring the Zybo boards to join an existing network explicitly in addition to establishing the intended connection.
Once both devices are set up, the nanorouters are configured and ready to be incorporated into your WIDI network. There is no specific pairing between the nanorouters and the Zybo boards, as either the access point or the client will work for either the transmit or the receive device.
Step 5: Setup Zynq Processing System for Data Transmission Via Ethernet
In order to transmit the HDMI data from one Zybo board to the other, we must integrate an Ethernet protocol with our VDMA driver. Our goal here is to stream individual video frames through the Ethernet peripheral in the processing system at a set rate that is consistent with our network bandwidth. For our project, we utilized TCP as provided by the bare-metal LwIP API. Since both of the project members are relatively inexperienced with networking utilities, this choice was made without fully recognizing the implications and constraints of TCP. The major problem with this implementation was the limited bandwidth and the fact that TCP is really not designed for streaming high volumes of data. Alternatives to TCP that could improve throughput in this project are discussed later.
A brief description of TCP with LwIP: Data is sent over the network in packets of size tcp_mss (TCP maximum segment size), which is generally 1460 bytes. Calling tcp_write will take some data referenced by a pointer and configure pbufs (packet buffers) to hold the data and provide a structure for the TCP operations. The maximum amount of data that can be queued at one time is set as tcp_snd_buf (TCP sender buffer space). Since this parameter is a 16 bit number, we are limited to a send buffer size of 59695 bytes (there is some required padding in the send buffer). Once the data has been queued, tcp_output is called to begin transmitting the data. Before sending the next segment of data, it is imperative that all of the previous packets have been successfully transmitted. This process is done using the recv_callback function, as this is the function that is called when the acknowledgement is seen from the receiver.
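A minimal sketch of this send/acknowledge loop using the LwIP raw API is shown below (frame_ptr, FRAME_SIZE and the use of the receive callback as the acknowledgement trigger are assumptions based on the description above, not the exact attached code):

```c
#include "lwip/tcp.h"
#include "lwip/err.h"

static u8_t  *frame_ptr;    /* start of the frame currently being sent */
static u32_t  bytes_sent;   /* progress through the current frame */

/* Queue as much of the remaining frame as tcp_snd_buf allows, then push it out. */
static void send_next_chunk(struct tcp_pcb *pcb)
{
    u32_t remaining = FRAME_SIZE - bytes_sent;
    u16_t space     = tcp_sndbuf(pcb);
    u16_t chunk     = (remaining < space) ? (u16_t)remaining : space;

    if (chunk == 0)
        return;   /* nothing left to send, or no send buffer space yet */

    if (tcp_write(pcb, frame_ptr + bytes_sent, chunk, TCP_WRITE_FLAG_COPY) == ERR_OK) {
        bytes_sent += chunk;
        tcp_output(pcb);   /* start transmitting the queued segments */
    }
}

/* Receive callback: fires when the receiver's acknowledgement message arrives,
 * at which point the next segment of the frame is queued (registered with tcp_recv). */
static err_t recv_callback(void *arg, struct tcp_pcb *pcb, struct pbuf *p, err_t err)
{
    if (p != NULL) {
        tcp_recved(pcb, p->tot_len);   /* update the TCP receive window */
        pbuf_free(p);
        if (bytes_sent < FRAME_SIZE)
            send_next_chunk(pcb);      /* keep feeding the current frame */
    }
    return ERR_OK;
}
```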
Utilizing the example projects in the Vivado SDK is very helpful for learning how the LwIP TCP operation works, and is a good starting point for beginning a new project.
The procedure for the WiDi transmitting device is as follows:
1. Initialize the TCP network using the bare-metal LwIP driver function calls.
2. Specify any callback functions necessary for network operations.
3. Connect to the WiDi receiver by connecting to its IP address and port (our configuration: the receiver IP is 192.168.0.9, connect to port 7). A sketch of this connection setup follows the list.
4. When the VDMA driver timer expires, enter the TX ISR.
5. Determine the current frame buffer to access based on the VDMA gray code.
6. Queue up the first segment of data in the TCP send buffer.
7. Output the data, and update local variables to keep track of how much of the current frame has been sent.
8. Upon reaching the receive callback (called after the transmitter gets an acknowledgement of data retrieval), queue up the next segment of data.
9. Repeat steps 7 & 8 until the entire frame has been sent.
10. Return to an idle state and wait for the next timer interrupt indicating a new frame is ready (back to step 4).
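Here is a minimal sketch of steps 1–3, again with the LwIP raw API (lwip_init, netif bring-up and the board's own MAC/IP assignment are omitted; connect_to_receiver and conn_pcb are illustrative names, and recv_callback is the function from the send-loop sketch above):

```c
#include "lwip/tcp.h"
#include "lwip/ip_addr.h"

static struct tcp_pcb *conn_pcb;   /* connection to the WiDi receiver */

/* Called once the TCP connection to the receiver is established. */
static err_t connected_callback(void *arg, struct tcp_pcb *pcb, err_t err)
{
    conn_pcb = pcb;
    tcp_recv(pcb, recv_callback);   /* receiver acknowledgements drive the send loop */
    return ERR_OK;
}

int connect_to_receiver(void)
{
    ip_addr_t receiver_ip;
    IP4_ADDR(&receiver_ip, 192, 168, 0, 9);   /* receiver address used in this project */

    struct tcp_pcb *pcb = tcp_new();
    if (pcb == NULL)
        return -1;

    /* Port 7, per the configuration described above. */
    return (tcp_connect(pcb, &receiver_ip, 7, connected_callback) == ERR_OK) ? 0 : -1;
}
```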
Make sure to set up the board support package LwIP settings as shown in the image above. All the values are default except for tcp_snd_buf, tcp_queue_ooseq, mem_size and memp_n_tcp_seg. Also note that detailed debugging can be enabled by changing the BSP parameters in the debug_options group.
Step 6: Setup Zynq Processing System for Data Reception Via Ethernet
The Zybo development board that will act as the wireless receiver will operate similarly to the transmitting device. The board support package settings for LwIP will be identical to those in the previous step.
The device will take in packets containing the video frame segments from the nanorouter, and it will copy the video frame data into the triple frame buffer space for the receiving VDMA. In order to avoid overwriting any data, a double data buffer (which we will refer to as the network buffer) is used when collecting data from the nanorouter, so that network traffic can continue streaming in while the previous full video frame is being copied into the VDMA buffer.
The procedure for the WiDi receiving device requires two tasks: one receives Ethernet data, and the other copies video frames from the network buffer into the VDMA's triple frame buffer.
Ethernet reception task:
1. Initialize the TCP network using the bare-metal LwIP driver function calls (set up with the IP address that the transmitter will connect to, 192.168.0.9 in our case).
2. Specify any callback functions necessary for network operations.
3. Upon receiving an Ethernet packet, copy the packet data into the current network buffer and update the count of accumulated data (see the sketch after this list).
4. If the packet fills the network frame buffer, continue to steps 5 & 6. Otherwise, loop back to step 3 of this task.
5. Signal that the VDMA triple frame buffer task should copy from the newly completed network buffer.
6. Switch to the other network buffer and continue collecting data via Ethernet.
7. Idle until a new Ethernet packet is received (step 3).
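Here is a minimal sketch of steps 3–6 of this task (reusing the placeholder NET_BUF_BASE, FRAME_SIZE and net_buf_in_use names from the RX interrupt handler sketch in Step 3; the listener setup via tcp_bind/tcp_listen/tcp_accept is omitted):

```c
#include <string.h>
#include "lwip/tcp.h"

static u32_t bytes_received;   /* data accumulated in the current network buffer */

/* Receive callback registered with tcp_recv on the accepted connection. */
static err_t recv_callback(void *arg, struct tcp_pcb *pcb, struct pbuf *p, err_t err)
{
    if (p == NULL)             /* remote side closed the connection */
        return ERR_OK;

    /* Copy the (possibly chained) pbuf payload into the network buffer being filled. */
    for (struct pbuf *q = p; q != NULL; q = q->next) {
        memcpy((void *)(NET_BUF_BASE(net_buf_in_use) + bytes_received),
               q->payload, q->len);
        bytes_received += q->len;
    }
    tcp_recved(pcb, p->tot_len);
    pbuf_free(p);

    if (bytes_received >= FRAME_SIZE) {   /* a full video frame has arrived */
        bytes_received = 0;
        net_buf_in_use ^= 1;              /* switch buffers; the RX ISR copies the other one */
    }
    return ERR_OK;
}
```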
Copy network buffer to VDMA triple frame buffer:
- When the VDMA driver timer expires, enter the RX ISR.
- Determine the current frame buffer to access based on the VDMA gray code.
- Determine which network buffer will be copied to the VDMA buffer, and copy that data
Step 7: Connect Your Zybo Boards to the HDMI Source and HDMI Sink
Now connect the HDMI cables for both the receiver and transmitter, program the FPGAs and run the processing system. The frame rate will likely be very slow due to the immense overhead in the LwIP operation and the limited bandwidth. If there are any issues, connect via UART and try to identify any warnings or errors.
Step 8: Alternative Ideas for Improvement
A big issue for this project was the amount of data that needed to be sent over Wi-Fi. This was expected; however, we underestimated its impact, and the result was more of a burst of images on a screen than a video feed. There are several ways to improve this project:
- Real-time video compression. Compressing the incoming video feed frame by frame would greatly reduce the amount of data that needs to be sent over the network. Ideally this would be done in hardware (which is not an easy task), or it could be done in software by using the other ARM core to run a compression algorithm (this would need some further analysis to ensure the timing works out). There are some real-time video compression components available on the web, but the majority are proprietary IP rather than open source.
- Implementing the Ethernet stream in hardware, rather than software. There was a ton of overhead because of the lack of space available to queue outgoing data in the transmitter, due to the limitation on the segment size. A much more efficient process is to use the AXI Ethernet IP with a FIFO buffer or DMA to feed data into it. This would reduce the extra baggage from LwIP TCP and allow for more data flow.
Step 9: Accessibility
The resulting product of this WiDi project should be a fully integrated, compact pair of devices that a user could connect to any HDMI source and then sink the video feed to a display with HDMI capability wirelessly. The devices would feature the Zynq-7000 SoC found on the Zybo reference board and incorporate the network hardware found in the TP-Link nano-routers. Ideally, the user would be able to control the transmit module from a discrete location within the target operating system, with little need for significant technical ability.
Security and Connectivity
The devices should also incorporate Transport Layer Security (TLS) and have limited auto-connect ability, both for privacy purposes. It is the intention of the designers to make the connection with a display over a wireless interface a deliberate action on behalf of the user to avoid mistakenly broadcasting sensitive material.
Present Status
At this point, the project is still very much a work in progress. In order to benefit from this tutorial, the user must have a strong technical understanding of embedded system design and some familiarity with programmable hardware and embedded software working together.
The data being sent over the network is not encrypted at this point and is assumed to be a raw transmission of TCP/IP packets.
The video core project was successfully tested for both transmit and receive. Separately, the wireless connection between the two Zybo boards was established and test frame data was sent successfully. It is still necessary, however, to combine the network code with each video core project and test the transmission of actual video frames.