Introduction
In this blog post, we take a closer look at Video Direct Memory Access (VDMA), a critical component for handling video data efficiently on FPGA platforms. We will build a loopback test environment for VDMA to better understand how it works. The post gives an overview of VDMA and how it compares to DMA, which we explained in our previous blog post linked here, and prepares us for the Prewitt filter implementation covered in upcoming posts.
What is VDMA?
Video Direct Memory Access is a specialized type of Direct Memory Access block optimized for transferring video data between memory and video peripherals. Unlike regular DMA, which handles generic data transfers, VDMA is designed specifically for the high throughput and real-time requirements of video data streams. The design's timing constraints must also be met for VDMA to work correctly. This makes VDMA an essential component in FPGA-based video processing applications. Here are some of its key features:
- High throughput: Designed to handle the large volumes of data typical in video applications, so that video frames are transferred quickly and efficiently.
- Real-time performance: Transfers video frames with low, predictable latency, which is crucial for video streaming and display applications.
- Flexible buffer management: Supports multiple frame buffers, allowing smooth video playback and processing while reducing flicker and tearing.
VDMA is particularly valuable in applications such as video processing, broadcasting, and medical imaging.
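To put "high throughput" into numbers, it helps to estimate the raw bandwidth of an uncompressed video stream. The sketch below is plain arithmetic; the function name and the example resolution and frame rate are illustrative assumptions, not values from this project.

```python
def video_bandwidth_bytes_per_s(width, height, bytes_per_pixel, fps):
    """Raw bandwidth of an uncompressed video stream in bytes per second."""
    return width * height * bytes_per_pixel * fps


# Example: 1920x1080, 24-bit RGB (3 bytes per pixel), 60 frames per second.
bandwidth = video_bandwidth_bytes_per_s(1920, 1080, 3, 60)
print(f"{bandwidth / 1e6:.1f} MB/s")  # 373.2 MB/s
```

In a loopback setup, every frame is read from memory once and written back once, so the memory traffic is roughly twice this figure.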
The tools required for this project are the Vivado Design Suite and the PYNQ environment.
PART 1. Setting Up VDMA in Vivado Design Suite
In order to begin creating this project, we need to list all required IP cores. Here’s a brief overview of each core and its function:
- ZYNQ7 Processing System: This core represents the processing system of the Zynq-7000 series SoC, which includes the ARM Cortex-A9 processors. It is the central component that interfaces with other IP cores.
- AXI Interconnect: Facilitates communication between the AXI VDMA core and other system components, such as the memory controller and video source/sink.
- AXI VDMA: This core handles the transfer of video data between memory and video peripherals, supporting high throughput and real-time performance.
- Concat: This core combines multiple input signals into a single output bus. In this example, it is required for interrupt handling.
With the required IP cores defined, the following steps walk through creating a full VDMA project.
1. Create a New Project: Start Vivado and create a new project. Select the PYNQ Z2 as the target board. This ensures that all the necessary board files and configurations are correctly loaded.
2. Add ZYNQ7 Processing System Core: In the IP catalog, search for and add the ZYNQ7 Processing System core to your design. Configure it to enable the necessary interfaces. The required configuration is presented in Figure 1 below.

Interrupts must also be enabled for the ZYNQ7. Figure 2 shows an example of how to set up the interrupt inside the ZYNQ7.

3. Add VDMA IP Core: In the IP catalog, search for and add the AXI VDMA IP core to your design.
4. Add AXI Interconnect: In the IP catalog, search for and add the AXI Interconnect core to your design.
5. Configure VDMA: Double-click the VDMA core to configure it. Set the parameters according to your video resolution and memory interface requirements. This includes specifying the number of frame buffers and setting the base addresses for the frame buffers. The VDMA configuration is shown in Figure 3.

In the Advanced tab, select the dynamic slave and dynamic master settings, and check the option that allows unaligned transfers.
6. Connect VDMA to AXI Interconnect: Ensure that the VDMA is correctly connected to the AXI Interconnect and that its interrupt outputs reach the ZYNQ7 via Concat. This involves connecting the AXI Stream interfaces for video input and output and the AXI memory-mapped interfaces for VDMA transfers.
7. Connect AXI Interconnect to ZYNQ7 Processing System: Connect the AXI Interconnect to the ZYNQ7 Processing System, ensuring that data can flow between the processing system and the VDMA. For the remaining connections, you can use Run Connection Automation.
8. Generate Bitstream: Validate your design, generate the bitstream, and export the hardware. This step compiles your design and prepares it for deployment on the FPGA.
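Step 5 asks for the number of frame buffers and their base addresses. As a rough sketch of that bookkeeping, the helper below lays out frame buffers back to back in memory. The function name, the base address `0x10000000`, and the count of three buffers are hypothetical and only for illustration; the 2400x1599 RGB frame size matches the test image used in Part 2.

```python
def frame_buffer_addresses(base, width, height, bytes_per_pixel, count):
    """Return (stride, frame_bytes, [start addresses]) for contiguous buffers."""
    stride = width * bytes_per_pixel   # bytes from one line to the next
    frame_bytes = stride * height      # bytes occupied by one frame
    addresses = [base + i * frame_bytes for i in range(count)]
    return stride, frame_bytes, addresses


# Hypothetical base address and buffer count, for illustration only.
stride, frame_bytes, addresses = frame_buffer_addresses(
    base=0x10000000, width=2400, height=1599, bytes_per_pixel=3, count=3
)
print(stride, frame_bytes)
print([hex(a) for a in addresses])
```

In the PYNQ flow used in Part 2, the buffer addresses come from `allocate` rather than from a fixed layout, so this only illustrates how frame size and buffer count translate into memory requirements.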
The final project in Vivado would look like this (Figure 4).

PART 2. Implementing a Loopback Test for VDMA
Testing involves setting up a simple loopback mechanism in which video data is read from memory and written back to memory via the VDMA, confirming that the data transfer works correctly. The code was written in a Jupyter Notebook in the PYNQ environment and is presented below:
import os
from pynq import PL
from pynq import Overlay
from pynq.lib.video import *
from pynq import allocate
import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
# Constants
BITSTREAM = './design_1.bit'
IMAGE = './town.jpg'
PROCESSED_IMAGE = './town_processed.jpg'
overlay = Overlay(BITSTREAM)
vdma = overlay.axi_vdma_0
#brightness_filter = overlay.brightness_filter_0
original_image = Image.open(IMAGE)
if original_image.mode == 'RGBA':
    original_image = original_image.convert('RGB')
canvas = plt.gcf()
size = canvas.get_size_inches()
canvas.set_size_inches(size * 2)
width_old, height_old = original_image.size
print("Image size: {}x{} pixels.".format(width_old, height_old))
plt.imshow(original_image)
plt.show()
width_new, height_new = 2400, 1599
print("Allocating buffers for input and output frames...")
input_buffer = allocate(shape=(height_old, width_old, 3), dtype=np.uint8, cacheable=1)
output_buffer = allocate(shape=(height_new, width_new, 3), dtype=np.uint8, cacheable=1)
input_buffer[:] = np.array(original_image)
address_of_input_buffer = input_buffer.device_address
address_of_output_buffer = output_buffer.device_address
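# Reset the MM2S channel and wait until the reset bit clears
vdma.write(0x00, 0x04)
while (vdma.read(0x00) & 0x4) == 4:
    pass
# Reset the S2MM channel and wait until the reset bit clears
vdma.write(0x30, 0x04)
while (vdma.read(0x30) & 0x4) == 4:
    pass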
# Configure VDMA for MM2S (memory to stream)
print("Configuring VDMA for MM2S...")
vdma.write(0x00, 0x93) # MM2S VDMA Control Register
vdma.write(0x5C, address_of_input_buffer) # Start address
vdma.write(0x58, width_new * 3) # Stride (bytes between the starts of consecutive lines)
vdma.write(0x54, width_new * 3) # HSIZE (bytes per line)
vdma.write(0x50, height_new) # VSIZE (number of lines); written last because it starts the transfer
print("VDMA MM2S configured.")
# Configure VDMA for S2MM (stream to memory)
print("Configuring VDMA for S2MM...")
vdma.write(0x30, 0x93) # S2MM VDMA Control Register
vdma.write(0xAC, address_of_output_buffer) # Start address
vdma.write(0xA8, width_new * 3) # Stride (bytes between the starts of consecutive lines)
vdma.write(0xA4, width_new * 3) # HSIZE (bytes per line)
vdma.write(0xA0, height_new) # VSIZE (number of lines); written last because it starts the transfer
print("VDMA S2MM configured.")
# Optional wait for the S2MM channel to halt (left disabled):
# while vdma.register_map.S2MM_VDMASR.Halted != 1:
#     pass
resized_image = Image.fromarray(output_buffer)
canvas = plt.gcf()
size = canvas.get_size_inches()
canvas.set_size_inches(size*2)
_ = plt.imshow(resized_image)
resized_image.save(PROCESSED_IMAGE)
The loopback test code performs the following steps:
- Import Libraries: The code imports necessary libraries such as pynq, cv2, matplotlib, and numpy for handling FPGA overlays, image processing, and display.
- Set Constants: Paths for the bitstream (.bit file) and input image are defined, along with the output image path.
- Load Bitstream and Initialize VDMA: The bitstream is loaded onto the FPGA using the Overlay class, and the VDMA IP core is initialized.
- Load and Prepare Image: The input image is loaded and converted to RGB format if needed. The image dimensions are printed and displayed using matplotlib.
- Allocate Buffers: Input and output buffers are allocated using the allocate function, sized according to the old and new image dimensions respectively. The input buffer is filled with pixel data from the original image.
- Set Buffer Addresses: The physical addresses of the input and output buffers are obtained.
- Reset VDMA: The VDMA is reset by writing to its control registers to ensure it’s in a known state before configuration.
- Configure VDMA for MM2S: The VDMA is configured for memory-to-stream (MM2S) mode, specifying the start address, horizontal size, horizontal stride, and vertical size.
- Configure VDMA for S2MM: Similarly, the VDMA is configured for stream-to-memory (S2MM) mode, specifying the start address, horizontal size, horizontal stride, and vertical size.
- Process and Display Image: After configuring the VDMA, the output buffer, which should now hold the image looped back through the VDMA (named resized_image in the code), is converted to an image and displayed using matplotlib. The processed image is also saved to the specified path.
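The reset and configuration steps above can be sketched as two small helpers. This is a minimal sketch, not the definitive way to drive the core: the register offsets and the control value 0x93 come from the script, while the bit names and the labelling of 0x54/0xA4 as HSIZE and 0x58/0xA8 as stride reflect our reading of the AXI VDMA register map and should be checked against the product guide. `FakeMMIO`, the helper names, and the two buffer addresses are hypothetical, added only so the example runs without hardware.

```python
# Register offsets taken from the test script.
MM2S = {"cr": 0x00, "vsize": 0x50, "hsize": 0x54, "stride": 0x58, "addr": 0x5C}
S2MM = {"cr": 0x30, "vsize": 0xA0, "hsize": 0xA4, "stride": 0xA8, "addr": 0xAC}

# Control bits (assumed names; verify against the AXI VDMA product guide).
RUN, CIRCULAR, RESET = 1 << 0, 1 << 1, 1 << 2
FRAME_CNT_EN, GENLOCK_SRC = 1 << 4, 1 << 7
CONTROL = RUN | CIRCULAR | FRAME_CNT_EN | GENLOCK_SRC  # 0x93, as in the script


def reset_channel(mmio, regs, max_polls=1000):
    """Set the reset bit and poll, with a bound, until it clears."""
    mmio.write(regs["cr"], RESET)
    for _ in range(max_polls):
        if (mmio.read(regs["cr"]) & RESET) == 0:
            return
    raise TimeoutError(f"reset bit at 0x{regs['cr']:02X} did not clear")


def configure_channel(mmio, regs, buffer_addr, width, height, bytes_per_pixel=3):
    """Program one channel; VSIZE goes last because it starts the transfer."""
    line_bytes = width * bytes_per_pixel
    mmio.write(regs["cr"], CONTROL)
    mmio.write(regs["addr"], buffer_addr)
    mmio.write(regs["stride"], line_bytes)
    mmio.write(regs["hsize"], line_bytes)
    mmio.write(regs["vsize"], height)


class FakeMMIO:
    """Hypothetical stand-in for overlay.axi_vdma_0 (no hardware needed)."""

    def __init__(self):
        self.regs = {}
        self.log = []

    def write(self, offset, value):
        self.log.append((offset, value))
        if offset in (MM2S["cr"], S2MM["cr"]):
            value &= ~RESET  # pretend the reset completes immediately
        self.regs[offset] = value

    def read(self, offset):
        return self.regs.get(offset, 0)


vdma = FakeMMIO()
# Hypothetical buffer addresses; on the board they come from allocate().
for regs, address in ((MM2S, 0x10000000), (S2MM, 0x11000000)):
    reset_channel(vdma, regs)
    configure_channel(vdma, regs, address, width=2400, height=1599)
print([hex(offset) for offset, _ in vdma.log])
```

On the board, the object returned by `overlay.axi_vdma_0` could be passed in place of `FakeMMIO`, since the script already uses its `read` and `write` methods. The bounded polling loop avoids hanging the notebook if the reset bit never clears.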
Figure 5 shows the result after the image has been transferred from the Master to the Slave bus of the VDMA.
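Besides inspecting the result visually, the loopback can be checked byte by byte. The helper below is a sketch under our own assumptions: `first_mismatch` is a hypothetical name, and it works on plain bytes so that it runs without the board.

```python
def first_mismatch(sent, received):
    """Return the index of the first differing byte, or None if identical."""
    sent, received = bytes(sent), bytes(received)
    if sent == received:
        return None
    for index, (a, b) in enumerate(zip(sent, received)):
        if a != b:
            return index
    # One buffer is a prefix of the other: they differ where the shorter ends.
    return min(len(sent), len(received))


frame = bytes(range(12))
print(first_mismatch(frame, frame))      # None
corrupted = frame[:5] + b"\xff" + frame[6:]
print(first_mismatch(frame, corrupted))  # 5
```

In the notebook, something like `first_mismatch(input_buffer.tobytes(), output_buffer.tobytes())` could be used once the transfer has finished. Because the buffers are allocated as cacheable, it may also be necessary to call `input_buffer.flush()` before starting the VDMA and `output_buffer.invalidate()` before reading the result.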

This forms the foundation for our upcoming blog post, where we will combine the Prewitt filter with the VDMA.
Comparison with DMA
While both VDMA and DMA facilitate data transfer between memory and peripherals, VDMA is specifically optimized for video data. Regular DMA may not handle the high throughput and real-time requirements as efficiently as VDMA. A quick comparison is presented in Table 1:
Feature | DMA | VDMA |
Throughput | Moderate | High |
Latency | Moderate | Low |
Buffer Management | Generic | Multiple frame buffers |
Use Case | General-purpose data transfer | Video data transfer |
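One practical way to read the table: a simple DMA transfer is described by a start address and a length, while a VDMA frame transfer is described by a start address, a line size, a stride, and a line count. The sketch below expands such a frame description into the per-line transfers it stands for. The function name and the tiny example frame are hypothetical, and the sketch only illustrates the addressing, not how the hardware schedules its bursts.

```python
def line_transfers(start, hsize, stride, vsize):
    """Yield (address, length) for each line of a 2D frame transfer."""
    for line in range(vsize):
        yield start + line * stride, hsize


# A 4-pixel-wide, 3-line RGB frame (12 bytes per line) with a 16-byte stride.
for address, length in line_transfers(start=0x1000, hsize=12, stride=16, vsize=3):
    print(hex(address), length)
# 0x1000 12
# 0x1010 12
# 0x1020 12
```

When the stride equals the line size, as in the test script, the lines are contiguous and the frame is one linear block of memory.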
Conclusion
In this blog post, we explored the fundamentals of VDMA and demonstrated how to set up and test VDMA on the PYNQ Z2 FPGA platform.