Introduction: Vision-Based Object Tracking and Following on a DIY Drone

Vision-based object tracking and following relies on visual servoing with a camera mounted on a 3-axis gimbal. Compared with a fixed camera, a 3-axis gimbal is a better fit for tracking and following because the camera can keep pointing at the target and hold it within the frame. In this project, the gimbal-mounted camera tracks the target and tries to keep it at the centre of the frame, and the drone then follows the target so that it stays close to it.

The project aims to encourage drone enthusiasts to build similar applications using simple algorithms, and to develop better applications that take the drone industry to greater heights.

Step 1: Components Used

1. An F450 quadrotor

2. Odroid XU4 - image processing and sending commands to the gimbal

3. Pixhawk PX4 - autopilot

4. 3-axis Gimbal (Storm32 BGC)

5. SJ4000 camera

Step 2: Object Detection

Object detection is the first step in this project. The main idea is that the user can select the object of interest. This is done with OpenCV mouse events: the user draws a bounding box around the target, and the target is then selected based on visual features such as color and shape.
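
As a rough illustration of the click-and-drag selection, here is a minimal sketch of a selection handler. The class name BoxSelector and its attributes are hypothetical and not taken from the original project. The on_mouse method follows the (event, x, y, flags, param) signature that OpenCV uses for mouse callbacks, and the event codes are passed in by the caller so the sketch itself does not depend on OpenCV.

```python
class BoxSelector:
    """Tracks a click-and-drag rectangle (hypothetical helper, not the author's code)."""

    def __init__(self, ev_down, ev_move, ev_up):
        # Event codes are supplied by the caller, e.g. the cv2.EVENT_* constants.
        self.ev_down, self.ev_move, self.ev_up = ev_down, ev_move, ev_up
        self.start = None      # corner where the left button went down
        self.current = None    # latest cursor position while dragging
        self.dragging = False
        self.box = None        # final (x, y, w, h) once the button is released

    def on_mouse(self, event, x, y, flags=0, param=None):
        if event == self.ev_down:
            # Left button pressed: start a new selection.
            self.start = self.current = (x, y)
            self.dragging = True
            self.box = None
        elif event == self.ev_move and self.dragging:
            # Mouse dragged: update the moving corner.
            self.current = (x, y)
        elif event == self.ev_up and self.dragging:
            # Left button released: freeze the box.
            self.current = (x, y)
            self.dragging = False
            self.box = self._normalised()

    def _normalised(self):
        # Order the corners so the box is valid whichever way the user dragged.
        (x0, y0), (x1, y1) = self.start, self.current
        return (min(x0, x1), min(y0, y1), abs(x1 - x0), abs(y1 - y0))
```

With OpenCV, one would typically build it as BoxSelector(cv2.EVENT_LBUTTONDOWN, cv2.EVENT_MOUSEMOVE, cv2.EVENT_LBUTTONUP) and register it on an existing window with cv2.setMouseCallback(window_name, selector.on_mouse). Once selector.box is set, the pixels inside it can be used to pick the color range.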

The detection algorithm is as follows.

1. Take the video feed from your camera

2. Create a mouse callback function based on mouse click events like left mouse button down, mouse drag and left mouse button up.

3. Store these coordinates in variables and draw a rectangular box around the object you want to track.

4. Extract the pixels inside this box and define the corresponding HSV range so that the target is masked.

5. Apply a shape-detection algorithm to the selected object. I used a circular target in my project, so I applied the circular Hough transform.

6. Throughout the frame capture process, keep track of only the pixels that match the selected color.

7. Using contours, calculate the centroid of the target. This centroid will be the main focus point in the entire project.

8. Calculate the distance from the current centroid (in pixels) to the centre of the frame. Do it separately for rows and columns. Store this error distance (in pixels) in 2 separate variables.
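
Steps 7 and 8 can be sketched in plain Python as below. This is only an illustration: the function names mask_centroid and pixel_error are hypothetical, the mask is assumed to be a 2D list where non-zero means "target pixel", and the frame centre is taken as (width / 2, height / 2). With OpenCV, the same moment sums are usually obtained from cv2.moments on the target contour.

```python
def mask_centroid(mask):
    """Centroid (cx, cy) of the non-zero pixels in a 2D mask, or None if empty."""
    m00 = m10 = m01 = 0  # pixel count, sum of column indices, sum of row indices
    for row_idx, row in enumerate(mask):
        for col_idx, value in enumerate(row):
            if value:
                m00 += 1
                m10 += col_idx
                m01 += row_idx
    if m00 == 0:
        return None  # target not visible in this frame
    return (m10 / m00, m01 / m00)


def pixel_error(centroid, frame_width, frame_height):
    """Signed offset of the centroid from the frame centre, as (columns, rows)."""
    cx, cy = centroid
    return (cx - frame_width / 2.0, cy - frame_height / 2.0)


# Tiny 4 x 6 mask with a 2 x 2 blob of target pixels.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
centroid = mask_centroid(mask)
error_x, error_y = pixel_error(centroid, frame_width=6, frame_height=4)
```

A positive error_x here means the target sits to the right of the frame centre, and a positive error_y means it sits below the centre, since image rows grow downwards.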

Step 3: Object Tracking Using Camera Mounted on Gimbal

Based on the error distance (in pixels) obtained in the previous step, we command the gimbal to move about 2 axes: pitch and yaw. The error is in pixels, however, so it must first be converted to an angle in radians. The following algorithm is used for the tracking part.

1. Convert the pixel error to an angle in radians along both the x and y directions of the image. The horizontal (x) error gives your yaw value and the vertical (y) error gives your pitch value.

2. Since this value still contains error, we pass it to a PID controller to stabilize the command given to the gimbal, improve the response and reduce the error. Tune the Kp, Ki and Kd values to suit your setup.

3. When you start execution, the gimbal moves with the target, trying to keep it at the centre of the frame, and in the video feed you will see that the target remains at the centre.
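
One common way to do the pixel-to-angle conversion in step 1 is the pinhole camera model, sketched below. This is an assumption rather than the method used in the original project: the function names are hypothetical, the field of view must be measured or looked up for your own camera, and a wide-angle action camera has lens distortion that this simple model ignores.

```python
import math


def focal_length_px(frame_size_px, fov_rad):
    """Focal length in pixels for a pinhole camera with the given field of view."""
    return (frame_size_px / 2.0) / math.tan(fov_rad / 2.0)


def pixel_error_to_angle(error_px, frame_size_px, fov_rad):
    """Angle in radians subtended by a pixel offset from the frame centre."""
    return math.atan2(error_px, focal_length_px(frame_size_px, fov_rad))


# Example with assumed numbers: 640 x 480 frame, 60 deg horizontal FOV,
# 45 deg vertical FOV (placeholders, not the real camera's values).
yaw_error = pixel_error_to_angle(80.0, 640, math.radians(60.0))
pitch_error = pixel_error_to_angle(-30.0, 480, math.radians(45.0))
```

The sign of each angle follows the sign of the pixel error, so whether a positive value means "turn right" or "tilt down" depends on how your gimbal defines its axes.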

If you are tracking a fast-moving target, you need to adjust your gimbal's motor speed and retune your PID gains.
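
For reference, a textbook discrete PID controller looks roughly like the sketch below. It is not the controller from the original project: the class name, the optional output_limit clamp and the example gains are all assumptions, and real gains have to be tuned on your own gimbal.

```python
class PID:
    """Basic discrete PID controller (illustrative sketch, gains are placeholders)."""

    def __init__(self, kp, ki, kd, output_limit=None):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.output_limit = output_limit  # optional symmetric clamp on the output
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        """Return the control output for the current error and time step dt (seconds)."""
        if dt <= 0:
            raise ValueError("dt must be positive")
        self.integral += error * dt
        # No derivative on the first call, because there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        if self.output_limit is not None:
            output = max(-self.output_limit, min(self.output_limit, output))
        return output


# One controller per axis, fed with the angular errors in radians.
yaw_pid = PID(kp=2.0, ki=0.5, kd=0.1)
first = yaw_pid.update(1.0, dt=0.1)
second = yaw_pid.update(0.5, dt=0.1)
```

Using one PID instance for yaw and another for pitch keeps the two axes independent, so each can be tuned separately.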

Step 4: Following

Once the gimbal tracks the target successfully, we want the drone to stay close to the target. This involves a bit of patience and mathematics! Based on the pitch and yaw values obtained as the output of the PID controller, we use 3D trigonometry to find the position setpoints of the target and send these setpoints to the autopilot, which then commands the drone to follow the target.

You can assume that the height of the target is 0 and calculate only its x and y coordinates from the current pitch and yaw at which the gimbal faces the target.
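
The geometry can be sketched as follows, under assumptions that are not stated in the original: the ground is flat, the pitch is measured downward from the horizontal, the yaw is measured from the x axis of a local ground frame (so it already includes the drone's own heading), and the drone's position and altitude come from the autopilot. The function name is hypothetical.

```python
import math


def target_ground_position(drone_x, drone_y, altitude, pitch_down_rad, yaw_rad):
    """Estimate the (x, y) of a ground-level target from the camera's line of sight."""
    if altitude <= 0:
        raise ValueError("altitude must be positive")
    if not 0.0 < pitch_down_rad <= math.pi / 2:
        raise ValueError("camera must look below the horizon")
    # Horizontal distance from the drone to the point where the line of sight
    # meets the ground (target height assumed to be 0).
    ground_range = altitude / math.tan(pitch_down_rad)
    target_x = drone_x + ground_range * math.cos(yaw_rad)
    target_y = drone_y + ground_range * math.sin(yaw_rad)
    return (target_x, target_y)


# Drone at the origin, 10 m up, camera looking 45 deg down along the x axis.
tx, ty = target_ground_position(0.0, 0.0, 10.0, math.radians(45.0), 0.0)
```

The estimate gets very sensitive to pitch error as the camera approaches the horizon, because the ground range grows quickly for shallow angles. In practice the resulting setpoint would usually be offset so the drone keeps some distance from the target instead of flying directly over it.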

Step 5: Conclusion

Visual servoing has numerous applications, from entertainment to disaster management operations such as search and rescue. It can also play a major role in traffic management or in tracking a stolen vehicle without the thief noticing.

My application follows a target selected by the user. It can be used in entertainment or in search and rescue, but it mainly focuses on tracking stolen vehicles, which could be useful to police departments: the operator specifies the vehicle to be tracked among numerous other vehicles, and OCR (optical character recognition) is added to the application.

I hope you all build many more such useful applications and solve many real-life problems!


Runner Up in the Drones Contest 2016