Computer vision is, without a doubt, a fantastic thing! With it, a computer gains the ability to "see" and to sense its surroundings better, which allows the development of complex, useful and cool applications. Applications such as face detection and recognition, object tracking and object detection are more and more present in our day-to-day activities, thanks to advances in computer vision.
Considering how advanced and accessible computer vision frameworks and tools have become, the application described in this article fits well: using a simple Raspberry Pi and a free, open-source computer vision framework called OpenCV to count moving objects, more precisely how many objects go in and out of a certain monitored zone.
Step 1: Getting Deeper: How Is Object Movement Detected in an Image Stream?
Now it's time to get deeper into the image processing stuff: how to take frames from a webcam stream and detect that something has moved in them.
The process consists of five steps:
Step 1: Highlighting the moving object
As defined in classical physics, a reference is necessary to infer whether something is moving or standing still. Here, to determine whether something has moved, it's pretty much the same: every single frame captured from the webcam stream is compared to a reference frame. If something is different, something has moved. It's as simple as it sounds.
This reference frame must be captured under the best possible conditions (nothing moving, for example). In the image processing world, this comparison between a captured frame and a reference frame is a technique called background subtraction. Background subtraction consists of literally subtracting, pixel by pixel, the color information of the reference frame from that of the captured frame. The image resulting from this process highlights only what is different between the two frames (in other words, what has moved), and everything else is black (the color of a zero-valued gray-scale pixel).
Important: lighting conditions and the quality of the captured webcam image (due to the quality of the capture sensor) can vary slightly from frame to frame. This means that the "equal parts" of the reference frame and the other frames won't be totally black after background subtraction. Despite this behavior, there are no serious consequences for the next image processing steps in this project.
In order to minimize image processing time, the captured frame and the reference frame are converted to gray-scale images before background subtraction. But... why? It's a matter of computing efficiency: a color image has three pieces of information per pixel, the Red, Green and Blue color components (the old but gold RGB standard). So, mathematically, each pixel can be defined as a three-value array, with each value representing one color component. Therefore, extending this to the whole image, the final image is actually the mix of three component images: the Red, Green and Blue ones.
Processing all of that requires a lot of work! In gray-scale images, however, each pixel carries only one piece of color information. A color image therefore holds three times as much data, and processing it takes roughly three times longer than the gray-scale case (or even more, depending on the technique involved). And there's more: for some purposes (like this project), processing all the colors isn't necessary or important at all. So we come to the conclusion: using gray-scale images is highly recommended for image processing purposes.
After background subtraction, it's necessary to apply a Gaussian Blur filter. Applied to the background-subtracted image, the Gaussian Blur filter smooths all the contours of the detected moving object. This will be helpful in the next image processing steps.
Step 2: Binarization
In most image processing cases, binarization is an almost mandatory step after highlighting objects or characteristics in an image. The reason: in a binary image, each pixel can assume only two values, 0x00 (black) or 0xFF (white). This helps a lot, because the image processing techniques applied in the next steps require even less "computing power". Binarization can be done by comparing each pixel of the gray-scale image to a certain threshold. If the pixel value is greater than the threshold, the pixel becomes white (0xFF); if the pixel value is lower than the threshold, the pixel becomes black (0x00). Unfortunately, choosing the threshold value isn't so easy. It depends on environmental factors, such as lighting conditions. A wrong threshold value can ruin all the following steps. So, I strongly recommend that you manually adjust the threshold in the project for your case before going any further. The threshold value must ensure that the moving object shows up in the binary image. In my case, an adequate threshold choice results in what you see in figure 5.
Figure 5 - binary image
Step 3: Dilate
So far, it has been possible to detect moving objects, highlight them and apply binarization, which results in a pretty clear image of the moving object (that is, pretty clear for image processing purposes). The preparation for object counting is ALMOST done. The "ALMOST" here means that there are some fine adjustments to make before moving on. At this point, there's a real chance of "holes" being present in the objects (black masses of pixels inside the white highlighted object). These holes can come from anything, from particular lighting conditions to some part of the object's shape. Since holes can "produce" false objects inside real objects (depending on how big they are and where they're located), the presence of holes in an image can be catastrophic for object counting. A way to eliminate these holes is to use an image processing technique called Dilate. Use it and the holes go away.
Step 4: The Search for the Contours (and Their Centroids)
At this point, we have the highlighted objects, with no holes inside them, ready for what's next: the search for the contours (and their centroids). OpenCV has resources to detect contours automatically, but the detected contours must be wisely chosen (to pick only the real object or objects). So, the criterion for accepting a contour is the area of the object, measured in pixels². If a contour has an area greater than a limit (configured in software), it must be considered a real object to be counted. The choice of this area limit is very important, and a bad choice here means wrong counts. You must try some area limit values and check what fits your usage best. Don't worry, this limit isn't so hard to find and adjust. Once all the objects in the image are picked, the next step is to draw a rectangle around each one (the rectangle must contain the entire detected object). And the center of this rectangle is... the object's centroid! You may be thinking "What's the big deal with this centroid?", right? Here's your answer: no matter how big the object is or what shape it has, its movement is the same as the centroid's. In other words: this simple point called the centroid represents all the movement of the object. That makes the counting very simple now, doesn't it? See the image below (figure 6), where the object's centroid is represented as a black point.
Step 5: Centroid's Movement and Object Counting
The grand finale: compare the object's centroid coordinates to the coordinates of the entrance and exit lines and apply the counting algorithm described before. And there you have it: moving objects being counted!
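The counting rule itself is described earlier in the article and is not reproduced in this section, so the snippet below is only one possible reading of "comparing the centroid to the entrance and exit lines", not the author's algorithm. It assumes two horizontal lines at hypothetical Y coordinates and counts a centroid whenever it falls within a small tolerance of a line; all names and values are made up for illustration.

```python
ENTRANCE_LINE_Y = 100  # hypothetical Y coordinate of the entrance line
EXIT_LINE_Y = 140      # hypothetical Y coordinate of the exit line
TOLERANCE = 2          # hypothetical: how close (in pixels) counts as "on the line"


def touches_line(centroid_y, line_y, tolerance=TOLERANCE):
    """Return True if the centroid lies within `tolerance` pixels of the line."""
    return abs(centroid_y - line_y) <= tolerance


def count_crossings(centroids_per_frame):
    """Count entrance and exit events over a sequence of per-frame centroid lists."""
    entrances = 0
    exits = 0
    for centroids in centroids_per_frame:
        for _x, y in centroids:
            if touches_line(y, ENTRANCE_LINE_Y):
                entrances += 1
            if touches_line(y, EXIT_LINE_Y):
                exits += 1
    return entrances, exits


# One object moving downward across both lines over four frames.
frames = [[(80, 90)], [(80, 100)], [(80, 120)], [(80, 141)]]
print(count_crossings(frames))
```

A rule this simple has known weaknesses: an object that lingers near a line is counted once per frame, and a fast object can jump over the tolerance band without being counted at all, so a real implementation would need something more robust.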
As shown in the very beginning of this post, here is the project in action: