In my previous post, I showed how we could perform face detection on video streams by combining VASE with a computer vision library (dlib). That approach required training a detector for each class of object: faces, people, cars, etc. However, for some applications one does not need to detect every object in a scene. Rather, one is only interested in whether something changed relative to the expected, or average, scene.

This focus on changes works well for videos from static surveillance cameras. Because those cameras continuously record the same location, we have enough data to model the expected scene (the background) and detect when something changes in it (the foreground). This technique is called background subtraction and can be used to emphasize regions of interest in the scene. It is useful because, instead of searching every video frame for a specific object (object detection), a classifier only needs to tell whether there is, for example, a person or a car in the regions that changed (object identification).

There are many algorithms for background subtraction, and I won’t go over their details here. For a thorough review of the available techniques, see the book “Background Modeling and Foreground Detection for Video Surveillance: Traditional and Recent Approaches, Benchmarking and Evaluation”. The OpenCV library also contains fast implementations of many background subtraction algorithms.
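For reference, here is a minimal sketch of how two of those subtractors are created with OpenCV’s Python bindings (my actual pipeline runs through VASE, so this is purely illustrative, and the parameter values are just the library defaults spelled out):

```python
import cv2

# Two of the background subtractors that ship with OpenCV.
# MOG2 models each pixel as a mixture of Gaussians; KNN uses a
# k-nearest-neighbours estimate of the background.
mog2 = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)
knn = cv2.createBackgroundSubtractorKNN(history=500, dist2Threshold=400.0, detectShadows=True)
```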

Last month, Thomas and I met for a productive weekend of cooking, software architecture discussions and programming. Thomas had just finished setting up IP cameras in a private location, and we wondered what objects we could automatically detect from the recorded scenes. Were there people or animals walking around, tools being moved? This was the ideal application for background subtraction.

I implemented a solution using VASE and a background subtraction algorithm from OpenCV that models each background pixel’s RGB values as a mixture of Gaussian distributions. For each frame, the pixel values are compared against the background model, and those that deviate significantly are marked as foreground.
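The per-frame loop looks roughly like the sketch below, written against OpenCV’s Python API with a hypothetical input file name (in the real solution, the frames come through VASE):

```python
import cv2

cap = cv2.VideoCapture("camera_feed.mp4")  # hypothetical recording, for illustration only

# Mixture-of-Gaussians background model: each pixel is described by a few
# Gaussian components that are updated as new frames arrive.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Pixels that deviate significantly from the background model
    # come back as white (255) in the foreground mask.
    fg_mask = subtractor.apply(frame)

cap.release()
```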

The output from background subtraction is a mask with white pixels at the locations of foreground objects. This mask often contains false positives caused by lighting changes and reflections. Fortunately, those false positives were mostly isolated pixels and could be reduced by applying morphological transforms (opening and closing).
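In OpenCV terms, that clean-up step amounts to something like this (the kernel size is an arbitrary example, and `fg_mask` stands in for the mask produced by the subtractor above):

```python
import cv2
import numpy as np

# Placeholder standing in for the foreground mask returned by the subtractor.
fg_mask = np.zeros((480, 640), dtype=np.uint8)

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

# Opening (erosion followed by dilation) removes isolated foreground pixels;
# closing (dilation followed by erosion) fills small holes in the remaining blobs.
cleaned = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)
cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)
```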

Finally, the remaining pixels were grouped into connected components, and the bounding box of each component was drawn on screen around the putative object. You can see the results below.
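A sketch of that grouping step using OpenCV’s connected-components routine (the area threshold is an arbitrary example value, and the placeholders stand in for the cleaned mask and the current frame):

```python
import cv2
import numpy as np

# Placeholders standing in for the cleaned mask and the current video frame.
cleaned = np.zeros((480, 640), dtype=np.uint8)
frame = np.zeros((480, 640, 3), dtype=np.uint8)

# Group foreground pixels into connected components and get each component's stats.
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(cleaned)

for i in range(1, num_labels):  # label 0 is the background
    x, y, w, h, area = stats[i]
    if area < 50:  # skip tiny blobs; threshold chosen arbitrarily for illustration
        continue
    # Draw the bounding box around the putative object.
    cv2.rectangle(frame, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
```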

Note that the algorithm detected the closed door in the second frame (top-right figure; the door is in the center). After some time, though, immovable objects are again absorbed into the background.

While the results look satisfactory, much can still be improved. For example, reflections on the water or shadows on surfaces could be removed before deciding whether a pixel belongs to the foreground. Similarly, motion detection could be used to validate foreground pixels and avoid false positives. I’ll come back to those improvements in a later post.