Multimodal Mean Adaptive Background Modeling

The availability of low-cost, portable imagers and new embedded computing platforms makes video surveillance possible in new environments. However, the situations in which a portable, embedded video surveillance system is most useful (e.g., monitoring outdoor and/or busy scenes) also pose the greatest challenges. Real-world scenes are characterized by changing illumination and shadows, multimodal features (such as rippling waves and rustling leaves), and frequent, multilevel occlusions. To extract the foreground in these dynamic visual environments, adaptive multimodal background models, which maintain historical scene information to improve accuracy, are frequently used. These methods are problematic in real-time embedded environments, where limited computation and storage restrict the amount of historical data that can be processed and stored.


Results of MM Background Subtraction on 3 Video Sequences: Waving Trees and
Bootstrapping (from the Wallflower benchmark images) and an Outdoor Sequence.
See our ECVW07 paper for comparison with other backgrounding techniques.

We have developed a new adaptive technique, multimodal mean (MM), which balances accuracy, performance, and efficiency to meet embedded system requirements. This algorithm delivers accuracy comparable to that of the best alternative (Mixture of Gaussians), with a 6X improvement in execution time and an 18% reduction in required storage, on an eBox-2300 Thin Client VESA PC running Windows Embedded CE 6.0.
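This page does not spell out the algorithm's details, so the following is only a rough sketch of the general idea behind a multimodal, mean-based background model: each pixel keeps a few running means (modes), and an observation counts as background once it matches a mode that has been seen often enough. The class names, the grayscale simplification, and the parameters `max_cells`, `match_tol`, and `min_count` are all illustrative assumptions, not the published MM implementation.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One background mode: a running sum and the number of matched samples."""
    total: float
    count: int

    @property
    def mean(self) -> float:
        return self.total / self.count


class PixelModel:
    """Hypothetical per-pixel model holding a small set of running means."""

    def __init__(self, max_cells=4, match_tol=30, min_count=3):
        self.max_cells = max_cells    # assumed cap on modes kept per pixel
        self.match_tol = match_tol    # assumed intensity tolerance for a match
        self.min_count = min_count    # assumed support needed to be background
        self.cells = []

    def update(self, value):
        """Fold in one grayscale sample; return True if it looks like foreground."""
        for cell in self.cells:
            if abs(value - cell.mean) <= self.match_tol:
                cell.total += value
                cell.count += 1
                return cell.count < self.min_count
        # No mode matched: start a new one, evicting the least-supported if full.
        if len(self.cells) >= self.max_cells:
            weakest = min(range(len(self.cells)),
                          key=lambda i: self.cells[i].count)
            self.cells.pop(weakest)
        self.cells.append(Cell(total=float(value), count=1))
        return True

    def decay(self):
        """Halve each mode's support (keeping its mean) so old history fades."""
        for cell in self.cells:
            mean = cell.mean
            cell.count = max(1, cell.count // 2)
            cell.total = mean * cell.count


if __name__ == "__main__":
    # A pixel that alternates between two intensities, e.g. leaf and sky.
    model = PixelModel(max_cells=3, match_tol=10, min_count=3)
    for sample in [50, 200, 50, 200, 50, 200, 120]:
        print(sample, "foreground" if model.update(sample) else "background")
```

Storing only a sum and a count per mode uses integer-friendly arithmetic and a fixed, small amount of memory per pixel, which is the kind of trade-off that matters on an embedded platform.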

 
eBox-2300 Thin Client VESA PC
Publications: