Multimedia Laboratory

Introduction

This project is motivated by the observation that detection is more reliable when overlapping parts are designed at multiple layers and the visibility of each part is verified multiple times at different layers. The detection score of one part provides valuable contextual information for estimating the visibility of the parts that overlap it.

If the correlation among parts is modeled correctly, the detection score of the head-shoulder part can be used to recommend the left-head-shoulder part as visible, and that of the two-legs part can be used to recommend the left-leg part as invisible. The major challenges are therefore how to model the relationship among the visibilities of different parts and how to properly combine the results of the part detectors according to the estimated part visibility.

Contribution Highlights

A probabilistic framework for pedestrian detection that models the visibility of parts as hidden variables. Various heuristic occlusion-handling approaches (such as linear combination and hard thresholding) are shown to be special cases of this framework that do not fully exploit its power to model the correlations among different parts.
A discriminative deep model to learn the correlations of different parts, which is inspired by the great success of deep models in various applications of dimension reduction and recognition.
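
To make the "special cases" remark concrete, the following is a minimal sketch of how part detection scores might be combined under different visibility assumptions. It is not the model from the paper: the function names, the unit weights, the zero threshold, and the hand-supplied visibility values are all assumptions chosen for illustration, whereas the paper estimates visibility with a learned deep model.

```python
def linear_combination(scores, weights):
    """Treat every part as visible: a plain weighted sum of part scores."""
    return sum(w * s for w, s in zip(weights, scores))


def hard_threshold(scores, weights, threshold=0.0):
    """Binary visibility: a part contributes only if its score exceeds the threshold."""
    return sum(w * s for w, s in zip(weights, scores) if s > threshold)


def visibility_weighted(scores, weights, visibility):
    """Weight each part score by a visibility value in [0, 1].

    In this sketch the visibility values are supplied by the caller
    (an assumption); they stand in for the hidden variables of the framework.
    """
    return sum(w * v * s for w, s, v in zip(weights, scores, visibility))


# Hypothetical scores for three parts; the third part is occluded.
scores = [2.0, 1.0, -0.5]
weights = [1.0, 1.0, 1.0]

all_visible = visibility_weighted(scores, weights, [1.0, 1.0, 1.0])
binary = visibility_weighted(scores, weights, [1.0, 1.0, 0.0])
soft = visibility_weighted(scores, weights, [0.9, 0.8, 0.1])
```

With all visibilities fixed to 1 the result matches `linear_combination`, and with binary visibilities derived from the threshold it matches `hard_threshold`; intermediate values such as those in `soft` are what a learned visibility estimate could provide.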

Citation

If you use our code or dataset, please cite the following paper:

W. Ouyang and X. Wang. A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling. In CVPR, 2012.

Images

Framework Overview

The overview of the framework

The Parts Model:

s_i^l is the detection score of each part, and h_i^l is the visibility of the ith part in the lth layer. For example, h_1^1 indicates the visibility of the left-head-shoulder part.

The BP network:

The BP network for fine tuning and estimating visibility.

Experimental results on ETHZ:

Experimental results on ETHZ for HOG-SVM, LatSVM-V2, and our approach

Experimental Comparisons:

Experimental comparisons of different part-based models ((a)-(b)) and different schemes of integrating part detection scores ((c)-(f)) on our dataset for pedestrians without occlusions (upper row) and with occlusions (bottom row).

Experimental results on Caltech:

Experimental results on Caltech for pedestrians under no occlusions (left), partial occlusions (center) and heavy occlusions (right). The ratio of visible area is larger than 0.65 for partial occlusions and lies in [0.2, 0.65] for heavy occlusions. The log-average miss rate of our model is 61% for no occlusions and 80% for partial occlusions.
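
For readers unfamiliar with the metric, the log-average miss rate used in the Caltech benchmark is the geometric mean of the miss rate sampled at nine false-positives-per-image (FPPI) values evenly spaced in log-space between 0.01 and 1. The sketch below is a simplified illustration only: the function name, the step-wise sampling rule, and the fallback miss rate of 1.0 are assumptions, and the official evaluation code may sample the curve differently.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of miss rates at FPPI points log-spaced over [1e-2, 1e0].

    `fppi` must be sorted in increasing order and `miss_rate` holds the
    matching miss-rate values of the same curve.
    """
    refs = [10 ** (-2 + 2 * k / (num_points - 1)) for k in range(num_points)]
    samples = []
    for ref in refs:
        # Use the miss rate at the largest FPPI not exceeding the reference.
        reached = [m for f, m in zip(fppi, miss_rate) if f <= ref]
        # Assumption: if the curve never reaches this FPPI, count a miss rate of 1.
        samples.append(reached[-1] if reached else 1.0)
    # Clamp to avoid log(0), then average in the log domain.
    logs = [math.log(max(s, 1e-10)) for s in samples]
    return math.exp(sum(logs) / len(logs))


# Hypothetical curve: miss rate falls from 0.8 to 0.2 as FPPI grows.
example = log_average_miss_rate([0.001, 1.0], [0.8, 0.2])
```

Averaging in the log domain keeps the summary from being dominated by the high miss rates at very low FPPI, which is why it is reported instead of a single operating point.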

Experimental results on Daimler:

Experimental results on Daimler occlusion dataset for HOG-SVM, LatSVM-V2, and our approach.