Bolei Zhou1, Xiaogang Wang2,3, and Xiaoou Tang1,3
1Department of Information Engineering, 2Department of Electronic Engineering, The Chinese University of Hong Kong
3Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
[Paper] [Supplementary Materials] [Code&Tracklets] [Poster] [Video Dataset]
In this paper, a Random Field Topic (RFT) model is proposed for semantic region analysis from the motions of objects in crowded scenes. Unlike existing approaches, which learn semantic regions either from optical flow or from complete trajectories, our model assumes that fragments of trajectories (called tracklets) are observed in crowded scenes. It advances the existing Latent Dirichlet Allocation (LDA) topic model by integrating Markov random fields (MRF) as a prior to enforce spatial and temporal coherence between tracklets during the learning process. Two kinds of MRF, a pairwise MRF and a forest of randomly spanning trees, are defined. Another contribution of this model is to include sources and sinks as a high-level semantic prior, which effectively improves the learning of semantic regions and the clustering of tracklets. Experiments on a large-scale data set, which includes 40,000+ tracklets collected from the crowded New York Grand Central station, show that our model outperforms state-of-the-art methods both on qualitative results of learning semantic regions and on quantitative results of clustering tracklets.
We would like to identify semantic regions of crowded scenes from tracklets.
Semantic Regions: pathways commonly taken by pedestrians in the scene. Semantic region analysis has many applications, such as tracking, anomaly detection, and activity analysis.
Tracklets: fragments of trajectories produced by a weak keypoint tracker, such as KLT. The fragmentation is caused by scene clutter and crowd occlusion.
A) The semantic regions learned from the tracklets. B) The plot of the tracklets. They are very fragmented and incomplete.
The challenge is how to learn semantic regions from such noisy tracklets. Here is a video of the trajectories extracted from the crowded scene.
2. Two Key Components in Modeling
There are two key components in our framework: 1) Modeling the high-level correlation between tracklets. 2) Modeling the source and sink of the scene.
- Modeling correlations between tracklets:
1. Pairwise Markov Random Field: to capture the dependencies between two tracklets
2. Spanning tree of MRF: to capture high-level dependencies among several tracklets
- Modeling source and sink of the scene:
Sources and sinks are the entry and exit locations of the scene, that is, the positions where pedestrians start and end their paths.
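The two components above can be illustrated with a minimal sketch. It is not the released code: the `Tracklet` class, the thresholds `max_dist` and `max_gap`, and the region centres are all hypothetical, and the neighbor rule (tracklet j starts close to where, and shortly after, tracklet i ends) is only one plausible way to define pairwise MRF edges. The same nearest-region test can be used to tag a tracklet's start point with a source and its end point with a sink.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, t): image position and frame index


@dataclass
class Tracklet:
    """Hypothetical container for one tracklet; points are ordered by time."""
    points: List[Point]

    @property
    def start(self) -> Point:
        return self.points[0]

    @property
    def end(self) -> Point:
        return self.points[-1]


def pairwise_neighbors(
    tracklets: Sequence[Tracklet], max_dist: float, max_gap: float
) -> List[Tuple[int, int]]:
    """Return directed edges (i, j) where tracklet j starts near the end of
    tracklet i in space (<= max_dist) and shortly after it in time (<= max_gap).
    The thresholds are assumptions of this sketch."""
    edges = []
    for i, a in enumerate(tracklets):
        ax, ay, at = a.end
        for j, b in enumerate(tracklets):
            if i == j:
                continue
            bx, by, bt = b.start
            if 0 < bt - at <= max_gap and hypot(bx - ax, by - ay) <= max_dist:
                edges.append((i, j))
    return edges


def nearest_region(
    point: Point, centres: Sequence[Tuple[float, float]], radius: float
) -> Optional[int]:
    """Index of the closest source/sink centre within `radius`, else None."""
    x, y, _ = point
    best, best_dist = None, radius
    for k, (cx, cy) in enumerate(centres):
        dist = hypot(x - cx, y - cy)
        if dist <= best_dist:
            best, best_dist = k, dist
    return best
```

For example, `nearest_region(tracklet.start, source_centres, radius)` would give a candidate source label and `nearest_region(tracklet.end, sink_centres, radius)` a candidate sink label; a tracklet that begins or ends mid-scene simply gets `None`.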
3. Graphical Model
Here is the graphical model of our Random Field Topic model. We integrate the LDA topic model with MRF to capture the dependencies between tracklets. Inference is performed by Gibbs sampling.
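To make the idea concrete, here is a simplified sketch of collapsed Gibbs sampling for LDA with an added neighbor term. It is an illustration under stated assumptions, not the sampler from the paper: each tracklet is treated as a document of integer word indices, `neighbors` maps a tracklet to its MRF neighbors, and the MRF prior is approximated by a factor `exp(lam * agree)`, where `agree` is the fraction of neighbor words currently assigned to the candidate topic. The function name and all parameters are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def gibbs_lda_mrf(
    docs: Sequence[Sequence[int]],
    neighbors: Dict[int, Sequence[int]],
    num_topics: int,
    vocab_size: int,
    alpha: float = 0.5,
    beta: float = 0.01,
    lam: float = 1.0,
    sweeps: int = 50,
    seed: int = 0,
) -> Tuple[List[List[int]], List[List[int]], List[List[int]]]:
    """Return (z, n_dk, n_kw): topic labels per word, topic counts per
    document (tracklet), and word counts per topic."""
    rng = random.Random(seed)
    topics = range(num_topics)

    # Random initialisation of topic labels and the count tables.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [[0] * vocab_size for _ in topics]
    n_k = [0] * num_topics
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = z[d][n]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1

                weights = []
                for t in topics:
                    # Standard collapsed LDA conditional for topic t.
                    lda = (
                        (n_dk[d][t] + alpha)
                        * (n_kw[t][w] + beta)
                        / (n_k[t] + vocab_size * beta)
                    )
                    # Assumed MRF factor: reward topics used by neighbors.
                    agree = sum(
                        n_dk[j][t] / len(docs[j])
                        for j in neighbors.get(d, ())
                        if docs[j]
                    )
                    weights.append(lda * math.exp(lam * agree))

                # Sample a new topic and restore the counts.
                k = rng.choices(topics, weights=weights)[0]
                z[d][n] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    return z, n_dk, n_kw
```

Setting `lam=0` recovers plain collapsed Gibbs sampling for LDA, which makes it easy to compare the effect of the neighbor term on a toy data set.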
4. Experimental Results & Applications
Here are the statistics of the tracklets extracted from the crowded scene. These tracklets are highly fragmented and incomplete.
The semantic regions learned from the tracklets are visualized as follows.
Representative semantic regions learned by (A) our model (semantic region indices are randomly assigned by the learning process), (B) OptHDP, and (C) TrajHDP. The semantic regions learned by our model are more compact.
We further apply the learned semantic region model for trajectory clustering.
Clustering results by (A) our model, (B) OptHDP, and (C) TrajHDP.
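One simple way to turn a learned topic model into clusters is to label each tracklet with its dominant topic. The sketch below assumes a hypothetical input `topic_counts`, where `topic_counts[d][k]` is the number of observations on tracklet `d` assigned to semantic region `k`; the exact assignment rule used in the paper may differ.

```python
from typing import List, Sequence


def cluster_by_dominant_topic(topic_counts: Sequence[Sequence[int]]) -> List[int]:
    """Assign each tracklet to the topic with the highest count.
    Ties are broken in favour of the lowest topic index."""
    labels = []
    for counts in topic_counts:
        labels.append(max(range(len(counts)), key=lambda k: counts[k]))
    return labels
```

Tracklets that share a label then form one cluster, so the number of clusters is at most the number of semantic regions.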
Please cite our paper if you use the code or data.
Bolei Zhou, Xiaogang Wang, and Xiaoou Tang. "Random Field Topic Model for Semantic Region Analysis in Crowded Scenes from Tracklets." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011).
update: Nov 13, 2012