Multimedia Laboratory

Action recognition is an important topic in computer vision and video analysis. Its main goal is to determine what people are doing in an observed video. The task is easy for human perception, but designing robust algorithms for it is very challenging. The main difficulties come from large intra-class variations caused by scale and viewpoint changes, camera motion, background clutter, low video quality, and so on. Action recognition has wide applications in video surveillance, human-computer interfaces, sports video analysis, and content-based video retrieval.

Our research focuses mainly on recognizing human actions in realistic videos such as YouTube videos and movies. Currently, we aim to design effective visual representations of video data and to model the temporal structure of complex actions. We evaluate our algorithms on several large-scale public datasets, including HMDB51, UCF50, and the Olympic Sports Dataset, and obtain high performance on these challenging datasets.

Motionlets: Mid-Level 3D Parts for Human Motion Recognition
L. Wang, Y. Qiao, and X. Tang, In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (Poster, CVPR 2013).

This paper proposes the motionlet, a mid-level spatiotemporal part, for human motion recognition. A motionlet can be seen as a tight cluster in motion and appearance space that corresponds to the movement of a particular body part. We postulate three key properties of motionlets for action recognition: high motion saliency, multiple-scale representation, and representative-discriminative ability. To obtain parts with these properties, we develop a data-driven approach that learns motionlets from training videos. First, we extract 3D regions with high motion saliency. Then we cluster these regions and keep the cluster centers as candidate motionlet templates. Finally, we examine the representative and discriminative power of the candidates and introduce a greedy method to select the effective ones. With motionlets, we present a mid-level representation for video, called the motionlet activation vector. We conduct experiments on three datasets: KTH, HMDB51, and UCF50. The results show that the proposed method significantly outperforms state-of-the-art methods.
PDF
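The pipeline described in the abstract (cluster salient regions, greedily select candidates, then encode a video by template responses) can be illustrated with a small sketch. This is not the paper's implementation. It assumes each salient 3D region has already been reduced to a fixed-length descriptor, uses plain k-means for clustering, a Gaussian similarity as the template response, max-pooling for the activation vector, and a simple class-margin score with a coverage penalty as a stand-in for the representative-discriminative criterion. All function names and scoring choices here are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means; the returned centers play the role of candidate templates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def activation_vector(regions, templates):
    """Max-pooled Gaussian similarity of one video's regions to each template.

    The similarity function and the pooling are assumptions for illustration.
    """
    dist = ((regions[:, None, :] - templates[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-dist).max(axis=0)

def greedy_select(acts, labels, n_select):
    """Greedily pick candidates from activations (n_videos x n_candidates).

    Hypothetical criterion: a candidate is discriminative if its mean
    activation for its best class clearly exceeds that of the runner-up
    class; classes that are already covered are penalized so the selected
    set stays representative of all classes. Needs at least two classes.
    """
    classes = np.unique(labels)
    class_means = np.stack([acts[labels == c].mean(axis=0) for c in classes])
    ordered = np.sort(class_means, axis=0)
    margin = ordered[-1] - ordered[-2]
    best_class = class_means.argmax(axis=0)
    covered = np.zeros(len(classes))
    remaining = list(range(acts.shape[1]))
    selected = []
    while remaining and len(selected) < n_select:
        gains = [margin[j] / (1.0 + covered[best_class[j]]) for j in remaining]
        pick = remaining.pop(int(np.argmax(gains)))
        selected.append(pick)
        covered[best_class[pick]] += 1
    return selected
```

In use, one would stack the region descriptors of all training videos, call kmeans to obtain candidates, compute activation_vector per training video, pass the stacked activations and class labels to greedy_select, and finally encode any video with activation_vector against the selected templates. The resulting vector can then be fed to a standard classifier.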

Highlights