Human perceptions of music and images are highly correlated: both can evoke sensations such as emotion and power. This project aims to jointly model the relationship between music and images in order to bridge the gap between these two kinds of semantic understanding. Many psychology and cognition studies indicate that the brain's processing of visual and auditory information is related. For example, music can stimulate visual imagery, and visual imagery has been found to be an important mechanism by which music evokes emotion. Meyer once observed that "it seems probable that image processes play a role of great importance in the musical affective experiences of many listeners". We share this view.
* Well-matched music-image pairs are hard to find. We therefore collected 47,888 music-image pairs from more than 1,500 music videos. For half of these pairs, labelers were asked to compare their matching degree. The labelers largely agreed with one another, which indicates that humans share a consensus on how music and images match in music videos.
* Modeling this relationship poses two difficulties. First, images and music have different feature representations. Second, both the image space and the music space exhibit complex structure, and the relationship between them is nonlinear. We develop Multiple Ranking Canonical Correlation Analysis (MR-CCA) to address these problems. MR-CCA clusters music-image pairs according to their music side and uses Ranking CCA to model the local relationship within each cluster.
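The agreement claim above can be quantified in several ways; the exact measure used for this dataset is not stated here. As one illustrative possibility, the sketch below computes raw pairwise agreement over comparison judgments. The data layout (labeler id mapped to comparison id mapped to a choice such as "A" or "B") and the function name `pairwise_agreement` are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(labels):
    """Raw agreement rate between labelers on pairwise comparisons.

    labels: hypothetical layout, dict of labeler id -> dict of
    comparison id -> choice (e.g. "A" if pair A was judged the better match).
    Only comparisons answered by both labelers in a pair are counted.
    """
    agree = total = 0
    for a, b in combinations(labels, 2):
        for cid in labels[a].keys() & labels[b].keys():
            total += 1
            agree += labels[a][cid] == labels[b][cid]  # bool counts as 0 or 1
    return agree / total if total else float("nan")


labels = {
    "labeler1": {"c1": "A", "c2": "B", "c3": "A"},
    "labeler2": {"c1": "A", "c2": "B", "c3": "B"},
    "labeler3": {"c1": "A", "c2": "A"},
}
print(round(pairwise_agreement(labels), 3))  # prints 0.571
```

Raw agreement does not correct for chance; a chance-corrected statistic such as Cohen's kappa would be a stricter alternative.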
Cross Matching of Music and Image
X. Wu, Y. Qiao, X. Wang and X. Tang, Proceedings of the 20th ACM international conference on Multimedia, 2012.
This paper investigates how to model the relationship between music and images using music-image pairs extracted from music videos. We make two basic observations about this relationship: 1) the music space exhibits a simpler cluster structure than the image space, and 2) the relationship between the two spaces is complex and nonlinear. Based on these observations, we develop Multiple Ranking Canonical Correlation Analysis (MR-CCA) to learn this relationship. MR-CCA clusters the music-image pairs according to their music parts and then conducts Ranking CCA (R-CCA) for each cluster. MR-CCA has potential applications in video generation, background music recommendation, and joint retrieval of music and images.
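The cluster-then-local-model structure described above can be sketched as follows. This is not the authors' MR-CCA: the ranking objective of R-CCA is not specified here, so plain regularized linear CCA is substituted for it in each cluster. The k-means clustering with farthest-point initialization, the cosine-similarity score, and all names (`MultiClusterCCA`, `fit_cca`, `kmeans`) are illustrative assumptions.

```python
import numpy as np


def inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(x, y, dim, reg=1e-3):
    """Plain linear CCA (stand-in for R-CCA): returns means, projections, correlations."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mx, y - my
    n = len(x)
    # Regularized covariances keep the whitening step well conditioned.
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular vectors of the whitened cross-covariance give the canonical directions.
    u, s, vt = np.linalg.svd(wx @ cxy @ wy)
    return mx, my, wx @ u[:, :dim], wy @ vt.T[:, :dim], s[:dim]


def assign(x, centers):
    """Index of the nearest center for each row of x."""
    return np.argmin(((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)


def kmeans(x, k, iters=50, seed=0):
    """Lloyd's k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = assign(x, centers)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = x[labels == j].mean(axis=0)
    return centers, assign(x, centers)


class MultiClusterCCA:
    """Cluster pairs by their music features, then fit one CCA per cluster."""

    def __init__(self, n_clusters, dim):
        self.n_clusters, self.dim = n_clusters, dim

    def fit(self, music, image):
        # Clustering uses the music side only, following the observation above.
        self.centers, self.labels_ = kmeans(music, self.n_clusters)
        self.models = [
            fit_cca(music[self.labels_ == j], image[self.labels_ == j], self.dim)
            for j in range(self.n_clusters)
        ]
        return self

    def score(self, m, v):
        """Cosine similarity of a music/image pair in its music cluster's shared space."""
        j = assign(m[None, :], self.centers)[0]
        mx, my, a, b, _ = self.models[j]
        pm, pv = (m - mx) @ a, (v - my) @ b
        return float(pm @ pv / (np.linalg.norm(pm) * np.linalg.norm(pv) + 1e-12))
```

Each cluster needs enough pairs for a stable covariance estimate; the `reg` term only guards against mild ill-conditioning. A pair is scored with the local model selected by its music feature, which is how the piecewise-linear models together approximate a nonlinear cross-modal relationship.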
Automatic Music Video Generation: Cross Matching of Music and Image
X. Wu, X. Bing, Y. Qiao, and X. Tang, Proceedings of the 20th ACM international conference on Multimedia, 2012.
In this paper, we present a system that automatically generates a music video for a given song. The challenge for such a system is how to select relevant images and align them with the song. We address this challenge by leveraging the lyrics (if available) and the semantic similarity between music and images. We retrieve related images from the Internet using lyric keywords as queries, and use a learning-based method to estimate a semantic score between an image and a music segment. Finally, we construct the music video after quality filtering and refinement. The system also allows users to upload their own images and re-pick recommended images to personalize the music video.
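The selection step described above (semantic scores between segments and candidate images, followed by quality filtering) might be sketched as a simple greedy assignment. This is a hypothetical illustration, not the system's actual procedure: the score matrix layout, the per-image quality values, the `min_quality` threshold, and the function name `pick_images` are all assumptions.

```python
def pick_images(scores, quality, min_quality=0.5):
    """Greedily assign one image per music segment, in song order.

    scores[s][i]: hypothetical semantic score between segment s and image i.
    quality[i]: hypothetical quality score for image i; images below
    min_quality are filtered out. Each image is used at most once, and a
    segment gets None when no usable image remains.
    """
    usable = {i for i, q in enumerate(quality) if q >= min_quality}
    picks = []
    for seg_scores in scores:
        # sorted() makes ties resolve to the lowest image index.
        best = max(sorted(usable), key=lambda i: seg_scores[i], default=None)
        if best is not None:
            usable.remove(best)
        picks.append(best)
    return picks


scores = [[0.9, 0.2, 0.8], [0.7, 0.1, 0.6], [0.3, 0.4, 0.2]]
quality = [1.0, 0.3, 0.9]
print(pick_images(scores, quality))  # prints [0, 2, None]
```

Greedy selection favors earlier segments; a globally optimal one-to-one assignment could instead be computed with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment` on the negated score matrix.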