CUHK-Search-Reranking (CUHKSR) Dataset
1. Overview
The CUHK-Search-Reranking (CUHKSR) dataset is intended for research on image re-ranking.
Data set | Images for re-ranking: # keywords | Images for re-ranking: collecting date | Images for re-ranking: search engine | Images of reference classes: collecting date | Images of reference classes: search engine
I | 120 | Jul-10 | Bing | Jul-10 | Bing
II | 120 | Jul-10 | Bing | Jul-10 | Google
III | 10 | Aug-09 | Bing | Jul-10 | Bing
Note:
1) The images for re-ranking are the same in data sets I and II.
2) The images of reference classes in data set III are the same as those in data set I.
2. Downloads
The dataset can be downloaded from the following FTP server:
Url: 137.189.35.203
Port: 21
Username: CUHKSRData
Password: fc4lmge
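As a minimal sketch, the files could be fetched with Python's standard ftplib module using the connection details above. The function name download_file is hypothetical, and the code assumes that the zip files sit directly in the login directory under the file names listed in the Data Description section; neither point is confirmed by this page.

```python
from ftplib import FTP

# Connection details as listed above.
FTP_HOST = "137.189.35.203"
FTP_PORT = 21
FTP_USER = "CUHKSRData"
FTP_PASSWORD = "fc4lmge"


def download_file(remote_name, local_path, timeout=60):
    """Download one file from the FTP server to local_path in binary mode.

    Assumption: remote_name is a path relative to the login directory.
    """
    ftp = FTP()
    ftp.connect(FTP_HOST, FTP_PORT, timeout=timeout)
    try:
        ftp.login(FTP_USER, FTP_PASSWORD)
        with open(local_path, "wb") as out:
            # retrbinary streams the file in blocks and passes each to out.write
            ftp.retrbinary("RETR " + remote_name, out.write)
    finally:
        ftp.close()
```

For example, download_file("BingReRanking(set III).zip", "BingReRanking_set_III.zip") would attempt to save the smallest image archive to the current directory.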
3. Reference
Please cite as:
X. Wang, K. Liu and X. Tang, "Query-Specific Visual Semantic Spaces for Web Image Re-ranking", in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
4. Data Description
Note: Each zip file contains 120 folders, corresponding to 120 query keywords.
Data | File name | File size | Description
Images for re-ranking in data sets I and II | BingReRanking(set I and II).zip | ~900 MB | Each query's folder contains two folders: Data and Images. The Images folder contains the ~1000 images for that query (resized to at most 160×160 pixels). Image files are named XXXXimage.jpg, where XXXX is a 4-digit image ID (e.g., 0000image.jpg, 0001image.jpg). The Data folder contains two files, Metadata.txt and Labels.txt, whose formats are described below.
Webpages of images for re-ranking in data sets I and II | BingReRanking_Htmls(set I and II).zip | ~3.6 GB | The webpages are placed under the Htmls folder and named XXXXtext.html (e.g., 0001text.html).
Images for re-ranking in data set III | BingReRanking(set III).zip | ~180 MB | The organization of the data is the same as for "Images for re-ranking in data sets I and II".
Webpages of images for re-ranking in data set III | BingReRanking_Htmls(set III).zip | ~370 MB | The organization of the data is the same as for "Webpages of images for re-ranking in data sets I and II".
Metadata of images of reference classes in data set I | BingRef_Metadata(set I).zip | ~240 MB | Each query's folder contains ~30 txt files. Each txt file is named after a query keyword expansion, and its format is the same as that of the metadata file for the images for re-ranking.
Metadata of images of reference classes in data set II | GoogRef_Metadata(set II).zip | ~200 MB | The organization of the data is the same as for "Metadata of images of reference classes in data set I".
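The folder layout of the image archives can be traversed with a short script. The following is only a sketch: the function name list_query_images is hypothetical, and it assumes the archive has been unzipped so that the query folders sit directly under one root directory.

```python
import re
from pathlib import Path

# File names follow the XXXXimage.jpg pattern, where XXXX is a 4-digit ID.
IMAGE_NAME = re.compile(r"(\d{4})image\.jpg")


def list_query_images(root):
    """Map each query folder name to a sorted list of (image_id, path) pairs."""
    result = {}
    for query_dir in sorted(Path(root).iterdir()):
        images_dir = query_dir / "Images"
        if not images_dir.is_dir():
            continue  # skip anything that does not look like a query folder
        pairs = []
        for path in images_dir.iterdir():
            match = IMAGE_NAME.fullmatch(path.name)
            if match:
                pairs.append((match.group(1), path))
        result[query_dir.name] = sorted(pairs)
    return result
```

Files in an Images folder that do not match the naming pattern are ignored rather than reported, which keeps the sketch tolerant of stray files.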
a) Metadata.txt
The metadata of each image takes up three lines, followed by a blank line. The
three lines are: the image ID, the image URL, and the URL of the page containing the image.
The following is an example:
0000
http://www.usageorge.com/Wallpapers/Computer/wallpaper/Apple-Macintosh.jpg
http://www.usageorge.com/Wallpapers/Computer/Apple-Macintosh.html
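A minimal parser for this format might look as follows. The function name parse_metadata is hypothetical, and the sketch assumes that all three fields of every record are non-empty, so that a blank line only ever separates records.

```python
def parse_metadata(text):
    """Parse Metadata.txt content into a list of (id, image_url, page_url)."""
    records = []
    block = []
    # The extra "" at the end flushes a final record that lacks a blank line.
    for line in text.splitlines() + [""]:
        line = line.strip()
        if line:
            block.append(line)
        elif block:
            if len(block) != 3:
                raise ValueError(f"expected 3 lines per record, got {block!r}")
            records.append(tuple(block))
            block = []
    return records
```

The text encoding of the files is not stated here, so when reading them from disk it may be safer to open them with an explicit encoding and errors="replace".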
b) Labels.txt
Labels.txt contains the ground-truth labels of the images. It looks
like this:
0000
apple wallpaper
apple logo
0001
red apple
which means that image 0000 is categorized into "apple wallpaper" and "apple
logo" (an image may be categorized into multiple classes), while image 0001 is
categorized into "red apple". Note that the IDs in Labels.txt may not be in sorted
order.
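The label file can be read into a dictionary with a few lines of Python. This is a sketch under an assumption that the format description does not state explicitly: a line consisting of exactly four digits starts a new image record, and every other non-blank line is a class label for the most recent ID. The function name parse_labels is hypothetical.

```python
import re

# Assumption: an image ID line is exactly four digits, as in the example.
ID_LINE = re.compile(r"\d{4}")


def parse_labels(text):
    """Parse Labels.txt content into {image_id: [label, ...]}."""
    labels = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ID_LINE.fullmatch(line):
            current = line
            labels.setdefault(current, [])
        elif current is None:
            raise ValueError(f"label {line!r} appears before any image ID")
        else:
            labels[current].append(line)
    return labels
```

Because the IDs may appear in any order, iterating over sorted(labels) gives a stable order when matching labels against the image files. A class label that itself consisted of exactly four digits would be misread as an ID under this assumption.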