Li-Jia Li, Hao Su, Yongwhan Lim, Robert Cosgriff, Daniel Goodwin, and Li Fei-Fei
Vision Lab, Stanford University
Figure 1. Overview of Object Bank Representation Extraction
Overview
Object bank representation is a novel image representation for high-level visual tasks, which encodes semantic and
spatial information of the objects within an image. In object
bank, an image is represented as a collection of scale-invariant response maps of a large
number of pre-trained generic object detectors. In Figure 1, we show the feature extraction process.
Using simple, off-the-shelf classifiers such as linear support
vector machines and logistic regression, we show that this high-level
image representation can be used effectively for high-level visual
tasks such as object and scene image classification, image annotation
and image retrieval. The results surpass reported
state-of-the-art performance on a number of standard benchmark
datasets.
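To make the idea of "a collection of response maps" concrete, here is a minimal sketch of how a set of detector response maps could be turned into a single feature vector. It assumes spatial-pyramid max pooling (the strongest response in each grid cell becomes one feature); this page does not state the pooling scheme, and the function names `max_pool_pyramid` and `object_bank_vector`, the `levels` parameter, and the nested-list layout of `responses` are all hypothetical choices for illustration, not the released implementation.

```python
def max_pool_pyramid(response_map, levels=3):
    """Max-pool one 2-D response map over a spatial pyramid (assumed scheme).

    Level l splits the map into 2**l x 2**l cells; the maximum response
    inside each cell becomes one feature.
    """
    height = len(response_map)
    width = len(response_map[0])
    features = []
    for level in range(levels):
        cells = 2 ** level
        for row in range(cells):
            top = row * height // cells
            # Guarantee at least one row per cell, even for tiny maps.
            bottom = max((row + 1) * height // cells, top + 1)
            for col in range(cells):
                left = col * width // cells
                right = max((col + 1) * width // cells, left + 1)
                features.append(max(
                    response_map[y][x]
                    for y in range(top, bottom)
                    for x in range(left, right)
                ))
    return features


def object_bank_vector(responses, levels=3):
    """Concatenate pooled features over every detector and scale.

    responses[d][s] is the 2-D response map of detector d at scale s
    (a hypothetical layout chosen for this sketch).
    """
    vector = []
    for per_detector in responses:
        for response_map in per_detector:
            vector.extend(max_pool_pyramid(response_map, levels))
    return vector
```

Under these assumptions, the vector length is (number of detectors) x (number of scales) x (number of pyramid cells), so the representation grows linearly with the number of object filters.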
Software
The executable takes an image in any standard image format (e.g., jpg) as input. Using the pre-trained object filters included in the package, it outputs the object bank representation in text format.
Highlights
- The source code is implemented in MATLAB and C++ and is optimized for speed.
- Pre-trained object filters are included in the package, allowing flexible selection of object filters.
- The response map of each object filter can be requested as an optional output.
- The system accepts any number of pre-trained object filters; the current list can be found here.
- The MATLAB version takes approximately 7 seconds per image.
Downloads
- Feature extraction source code: C++ and MATLAB (7 seconds per image)
- Classification source code: MATLAB
Benchmark
Below are two example benchmark results on MIT-Indoor and UIUC-Event using a linear SVM (OB-SVM) and logistic regression (OB-LR).
The extracted object bank features for these two datasets can be downloaded here: MIT-Indoor and UIUC-Event.
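As an illustration of the kind of off-the-shelf classifier used in these benchmarks, the following is a minimal sketch of binary logistic regression trained by batch gradient descent on precomputed feature vectors. It is not the released MATLAB classification code; the function names, the hyperparameters (`learning_rate`, `epochs`, `l2`), and the toy two-dimensional data in the usage note are assumptions made for this example. A multi-class dataset would need an extension such as one-vs-rest training.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(features, labels, learning_rate=0.1,
                              epochs=500, l2=0.0):
    """Fit weights and bias by batch gradient descent on the log loss.

    features: list of equal-length feature vectors (lists of floats).
    labels:   list of 0/1 class labels, one per feature vector.
    l2:       optional L2 penalty strength on the weights.
    """
    n_samples = len(features)
    n_dims = len(features[0])
    weights = [0.0] * n_dims
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_dims
        grad_b = 0.0
        for x, y in zip(features, labels):
            score = sum(w * v for w, v in zip(weights, x)) + bias
            error = sigmoid(score) - y  # derivative of log loss w.r.t. score
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        for j in range(n_dims):
            weights[j] -= learning_rate * (grad_w[j] / n_samples + l2 * weights[j])
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the linear score is non-negative, else 0."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score >= 0.0 else 0
```

For example, calling `train_logistic_regression` on the toy vectors `[[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.0]]` with labels `[0, 0, 1, 1]` yields a model whose `predict` separates the two groups. Real object bank vectors are far higher-dimensional, so a dedicated solver would be the practical choice.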
References
[1] Li-Jia Li*, Hao Su*, Eric P. Xing and Li Fei-Fei. Object Bank: A High-Level Image Representation for Scene Classification and Semantic Feature Sparsification. Proceedings of the Neural Information Processing Systems (NIPS), 2010. (*indicates equal contribution) [PDF]
[2] Li-Jia Li*, Hao Su*, Yongwhan Lim and Li Fei-Fei. Objects as Attributes for Scene Classification. Proceedings of the 12th European Conference on Computer Vision (ECCV), 1st International Workshop on Parts and Attributes, 2010. (*indicates equal contribution) [PDF]
[3] A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision (IJCV), 42, 2001.
[4] D. Lowe. Object recognition from local scale-invariant features. Proceedings of International Conference on Computer Vision (ICCV), 1999.
[5] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006.
[6] A. Quattoni and A. Torralba. Recognizing Indoor Scenes. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[7] Li-Jia Li and Li Fei-Fei. What, where and who? Classifying event by scene and object recognition. IEEE International Conference on Computer Vision (ICCV), 2007.