CS 323: Understanding Images and Videos: Recognizing and Learning High-Level Visual Concepts

Lecture	Date	Description	Readings	Presenter
1	Wed, Sep 23	Class introduction
	Wed, Sep 30	Class cancelled. Make-up session: 9am - 12pm, Fri, Oct 9
2	Wed, Oct 7	Course project papers	Deng et al 2009; Ikizler and Forsyth 2008;	Piyush; Louis;
3	Fri, Oct 9, 9am - 12pm	Object recognition tutorial	L. Fei-Fei, R. Fergus, and A. Torralba 2005;	Fei-Fei;
4	Wed, Oct 14	Pictorial structure	Felzenszwalb et al 2005; Felzenszwalb et al 2009;	I-Ting;
5	Wed, Oct 21	3D object categorization	Savarese et al 2007; Su et al. 2009;	Siddharth;
6	Wed, Oct 28	Object in context; Project proposal due	Hoiem et al. 2006; Gupta and Davis 2008;	Jaewon;
7	Wed, Nov 4	Natural scene understanding	Fei-Fei et al 2005; Lazebnik et al 2006;	Zixuan;
8	Wed, Nov 11	Total scene understanding	Li et al 2009; Yao et al; Tu et al;	Georgios; Haider;
9	Fri, Nov 20	Human action recognition	Laptev et al 2008; Babenko et al 2009;	Amir;
	Wed, Nov 25	NO CLASS, Thanksgiving break
10	Wed, Dec 2	Video analysis	Gupta et al 2009; Y. Ke et al 2007;	Ashish; Michael;
11	TBA	Course project presentation
	Fri, Dec 11	Course project due

References

Lecture #2:

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei. (2009) ImageNet: A Large-Scale Hierarchical Image Database, To Appear in IEEE Computer Vision and Pattern Recognition (CVPR).

N. Ikizler and D.A. Forsyth (2008) Searching for Complex Human Activities with No Visual Examples, Int. J. Computer Vision. Vol. 80, no. 3, pp. 337-357.

Lecture #3:

L. Fei-Fei, R. Fergus, and A. Torralba. (2006) Recognizing and learning object categories, http://people.csail.mit.edu/torralba/iccv2005, Tutorial presented at ICCV 2005; pages visited Feb. 7, 2006.

Lecture #4:

P. Felzenszwalb, D. Huttenlocher (2005). Pictorial Structures for Object Recognition, International Journal of Computer Vision, Vol. 61, No. 1, January 2005.

P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan. (2009) Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). Accepted for publication.

Lecture #5:

S. Savarese and L. Fei-Fei. (2007) 3D generic object categorization, localization and pose estimation, IEEE International Conference on Computer Vision (ICCV).

*M. Sun, *H. Su, S. Savarese and L. Fei-Fei. (2009) A Multi-View Probabilistic Model for 3D Object Classes, To appear in IEEE Computer Vision and Pattern Recognition (CVPR) (*indicates equal contributions)

Lecture #6:

D. Hoiem, A. Efros, and M. Hebert. (2006) Putting Objects in Perspective, Proc. IEEE International Conf. Computer Vision and Pattern Recognition (CVPR).

A. Gupta, L. Davis. (2008) Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers, Proceedings of the 10th European Conference on Computer Vision: Part I.

Lecture #7:

L. Fei-Fei and P. Perona. (2005) A Bayesian Hierarchical Model for Learning Natural Scene Categories. IEEE Comp. Vis. Patt. Recog.

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New York, June 2006, vol. II, pp. 2169-2178.

Lecture #8:

L.-J. Li, R. Socher and L. Fei-Fei. (2009) Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework, To appear in Computer Vision and Pattern Recognition (CVPR). (Oral)

B. Yao, X. Yang, L. Lin, M.W. Lee, and S.C. Zhu. (2009) I2T: Image Parsing to Text Description, Proceedings of the IEEE, (under review, invited for the special issue on Internet Vision).

Z.W. Tu, X.R. Chen, A.L. Yuille, and S.C. Zhu. (2005) Image parsing: unifying segmentation, detection and recognition, Int'l J. of Computer Vision, 63(2), 113-140.

Lecture #9:

I. Laptev, M. Marszałek, C. Schmid and B. Rozenfeld. (2008) Learning realistic human actions from movies, in Proc. CVPR'08, Anchorage, US.

B. Babenko, M.H. Yang, S.J. Belongie. (2009) Visual tracking with online Multiple Instance Learning, in Proc. CVPR'09, pp. 983-990.

Lecture #10:

A. Gupta, P. Srinivasan, J. B. Shi, L.S. Davis. (2009) Understanding videos, constructing plots: learning a visually grounded storyline model from annotated videos, in Proc. CVPR'09, pp. 2012-2019.

Y. Ke, R. Sukthankar, and M. Hebert. Event Detection in Crowded Videos, ICCV, 2007.