CS 323: Understanding Images and Videos: Recognizing and Learning High-Level Visual Concepts

Announcements:

• The deadline for the final project has been extended to Friday, Dec 11th, 5pm.

• Class on Nov 18th has been moved to Friday Nov 20th in Gates 200.

• Our room has been changed to Gates 300.

• Please email us if you would like to present on the ImageNet paper for October 7th.

• Class on Wed, Sep 30th has been rescheduled for Fri, Oct 9th 9am - 12pm.

Instructor: Prof. Fei-Fei Li

Office: Room 246 Gates Bldg

Phone: (650)725-3860

Email: feifeili [at] cs [dot] stanford [dot] edu

Office hours: by email appointment

Course Assistant: Andy L. Lin

Email: ydna [at] stanford [dot] edu

Office hours: by email appointment

Class Location and Time:

Wed 2:15-5:05pm - 3 units - Room: Gates 300

Course Description:

The field of computer vision has seen explosive growth in the past decade. Much of the recent effort in vision research goes toward developing algorithms that can perform high-level visual recognition tasks on real-world images and videos. With the development of the Internet, this task has become particularly challenging and interesting, given the heterogeneous data on the web. This course will focus on reading recent research papers that address high-level visual recognition problems, such as object recognition and categorization, scene understanding, and human motion understanding.

Syllabus:

• Weekly reading of recent, state-of-the-art papers

• Course project using data from the ImageNet ontology and a video dataset

Week 1-2: classic papers in object recognition
Week 3-5: object categorization in 3D, in context, and in large numbers
Week 6-7: scene understanding
Week 7-8: human motion understanding
Week 9-10: web-scale recognition

Pre-req:

Some research experience in one of the following fields: computer vision, image processing, computer graphics, or machine learning.

Textbook:

None required.