Project 01
Title: Classify images in a hierarchy with uncertainty
Faculty: Alex Berg
Description: For instance, given an image, determine where it fits in the WordNet hierarchy. The result should express uncertainty: for instance, a picture of an unknown furry animal probably belongs somewhere under the animal part of the hierarchy, but probably not under the reptile part.
This could be addressed either in terms of concrete computer vision features, or in terms of more abstract features, but there should be some mathematical formalism in the approach.
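As one concrete illustration of the uncertainty requirement, the sketch below assumes a classifier that already outputs a probability for every leaf category. The probability of an internal node is then the sum over the leaves beneath it, and a prediction can stop at the deepest node that still holds most of the probability mass. The toy taxonomy, the function names, and the 0.8 threshold are hypothetical placeholders, not actual WordNet synsets or a prescribed method.

```python
from typing import Dict, List

# Hypothetical toy taxonomy standing in for a WordNet subtree (names are illustrative only).
TREE: Dict[str, List[str]] = {
    "entity": ["animal", "artifact"],
    "animal": ["mammal", "reptile"],
    "mammal": ["cat", "dog", "raccoon"],
    "reptile": ["lizard", "snake"],
    "artifact": ["chair", "car"],
}

def node_probabilities(leaf_probs: Dict[str, float], root: str = "entity") -> Dict[str, float]:
    """P(node) = sum of P(leaf) over all leaves below the node."""
    probs: Dict[str, float] = {}

    def visit(node: str) -> float:
        children = TREE.get(node, [])
        p = leaf_probs.get(node, 0.0) if not children else sum(visit(c) for c in children)
        probs[node] = p
        return p

    visit(root)
    return probs

def deepest_confident_node(probs: Dict[str, float], root: str = "entity", threshold: float = 0.8) -> str:
    """Walk down from the root while the most probable child still holds >= threshold of the mass."""
    node = root
    while True:
        best = max(TREE.get(node, []), key=lambda c: probs[c], default=None)
        if best is None or probs[best] < threshold:
            return node
        node = best
```

With leaf probabilities spread evenly over cat, dog and raccoon, the answer is the internal node "mammal" rather than a single leaf, which is the kind of hedged output the project asks for.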
Project 02
Title: Image reranking
Faculty: Tamara Berg
Description: Given a set of web pages returned for an object query, rerank the images contained on these pages so that images at the top of the ranking depict the object while those at the bottom of the ranking do not. You should perform your reranking using information extracted from the images themselves in combination with information extracted from the surrounding text. Explore various methods for combining text and image information, and compare them against restricted versions of your method that use only image or only text information. You might also consider external sources of information such as Wikipedia or WordNet.
One source of web data for this project is from the Visual Geometry Group at Oxford: http://www.robots.ox.ac.uk/~vgg/data/mkdb/index.html
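A minimal late-fusion baseline might look like the sketch below. It assumes that some image model and some text model (both hypothetical here) have already produced one relevance score per image. Each score vector is min-max normalized and blended with a weight alpha, so alpha = 1 and alpha = 0 give the image-only and text-only versions to compare against. Average precision is assumed as the evaluation measure.

```python
import numpy as np

def minmax(x):
    """Rescale scores to [0, 1] so image and text scores are on a comparable scale."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def fuse_and_rank(image_scores, text_scores, alpha=0.5):
    """Late fusion: alpha * image + (1 - alpha) * text. Returns image indices, best first."""
    fused = alpha * minmax(image_scores) + (1.0 - alpha) * minmax(text_scores)
    return np.argsort(-fused, kind="stable")

def average_precision(ranking, labels):
    """Mean of precision@k over the positions k that hold a relevant (label 1) image."""
    rel = np.asarray(labels)[ranking]
    precisions = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((precisions * rel).sum() / max(rel.sum(), 1))
```

Richer combinations, such as learning the weight or training a classifier on concatenated image and text features, can then be compared against this baseline.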
Project 03
Title: Classification-based Tracking
Faculty: Robert Collins
Description: In this project we view tracking as a foreground/background classification problem. Given an initial frame of video in which an object of interest has been indicated by a bounding box, we sample image patches from the object to form a positive training set, and patches from the background region surrounding the object to form a negative training set. Each sampled patch is described by a set of extracted features, e.g. RGB color histograms, oriented gradient histograms, motion/flow features, etc. The positive and negative examples are used to train a classifier that labels patches in a new image as either object or background. Applying this classifier to patches in a new image produces a confidence map that can be used to localize the object, e.g. by mean-shift. After localization, new samples of object and background can be extracted and added to the positive and negative training sets, a new classifier can be learned for use in the next frame, and so on, resulting in a tracker that automatically adapts to changes in the appearance of both the object and the background. Unfortunately, a naive implementation of this adaptive scheme inevitably results in tracker drift, so mechanisms for avoiding drift will also be explored.
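One tracking step can be sketched as follows, under deliberately simple assumptions: each pixel's RGB value serves as its feature vector (instead of the patch histograms listed above), a plain gradient-descent logistic regression stands in for the classifier, and localization is a rectangular-window mean-shift on the resulting confidence map. The function names and parameter values are hypothetical, and nothing here guards against drift.

```python
import numpy as np

def train_logistic(X, y, lr=0.5, iters=1000, reg=1e-3):
    """Batch gradient-descent logistic regression (a stand-in for any object/background classifier)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])          # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * (Xb.T @ (p - y) / len(y) + reg * w)   # mean log-loss gradient plus L2 penalty
    return w

def confidence_map(image, w):
    """P(object) for every pixel, using the pixel's RGB value as its (deliberately tiny) feature."""
    H, W, C = image.shape
    X = np.hstack([image.reshape(-1, C), np.ones((H * W, 1))])
    return (1.0 / (1.0 + np.exp(-X @ w))).reshape(H, W)

def mean_shift(conf, center, half=(10, 10), iters=20, tol=0.5):
    """Move a rectangular window to the confidence-weighted centroid until it stops moving."""
    cy, cx = center
    H, W = conf.shape
    for _ in range(iters):
        y0, y1 = int(max(cy - half[0], 0)), int(min(cy + half[0] + 1, H))
        x0, x1 = int(max(cx - half[1], 0)), int(min(cx + half[1] + 1, W))
        win = conf[y0:y1, x0:x1]
        total = win.sum()
        if total == 0:
            break
        ys, xs = np.mgrid[y0:y1, x0:x1]
        ny, nx = (ys * win).sum() / total, (xs * win).sum() / total
        moved = np.hypot(ny - cy, nx - cx)
        cy, cx = ny, nx
        if moved < tol:
            break
    return cy, cx
```

In a full tracker, pixels inside the new window and in a surrounding ring would be added to the training sets and the classifier retrained before the next frame. That update loop is where the drift-avoidance mechanisms would have to act.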
Project 04
Title: Putting bounding boxes on objects: a Semi-Supervised Approach
Faculty: Li Fei-Fei
Description: Putting bounding boxes on objects of interest in images is a laborious task. This project explores techniques for doing it in a semi-supervised way. Here is the setting: for each class of objects, we have a set of hundreds of images (e.g. photos of raccoons). In addition, a small subset of these photos (about 50) already has bounding boxes on the object of interest (e.g. bounding boxes on the raccoons in these photos). Can you leverage this information to complete the bounding box annotation for the rest of the photos that contain raccoons?
Three classes of objects and their images are provided. Each class contains 500 images, 50 of which are annotated with bounding boxes. The dataset is available here.
Project 05
Title: Clustering
Faculty: Tony Jebara
Description: Clustering is an unsupervised technique that can potentially separate different classes of objects or observations in a dataset without knowing any labels beforehand. Consider using clustering to solve binary classification tasks in which all the class labels are hidden.
Download 5 UCI classification datasets of your choice from: http://archive.ics.uci.edu/ml/.
If these are not binary classification problems, simply form a two-class problem by including only the largest two classes. You will use clustering to attempt to separate these two classes. Run the spectral clustering algorithm of Ng, Jordan and Weiss using the radial basis function kernel. Evaluate and plot the resulting classification error rate, and also plot the ratio cut, sparse cut and normalized cut scores achieved by the algorithm. For each dataset, show these four performance measures while sweeping over different values of the kernel bandwidth.
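A compact NumPy sketch of the NJW pipeline and the three cut scores follows. The two-cluster restriction matches the binary setting above. The cut-score definitions are one common convention and an assumption here; in particular, sparse cut is taken as cut(A,B) / (|A| |B|). The small k-means routine is included only to keep the block self-contained.

```python
import numpy as np

def rbf_affinity(X, sigma):
    """A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), with A_ii = 0 as in Ng, Jordan and Weiss."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A

def kmeans(Y, k, n_init=10, seed=0, max_iter=100):
    """Plain Lloyd's k-means with random restarts; returns the lowest-inertia labelling."""
    rng = np.random.default_rng(seed)
    best, best_inertia = None, np.inf
    for _ in range(n_init):
        C = Y[rng.choice(len(Y), k, replace=False)]
        for _ in range(max_iter):
            lab = np.argmin(((Y[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
            newC = np.array([Y[lab == j].mean(0) if np.any(lab == j) else C[j] for j in range(k)])
            if np.allclose(newC, C):
                break
            C = newC
        lab = np.argmin(((Y[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        inertia = ((Y - C[lab]) ** 2).sum()
        if inertia < best_inertia:
            best, best_inertia = lab, inertia
    return best

def njw_two_way(X, sigma, seed=0):
    """NJW spectral clustering into two clusters; returns the labels and the affinity matrix."""
    A = rbf_affinity(X, sigma)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]                    # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(L)                                          # ascending eigenvalues
    Y = vecs[:, -2:]                                                     # two largest eigenvectors
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)  # unit-length rows
    return kmeans(Y, 2, seed=seed), A

def cut_scores(A, labels):
    """Ratio, normalized and sparse cut of a two-way partition (one common set of definitions)."""
    a = labels == 0
    b = ~a
    cut = A[np.ix_(a, b)].sum()
    na, nb = a.sum(), b.sum()
    vol_a, vol_b = A[a].sum(), A[b].sum()
    return {"ratio_cut": cut / na + cut / nb,
            "normalized_cut": cut / vol_a + cut / vol_b,
            "sparse_cut": cut / (na * nb)}

def error_rate(labels, y):
    """Cluster ids are arbitrary, so score the better of the two label matchings."""
    err = float(np.mean(labels != y))
    return min(err, 1.0 - err)
```

Sweeping `sigma` over, say, a logarithmic grid and recording `error_rate` and the three entries of `cut_scores` at each value yields the four curves requested for each dataset.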
Project 06 & 07 & 08
Faculty: Yann LeCun
Background information about these projects can be found in this paper: http://yann.lecun.com/exdb/publis/index.html#jarretticcv09. Kevin Jarrett, Koray Kavukcuoglu, Marc'Aurelio Ranzato and Yann LeCun: "What is the Best Multi-Stage Architecture for Object Recognition?," Proc. International Conference on Computer Vision (ICCV'09), 2009.
Project 06 Title: Learning features with predictive sparse decomposition
Description: Learning features with Predictive Sparse Decomposition:
Predictive Sparse Decomposition (PSD) is a method for learning sparse features in an unsupervised manner. The method is described in this paper: http://yann.lecun.com/exdb/publis/pdf/koraypsd08.pdf. The method minimizes the following energy function:
$$L(Y, Z, W) = \|Y - W_d Z\|^2 + k \, \mathrm{Sparse}(Z) + \|Z - G(W_e, Y)\|^2$$
where Y is an image patch, Z is the feature vector representing Y, Wd (the decoder) and We (the encoder) are matrices, k is a constant weighting the sparsity penalty Sparse(Z) (typically the L1 norm of Z), and G() is a nonlinear function (usually a tanh sigmoid). For a given Y, we find the corresponding feature vector Z by minimizing L with respect to Z. Then, keeping Z fixed, we learn Wd and We by taking a gradient step on L; the columns of Wd must be normalized after each step.
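The two alternating steps can be sketched under stated assumptions. Sparse(Z) is taken to be the L1 norm, and G(We, Y) is taken to be tanh(We Y) without any extra gain. Z is inferred with ISTA (iterative shrinkage-thresholding), which is a swapped-in solver and not necessarily the one used in the PSD paper. The learning rate `eta` is an arbitrary placeholder.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def encoder(We, Y):
    """G(We, Y), assumed here to be tanh(We @ Y) with no trainable gain."""
    return np.tanh(We @ Y)

def energy(Y, Z, Wd, We, k):
    """||Y - Wd Z||^2 + k ||Z||_1 + ||Z - G(We, Y)||^2 for a single patch Y."""
    return (np.sum((Y - Wd @ Z) ** 2) + k * np.sum(np.abs(Z))
            + np.sum((Z - encoder(We, Y)) ** 2))

def infer_z(Y, Wd, We, k, iters=200):
    """ISTA on Z: gradient step on the two quadratic terms, then soft-thresholding for the L1 term."""
    G = encoder(We, Y)
    lip = 2.0 * (np.linalg.norm(Wd, 2) ** 2 + 1.0)   # Lipschitz constant of the quadratic part's gradient
    Z = G.copy()                                     # warm start from the encoder's prediction
    for _ in range(iters):
        grad = -2.0 * Wd.T @ (Y - Wd @ Z) + 2.0 * (Z - G)
        Z = soft_threshold(Z - grad / lip, k / lip)
    return Z

def learn_step(Y, Wd, We, k, eta=0.01):
    """Infer Z*, then one gradient step on Wd and We with Z* held fixed; Wd columns renormalized."""
    Z = infer_z(Y, Wd, We, k)
    G = encoder(We, Y)
    Wd = Wd + eta * 2.0 * np.outer(Y - Wd @ Z, Z)                     # descent on ||Y - Wd Z||^2
    Wd = Wd / np.maximum(np.linalg.norm(Wd, axis=0, keepdims=True), 1e-12)
    We = We + eta * 2.0 * np.outer((Z - G) * (1.0 - G ** 2), Y)       # descent on ||Z - tanh(We Y)||^2
    return Wd, We, Z
```

Because ISTA with step 1/lip is a descent method, the inferred Z never has higher energy than the encoder's prediction it starts from. In practice `learn_step` would be looped over many preprocessed patches.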
Implement PSD, and use the system built for project 08 (below) to test the learned features. The image patches should be 9x9 or 16x16 pixels. The images should be preprocessed with a high-pass filter (replace each pixel by itself minus a weighted average of its neighbors) and (optionally) a local contrast normalization (divide each pixel by the weighted standard deviation of its neighbors).
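The high-pass filter and the optional local contrast normalization (also used in Projects 07 and 08) can be sketched with NumPy alone. Several details are assumptions: the weighting window is a separable 9-tap Gaussian with sigma 2, the weighted average includes the center pixel, borders use reflect padding, and the divisor is floored at its mean over the image, which is our reading of the Jarrett et al. paper cited above.

```python
import numpy as np

def gaussian_kernel(size=9, sigma=2.0):
    """Normalized 1-D Gaussian weights (size should be odd)."""
    r = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-r ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def weighted_average(img, k):
    """Separable Gaussian-weighted local mean with reflect padding; output has the input's shape."""
    pad = len(k) // 2
    p = np.pad(img, pad, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def highpass(img, k):
    """Each pixel minus a weighted average of its neighbourhood."""
    return img - weighted_average(img, k)

def local_contrast_normalize(img, k, eps=1e-8):
    """High-pass, then divide by the local weighted standard deviation, floored at its image mean."""
    v = highpass(img, k)
    sigma = np.sqrt(weighted_average(v ** 2, k))
    return v / np.maximum(sigma, max(sigma.mean(), eps))
```

Training patches (9x9 or 16x16) would then be cropped from the normalized images rather than from the raw pixels.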
Project 07 Title: Learning features with denoising autoencoders
Description: "Denoising autoencoder" is a method to train feature extractors. The method is described here: ICML paper: http://www.iro.umontreal.ca/~vincentp/Publications/vincent_icml_2008.pdf
ICML video: http://videolectures.net/icml08_vincent_ecrf/
more information: http://www.iro.umontreal.ca/~vincentp/publications.html
Train a denoising autoencoder on natural image patches of size 9x9 or 16x16. The images should be preprocessed with a high-pass filter (replace each pixel by itself minus a weighted average of its neighbors) and (optionally) a local contrast normalization (divide each pixel by the weighted standard deviation of its neighbors).
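A small NumPy sketch of a denoising autoencoder follows, under several assumptions. Masking noise zeroes a random 25% of the inputs, encoder and decoder share (tied) weights, and hidden units are sigmoids. The decoder is linear with a squared-error loss, chosen because high-pass filtered patches are real-valued, and training uses full-batch gradient descent. The hyperparameters are placeholders, not values from the paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class DenoisingAutoencoder:
    """Tied weights, masking noise, sigmoid hidden units, linear reconstruction, squared error."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        bound = np.sqrt(6.0 / (n_visible + n_hidden))
        self.W = self.rng.uniform(-bound, bound, (n_hidden, n_visible))
        self.b = np.zeros(n_hidden)    # hidden bias
        self.c = np.zeros(n_visible)   # visible (reconstruction) bias

    def corrupt(self, X, drop=0.25):
        """Masking noise: set a random fraction `drop` of the inputs to zero."""
        return X * (self.rng.random(X.shape) >= drop)

    def encode(self, X):
        return sigmoid(X @ self.W.T + self.b)

    def reconstruct(self, X):
        return self.encode(X) @ self.W + self.c

    def train_step(self, X, lr=0.01, drop=0.25):
        """One full-batch step: corrupt, encode, decode, and compare with the CLEAN input."""
        Xt = self.corrupt(X, drop)
        H = self.encode(Xt)
        R = H @ self.W + self.c
        dR = 2.0 * (R - X) / len(X)            # gradient of the mean squared error w.r.t. R
        dH = (dR @ self.W.T) * H * (1.0 - H)   # back-propagate through the sigmoid
        self.W -= lr * (H.T @ dR + dH.T @ Xt)  # decoder and encoder contributions (tied W)
        self.b -= lr * dH.sum(axis=0)
        self.c -= lr * dR.sum(axis=0)
        return float(np.mean(np.sum((R - X) ** 2, axis=1)))
```

The rows of `W` play the role of the learned filters. Reshaped to 9x9 or 16x16, they can be inspected visually or plugged into the project 08 system.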
Project 08 Title: Building the simplest object recognizer
Description: As described in "What is the Best Multi-Stage Architecture for Object Recognition?", build a feature extraction system as follows:
 Use size-normalized object images from one of the standard datasets (e.g. Caltech 101).
 Preprocess them with a high-pass filter (replace each pixel by itself minus a weighted average of its neighbors) and (optionally) a local contrast normalization (divide each pixel by the weighted standard deviation of its neighbors).
 Apply 64 random filters of size 9x9 or 16x16 over the entire image.
 Pass the outputs through an absolute value rectification.
 Perform high-pass filtering and local contrast normalization on the resulting 64 feature maps.
 Perform spatial pooling and subsampling on each of the 64 feature maps. The subsampling ratio can be 4x4 or so.
 Apply 64 random filters of size 9x9 or 16x16 to each of the 64 feature maps. This produces 4096 feature maps. Reduce this to 256 feature maps by adding random subsets of 64 of the 4096 feature maps.
 Pass the outputs through an absolute value rectification.
 Perform high-pass filtering and local contrast normalization on the resulting 256 feature maps.
 Perform spatial pooling and subsampling on each of the 256 feature maps; the subsampling ratio can be 4x4 or so.
 Feed the resulting features to a multinomial logistic regression classifier or to a linear SVM. Train this classifier in supervised mode.
This should get between 60 and 65% correct on Caltech101.
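The recipe above can be sketched as a stack of "stages" (filter bank, absolute-value rectification, local normalization, pooling). Several simplifications are assumptions of this sketch. The second stage uses a random connection table, where each output map sums filtered versions of a random subset of input maps, as an approximation of summing random subsets of the 4096 maps. Normalization uses a box window spanning all maps instead of a Gaussian one. Pooling is non-overlapping averaging. The final classifier is not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter_bank(x, filters):
    """x: (C, H, W), filters: (F, C, k, k) -> (F, H-k+1, W-k+1); 'valid' correlation summed over C."""
    k = filters.shape[-1]
    patches = sliding_window_view(x, (k, k), axis=(1, 2))        # (C, H-k+1, W-k+1, k, k)
    return np.einsum("chwij,fcij->fhw", patches, filters)

def random_connection_filters(n_out, n_in, n_conn, k, rng):
    """Random filters in which each output map only sees n_conn randomly chosen input maps."""
    filters = rng.normal(size=(n_out, n_in, k, k))
    mask = np.zeros((n_out, n_in))
    for f in range(n_out):
        mask[f, rng.choice(n_in, n_conn, replace=False)] = 1.0
    return filters * mask[:, :, None, None]

def local_normalize(x, k=9, eps=1e-8):
    """Subtractive then divisive normalization over a k x k box spanning all maps (simplified LCN)."""
    pad = ((0, 0), (k // 2, k // 2), (k // 2, k // 2))
    mean = sliding_window_view(np.pad(x, pad, mode="reflect"), (k, k), axis=(1, 2)).mean(axis=(0, 3, 4))
    v = x - mean
    var = sliding_window_view(np.pad(v ** 2, pad, mode="reflect"), (k, k), axis=(1, 2)).mean(axis=(0, 3, 4))
    sigma = np.sqrt(var)
    return v / np.maximum(sigma, max(sigma.mean(), eps))

def avg_pool(x, s=4):
    """Non-overlapping s x s average pooling (drops any ragged border)."""
    C, H, W = x.shape
    return x[:, :H // s * s, :W // s * s].reshape(C, H // s, s, W // s, s).mean(axis=(2, 4))

def stage(x, filters, pool=4, k=9):
    """Filter bank -> absolute-value rectification -> local normalization -> pooling/subsampling."""
    return avg_pool(local_normalize(np.abs(filter_bank(x, filters)), k), pool)
```

For example, a 1 x 64 x 64 input passed through 64 random 9x9 filters gives 64 maps of 56x56, pooled to 14x14. The flattened output of the second stage is what would be fed to the logistic regression classifier or linear SVM.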
Project 09
Title: Symmetry-based saliency detection from unsegmented images
Faculty: Yanxi Liu
Description: Symmetry detection has been a long-standing research topic in computer vision. This project will help you appreciate the difference between human and machine visual perception of real-world symmetries, and the challenges that symmetry detection and learning pose for computers, even though the task is usually considered trivial and instantaneous for humans.
Find 10 images (or take some photos of your own) and ask FIVE people to label all the symmetric parts in these 10 images. Symmetry can include: reflection symmetry, with a reflection axis; rotation symmetry, with a rotation center and a fold number (for example, the star on the Chinese flag has 5-fold rotation symmetry); or even translation symmetry, i.e. periodic patterns such as a building façade, described by two translation generating vectors that form the smallest generating tile of the pattern. Write an algorithm to extract symmetries; you can focus on a SINGLE type of symmetry (reflection, rotation or translation). Finally, compare the computer output with the labels from the FIVE human observers.
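For the reflection-only option, a very simple baseline is sketched below. It is a hypothetical starting point, not one of the detectors in the references below. It scores every candidate vertical axis by the mean squared difference between the strip on its left and the mirrored strip on its right. It handles only a single global vertical axis in a grayscale image; arbitrary orientations, local symmetries, and rotation or translation symmetry would need more machinery.

```python
import numpy as np

def best_vertical_axis(img, min_width=3):
    """Return (axis_column, score) for the best mirror axis; the axis may lie on or between columns."""
    img = np.asarray(img, dtype=float)
    W = img.shape[1]
    best_axis, best_score = None, np.inf
    for a2 in range(1, 2 * W - 2):                # candidate axis position = a2 / 2
        left_end = (a2 - 1) // 2                  # last column strictly left of the axis
        right_start = a2 - left_end               # mirror image of left_end
        width = min(left_end + 1, W - right_start)
        if width < min_width:
            continue
        left = img[:, left_end - width + 1:left_end + 1]
        right = img[:, right_start:right_start + width][:, ::-1]   # mirrored right strip
        score = np.mean((left - right) ** 2)
        if score < best_score:
            best_axis, best_score = a2 / 2.0, score
    return best_axis, best_score
```

The detected axis (and, for a local version, its extent) can then be compared against the axes drawn by the five human observers.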
References and Tools:
 Data sources online:
o Tools for human labeling (both images and interfaces): http://vision.cse.psu.edu/SymEva_files/Page406.htm
o Various types of real images demonstrating real world regularities can be found here, you are also encouraged to contribute to this database (!): http://vivid.cse.psu.edu/texturedb/gallery/
o You are also encouraged to use images from publicly available object categorization and recognition data sets, e.g. from VOC2009 http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2009/
 A survey paper by Professor Liu et al. will be distributed before Thursday
 Performance Evaluation of State-of-the-Art Discrete Symmetry Detection Algorithms. (CVPR 2008) Minwoo Park, Seungkyu Lee, James Hays, Po-Chun Chen, Somesh Kashyap, Asad Butt and Yanxi Liu. http://vision.cse.psu.edu/evaluation.html
 Curved Glide-Reflection Symmetry Detection (CVPR 2009) Seungkyu Lee and Yanxi Liu
Project 10
Title: Face image alignment
Faculty: Yi Ma
Description: We will provide you with a set of well-aligned face images of a person, taken under different lighting conditions. Now, given a new image of the same person, roughly cropped from a picture, write an algorithm that automatically aligns this face image correctly with the rest. Your algorithm should work for input images that may differ in rotation or scale from the well-aligned ones.

Project 11
Title: Discovering and matching planar structures from images
Faculty: Silvio Savarese
Description: This project aims at automatically discovering and matching local planar regions in complex scenes observed from different viewpoints. Robust estimation techniques based on random sample consensus (RANSAC) will be explored for estimating the geometric transformation that relates observed planar surfaces across views, thus enabling robust matching procedures. The project will also address the issue of detecting and matching multiple planar regions by using iterative techniques such as sequential RANSAC or J-Linkage. For this project we will provide images extracted from a video sequence portraying a complex urban scene comprising different semi-local planar structures such as building facades, sidewalks, and/or advertisement panels.
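The sketch below shows the standard building blocks. A homography is fitted from four or more correspondences with a normalized DLT, RANSAC runs over random 4-point samples, and a simple sequential RANSAC removes each plane's inliers before searching for the next one. It assumes point correspondences between the two views are already available (e.g. from matched local features). The inlier threshold, iteration count, and `min_inliers` are placeholder values, and J-Linkage is not shown.

```python
import numpy as np

def _normalize(pts):
    """Hartley normalization: zero mean, average distance sqrt(2) from the origin."""
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
    return (pts - c) * s, T

def fit_homography(src, dst):
    """Normalized DLT from >= 4 correspondences; returns H scaled so that H[2, 2] == 1."""
    s, Ts = _normalize(src)
    d, Td = _normalize(dst)
    rows = []
    for (x, y), (u, v) in zip(s, d):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    H = np.linalg.inv(Td) @ Vt[-1].reshape(3, 3) @ Ts     # undo the normalizations
    return H / H[2, 2]

def apply_h(H, pts):
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, thresh=2.0, iters=1000, seed=0):
    """Keep the 4-point homography with the most inliers, then refit on all of its inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    with np.errstate(all="ignore"):                   # degenerate samples may produce inf/nan
        for _ in range(iters):
            idx = rng.choice(len(src), 4, replace=False)
            try:
                H = fit_homography(src[idx], dst[idx])
            except np.linalg.LinAlgError:
                continue
            inliers = np.linalg.norm(apply_h(H, src) - dst, axis=1) < thresh
            if inliers.sum() > best.sum():
                best = inliers
        if best.sum() < 4:
            return None, best
        return fit_homography(src[best], dst[best]), best

def sequential_ransac(src, dst, n_planes=2, min_inliers=10, **kw):
    """Fit one plane at a time, removing its inliers before searching for the next one."""
    remaining = np.arange(len(src))
    models = []
    for _ in range(n_planes):
        if len(remaining) < 4:
            break
        H, inl = ransac_homography(src[remaining], dst[remaining], **kw)
        if H is None or inl.sum() < min_inliers:
            break
        models.append((H, remaining[inl]))
        remaining = remaining[~inl]
    return models
```

Each returned pair holds a homography and the indices of the correspondences it explains, i.e. one candidate planar region matched across the two views.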
Project 12
Title: Name 10 objects in the picture
Faculty: Jianbo Shi
Description: The names should be category-based. We can predefine a (long) list of names, or students can come up with their own. The goal is to discover objects, rather than to look for a few specific objects.
version 1. Just put down names.
version 2. Segment + Name.
Project 13
Title: Object counting
Faculty: Jianbo Shi
Description: Given an image in which some objects appear multiple times, count them; for example, 4 (chair-like objects) and 2 (table-like objects). Note that we don't need to name the objects; we just need to be able to discover repeated 'things'.
Project 14
Title: Shape from texture
Faculty: Sinisa Todorovic
Description: This project will address unsupervised estimation of the 3D shape and orientation of textured surfaces depicted in an image. Shape from texture is an important step toward higher-level image understanding, and thus one of the fundamental problems in computer vision. If a surface is textured, i.e., characterized by a spatial repetition of primitive texture elements (or texels), its 3D properties can be estimated by analyzing the texel shape, size, and placement properties in the image. For example, texels lying along parts of the surface that are far away from the camera will appear smaller in the image than the texels that are closer to the camera. Since the texels are statistically similar to each other, identifying the dominant trend in variations of texel properties (e.g., the gradient of foreshortening) can directly be used for 3D shape estimation. We will focus on images containing multiple textured surfaces.
Project 15
Title: Choices of features, classifiers, and representations for non-rigid object detection
Faculty: Zhuowen Tu
Description: The performance of object detection systems is determined by several key factors: (1) the learning algorithm; (2) the feature set; and (3) the underlying representation. Considerable recent progress has been made on all three aspects. In this project, we will use the INRIA dataset (http://pascal.inrialpes.fr/data/human/) as a testbed for the pedestrian detection problem.
To test the effectiveness of different classifiers, students can choose to compare several discriminative classifiers. A paper with an empirical comparison on machine learning datasets is at http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml06.pdf. Typical choices include (but are not limited to):
(1) Support Vector Machine (SVM) (a well-documented implementation can be found at http://www.csie.ntu.edu.tw/~cjlin/libsvm/).
(2) Boosting (with decision-stump or decision-tree weak classifiers)
(3) K-nearest neighbors (a KD-tree can be used for fast retrieval)
(4) Random forest classifier
To test the effectiveness of different features, one can try to use:
(1) the HOG features (http://lear.inrialpes.fr/people/triggs/pubs/Dalalcvpr05.pdf)
(2) Haar features (Paul A. Viola, Michael J. Jones: Robust Real-Time Face Detection. IJCV 57(2): 137-154 (2004))
(3) a large number of features including HOG, Haar, and features from different channels
To test the effectiveness of different representations, one can choose to learn object parts for detection:
(1) a latent SVM implementation (http://people.cs.uchicago.edu/~pff/latent/)
(2) a multiple component learning algorithm (http://vision.ucsd.edu/~pdollar/research/papers/DollarEtAlECCV08mcl.pdf)
Project 16
Title: Statistics of SIFT/HOG representation for objects
Faculty: Zhuowen Tu
Description: SIFT/HOG features have been widely used in the computer vision literature. Different objects (or different parts of the same object) may exhibit different texture patterns, so it is important to know when and how to use these features as a basic object representation.
In this project, students can choose a set of object classes from e.g. the PASCAL, LHI or LabelMe dataset, and empirically study the manifold of SIFT descriptors, as well as their behavior in classification across different object categories. If carried out successfully, this project will provide useful empirical guidelines for the use of SIFT/HOG features, which are somewhat lacking in the literature.
Project 17
Title: Object recognition
Faculty: Eric Xing
Description: The Caltech 256 dataset contains images of 256 object categories taken at varying orientations, varying lighting conditions, and with different backgrounds. http://www.vision.caltech.edu/Image_Datasets/Caltech256/.
You can try to create an object recognition system that identifies which object category best matches a given test image, applying clustering to learn object categories without supervision. Here are four ideas you could work on:
1) The "codebook" used in the original CVPR05 paper by Fei-Fei was generated by a data preprocessing procedure, i.e., clustering the visual elements and picking the centroids as "codewords". When applied to a generative model, each observed visual element in a given image is matched to a "codeword" through a hard assignment. You may want to improve the flexibility of the model by allowing a soft assignment, so that a given visual element can be matched to multiple different "codewords" with uncertainty through a noisy channel. You may also want to eliminate the preprocessing step by learning the codebook jointly. In this project you are asked to design (with help from the instructor) and implement such a model.
2) In the mixture of topic models used in Fei-Fei's CVPR05 paper, the building block is an LDA model. You are asked to change it to a mixture of logistic-normal topic models. This change not only enriches the model so that it can capture correlations between topics, but also allows a direct upgrade of the original model to a dynamic model that evolves over time, which can therefore be applied to tasks such as object tracking and trajectory modeling in video data.
3) If you have a solid understanding of the above two ideas and of nonparametric Bayesian models, you can extend the above models into a semiparametric model by including a Dirichlet process, HDP, or Dynamic CRP prior, so that you can model image/video datasets without pre-specifying the number of image classes, number of codewords, number of objects, number and duration of trajectories, etc. Many of these are open topics in vision and in ML in general.
4) You can also try to create a discriminative max-margin topic model for images, based on "J. Zhu, A. Ahmed and E. P. Xing, MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification, The 26th International Conference on Machine Learning (ICML 2009)", to boost classification performance and to learn truly discriminative topics. Again, no earlier attempt has been made in this direction.
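As a starting point for idea 1, the sketch below contrasts hard assignment with one simple soft assignment. Each descriptor distributes its weight over the codewords in proportion to exp(-beta * squared distance), which corresponds to an isotropic Gaussian noisy channel. The Gaussian channel, `beta`, and the function names are assumptions for illustration. The model the project asks for, with a jointly learned codebook, would go further.

```python
import numpy as np

def soft_assign(descriptors, codebook, beta=1.0):
    """Row i holds P(codeword j | descriptor i), proportional to exp(-beta * ||d_i - c_j||^2)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability before exponentiating
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def hard_assign(descriptors, codebook):
    """One-hot nearest-codeword assignment, as in the original preprocessing step."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return np.eye(len(codebook))[d2.argmin(axis=1)]

def bag_of_words(P):
    """Expected codeword counts for one image (rows of P are its descriptors)."""
    return P.sum(axis=0)
```

As `beta` grows, the soft assignment approaches the hard one. A descriptor equidistant from two codewords splits its weight evenly between them instead of being forced into one.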
Project 18 & 19
Faculty: Song-Chun Zhu
In computer vision and pattern recognition, a visual concept, such as a texture pattern, a shape, or an object category, is often defined by a set of instances that can be governed by a probability model. Thus it is of significant interest to study algorithms for (1) simulating the model by drawing typical examples, and (2) estimating certain quantities of this model, such as the cardinality of the set or the expectations of certain statistics. The following exercises are two classical problems that should be solved through Markov chain Monte Carlo and importance sampling.
Project 18 Title: Simulating a model
Description: In an NxN lattice with torus boundary conditions and a 4-nearest-neighbor neighborhood, each site s=(i,j) carries a label x(s) from the set {1, 2, 3, ..., K}. Let's start with K=2 and generalize to arbitrary K later. These labels form a random field X={ x(s) }. For a pair of neighboring sites (s,t), we define an indicator function 1(x(s)=x(t)), which equals 1 if x(s)=x(t) and 0 otherwise.
Now we define a simple pattern as a set, C = { X : E[ 1(x(s) = x(t)) ] = h for all neighboring pairs (s,t) }, where h is a constant in [0,1] that measures how likely two neighboring sites are to have the same label.
Question:
1, Derive a probability model for this set C.
2, Design an algorithm that can draw fair samples from this set.
3, How can you verify that your sample is a fair sample? (This is the question addressed by exact sampling.)
4, Adjust h to locate the critical temperature where the sampler slows down.
5, Develop a cluster sampling algorithm which can draw samples in polynomial time.
6, Plot curves for the convergence, and show typical images at various steps.
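For questions 2, 4 and 6, a single-site Gibbs sampler is a natural baseline. The sketch assumes the Potts-type Gibbs form p(X) proportional to exp(beta * sum over neighboring pairs of 1(x(s)=x(t))), which is what a maximum-entropy treatment of question 1 typically yields. In that form, beta is tuned until the sampled h matches the target, and 1/beta plays the role of temperature. Labels run over 0..K-1 instead of 1..K. The function names and default sizes are hypothetical, and a cluster sampler such as Swendsen-Wang (question 5) is not shown.

```python
import numpy as np

def empirical_h(X):
    """Fraction of neighbouring pairs with equal labels (each torus edge counted once)."""
    same = (X == np.roll(X, 1, axis=0)).sum() + (X == np.roll(X, 1, axis=1)).sum()
    return same / (2 * X.size)

def gibbs_potts(N=32, K=2, beta=0.5, sweeps=200, seed=0):
    """Single-site Gibbs sampler on an N x N torus for
    p(X) proportional to exp(beta * sum over neighbouring pairs of 1(x(s) == x(t))).
    Returns the final field and the value of h after every sweep (for convergence plots)."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, K, size=(N, N))
    hs = []
    for _ in range(sweeps):
        for i in range(N):
            for j in range(N):
                nbrs = [X[(i - 1) % N, j], X[(i + 1) % N, j], X[i, (j - 1) % N], X[i, (j + 1) % N]]
                agree = np.bincount(nbrs, minlength=K)       # how many neighbours carry each label
                p = np.exp(beta * (agree - agree.max()))     # conditional, shifted for stability
                X[i, j] = rng.choice(K, p=p / p.sum())
        hs.append(empirical_h(X))
    return X, np.array(hs)
```

Plotting the returned h trace for several beta values gives convergence curves, and the near-critical region (question 4) shows up as the range of beta where these curves take longest to settle.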
Project 19 Title: Estimating the size of a set
Description: Consider an N x N grid. We define a self-avoiding walk (SAW) as a path starting from site [0,0], i.e. the bottom-left corner, that at each step moves to one of its immediate nearest neighbors (4-nearest neighborhood), provided that neighbor has not been visited before. The walk stops when all of its neighbors have been visited. We want to know how many distinct SAW paths exist in a 7x7 or 10x10 grid. Note that this number is very large, so you won't be able to enumerate the paths.
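One standard starting point is a Rosenbluth/Knuth-style sequential importance sampler. The sketch below grows a walk by choosing uniformly among the free neighbors and records the product of the number of choices available at each step. Each complete (stuck) walk is reached with probability equal to the inverse of that product, so the product is an unbiased estimate of the number of such walks. A brute-force count for tiny grids is included only as a sanity check. On 7x7 and 10x10 grids this plain estimator has very high variance, so better proposals would be part of the project.

```python
import numpy as np

def free_neighbors(site, N, visited):
    """4-neighbours of `site` inside the N x N grid that the walk has not visited yet."""
    i, j = site
    cand = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
    return [(a, b) for a, b in cand if 0 <= a < N and 0 <= b < N and (a, b) not in visited]

def sample_saw_weight(N, rng):
    """Grow one SAW from (0, 0) uniformly among free neighbours until it is stuck.
    Returns the product of branching numbers, an unbiased estimate of the number of such walks."""
    site, visited, weight = (0, 0), {(0, 0)}, 1
    while True:
        nbrs = free_neighbors(site, N, visited)
        if not nbrs:
            return weight
        weight *= len(nbrs)
        site = nbrs[rng.integers(len(nbrs))]
        visited.add(site)

def estimate_saw_count(N, n_samples=10_000, seed=0):
    """Monte Carlo mean and its standard error."""
    rng = np.random.default_rng(seed)
    w = np.array([sample_saw_weight(N, rng) for _ in range(n_samples)], dtype=float)
    return w.mean(), w.std(ddof=1) / np.sqrt(n_samples)

def exact_saw_count(N):
    """Depth-first enumeration; only feasible for very small grids."""
    def dfs(site, visited):
        nbrs = free_neighbors(site, N, visited)
        if not nbrs:
            return 1
        total = 0
        for nb in nbrs:
            visited.add(nb)
            total += dfs(nb, visited)
            visited.remove(nb)
        return total
    return dfs((0, 0), {(0, 0)})
```

Comparing `estimate_saw_count(3)` with `exact_saw_count(3)` checks the estimator before moving to the 7x7 and 10x10 grids, where the standard error should be reported alongside the estimate.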
