General Policy:
- We provide skeleton code for each project. It is recommended that you base your project on this code.
- You may be able to find code online implemented by the authors. While it is OK to look at external code, it is an honor code violation to directly copy code from existing implementations. Your code will be read by the course instructors.
- We allow both 1-person and 2-person teams for the projects. Each team hands in one copy of the code and write-up. Grading will be fair regardless of team size.
- Your project will be evaluated on many factors, including your understanding of the algorithm, the performance of your algorithm, the quality of your code, your creativity, and your write-up and presentation.
- Each project is estimated to take 20-30 hours, depending on your familiarity with the algorithm and how far you want to take the project.
- You are not expected to implement every detail of the algorithm or to achieve state-of-the-art performance, but if your algorithm works particularly well, you will receive extra credit.
Grading Policy per Project:
- Technical approach and code: 35% (including the correctness and completeness of your method, your innovation, etc.)
- Experiment evaluation: 35% (including performance and results of your method, thoroughness of your experiments, insights and analysis that you draw from your results)
- Write-up quality: 20% (please read the write-up sample carefully)
- Project presentation: 10% (clarity and content of your project presentation)
Late Days:
We allow seven late days in total across the three projects. Once you have used up these late days, any project turned in late will be penalized 20% per late day.
Project 1: Interactive Image Segmentation with GrabCut
Project Description:
Dataset and Experiment:
References:
Y. Boykov and M. Jolly. Interactive Graph Cuts for Optimal Boundary and Region Segmentation of Objects in N-D Images. ICCV 2001.
C. Rother, V. Kolmogorov, and A. Blake. "GrabCut" - Interactive Foreground Extraction Using Iterated Graph Cuts. SIGGRAPH 2004.
R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A Comparative Study of Energy Minimization Methods for Markov Random Fields with Smoothness-Based Priors. PAMI 2008.
V. Lempitsky, P. Kohli, C. Rother, and T. Sharp. Image Segmentation with a Bounding Box Prior. ICCV 2009.
Important Dates:
4-5 slides for presentation due to jkrause@cs: Sun, Apr 19 (5:00 pm)
A short presentation of your work: Mon, Apr 20 (in class)
Deadline for submitting the code and write-up (submit via CourseWork): Tues, Apr 21 (5:00 pm)
Project 2: Tracking
Project Description:
In this project, you will design the learning and detection components
of an online single-target tracker such as TLD.
We have provided a modified version of the original Kalal et al. tracker to
get you started: starter_code
We do not expect you to use any other files from the original OpenTLD implementation.
What you need to implement: You are expected to fill in the sections marked "TODO" in the following files:
run_TLD_on_video.m
tld/tldDetection.m
tld/tldGeneratePositiveData.m
tld/tldGenerateNegativeData.m
tld/tldInit.m
tld/tldLearning.m
tld/tldNN.m
tld/tldPatch2Pattern.m
tld/tldProcessFrame.m
tld/tldTrainNN.m
Carefully read through each of the TODO sections before implementing them.
Please note that this is an open-ended assignment; we expect you to experiment with the model parameters in run_TLD_on_video.m, as well as with the features and the detection and learning methods, to understand the tracking problem.
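For reference, the TLD paper scores a candidate patch by its relative similarity to the stored positive and negative templates, where similarity is normalized cross correlation (NCC) mapped to [0, 1]. Below is a minimal sketch of that idea; the function names and the column-wise template layout are our own assumptions, not the starter code's interface.

    % Sketch of the relative-similarity confidence behind tldNN.m. patch is a
    % feature vector (column); pex and nex hold positive and negative templates
    % as columns. This is an illustration, not the required implementation.
    function conf = nn_confidence(patch, pex, nex)
        sp = max(0.5 * (ncc_all(patch, pex) + 1));   % best similarity to a positive template
        sn = max(0.5 * (ncc_all(patch, nex) + 1));   % best similarity to a negative template
        conf = sp / (sp + sn);                       % relative similarity in [0, 1]
    end

    function c = ncc_all(p, ex)
        % Normalized cross correlation of one patch against every template column.
        p  = (p - mean(p)) / (norm(p - mean(p)) + eps);
        ex = bsxfun(@minus, ex, mean(ex, 1));
        ex = bsxfun(@rdivide, ex, sqrt(sum(ex.^2, 1)) + eps);
        c  = p' * ex;
    end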
Dataset and Experiments:
Evaluation:
We have also provided an evaluation script, tldEvaluateTracking.m, in the tld/ folder.
This script takes a ground-truth text file and the detection text file generated by
tldExample.m and reports the final tracking performance.
We will use two evaluation metrics: average overlap and mean Average Precision (mAP).
In addition, for this project we also expect you to report the time taken to track the
object in each video.
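For intuition, average overlap is the per-frame intersection-over-union (IoU) between the ground-truth box and the reported box, averaged over frames. A minimal sketch is below; it assumes boxes are stored one per row in [x1 y1 x2 y2] format, which may differ from the files the evaluation script expects, so use tldEvaluateTracking.m for the numbers you report.

    % Sketch of the average-overlap metric. gt_boxes and det_boxes are N x 4
    % matrices with one [x1 y1 x2 y2] box per frame (a format assumption).
    function avg_ov = average_overlap(gt_boxes, det_boxes)
        n  = size(gt_boxes, 1);
        ov = zeros(n, 1);
        for i = 1:n
            g = gt_boxes(i, :);  d = det_boxes(i, :);
            iw = min(g(3), d(3)) - max(g(1), d(1));   % intersection width
            ih = min(g(4), d(4)) - max(g(2), d(2));   % intersection height
            if iw > 0 && ih > 0
                inter = iw * ih;
                uni   = (g(3)-g(1))*(g(4)-g(2)) + (d(3)-d(1))*(d(4)-d(2)) - inter;
                ov(i) = inter / uni;
            end
        end
        avg_ov = mean(ov);                            % mean IoU over all frames
    end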
Dataset:
Download the tracking dataset from tiny_tracking_data.
The dataset contains a validation set and a test set. You are expected to fine-tune
your model parameters on the validation set and retain the same values for the test set.
The validation and test splits are provided in val_test_splits.txt.
The validation set contains 4 videos and the test set contains 5 videos.
Each video sequence also contains a ground-truth text file; every line of the file gives
the true bounding-box coordinates for the corresponding frame. The "img" folder in each
video sequence also contains an "init.txt" file, which provides the initializing bounding box.
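As an illustration of how these annotation files can be loaded in MATLAB (the ground-truth file name shown here is a placeholder, and the comma-separated [x1,y1,x2,y2] format is an assumption; check the files in the downloaded dataset and adjust accordingly):

    % Hypothetical example of reading the initial box and ground truth for one
    % sequence; adjust paths, file names, and the delimiter to match the dataset.
    seq_dir  = fullfile('tiny_tracking_data', 'Car4');
    init_bb  = dlmread(fullfile(seq_dir, 'img', 'init.txt'), ',');   % 1 x 4 initial bounding box
    gt_boxes = dlmread(fullfile(seq_dir, 'gt.txt'), ',');            % one box per frame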
Getting started:
The starter code uses OpenCV to implement the KLT tracker.
If you plan to run the code on corn.stanford.edu, we have provided a compile_corn.m file;
run it to generate the MEX files. If you have a local installation of OpenCV,
change the dependencies in the compile_all_platforms.m file accordingly.
To get started, test your code on a sample video as shown in main.m.
A reasonable implementation should achieve a mAP of 1.0 and an average overlap ≥ 0.8
on the first 20 frames of the provided "Car4" sequence.
You should also be able to reach an average overlap ≥ 0.68 and a mAP ≥ 0.78
on the full "Car4" sequence.
Analytical study and extensions: You should include at least one or two of these; here are some ideas:
- Feature representations for detection/learning. Try different features such as HOG, or binary features like BRIEF, FREAK, BRISK, etc.
- Keypoint/grid/pyramid sampling strategies for feature extraction.
- Classification methods such as linear/kernel SVMs, random forests, boosting, or decision trees.
- Strategies for generating positives and negatives, such as warping, noise addition, or shifting of confident positive and negative patches (one possible warping strategy is sketched after this list).
- An online learning method for speed-up.
- Strategies for fusing the detection bounding boxes from your learned detector with the boxes obtained from the KLT tracker.
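As one example of the positive-generation strategies mentioned above, the sketch below jitters a confident box with small random shifts and scale changes and resamples a fixed-size patch. This is only one possible strategy with our own function name and jitter magnitudes; the starter code's tldGeneratePositiveData.m is organized differently.

    % Sketch: generate synthetic positive patches by randomly jittering a
    % confident bounding box. img is a grayscale image, bb = [x1 y1 x2 y2],
    % num is the number of patches, patch_size is the output side length.
    function patches = warp_positives(img, bb, num, patch_size)
        patches = zeros(patch_size, patch_size, num);
        w = bb(3) - bb(1);  h = bb(4) - bb(2);
        for k = 1:num
            s  = 1 + 0.05 * randn();                  % small random scale change
            dx = 0.05 * w * randn();                  % small horizontal shift
            dy = 0.05 * h * randn();                  % small vertical shift
            cx = (bb(1) + bb(3)) / 2 + dx;
            cy = (bb(2) + bb(4)) / 2 + dy;
            xs = linspace(cx - s*w/2, cx + s*w/2, patch_size);
            ys = linspace(cy - s*h/2, cy + s*h/2, patch_size);
            [X, Y] = meshgrid(xs, ys);
            patches(:, :, k) = interp2(double(img), X, Y, 'linear', 0);  % resample patch
        end
    end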
References:
Kalal et al. Tracking-Learning-Detection. PAMI 2010.
Ojala et al. Local Binary Patterns (LBP). PAMI 2002.
Calonder et al. BRIEF. ECCV 2010.
Important Dates:
Presentation slides due to alahi@stanford.edu: Sun, May 10 (5:00 pm)
A short presentation of your work: Mon, May 11 (in class)
Deadline for submitting the code and write-up: Tues, May 12 (5:00 pm)
Project 3: Detection with R-CNN
Project Description:
In this project, you will be implementing the core algorithm of an R-CNN.
Starter code is available here.
What you need to implement: As detailed in 'readme.txt' within the starter code, you are expected to implement the following:
extract_region_feats.m
train_rcnn.m
train_bbox_reg.m
test_rcnn.m
If you'd like to use Python instead, that's fine, but be prepared to re-implement the given starter code. In particular, hooking up Python with Caffe may take some work (but it is not impossible: Caffe has a Python wrapper).
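To make the test-time pipeline concrete, the sketch below scores region proposals with a trained linear SVM and applies greedy non-maximum suppression. It is only an illustration of the idea behind test_rcnn.m under assumed inputs (feats is num_regions x feat_dim CNN features, boxes is num_regions x 4 in [x1 y1 x2 y2] format, w and b are the SVM weights and bias for one class); the starter code's interfaces may differ.

    % Sketch of per-class R-CNN scoring and greedy non-maximum suppression.
    function detections = score_and_nms(feats, boxes, w, b, nms_thresh)
        scores = feats * w + b;                      % SVM score for every region proposal
        [~, order] = sort(scores, 'descend');
        picked = [];
        while ~isempty(order)
            i = order(1);                            % highest-scoring remaining box
            picked(end+1) = i;                       %#ok<AGROW>
            rest  = order(2:end);
            ious  = arrayfun(@(j) box_iou(boxes(i, :), boxes(j, :)), rest);
            order = rest(ious <= nms_thresh);        % discard boxes overlapping the pick
        end
        picked = picked(:);
        detections = [boxes(picked, :), scores(picked)];   % [x1 y1 x2 y2 score] rows
    end

    function ov = box_iou(a, b)
        % Intersection-over-union of two [x1 y1 x2 y2] boxes.
        iw = min(a(3), b(3)) - max(a(1), b(1));
        ih = min(a(4), b(4)) - max(a(2), b(2));
        if iw > 0 && ih > 0
            inter = iw * ih;
            ov = inter / ((a(3)-a(1))*(a(4)-a(2)) + (b(3)-b(1))*(b(4)-b(2)) - inter);
        else
            ov = 0;
        end
    end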
Dataset and Experiments:
Evaluation:
Detection evaluation is based on mean average precision (mAP); you should report the average precision for each class as well as show precision-recall (PR) curves. Code has been provided to help with these calculations.
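For intuition, average precision summarizes the precision-recall curve for one class. A rough sketch is below, assuming detections have already been sorted by decreasing score and matched to ground truth (tp(i) = 1 if detection i matched an unclaimed ground-truth box, e.g. at IoU > 0.5); rely on the provided evaluation code for the numbers you report.

    % Sketch of (non-interpolated) average precision from a ranked detection list.
    % tp is a 0/1 vector over detections sorted by decreasing score; num_gt is
    % the total number of ground-truth boxes for the class.
    function [ap, precision, recall] = average_precision(tp, num_gt)
        tp        = tp(:);
        cum_tp    = cumsum(tp);
        recall    = cum_tp / num_gt;                  % recall after each detection
        precision = cum_tp ./ (1:numel(tp))';         % precision after each detection
        ap = sum(precision .* tp) / num_gt;           % average of precision at each new TP
    end

The precision and recall vectors can be plotted directly to produce the PR curve.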
Dataset:
Download the images here.
The dataset contains a training set and a test set. In practice, one would also have a validation set for tuning parameters; since R-CNNs take a while to run, we are skipping that for this class.
Details of the dataset format are available in readme.txt.
Analysis and extensions: You should include at least a few of these (depending on the depth of the extension); here are some suggestions:
- Feature representation. Compare CNN features with more traditional vision features, e.g. HOG, LLC, or Fisher vectors.
- Which layer of the CNN to use.
- Parameters used during training (regularization, overlap thresholds, etc.).
- Bounding-box offset parameterizations for bounding-box regression (see the sketch after this list).
- Training your own, larger CNN.
- Fine-tuning the CNN on the given images.
- Other methods for detection, e.g. DPM.
- Simultaneous detection and segmentation.
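For the bounding-box regression extension, the offset parameterization used in the Girshick et al. paper predicts a scale-invariant shift of the box center and a log-space change of width and height. The sketch below computes those targets for one proposal P and ground-truth box G (the [x1 y1 x2 y2] format is an assumption):

    % Regression targets t = [tx ty tw th] for one proposal/ground-truth pair,
    % following the parameterization in the R-CNN paper.
    function t = bbox_reg_targets(P, G)
        pw = P(3) - P(1);  ph = P(4) - P(2);     % proposal width/height
        px = P(1) + pw/2;  py = P(2) + ph/2;     % proposal center
        gw = G(3) - G(1);  gh = G(4) - G(2);     % ground-truth width/height
        gx = G(1) + gw/2;  gy = G(2) + gh/2;     % ground-truth center
        t = [(gx - px) / pw, ...                 % tx: center shift, normalized by width
             (gy - py) / ph, ...                 % ty: center shift, normalized by height
             log(gw / pw), ...                   % tw: log-space width change
             log(gh / ph)];                      % th: log-space height change
    end

At test time, the predicted offsets are applied in reverse to refine the proposal box.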
References:
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014.
Important Dates:
Presentation slides due to jkrause@cs: Tues, June 2 (5:00 pm)
A short presentation of your work: Wed, June 3 (in class)
Deadline for submitting code and write-up: Thurs, June 4 (5:00 pm)