Stanford University CS231n: Deep Learning for Computer Vision

Important Dates

Unless otherwise noted, all project items are due by 11:59pm Pacific Time.

Deliverable	Weight	Due Date	Late Days
Project Proposal	1%	04/25	Yes
Project Milestone	2%	05/16	Yes
Final Report	29%	06/04	No
Poster Session (in person) + Poster PDF & Code (submit online)	3%	Poster Session: 06/11; Submitting PDF and Code: 06/10 11:59pm Pacific Time	No

Overview

The Course Project is an opportunity for you to apply what you have learned in class to a problem that interests you. Potential projects usually fall into one of two tracks:

Applications. If you're coming to the class with a specific background and interests (e.g., biology, engineering, physics), we'd love to see you apply the vision models from this class to problems in your domain. Pick a real-world problem and apply computer vision models to solve it.
Models. You can build a new model (algorithm) or a new variant of an existing model, and apply it to vision tasks. This track can be more challenging, and it sometimes leads to publishable work.

One restriction to note is that this is a Computer Vision class, so your project should involve visual data (pixels) in some form. For example, a pure NLP project is not a good choice, even if your approach uses ConvNets. Related areas such as shape analysis, which are an important part of vision conferences, are allowed.

We have compiled a list of project ideas for inspiration (TBD) that combine recent trends and interesting applications. You do not need to pick one from this list; rather, these ideas can serve as starting points for finding a project that excites you.

To get a better sense of what we expect from CS231n projects, we encourage you to look at project reports from previous years.

To inspire ideas, you might also look at recent deep learning publications from top-tier conferences, as well as the other resources below.

CVPR: IEEE Conference on Computer Vision and Pattern Recognition
ICCV: International Conference on Computer Vision
ECCV: European Conference on Computer Vision
NeurIPS (formerly NIPS): Neural Information Processing Systems
ICLR: International Conference on Learning Representations
ICML: International Conference on Machine Learning
Publications from the Stanford Vision Lab
Awesome Deep Vision
Past CS229 Projects: Example projects from Stanford's machine learning class
Kaggle challenges: An online machine learning competition website. For example, a Yelp classification challenge.

Application projects typically involve careful data preparation, an appropriate loss function, a sound training and cross-validation procedure, and thorough test-set evaluation and model comparisons. Don't be afraid to think outside the box. Some successful examples can be found below:
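As a rough illustration of the data-preparation step, here is a minimal sketch (plain Python, no libraries) of holding out a test set once and splitting the remaining examples into k cross-validation folds. The helper name `make_splits` and its parameters (`test_frac`, `k`, `seed`) are hypothetical, chosen for illustration only; they are not part of any course starter code.

```python
import random

def make_splits(num_examples, test_frac=0.2, k=5, seed=0):
    """Hypothetical helper: hold out a test set, then build k CV folds.

    Returns (test_idx, folds), where each fold is a (train_idx, val_idx)
    pair drawn only from the non-test ("dev") examples.
    """
    rng = random.Random(seed)            # fixed seed so splits are reproducible
    indices = list(range(num_examples))
    rng.shuffle(indices)

    num_test = int(round(num_examples * test_frac))
    test_idx = sorted(indices[:num_test])  # set aside; not used for model selection
    dev_idx = indices[num_test:]           # used for training and cross-validation

    folds = []
    for i in range(k):
        val_idx = dev_idx[i::k]            # every k-th dev example, starting at offset i
        val_set = set(val_idx)
        train_idx = [j for j in dev_idx if j not in val_set]
        folds.append((sorted(train_idx), sorted(val_idx)))
    return test_idx, folds

test_idx, folds = make_splits(100, test_frac=0.2, k=5)
print(len(test_idx), [len(v) for _, v in folds])  # 20 [16, 16, 16, 16, 16]
```

This sketch assumes examples are independent. If your data is grouped (e.g., many frames from the same video, or several images of the same patient), split by group instead so that near-duplicate examples do not leak between training, validation, and test sets.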

ConvNets can also run in real time on mobile phones and Raspberry Pis, so building an interesting mobile application could be a good project. If you go this route, you may want to check out PyTorch Mobile, TensorFlow Lite, or Caffe2 iOS/Android integration.

You might also gain inspiration by taking a look at some popular computer vision datasets:

Meta Pointer: A large collection organized by CV Datasets.
Yet another Meta pointer
ImageNet: a large-scale image dataset for visual recognition, organized according to the WordNet hierarchy
Vision-language datasets:
- Visual Genome
- Flickr30k
- VQA v2
- ADE20K
- LAION
SA-1B: a dataset of about 11 million images with over 1 billion segmentation masks for the objects in those images
COCO: large-scale object detection, segmentation, and captioning dataset
Open Images: a dataset of ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives
Cityscapes Dataset: This dataset focuses on semantic understanding of urban street scenes, with pixel-level annotations for various object classes such as cars, pedestrians, and roads
DeepFashion: a large-scale clothes dataset containing over 800,000 diverse fashion images annotated with bounding boxes, clothing categories, and attributes
Hugging Face Datasets: a collection of general-purpose datasets hosted on Hugging Face
Objaverse: a large-scale 3D asset database
SUN Database: a benchmark for scene recognition and object detection with annotated scene categories and segmented objects
Places Database: a scene-centric database with 205 scene categories and 2.5 million labeled images
NYU Depth Dataset v2: an RGB-D dataset of segmented indoor scenes
Microsoft COCO: a new benchmark for image recognition, segmentation and captioning
Flickr100M: 100 million Creative Commons Flickr images
Labeled Faces in the Wild: a dataset of 13,000 labeled face photographs
Human Pose Dataset: a benchmark for articulated human pose estimation
YouTube Faces DB: a face video dataset for unconstrained face recognition in videos
UCF101: an action recognition dataset of realistic action videos with 101 action categories
HMDB-51: a large human motion dataset of 51 action classes
ActivityNet: A large-scale video dataset for human activity understanding
Moments in Time: A dataset of one million 3-second videos

Collaboration

You can work in teams of up to 3 people. We expect projects done by 3 people to have a more impressive writeup and stronger results than projects done by fewer people. To get a sense of the expected scope, have a look at project reports from previous years. While we encourage you to work in teams, you may also work alone.

Honor Code

You may consult any papers, books, online references, or publicly available implementations for ideas and code that you may want to incorporate into your strategy or algorithm, so long as you clearly cite your sources in your code and your writeup. However, under no circumstances may you look at another group’s code or incorporate their code into your project.

If you are combining your course project with a project from another class, you must obtain permission from the instructor of the other class. You DO NOT need to get prior approval from the CS231n staff. However, in your Project Proposal, Milestone, and Final Report, you must clearly specify the UNIQUE portion of the project that is being counted for CS231n. You must prepare separate reports for each course, and submit your final report for the other course as well. Remember, it is an honor code violation to use the same final report PDF for multiple classes.

Late Policy

See the late policy on the home page.

Project Proposal

The project proposal should be one paragraph (200-400 words). Your project proposal should describe:

What is the problem that you will be investigating? Why is it interesting?
What reading will you examine to provide context and background?
What data will you use? If you are collecting new data, how will you do it?
What method or algorithm are you proposing? If there are existing implementations, will you use them and how? How do you plan to improve or modify such implementations? You don't have to have an exact answer at this point, but you should have a general sense of how you will approach the problem you are working on.
How will you evaluate your results? Qualitatively, what kind of results do you expect (e.g. plots or figures)? Quantitatively, what kind of analysis will you use to evaluate and/or compare your results (e.g. what performance metrics or statistical tests)?
If you are combining this project with another course/research project, what is the unique portion of the project that is counted towards this class?

Submission: Please submit your proposal as a PDF on Gradescope. Only one person on your team should submit, and that person should add the rest of your team as collaborators via "Group Submission".

Project Milestone

Fine-grained requirements are listed on Ed. Your project milestone report should be 2-3 pages long, using the provided template. The following is a suggested structure for your report:

Title, Author(s)
Introduction: this section introduces your problem, and the overall plan for approaching your problem
Problem statement: Describe your problem precisely, specifying the dataset to be used, the expected results, and how you will evaluate them
Technical Approach: Describe the methods you intend to apply to solve the given problem
Intermediate/Preliminary Results: State and evaluate your results up to the milestone

Submission: Please submit your milestone as a PDF on Gradescope. Only one person on your team should submit, and that person should add the rest of your team as collaborators via "Group Submission".

Final Report

Your final write-up must be 6-8 pages long, using the provided template, and structured like a paper from a computer vision conference (CVPR, ECCV, ICCV, etc.). Please use this template so we can fairly judge all student projects without worrying about altered font sizes, margins, etc. After the class, we will post all the final reports online so that you can read about each other's work. If you do not want your writeup to be posted online, please let us know via the project registration form.

The following is a suggested structure for your report, as well as the rubric that we will follow when evaluating reports. You don't necessarily have to organize your report using these sections in this order, but that would likely be a good starting point for most projects.
Refer to Ed for more fine-grained details and explanations of each separate section.

Title, Author(s)
Abstract: Briefly describe your problem, approach, and key results. Should be no more than 300 words.
Introduction (10%): Describe the problem you are working on, why it's important, and an overview of your results
Related Work (10%): Discuss published work that relates to your project. How is your approach similar to or different from others?
Data (10%): Describe the data you are working with for your project. What type of data is it? Where did it come from? How much data are you working with? Did you have to do any preprocessing, filtering, or other special treatment to use this data in your project?
Methods (30%): Discuss your approach for solving the problems that you set up in the introduction. Why is your approach the right thing to do? Did you consider alternative approaches? You should demonstrate that you have applied ideas and skills built up during the quarter to tackling your problem of choice. It may be helpful to include figures, diagrams, or tables to describe your method or compare it with other methods.
Experiments (30%): Discuss the experiments that you performed to demonstrate that your approach solves the problem. The exact experiments will vary depending on the project, but you might compare with previously published methods, perform an ablation study to determine the impact of various components of your system, experiment with different hyperparameters or architectural choices, use visualization techniques to gain insight into how your model works, discuss common failure modes of your model, etc. You should include graphs, tables, or other figures to illustrate your experimental results.
Conclusion (5%): Summarize your key results. What have you learned? Suggest ideas for future extensions or new applications of your ideas.
Writing / Formatting (5%): Is your paper clearly written and nicely formatted?
Supplementary Material, not counted toward your 6-8 page limit and submitted as a separate file. Your supplementary material might include:
- Source code (if your project proposed an algorithm, or other code that is relevant and important to your project).
- Cool videos, interactive visualizations, demos, etc.
Examples of things to not put in your supplementary material:
- The entire PyTorch/TensorFlow GitHub source code.
- Any code that is larger than 10 MB.
- Model checkpoints.
- A computer virus.

Submission: You will submit your final report as a PDF and your supplementary material as a separate PDF or ZIP file. We will provide detailed submission instructions as the deadline nears.

Additional Submission Requirements: We will also ask you to do the following when you submit your project report:

Your report PDF should list all authors who have contributed enough to your work to warrant co-authorship. This includes people not enrolled in CS 231N, such as faculty/advisors who sponsored your work with funding or data, and significant mentors (e.g., PhD students or postdocs who coded with you, collected data with you, or helped draft your model on a whiteboard). All authors should be listed directly underneath the title on your PDF. Include a footnote on the first page indicating which authors are not enrolled in CS 231N. All co-authors should have their institutional/organizational affiliation specified below the title.

If you have non-231N contributors, you will be asked to describe the following:
Specify the involvement of non-CS 231N contributors (discussion, writing code, writing paper, etc). For an example, please see the author contributions for AlphaGo (Nature, 2016).
Specify whether the project has been submitted to a peer-reviewed conference or journal. Include the full name and acronym of the conference (if applicable), for example: Neural Information Processing Systems (NeurIPS). This only applies if you have already submitted your paper/manuscript and it is under review as of the report deadline.

Any code that was used as a base for projects must be referenced and cited in the body of the paper. This includes CS 231N assignment code, fine-tuning example code, and open-source or GitHub implementations. You can use a footnote or a full reference/bibliography entry.
If you are using this project for multiple classes, submit the other class PDF as well. Remember, it is an honor code violation to use the same final report PDF for multiple classes.

In summary: include all contributing authors in your PDF; include detailed information on non-231N co-authors; tell us if you submitted to a conference; cite any code you used; and submit your report for the other class if this is a dual project (e.g., CS 230, CS 231A, CS 234).

Poster Session

We will hold a poster session in which you will present the results of your projects.

Date: 06/11
Time: Session A: 12:00–1:30 PM; Session B: 2:00–3:30 PM
Location: Burnham Pavilion
Who: Student groups must present in-person at the poster session, unless approved by course staff beforehand to present online. Stanford students, faculty, and guests from industry are welcome!

Students: We will provide foam poster boards and easels. The foam boards are 30x40 inches, so please print your poster no larger than 30x40 inches and no smaller than 20x30 inches. Our recommended size is 24x36 inches. You may print your poster in landscape or portrait orientation.

Frequently Asked Questions

Where can I print my poster? Several options are listed below. These are just examples; they are not the only options. Please verify the turnaround time yourself and plan ahead since the printing services may become occupied as the day of the event approaches.
- Lathrop Library’s Tech Desk: Approximately 3-day turnaround.
- FedEx: Approximately 2-day(?) turnaround.
- Walgreens: Approximately same-day pickup.
- Biotech Productions: Approximately same-day delivery.
- Staples: Approximately same-day pickup.
Can I print my poster on 8.5x11 inch pieces of paper and tape them together? Yes, but we encourage you to print out one full poster. If you do print sections and tape them together, make sure that all the content is still legible and fits on a 30x40 foam board.


Stanford - Spring 2025

Final Project