Scene-Classification

Stanford Vision Lab UIUC Beckman Institute

Research

Only the most recent projects are shown here. Please refer to publications for a complete list of research.

fMRI Pattern Recognition Toolbox

We have developed an fMRI data processing toolbox that allows users to import their fMRI data and experiment information to perform decoding and other analyses. It allows users to specify analysis parameters (for example, comparing different experimental conditions) and then automatically partitions the data and uses linear SVMs for decoding (using the Princeton MVPA toolbox and LIBSVM in the background). The toolbox is currently being used within the lab and will be released in the near future.
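The partition-then-decode workflow described above can be sketched as follows. This is a minimal illustration, not the toolbox's code: it assumes a leave-one-run-out partitioning scheme (the text does not say how the data are partitioned), it swaps in a nearest-centroid classifier for the linear SVM so that the example needs only the Python standard library, and all function names are hypothetical.

```python
from collections import defaultdict


def leave_one_run_out(runs):
    """Yield (train_indices, test_indices), holding out one run at a time."""
    for held_out in sorted(set(runs)):
        train = [i for i, r in enumerate(runs) if r != held_out]
        test = [i for i, r in enumerate(runs) if r == held_out]
        yield train, test


def fit_centroids(patterns, labels):
    """Average the training patterns of each condition (stand-in for SVM training)."""
    sums = {}
    counts = defaultdict(int)
    for pattern, label in zip(patterns, labels):
        if label not in sums:
            sums[label] = [0.0] * len(pattern)
        sums[label] = [s + v for s, v in zip(sums[label], pattern)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, pattern):
    """Return the condition whose centroid is closest in Euclidean distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(pattern, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def cross_validated_accuracy(patterns, labels, runs):
    """Fraction of patterns decoded correctly when their own run is held out."""
    correct = 0
    for train, test in leave_one_run_out(runs):
        centroids = fit_centroids([patterns[i] for i in train],
                                  [labels[i] for i in train])
        correct += sum(predict(centroids, patterns[i]) == labels[i] for i in test)
    return correct / len(labels)


# Toy data (made up): two voxels, two conditions, three runs.
patterns = [[1.0, 0.1], [0.0, 1.1], [0.9, 0.0], [0.2, 1.0], [1.1, 0.2], [0.1, 0.9]]
labels = ["A", "B", "A", "B", "A", "B"]
runs = [1, 1, 2, 2, 3, 3]
accuracy = cross_validated_accuracy(patterns, labels, runs)
```

Holding out whole runs keeps training and test data from sharing a scanner run, which is a common reason for partitioning fMRI data this way.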

Neural Correlates of Object Classes

We've started work on a new project which seeks to investigate how object classes are represented in the brain and the relationships between the representations of different objects.

Object Recognition in Context

Contextual violations have long been known to cause deficits in object detection and recognition (Biederman et al. 1982). Incongruent objects attract earlier and longer eye fixations (Underwood and Foulsham 2006) and evoke stronger ERPs (Mudrik, Lamy, and Deouell 2010), suggesting that the brain rapidly marshals additional resources to aid in processing unexpected objects. Despite a wealth of psychophysical results on context, cortical models of contextual facilitation are still speculative (Bar 2004). In this study, we use MVPA methods with fMRI data to explore the effects of context on neural representations. We found that a classifier trained using lateral occipital complex (LOC) responses to isolated boats and cars achieved above-chance decoding accuracy on the same objects placed in scenes. We then presented these objects in scenes that violated a semantic relationship (e.g., a boat sitting on a city street) and/or a geometric relationship (e.g., a car flying over a city street). Decoding performance remained high when only a semantic relationship was violated, but decreased to chance when a geometric relationship was violated. Surprisingly, the same pattern of results appears in the parahippocampal place area (PPA), demonstrating that some information about foreground object identity is present in this region. We then examined the relationship between LOC and PPA. Functional connectivity between these regions increased under semantic violation, providing evidence for increased recurrent processing when object and context information are in conflict. Under all conditions, we observed that LOC exhibited significantly stronger connectivity to posterior PPA than to anterior PPA, raising the possibility of functional subdivisions within PPA.

Translation invariance of natural scene categories

We have investigated how invariance to translation arises in the human visual system (why we perceive object identity with ease regardless of where the object is in our visual field). Using fMRI analysis, we investigated how the neural representation of natural scene categories changes when stimuli are translated to various locations in the subjects' visual field. More specifically, we used MVPA to quantify this change throughout the ventral visual stream: from early visual cortex to higher level areas such as the Parahippocampal Place Area (PPA).

Categorization of good and bad examples of natural scene categories

Despite the vast range of images that we might categorize as an example of a particular natural scene category (e.g., beach), human observers are able to quickly and efficiently categorize even briefly presented images of these scenes. However, within the range of images that we might categorize as a “beach”, for example, some will be more representative of that category than others. We asked whether participants' ability to categorize briefly presented scenes differed depending on whether the images were good or bad examples of the scene. 4000 images from six categories (beaches, city streets, forests, highways, mountains and offices) were first rated by naïve subjects as good or bad examples of those categories. On the basis of these ratings, 50 good and 50 bad images were chosen from each category to be used in a categorization experiment, in which a separate set of participants was asked to categorize the scenes by pressing one of six buttons. The images in this experiment were presented very briefly (<100 ms), followed by a perceptual mask. Exemplars of the six categories were intermixed in blocks of different quality. As predicted, participants categorized good examples of a category significantly faster and more accurately than bad examples, suggesting that part of what makes an image a good example of a category can be gleaned in very brief presentations. To further understand the neural basis of this categorization ability, in a follow-up fMRI experiment, we will ask whether a statistical pattern recognition algorithm trained to discriminate the distributed patterns of neural activity associated with our six scene categories might also show a decrement in classification accuracy for bad relative to good examples of the category. By comparing classification accuracy within different brain regions (e.g., V1, PPA) we can ask which brain regions are most sensitive to this good/bad distinction.

The neural representation of natural scene categories

Humans are extremely efficient at perceiving natural scenes and understanding their contents. Little is known, however, about the neural mechanisms of scene categorization.
In our fMRI study, we monitored the neural activity of human subjects while they viewed blocks of images of six natural scene categories. We concentrated on two regions of interest (ROIs) known to be active for natural scenes (PPA and RSC) as well as three other ROIs (V1, FFA, and LOC). For each ROI we constructed a scene decoder by training a support vector machine (SVM) classification algorithm to assign the correct category labels to the fMRI data. The decoder generated predictions for the scene categories of novel images from the fMRI activity recorded in separate test runs while the subjects viewed these images. We found high decoding accuracy in PPA and RSC, lower decoding accuracy in FFA and LOC, and chance-level accuracy in V1. In a behavioral experiment designed to compare decoder performance with that of human subjects categorizing scenes, the same scenes were presented briefly, followed by a perceptual mask. Asked to indicate the presented category by pressing one of six buttons, subjects performed with high (but not perfect) accuracy. We compared the error patterns made by the decoder with the errors made by subjects by correlating the frequencies of specific mistakes in the two experiments. We found high correlation for the decoders in PPA and RSC, but not in FFA, LOC, or V1.
To further investigate the relationship between decoder performance and behavior, we added blocks of inverted images (i.e., mirrored across the horizontal axis) to both the fMRI and behavioral experiments. Predictably, subjects were significantly less accurate at identifying the natural scene category for inverted scenes than for upright scenes. Interestingly, this was also true for the decoder in PPA and RSC as well as in FFA, but not in LOC or V1.
In short, across all analyses and experiments, PPA and RSC showed the highest decoding accuracy and the best correlation with human performance of all the ROIs investigated. Taken together, these results suggest that categories of natural scenes have a neural representation in PPA and RSC, and that this representation mirrors the categorization performance of human subjects in both its error patterns and its sensitivity to scene inversion.
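The comparison of error patterns described above (correlating the frequencies of specific mistakes in the two experiments) can be illustrated with a short sketch. It assumes that the mistakes are summarized as confusion matrices with one row per presented category and one column per response, that only the off-diagonal cells (the errors) are compared, and that Pearson correlation is the measure; the text does not specify these details, and the function names and numbers below are made up for illustration.

```python
from math import sqrt


def off_diagonal(confusion):
    """Flatten a square confusion matrix, skipping the diagonal (correct answers)."""
    n = len(confusion)
    return [confusion[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def error_pattern_correlation(decoder_confusion, behavior_confusion):
    """Correlate how often each specific mistake occurs in the two experiments."""
    return pearson(off_diagonal(decoder_confusion), off_diagonal(behavior_confusion))


# Toy 3-category example: rows are presented categories, columns are responses.
decoder = [[0.6, 0.3, 0.1],
           [0.2, 0.7, 0.1],
           [0.1, 0.3, 0.6]]
behavior = [[0.80, 0.15, 0.05],
            [0.10, 0.85, 0.05],
            [0.05, 0.15, 0.80]]
r = error_pattern_correlation(decoder, behavior)
```

Excluding the diagonal keeps overall accuracy from driving the correlation, so the measure reflects which categories are confused with which, not how often the answer is right.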

Searchlight analysis reveals brain areas involved in scene categorization

Our ability to categorize natural scenes is essential for visual tasks such as navigation or the recognition of objects in their natural environment. Although different classes of natural scenes often share similar image statistics, human subjects are extremely efficient at categorizing natural scenes. In order to map out the brain regions involved in scene categorization, we use multivariate pattern recognition to analyze the fMRI activation within a small spherical region (the searchlight, Kriegeskorte et al. 2006) that is positioned at every possible location in the brain. From local activity patterns in each searchlight, we attempt to predict the scene category that the subject viewed during the experiment. Such an analysis allows us to generate a spatial map of those brain regions producing the highest classification accuracy. Furthermore, we can generate similar maps of the correlation of the pattern of errors made by the classification algorithm at each searchlight location with the pattern of errors made by human subjects in an accompanying behavioral experiment. Lastly, we ask which searchlight locations show a decrement in prediction accuracy for up-down inverted images relative to upright images, to reveal brain regions that may participate in the inversion effect that we found in the behavioral experiment. Together, these maps implicate large regions of the ventral visual cortex in the categorization of natural scenes, including area V1, the parahippocampal place area (PPA), retrosplenial cortex (RSC), and lateral occipital complex (LOC), previously shown to be involved in natural scene categorization (Caddigan et al., VSS 2007 & VSS 2008; Walther et al., HBM 2007 & SfN 2008) as well as other intermediate-level visual areas. We further explore the functions of these regions with respect to natural scene categorization and attempt to find their specific contributions to the scene categorization process.
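The searchlight procedure described above can be sketched in a few lines. This is an illustrative outline under stated assumptions: voxels are addressed by integer (x, y, z) grid coordinates, the radius is given in voxels, and `score` is a hypothetical callback standing in for whatever is computed from the local pattern (for example, cross-validated classification accuracy). None of these names come from the original analysis.

```python
def sphere_offsets(radius):
    """Integer (dx, dy, dz) offsets lying within `radius` voxels of the center."""
    r = int(radius)
    return [(dx, dy, dz)
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
            for dz in range(-r, r + 1)
            if dx * dx + dy * dy + dz * dz <= radius * radius]


def searchlight_map(brain_voxels, radius, score):
    """Center a sphere on every brain voxel and score the voxels it contains.

    brain_voxels: set of (x, y, z) coordinates inside the brain mask.
    score: callable taking the list of voxel coordinates in one searchlight
           and returning a number (hypothetical; e.g. decoding accuracy).
    Returns a dict mapping each center voxel to its score.
    """
    offsets = sphere_offsets(radius)
    result = {}
    for (x, y, z) in brain_voxels:
        neighborhood = [(x + dx, y + dy, z + dz)
                        for dx, dy, dz in offsets
                        if (x + dx, y + dy, z + dz) in brain_voxels]
        result[(x, y, z)] = score(neighborhood)
    return result


# Toy example: a 3x3x3 "brain", scored by the number of voxels in each searchlight.
brain = {(x, y, z) for x in range(3) for y in range(3) for z in range(3)}
sizes = searchlight_map(brain, radius=1, score=len)
```

Searchlights near the edge of the mask contain fewer voxels than those in the interior, which is worth keeping in mind when comparing scores across locations.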

Finding “good” features for natural scene classification

Humans are adept at determining the base-level category of natural scenes (Tversky & Hemenway, 1983). What visual features of an image does an observer use in such categorization tasks? Previous computational studies have established that classification of scenes is possible using power spectral information (i.e., magnitude of spatial frequencies; Oliva & Torralba, 2001) and local texture descriptors (Fei-Fei & Perona, 2005). Here we take a new approach toward identifying possible features that distinguish between categories by comparing good and bad examples of a category. If a particular feature is relevant to human categorization, it should also provide better classification for good than for bad examples of that category. Using linear pattern recognition algorithms, we performed multi-way classification on six categories (beaches, city streets, forests, highways, mountains and offices), each comprising 50 images that were rated by naïve participants as “good” examples of their category, and an additional 50 that were rated as “bad” examples of their category (Torralbo et al., VSS 2009). We found that several feature sets, including the power spectrum, color histogram, and local surface geometry and texture information (Hoiem et al., 2005), resulted in average classification rates significantly above chance level. More importantly, when these classification results were separated into “good” and “bad” examples, all three feature sets showed greater classification accuracies for “good” than for “bad” category exemplars. These results suggest that all three feature sets are viable candidate features that humans could use to distinguish among our natural scene categories.
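Two steps in the analysis above lend themselves to a short sketch: turning an image into a color-histogram feature vector, and separating classification results into "good" and "bad" exemplars. The code below is a simplified illustration under assumptions of its own: a per-channel RGB histogram with equal-width bins (the study's actual histogram definition is not given), and hypothetical function names and toy labels.

```python
from collections import defaultdict


def color_histogram(pixels, bins=4):
    """Concatenated per-channel histogram of (r, g, b) pixels with values 0..255.

    Each channel's bins sum to 1, so images of different sizes are comparable.
    """
    pixels = list(pixels)
    width = 256 / bins
    hist = [0.0] * (3 * bins)
    for rgb in pixels:
        for channel, value in enumerate(rgb):
            hist[channel * bins + int(value // width)] += 1
    return [count / len(pixels) for count in hist]


def accuracy_by_group(predicted, actual, group):
    """Classification accuracy computed separately for each group label."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for p, a, g in zip(predicted, actual, group):
        totals[g] += 1
        hits[g] += int(p == a)
    return {g: hits[g] / totals[g] for g in totals}


# Toy example: four classified images, two rated "good" and two rated "bad".
predicted = ["beach", "forest", "beach", "office"]
actual = ["beach", "forest", "forest", "office"]
quality = ["good", "good", "bad", "bad"]
split = accuracy_by_group(predicted, actual, quality)
```

A feature set that is relevant to human categorization would, by the argument in the text, yield a higher value for the "good" group than for the "bad" group.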