Summary

We propose a fully automatic framework for detecting and extracting arbitrary human motion volumes from real-world videos collected from YouTube. Our system consists of two stages. In the first stage, a person detector provides coarse information about the possible locations of humans, and a constrained clustering algorithm then groups the detections and rejects false positives based on appearance similarity and spatio-temporal coherence. In the second stage, we apply a top-down pictorial structure model to complete the extraction of humans in arbitrary motion. During this procedure, a density propagation technique based on a mixture of Gaussians propagates temporal information in a principled way, which greatly reduces the search space for the measurement step during inference. We demonstrate the initial success of this framework both quantitatively and qualitatively on a number of YouTube videos.
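
To make the first stage more concrete, the Python sketch below illustrates one possible form of constrained clustering over per-frame detections: two detections may be grouped together only if they are close in time, overlap spatially, and have similar appearance, and clusters with too few members are rejected as false positives. This is a toy illustration only, not the authors' code; the class, function names, similarity measures, and all thresholds are our own assumptions.

```python
import numpy as np

class Detection:
    """A single detection: frame index, (x, y, w, h) box, appearance descriptor."""
    def __init__(self, frame, box, appearance):
        self.frame = frame
        self.box = np.asarray(box, dtype=float)
        self.appearance = np.asarray(appearance, dtype=float)

def box_overlap(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def appearance_similarity(f, g):
    """Cosine similarity between appearance descriptors (e.g. color histograms)."""
    return float(f @ g / (np.linalg.norm(f) * np.linalg.norm(g) + 1e-9))

def cluster_detections(dets, max_frame_gap=5, min_iou=0.3,
                       min_sim=0.7, min_cluster_size=3):
    """Greedy constrained clustering: a detection joins an earlier cluster only
    if it is temporally close, spatially overlapping, and similar in appearance
    to some member; clusters that stay too small are rejected as false positives.
    All thresholds here are illustrative, not the paper's."""
    labels = [-1] * len(dets)
    next_label = 0
    for i, d in enumerate(dets):
        for j in range(i):
            e = dets[j]
            if (abs(d.frame - e.frame) <= max_frame_gap
                    and box_overlap(d.box, e.box) >= min_iou
                    and appearance_similarity(d.appearance, e.appearance) >= min_sim):
                labels[i] = labels[j]
                break
        else:
            labels[i] = next_label
            next_label += 1
    counts = {}
    for l in labels:
        counts[l] = counts.get(l, 0) + 1
    return [l if counts[l] >= min_cluster_size else -1 for l in labels]

# Toy usage: three coherent detections of one person plus one isolated
# false positive, which ends up in a rejected (label -1) cluster.
dets = [Detection(f, (100 + 2 * f, 50, 40, 80), np.ones(8)) for f in range(3)]
dets.append(Detection(10, (300, 200, 40, 80), np.ones(8)))
print(cluster_detections(dets))  # [0, 0, 0, -1]
```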

System overview

Evaluation

We evaluate the retrieval performance of our system in terms of precision-recall measures. For each sequence, we generate ground truth by manually labeling every human present in each frame with a bounding box. We compare precision-recall rates at three stages of our system: pedestrian detection only, pedestrian detection plus clustering, and the full model. For a fixed threshold of the pedestrian detector, we obtain the three precision-recall pairs shown in each row of Table 1. The full system achieves the highest F-measure, reflecting the fact that it reaches much higher recall by extracting non-upright people that the pedestrian detector alone misses.
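
As a concrete (hypothetical) illustration of how such precision-recall numbers can be computed, the sketch below matches predicted boxes to the manually labeled ground-truth boxes with a greedy overlap criterion and reports precision, recall, and the F-measure. The 0.5 IoU threshold and the greedy matching rule are our own illustrative choices, not necessarily those used in the paper.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def precision_recall_f(pred_boxes, gt_boxes, iou_thresh=0.5):
    """Greedily match predictions to ground truth: a prediction is a true
    positive if it overlaps an unmatched ground-truth box by at least
    iou_thresh. Returns (precision, recall, F-measure)."""
    matched = [False] * len(gt_boxes)
    tp = 0
    for p in pred_boxes:
        for k, g in enumerate(gt_boxes):
            if not matched[k] and iou(p, g) >= iou_thresh:
                matched[k] = True
                tp += 1
                break
    fp = len(pred_boxes) - tp
    fn = len(gt_boxes) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# Example: two predictions, one ground-truth box -> (0.5, 1.0, 0.67).
print(precision_recall_f([(0, 0, 10, 10), (50, 50, 60, 60)],
                         [(1, 1, 11, 11)]))
```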

Example Results


Demo videos 1-10 (example extraction results on YouTube sequences).

Dataset

The dataset used in our experiments contains 50 sequences with various human motion types. You may download the data here:
Dataset (1.5GB) (release date: September 05, 2008)

Resources

  • Code is available upon request (Linux only; no Windows version yet). Please contact us: jniebles -at- princeton -dot- edu
  • Juan Carlos Niebles, Bohyung Han, Andras Ferencz, and Li Fei-Fei. Extracting Moving People from Internet Videos. In ECCV, 2008. [PDF] [Poster]