What's the Point: Semantic Segmentation with Point Supervision

Amy Bearman¹, Olga Russakovsky¹,², Vittorio Ferrari³, Li Fei-Fei¹

¹Stanford University, ²Carnegie Mellon University, ³University of Edinburgh

Models

Unlimited Budget

Img: Model supervised only with image-level labels, with the classes term and the constraint term. Row 1 of Table 1 in the paper.

Img + Obj: Model supervised only with image-level labels, with the classes term, the constraint term, and the objectness term. Row 2 of Table 1 in the paper.

1Point + Img: Model supervised with point-level supervision, with the classes term and the constraint term. Row 3 of Table 1 in the paper.

1Point + Img + Obj: Model supervised with point-level supervision, with the classes term, the constraint term, and the objectness term. Row 4 of Table 1 in the paper.

AllPoints: Model supervised with point-level supervision (1 point per object instance), with the classes term, the constraint term, and the objectness term. Row 5 of Table 1 in the paper.

AllPoints (weighted): Model supervised with point-level supervision (1 point per object instance), with the classes term, the constraint term, and the objectness term. The points are weighted according to the order in which they were annotated. Row 6 of Table 1 in the paper.

1Point (3 annotators): Model supervised with point-level supervision (1 point per object instance), with the classes term, the constraint term, and the objectness term. Three annotators were used, and all annotations are retained. Row 7 of Table 1 in the paper.

1Point (random points): Model supervised with point-level supervision (1 point per object class) obtained randomly from the ground truth segmentation, with the classes term, the constraint term, and the objectness term. Row 8 of Table 1 in the paper.

Full supervision: Fully supervised model, with the classes term and the constraint term. Row 10 of Table 1 in the paper.

Hybrid supervision: Model trained with hybrid supervision (100 fully supervised images, with the rest point supervised). Row 11 of Table 1 in the paper.

1Squiggle: Model supervised with squiggle-level supervision, with the classes term, the constraint term, and the objectness term. Row 12 of Table 1 in the paper.
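
The 1Point (random points) entry above draws one point per object class at random from the ground truth segmentation. A minimal NumPy sketch of that sampling step follows. The function name sample_point_per_class, the toy mask, and the choice to skip labels 0 (background) and 255 (void, as in PASCAL VOC masks) are assumptions made for illustration and are not taken from the released code.

```python
import numpy as np

def sample_point_per_class(gt_mask, rng, ignore_labels=(0, 255)):
    """Return {class_id: (row, col)} with one random pixel per class in gt_mask.

    ignore_labels is an assumption: 0 = background, 255 = void.
    """
    points = {}
    for cls in np.unique(gt_mask):
        if int(cls) in ignore_labels:
            continue
        # Coordinates of every pixel labeled with this class.
        rows, cols = np.nonzero(gt_mask == cls)
        # Pick one of those pixels uniformly at random.
        i = rng.integers(len(rows))
        points[int(cls)] = (int(rows[i]), int(cols[i]))
    return points

# Toy 4x4 ground truth mask containing classes 1 and 2.
gt = np.array([[0, 0, 1, 1],
               [0, 0, 1, 1],
               [2, 2, 0, 255],
               [2, 2, 0, 255]], dtype=np.uint8)

points = sample_point_per_class(gt, np.random.default_rng(0))
for cls, (r, c) in sorted(points.items()):
    print(f"class {cls}: point at row {r}, col {c}")
```

Each sampled point is guaranteed to lie on a pixel of its class, which is what distinguishes this setting from human-clicked points.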

Fixed Budget

Full supervision (fixed budget): Model supervised with full supervision on a fixed annotation budget. Row 1 of Table 3 in the paper.

Image-level supervision (fixed budget): Model supervised with image-level labels on a fixed annotation budget. Row 2 of Table 3 in the paper.

Squiggle-level supervision (fixed budget): Model supervised with squiggles on a fixed annotation budget. Row 3 of Table 3 in the paper.

Point-level supervision (fixed budget): Model supervised with points (1 point per object class) on a fixed annotation budget. Row 4 of Table 3 in the paper.

Initialization Models

VGG16-CONV-PASCAL: Network used to initialize all of our fully convolutional networks trained on PASCAL VOC 2012. All classifier weights are zero, except for those learned by the original VGG network for classes common to both PASCAL and ILSVRC.
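
The initialization described for VGG16-CONV-PASCAL can be sketched as follows: start from an all-zero classifier and copy over the rows of the source classifier for the shared classes. This is only an illustration under stated assumptions. The function init_classifier, the toy sizes, and the one-to-one target_to_source mapping are hypothetical; how several ILSVRC classes would map to a single PASCAL class is not specified here.

```python
import numpy as np

def init_classifier(src_weights, src_bias, target_to_source, num_target):
    """Build a (num_target, feat_dim) classifier that is zero everywhere
    except for rows copied from the source classifier.

    target_to_source: hypothetical {target_class: source_class} mapping
    for classes present in both label sets.
    """
    feat_dim = src_weights.shape[1]
    w = np.zeros((num_target, feat_dim), dtype=src_weights.dtype)
    b = np.zeros(num_target, dtype=src_bias.dtype)
    for t, s in target_to_source.items():
        w[t] = src_weights[s]  # reuse the weights learned for the shared class
        b[t] = src_bias[s]
    return w, b

# Toy example: a 5-class source classifier over 3 features,
# mapped to a 3-class target label set.
rng = np.random.default_rng(0)
src_w = rng.standard_normal((5, 3))
src_b = rng.standard_normal(5)
mapping = {1: 4, 2: 0}  # target class 0 has no source counterpart

w, b = init_classifier(src_w, src_b, mapping, num_target=3)
print(w)
```

Rows without a counterpart in the source label set stay at zero and are learned from scratch during training.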

VGG16-CONV: Convolutional VGG 16-layer network based on this model, with all fully connected layers converted to convolutional layers. It performs a 1,000-way pixel-wise softmax classification over the 1,000 ILSVRC classes.
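
To illustrate what converting a fully connected layer to a convolutional layer means, the sketch below reshapes fully connected weights into convolution kernels, checks that both give the same output on an input of the original size, and then applies a pixel-wise softmax on a larger input. It is a toy NumPy example, not the released model: the helper names, the tiny layer sizes, and the channel-major flattening order are all assumptions.

```python
import numpy as np

def fc_to_conv(fc_weights, in_channels, kh, kw):
    """Reshape (out, in_channels*kh*kw) FC weights into (out, in_channels, kh, kw) kernels.

    Assumes the FC input was flattened in (channel, row, col) order.
    """
    return fc_weights.reshape(fc_weights.shape[0], in_channels, kh, kw)

def conv2d_valid(x, kernels, bias):
    """Naive stride-1, no-padding cross-correlation. x: (C, H, W), kernels: (O, C, kh, kw)."""
    o, _, kh, kw = kernels.shape
    _, h, w = x.shape
    out = np.empty((o, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = x[:, i:i + kh, j:j + kw]
            # Contract over (C, kh, kw) to get one score per output class.
            out[:, i, j] = np.tensordot(kernels, patch, axes=3) + bias
    return out

def pixelwise_softmax(scores):
    """Softmax over the class axis (axis 0) at every spatial location."""
    z = scores - scores.max(axis=0, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
fc_w = rng.standard_normal((4, 2 * 3 * 3))  # 4 classes, input of 2 channels x 3 x 3
fc_b = rng.standard_normal(4)
kernels = fc_to_conv(fc_w, in_channels=2, kh=3, kw=3)

# On an input of the original size, both layers agree.
x_small = rng.standard_normal((2, 3, 3))
fc_out = fc_w @ x_small.reshape(-1) + fc_b
conv_out = conv2d_valid(x_small, kernels, fc_b)

# On a larger input, the convolution yields a class score map.
x_large = rng.standard_normal((2, 5, 6))
probs = pixelwise_softmax(conv2d_valid(x_large, kernels, fc_b))
print(probs.shape)
```

The same weights therefore serve both as an image classifier and as a dense per-location classifier, which is why the converted network can be applied to images of arbitrary size.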