We present ablations of the saliency baselines on PASCAL-Co for four randomly chosen image sets under several scenarios (see Tab. 3): DINO ResNet, Supervised ResNet, DINO ViT, and Supervised ViT. Of the compared methods, ours localizes the common objects in the images best.
[Figure: qualitative results for four image sets. Rows in each set, top to bottom: Image, DINO ResNet, Sup. ResNet, Sup. ViT, DINO ViT, Ours.]