TY - GEN
T1 - PatchRefineNet
T2 - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
AU - Nagendra, Savinay
AU - Kifer, Daniel
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024/1/3
Y1 - 2024/1/3
N2 - The purpose of binary segmentation models is to determine which pixels belong to an object of interest (e.g., which pixels in an image are part of roads). The models assign a logit score (i.e., probability) to each pixel and these are converted into predictions by thresholding (i.e., each pixel with logit score ≥ τ is predicted to be part of a road). However, a common phenomenon in current and former state-of-the-art segmentation models is spatial bias - in some patches, the logit scores are consistently biased upwards and in others they are consistently biased downwards. These biases cause false positives and false negatives in the final predictions. In this paper, we propose PatchRefineNet (PRN), a small network that sits on top of a base segmentation model and learns to correct its patch-specific biases. Across a wide variety of base models, PRN consistently helps them improve mIoU by 2-3%. One of the key ideas behind PRN is the addition of a novel supervision signal during training. Given the logit scores produced by the base segmentation model, each pixel is given a pseudo-label that is obtained by optimally thresholding the logit scores in each image patch. Incorporating these pseudo-labels into the loss function of PRN helps correct systematic biases and reduce false positives/negatives. Although we mainly focus on binary segmentation, we also show how PRN can be extended to saliency detection and few-shot segmentation. We also discuss how the ideas can be extended to multiclass segmentation. Source code is available at https://github.com/savinay95n/PatchRefineNet.
AB - The purpose of binary segmentation models is to determine which pixels belong to an object of interest (e.g., which pixels in an image are part of roads). The models assign a logit score (i.e., probability) to each pixel and these are converted into predictions by thresholding (i.e., each pixel with logit score ≥ τ is predicted to be part of a road). However, a common phenomenon in current and former state-of-the-art segmentation models is spatial bias - in some patches, the logit scores are consistently biased upwards and in others they are consistently biased downwards. These biases cause false positives and false negatives in the final predictions. In this paper, we propose PatchRefineNet (PRN), a small network that sits on top of a base segmentation model and learns to correct its patch-specific biases. Across a wide variety of base models, PRN consistently helps them improve mIoU by 2-3%. One of the key ideas behind PRN is the addition of a novel supervision signal during training. Given the logit scores produced by the base segmentation model, each pixel is given a pseudo-label that is obtained by optimally thresholding the logit scores in each image patch. Incorporating these pseudo-labels into the loss function of PRN helps correct systematic biases and reduce false positives/negatives. Although we mainly focus on binary segmentation, we also show how PRN can be extended to saliency detection and few-shot segmentation. We also discuss how the ideas can be extended to multiclass segmentation. Source code is available at https://github.com/savinay95n/PatchRefineNet.
UR - http://www.scopus.com/inward/record.url?scp=85189310433&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85189310433&partnerID=8YFLogxK
U2 - 10.1109/WACV57701.2024.00139
DO - 10.1109/WACV57701.2024.00139
M3 - Conference contribution
AN - SCOPUS:85189310433
T3 - Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
SP - 1350
EP - 1361
BT - Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 4 January 2024 through 8 January 2024
ER -