TY - GEN
T1 - EncoderMI
T2 - 27th ACM Annual Conference on Computer and Communication Security, CCS 2021
AU - Liu, Hongbin
AU - Jia, Jinyuan
AU - Qu, Wenjie
AU - Gong, Neil Zhenqiang
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/11/12
Y1 - 2021/11/12
N2 - Given a set of unlabeled images or (image, text) pairs, contrastive learning aims to pre-train an image encoder that can be used as a feature extractor for many downstream tasks. In this work, we propose EncoderMI, the first membership inference method against image encoders pre-trained by contrastive learning. In particular, given an input and a black-box access to an image encoder, EncoderMI aims to infer whether the input is in the training dataset of the image encoder. EncoderMI can be used 1) by a data owner to audit whether its (public) data was used to pre-train an image encoder without its authorization or 2) by an attacker to compromise privacy of the training data when it is private/sensitive. Our EncoderMI exploits the overfitting of the image encoder towards its training data. In particular, an overfitted image encoder is more likely to output more (or less) similar feature vectors for two augmented versions of an input in (or not in) its training dataset. We evaluate EncoderMI on image encoders pre-trained on multiple datasets by ourselves as well as the Contrastive Language-Image Pre-training (CLIP) image encoder, which is pre-trained on 400 million (image, text) pairs collected from the Internet and released by OpenAI. Our results show that EncoderMI can achieve high accuracy, precision, and recall. We also explore a countermeasure against EncoderMI via preventing overfitting through early stopping. Our results show that it achieves trade-offs between accuracy of EncoderMI and utility of the image encoder, i.e., it can reduce the accuracy of EncoderMI, but it also incurs classification accuracy loss of the downstream classifiers built based on the image encoder.
AB - Given a set of unlabeled images or (image, text) pairs, contrastive learning aims to pre-train an image encoder that can be used as a feature extractor for many downstream tasks. In this work, we propose EncoderMI, the first membership inference method against image encoders pre-trained by contrastive learning. In particular, given an input and a black-box access to an image encoder, EncoderMI aims to infer whether the input is in the training dataset of the image encoder. EncoderMI can be used 1) by a data owner to audit whether its (public) data was used to pre-train an image encoder without its authorization or 2) by an attacker to compromise privacy of the training data when it is private/sensitive. Our EncoderMI exploits the overfitting of the image encoder towards its training data. In particular, an overfitted image encoder is more likely to output more (or less) similar feature vectors for two augmented versions of an input in (or not in) its training dataset. We evaluate EncoderMI on image encoders pre-trained on multiple datasets by ourselves as well as the Contrastive Language-Image Pre-training (CLIP) image encoder, which is pre-trained on 400 million (image, text) pairs collected from the Internet and released by OpenAI. Our results show that EncoderMI can achieve high accuracy, precision, and recall. We also explore a countermeasure against EncoderMI via preventing overfitting through early stopping. Our results show that it achieves trade-offs between accuracy of EncoderMI and utility of the image encoder, i.e., it can reduce the accuracy of EncoderMI, but it also incurs classification accuracy loss of the downstream classifiers built based on the image encoder.
UR - http://www.scopus.com/inward/record.url?scp=85119159254&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119159254&partnerID=8YFLogxK
U2 - 10.1145/3460120.3484749
DO - 10.1145/3460120.3484749
M3 - Conference contribution
AN - SCOPUS:85119159254
T3 - Proceedings of the ACM Conference on Computer and Communications Security
SP - 2081
EP - 2095
BT - CCS 2021 - Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security
PB - Association for Computing Machinery
Y2 - 15 November 2021 through 19 November 2021
ER -