TY - GEN
T1 - Learning Emotion Representations from Verbal and Nonverbal Communication
AU - Zhang, Sitao
AU - Pan, Yimu
AU - Wang, James Z.
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Emotion understanding is an essential but highly challenging component of artificial general intelligence. The absence of extensive annotated datasets has significantly impeded advancements in this field. We present EmotionCLIP, the first pre-training paradigm to extract visual emotion representations from verbal and nonverbal communication using only uncurated data. Compared with the numerical labels or descriptions used in previous methods, communication naturally contains emotion information. Furthermore, acquiring emotion representations from communication is more congruent with the human learning process. We guide EmotionCLIP to attend to nonverbal emotion cues through subject-aware context encoding and to verbal emotion cues using sentiment-guided contrastive learning. Extensive experiments validate the effectiveness and transferability of EmotionCLIP. Using merely the linear-probe evaluation protocol, EmotionCLIP outperforms state-of-the-art supervised visual emotion recognition methods and rivals many multimodal approaches across various benchmarks. We anticipate that the advent of EmotionCLIP will address the prevailing issue of data scarcity in emotion understanding, thereby fostering progress in related domains. The code and pretrained models are available at https://github.com/Xeaver/EmotionCLIP.
UR - http://www.scopus.com/inward/record.url?scp=85166305162&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85166305162&partnerID=8YFLogxK
U2 - 10.1109/CVPR52729.2023.01821
DO - 10.1109/CVPR52729.2023.01821
M3 - Conference contribution
AN - SCOPUS:85166305162
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 18993
EP - 19004
BT - Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
PB - IEEE Computer Society
T2 - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
Y2 - 18 June 2023 through 22 June 2023
ER -