TY - GEN
T1 - BM-NAS: Bilevel Multimodal Neural Architecture Search
T2 - 36th AAAI Conference on Artificial Intelligence, AAAI 2022
AU - Yin, Yihang
AU - Huang, Siyu
AU - Zhang, Xiang
N1 - Publisher Copyright:
Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2022/6/30
AB - Deep neural networks (DNNs) have shown superior performance on a variety of multimodal learning problems. However, adapting DNNs to individual multimodal tasks often requires substantial effort in manually engineering unimodal features and designing multimodal feature fusion strategies. This paper proposes the Bilevel Multimodal Neural Architecture Search (BM-NAS) framework, which makes the architecture of multimodal fusion models fully searchable via a bilevel searching scheme. At the upper level, BM-NAS selects inter/intra-modal feature pairs from the pretrained unimodal backbones. At the lower level, BM-NAS learns the fusion strategy for each feature pair as a combination of predefined primitive operations. The primitive operations are elaborately designed and can be flexibly combined to accommodate various effective feature fusion modules, such as multi-head attention (Transformer) and Attention on Attention (AoA). Experimental results on three multimodal tasks demonstrate the effectiveness and efficiency of the proposed BM-NAS framework. BM-NAS achieves competitive performance with much less search time and far fewer model parameters than existing generalized multimodal NAS methods. Our code is available at https://github.com/Somedaywilldo/BM-NAS.
UR - http://www.scopus.com/inward/record.url?scp=85140811558&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140811558&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85140811558
T3 - Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
SP - 8901
EP - 8909
BT - AAAI-22 Technical Tracks 8
PB - Association for the Advancement of Artificial Intelligence
Y2 - 22 February 2022 through 1 March 2022
ER -