[1] Shi Q, Fan J, Wang Z, et al. Multimodal channel-wise
attention transformer inspired by multisensory integration
mechanisms of the brain[J]. Pattern Recognition, 2022,
130: 108837.
[2] 潘梦竹, 李千目, 邱天. 深度多模态表示学习的研究综
述[J]. 计算机工程与应用, 2023, 59(2): 48-64.
PAN M Z, LI Q M, QIU T. Survey of research on deep
multimodal representation learning [J]. Computer
Engineering and Applications, 2023, 59(2): 48-64. (in
Chinese)
[3] Liu Y, Liu L, Guo Y, et al. Learning visual and textual
representations for multimodal matching and classification
[J]. Pattern Recognition, 2018, 84: 51-67.
[4] 李牧, 杨宇恒, 柯熙政. 基于混合特征提取与跨模态特
征预测融合的情感识别模型[J]. 计算机应用, 2024,
44(1): 86-93.
LI M, YANG Y H, KE X Z. Emotion recognition model
based on hybrid feature extraction and cross-modal
feature prediction fusion[J]. Computer Applications, 2024, 44(1): 86-93. (in Chinese)
[5] Zadeh A, Chen M, Poria S, et al. Tensor fusion network for
multimodal sentiment analysis[J]. arXiv preprint
arXiv:1707.07250, 2017.
[6] Liu Z, Shen Y, Lakshminarasimhan V B, et al. Efficient
low-rank multimodal fusion with modality-specific
factors[J]. arXiv preprint arXiv:1806.00064, 2018.
[7] Zadeh A, Liang P P, Mazumder N, et al. Memory fusion
network for multi-view sequential learning[C]//
Proceedings of the AAAI conference on artificial
intelligence. New Orleans, LA, USA: AAAI Press, 2018:
5634-5641.
[8] Tsai Y H H, Bai S, Liang P P, et al. Multimodal
transformer for unaligned multimodal language
sequences[C]//Proceedings of the 57th Annual Meeting of
the Association for Computational Linguistics. Florence,
Italy: Association for Computational Linguistics, 2019:
6558-6569.
[9] 徐志京, 高姗. 基于 Transformer-ESIM 注意力机制的多
模态情绪识别[J]. 计算机工程与应用, 2022, 58(10):
132-138.
XU Z J, GAO S. Multimodal emotion recognition based
on Transformer-ESIM attention mechanism [J].
Computer Engineering and Applications, 2022, 58(10):
132-138. (in Chinese)
[10] Rahman W, Hasan M K, Lee S, et al. Integrating
multimodal information in large pretrained transformers[C]
//Proceedings of the 58th Annual Meeting of the
Association for Computational Linguistics. Online:
Association for Computational Linguistics, 2020:
2359-2369.
[11] Devlin J, Chang M W, Lee K, et al. BERT: pre-training
of deep bidirectional transformers for language
understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
[12] Vaswani A, Shazeer N, Parmar N, et al. Attention is all
you need[C]//Advances in Neural Information Processing
Systems 30. Long Beach, CA, USA: Curran Associates,
Inc., 2017: 6000-6010.
[13] Hazarika D, Zimmermann R, Poria S. MISA:
modality-invariant and -specific representations for
multimodal sentiment analysis[C]//Proceedings of the 28th
ACM international conference on multimedia. New York,
USA: Association for Computing Machinery, 2020:
1122-1131.
[14] Yu W, Xu H, Yuan Z, et al. Learning modality-specific
representations with self-supervised multi-task learning for
multimodal sentiment analysis[C]//Proceedings of the
AAAI conference on artificial intelligence. Online:
AAAI Press, 2021: 10790-10797.
[15] Han W, Chen H, Gelbukh A, et al. Bi-bimodal modality
fusion for correlation-controlled multimodal sentiment
analysis[C]//Proceedings of the 2021 international
conference on multimodal interaction. New York, USA:
Association for Computing Machinery, 2021: 6-15.
[16] Pennington J, Socher R, Manning C D. GloVe: global
vectors for word representation[C]//Proceedings of the
2014 conference on empirical methods in natural language
processing (EMNLP). Doha, Qatar: Association for
Computational Linguistics, 2014: 1532-1543.
[17] Medsker L R, Jain L C. Recurrent neural networks:
design and applications[M]. Boca Raton, FL, USA: CRC
Press, 2001.
[18] 孙仁科, 许靖昊, 皇甫志宇, 等. 基于视觉-语言预训练
模型的零样本迁移学习方法综述[J]. 计算机工程, 2024,
50(10): 1-15.
SUN R K, XU J H, HUANGFU Z Y, et al. Survey of
zero-shot transfer learning methods based on
vision-language pre-trained models [J]. Computer
Engineering, 2024, 50(10): 1-15. (in Chinese)
[19] Sun C, Qiu X, Xu Y, et al. How to fine-tune BERT for text
classification?[C]//Chinese computational linguistics: 18th
China national conference, CCL 2019. Kunming, China:
Springer International Publishing, 2019: 194-206.
[20] Baevski A, Zhou Y, Mohamed A, et al. wav2vec 2.0: A
framework for self-supervised learning of speech
representations[J]. Advances in neural information
processing systems, 2020, 33: 12449-12460.
[21] Hsu W N, Bolte B, Tsai Y H H, et al. HuBERT:
Self-supervised speech representation learning by masked
prediction of hidden units[J]. IEEE/ACM transactions on
audio, speech, and language processing, 2021, 29:
3451-3460.
[22] Yu W, Xu H, Meng F, et al. CH-SIMS: a Chinese
multimodal sentiment analysis dataset with fine-grained
annotation of modality[C]//Proceedings of the 58th Annual
Meeting of the Association for Computational Linguistics.
Online: Association for Computational Linguistics,
2020: 3718-3727.
[23] Graves A. Long short-term memory[M]//Supervised
sequence labelling with recurrent neural networks. Berlin,
Heidelberg: Springer, 2012: 37-45.
[24] Zadeh A, Zellers R, Pincus E, et al. MOSI: multimodal
corpus of sentiment intensity and subjectivity analysis in
online opinion videos[J]. arXiv preprint arXiv:1606.06259,
2016.
[25] Zadeh A A B, Liang P P, Poria S, et al. Multimodal
language analysis in the wild: CMU-MOSEI dataset and
interpretable dynamic fusion graph[C]//Proceedings of the
56th Annual Meeting of the Association for Computational
Linguistics. Melbourne, Australia: Association for
Computational Linguistics, 2018: 2236-2246.
[26] Wang D, Guo X, Tian Y, et al. TETFN: A text enhanced
transformer fusion network for multimodal sentiment
analysis[J]. Pattern Recognition, 2023, 136: 109259.
[27] Zhang H, Wang Y, Yin G, et al. Learning language-guided
adaptive hyper-modality representation for multimodal
sentiment analysis[J]. arXiv preprint arXiv:2310.05804,
2023.
[28] Liu S, Luo Z, Fu W. FCDNet: fuzzy cognition-based
dynamic fusion network for multimodal sentiment
analysis[J]. IEEE Transactions on Fuzzy Systems, doi:
10.1109/TFUZZ.2024.3407739.
[29] Li Y, Wang Y, Cui Z. Decoupled multimodal distilling for
emotion recognition[C]//Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition.
Vancouver, BC, Canada: IEEE Computer Society, 2023:
6631-6640.