[1] 孙强,王姝玉.结合时间注意力机制和单模态标签自动生成策略的自监督多模态情感识别[J].电子与信息学报, 2024(002):046.
Sun Qiang, Wang Shuyu. Self-supervised multimodal sentiment recognition combining a temporal attention mechanism and an automatic single-modality label generation strategy[J]. Journal of Electronics and Information Technology, 2024(002): 046.
[2] Jiang Y, Li W, Hossain M S, et al. A snapshot research and implementation of multimodal information fusion for data-driven emotion recognition[J]. Information Fusion, 2020, 53: 209-221.
[3] Tsai Y H H, Bai S, Liang P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 6558-6569.
[4] Li M, Yang D, Lei Y, et al. A unified self-distillation framework for multimodal sentiment analysis with uncertain missing modalities[C]//Proceedings of the AAAI conference on artificial intelligence. 2024, 38(9): 10074-10082.
[5] 殷兵,凌震华,林垠,等.兼容缺失模态推理的情感识别方法[J/OL].计算机应用,1-10[2025-05-19].http://kns.cnki.net/kcms/detail/51.1307.TP.20241213.1133.004.html.
Yin Bing, Ling Zhenhua, Lin Yin, et al. A sentiment recognition method compatible with missing-modality inference[J/OL]. Journal of Computer Applications, 1-10 [2025-05-19]. http://kns.cnki.net/kcms/detail/51.1307.TP.20241213.1133.004.html.
[6] 任楚岚,于振坤,关超,等.基于自适应融合技术的多模态实体对齐模型[J].计算机应用研究,2025,42(01):100-105.DOI:10.19734/j.issn.1001-3695.2024.05.0187.
Ren Chulan, Yu Zhenkun, Guan Chao, et al. A multimodal entity alignment model based on adaptive fusion technology [J]. Application Research of Computers, 2025, 42(01): 100-105. DOI: 10.19734/j.issn.1001-3695.2024.05.0187.
[7] Hazarika D, Zimmermann R, Poria S. MISA: Modality-invariant and -specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia. 2020: 1122-1131.
[8] Huang J, Zhou J, Tang Z, et al. TMBL: Transformer-based multimodal binding learning model for multimodal sentiment analysis[J]. Knowledge-Based Systems, 2024, 285: 111346.
[9] Li Z, Zhou Y, Zhang W, et al. AMOA: Global acoustic feature enhanced modal-order-aware network for multimodal sentiment analysis[C]//Proceedings of the 29th International Conference on Computational Linguistics. 2022: 7136-7146.
[10] Rahman W, Hasan M K, Lee S, et al. Integrating multimodal information in large pretrained transformers[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 2359-2369.
[11] Shi T, Feng W, Shang F, et al. Deep correlated prompting for visual recognition with missing modalities[J]. Advances in Neural Information Processing Systems, 2024, 37: 67446-67466.
[12] Wang H, Chen Y, Ma C, et al. Multi-modal learning with missing modality via shared-specific feature modelling[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 15878-15887.
[13] Ma M, Ren J, Zhao L, et al. SMIL: Multimodal learning with severely missing modality[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 35(3): 2302-2310.
[14] 王嫄,邓振宇,王佳鑫,等.基于两阶段缺失模态恢复的多模态情感分析方法[J].天津科技大学学报,2025,40(01):57-63+80.DOI:10.13364/j.issn.1672-6510.20230188.
Wang Yuan, Deng Zhenyu, Wang Jiaxin, et al. A multimodal sentiment analysis method based on two-stage missing-modality recovery[J]. Journal of Tianjin University of Science and Technology, 2025, 40(01): 57-63+80. DOI: 10.13364/j.issn.1672-6510.20230188.
[15] Zhao J, Li R, Jin Q. Missing modality imagination network for emotion recognition with uncertain missing modalities[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021: 2608-2618.
[16] Heinzerling B, Inui K. Language models as knowledge bases: On entity representations, storage capacity, and paraphrased queries[J]. arXiv preprint arXiv:2008.09036, 2020.
[17] Khattak M U, Rasheed H, Maaz M, et al. MaPLe: Multi-modal prompt learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 19113-19122.
[18] Tsimpoukelli M, Menick J L, Cabi S, et al. Multimodal few-shot learning with frozen language models[J]. Advances in Neural Information Processing Systems, 2021, 34: 200-212.
[19] Lee Y L, Tsai Y H, Chiu W C, et al. Multimodal prompting with missing modalities for visual recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 14943-14952.
[20] Zadeh A, Chen M, Poria S, et al. Tensor fusion network for multimodal sentiment analysis[J]. arXiv preprint arXiv:1707.07250, 2017.
[21] Zadeh A, Liang P P, Mazumder N, et al. Memory fusion network for multi-view sequential learning[C]//Proceedings of the AAAI conference on artificial intelligence. 2018, 32(1).
[22] Sun H, Wang H, Liu J, et al. CubeMLP: An MLP-based model for multimodal sentiment analysis and depression estimation[C]//Proceedings of the 30th ACM international conference on multimedia. 2022: 3722-3729.
[23] Williams J, Kleinegesse S, Comanescu R, et al. Recognizing emotions in video using multimodal DNN feature fusion[C]//Proceedings of Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML). 2018: 11-19.
[24] Liu Z, Shen Y, Lakshminarasimhan V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[J]. arXiv preprint arXiv:1806.00064, 2018.
[25] Zadeh A A B, Liang P P, Poria S, et al. Multimodal language analysis in the wild: Cmu-mosei dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018: 2236-2246.
[26] Tsai Y H H, Liang P P, Zadeh A, et al. Learning factorized multimodal representations[J]. arXiv preprint arXiv:1806.06176, 2018.
[27] Yu W, Xu H, Yuan Z, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the AAAI conference on artificial intelligence. 2021, 35(12): 10790-10797.
[28] Mai S, Zeng Y, Zheng S, et al. Hybrid contrastive learning of tri-modal representation for multimodal sentiment analysis[J]. IEEE Transactions on Affective Computing, 2022, 14(3): 2276-2289.
[29] 王楠,王淇,欧阳丹彤.基于知识蒸馏与动态调整机制的多模态情感分析模型[J/OL].计算机学报,1-21[2025-05-19].http://kns.cnki.net/kcms/detail/11.1826.TP.20250402.1820.003.html.
Wang Nan, Wang Qi, Ouyang Dantong. A multimodal sentiment analysis model based on knowledge distillation and a dynamic adjustment mechanism[J/OL]. Chinese Journal of Computers, 1-21 [2025-05-19]. http://kns.cnki.net/kcms/detail/11.1826.TP.20250402.1820.003.html.
[30] Liu S, Luo Z, Fu W. FCDNet: Fuzzy cognition-based dynamic fusion network for multimodal sentiment analysis[J]. IEEE Transactions on Fuzzy Systems, 2025, 33(1): 3-14.
[31] Wang D, Guo X, Tian Y, et al. TETFN: A text enhanced transformer fusion network for multimodal sentiment analysis[J]. Pattern Recognition, 2023, 136: 109259.
[32] Han W, Chen H, Poria S. Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis[J]. arXiv preprint arXiv:2109.00412, 2021.
[33] Zhang H, Wang Y, Yin G, et al. Learning language-guided adaptive hyper-modality representation for multimodal sentiment analysis[J]. arXiv preprint arXiv:2310.05804, 2023.
[34] Feng X, Lin Y, He L, et al. Knowledge-guided dynamic modality attention fusion framework for multimodal sentiment analysis[J]. arXiv preprint arXiv:2410.04491, 2024.
[35] Degottex G, Kane J, Drugman T, et al. COVAREP—A collaborative voice analysis repository for speech technologies[C]//2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014: 960-964.
[36] Ekman P, Rosenberg E L. What the face reveals: Basic and applied studies of spontaneous expression using the Facial Action Coding System (FACS)[M]. 2nd ed. Oxford: Oxford University Press, 2005: 21-38. DOI: 10.1093/acprof:oso/9780195179644.003.0002.
[37] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019: 4171-4186.
[38] Zadeh A, Zellers R, Pincus E, et al. Mosi: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[J]. arXiv preprint arXiv:1606.06259, 2016.