[1] Latif S, Shahid A, Qadir J. Generative emotional AI for speech emotion recognition: The case for synthetic emotional speech augmentation[J]. Applied Acoustics, 2023, 210: 109425. DOI: 10.1016/j.apacoust.2023.109425.
[2] Liu D, Dai W, Zhang H, et al. Brain-machine coupled learning method for facial emotion recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(9): 10703-10717. DOI: 10.1109/TPAMI.2023.3257846.
[3] Lin W C, Busso C. Sequential modeling by leveraging non-uniform distribution of speech emotion[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, 31: 1087-1099. DOI: 10.1109/TASLP.2023.3244527.
[4] Yang L, Zhong J, Wen T, et al. CCIN-SA: Composite cross modal interaction network with attention enhancement for multimodal sentiment analysis[J]. Information Fusion, 2025: 103230. DOI: 10.1016/j.inffus.2025.103230.
[5] Tao X, Li Q, Ren C, et al. Affinity and class probability-based fuzzy support vector machine for imbalanced data sets[J]. Neural Networks, 2020, 122: 289-307. DOI: 10.1016/j.neunet.2019.10.016.
[6] Li J, Zhang X, Li F, et al. Acoustic-articulatory emotion recognition using multiple features and parameter-optimized cascaded deep learning network[J]. Knowledge-Based Systems, 2024, 284: 111276. DOI: 10.1016/j.knosys.2023.111276.
[7] Luna-Jiménez C, Kleinlein R, Griol D, et al. A proposal for multimodal emotion recognition using aural transformers and action units on RAVDESS dataset[J]. Applied Sciences, 2021, 12(1): 327. DOI: 10.3390/app12010327.
[8] Yu W, Xu H, Yuan Z, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 35(12): 10790-10797. DOI: 10.1609/aaai.v35i12.17289.
[9] Tang X, Huang J, Lin Y, et al. Speech emotion recognition via CNN-Transformer and multidimensional attention mechanism[J]. Speech Communication, 2025: 103242. DOI: 10.1016/j.specom.2025.103242.
[10] Mou L, Zhao Y, Zhou C, et al. Driver emotion recognition with a hybrid attentional multimodal fusion framework[J]. IEEE Transactions on Affective Computing, 2023, 14(4): 2970-2981. DOI: 10.1109/TAFFC.2023.3250460.
[11] Qin Z, Luo Q, Zang Z, et al. Multimodal GRU with directed pairwise cross-modal attention for sentiment analysis[J]. Scientific Reports, 2025, 15(1): 10112. DOI: 10.1038/s41598-025-93023-3.
[12] Li J, Zhang X, Li F, et al. Acoustic-articulatory emotion recognition using multiple features and parameter-optimized cascaded deep learning network[J]. Knowledge-Based Systems, 2024, 284: 111276. DOI: 10.1016/j.knosys.2023.111276.
[13] Hao X, Li H, Wen Y. Real-time music emotion recognition based on multimodal fusion[J]. Alexandria Engineering Journal, 2025, 116: 586-600. DOI: 10.1016/j.aej.2024.12.060.
[14] Ryumina E, Ryumin D, Axyonov A, et al. Multi-corpus emotion recognition method based on cross-modal gated attention fusion[J]. Pattern Recognition Letters, 2025, 190: 192-200. DOI: 10.1016/j.patrec.2025.02.024.
[15] Zhang Y, Jia A, Wang B, et al. M3GAT: A multi-modal, multi-task interactive graph attention network for conversational sentiment analysis and emotion recognition[J]. ACM Transactions on Information Systems, 2023, 42(1): 1-32. DOI: 10.1145/3593583.
[16] Liu X, He G, Li S, et al. Multi-level feature decomposition and fusion model for video-based multimodal emotion recognition[J]. Engineering Applications of Artificial Intelligence, 2025, 152: 110744. DOI: 10.1016/j.engappai.2025.110744.
[17] Qi X, Wen Y, Zhang P, et al. MFGCN: Multimodal fusion graph convolutional network for speech emotion recognition[J]. Neurocomputing, 2025, 611: 128646. DOI: 10.1016/j.neucom.2024.128646.
[18] 张学军, 王天晨, 王泽田. 基于多域信息融合的卷积Transformer脑电情感识别模型[J]. 数据采集与处理, 2024, 39(6): 1543-1552. DOI: 10.16337/j.1004-9037.2024.06.021.
ZHANG X J, WANG T C, WANG Z T. A convolutional transformer model for EEG emotion recognition based on multi-domain information fusion[J]. Journal of Data Acquisition and Processing, 2024, 39(6): 1543-1552. DOI: 10.16337/j.1004-9037.2024.06.021.
[19] Chatterjee S, Ghosh K, Bhattacharjee S, et al. Federated artificial resampling for imbalanced facial emotion recognition[J]. IEEE Transactions on Affective Computing, 2024: 1461-1472. DOI: 10.1109/TAFFC.2024.3516822.
[20] Alhuzali H, Ananiadou S. Improving textual emotion recognition based on intra- and inter-class variations[J]. IEEE Transactions on Affective Computing, 2021, 14(2): 1297-1307. DOI: 10.1109/TAFFC.2021.3104720.
[21] Fan W, Xu X, Liu F, et al. Multimodal speech emotion recognition via dynamic multilevel contrastive loss under local enhancement network[J]. Expert Systems with Applications, 2025: 127669. DOI: 10.1016/j.eswa.2025.127669.
[22] Franceschini R, Fini E, Beyan C, et al. Multimodal emotion recognition with modality-pairwise unsupervised contrastive loss[C]//2022 26th International Conference on Pattern Recognition (ICPR). IEEE, 2022: 2589-2596. DOI: 10.1109/ICPR56361.2022.9956589.
[23] Zhu X, Men J, Yang L, et al. Imbalanced driving scene recognition with class focal loss and data augmentation[J]. International Journal of Machine Learning and Cybernetics, 2022, 13(10): 2957-2975. DOI: 10.1007/s13042-022-01575-x.
[24] 曹荣贺, 吴晓龙, 冯畅, 等. 基于Wav2vec2.0与语境情感信息补偿的对话语音情感识别[J]. 信号处理, 2023, 39(4): 698-707. DOI: 10.16798/j.issn.1003-0530.2023.04.011.
CAO R H, WU X L, FENG C, et al. Dialogue speech emotion recognition based on Wav2vec2.0 and contextual emotion information compensation[J]. Journal of Signal Processing, 2023, 39(4): 698-707. DOI: 10.16798/j.issn.1003-0530.2023.04.011.
[25] Shou Y, Meng T, Ai W, et al. Adversarial alignment and graph fusion via information bottleneck for multimodal emotion recognition in conversations[J]. Information Fusion, 2024, 112: 102590. DOI: 10.1016/j.inffus.2024.102590.
[26] 王永旗, 王雷. 基于跨模态增强与时间步门控的多模态情感识别[J]. 计算机工程, 2025, 7: 1-11. DOI: 10.19678/j.issn.1000-3428.0070508.
WANG Y Q, WANG L. Multimodal emotion recognition based on cross-modal enhancement and time-step gating[J]. Computer Engineering, 2025, 7: 1-11. DOI: 10.19678/j.issn.1000-3428.0070508.
[27] Baltrušaitis T, Robinson P, Morency L P. OpenFace: an open source facial behavior analysis toolkit[C]//2016 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2016: 1-10. DOI: 10.1109/WACV.2016.7477553.
[28] Sadok S, Leglaive S, Girin L, et al. A multimodal dynamical variational autoencoder for audiovisual speech representation learning[J]. Neural Networks, 2024, 172: 106120. DOI: 10.1016/j.neunet.2024.106120.
[29] Sadok S, Leglaive S, Séguier R. A vector quantized masked autoencoder for audiovisual speech emotion recognition[J]. Computer Vision and Image Understanding, 2025, 257: 104362. DOI: 10.1016/j.cviu.2025.104362.
[30] Qi X, Wen Y, Zhang P, et al. MFGCN: Multimodal fusion graph convolutional network for speech emotion recognition[J]. Neurocomputing, 2025, 611: 128646. DOI: 10.1016/j.neucom.2024.128646.
[31] Sun L, Lian Z, Liu B, et al. HiCMAE: Hierarchical contrastive masked autoencoder for self-supervised audio-visual emotion recognition[J]. Information Fusion, 2024, 108: 102382. DOI: 10.1016/j.inffus.2024.102382.
[32] Mocanu B, Tapu R, Zaharia T. Multimodal emotion recognition using cross modal audio-video fusion with attention and deep metric learning[J]. Image and Vision Computing, 2023, 133: 104676. DOI: 10.1016/j.imavis.2023.104676.