| 1 |
SHI Q Q, FAN J S, WANG Z R, et al. Multimodal channel-wise attention transformer inspired by multisensory integration mechanisms of the brain[J]. Pattern Recognition, 2022, 130: 108837.
doi: 10.1016/j.patcog.2022.108837
|
| 2 |
PAN M Z, LI Q M, QIU T. Survey of research on deep multimodal representation learning[J]. Computer Engineering and Applications, 2023, 59(2): 48-64. (in Chinese)
|
| 3 |
LIU Y, LIU L, GUO Y M, et al. Learning visual and textual representations for multimodal matching and classification[J]. Pattern Recognition, 2018, 84: 51-67.
|
| 4 |
LI M, YANG Y H, KE X Z. Emotion recognition model based on hybrid-mel gama frequency cross-attention Transformer modal[J]. Journal of Computer Applications, 2024, 44(1): 86-93. (in Chinese)
|
| 5 |
|
| 6 |
LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[EB/OL]. [2024-09-11]. https://arxiv.org/abs/1806.00064.
|
| 7 |
ZADEH A, LIANG P P, MAZUMDER N, et al. Memory fusion network for multi-view sequential learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2018: 5634-5641.
|
| 8 |
TSAI Y H, BAI S J, LIANG P P, et al. Multimodal Transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, 2019: 6558-6569.
|
| 9 |
XU Z J, GAO S. Multi-modal emotion recognition based on Transformer-ESIM attention mechanism[J]. Computer Engineering and Applications, 2022, 58(10): 132-138. (in Chinese)
|
| 10 |
RAHMAN W, HASAN M K, LEE S, et al. Integrating multimodal information in large pretrained Transformers[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Kerrville, USA: Association for Computational Linguistics, 2020: 2359-2369.
|
| 11 |
DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[EB/OL]. [2024-09-11]. https://arxiv.org/abs/1810.04805.
|
| 12 |
VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of Advances in Neural Information Processing Systems. New York, USA: Curran Associates, Inc., 2017: 6000-6010.
|
| 13 |
HAZARIKA D, ZIMMERMANN R, PORIA S. MISA: modality-invariant and -specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia. New York, USA: ACM Press, 2020: 1122-1131.
|
| 14 |
YU W M, XU H, YUAN Z Q, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2021: 10790-10797.
|
| 15 |
HAN W, CHEN H, GELBUKH A, et al. Bi-bimodal modality fusion for correlation-controlled multimodal sentiment analysis[C]//Proceedings of the 2021 International Conference on Multimodal Interaction. New York, USA: ACM Press, 2021: 6-15.
|
| 16 |
PENNINGTON J, SOCHER R, MANNING C. GloVe: global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Philadelphia, USA: Association for Computational Linguistics, 2014: 1532-1543.
|
| 17 |
MEDSKER L, JAIN L C. Recurrent neural networks: design and applications[M]. Boca Raton, USA: CRC Press, 1999.
|
| 18 |
SUN R K, XU J H, HUANGFU Z Y, et al. Survey of zero-shot transfer learning methods based on vision-language pre-trained models[J]. Computer Engineering, 2024, 50(10): 1-15. (in Chinese)
doi: 10.19678/j.issn.1000-3428.0070036
|
| 19 |
SUN C, QIU X P, XU Y G, et al. How to fine-tune BERT for text classification?[C]//Proceedings of CCL 2019. Berlin, Germany: Springer, 2019: 194-206.
|
| 20 |
BAEVSKI A, ZHOU H, MOHAMED A, et al. Wav2Vec 2.0: a framework for self-supervised learning of speech representations[EB/OL]. [2024-09-11]. https://arxiv.org/abs/2006.11477.
|
| 21 |
HSU W N, BOLTE B, TSAI Y H, et al. HuBERT: self-supervised speech representation learning by masked prediction of hidden units[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 3451-3460.
|
| 22 |
YU W M, XU H, MENG F Y, et al. CH-SIMS: a Chinese multimodal sentiment analysis dataset with fine-grained annotation of modality[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Philadelphia, USA: ACL Press, 2020: 3718-3727.
|
| 23 |
GRAVES A. Long short-term memory[M]//KACPRZYK J. Supervised sequence labelling with recurrent neural networks. Berlin, Germany: Springer, 2012: 37-45.
|
| 24 |
ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[EB/OL]. [2024-09-11]. https://arxiv.org/abs/1606.06259.
|
| 25 |
BAGHER ZADEH A, LIANG P P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Philadelphia, USA: ACL Press, 2018: 2236-2246.
|
| 26 |
WANG D, GUO X T, TIAN Y M, et al. TETFN: a text enhanced transformer fusion network for multimodal sentiment analysis[J]. Pattern Recognition, 2023, 136: 109259.
|
| 27 |
ZHANG H Y, WANG Y, YIN G H, et al. Learning language-guided adaptive hyper-modality representation for multimodal sentiment analysis[EB/OL]. [2024-09-11]. https://arxiv.org/abs/2310.05804.
|
| 28 |
LIU S, LUO Z, FU W N. Fcdnet: fuzzy cognition-based dynamic fusion network for multimodal sentiment analysis[J]. IEEE Transactions on Fuzzy Systems, 2025, 33(1): 3-14.
doi: 10.1109/TFUZZ.2024.3407739
|
| 29 |
LI Y, WANG Y Z, CUI Z. Decoupled multimodal distilling for emotion recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 6631-6640.
|