[1] BALTRUSAITIS T, AHUJA C, MORENCY L P. Multimodal machine learning: A survey and taxonomy[J].
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2018, 41(2): 423-443.
[2] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems, 2017, 30.
[3] 刘建伟, 刘俊文, 罗雄麟. 深度学习中注意力机制研究进展[J]. 工程科学学报, 2021, 43(11): 1499-1511. DOI: 10.13374/j.issn2095-9389.2021.01.30.005.
LIU J W, LIU J W, LUO X L. Research progress of attention mechanism in deep learning[J]. Chinese Journal of Engineering, 2021, 43(11): 1499-1511. DOI: 10.13374/j.issn2095-9389.2021.01.30.005. (in Chinese)
[4] SETYAWAN M Y H, AWANGGA R M, EFENDI S R. Comparison of multinomial naive bayes algorithm and logistic regression for intent classification in chatbot[C]//2018 International Conference on Applied Engineering (ICAE). IEEE, 2018: 1-5.
[5] 刘娇, 李艳玲, 林民. 人机对话系统中意图识别方法综述[J]. 计算机工程与应用, 2019, 55(12): 8. DOI: 10.3778/j.issn.1002-8331.1902-0129.
LIU J, LI Y L, LIN M. Summary of intention recognition methods in man-machine dialogue system[J]. Computer Engineering and Applications, 2019, 55(12): 8. DOI: 10.3778/j.issn.1002-8331.1902-0129. (in Chinese)
[6] YOLCHUYEVA S, NEMETH G, GYIRES-TOTH B. Self-attention networks for intent detection[J]. arXiv preprint arXiv:2006.15585, 2020.
[7] WANG J, WEI K, RADFAR M, et al. Encoding syntactic knowledge in transformer encoder for intent detection and slot filling[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 35(16): 13943-13951.
[8] LIU X, LI J, MU J, et al. Effective open intent classification with K-center contrastive learning and adjustable decision boundary[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2023, 37(11): 13291-13299.
[9] CASANUEVA I, TEMCINAS T, GERZ D, et al. Efficient intent detection with dual sentence encoders[C]//Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI. 2020: 38-45.
[10] HUANG Y, DU C, XUE Z, et al. What makes multimodal learning better than single (provably)[J]. Advances in Neural Information Processing Systems, 2021,
34: 10944-10956.
[11] 程大雷, 张代玮, 陈雅茜. 多模态情感识别综述[J]. 西南民族大学学报(自然科学版), 2022, 48(4): 440-447.
CHENG D L, ZHANG D W, CHEN Y Q. A summary of multimodal emotion recognition[J]. Journal of Southwest Minzu University (Natural Science Edition), 2022, 48(4): 440-447. (in Chinese)
[12] HASAN M K, LEE S, RAHMAN W, et al. Humor knowledge enriched transformer for understanding multimodal humor[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 35(14): 12972-12980.
[13] ZHANG H, XU H, WANG X, et al. MIntRec: A new
dataset for multimodal intent recognition[C]//Proceedings of the 30th ACM International Conference on Multimedia. 2022: 1688-1697.
[14] ZHAN L M, LIANG H, LIU B, et al. Out-of-scope intent detection with self-supervision and discriminative training[J]. arXiv preprint arXiv:2106.08616, 2021.
[15] ZHOU Y, LIU P, QIU X. KNN-contrastive learning
for out-of-domain intent classification[C]//Proceedings
of the 60th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers).
2022: 5129-5141.
[16] GANDHI A, ADHVARYU K, PORIA S, et al. Multimodal sentiment analysis: A systematic review of history, datasets, multimodal fusion methods, applications,
challenges and future directions[J]. Information Fusion,
2023, 91: 424-444.
[17] HAZARIKA D, ZIMMERMANN R, PORIA S. MISA: Modality-invariant and -specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia. 2020: 1122-1131.
[18] ZADEH A, CHEN M, PORIA S, et al. Tensor fusion
network for multimodal sentiment analysis[J]. arXiv preprint arXiv:1707.07250, 2017.
[19] LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[J]. arXiv preprint arXiv:1806.00064, 2018.
[20] TSAI Y H H, BAI S, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 6558-6569.
[21] RAHMAN W, HASAN M K, LEE S, et al. Integrating multimodal information in large pretrained transformers[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 2359-2369.
[22] HAN W, CHEN H, PORIA S. Improving multimodal
fusion with hierarchical mutual information maximization for multimodal sentiment analysis[J]. arXiv preprint arXiv:2109.00412, 2021.
[23] YU W, XU H, YUAN Z, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 35(12): 10790-10797.
[24] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
[25] LIU Z, LIN Y, CAO Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 10012-10022.
[26] CHEN S, WANG C, CHEN Z, et al. WavLM: Large-scale self-supervised pre-training for full stack speech processing[J]. IEEE Journal of Selected Topics in Signal Processing, 2022, 16(6): 1505-1518.
[27] ZHANG H, WANG X, XU H, et al. MIntRec2.0: A
Large-scale Benchmark Dataset for Multimodal Intent
Recognition and Out-of-scope Detection in Conversations[J]. arXiv preprint arXiv:2403.10943, 2024.
[28] PORIA S, HAZARIKA D, MAJUMDER N, et al. MELD: A multimodal multi-party dataset for emotion recognition in conversations[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019: 527-536.