[1] Vaswani Ashish, Shazeer Noam, Parmar Niki, et al. Attention is all you need [C]// Proc of Advances in Neural Information Processing Systems 30 (NeurIPS 2017). Long Beach, CA: Curran Associates Inc., 2017: 5998-6008.
[2] Liu Xiao-Ming, Li Cheng-Zheng-Xu, Wu Shao-Cong, et al. A survey of text classification algorithms and application scenarios [J]. Chinese Journal of Computers, 2024, 47(6): 1244-1287. (in Chinese)
[3] Sun Xin, Tang Zheng, Zhao Yongyan, et al. Hierarchical networks with mixed attention for text classification [J]. Journal of Chinese Information Processing, 2021, 35(2): 69-77. (in Chinese)
[4] Wang Sinong, Li Belinda Z, Khabsa Madian, et al. Linformer: self-attention with linear complexity [J/OL]. arXiv preprint arXiv:2006.04768, 2020.
[5] Choromanski Krzysztof Marcin, Likhosherstov Valerii, Dohan David, et al. Rethinking attention with performers [C/OL]// Proc of the 9th International Conference on Learning Representations. [S. l.]: OpenReview.net, 2021. (2021-06-23)[2026-03-04]. https://openreview.net/forum?id=Ua6zuk0WRH.
[6] Zaheer Manzil, Guruganesh Guru, Dubey Kumar Avinava, et al. Big bird: transformers for longer sequences [C]// Proc of Advances in Neural Information Processing Systems 33. [S. l.]: Curran Associates Inc., 2020.
[7] Xiong Yunyang, Zeng Zhanpeng, Chakraborty Rudrasis, et al. Nyströmformer: a Nyström-based algorithm for approximating self-attention [C]// Proc of the 35th AAAI Conference on Artificial Intelligence (AAAI). [S. l.]: AAAI Press, 2021: 14138-14148.
[8] Zhang Yu, Yang Songlin, Zhu Ruijie, et al. Gated slot attention for efficient linear-time sequence modeling [C]// Proc of the 38th Annual Conference on Neural Information Processing Systems. [S. l.]: [s. n.], 2024.
[9] Lu Jiecheng, Han Xu, Sun Yan, et al. ZeroS: zero-sum linear attention for efficient transformers [C/OL]// Proc of the 39th Annual Conference on Neural Information Processing Systems. [S. l.]: [s. n.], 2025. https://openreview.net/pdf?id=Ms6IXbfzzX.
[10] Guo Han, Yang Songlin, Goel T, et al. Log-linear attention [EB/OL]. (2025). https://doi.org/10.48550/arXiv.2506.04761.
[11] Chen Y, Thai Z L, Zhou Z, et al. Hybrid linear attention done right: efficient distillation and effective architectures for extremely long contexts [EB/OL]. (2026). https://arxiv.org/abs/2601.22156.
[12] Miyato T, Maeda S, Koyama M, et al. Virtual adversarial training: a regularization method for supervised and semi-supervised learning [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1979-1993.
[13] Tarvainen Antti, Valpola Harri. Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results [C]// Proc of Advances in Neural Information Processing Systems 30 (NeurIPS 2017). Long Beach, CA, USA: Curran Associates Inc., 2017: 1195-1204.
[14] Xie Qizhe, Dai Zihang, Hovy Eduard H, et al. Unsupervised data augmentation for consistency training [C]// Proc of the 34th Annual Conference on Neural Information Processing Systems (NeurIPS 2020). Virtual: Neural Information Processing Systems Foundation, 2020.
[15] Liang Xiaobo, Wu Lijun, Li Juntao, et al. R-Drop: regularized dropout for neural networks [C]// Proc of the 35th Annual Conference on Neural Information Processing Systems. Virtual: Neural Information Processing Systems Foundation, 2021: 10890-10905.
[16] Sirbu I, Popovici R-A, Caragea C, et al. MultiMatch: multihead consistency regularization matching for semi-supervised text classification [C]// Proc of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025). Suzhou: Association for Computational Linguistics, 2025: 2792-2808.
[17] Qian Lai, Zhao Weiwei. Text classification method based on contrastive learning and attention mechanism [J]. Computer Engineering, 2024, 50(7): 104-111. (in Chinese)
[18] Zheng Cheng, Li Pengfei. Text classification based on feature fusion of dual hypergraph neural networks [J]. Computer Engineering, 2025, 51(6): 127-135. (in Chinese)
[19] Yuan B, Chen Y, Zhang Y. Weed out, then harvest: dual low-rank adaptation is an effective noisy label detector for noise-robust learning [C]// Findings of the Association for Computational Linguistics: ACL 2025. Vienna: Association for Computational Linguistics, 2025: 15292-15311.
[20] Erden C. Dynamic rank reinforcement learning for adaptive low-rank multi-head self attention in large language models [J/OL]. CoRR, 2025, abs/2512.15973. https://doi.org/10.48550/arXiv.2512.15973.
[21] Tan Songbo, Zhang Jin. An empirical study of sentiment analysis for Chinese documents [J]. Expert Systems with Applications, 2008, 34(4): 2622-2629.
[22] Maas Andrew L, Daly Raymond E, Pham Peter T, et al. Learning word vectors for sentiment analysis [C]// Proc of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011). Portland, Oregon, USA: Association for Computational Linguistics, 2011: 142-150.
[23] Zhang Xiang, Zhao Junbo Jake, LeCun Yann. Character-level convolutional networks for text classification [C]// Proc of Advances in Neural Information Processing Systems 28 (NeurIPS). Montreal, Quebec, Canada: Curran Associates Inc., 2015: 649-657.
[24] Demszky Dorottya, Movshovitz-Attias Dana, Ko Jeongwoo, et al. GoEmotions: a dataset of fine-grained emotions [C]// Proc of the 58th Annual Meeting of the Association for Computational Linguistics (ACL). Online: Association for Computational Linguistics, 2020: 4040-4054.
[25] Hua Ting, Li Xiao, Gao Shangqian, et al. Dynamic low-rank estimation for transformer-based language models [C]// Proc of the Findings of the Association for Computational Linguistics: EMNLP. Singapore: Association for Computational Linguistics, 2023: 9275-9287.
[26] Bardes Adrien, Ponce Jean, LeCun Yann. VICReg: variance-invariance-covariance regularization for self-supervised learning [C/OL]// Proc of the 10th International Conference on Learning Representations. [S. l.]: OpenReview.net, 2022.
[27] Grill Jean-Bastien, Strub Florian, Altché Florent, et al. Bootstrap your own latent – a new approach to self-supervised learning [C]// Proc of Advances in Neural Information Processing Systems 33 (NeurIPS). [S. l.]: Curran Associates Inc., 2020.
[28] Li Jingyang, Sun Maosong. Scalable term selection for text categorization [C]// Proc of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Prague, Czech Republic: Association for Computational Linguistics, 2007: 774-782.
[29] Tishby Naftali, Pereira Fernando C, Bialek William. The information bottleneck method [J/OL]. arXiv preprint physics/0004057, 2000.
[30] Li Z, Sun M. Punctuation as implicit annotations for Chinese word segmentation [J]. Computational Linguistics, 2009, 35(4): 505-512.
[31] Belinkov Yonatan, Bisk Yonatan. Synthetic and natural noise both break neural machine translation [C]// Proc of the 6th International Conference on Learning Representations (ICLR 2018). Vancouver, BC, Canada: OpenReview.net, 2018.
[32] Gao Ji, Lanchantin Jack, Soffa Mary Lou, et al. Black-box generation of adversarial text sequences to evade deep learning classifiers [C]// Proc of the IEEE Security and Privacy Workshops (SPW). San Francisco, CA, USA: IEEE Computer Society, 2018: 50-56.
[33] Li J, Sun M, Zhang X. A comparison and semi-quantitative analysis of words and character-bigrams as features in Chinese text categorization [C]// Proc of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL 2006). Sydney, Australia: Association for Computational Linguistics, 2006: 545-552.
[34] Devlin Jacob, Chang Ming-Wei, Lee Kenton, et al. BERT: pre-training of deep bidirectional transformers for language understanding [C]// Proc of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). Minneapolis, MN, USA: Association for Computational Linguistics, 2019: 4171-4186.