[1] Vaswani Ashish, Shazeer Noam, Parmar Niki, et al. Attention is all you need [C]// Proc of Advances in Neural Information Processing Systems 30 (NeurIPS 2017). Long Beach, CA: Curran Associates Inc., 2017: 5998-6008.
[2] Liu Xiao-Ming, Li Cheng-Zheng-Xu, Wu Shao-Cong, et al. A survey of text classification algorithms and application scenarios [J]. Chinese Journal of Computers, 2024, 47(6): 1244-1287. (in Chinese)
[3] Sun Xin, Tang Zheng, Zhao Yongyan, et al. Hierarchical networks with mixed attention for text classification [J]. Journal of Chinese Information Processing, 2021, 35(2): 69-77. (in Chinese)
[4] Wang Sinong, Li Belinda Z, Khabsa Madian, et al. Linformer: self-attention with linear complexity [J/OL]. arXiv preprint arXiv:2006.04768, 2020.
[5] Choromanski Krzysztof Marcin, Likhosherstov Valerii, Dohan David, et al. Rethinking attention with performers [C/OL]// Proc of the 9th International Conference on Learning Representations. [S. l.]: OpenReview.net, 2021. (2021-06-23)[2026-03-04]. https://openreview.net/forum?id=Ua6zuk0WRH.
[6] Zaheer Manzil, Guruganesh Guru, Dubey Kumar Avinava, et al. Big bird: transformers for longer sequences [C]// Proc of Advances in Neural Information Processing Systems 33. [S. l.]: Curran Associates Inc., 2020.
[7] Xiong Yunyang, Zeng Zhanpeng, Chakraborty Rudrasis, et al. Nyströmformer: a Nyström-based algorithm for approximating self-attention [C]// Proc of the 35th AAAI Conference on Artificial Intelligence (AAAI). [S. l.]: AAAI Press, 2021: 14138-14148.
[8] Zhang Yu, Yang Songlin, Zhu Ruijie, et al. Gated slot attention for efficient linear-time sequence modeling [C]// Proc of the 38th Annual Conference on Neural Information Processing Systems. [S. l.]: [s. n.], 2024.
[9] Lu Jiecheng, Han Xu, Sun Yan, et al. ZeroS: zero-sum linear attention for efficient transformers [C/OL]// Proc of the 39th Annual Conference on Neural Information Processing Systems. [S. l.]: [s. n.], 2025. https://openreview.net/pdf?id=Ms6IXbfzzX.
[10] Guo Han, Yang Songlin, Goel T, et al. Log-linear attention [EB/OL]. (2025). https://doi.org/10.48550/arXiv.2506.04761.
[11] Chen Y, Thai Z L, Zhou Z, et al. Hybrid linear attention done right: efficient distillation and effective architectures for extremely long contexts [EB/OL]. (2026). https://arxiv.org/abs/2601.22156.
[12] Miyato T, Maeda S, Koyama M, et al. Virtual adversarial training: a regularization method for supervised and semi-supervised learning [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1979-1993.
[13] Tarvainen Antti, Valpola Harri. Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results [C]// Proc of Advances in Neural Information Processing Systems 30 (NeurIPS 2017). Long Beach, CA, USA: Curran Associates Inc., 2017: 1195-1204.
[14] Xie Qizhe, Dai Zihang, Hovy Eduard H, et al. Unsupervised data augmentation for consistency training [C]// Proc of the 34th Annual Conference on Neural Information Processing Systems (NeurIPS 2020). Virtual: Neural Information Processing Systems Foundation, 2020.
[15] Liang Xiaobo, Wu Lijun, Li Juntao, et al. R-Drop: regularized dropout for neural networks [C]// Proc of the 35th Annual Conference on Neural Information Processing Systems. Virtual: Neural Information Processing Systems Foundation, 2021: 10890-10905.
[16] Sirbu I, Popovici R-A, Caragea C, et al. MultiMatch: multihead consistency regularization matching for semi-supervised text classification [C]// Proc of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025). Suzhou: Association for Computational Linguistics, 2025: 2792-2808.
[17] Qian Lai, Zhao Weiwei. Text classification method based on contrastive learning and attention mechanism [J]. Computer Engineering, 2024, 50(7): 104-111. (in Chinese)
[18] Zheng Cheng, Li Pengfei. Text classification based on feature fusion of dual hypergraph neural networks [J]. Computer Engineering, 2025, 51(6): 127-135. (in Chinese)
[19] Yuan B, Chen Y, Zhang Y. Weed out, then harvest: dual low-rank adaptation is an effective noisy label detector for noise-robust learning [C]// Findings of the Association for Computational Linguistics: ACL 2025. Vienna: Association for Computational Linguistics, 2025: 15292-15311.
[20] Erden C. Dynamic rank reinforcement learning for adaptive low-rank multi-head self attention in large language models [J/OL]. CoRR, 2025, abs/2512.15973. https://doi.org/10.48550/arXiv.2512.15973.
[21] Tan Songbo, Zhang Jin. An empirical study of sentiment analysis for Chinese documents [J]. Expert Systems with Applications, 2008, 34(4): 2622-2629.
[22] Maas Andrew L, Daly Raymond E, Pham Peter T, et al. Learning word vectors for sentiment analysis [C]// Proc of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011). Portland, Oregon, USA: Association for Computational Linguistics, 2011: 142-150.
[23] Zhang Xiang, Zhao Junbo Jake, LeCun Yann. Character-level convolutional networks for text classification [C]// Proc of Advances in Neural Information Processing Systems 28 (NeurIPS). Montreal, Quebec, Canada: Curran Associates Inc., 2015: 649-657.
[24] Demszky Dorottya, Movshovitz-Attias Dana, Ko Jeongwoo, et al. GoEmotions: a dataset of fine-grained emotions [C]// Proc of the 58th Annual Meeting of the Association for Computational Linguistics (ACL). Online: Association for Computational Linguistics, 2020: 4040-4054.
[25] Hua Ting, Li Xiao, Gao Shangqian, et al. Dynamic low-rank estimation for transformer-based language models [C]// Proc of the Findings of the Association for Computational Linguistics: EMNLP. Singapore: Association for Computational Linguistics, 2023: 9275-9287.
[26] Bardes Adrien, Ponce Jean, LeCun Yann. VICReg: variance-invariance-covariance regularization for self-supervised learning [C/OL]// Proc of the 10th International Conference on Learning Representations. [S. l.]: OpenReview.net, 2022.
[27] Grill Jean-Bastien, Strub Florian, Altché Florent, et al. Bootstrap your own latent – a new approach to self-supervised learning [C]// Proc of Advances in Neural Information Processing Systems 33 (NeurIPS). [S. l.]: Curran Associates Inc., 2020.
[28] Li Jingyang, Sun Maosong. Scalable term selection for text categorization [C]// Proc of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Prague, Czech Republic: Association for Computational Linguistics, 2007: 774-782.
[29] Tishby Naftali, Pereira Fernando C, Bialek William. The information bottleneck method [J/OL]. arXiv preprint physics/0004057, 2000.
[30] Li Z, Sun M. Punctuation as implicit annotations for Chinese word segmentation [J]. Computational Linguistics, 2009, 35(4): 505-512.
[31] Belinkov Yonatan, Bisk Yonatan. Synthetic and natural noise both break neural machine translation [C]// Proc of the 6th International Conference on Learning Representations (ICLR 2018). Vancouver, BC, Canada: OpenReview.net, 2018.
[32] Gao Ji, Lanchantin Jack, Soffa Mary Lou, et al. Black-box generation of adversarial text sequences to evade deep learning classifiers [C]// Proc of the IEEE Security and Privacy Workshops (SPW). San Francisco, CA, USA: IEEE Computer Society, 2018: 50-56.
[33] Li J, Sun M, Zhang X. A comparison and semi-quantitative analysis of words and character-bigrams as features in Chinese text categorization [C]// Proc of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL 2006). Sydney, Australia: Association for Computational Linguistics, 2006: 545-552.
[34] Devlin Jacob, Chang Ming-Wei, Lee Kenton, et al. BERT: pre-training of deep bidirectional transformers for language understanding [C]// Proc of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). Minneapolis, MN, USA: Association for Computational Linguistics, 2019: 4171-4186.