
Computer Engineering ›› 2024, Vol. 50 ›› Issue (7): 104-111. doi: 10.19678/j.issn.1000-3428.0068132

• Artificial Intelligence and Pattern Recognition •

  • Supported by:
    National Ministries and Commissions Fund

Text Classification Method Based on Contrastive Learning and Attention Mechanism

Lai QIAN*, Weiwei ZHAO

  1. School of Information and Communication, National University of Defense Technology, Wuhan 430010, Hubei, China
  • Received: 2023-07-24 Online: 2024-07-15 Published: 2024-07-26
  • Corresponding author: Lai QIAN


Abstract:

Text classification is a fundamental task in natural language processing and plays an important role in applications such as information retrieval, machine translation, and sentiment analysis. However, most deep learning models do not fully exploit the rich information in training instances during prediction, resulting in incomplete text feature learning. To fully leverage training instance information, this paper proposes a text classification method based on contrastive learning and an attention mechanism. First, a supervised contrastive learning training strategy is designed to optimize the model's retrieval of text vector representations, thereby improving the quality of the training instances retrieved during inference. Second, an attention mechanism is constructed to learn the attention distribution over the retrieved training text features, focusing on neighboring instances with stronger relevance and capturing more implicit similarity features. Finally, the attention mechanism is combined with the model network to fuse information from neighboring training instances, enhancing the model's ability to extract diverse features and achieving both global and local feature extraction. Experimental results demonstrate that the method yields significant performance gains across multiple models, including Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), Graph Convolutional Network (GCN), Bidirectional Encoder Representations from Transformers (BERT), and RoBERTa. For the CNN model, the macro F1 value increases by 4.15, 6.2, and 1.92 percentage points on the THUCNews, Toutiao, and Sogou datasets, respectively. The method therefore provides an effective solution for text classification tasks.
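The supervised contrastive training strategy described in the abstract can be illustrated with a minimal, self-contained sketch. This is the standard batch-wise supervised contrastive loss (same-label instances in the batch serve as positives), not the paper's exact implementation; the temperature value here is an assumed hyperparameter:

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Batch-wise supervised contrastive loss over L2-normalized embeddings.

    For each anchor, same-label instances in the batch are positives and
    all other instances are negatives.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                    # pairwise cosine similarities
    n = len(labels)
    mask_self = ~np.eye(n, dtype=bool)             # exclude the anchor itself
    logits = np.where(mask_self, sim, -np.inf)
    # numerically stable log-softmax over all non-self pairs
    row_max = logits.max(axis=1, keepdims=True)
    log_prob = logits - (row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True)))
    losses = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if positives:                              # skip anchors with no positive in the batch
            losses.append(-np.mean(log_prob[i, positives]))
    return float(np.mean(losses))
```

A lower loss means same-label embeddings are pulled together and different-label embeddings pushed apart, which is what makes the later retrieval of relevant training instances reliable.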

Key words: text classification, deep model, contrastive learning, approximate nearest neighbor algorithm, attention mechanism
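The inference-time step outlined in the abstract — retrieving nearby training instances and fusing them with the base model's prediction through an attention distribution — might look like the following sketch. Exact cosine search stands in here for an approximate nearest-neighbor index, and the interpolation weight `alpha` is an assumed hyperparameter, not taken from the paper:

```python
import numpy as np

def retrieve_and_fuse(query_emb, train_embs, train_labels, model_probs,
                      num_classes, k=4, temperature=0.1, alpha=0.5):
    """Retrieve the k most similar training instances for a query, weight
    them by a softmax attention distribution over cosine similarity, and
    interpolate the resulting neighbor label distribution with the base
    model's predicted class probabilities.
    """
    q = query_emb / np.linalg.norm(query_emb)
    t = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    sims = t @ q
    top = np.argsort(-sims)[:k]                 # k nearest training instances
    att = np.exp(sims[top] / temperature)
    att /= att.sum()                            # attention distribution over neighbors
    neighbor_dist = np.zeros(num_classes)
    for w, j in zip(att, top):
        neighbor_dist[train_labels[j]] += w     # attention-weighted label votes
    return alpha * model_probs + (1 - alpha) * neighbor_dist
```

In a full system the brute-force search would be replaced by an approximate nearest-neighbor library, and the fusion could act on features rather than label distributions, as the abstract's description of combining the attention mechanism with the model network suggests.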