Computer Engineering, 2019, Vol. 45, Issue (11): 177-182. doi: 10.19678/j.issn.1000-3428.0052829

• Artificial Intelligence and Recognition Technology •

Text Classification Using Convolutional Capsule Network Based on Dual-Channel Word Vectors

KANG Yan, LI Jinyuan, YANG Qiyue, CUI Guorong, WANG Peiyao

  1. School of Software, Yunnan University, Kunming 650500, China
  • Received: 2018-10-09 Revised: 2018-11-11 Published: 2018-11-16
  • About the authors: KANG Yan (born 1972), female, associate professor, whose main research interest is machine learning; LI Jinyuan, YANG Qiyue, CUI Guorong, and WANG Peiyao, master's students.
  • Funding:
    National Natural Science Foundation of China (61762092); Open Fund of the Yunnan Provincial Key Laboratory of Software Engineering (2017SE204).

Abstract: Text classification methods based on the vector space model produce high-dimensional, highly sparse text representations with weak feature expressiveness, and their feature engineering relies on costly manual extraction. To address these problems, this paper proposes a text classification algorithm using a convolutional capsule network based on dual-channel word vectors. The algorithm uses word vectors trained by Word2Vec and context word vectors extended for the specific text classification task as the two input channels of the neural network, and then applies a convolutional capsule network model with a dynamic routing mechanism to classify the text. Experimental results on multiple English datasets show that the dual-channel word vector training method outperforms the single-channel strategy, and that the proposed algorithm achieves higher text classification accuracy than LSTM, RAE, MV-RNN, and other algorithms.

Key words: dual-channel word vectors, convolutional capsule network, dynamic routing mechanism, text classification, feature representation
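The two ideas named in the abstract, a two-channel word vector input and capsule routing, can be illustrated with a minimal sketch. It is not the authors' implementation: the routing loop follows the commonly used routing-by-agreement formulation, which the paper's dynamic routing mechanism may or may not match in detail. The function names, the lookup tables `static_table` and `task_table`, and the three-iteration default are all assumptions made for illustration.

```python
import math

def squash(s, eps=1e-9):
    """Shrink vector s to a length in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)
    return [scale * x for x in s]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(u_hat, num_iters=3):
    """Routing by agreement (assumed formulation).

    u_hat[i][j] is the prediction vector that lower capsule i makes
    for upper capsule j. Returns one output vector per upper capsule.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v = []
    for _ in range(num_iters):
        # Coupling coefficients: each lower capsule distributes itself over upper ones.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the upper capsule's output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v

def dual_channel_input(tokens, static_table, task_table):
    """Pair each token's pretrained vector with its task-specific vector.

    Both tables are hypothetical dict lookups standing in for the
    Word2Vec channel and the context word vector channel.
    """
    return [(static_table[t], task_table[t]) for t in tokens]
```

In this sketch the length of each returned vector plays the role of a class score: upper capsules whose incoming predictions agree end up with longer outputs than those receiving conflicting predictions.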

CLC number: