
Computer Engineering, 2018, Vol. 44, Issue (7): 193-198. doi: 10.19678/j.issn.1000-3428.0047596

• Artificial Intelligence and Recognition Technology •

Research on Short Text Classification Algorithm Based on Convolutional Neural Network and KNN

YIN Yabo a, YANG Wenzhong a, YANG Huiting a, XU Chaoying b

  1. a. College of Information Science and Engineering; b. School of Software, Xinjiang University, Urumqi 830046, China
  • Received: 2017-06-14 Online: 2018-07-15 Published: 2018-07-15
  • About the authors: YIN Yabo (born 1990), male, master's student, whose main research interests are machine learning and natural language processing; YANG Wenzhong, associate professor, Ph.D.; YANG Huiting and XU Chaoying, master's students.
  • Supported by: National Key Basic Research Development Program of China (2014CB340500); National Natural Science Foundation of China (U1603115, 61262087).

Abstract:

To address the high feature dimensionality and data sparsity that the traditional TF-IDF-based K-Nearest Neighbor (KNN) algorithm suffers from when classifying short texts, a new short text classification algorithm based on a Convolutional Neural Network (CNN) and KNN is proposed. Word vectors are trained on the short texts with word2vec, a neural network language model, and the trained vectors are used to represent each text. A CNN then extracts abstract features from the short texts, and a KNN classifier performs the classification on those extracted features. Tests are run on datasets in which the short texts contain 2, 4, 6, and 8 sentences, respectively. The results show that, compared with the TF-IDF-based KNN classification algorithm, the proposed algorithm improves accuracy, recall, and F1 value by 10.2%, 21.1%, and 15.5% on average, respectively.

Key words: social network, Convolutional Neural Network (CNN), K-Nearest Neighbor (KNN), short text, machine learning, deep learning
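The pipeline described in the abstract (word vectors, then convolutional feature extraction, then KNN on the extracted features) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the embedding table and the convolution filters are random stand-ins for trained word2vec vectors and a trained CNN, and the function names, dimensions, toy texts, whitespace tokenization, and Euclidean distance are all hypothetical choices made for the example.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def build_vocab(texts):
    """Map each distinct token to an integer id (whitespace tokenization)."""
    vocab = {}
    for text in texts:
        for tok in text.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def embed(text, vocab, emb, min_len):
    """Stack word vectors into a (length, dim) matrix, zero-padded to min_len rows."""
    ids = [vocab[t] for t in text.split() if t in vocab]
    mat = emb[ids] if ids else np.zeros((0, emb.shape[1]))
    if mat.shape[0] < min_len:
        pad = np.zeros((min_len - mat.shape[0], emb.shape[1]))
        mat = np.vstack([mat, pad])
    return mat

def cnn_features(mat, filters):
    """Convolve over word windows, apply ReLU, then max-over-time pooling."""
    n_filters, width, dim = filters.shape
    windows = np.stack([mat[i:i + width] for i in range(mat.shape[0] - width + 1)])
    # windows: (positions, width, dim) -> activations: (positions, n_filters)
    acts = np.maximum(0.0, np.einsum("pwd,fwd->pf", windows, filters))
    return acts.max(axis=0)  # one dense feature per filter

def knn_predict(query, train_x, train_y, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Toy data (hypothetical); a real run would use a labeled short text corpus.
train_texts = [
    "good movie great plot",
    "great acting good fun",
    "bad service slow food",
    "slow bad rude staff",
]
train_labels = ["film", "film", "dining", "dining"]

vocab = build_vocab(train_texts)
emb = rng.normal(size=(len(vocab), 16))   # assumption: stand-in for word2vec vectors
filters = rng.normal(size=(8, 2, 16))     # assumption: untrained stand-in for CNN filters

train_x = np.stack(
    [cnn_features(embed(t, vocab, emb, 2), filters) for t in train_texts]
)
query = cnn_features(embed("good movie great plot", vocab, emb, 2), filters)
pred = knn_predict(query, train_x, train_labels, k=1)
```

Each text is reduced to a dense vector with one entry per filter (8 here), regardless of vocabulary size, whereas a TF-IDF representation has one dimension per vocabulary term and is mostly zeros for short texts. That contrast is the dimensionality and sparsity issue the abstract refers to.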
