Computer Engineering ›› 2021, Vol. 47 ›› Issue (8): 93-99,108. doi: 10.19678/j.issn.1000-3428.0058692

• Artificial Intelligence and Pattern Recognition •

Linear Regression Text Classification Based on Class-wise Nearest Neighbor Dictionary

WU Jiao1, HONG Caifeng1, GU Yongchun1, GU Xingquan2, JIN Shiju1   

  1. College of Sciences, China Jiliang University, Hangzhou 310018, China;
    2. College of Standardization, China Jiliang University, Hangzhou 310018, China
  • Received: 2020-06-22  Revised: 2020-08-12  Published: 2020-08-20

  • About the authors: WU Jiao (born 1976), female, associate professor, Ph.D.; her main research interests include machine learning, text mining, and compressed sensing. HONG Caifeng and GU Yongchun are master's students; GU Xingquan is an associate professor; JIN Shiju is a lecturer.
  • Funding:
    National Natural Science Foundation of China (61302190).

Abstract: In text classification, the high dimensionality of text representations increases computational complexity. To address this problem, a Linear Regression Classification(LRC) model based on class-wise nearest neighbor dictionaries is constructed. The K-Nearest Neighbor(KNN) method is used to build a nearest neighbor dictionary for each class, and two LRC algorithms are proposed according to how the test sample is represented: one based on the concatenated class-wise nearest neighbor dictionaries and one based on the individual class-wise nearest neighbor dictionaries. In addition, the correlation between the test sample and each class is measured to clip noise classes, alleviating their impact on classification performance. The experimental results show that the proposed model achieves high classification accuracy and computational efficiency on both long and short texts, and the noise class clipping strategy also gives it good classification performance on corpora containing many classes.
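To make the class-wise nearest neighbor LRC idea concrete, the following Python sketch illustrates one plausible reading of the approach described in the abstract; it is a minimal illustration, not the authors' implementation. It assumes dense document vectors (e.g., TF-IDF rows), Euclidean distance for the KNN step, and a hypothetical function name and parameter k; the concatenated-dictionary variant and the noise class clipping step are omitted.

import numpy as np

def classwise_knn_lrc(X_train, y_train, x_test, k=10):
    # X_train: (n_samples, n_features) training document vectors (e.g., TF-IDF)
    # y_train: (n_samples,) class labels; x_test: (n_features,) test document vector
    residuals = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]                           # training documents of class c
        dists = np.linalg.norm(Xc - x_test, axis=1)          # distances to the test document
        idx = np.argsort(dists)[:min(k, len(Xc))]            # K nearest neighbors within class c
        Dc = Xc[idx].T                                       # class-wise nearest neighbor dictionary (columns = documents)
        beta, *_ = np.linalg.lstsq(Dc, x_test, rcond=None)   # least-squares representation of the test document
        residuals[c] = np.linalg.norm(x_test - Dc @ beta)    # reconstruction residual for class c
    return min(residuals, key=residuals.get)                 # predict the class with the smallest residual

In this reading, each class dictionary is small (at most k columns), so each least-squares problem is cheap, which is consistent with the abstract's claim of reduced computational complexity relative to regression over the full training set.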

Key words: Sparse Representation Classification(SRC), K-Nearest Neighbor(KNN), dictionary learning, Linear Regression Classification(LRC), text classification

