Computer Engineering, 2019, Vol. 45, Issue (6): 242-248. doi: 10.19678/j.issn.1000-3428.0050777

• Artificial Intelligence and Recognition Technology

Research on text classification algorithm based on manifold regularization extreme learning machine

PANG Haoming, JI Junzhong, LIU Jinduo, YAO Yao

  1. Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing 100124, China
  • Received: 2018-03-14; Online: 2019-06-15; Published: 2019-06-15
  • About the authors: PANG Haoming (born 1993), male, master's student, research interests: machine learning and natural language processing; JI Junzhong (corresponding author), professor and doctoral supervisor; LIU Jinduo and YAO Yao, Ph.D. candidates.
  • Funding: National Natural Science Foundation of China (61375059, 61672065).

Abstract: When an Extreme Learning Machine (ELM) is used for text classification, the random mapping of the input text features produces a nonlinear geometric structure that the least squares method cannot handle, which degrades classification performance. To address this problem, this paper introduces a new manifold regularization scheme and proposes an improved ELM-based algorithm. Laplacian eigenmaps are used to preserve the geometric structure of the input text features. The distances between sample points are corrected using the samples' class information, so that sample points of the same class are selected preferentially, which improves classification performance. Experimental results on the Reuters and 20newsgroup datasets show that, compared with the Regularized Extreme Learning Machine (RELM), AdaBELM, and other algorithms, the proposed algorithm achieves better classification performance, with an F1-measure of up to 91.42%.

Key words: text classification, supervised learning, Regularized Extreme Learning Machine (RELM), manifold regularization, feature mapping
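The abstract does not spell out the paper's objective function or its exact distance-correction rule, so the following is only a minimal sketch, under stated assumptions, of how the pieces it names can fit together. It assumes a standard ELM hidden layer (random, untrained sigmoid weights), a k-nearest-neighbour graph with heat-kernel weights built on class-corrected distances, and output weights that minimise the RELM ridge loss plus a Laplacian-eigenmap penalty λ·tr(βᵀHᵀLHβ), whose closed-form minimiser is β = (HᵀH + I/C + λHᵀLH)⁻¹HᵀT. The shrink and stretch factors `same` and `diff`, and every function and parameter name, are hypothetical stand-ins, not the authors' method. Plain Python lists are used so the sketch runs without dependencies; a practical implementation would use a linear-algebra library.

```python
import math
import random


def matmul(a, b):
    """Dense product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def hidden_layer(x, w, bias):
    """ELM random feature map H = sigmoid(X W + b); W and b are never trained."""
    return [[1.0 / (1.0 + math.exp(-(v + c))) for v, c in zip(row, bias)]
            for row in matmul(x, w)]


def supervised_laplacian(x, y, k=3, sigma=1.0, same=0.5, diff=2.0):
    """Graph Laplacian L = D - W of a kNN graph on class-corrected distances.

    Hypothetical correction: same-class distances are shrunk by `same` and
    different-class distances stretched by `diff`, so same-class points are
    preferred as neighbours.
    """
    n = len(x)
    d = [[math.dist(x[i], x[j]) * (same if y[i] == y[j] else diff)
          for j in range(n)] for i in range(n)]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
        for j in nbrs:
            # Heat-kernel weight, stored symmetrically.
            w[i][j] = w[j][i] = math.exp(-d[i][j] ** 2 / (2 * sigma ** 2))
    deg = [sum(row) for row in w]
    return [[(deg[i] if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]


def fit(x, y, n_classes, n_hidden=20, c=100.0, lam=0.1, seed=0):
    """Output weights beta = (H^T H + I/C + lam * H^T L H)^-1 H^T T."""
    rng = random.Random(seed)
    w = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(len(x[0]))]
    bias = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = hidden_layer(x, w, bias)
    ht = transpose(h)
    hth = matmul(ht, h)
    hlh = matmul(matmul(ht, supervised_laplacian(x, y)), h)
    a = [[hth[i][j] + lam * hlh[i][j] + (1.0 / c if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    # One-hot target matrix T.
    t = [[1.0 if label == k else 0.0 for k in range(n_classes)] for label in y]
    return w, bias, solve(a, matmul(ht, t))


def predict(model, x):
    """Assign each row to the class with the largest ELM output."""
    w, bias, beta = model
    out = matmul(hidden_layer(x, w, bias), beta)
    return [max(range(len(row)), key=row.__getitem__) for row in out]
```

On toy data such as two well-separated 2-D clusters, `fit(X, Y, n_classes=2)` followed by `predict(model, X)` should recover the training labels. Setting `lam=0` reduces the closed form to plain RELM, β = (HᵀH + I/C)⁻¹HᵀT, which makes the contribution of the manifold term easy to compare.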
