
Computer Engineering ›› 2010, Vol. 36 ›› Issue (21): 4-6. doi: 10.3969/j.issn.1000-3428.2010.21.002

• Doctoral Papers •

Short-text Classification Method Based on Concept Network

LIN Xiao-jun1, ZHANG Meng1, BAO Xiao1, LI Jun2, WU Xi-hong1

  1. Key Laboratory of Machine Perception, Ministry of Education, Peking University, Beijing 100871, China
  2. Beijing Chaoyang District Archives Bureau, Beijing 100020, China
  • Online: 2010-11-05 Published: 2010-11-03
  • About the authors: LIN Xiao-jun (born 1981), male, Ph.D. candidate, main research interests: natural language processing and knowledge representation; ZHANG Meng, Ph.D. candidate; BAO Xiao, master's student; LI Jun, bachelor's degree; WU Xi-hong, professor
  • Supported by:
    National Natural Science Foundation of China (60535030, 60605016); National "863" Program of China (2006AA012196); Beijing Archives Science and Technology Fund (2009-13)

Abstract: This paper proposes an automatic classification method based on concept networks for short texts in the archive domain. A domain ontology is built by analyzing the linguistic characteristics of short texts in the domain, and natural language processing techniques convert each short text (an archive title) into a structured concept network expressed in the Resource Description Framework (RDF). A semantic similarity measure between concept networks is then defined and used to classify archives automatically by retention period. Experimental results show that, compared with traditional short-text classification based on feature selection, the method reduces the classification error rate by a relative 24.2%, which improves system performance.

Key words: short-text classification, concept network, document similarity, domain ontology
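
The pipeline outlined in the abstract (title → concept network → similarity → class label) can be illustrated with a minimal sketch. The abstract does not specify the ontology, the RDF vocabulary, or the similarity measure, so everything below is an assumption made for illustration: the `ONTOLOGY` table, the `doc` subject node, the role names, the example titles and retention labels are all hypothetical, plain tuples stand in for RDF triples, and Jaccard overlap with nearest-neighbour lookup stands in for the paper's semantic similarity and classifier.

```python
from typing import FrozenSet, List, Tuple

# A triple is (subject, predicate, object), mimicking an RDF statement.
Triple = Tuple[str, str, str]

# Hypothetical toy ontology: surface term -> (concept, role of that concept).
ONTOLOGY = {
    "notice": ("Notice", "docType"),
    "report": ("Report", "docType"),
    "budget": ("Budget", "topic"),
    "meeting": ("Meeting", "topic"),
    "annual": ("Annual", "period"),
    "weekly": ("Weekly", "period"),
}

# Hypothetical labelled titles; the labels imitate archive retention periods.
LABELED: List[Tuple[str, str]] = [
    ("Annual budget report", "permanent"),
    ("Weekly meeting notice", "short-term"),
]


def to_concept_network(title: str) -> FrozenSet[Triple]:
    """Map a title to a set of triples using the toy ontology.

    Tokens that are not in the ontology are ignored.
    """
    triples = set()
    for token in title.lower().split():
        if token in ONTOLOGY:
            concept, role = ONTOLOGY[token]
            triples.add(("doc", role, concept))
    return frozenset(triples)


def similarity(a: FrozenSet[Triple], b: FrozenSet[Triple]) -> float:
    """Jaccard overlap of two triple sets (a stand-in, not the paper's measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(title: str, labeled: List[Tuple[str, str]]) -> str:
    """Return the label of the most similar labelled title (1-nearest neighbour)."""
    query = to_concept_network(title)
    best = max(labeled, key=lambda item: similarity(query, to_concept_network(item[0])))
    return best[1]


if __name__ == "__main__":
    print(classify("Notice of weekly meeting", LABELED))
```

Comparing triple sets rather than raw words means that two titles match only when the same concept fills the same role, which is the intuition behind using a structured representation for very short texts where word overlap alone is sparse.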
