Abstract: This paper proposes an automatic classification method based on concept networks for short-text classification in the archive domain. A domain ontology is constructed by analyzing the linguistic characteristics of short texts in the domain, and natural language processing techniques are used to convert each short text (an archive title) into a structured concept network represented in the Resource Description Framework (RDF). On this basis, a semantic similarity between concept networks is defined and used to classify archives automatically by retention period. Experimental results show that, compared with traditional short-text classification methods based on feature selection, the method reduces the classification error rate by a relative 24.2% and effectively improves system performance.
Key words:
short-text classification,
concept network,
document similarity,
domain ontology
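The pipeline summarized in the abstract (short text, then concept network, then similarity, then class label) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not give the similarity definition, so Jaccard overlap of triple sets stands in for it, nearest-neighbour voting stands in for the classifier, and the triples and the labels `"permanent"` and `"short-term"` are invented examples.

```python
from collections import Counter

# A concept network is modeled as a frozenset of (subject, predicate, object)
# triples, in the spirit of RDF. All triples and labels below are invented.

def jaccard(a, b):
    """Overlap of two triple sets; a stand-in for the paper's semantic similarity."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def classify(query, labeled, k=1):
    """Return the majority label among the k labeled networks most similar to query."""
    ranked = sorted(labeled, key=lambda item: jaccard(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled archive titles, already converted to concept networks.
labeled = [
    (frozenset({("notice", "about", "budget"),
                ("budget", "of", "department")}), "permanent"),
    (frozenset({("notice", "about", "holiday"),
                ("holiday", "of", "office")}), "short-term"),
]

# Hypothetical unlabeled title sharing one triple with the first example.
query = frozenset({("report", "about", "budget"),
                   ("budget", "of", "department")})

predicted = classify(query, labeled)
```

In this toy data the query shares one of three distinct triples with the first network and none with the second, so it receives the first network's label. A similarity that exploits the domain ontology, as the paper describes, would also credit related but non-identical concepts, which plain set overlap cannot do.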
林小俊, 张猛, 暴筱, 李军, 吴玺宏. 基于概念网络的短文本分类方法[J]. 计算机工程, 2010, 36(21): 4-6.
LIN Xiao-Jun, ZHANG Meng, BAO Xiao, LI Jun, WU Xi-Hong. Short-text Classification Method Based on Concept Network[J]. Computer Engineering, 2010, 36(21): 4-6.