Computer Engineering, 2012, Vol. 38, Issue 15: 62-65. doi: 10.3969/j.issn.1000-3428.2012.15.018

• Software Technology and Database •

Texts Classification Method Based on Domain Ontology

WEI Ting-ting 1a, NIE Deng-guo 1b, WANG Ju 1a, JIANG Yun-cheng 2

  (1a. College of Computer Science and Information Engineering, Guangxi Normal University, Guilin 541004, China; 1b. College of Mathematical Science, Guangxi Normal University, Guilin 541004, China; 2. School of Computer Science, South China Normal University, Guangzhou 510631, China)
  • Received: 2011-11-07 Online: 2012-08-05 Published: 2012-08-05
  • About the authors: WEI Ting-ting (born 1986), female, master's student, main research interests: Semantic Web and text classification; NIE Deng-guo, master's student; WANG Ju, research fellow, Ph.D.; JIANG Yun-cheng, professor, Ph.D.
  • Supported by:
    Guangxi Natural Science Foundation project (桂科自0991100); Guangdong Natural Science Foundation project (10151063101000031); Innovation Project of Guangxi Graduate Education (2011106020812M60); Guizhou Science and Technology Foundation project "Research on Non-Standard Reasoning in Description Logic Systems" (黔科合J字[2009]2068); Guizhou Provincial Department of Science and Technology Foundation project (黔教科合J字[2012]2310号)

Abstract: Existing ontology-based text classification methods have two shortcomings: they do not take into account the Information Content (IC) carried by the ontology concepts themselves, and they ignore the reasoning capabilities of the ontology. To address both problems, this paper proposes a text classification method based on domain ontology, using the tourism domain as its setting. The method takes the structure of the ontology itself as the classification standard. It computes the semantic correlation degree between feature terms and ontology concepts, combines this with ontology reasoning, and assigns each text to a suitable ontology concept as an instance of that concept. Experimental results show that, compared with traditional classification methods, the method improves the F1 value by at least 8.7%.

Key words: domain ontology, text classification, ontology concept, Information Content (IC), reasoning, semantic correlation
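
The abstract gives no formulas, so the following is only an illustrative sketch of the general idea it describes, not the authors' algorithm. It assumes a toy tourism ontology (`PARENT`), a hypothetical term-to-concept lexicon (`TERM_CONCEPT`), and a commonly used structure-based IC formula (IC decreases as a concept has more descendants). The paper may define IC and semantic correlation differently. Each text is scored against the concepts and assigned to the best one, and subsumption reasoning then also makes it an instance of every superconcept. The `f1` helper is the standard harmonic mean of precision and recall.

```python
import math
from collections import Counter

# Hypothetical toy tourism ontology: child concept -> parent concept.
PARENT = {
    "Attraction": "TourismThing",
    "Accommodation": "TourismThing",
    "Museum": "Attraction",
    "Beach": "Attraction",
    "Hotel": "Accommodation",
}

# Hypothetical lexicon linking feature terms to the concept they evoke.
TERM_CONCEPT = {
    "exhibit": "Museum", "gallery": "Museum",
    "sand": "Beach", "surf": "Beach",
    "room": "Hotel", "booking": "Hotel",
    "ticket": "Attraction",
}


def ancestors(concept):
    """Return the superconcepts of `concept`, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def all_concepts():
    """Every concept that appears in the toy ontology."""
    return set(PARENT) | set(PARENT.values())


def information_content(concept):
    """Assumed structure-based IC: fewer descendants means more specific."""
    concepts = all_concepts()
    descendants = sum(1 for c in concepts if concept in ancestors(c))
    return 1.0 - math.log(descendants + 1) / math.log(len(concepts))


def classify(tokens):
    """Assign a token list to its best-scoring concept plus inferred superconcepts."""
    counts = Counter(t for t in tokens if t in TERM_CONCEPT)
    scores = Counter()
    for term, freq in counts.items():
        concept = TERM_CONCEPT[term]
        # Term frequency weighted by how informative the matched concept is.
        scores[concept] += freq * information_content(concept)
    if not scores:
        return None, []
    best = max(scores, key=scores.get)
    # Subsumption reasoning: an instance of `best` is also an instance
    # of each of its superconcepts.
    return best, ancestors(best)


def f1(precision, recall):
    """Standard F1 measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `classify("the gallery exhibit ticket".split())` assigns the text to `Museum` and infers membership in `Attraction` and `TourismThing`, because the two museum terms match a leaf concept whose IC is higher than that of the more general `Attraction`.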

CLC Number: