
Computer Engineering ›› 2010, Vol. 36 ›› Issue (8): 60-63. doi: 10.3969/j.issn.1000-3428.2010.08.021

• Software Technology and Database •

Enhanced Deep Web Data Sources Classification Model Based on World Knowledge

HUANG Li1,2, ZHAO Peng-peng1, FANG Wei1, CUI Zhi-ming1, SUN Zhen-qiang3

  1. Institute of Intelligent Information Processing and Application, Soochow University, Suzhou 215006; 2. Jiangsu Radio and TV University, Nanjing 210017; 3. Nandasoft Company Ltd., Suzhou 215006
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2010-04-20 Published: 2010-04-20

Abstract: The traditional bag-of-words method has limitations when applied to Deep Web data source classification. This paper proposes an enhanced Deep Web data source classification model based on world knowledge. The model builds feature mappings through topic analysis of an external knowledge base, constructs an auxiliary classifier based on domain concepts, and enriches the feature set of Deep Web query forms. Experiments classify real Web data using the Wikipedia encyclopedia as the knowledge base, and the results show that the model is effective and scalable.

Key words: Deep Web, data sources classification, topic analysis, feature mapping, world knowledge
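
The feature-enrichment idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The mapping `TERM_TO_CONCEPTS`, the `concept:` feature prefix, and the function names are hypothetical. The paper derives its feature mapping from topic analysis of Wikipedia; a small hand-written dictionary stands in for that mapping here. The auxiliary classifier is not shown.

```python
from collections import Counter

# Hypothetical term-to-concept mapping. It stands in for the feature mapping
# that the paper derives from topic analysis of an external knowledge base.
TERM_TO_CONCEPTS = {
    "isbn": ["Books"],
    "author": ["Books"],
    "publisher": ["Books"],
    "departure": ["Air travel"],
    "airline": ["Air travel"],
    "passengers": ["Air travel"],
}


def bag_of_words(form_text):
    """Count lowercase, whitespace-separated terms in a query form's text."""
    return Counter(form_text.lower().split())


def enrich(form_text):
    """Return bag-of-words counts plus 'concept:<name>' features."""
    features = bag_of_words(form_text)
    # Iterate over a snapshot so that adding concept keys is safe.
    for term, count in list(features.items()):
        for concept in TERM_TO_CONCEPTS.get(term, []):
            features["concept:" + concept] += count
    return features


if __name__ == "__main__":
    # Labels taken from an imagined book-search form.
    print(enrich("Title Author ISBN"))
```

In this sketch a form that shares no literal terms with the training forms can still share concept features with them, which is the kind of gap a plain bag-of-words representation leaves open.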

CLC number: