
Computer Engineering (计算机工程)

• Advanced Computing and Data Processing •

Deep Web Data Source Selection for Entity Information Integrated Retrieval
(Original title: 实体信息集成检索的深网数据源选择)

DENG Song (邓松)

  1. (School of Software and Communication Engineering, Jiangxi University of Finance and Economics, Nanchang 330013, China)
  • Received: 2015-10-12 Online: 2016-10-15 Published: 2016-10-15
  • About the author: DENG Song (born 1982), male, lecturer, Ph.D.; main research interests are Web data management and data mining.
  • Funding:
    National Natural Science Foundation of China (61462037, 61563016); Natural Science Foundation of Jiangxi Province (20142BAB217014, 20142BAB207009); Jiangxi Province Graduate Innovation Fund (YC2012-B021).

Abstract: In Deep Web integrated retrieval, users usually want to obtain high-quality results by submitting queries to only a small number of data sources, so data source selection becomes a key issue. To improve the efficiency of integrated retrieval of entity information, this paper proposes a data source selection method that considers both relevance and content redundancy. It first gives a method for constructing Deep Web data source summaries based on topic words and sentiment words: user feedback is used to identify the topic category of entity information, and sentiment words are used to measure the content redundancy between data sources. It then designs a scoring strategy for Deep Web data sources that combines topic relevance with content redundancy. Experimental results show that the method achieves relatively high accuracy and recall even with small data summaries, and can effectively support integrated retrieval of entity information.

Key words: data source selection, Deep Web, entity, information integration, user feedback
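
The abstract describes scoring data sources by combining topic relevance with content repetition (redundancy), but gives no formulas. The following is only an illustrative sketch of that general idea, not the paper's method: it assumes (hypothetically) that each source summary holds topic weights and a set of sentiment words, that relevance is cosine similarity to the query's topic weights, that redundancy is Jaccard overlap of sentiment words, and that sources are picked greedily with a `trade_off` weight. All names and parameters here are assumptions made for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class SourceSummary:
    """Hypothetical summary of one Deep Web data source (assumed structure)."""
    name: str
    topic_weights: dict         # topic -> weight
    sentiment_words: frozenset  # sentiment words sampled from the source


def cosine(a, b):
    """Cosine similarity between two sparse topic-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(x, y):
    """Jaccard overlap between two word sets (assumed redundancy measure)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def select_sources(query_topics, summaries, k, trade_off=0.7):
    """Greedily pick up to k sources, rewarding relevance to the query and
    penalizing overlap with sources that were already selected."""
    selected = []
    remaining = list(summaries)
    while remaining and len(selected) < k:
        def score(s):
            relevance = cosine(query_topics, s.topic_weights)
            redundancy = max(
                (jaccard(s.sentiment_words, t.sentiment_words) for t in selected),
                default=0.0,
            )
            return trade_off * relevance - (1 - trade_off) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [s.name for s in selected]
```

With `trade_off=1.0` the sketch ranks purely by relevance; lower values increasingly favor sources whose content differs from what has already been chosen.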

CLC number: