
Computer Engineering ›› 2013, Vol. 39 ›› Issue (1): 71-75. doi: wangdong@fudan.edu.cn

• Software Technology and Database •

Entity Retrieval Method Based on Multi-perspective Association Model

WANG Dong, NIU Jun-yu

  1. (School of Computer Science, Fudan University, Shanghai 201203, China)
  • Received: 2012-03-27 Revised: 2012-05-21 Online: 2013-01-15 Published: 2013-01-13
  • About the authors: WANG Dong (born 1987), male, master's degree, main research interests: information retrieval and data mining; NIU Jun-yu, associate professor
  • Supported by:
    National "863" Program of China (2009AA01Z429)

Abstract: To address the problem of retrieving entities of a particular type in information retrieval, this paper proposes an entity retrieval method based on a multi-perspective association model, built on top of traditional search engines. The method combines Named Entity Recognition (NER), text vectors, and association rules with tools such as Wikipedia and Stanford NER, and is evaluated on the TREC 2010 entity retrieval track. Experimental results show that, compared with retrieval methods based on BM25 and on a Bayesian model, the method improves nDCG@R by an average of 11.49% and 18.09%, respectively.

Key words: text mining, association rule, entity retrieval, Named Entity Recognition (NER), Term Frequency-Inverse Document Frequency (TF-IDF), Wikipedia, search engine
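
For readers unfamiliar with the nDCG@R measure quoted in the abstract, the following is a minimal sketch of how such a score is commonly computed. It assumes the widely used formulation in which the gain at rank i is discounted by log2(i + 1) and R is the number of judged-relevant entities for a topic. The function names and the example gain values are hypothetical and are not taken from the paper, whose exact gain settings are not stated here.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_r(ranked_gains, judged_gains):
    """nDCG truncated at R, where R is the number of judged items with gain > 0.

    ranked_gains: gains of the returned entities, in ranked order (assumed input).
    judged_gains: gains of all judged entities for the topic (assumed input).
    """
    r = sum(1 for g in judged_gains if g > 0)
    if r == 0:
        return 0.0  # no relevant entities: the score is defined as 0 in this sketch
    ideal = sorted(judged_gains, reverse=True)[:r]
    return dcg(ranked_gains[:r]) / dcg(ideal)


# Hypothetical topic with two relevant entities (gains 3 and 1).
perfect = ndcg_at_r([3, 1, 0], [3, 1])
swapped = ndcg_at_r([1, 3, 0], [3, 1])
```

In this sketch a ranking that returns the relevant entities in ideal order scores 1.0, and any other ordering scores lower, so a relative gain in nDCG@R reflects relevant entities appearing nearer the top of the first R results.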
