
Computer Engineering (计算机工程)

• Artificial Intelligence and Recognition Technology •

Encyclopedia Related Entity Construction Based on Category Template Mining
(基于类别模板挖掘的百科相关实体构建)

QIN Huazheng (覃华峥) 1, HU Zhongshun (胡忠顺) 2, YANG Deqing (阳德青) 1, XIAO Yanghua (肖仰华) 1

  1. School of Computer Science, Fudan University, Shanghai 200433, China
  2. Shanghai Ideal Information Industry (Group) Co., Ltd., Shanghai 201315, China
  • Received: 2015-09-10 Online: 2016-09-15 Published: 2016-09-15
  • About the authors: QIN Huazheng (born 1991), male, master's student, main research interest: data mining; HU Zhongshun, researcher; YANG Deqing, lecturer; XIAO Yanghua, associate professor.
  • Funding:

    National Natural Science Foundation of China (61472085); Basic Research Program of the Shanghai Science and Technology Innovation Action Plan (15JC1400900); Shanghai Rising-Star Program (13511504300).

Abstract:

Knowledge in existing online encyclopedias is scattered across entries, and building related-entity lists by hand is too costly to scale. To address this, an entity categorization and correlation degree ranking algorithm based on related-entity category templates is proposed for automatically organizing scattered encyclopedia entities. The algorithm takes entities whose categories are similar to those of the query entity, examines the entities referenced on their pages, and mines from them a category template of entities related to the query entity. Related entities are then mapped directly into the template by category, and the entities within the template are ranked by correlation degree. Experimental results show that the algorithm categorizes entities more accurately than clustering-based algorithms and has lower time complexity than approaches that rank by correlation degree first and categorize afterwards, so it can reduce the cost of manually building related entities for an encyclopedia.

Key words: information retrieval, template mining, entity similarity, noisy-or model, entity correlation degree
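
The pipeline summarized in the abstract (mine a category template, map entities into it, rank within each slot) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `min_support` threshold, the choice of the query page's referenced entities as candidates, and the use of noisy-or to combine per-pair evidence scores are all assumptions made for illustration; the abstract lists the noisy-or model only as a keyword and does not say how it is applied. All entity names and scores are toy data.

```python
from collections import Counter, defaultdict


def mine_category_template(query, categories, references, min_support=2):
    """Return categories referenced by at least `min_support` entities that
    share a category with `query` (hypothetical support threshold)."""
    query_cats = categories[query]
    similar = [e for e in categories if e != query and categories[e] & query_cats]
    support = Counter()
    for entity in similar:
        seen = set()  # count each category at most once per similar entity
        for ref in references.get(entity, ()):
            seen |= categories.get(ref, set())
        support.update(seen)
    return {cat for cat, n in support.items() if n >= min_support}


def noisy_or(probs):
    """Noisy-or combination: 1 - product of (1 - p_i)."""
    p_none = 1.0
    for p in probs:
        p_none *= 1.0 - p
    return 1.0 - p_none


def build_related_entities(query, categories, references, evidence, min_support=2):
    """Map the query's referenced entities into template slots by category,
    then rank each slot by an assumed noisy-or relatedness score."""
    template = mine_category_template(query, categories, references, min_support)
    slots = defaultdict(list)
    for ent in references.get(query, ()):
        for cat in categories.get(ent, set()) & template:
            slots[cat].append(ent)
    return {
        cat: sorted(ents, key=lambda e: noisy_or(evidence.get((query, e), [])), reverse=True)
        for cat, ents in slots.items()
    }


# Toy data (entirely made up for illustration).
categories = {
    "U1": {"university"}, "U2": {"university"}, "U3": {"university"},
    "CityA": {"city"}, "CityB": {"city"},
    "PersonA": {"person"}, "PersonB": {"person"},
    "SongX": {"song"},
}
references = {
    "U1": ["CityA", "PersonA", "PersonB", "SongX"],
    "U2": ["CityA", "PersonA"],
    "U3": ["CityB", "PersonB"],
}
evidence = {
    ("U1", "PersonA"): [0.2, 0.3],
    ("U1", "PersonB"): [0.9],
    ("U1", "CityA"): [0.5],
}

result = build_related_entities("U1", categories, references, evidence)
print(result)
```

In this toy run, "SongX" is dropped because no entity similar to "U1" references a song, so "song" never enters the template. Because entities go into slots by category before any scoring, sorting happens only within each slot, which is in the spirit of the abstract's point about avoiding a global ranking step before categorization.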

CLC Number: