
Computer Engineering ›› 2011, Vol. 37 ›› Issue (13): 43-45. doi: 10.3969/j.issn.1000-3428.2011.13.012

• Software Technology and Database •

Bayesian Network for Structured Document Retrieval

XU Jian-min 1,2a, CHEN Zhen-ya 2b

  1. Institute of Systems Engineering, Tianjin University, Tianjin 300072, China
  2. Hebei University, Baoding 071002, China: a. College of Mathematics and Computer Science; b. Library
  • Received: 2010-12-07 Online: 2011-07-05 Published: 2011-07-05
  • About the authors: XU Jian-min (born 1966), male, professor and postdoctoral researcher, main research interest: information retrieval; CHEN Zhen-ya, librarian
  • Supported by: China Postdoctoral Science Foundation (20070420700)

Abstract: This paper analyzes how structured documents are represented and what characterizes their retrieval, and studies a Bayesian network for structured document retrieval. It discusses how the network is constructed, how its probabilities are estimated, and how inference is carried out. Nodes represent index terms and structural units, arcs represent the membership of terms in structural units, and the prior probability of each node is estimated with the TF-IDF method. Given a query, the relevance value of each structural unit is obtained by computing its conditional probability, and the units are ranked accordingly. An example verifies the effectiveness of the network.

Key words: Bayesian network, structured document, information retrieval, prior probability estimation

CLC Number:
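
The retrieval scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the abstract gives no formulas, so the weighting and scoring below are assumptions chosen for illustration. Each structural unit is treated as a node whose parents are its index terms, the weight on each term-to-unit arc is a normalized TF-IDF value, and the conditional probability of a unit given a query is taken to be the sum of the weights of the query terms it contains. The function names `tfidf_weights` and `score_units`, the smoothed IDF `log(1 + N/df)`, and the sample unit identifiers are all hypothetical. The hierarchy between units is omitted; a larger unit such as a section could be entered with the terms of all the units it contains.

```python
import math
from collections import Counter


def tfidf_weights(units):
    """Return normalized TF-IDF arc weights for each structural unit.

    units: dict mapping a unit id to the list of index terms it contains.
    The weights of one unit sum to 1, so they can be read as probabilities.
    (Assumed weighting; the paper's exact estimator is not given here.)
    """
    n_units = len(units)
    # Document frequency: number of units in which each term occurs.
    df = Counter(t for terms in units.values() for t in set(terms))
    weights = {}
    for uid, terms in units.items():
        tf = Counter(terms)
        # Smoothed IDF so that a term occurring in every unit keeps a positive weight.
        raw = {t: tf[t] * math.log(1 + n_units / df[t]) for t in tf}
        total = sum(raw.values())
        weights[uid] = {t: w / total for t, w in raw.items()} if total else {}
    return weights


def score_units(units, query_terms):
    """Rank units by an assumed P(unit | query): sum of weights of matched terms."""
    weights = tfidf_weights(units)
    query = set(query_terms)
    scores = {
        uid: sum(w for t, w in term_weights.items() if t in query)
        for uid, term_weights in weights.items()
    }
    # Highest relevance value first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical structural units (paragraphs inside sections) and a query.
units = {
    "sec1/par1": ["bayesian", "network", "retrieval"],
    "sec1/par2": ["structured", "document", "retrieval"],
    "sec2/par1": ["database", "index"],
}
ranked = score_units(units, ["bayesian", "retrieval"])
for uid, score in ranked:
    print(f"{uid}: {score:.3f}")
```

Because the weights of each unit are normalized, every score lies between 0 and 1, and a unit that shares no term with the query scores 0. A unit matching both query terms ranks above one matching only the more common term.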