
Computer Engineering (计算机工程)

• Advanced Computing and Data Processing

XML Keyword Search Based on Extended Query Expression

ZHU Jing-hua (朱菁华), WANG Xiao-ling (王晓玲)

  1. (School of Computer Science, Fudan University, Shanghai 200433, China)
  • Received: 2013-11-05 Online: 2014-10-15 Published: 2014-10-13
  • About the authors: ZHU Jing-hua (born 1985), male, master's student, whose main research interests are XML information retrieval and database technology; WANG Xiao-ling, professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (60773075).

Abstract: Most existing eXtensible Markup Language (XML) keyword search methods generate results from Lowest Common Ancestor (LCA) semantic subtrees, and therefore miss data that lies outside those subtrees but is relevant to the user's search intention. To solve this problem, an XML keyword query method based on extended query expressions is proposed. The user query log serves as the statistical model for query expansion: the log is analyzed statistically and, combined with the optimal retrieval concept, used to judge whether the query expression needs to be expanded. An XML TF-IDF method then computes the weights of candidate attributes, and, based on the context information of the initial retrieval results, a clustering method selects the expansion keywords most relevant to the search intention, which are used to expand the query expression. Experimental results show that, compared with XSeek and a semantic-dictionary-based query expansion method, the proposed method improves the average F-measure by 7% and 17% respectively, indicating higher query quality.

Key words: information retrieval, eXtensible Markup Language (XML), Lowest Common Ancestor (LCA) semantics, keyword search, query expansion, context information
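
To make the LCA baseline mentioned in the abstract concrete, the following is a minimal sketch of LCA-based keyword matching. It is not the authors' method. It assumes nodes are identified by Dewey labels (a common encoding in XML keyword search, not stated in the abstract), and the toy document `DOC` and the helper names `lca`, `matches`, and `lca_results` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Dewey = Tuple[int, ...]

# Hypothetical toy document: Dewey label -> text content of the node.
DOC: Dict[Dewey, str] = {
    (1,): "",                          # <bib>
    (1, 1): "",                        # <paper>
    (1, 1, 1): "xml keyword search",   # <title>
    (1, 1, 2): "wang",                 # <author>
    (1, 2): "",                        # <paper>
    (1, 2, 1): "query expansion",      # <title>
    (1, 2, 2): "zhu",                  # <author>
}


def lca(labels: List[Dewey]) -> Dewey:
    """Return the longest common prefix of the labels.

    With Dewey labels, that prefix is the label of the lowest common ancestor.
    """
    if not labels:
        raise ValueError("need at least one label")
    prefix = []
    for parts in zip(*labels):  # stops at the shortest label
        if all(p == parts[0] for p in parts):
            prefix.append(parts[0])
        else:
            break
    return tuple(prefix)


def matches(doc: Dict[Dewey, str], keyword: str) -> List[Dewey]:
    """Labels of nodes whose text contains the keyword as a whole word."""
    kw = keyword.lower()
    return [label for label, text in doc.items() if kw in text.split()]


def lca_results(doc: Dict[Dewey, str], keywords: List[str]) -> Set[Dewey]:
    """LCA of every combination that takes one matching node per keyword."""
    match_lists = [matches(doc, k) for k in keywords]
    if any(not m for m in match_lists):
        return set()  # some keyword has no match, so there is no result
    return {lca(list(combo)) for combo in product(*match_lists)}
```

In this toy document, the query `["xml", "wang"]` is answered by the first `<paper>` subtree, while `["xml", "zhu"]` only meets at the root. Neither result can include nodes outside the returned subtree, which is the limitation that motivates expanding the query expression.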

CLC number: