Computer Engineering

• Artificial Intelligence and Recognition Technology •

Mathematical Expression Retrieval Based on Inter-relevant Successive Tree

LIU Huicong 1, TIAN Bingjie 2, TIAN Xuedong 1

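  (1. School of Computer Science and Technology, Hebei University, Baoding, Hebei 071002, China; 2. Department of Economic Trade, Hebei Finance University, Baoding, Hebei 071051, China)
  • Received: 2016-12-06 Online: 2017-06-15 Published: 2017-06-15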
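  • About the authors: LIU Huicong (born 1989), female, master's student, main research interest: information retrieval; TIAN Bingjie, master's degree; TIAN Xuedong (corresponding author), professor, Ph.D., CCF member.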
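  • Funding:
    National Natural Science Foundation of China (61375075); Key Project of Science and Technology Research of Higher Education Institutions of Hebei Province (ZD2017208).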

Abstract: The complex and varied structure of mathematical expressions makes them difficult to retrieve. To address this, a method for indexing and retrieving mathematical expressions is proposed. In the indexing stage, the characteristics of LaTeX mathematical expressions are analyzed and summarized, a feature representation oriented to the two-dimensional structure of expressions is defined, and the inter-relevant successive tree index model is applied to build the expression index, which addresses the growth in the number of levels that arises when expressions are represented as trees. In the matching stage, matching algorithms are designed for several query modes, including exact matching, compatible matching, sub-expression matching and fuzzy matching. Indexing and matching experiments are carried out on 51 076 mathematical expressions in browser/server mode. The results show that the proposed method speeds up queries, reduces index storage space, adapts to the structural characteristics of mathematical expressions, and achieves good retrieval performance.

Key words: mathematical expression, indexing, retrieval, LaTeX format, inter-relevant successive tree
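
The abstract does not spell out the feature representation or the matching algorithms, so the following is only a minimal sketch of the general inter-relevant successive tree idea, not the authors' implementation. It assumes a deliberately naive tokenizer (`tokenize`, hypothetical) that flattens a LaTeX string into control words and single characters, and a hypothetical `IRSTIndex` class in which each distinct token owns a "tree" of leaves, each leaf storing the successor token and the position of the next leaf in that successor's tree. Only exact and sub-expression matching are illustrated; compatible and fuzzy matching are omitted.

```python
import re
from collections import defaultdict

END = None  # sentinel successor marking the end of an expression


def tokenize(latex):
    """Naive tokenizer (an assumption): control words or single non-space characters."""
    return re.findall(r"\\[A-Za-z]+|\S", latex)


class IRSTIndex:
    """Illustrative successor-linked index over token sequences."""

    def __init__(self):
        # trees[token] holds that token's leaves; each leaf is
        # (successor token, index of the next leaf in the successor's tree,
        #  expression id, True if this leaf starts an expression).
        self.trees = defaultdict(list)

    def add(self, expr_id, tokens):
        tokens = list(tokens)
        succ, next_leaf = END, None
        # Insert back to front so each leaf can point at its successor's leaf.
        for i in range(len(tokens) - 1, -1, -1):
            leaves = self.trees[tokens[i]]
            leaves.append((succ, next_leaf, expr_id, i == 0))
            succ, next_leaf = tokens[i], len(leaves) - 1

    def _walk(self, leaf, query):
        """Follow successor links along query[1:]; return the last leaf or None."""
        for sym in query[1:]:
            if leaf[0] != sym:
                return None
            leaf = self.trees[sym][leaf[1]]
        return leaf

    def find_sub(self, query):
        """Ids of expressions containing the query as a contiguous token run."""
        if not query:
            return set()
        hits = set()
        for leaf in self.trees.get(query[0], []):
            if self._walk(leaf, query) is not None:
                hits.add(leaf[2])
        return hits

    def find_exact(self, query):
        """Ids of expressions whose token sequence equals the query."""
        if not query:
            return set()
        hits = set()
        for leaf in self.trees.get(query[0], []):
            last = self._walk(leaf, query) if leaf[3] else None
            if last is not None and last[0] is END:
                hits.add(leaf[2])
        return hits


idx = IRSTIndex()
idx.add(1, tokenize(r"\frac{a}{b}+c"))
idx.add(2, tokenize(r"a+c"))
idx.add(3, tokenize(r"\frac{a}{b}"))
print(sorted(idx.find_sub(tokenize(r"\frac{a}{b}"))))    # [1, 3]
print(sorted(idx.find_exact(tokenize(r"\frac{a}{b}"))))  # [3]
```

In this sketch every token occurrence is stored once as a fixed-size leaf, so the structure stays flat however deeply the expression nests, which is one plausible reading of how such an index sidesteps the level growth of an explicit expression tree.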

CLC Number: