
Computer Engineering (计算机工程), 2006, Vol. 32, Issue 19: 211-213. doi: 10.3969/j.issn.1000-3428.2006.19.077

• Artificial Intelligence and Recognition Technology •

Research on Mathematical Formulas Extraction from Chinese Document Based on Statistical Features

TIAN Xuedong (田学东), ZHANG Liping (张立平), YANG Peng (杨捧)

  1. (College of Mathematics and Computer, Hebei University, Baoding 071002)
  • Online: 2006-10-05 Published: 2006-10-05

Abstract: Based on an analysis of formula features, this paper proposes a formula extraction method that combines Parzen windows with the Bayes classification rule. Isolated formulas are extracted from the printed document with an improved Parzen window method, and embedded formulas are extracted from text lines with the Bayes classification rule. Experiments show that the method adapts well to Chinese documents and achieves a high success rate.

Key words: OCR technique, mathematical formula extraction, Bayes theorem
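
The abstract names two statistical building blocks, Parzen windows and the Bayes decision rule, without giving the features or the specific improvement the authors made to the Parzen window method. The following is only a minimal sketch of those two generic techniques and is not the authors' implementation. The paper applies them to different formula types (Parzen windows for isolated formulas, the Bayes rule for embedded ones); for brevity, the sketch uses a Gaussian-kernel Parzen estimate as the class-conditional density inside a Bayes rule. The single feature (a relative line height), the training values, the priors, and the window width `H` are all hypothetical.

```python
import math


def parzen_density(x, samples, h):
    """Gaussian-kernel Parzen window estimate of p(x) from 1-D samples."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def classify(x, class_samples, priors, h):
    """Bayes decision rule: choose the class maximizing prior * likelihood."""
    scores = {
        label: priors[label] * parzen_density(x, samples, h)
        for label, samples in class_samples.items()
    }
    return max(scores, key=scores.get)


# Hypothetical training data: one made-up feature per line
# (line height relative to the median text line height).
TRAIN = {
    "text": [1.00, 1.05, 0.95, 1.10, 0.98],
    "formula": [1.60, 1.90, 2.20, 1.75, 2.05],
}
PRIORS = {"text": 0.8, "formula": 0.2}  # assumed class priors
H = 0.15  # assumed window width

if __name__ == "__main__":
    for height in (1.02, 1.90):
        print(height, classify(height, TRAIN, PRIORS, H))
```

In a sketch like this, the window width `H` controls how smooth the estimated density is: a small value follows the training samples closely, while a large value blurs the two classes together.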