Computer Engineering, 2007, Vol. 33, Issue (04): 206-208. doi: 10.3969/j.issn.1000-3428.2007.04.072

• Artificial Intelligence and Recognition Technology •

Adaptive Distance-weighted Character and Form Line Separating Algorithm

LI Yanxia1,2, SUN Yufei1,2, ZHANG Yuzhi1   

  (1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; 2. Graduate School, Chinese Academy of Sciences, Beijing 100080)
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-02-20 Published:2007-02-20

Abstract: This paper proposes an adaptive, distance-weighted algorithm for separating characters from form lines. Using heuristic rules, it computes a weight for each pixel on a form line and compares that weight with a threshold to decide whether the pixel belongs to a character. Both the weights and the threshold are determined automatically from the form being processed. The algorithm is independent of the form line detection method and is easy to implement. Experimental results show that it handles overlaps between characters and form lines well and improves the accuracy of form recognition.

Key words: Document analysis and recognition, Form recognition, Separation of character and line, Optical character recognition
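
The abstract does not give the heuristic rules, the weight formula, or the threshold rule, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a binary image stored as nested lists, a single horizontal form line whose pixels are already marked in a hypothetical `line_mask`, a weight equal to the sum of 1/distance over non-line ink directly above and below each line pixel, and a threshold equal to the mean weight along the line. The function names and the `radius` parameter are illustrative.

```python
def column_weight(image, line_mask, r, c, radius):
    """Sum 1/distance over non-line ink directly above and below pixel (r, c).

    Assumed weighting rule: ink closer to the form line counts more.
    """
    weight = 0.0
    for d in range(1, radius + 1):
        for rr in (r - d, r + d):
            if 0 <= rr < len(image) and image[rr][c] and not line_mask[rr][c]:
                weight += 1.0 / d
    return weight


def separate_horizontal_line(image, line_mask, radius=2):
    """Erase form-line pixels unless their weight exceeds the threshold."""
    result = [row[:] for row in image]
    line_pixels = [
        (r, c)
        for r, row in enumerate(line_mask)
        for c, on in enumerate(row)
        if on
    ]
    if not line_pixels:
        return result
    weights = {
        (r, c): column_weight(image, line_mask, r, c, radius)
        for r, c in line_pixels
    }
    # Assumed adaptive rule: the threshold is the mean weight on this line.
    threshold = sum(weights.values()) / len(weights)
    for (r, c), w in weights.items():
        if w <= threshold:
            result[r][c] = 0  # treated as a pure form-line pixel
    return result


# A vertical character stroke (column 3) crossing a horizontal form line (row 2).
image = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
]
line_mask = [[1 if r == 2 else 0 for _ in range(7)] for r in range(5)]
cleaned = separate_horizontal_line(image, line_mask)
```

In this toy case only the crossing pixel has ink above and below it, so it is the only line pixel kept and the stroke stays connected after the line is erased. Because the sketch takes the line mask as input, it does not depend on how the form line was detected, which mirrors the independence claimed in the abstract.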