Abstract:
A new adaptive algorithm for separating characters from form lines, based on distance weighting, is proposed in this paper. Applying heuristic rules, it computes a weight for each pixel on a form line, then compares the weight with a threshold to decide whether the pixel belongs to a character. The weights and the threshold are determined automatically from the form being processed. The algorithm is independent of the form line detection method and is easy to implement. Experiments show that it handles overlaps between characters and form lines well, which improves the accuracy of form recognition.
Key words:
Document analysis and recognition,
Form recognition,
Separation of character and line,
Optical character recognition
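
The weight-and-threshold step described in the abstract can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' method: the abstract does not give the weighting formula, the heuristic rules, or the threshold rule, so the inverse-distance weight, the `radius` parameter, the mean-weight threshold, and the function name `separate_line_pixels` are all hypothetical choices made for illustration.

```python
import math


def separate_line_pixels(image, line_mask, radius=2):
    """Label each form-line pixel as character (True) or line-only (False).

    image:     2D list of 0/1, where 1 marks foreground (ink).
    line_mask: 2D list of 0/1, where 1 marks a pixel on a detected form line.
    radius:    hypothetical neighbourhood size (not taken from the paper).
    """
    rows, cols = len(image), len(image[0])
    weights = {}
    for r in range(rows):
        for c in range(cols):
            if not line_mask[r][c]:
                continue
            w = 0.0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) == (0, 0):
                        continue
                    if not (0 <= rr < rows and 0 <= cc < cols):
                        continue
                    # Assumption: only ink lying off the form line counts as
                    # evidence of a character stroke, weighted by 1/distance.
                    if image[rr][cc] and not line_mask[rr][cc]:
                        w += 1.0 / math.hypot(dr, dc)
            weights[(r, c)] = w
    if not weights:
        return {}
    # Assumption: the adaptive threshold is the mean weight on this form.
    threshold = sum(weights.values()) / len(weights)
    return {pixel: w > threshold for pixel, w in weights.items()}
```

Pixels labeled True would be kept as part of a character when the form line is erased, and pixels labeled False would be removed with the line. Because the threshold is derived from the weights of the form itself, no per-form tuning is needed in this sketch, which mirrors the adaptive behaviour the abstract claims.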
LI Yanxia; SUN Yufei; ZHANG Yuzhi. Adaptive Distance-weighted Character and Form Line Separating Algorithm[J]. Computer Engineering, 2007, 33(04): 206-208.
李艳霞;孙羽菲;张玉志. 基于距离加权的自适应字线分离算法[J]. 计算机工程, 2007, 33(04): 206-208.