
Computer Engineering (计算机工程), 2006, Vol. 32, Issue 13: 9-11. doi: 10.3969/j.issn.1000-3428.2006.13.004

• Doctoral Thesis •

A New Form Cell Rectangle Recognition Algorithm (一种新的表格单元格矩形识别算法)

CHEN Youguang (陈优广); GU Guoqing (顾国庆); ZHANG Wei (张薇); XU Yanbing (许彦冰)

  1. School of Information Science and Technology, East China Normal University, Shanghai 200062
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2006-07-05 Published: 2006-07-05

Abstract: Existing form recognition algorithms are slow and tolerate only small breaks in form lines. This paper presents a form cell rectangle recognition algorithm based on vertex chain code. A boundary-labeling automaton labels the inner-ring boundary of each form cell and generates its vertex chain code. The properties of the vertex chain code are then used to remove sawteeth from the form frame lines and to repair broken frame lines. Finally, the rectangular region of each cell is obtained by searching for the vertex chain codes of the four corners of the cell rectangle. Experiments show that the algorithm is fast, robust, and resistant to broken form frame lines.

Key words: Vertex chain code, Form recognition, Boundary-labeling automaton (region-labeling robot)
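
The corner search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the boundary-labeling automaton or the rules for removing sawteeth and repairing broken lines, so the sketch replaces the automaton with a plain edge-following trace and only shows how a vertex chain code exposes the four corners of a clean cell. It assumes the standard definition of the vertex chain code, where each boundary vertex is coded by the number of region pixels touching it (1 = convex corner, 2 = straight run, 3 = concave corner), and a single 4-connected region with no diagonal-only contacts. The names `vertex_chain_code` and `cell_rectangle` are hypothetical.

```python
def vertex_chain_code(grid):
    """Trace the outer contour of the 1-pixels in grid.

    Returns (vertices, codes): lattice vertices as (row, col) in clockwise
    order, and for each vertex the number of region pixels touching it.
    Assumes one 4-connected region without diagonal-only contacts.
    """
    h, w = len(grid), len(grid[0])

    def inside(r, c):
        return 0 <= r < h and 0 <= c < w and grid[r][c] == 1

    # Directed boundary edges, stored as start vertex -> end vertex.
    nxt = {}
    for r in range(h):
        for c in range(w):
            if not inside(r, c):
                continue
            if not inside(r - 1, c):
                nxt[(r, c)] = (r, c + 1)          # top edge, heading right
            if not inside(r, c + 1):
                nxt[(r, c + 1)] = (r + 1, c + 1)  # right edge, heading down
            if not inside(r + 1, c):
                nxt[(r + 1, c + 1)] = (r + 1, c)  # bottom edge, heading left
            if not inside(r, c - 1):
                nxt[(r + 1, c)] = (r, c)          # left edge, heading up

    start = min(nxt)  # topmost, then leftmost boundary vertex
    vertices, v = [], start
    while True:
        vertices.append(v)
        v = nxt[v]
        if v == start:
            break

    # A vertex (r, c) touches the four pixels above/below and left/right of it.
    codes = [
        sum(inside(r + dr, c + dc) for dr in (-1, 0) for dc in (-1, 0))
        for r, c in vertices
    ]
    return vertices, codes


def cell_rectangle(vertices, codes):
    """Return (top, left, bottom, right) in vertex coordinates if the contour
    is a clean rectangle (exactly four 1-codes, no 3-codes), else None."""
    corners = [v for v, k in zip(vertices, codes) if k == 1]
    if len(corners) != 4 or 3 in codes:
        return None
    rows = [r for r, _ in corners]
    cols = [c for _, c in corners]
    return min(rows), min(cols), max(rows), max(cols)


clean = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
jagged = [row[:] for row in clean]
jagged[0][2] = 1  # one-pixel sawtooth on the top edge

clean_vertices, clean_codes = vertex_chain_code(clean)
jagged_vertices, jagged_codes = vertex_chain_code(jagged)
print(clean_codes, cell_rectangle(clean_vertices, clean_codes))
print(jagged_codes, cell_rectangle(jagged_vertices, jagged_codes))
```

In this sketch the clean cell yields a code made only of 1s and 2s, while the one-pixel sawtooth introduces 3-codes and the corner test rejects the contour. A preprocessing step of the kind the abstract mentions would have to smooth such patterns before the corner search.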
