Computer Engineering ›› 2007, Vol. 33 ›› Issue (15): 202-204. doi: 10.3969/j.issn.1000-3428.2007.15.071

• Artificial Intelligence and Recognition Technology •

Document Image Skew Correction Based on Page Layout Foreground and Least Square Method

CHEN Bo1, WANG Jia-jun1, WU Chen2   

  (1. School of Electronics and Information, Soochow University, Suzhou 215021; 2. School of Electronics and Information, Jiangsu University of Science and Technology, Zhenjiang 212003)
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-08-05 Published:2007-08-05

Abstract: To cope with complex document page layouts, this paper proposes a skew correction method based on the page foreground and the least squares method. Foreground pixels are described by specific patterns. A coarse pattern classification separates out any halftone images, graphics, and tables that the page may contain. The remaining character patterns are then merged to obtain the largest text pattern structure. The baseline direction, which is taken as the page skew direction, is fitted by least squares to the baseline feature points contained in that structure. Experimental results show that the method is effective and fast. A notable advantage is that the pattern structures obtained during skew detection can be reused for subsequent layout analysis.

Key words: skew correction, pattern structure, baseline feature points, page layout analysis
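
The least squares step named in the abstract can be illustrated with a minimal sketch. It assumes the baseline feature points are already available as (x, y) pixel coordinates and fits an ordinary least squares line y = a*x + b to them, taking atan(a) as the skew angle. The function name `estimate_skew_angle` and the sample points are hypothetical, and the paper's pattern description, classification, and merging stages are not reproduced here.

```python
import math


def estimate_skew_angle(points):
    """Fit y = a*x + b to (x, y) points by ordinary least squares.

    Returns the angle of the fitted line in degrees. The sign of the
    result depends on the image coordinate convention (y up or y down).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two baseline points")

    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Centered sums used by the closed-form least squares slope.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all points share one x value; slope is undefined")

    slope = sxy / sxx
    return math.degrees(math.atan(slope))


# Hypothetical baseline points lying on a line tilted by 2 degrees.
true_slope = math.tan(math.radians(2.0))
points = [(float(x), 10.0 + true_slope * x) for x in range(0, 200, 20)]
angle = estimate_skew_angle(points)
print(f"estimated skew: {angle:.3f} degrees")
```

Once the angle is known, the page would be rotated by its negative to deskew it. Fitting y as a function of x suits the small skew angles typical of scanned pages; for steep angles a total least squares fit may be a better choice.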
