Abstract: Based on an analysis of the frame-line and structural characteristics of forms, this paper proposes a form document image recognition algorithm that combines projection features with structure features. Frame-line features are extracted by projection, and structure features are extracted with the hit-or-miss transform. The classification decision threshold is set according to the relative importance of the extracted features. Experimental results show that the method distinguishes form document images from non-form document images accurately and efficiently, and that it is highly practical.
Key words:
projection feature,
structure feature,
form recognition
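The pipeline summarized in the abstract (projection for frame lines, hit-or-miss for structure, then a thresholded decision) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the 3x3 cross-junction pattern, the fill ratio `min_fill`, the weights `w_line` and `w_struct`, and the value of `threshold` are all hypothetical choices made for illustration. Images are assumed to be binarized already, with 1 for ink and 0 for background.

```python
# Illustrative sketch only: all names, weights, and thresholds are assumptions,
# not values taken from the paper.

# Hit-or-miss pattern for a one-pixel-wide cross junction ("+"):
# offsets that must be foreground (hit) and offsets that must be background (miss).
CROSS_HIT = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
CROSS_MISS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def projection_line_counts(img, min_fill=0.8):
    """Count rows and columns whose projection covers at least min_fill of the extent."""
    h, w = len(img), len(img[0])
    rows = sum(1 for r in range(h) if sum(img[r]) >= min_fill * w)
    cols = sum(1 for c in range(w)
               if sum(img[r][c] for r in range(h)) >= min_fill * h)
    return rows, cols


def hit_or_miss(img, hit, miss):
    """Return positions where every hit offset is 1 and every miss offset is 0.

    Offsets are assumed to lie within a 3x3 neighbourhood, so the
    one-pixel image border is skipped.
    """
    h, w = len(img), len(img[0])
    matches = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            hits_ok = all(img[r + dr][c + dc] == 1 for dr, dc in hit)
            miss_ok = all(img[r + dr][c + dc] == 0 for dr, dc in miss)
            if hits_ok and miss_ok:
                matches.append((r, c))
    return matches


def is_form(img, w_line=1.0, w_struct=2.0, threshold=6.0):
    """Weighted vote over both feature types, compared against a threshold."""
    rows, cols = projection_line_counts(img)
    junctions = len(hit_or_miss(img, CROSS_HIT, CROSS_MISS))
    score = w_line * (rows + cols) + w_struct * junctions
    return score >= threshold


def make_grid(size=7, step=3):
    """Synthetic form: ruling lines on every step-th row and column."""
    return [[1 if r % step == 0 or c % step == 0 else 0 for c in range(size)]
            for r in range(size)]


def make_text(size=7):
    """Synthetic non-form: dotted strokes on odd rows, no long lines."""
    return [[1 if r % 2 == 1 and c % 2 == 0 else 0 for c in range(size)]
            for r in range(size)]


if __name__ == "__main__":
    print(is_form(make_grid()), is_form(make_text()))
```

The weighted score is one simple way to let features of differing importance contribute unequally to the decision. A real system would tune the weights and threshold on labelled images and would use a wider set of junction patterns (corners, T-junctions) than the single cross shown here.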
WANG Xu, PING Xi-jian, ZHOU Lin, WANG Hui-peng. Form Image Recognition Based on Projection and Structure Features[J]. Computer Engineering (计算机工程), 2011, 37(01): 210-212.