
Computer Engineering, 2021, Vol. 47, Issue (9): 304-312. doi: 10.19678/j.issn.1000-3428.0059118

• Development Research and Engineering Application •

Recognition of Handwritten Capitalized Chinese Currency Amounts Based on CNN and Finite State Automata

YAN Ru, SUN Yongqi, ZHU Weiguo, LI Yuxia

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2020-07-31 Revised: 2020-09-11 Published: 2020-09-17
  • About the authors: YAN Ru (1994-), female, master's student, research interests: image processing and deep learning; SUN Yongqi (corresponding author), professor; ZHU Weiguo, PhD student; LI Yuxia, master's student.
  • Funding:
    National Natural Science Foundation of China (61572005, 61672086, 61272004).

Abstract: Handwritten check recognition is a difficult problem in pattern recognition: diverse handwriting styles and complex check backgrounds lower recognition accuracy. The capitalized Chinese currency amount is the most important field on a check, so recognizing it accurately is key to automatic processing of handwritten check images. This paper studies segmentation-based recognition of handwritten capitalized Chinese currency amounts and proposes a recognition method based on a Convolutional Neural Network (CNN) and a finite state automaton. The method over-segments the amount image and combines the over-segmented items into single characters, which are then recognized by a CNN. The characters are grouped into categories, and the logical relationships between the categories are defined to construct a finite state automaton for grammar checking. This grammar automaton selects grammatically valid strings from the recognition results and is also used to optimize path-search performance. In addition, it predicts ambiguous (fuzzy) characters in order to correct CNN recognition errors. Experimental results show that the method achieves recognition accuracies of 98.2% on single capitalized characters and 96.6% on text lines of currency amounts.

Key words: Convolutional Neural Network (CNN), finite state automata, handwritten bank check recognition, capitalized Chinese currency amounts, optical character recognition, pattern recognition
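The abstract describes the automaton only in terms of character categories and the logical relationships between them, and gives no state table or search procedure. The Python sketch below is therefore a hypothetical illustration, not the authors' implementation. The class labels, state names and transition table are assumptions. The grammar is deliberately simplified: it does not enforce the ordering 仟 > 佰 > 拾 within a group, and it does not model amounts below one yuan. Hard-coded log-probabilities stand in for the CNN outputs. The sketch shows the three uses the abstract names: checking whether a string is grammatical, pruning a search over combinations of over-segmented items, and listing which characters may fill an ambiguous position.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical character classes (a simplification, not the paper's scheme).
CHAR_CLASS: Dict[str, str] = {}
CHAR_CLASS.update({ch: "D" for ch in "壹贰叁肆伍陆柒捌玖"})  # non-zero digits
CHAR_CLASS.update({"零": "Z"})                             # zero
CHAR_CLASS.update({ch: "U" for ch in "拾佰仟"})            # units inside a group
CHAR_CLASS.update({ch: "W" for ch in "万亿"})              # group (section) units
CHAR_CLASS.update({ch: "Y" for ch in "元圆"})              # yuan
CHAR_CLASS.update({"角": "J", "分": "F"})                  # jiao, fen
CHAR_CLASS.update({ch: "E" for ch in "整正"})              # "exact" terminator

# Assumed transition table: state -> {character class -> next state}.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "START": {"D": "DIGIT"},
    "DIGIT": {"U": "UNIT", "W": "SECTION", "Y": "YUAN"},
    "UNIT": {"D": "DIGIT", "Z": "ZERO", "W": "SECTION", "Y": "YUAN"},
    "ZERO": {"D": "DIGIT"},
    "SECTION": {"D": "DIGIT", "Z": "ZERO", "Y": "YUAN"},
    "YUAN": {"E": "END", "D": "JIAO_DIGIT", "Z": "YUAN_ZERO"},
    "JIAO_DIGIT": {"J": "JIAO"},
    "JIAO": {"E": "END", "D": "FEN_DIGIT"},
    "YUAN_ZERO": {"D": "FEN_DIGIT"},  # e.g. 伍元零柒分 skips the jiao
    "FEN_DIGIT": {"F": "FEN"},
}
ACCEPTING = {"END", "JIAO", "FEN"}


def step(state: str, ch: str) -> Optional[str]:
    """Return the next state, or None if `ch` is not allowed in `state`."""
    cls = CHAR_CLASS.get(ch)
    return TRANSITIONS.get(state, {}).get(cls) if cls else None


def accepts(text: str) -> bool:
    """Grammar check: does the whole string end in an accepting state?"""
    state = "START"
    for ch in text:
        nxt = step(state, ch)
        if nxt is None:
            return False
        state = nxt
    return state in ACCEPTING


def allowed_next(prefix: str) -> List[str]:
    """Characters the grammar permits after `prefix` (to predict an ambiguous one)."""
    state = "START"
    for ch in prefix:
        nxt = step(state, ch)
        if nxt is None:
            return []
        state = nxt
    return sorted(ch for ch in CHAR_CLASS if step(state, ch) is not None)


# Segmentation lattice: (first fragment, end fragment) -> [(char, log_prob), ...].
Lattice = Dict[Tuple[int, int], List[Tuple[str, float]]]


def best_path(num_fragments: int, lattice: Lattice) -> Optional[Tuple[str, float]]:
    """Viterbi-style search over (fragment position, automaton state)."""
    best: Dict[Tuple[int, str], Tuple[float, str]] = {(0, "START"): (0.0, "")}
    for pos in range(num_fragments):
        current = [(s, v) for (p, s), v in best.items() if p == pos]
        for state, (score, text) in current:
            for (start, end), cands in lattice.items():
                if start != pos:
                    continue
                for ch, logp in cands:
                    nxt = step(state, ch)
                    if nxt is None:
                        continue  # pruned by the grammar automaton
                    new = (score + logp, text + ch)
                    if (end, nxt) not in best or new[0] > best[(end, nxt)][0]:
                        best[(end, nxt)] = new
    finals = [v for (p, s), v in best.items() if p == num_fragments and s in ACCEPTING]
    if not finals:
        return None
    score, text = max(finals)
    return text, score


# Made-up scores for six fragments of a "伍元陆角整" image: 伍 is over-segmented
# into fragments 0 and 1, and 角 is misread as 分 with a higher score.
EXAMPLE_LATTICE: Lattice = {
    (0, 1): [("壹", -2.0)],
    (1, 2): [("拾", -1.5)],
    (0, 2): [("伍", -0.3), ("陆", -1.4)],
    (2, 3): [("元", -0.1)],
    (3, 4): [("陆", -0.2)],
    (4, 5): [("分", -0.4), ("角", -1.2)],
    (5, 6): [("整", -0.1)],
}

if __name__ == "__main__":
    print(accepts("壹佰零伍元整"), accepts("伍伍元整"))  # True False
    text, score = best_path(6, EXAMPLE_LATTICE)
    print(text, round(score, 2))  # 伍元陆角整 (score about -1.9)
    print(allowed_next("伍元陆"))  # ['角']
```

With these made-up scores, taking the highest-scoring candidate at every character position would give 伍元陆分整, which the automaton rejects; the constrained search returns 伍元陆角整 instead. Keying the search on (position, state) keeps it exact: two partial paths that end at the same fragment in the same automaton state have the same valid continuations, so only the higher-scoring one needs to be kept.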
