Computer Engineering ›› 2023, Vol. 49 ›› Issue (10): 171-177. doi: 10.19678/j.issn.1000-3428.0065874

• Graphics and Image Processing •

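  • About the authors:

    Liqiong LU (born 1980), female, lecturer, Ph.D.; main research interests: deep learning and text recognition

    Changjiang CHEN, bachelor's degree

    Dong WU, associate professor, master's degree

    Jianfang XIONG, lecturer, master's degree

  • Funding:
    Characteristic Innovation Project of the Department of Education of Guangdong Province (2021KTSCX065); Project of the Guangdong Provincial Key Laboratory of Development and Education for Special Needs Children (TJ202011); Competitive Allocation Project of the Special Fund for Science and Technology Development of Zhanjiang City, Guangdong Province (2022A01005)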

Natural Scene Braille Image Dataset and Braille Segment Detection Method

Liqiong LU1,2, Changjiang CHEN1, Dong WU1,2, Jianfang XIONG1   

  1. School of Computer Science and Intelligence Education, Lingnan Normal University, Zhanjiang 524048, Guangdong, China
  2. Guangdong Provincial Key Laboratory of Development and Education for Special Needs Children, Zhanjiang 524048, Guangdong, China
  • Received: 2022-09-28 Online: 2023-10-15 Published: 2023-10-10

Abstract:

Braille detection uses artificial intelligence to automatically locate Braille in an image. It is a key technology for digitizing Braille books, automatically grading Braille test papers, and enabling barrier-free communication between sighted and blind people. However, existing research on Braille detection lacks both natural scene Braille image datasets and Braille segment detection methods. To address this, the study constructs a natural scene Braille image dataset of 554 images collected by mobile phone photography, Internet download, and other means, and manually annotates the Braille segment positions in every image. To support Convolutional Neural Network (CNN) training, image augmentation strategies based on changes in brightness, contrast, and softness are designed to expand the dataset. The characteristics of natural scene Braille segments in writing form and structure are then analyzed, and a natural scene Braille segment detection method is proposed based on the idea of the Faster R-CNN algorithm. The method uses ResNet50 as the backbone network and applies a multi-scale CNN feature fusion strategy to extract features of Braille segments of different sizes. Anchor parameters ranging from 32 to 512 are designed to suit Braille segments in natural scene images, which vary little in height but widely in width and include many small segments. Experimental results show that the proposed method achieves an Hmean of 0.8879, compared with 0.7935 for Faster R-CNN and 0.8001 for SSD, gains of 0.0944 and 0.0878, respectively.

Key words: natural scene image, Braille segment detection, Convolutional Neural Network (CNN), Faster R-CNN algorithm, SSD algorithm
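The abstract names two quantities that are easy to make concrete: anchor parameters spanning 32 to 512, and the Hmean score used for evaluation. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation. It takes Hmean to be the harmonic mean of precision and recall, as is conventional in text detection benchmarks. The intermediate anchor sizes and the wide aspect ratios are hypothetical, since the abstract gives only the 32-to-512 range and notes that Braille segments vary mainly in width.

```python
from itertools import product


def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def anchor_shapes(sizes, aspect_ratios):
    """Return (width, height) pairs with area size**2 and width/height == ratio."""
    shapes = []
    for size, ratio in product(sizes, aspect_ratios):
        width = size * ratio ** 0.5
        height = size / ratio ** 0.5
        shapes.append((round(width, 1), round(height, 1)))
    return shapes


# Assumption: the abstract gives only the range 32 to 512; the doubling steps are illustrative.
ASSUMED_SIZES = (32, 64, 128, 256, 512)
# Assumption: hypothetical width/height ratios, skewed wide because segment width varies most.
ASSUMED_RATIOS = (1.0, 2.0, 4.0)

if __name__ == "__main__":
    # List every assumed anchor shape, then one example Hmean value.
    for width, height in anchor_shapes(ASSUMED_SIZES, ASSUMED_RATIOS):
        print(f"{width:7.1f} x {height:6.1f}")
    # Precision and recall here are made-up inputs, not results from the paper.
    print(f"Hmean: {hmean(0.9, 0.8):.4f}")
```

Keeping the anchor area fixed per size while widening the ratio is one common way to cover objects whose height is stable but whose width varies; whether the paper uses this exact parameterization is not stated in the abstract.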