
Computer Engineering ›› 2022, Vol. 48 ›› Issue (3): 244-252, 262. doi: 10.19678/j.issn.1000-3428.0060828

• Graphics and Image Processing •

Scene Text Detection Based on Learning Active Center Contour Model

XIE Binhong, QIN Yaolong, ZHANG Yingjun

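  1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  • Received: 2021-02-07 Revised: 2021-03-25 Published: 2021-04-15
  • About the authors: XIE Binhong (born 1971), male, associate professor, M.S.; main research interests: image processing, intelligent software engineering, and machine learning. QIN Yaolong, master's student. ZHANG Yingjun, professor.
  • Supported by:
    Shanxi Province Key Research and Development Program (Key) High-Tech Field Project (201703D111027); Shanxi Province Key Research and Development Program Projects (201803D121048, 201803D121055).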

Scene Text Detection Based on Learning Active Center Contour Model

XIE Binhong, QIN Yaolong, ZHANG Yingjun   

  1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  • Received: 2021-02-07 Revised: 2021-03-25 Published: 2021-04-15

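Abstract: In scene text detection, large variations in text size lead to missed small text, under-detection of large text, and boundary errors for multi-scale text. To address these problems, a scene text detection network based on a learning active center contour model is proposed. A multi-scale feature weight fusion model is built on the residual network ResNet; it performs multi-scale feature extraction and weight fusion on the input scene text image and computes the final fused feature map, accommodating the wide variation in the aspect ratio of scene text. The fused feature map is then fed into the learning active center contour model, which predicts the center point and boundary of each text box. This model supplies rich prior knowledge for scene text detection, addressing boundary errors in which multi-scale text detection boxes contain too much background or only partially enclose the text. Experimental results on the MSRA-TD500, IC13, IC15, and IC17MLT datasets show that the network improves the accuracy of multi-scale scene text detection: the F-measure is 0.83 on MSRA-TD500, 1% higher than the MSR method; 0.91 on IC13, 2% higher than the PixelLink network; 0.87 on IC15, 1% higher than the PSENet network; and 0.74 on IC17MLT, 1% higher than the TridentNet network.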

Keywords: scene text detection, multi-scale feature extraction, weight fusion, active contour model, learning active center contour model

Abstract: In scene text detection, large fluctuations in text size cause small text to be missed, large text to be under-detected, and the boundaries of multi-scale text to be detected incorrectly. To solve these problems, a scene text detection network based on a Learning Active Center Contour (LACC) model is proposed. First, a Multi-scale Feature Weight Fusion (MSWF) model is constructed on the basis of a Residual Network (ResNet) to extract multi-scale features from the input scene text image and fuse them by weight. The final feature fusion map is then computed, which accommodates the large variation in the aspect ratio of scene text. Finally, the feature fusion map is input into the LACC model to predict the center point and boundary of each text box. The LACC model provides rich prior knowledge for scene text detection and addresses the boundary detection errors that arise when multi-scale text detection boxes contain too much background or only partially enclose the text. Experimental results on the MSRA-TD500, IC13, IC15, and IC17MLT datasets show that the network improves the accuracy of multi-scale scene text detection. The F-measure on the MSRA-TD500 dataset is 0.83, which is 1% higher than the MSR method. The F-measure on the IC13 dataset is 0.91, which is 2% higher than the PixelLink network. The F-measure on the IC15 dataset is 0.87, which is 1% higher than PSENet. The F-measure on the IC17MLT dataset is 0.74, which is 1% higher than TridentNet.

Key words: scene text detection, multi-scale feature extraction, weight fusion, active contour model, Learning Active Center Contour (LACC) model
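
The results in the abstract are reported as F-measure values. As a reading aid, the following minimal Python sketch shows the standard definition of the F-measure as the harmonic mean of precision and recall. The function names and the counts in the usage lines are hypothetical and are not taken from the paper; how a detection is matched to a ground-truth box depends on each benchmark's own protocol and is assumed to have been done already.

```python
def f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0  # convention: no correct detections gives a score of 0
    return 2 * precision * recall / (precision + recall)


def detection_scores(true_pos: int, num_detected: int, num_ground_truth: int):
    """Compute (precision, recall, F-measure) from match counts.

    true_pos         -- detections matched to a ground-truth text box
    num_detected     -- all boxes output by the detector
    num_ground_truth -- all annotated text boxes
    """
    precision = true_pos / num_detected if num_detected else 0.0
    recall = true_pos / num_ground_truth if num_ground_truth else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical counts, for illustration only (not results from the paper).
p, r, f = detection_scores(true_pos=80, num_detected=100, num_ground_truth=90)
print(f"precision={p:.3f} recall={r:.3f} F-measure={f:.3f}")
```

Because the F-measure is a harmonic mean, it stays low unless precision and recall are both high, which is why it suits a task where both missed text and spurious boxes count as errors.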

CLC number: