Computer Engineering ›› 2024, Vol. 50 ›› Issue (11): 80-88. doi: 10.19678/j.issn.1000-3428.0068394

• Artificial Intelligence and Pattern Recognition •

Traffic Text Detection Method Based on Multi-Granularity Feature Enhancement Network

ZHU Yanbin, WANG Runmin*, CHEN Hua, CAO Xiaofei, ZHU Zhenlin, DING Yajun

  1. School of Information Science and Engineering, Hunan Normal University, Changsha 410081, Hunan, China
  • Received: 2023-09-17 Published: 2024-11-15 Released online: 2024-04-01
  • Contact: WANG Runmin
  • Funding:
    General Program of the Natural Science Foundation of Hunan Province (2020JJ4057); Key Fund of the Education Department of Hunan Province (21A0052); Changsha Key Research and Development Program (kq2004050)

Abstract:

Deep learning has greatly advanced natural scene text detection and recognition, yet traffic text detection in driving environments remains comparatively understudied. This study proposes a novel end-to-end text detection framework for traffic text captured by in-vehicle cameras. A Multi-granularity Text Feature Enhancement Module (MTFEM) is designed to strengthen the feature representation of traffic text by seamlessly integrating its coarse- and fine-grained features for comprehensive understanding and analysis. In addition, a novel joint loss function is designed to optimize network learning, keep model training stable, and avoid the sharp gradient changes caused by pixel prediction errors. Experimental results show that the method achieves F1 values of 93.7% and 94.1% on the traffic text datasets CTST-1600 and TPD, respectively, exceeding those of mainstream methods. To further validate its adaptability, the method is evaluated on the multi-oriented natural scene text dataset ICDAR 2015 and the multilingual text dataset MSRA-TD500, where it achieves F1 values of 87.7% and 87.0%, respectively, demonstrating strong robustness.

Key words: convolutional neural network, text detection, intelligent transportation, multi-granularity feature enhancement, joint loss function
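
The F1 values reported in the abstract are, by the standard definition, the harmonic mean of precision and recall. The following is a minimal sketch of that computation, assuming detections have already been matched to ground-truth text regions. The matching protocol (for example, an overlap threshold) is not stated in the abstract, and the function name `f1_score` and the counts used below are hypothetical, chosen only for illustration.

```python
def f1_score(true_positives: int, num_detections: int, num_ground_truth: int) -> float:
    """Return the harmonic mean of precision and recall from match counts."""
    # With no detections or no ground truth, precision or recall is undefined;
    # this sketch reports 0.0 in that case (an assumption, not the paper's rule).
    if num_detections == 0 or num_ground_truth == 0:
        return 0.0
    precision = true_positives / num_detections    # fraction of detections that are correct
    recall = true_positives / num_ground_truth     # fraction of ground-truth regions found
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not taken from the paper.
example = f1_score(true_positives=90, num_detections=100, num_ground_truth=95)
print(f"F1 = {example:.3f}")
```

Because F1 is a harmonic mean, it stays low when either precision or recall is low, which is why it is commonly used as a single summary figure for detection benchmarks.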