
Computer Engineering ›› 2024, Vol. 50 ›› Issue (6): 296-303. doi: 10.19678/j.issn.1000-3428.0067995

• Graphics and Image Processing •

Real-Time Segmentation Network of Yard Images Based on Improved SwiftNet

CHEN Xiaoyu1, SHEN Chen1, SHEN Yue2, KONG Deming3

  1. School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China;
  2. Hebei Yandayanruan Information System Technology Company, Qinhuangdao 066000, Hebei, China;
  3. School of Electrical Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China
  • Received: 2023-07-04 Revised: 2023-09-04 Published: 2023-10-12
  • Corresponding author: KONG Deming, E-mail: demingkong@ysu.edu.cn
  • Funding:
    National Natural Science Foundation of China (62173289); Aeronautical Science Foundation of China (20200016099002).

Abstract: In a storage yard environment, real-time semantic segmentation of images provides intuitive scene category information. To conserve the limited hardware resources of edge devices such as industrial computers, and to supply semantic category information for multi-source information fusion, this study proposes a lightweight real-time semantic segmentation network. First, an upsampling fusion module guided by spatial attention is proposed. By introducing spatial attention and residual attention structures, a lightweight decoder is designed that restores spatial details during upsampling, suppresses redundant information, and fuses feature maps from different sources. Second, a lightweight cascaded atrous spatial pyramid module is proposed, which uses cascaded atrous convolution units to enlarge the receptive field of the network and effectively extract multi-scale features. Finally, channel splitting, channel shuffling, and channel pooling are used to reduce the computational cost of multi-scale aggregation. On the public CamVid dataset, the model achieves a Mean Intersection over Union (MIoU) of 70.1% at an inference speed of 146.3 frames/s, outperforming models such as ENet and ICNet in both segmentation accuracy and inference speed. Ablation experiments also confirm the effectiveness of each proposed module. On a real storage yard image dataset, the model achieves an MIoU of 93.5% at an inference speed of 123.8 frames/s, indicating that the model structure generalizes well.

Key words: real-time semantic segmentation, attention mechanism, atrous convolution, receptive field, yard image
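
For readers unfamiliar with the MIoU metric quoted in the abstract, the following is a minimal sketch of the standard definition: per-class intersection over union, averaged over classes. It is not the authors' evaluation code. The function name `mean_iou`, the `ignore_index` parameter, and the choice to skip classes that appear in neither prediction nor ground truth are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean Intersection over Union for two flat sequences of class labels.

    Illustrative sketch only: names and the handling of absent classes
    are assumptions, not taken from the paper.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip pixels marked as unlabeled in the ground truth
        if p == t:
            # Correct pixel: counts once toward intersection and union of class t.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: belongs to the union of both the predicted
            # class and the true class, but to neither intersection.
            union[p] += 1
            union[t] += 1
    # Average only over classes that occur in prediction or ground truth.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 0]
    # Per-class IoU here is 1/3, 2/3 and 1/2, so the mean is 0.5.
    print(mean_iou(pred, target, num_classes=3))
```

In practice the label sequences would be the flattened predicted and ground-truth segmentation maps accumulated over the whole test set, rather than a single toy example.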
