Computer Engineering ›› 2026, Vol. 52 ›› Issue (6): 278-287. doi: 10.19678/j.issn.1000-3428.0070436

• Multimodal and Information Fusion •

Ship Detection Method Based on Multimodal Visible and Infrared Image Fusion

YU Mengyuan, LIU Xiangyang*

  1. School of Mathematics, Hohai University, Nanjing 211100, Jiangsu, China
  • Received: 2024-10-08 Revised: 2025-01-06 Issue published: 2026-06-15 Released online: 2025-01-13
  • Contact: LIU Xiangyang
  • About the authors:

    YU Mengyuan, female, master's degree; main research interests: object detection and image processing

    LIU Xiangyang (corresponding author), associate professor, Ph.D.

  • Supported by:
    Yunnan Provincial Major Science and Technology Special Project (202002AE090010)

Abstract:

In all-weather ship detection, single-modal images are easily affected by lighting, weather, and other environmental conditions, which leads to low detection accuracy and a high missed-detection rate. To address these issues, this paper proposes VIF-RTDETR, a ship detection method that fuses visible and infrared image information. Building on the rich detail and color information of visible images and the stable performance of infrared images in low-light environments, the method constructs a four-channel input model. A fusion module, VIF, is designed to fuse the complementary information of the two modalities so that the detection network uses both more effectively. Channel attention is incorporated into the backbone feature extraction network to dynamically assign different weights to the channels, which strengthens their feature expression and further improves feature extraction. In addition, to improve the detection of small ship targets, a weighted bounding box loss function is designed so that the model attends effectively to the feature expression of targets of different sizes and achieves higher detection accuracy across target sizes. Experimental results on a visible and infrared ship dataset show that the model reaches an AP0.5:0.95 of 78.3% and an AP0.5 of 98.5%; its AP0.5:0.95 is 4.7 and 9.2 percentage points higher than that of the single-modal visible and infrared models, respectively. The recall AR0.5:0.95 reaches 85.2%, which is 3.1 and 7.3 percentage points higher than that of the single-modal visible and infrared models, respectively. The method thus significantly improves ship detection accuracy and reduces missed detections.

Key words: ship detection, visible and infrared images, all-weather detection, multimodal fusion, attention mechanism
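
The abstract names two building blocks that are easy to picture in code: a four-channel input formed from a visible image plus an infrared image, and channel attention that assigns a weight to each channel. The sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes the two images are already spatially aligned, uses a squeeze-and-excitation style gate as one common form of channel attention (the abstract does not say which form VIF-RTDETR uses), and applies it to the raw input rather than to backbone features purely to keep the example short. The function names and the random weights `w1` and `w2` are hypothetical.

```python
import numpy as np


def make_four_channel_input(visible_rgb, infrared):
    """Stack an RGB image (H, W, 3) and an aligned infrared image (H, W) into (H, W, 4)."""
    if visible_rgb.shape[:2] != infrared.shape[:2]:
        raise ValueError("visible and infrared images must be spatially aligned")
    # Append the infrared intensities as a fourth channel after R, G, B.
    return np.concatenate([visible_rgb, infrared[..., None]], axis=-1)


def channel_attention(features, w1, w2):
    """Squeeze-and-excitation style channel attention on a feature map (H, W, C).

    w1 with shape (C, C // r) and w2 with shape (C // r, C) stand in for
    learned weights; they are hypothetical and not taken from the paper.
    """
    squeezed = features.mean(axis=(0, 1))            # global average pool -> (C,)
    hidden = np.maximum(squeezed @ w1, 0.0)          # ReLU bottleneck -> (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid gate, one weight per channel
    return features * weights, weights               # broadcast weights over H and W


rng = np.random.default_rng(0)
visible = rng.random((64, 64, 3), dtype=np.float32)   # placeholder visible image
infrared = rng.random((64, 64), dtype=np.float32)     # placeholder infrared image

x = make_four_channel_input(visible, infrared)
w1 = rng.standard_normal((4, 2)).astype(np.float32)   # reduction ratio r = 2 (assumed)
w2 = rng.standard_normal((2, 4)).astype(np.float32)
reweighted, weights = channel_attention(x, w1, w2)
print(x.shape, reweighted.shape, weights.shape)
```

In a trained detector the gate weights would be learned jointly with the rest of the network, so the per-channel weights would reflect how informative each channel is for the current image instead of the random values used here.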