Ship Detection in SAR Images Based on Trapezoidal Cross-scale Feature-coupling Network

Cite this article

黄帅, 张毅. 基于梯形跨尺度特征耦合网络的SAR图像舰船检测[J]. 计算机工程, 2022, 48(12), 270-280. DOI: 10.19678/j.issn.1000-3428.0063327.

HUANG Shuai, ZHANG Yi. Ship Detection in SAR Images Based on Trapezoidal Cross-scale Feature-coupling Network[J]. Computer Engineering, 2022, 48(12), 270-280. DOI: 10.19678/j.issn.1000-3428.0063327.

Funding

National Ministries and Commissions Fund

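Author biographies

HUANG Shuai (born 1996), male, master's student; main research interest: synthetic aperture radar target recognition.
ZHANG Yi, senior researcher (full-professor level).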

Article history

Received: 2021-11-23
Revised: 2022-02-12

E-mail：huangshuai19@mails.ucas.ac.cn

Ship Detection in SAR Images Based on Trapezoidal Cross-scale Feature-coupling Network

HUANG Shuai^1,2 , ZHANG Yi¹

1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China;
2. School of Electronics, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China

Abstract: In ship detection from Synthetic Aperture Radar (SAR) images, existing methods struggle to extract multi-scale semantic information effectively and cannot accurately represent the weight of that information within the network. In addition, the correlation between the positioning module and the classification module is weak, which leads to inaccurate positioning. This study presents a trapezoidal cross-scale feature-coupling network, in which semantic information at all levels is extracted by a trapezoidal feature pyramid network. A cross structure replaces the skip-connection structure to improve the generalization and semantic representation ability of the network, and trainable weight factors represent the importance of the semantic information at each level. On this basis, a coupling detection head strengthens the correlation between the positioning and classification modules, and a deformable convolution calibrates the final positioning output a second time to improve detection accuracy. Experimental results show that, compared with mainstream networks such as FasterRCNN, CascadeRCNN, and RetinaNet, the proposed network improves detection accuracy on the SSDD dataset by more than 2.74 percentage points. In complex near-shore scenes, the network detects dense and multi-scale targets more effectively and reduces the probability of false and missed detections.

0 Introduction

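Synthetic aperture radar (SAR) operates day and night in all weather and delivers high-resolution images under complex conditions. Target detection in SAR images plays an important role in fields such as regional management and information screening. However, the wide range of target scales and the strong interference from background scattering make the task very challenging, and researchers have proposed a variety of solutions in recent years.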

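Among traditional SAR target detection methods, the Constant False Alarm Rate (CFAR) algorithm and its variants are the typical detectors. A CFAR detector decides whether a target is present by comparing the noise-processed input signal with a preset threshold. Because its accuracy depends heavily on that threshold, many improvements have been proposed. Ref. [1] presents a closed-loop CFAR processor that selects the best CFAR variant through shift registers and a neural network, thereby keeping its performance continuous. Ref. [2] proposes a method based on sparse signal processing for dense targets. Ref. [3] proposes an AIS-data-aided Rayleigh CFAR ship detection algorithm that uses an adaptive-threshold clutter-trimming method to remove high-intensity outliers in the local background window. Similarly, AI et al.^[4] propose a bilateral-threshold strategy that automatically trims the samples in the local reference window to remove both high-intensity and low-intensity outliers. The performance of these methods depends largely on the statistical modeling of sea clutter and on the parameter estimation of the chosen model, and most refinements target these two aspects. To account for the non-homogeneity of sea clutter, various clutter models have been proposed to fit complex sea states, such as the symmetric alpha distribution^[5] and the generalized gamma distribution^[6-7]. However, these methods are scene-specific and their hand-crafted features are not robust: false alarms arise easily in complex scenes, and poor modeling of nearshore areas causes many false alarms from land scattering. Hand-crafted features are also relatively complex to design, which adds to the research workload.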

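With the development of deep learning, deep convolutional neural networks have shown superior performance in automatic feature extraction from SAR images. Object detection methods fall into two categories: anchor-free and anchor-based. Ref. [8] treats an object bounding box as a pair of keypoints, the top-left and bottom-right corners, and detects the paired keypoints with a convolutional neural network. Ref. [9] casts object detection as an appearance-based keypoint estimation problem and uses a keypoint estimation network to detect five keypoints: the topmost, bottommost, leftmost, and rightmost points and the center point. Ref. [10] detects each object as a triplet of keypoints. FU et al.^[11] propose a feature balancing and refinement network that addresses background interference and the difficulty of detecting small targets in SAR images while further improving localization accuracy. Ref. [12] proposes a dense attention feature aggregation network that obtains multi-scale, high-resolution feature maps through dense connections and iterative fusion. MAO et al.^[13] simplify U-Net into a lightweight detection network. CUI et al.^[14] add a spatial shuffle-group enhance attention module to CenterNet to suppress noise and obtain stronger semantic features. These methods improve detection speed but introduce problems of their own: when the centers of two objects coincide, anchor-free methods suffer from semantic ambiguity, which lowers detector accuracy.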

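Anchor-based methods are divided into one-stage and two-stage detectors according to whether they generate region-of-interest proposals. Two-stage detectors such as Faster RCNN and Mask RCNN first extract regions of interest on the feature map and then detect objects within them. WANG et al.^[15] propose a Faster RCNN-based method that automatically labels and detects azimuth ambiguities. GUI et al.^[16] propose a network that fuses contextual information across multiple layers to obtain semantically complementary feature maps. Compared with one-stage detectors, two-stage detectors are more accurate, but their architectures are more complex and they run more slowly.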

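This paper proposes a network for multi-scale target detection in SAR images of complex scenes, called the trapezoidal cross-scale feature-coupling network. To extract the features of multi-scale targets effectively, a Trapezoidal Feature Pyramid Network (TFPN) is proposed that omits skip connections and uses a cross structure instead, so that semantic information propagates through the network and is extracted more effectively. Because feature maps at different levels contribute unequally to the output, additional weight factors are introduced during feature propagation and fusion to represent the importance of each level's semantic feature map to the network output. To address the weak correlation between the positioning branch and the classification branch, a positioning-classification coupling module is designed that couples the two outputs and strengthens the link between them, and deformable convolution is introduced to calibrate the positioning a second time.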

1 Related Work

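OverFeat^[17] is an early representative of one-stage detectors and raises detection confidence by accumulating bounding boxes. Ref. [18] uses bidirectional dense connection modules to reduce the computational complexity of the network. Ref. [19] builds on SSD^[20] by adding large-scale contextual information, which improves accuracy on small targets. The Google team^[21] applies a unified compound scaling of width, depth, and resolution to the backbone, the feature network, and the prediction network simultaneously to improve detection efficiency. Ref. [22] proposes a simpler and more flexible detection framework that avoids complicated computation by omitting anchor boxes and proposals, and Refs. [23-25] each introduce techniques that balance detection speed and accuracy. LIN et al.^[26] find that the main reason one-stage detectors lag behind two-stage detectors is the extreme imbalance between foreground and background classes, and they propose the focal loss to address it. Similarly, Ref. [27] proposes an online hard example mining algorithm that automatically selects hard examples for training, which partly resolves the imbalance between positive and negative samples.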

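However, these networks still have difficulty extracting features effectively under the complex clutter of SAR images. For example, inshore ships are hard to detect and small targets are often missed. In addition, the correlation between the classification branch and the positioning branch of a detection network is weak, which leads to inaccurate positioning. During standard non-maximum suppression (NMS), this weak correlation causes predictions with accurate localization but low classification confidence to be suppressed by predictions with poor localization but high classification confidence. Ref. [28] addresses the uncertainty of bounding-box regression with a new regression loss in which the network learns the localization variance to improve localization accuracy. WU et al.^[29] add an IoU prediction branch to strengthen the correlation between classification and localization predictions. JIANG et al.^[30] use the predicted IoU directly as the classification confidence to guide the NMS procedure. Ref. [31] proposes the Soft-NMS algorithm on top of NMS and shows good performance. YU et al.^[32] design an IoU loss that regresses the four bounds of a box as a single unit; the resulting UnitBox^[32] localizes accurately and is also highly robust. Ref. [33] proposes Fitness NMS to better match the goal of maximizing IoU; it can also be used together with Soft-NMS.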
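The suppression effect described above can be reproduced with a few lines of plain Python. This is only an illustrative sketch of standard greedy NMS, not the authors' code; the box coordinates, scores, and the threshold of 0.5 are made-up values chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, drop those overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical example: box 0 matches the ground truth exactly but has the lower score.
ground_truth = (10, 10, 50, 30)
boxes = [(10, 10, 50, 30), (6, 8, 50, 32)]
scores = [0.60, 0.90]
kept = nms(boxes, scores)  # only the looser, higher-scoring box survives
```

Because NMS ranks boxes by classification score alone, the better-localized box is discarded here, which is the failure mode that motivates coupling the two branches.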

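Because target scales in SAR images vary widely, cross-scale recognition is essential during detection. The Feature Pyramid Network (FPN)^[34] extracts semantic feature maps at multiple scales through a top-down pathway with lateral connections, and thus handles targets of different sizes. NAS-FPN^[35] discovers new pyramid structures in the architecture space through neural architecture search. The Path Aggregation Network (PANet)^[36] adds a bottom-up path in which localization signals shorten the information path between the lower layers and the topmost semantic feature map. The Parallel Feature Pyramid Network (PFPNet)^[37] generates multi-scale feature maps by increasing network width rather than depth. Refs. [38-40] propose further improvements for this problem.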

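Attention mechanisms let a model focus on important information and ignore the rest, which improves performance. They are usually divided into spatial attention and channel attention. SENet^[41] is a typical channel-attention network: it aggregates features through a squeeze step and then recalibrates them through an excitation step. BELLO et al.^[42] propose a two-dimensional relative self-attention module that generates attention feature maps and concatenates them with convolutional feature maps to strengthen the semantic representation. In contrast, CBAM^[43] multiplies the attention maps with the convolutional feature maps for adaptive feature refinement. WANG et al.^[44] show that avoiding dimensionality reduction is important for channel attention and propose the ECA module to balance performance and complexity.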

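Traditional feature pyramid networks, including FPN, PANet, NAS-FPN, and BiFPN^[21], usually consist of a bottom-up downsampling path and a top-down upsampling path, as shown in Fig. 1.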

Fig. 1 Traditional feature pyramid network

2 Proposed Network

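Like most object detection networks, the proposed network has three parts: a backbone for feature extraction, a trapezoidal pyramid network for multi-scale feature generation, and a prediction module for accurate detection and positioning. A residual network^[45] serves as the backbone.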

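The residual network produces three feature maps at different scales from the SAR image, denoted $ C_l $ with $ l=3, 4, 5 $. The trapezoidal pyramid network turns them into multi-level feature maps with stronger representational power and richer semantic information, denoted $ P_i $ with $ i=3, 4, 5, 6, 7 $. Each $ P_i $ is then fed to the detection module for positioning and classification to produce the output.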

The following subsections describe each part of the network and give the implementation details.

2.1 Trapezoidal Feature Pyramid Network

The Trapezoidal Feature Pyramid Network (TFPN) resembles the networks in Fig. 1; its structure is shown in Fig. 2.

Fig. 2 Structure of trapezoidal feature pyramid network

TFPN requires five levels of input features, but the backbone outputs only three, so the inputs for $ P_6 $ and $ P_7 $ are obtained as follows:

$ P_{6\_\mathrm{in}}=\mathrm{MaxPool}\left(\mathrm{BN}\left(\mathrm{Conv}_{1\times 1}\left(C_5\right)\right)\right) $

(1)

$ P_{7\_\mathrm{in}}=\mathrm{MaxPool}\left(P_{6\_\mathrm{in}}\right) $

(2)

where $ \mathrm{Conv}_{1\times 1} $ is a convolutional layer with a $ 1\times 1 $ kernel, $ \mathrm{BN} $ is batch normalization, and $ \mathrm{MaxPool} $ is max pooling, which downsamples the feature map. The feature map is padded before pooling so that the pooled output has the intended resolution (the input size divided by the stride) regardless of the kernel size. The process is shown in Fig. 3: $ p_{\mathrm{top}} $, $ p_{\mathrm{bottom}} $, $ p_{\mathrm{left}} $, and $ p_{\mathrm{right}} $ are the amounts of padding required on the top, bottom, left, and right, computed as follows:

$ e_h=\left(\left[\frac{W}{s}\right]-1\right)\times s-W+k $

(3)

$ e_v=\left(\left[\frac{H}{s}\right]-1\right)\times s-H+k $

(4)

$ p_{\mathrm{left}}=e_h/2 $

(5)

$ p_{\mathrm{right}}=e_h-p_{\mathrm{left}} $

(6)

$ p_{\mathrm{top}}=e_v/2 $

(7)

$ p_{\mathrm{bottom}}=e_v-p_{\mathrm{top}} $

(8)

Fig. 3 Downsampling operation

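where $ W $ and $ H $ are the width and height of the feature map, $ s $ and $ k $ are the stride and kernel size of the max pooling operation, and $ W^* $ and $ H^* $ are the width and height after padding.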
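The padding rule in Eqs. (3)-(8) can be checked numerically with the short sketch below. It assumes that the square brackets denote rounding up and that the division by 2 is integer division; neither is stated explicitly in the text. The function names and the clamp to zero are illustrative additions.

```python
import math


def same_padding(size, kernel, stride):
    """Return (before, after) padding along one axis, following Eqs. (3)-(8).

    Assumption: [size / stride] in the equations means ceil(size / stride),
    and the split into two sides uses integer division.
    """
    out = math.ceil(size / stride)
    extra = max((out - 1) * stride - size + kernel, 0)  # clamp added for safety
    before = extra // 2
    after = extra - before
    return before, after


def pooled_size(size, kernel, stride):
    """Output length of a pooling window slid over the padded axis."""
    before, after = same_padding(size, kernel, stride)
    return (size + before + after - kernel) // stride + 1
```

For instance, with a hypothetical 3×3 pooling kernel and stride 2, an axis of length 80 receives padding (0, 1) and is reduced to 40, while an axis of length 5 receives padding (1, 1) and is reduced to 3.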

$ P_{l\_\mathrm{in}} $ ($ l=3, 4, 5 $) is obtained with one convolutional layer, as given in Eq. (9):

$ P_{l\_\mathrm{in}}=\mathrm{BN}\left(\mathrm{Conv}_{1\times 1}\left(C_l\right)\right) $

(9)

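FPN^[34] aggregates multi-scale information only to a limited extent through a single top-down path. PANet^[36] adds a bottom-up path to FPN, but with limited effect. BiFPN^[21] adds cross connections and direct connections to strengthen the expression of semantic information. NAS-FPN^[35] uses neural architecture search to find the best structure, but the resulting network is usually irregular and hard to modify or transfer to other scenarios; the search itself also takes a great deal of time, which makes it costly. FPG^[46] aggregates semantic information with many lateral connections and a dense structure, at the price of a large number of redundant parameters and a high computational cost.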

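Low-level feature maps contain more semantic information but also more noise, so they need a deeper network to process them. High-level feature maps are distilled from low-level ones; they therefore carry more precise semantic information and less noise, but their semantic information is narrower in scope. They need only simple processing, that is, a shallower network. Processing the semantic feature maps of different levels and scales with networks of different depths yields a trapezoidal structure, as shown in Fig. 2, which is called the trapezoidal pyramid network in this paper. The processing is given by Eq. (10):

$ P\left(i, l\right)=\mathrm{Conv}_{3\times 3}\left(s_w\left(U\left(P\left(i+1, l-1\right)\right)+P\left(i, l-1\right)+D\left(P\left(i-1, l-1\right)\right)\right)\right) $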

(10)

where $ i=3, 4, 5, 6, 7 $ is the feature level, $ l=0, 1, \cdots , 7-i $ is the network layer, $ D $ is downsampling, $ U $ is upsampling, and $ s_w $ is the activation function, computed as:

$ s_w\left(x\right)=x\times \mathrm{sigmoid}\left(\beta x\right) $

(11)

In the last layer of the trapezoidal pyramid network, that is, the output layer, Eq. (10) becomes Eq. (12), with $ l=8-i $.

$ P\left(i, l\right)=\mathrm{Conv}_{3\times 3}\left(s_w\left(P\left(i, l-1\right)+D\left(P\left(i-1, l-1\right)\right)+D\left(P\left(i-1, l+1\right)\right)\right)\right) $

(12)
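To make the trapezoidal shape concrete, the sketch below simply enumerates the layer indices per feature level. It reads the index ranges literally (intermediate layers $ l=0, \cdots , 7-i $ plus one output layer $ l=8-i $); this reading, and the helper names, are assumptions for illustration rather than a description of the authors' implementation.

```python
LEVELS = range(3, 8)  # feature levels i = 3, ..., 7


def layers_for_level(i):
    """Layer indices for level i: 0..7-i (Eq. 10 range) plus the output layer 8-i (Eq. 12)."""
    return list(range(0, 8 - i + 1))


def trapezoid_layout():
    """Map each feature level to the list of layer indices it passes through."""
    return {i: layers_for_level(i) for i in LEVELS}


layout = trapezoid_layout()
depths = {i: len(ls) for i, ls in layout.items()}
```

Under this reading the lowest level $ P_3 $ passes through the most layers and $ P_7 $ through the fewest, so the per-level depth shrinks by one at each step up the pyramid, which gives the trapezoid its shape.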

Inspired by Refs. [35, 46], the goal is to search for the best network architecture without excessive training time and computational cost. To this end a weight factor is designed, given by Eq. (13):

$ w\left(i, l, k\right)=\left\{\begin{array}{ll}0, & i+l=7\ \mathrm{or}\ i=3, k=2\\ \alpha \left(i, l, k\right), & \mathrm{otherwise}\end{array}\right. $

(13)

where $ i=3, 4, 5, 6, 7 $; $ l=0, 1, 2, 3 $; $ k=0, 1, 2 $. Because features at different scales carry different amounts of semantic information and matter differently to the model output, the weight factors $ \alpha \left(i, l, k\right) $ are learned during training. The trapezoidal pyramid network can then be written as Eq. (14):

$ P\left(i, l\right)=\left\{\begin{array}{ll}\mathrm{Conv}_{3\times 3}\left(s_w\left(w\left(i+1, l-1, 2\right)\times U\left(P\left(i+1, l-1\right)\right)+w\left(i, l-1, 1\right)\times P\left(i, l-1\right)+w\left(i-1, l-1, 0\right)\times D\left(P\left(i-1, l-1\right)\right)\right)\right), & i+l\ne 8\\ \mathrm{Conv}_{3\times 3}\left(s_w\left(w\left(i, l-1, 0\right)\times P\left(i, l-1\right)+w\left(i-1, l-1, 0\right)\times D\left(P\left(i-1, l-1\right)\right)+D\left(P\left(i-1, l+1\right)\right)\right)\right), & \mathrm{otherwise}\end{array}\right. $

(14)
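The weighted fusion inside Eq. (14) can be illustrated on flat lists of numbers. The sketch below covers only the weighted sum and the activation $ s_w $; the $ 3\times 3 $ convolution and the resampling operators $ U $ and $ D $ are omitted, so the three inputs are assumed to have already been brought to the same size. The function names and the default $ \beta =1 $ are illustrative.

```python
import math


def swish(x, beta=1.0):
    """s_w(x) = x * sigmoid(beta * x), written as x / (1 + exp(-beta * x))."""
    return x / (1.0 + math.exp(-beta * x))


def fuse(up, same, down, w_up, w_same, w_down, beta=1.0):
    """Element-wise weighted fusion of three equally sized feature vectors.

    up, same, down: values coming from level i+1 (after U), level i,
    and level i-1 (after D); w_*: the corresponding scalar weight factors.
    """
    return [
        swish(w_up * u + w_same * s + w_down * d, beta)
        for u, s, d in zip(up, same, down)
    ]
```

Setting a weight to zero removes the corresponding path from the sum, which is how the zero case in Eq. (13) switches off connections that do not exist at the edges of the trapezoid.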

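Note that the resolution of the feature maps in the proposed trapezoidal pyramid network changes from level to level. For example, if the input SAR image has a resolution of $ 640\times 1\,024 $ pixels, then $ P_3 $ has a resolution of $ 80\times 128 $ pixels and $ P_7 $ has a resolution of $ 5\times 8 $ pixels. In other words, the feature map at level $ i $ has $ 1/2^i $ of the input resolution.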
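The resolution rule can be verified with a short calculation. The helper below is an illustrative sketch that assumes the image sides are exact multiples of $ 2^i $, as they are in the 640×1 024 example.

```python
def level_resolution(height, width, level):
    """Resolution of the level-i feature map: each side is divided by 2**level."""
    stride = 2 ** level
    return height // stride, width // stride


# Resolutions of P3..P7 for a 640 x 1024 input
resolutions = {i: level_resolution(640, 1024, i) for i in range(3, 8)}
```

This gives 80×128 for $ P_3 $ and 5×8 for $ P_7 $, matching the values quoted in the text.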

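Attention mechanisms can effectively judge whether information is important. To balance model performance and complexity, the Efficient Channel Attention (ECA) module^[44] is introduced into the trapezoidal pyramid structure. The attention module is applied when feature maps are downsampled, so that useful information is further enhanced and useless noise is further suppressed. ECA-Net first aggregates features by global average pooling and then generates channel attention weights with a fast one-dimensional convolution of kernel size $ k $. Let the feature map be $ x\in \mathbb{R}^{C\times W\times H} $, where $ C $, $ W $, and $ H $ are the number of channels, the width, and the height. Channel-wise global average pooling is given by Eq. (15):

$ y_{\mathrm{pooling}}=\frac{1}{WH}\sum\limits_{i=1, j=1}^{W, H}x_{ij} $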

(15)

The channel weights $ \omega $ are computed as in Eq. (16):

$ \omega =\sigma \left(\mathrm{Conv}_k\left(y_{\mathrm{pooling}}\right)\right) $

(16)

where $ \sigma $ is the Sigmoid function. The kernel size $ k $ is determined by the number of channels $ C $, as in Eq. (17):

$ k=\left[\frac{\mathrm{lb}\,C}{\lambda }+\frac{\alpha }{\lambda }\right]_{\mathrm{odd}} $

(17)

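where $ \alpha $ and $ \lambda $ are manually set parameters, $ \mathrm{lb} $ is the base-2 logarithm, and $ \left[x\right]_{\mathrm{odd}} $ is the odd number nearest to $ x $. The complete attention network is shown in Fig. 4.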
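Eqs. (15)-(17) can be sketched in plain Python as follows. The defaults $ \lambda =2 $ and $ \alpha =1 $ are the values commonly used with ECA-Net and are an assumption here, since the text does not state which values were used; the zero padding at the channel borders and the function names are likewise illustrative.

```python
import math


def eca_kernel_size(channels, lam=2.0, alpha=1.0):
    """Eq. (17): odd number nearest to (log2(C) + alpha) / lam (assumed defaults)."""
    x = (math.log2(channels) + alpha) / lam
    return 2 * math.floor(x / 2) + 1


def eca_weights(feature, kernel):
    """Eqs. (15)-(16) for a feature given as C channels of 2-D lists.

    kernel: odd-length list of 1-D convolution weights applied across channels,
    with zero padding at both ends (an assumption of this sketch).
    """
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]
    half = len(kernel) // 2
    padded = [0.0] * half + pooled + [0.0] * half
    conv = [
        sum(kernel[j] * padded[c + j] for j in range(len(kernel)))
        for c in range(len(pooled))
    ]
    return [1.0 / (1.0 + math.exp(-v)) for v in conv]
```

With these assumed defaults, a 256-channel feature map gets a kernel size of 5 and a 64-channel map a kernel size of 3. Each returned weight lies in (0, 1) and rescales one channel.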

Fig. 4 Structure of attention network

Introducing the improved ECA-Net into the trapezoidal pyramid structure gives:

$ D\left(P\left(i, l\right)\right)=\tilde{D}\left(\mathrm{ECA}\left(P\left(i, l\right)\right)\right) $

(18)

2.2 Positioning and Classification Coupling Detection Head

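In SAR images, scattering ambiguity and relatively low resolution often make precise target positioning difficult. Moreover, earlier detection networks lack an effective link and interaction between the positioning and classification branches, which lowers the accuracy of the positioning branch. As Fig. 5 shows, because scattering from the stern is weak, the accurately positioned box in Fig. 5(a) receives a lower confidence than the less accurately positioned box in Fig. 5(b). For ships sailing at high speed, the wake behind the stern also blurs the positioning, as shown in Fig. 5(c) and Fig. 5(d).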

Fig. 5 Results of previous networks

To address these problems, the detection network shown in Fig. 6 is proposed and called the Positioning and Classification Coupling Detection Head (PCCDH). $ x_l $ denotes the feature map at each scale, that is, the output of the trapezoidal network. Each $ x_l $ passes through four convolutional layers with $ 3\times 3 $ kernels, yielding the classification output $ C_{\mathrm{out}}^{\mathrm{cls}} $ and the regression output $ r_{\mathrm{out}}^{\mathrm{reg}} $. The Attention block in Fig. 6 has the structure shown in Fig. 4. Unlike ECA-Net, the element-wise product in the final stage multiplies $ C_{\mathrm{out}}^{\mathrm{cls}} $ with $ r_{\mathrm{out}}^{\mathrm{reg}} $. The final prediction $ y_l $ is:

$ \tilde{y}_l=\mathrm{ECA}'\left(C_{\mathrm{out}}^{\mathrm{cls}}, r_{\mathrm{out}}^{\mathrm{reg}}\right)+r_{\mathrm{out}}^{\mathrm{reg}} $

(19)

$ y_l=\mathrm{Deform}\left(\tilde{y}_l, \mathrm{Conv}_{3\times 3}\left(r_{\mathrm{out}}^{\mathrm{reg}}\right)\right) $

(20)

Fig. 6 Positioning and classification coupling detection head

where Deform denotes the deformable convolutional network^[47].

Deformable convolution adds an extra offset to the regular sampling grid of a standard convolution, so the grid can deform freely, which strengthens the network's ability to extract information across boundaries. The offsets are two-dimensional and are learned from semantic information by another convolutional layer. Let the regular grid on the feature map $ x_l\in \mathbb{R}^{C\times W\times H} $ be $ G $, as given in Eq. (21):

$ G=\left\{\left(\tilde{w}, \tilde{h}\right)|\tilde{w}=0, 1, \cdots , W-1;\tilde{h}=0, 1, \cdots , H-1\right\} $

(21)

Let the grid offsets be $ \left\{\Delta p_j|j=1, 2, \cdots , \left|G\right|\right\} $. The output of the deformable convolution at any position $ p_0 $ is then:

$ \mathrm{Deform}\left(p_0, \Delta p_j\right)=\sum\limits_{p_i\in G}w\left(p_i\right)v\left(p_0+p_i+\Delta p_j\right) $

(22)

where $ v\left(p_0\right) $ is the pixel value at $ p_0 $. When $ p_0+p_i+\Delta p_j $ is a fractional position, bilinear interpolation is used to obtain the pixel value at that position.
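The sampling step in Eq. (22) can be sketched for a single output position as follows. This is a minimal illustration in plain Python rather than the deformable convolution layer used in the paper: the grid, weights, and offsets are passed in explicitly, one offset is paired with each grid point, and positions outside the feature map are treated as zero (all assumptions of this sketch).

```python
import math


def bilinear_sample(v, y, x):
    """Bilinearly interpolate the 2-D list v at fractional (row, column) = (y, x)."""
    h, w = len(v), len(v[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0

    def at(r, c):
        # Out-of-range neighbours contribute zero.
        return v[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return (
        (1 - dy) * (1 - dx) * at(y0, x0)
        + (1 - dy) * dx * at(y0, x0 + 1)
        + dy * (1 - dx) * at(y0 + 1, x0)
        + dy * dx * at(y0 + 1, x0 + 1)
    )


def deform_at(v, p0, grid, weights, offsets):
    """Weighted sum over grid points p_i of v sampled at p0 + p_i + offset."""
    total = 0.0
    for (gy, gx), wt, (oy, ox) in zip(grid, weights, offsets):
        total += wt * bilinear_sample(v, p0[0] + gy + oy, p0[1] + gx + ox)
    return total
```

With all offsets set to zero the function reduces to an ordinary weighted sum over the grid; a non-zero offset moves a sampling point off the integer lattice, where its value is interpolated from the four nearest pixels.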

2.3 Loss Function and Evaluation Metrics

The training loss consists of a classification loss $ L_{\mathrm{cls}} $ and a regression loss $ L_{\mathrm{reg}} $, as in Eq. (23):

$ \mathrm{Loss}=L_{\mathrm{cls}}+L_{\mathrm{reg}} $

(23)

The focal loss^[26] is used as the classification loss $ L_{\mathrm{cls}} $, as in Eq. (24):

$ L_{\mathrm{cls}}=-\alpha_t\left(1-\hat{p}_t\right)^{\gamma }\log_{\mathrm{a}}\left(\hat{p}_t\right) $

(24)

where $ \hat{p}_t $ is defined as:

$ \hat{p}_t=\left\{\begin{array}{ll}\hat{p}, & C_{\mathrm{out}}^{\mathrm{cls}}=1\\ 1-\hat{p}, & \mathrm{otherwise}\end{array}\right. $

(25)

The focal loss mitigates, to some extent, the effect of the imbalance between positive and negative samples^[48]. For the regression loss $ L_{\mathrm{reg}} $, the smooth L1 loss is used, as in Eq. (26):

$ L_{\mathrm{reg}}=\left\{\begin{array}{ll}0.5y_l^2, & \left|y_l\right| < 1\\ \left|y_l\right|-0.5, & \mathrm{otherwise}\end{array}\right. $

(26)
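Both loss terms are easy to evaluate for a single sample, as the sketch below shows. It makes three assumptions that the text leaves open: the logarithm is taken as the natural logarithm, the defaults $ \alpha =0.25 $ and $ \gamma =2 $ follow the values commonly used with the focal loss, and the argument of the smooth L1 loss is treated as the difference between predicted and target box values.

```python
import math


def focal_loss(p_hat, is_positive, alpha=0.25, gamma=2.0):
    """Eqs. (24)-(25) for one sample; alpha_t is alpha for positives, 1 - alpha otherwise."""
    p_t = p_hat if is_positive else 1.0 - p_hat
    alpha_t = alpha if is_positive else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def smooth_l1(diff):
    """Eq. (26): quadratic near zero, linear for |diff| >= 1."""
    a = abs(diff)
    return 0.5 * a * a if a < 1.0 else a - 0.5


def total_loss(p_hat, is_positive, diff):
    """Eq. (23): sum of the classification and regression terms."""
    return focal_loss(p_hat, is_positive) + smooth_l1(diff)
```

The factor $ \left(1-\hat{p}_t\right)^{\gamma } $ shrinks the loss of well-classified samples, so a confident correct prediction contributes far less than an uncertain one; with $ \gamma =0 $ and $ \alpha_t=1 $ the expression reduces to ordinary cross-entropy.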

Model performance is evaluated quantitatively with precision, recall, $ f_1 $-score, and mean Average Precision (mAP), including $ \mathrm{mAP}_{0.5:0.95} $, $ \mathrm{mAP}_{0.5} $, and $ \mathrm{mAP}_{0.75} $, where $ \mathrm{mAP}_{0.5} $ is the mAP at threshold $ \xi =0.5 $, and so on. $ I_{\mathrm{IoU}} $ denotes the intersection over union between a predicted box and the ground truth, $ T_{\mathrm{TP}} $ is the number of detections whose $ I_{\mathrm{IoU}} $ exceeds the threshold $ \xi $, $ F_{\mathrm{FP}} $ is the number of detections whose $ I_{\mathrm{IoU}} $ is less than or equal to $ \xi $, and $ F_{\mathrm{FN}} $ is the number of ground-truth objects that are not detected. The metrics are computed as follows:

$ P_{\mathrm{Precision}}=\frac{T_{\mathrm{TP}}}{T_{\mathrm{TP}}+F_{\mathrm{FP}}} $

(27)

$ R_{\mathrm{Recall}}=\frac{T_{\mathrm{TP}}}{T_{\mathrm{TP}}+F_{\mathrm{FN}}} $

(28)

$ f_1=\frac{2\times P_{\mathrm{Precision}}\times R_{\mathrm{Recall}}}{P_{\mathrm{Precision}}+R_{\mathrm{Recall}}} $

(29)

Precision is plotted on the vertical axis and Recall on the horizontal axis to draw the PR curve, and the area under the curve is computed as in Eq. (30):

$ m_{\mathrm{mAP}_{\xi }}=\frac{1}{N}\sum\limits_{i\in \mathit{\Omega}}\int_{0}^{1}P_i\left(R_i\right)\mathrm{d}R_i $

(30)

where $ \mathit{\Omega} $ is the set of target categories and $ N $ is the number of elements in $ \mathit{\Omega} $. For $ \mathrm{mAP}_{0.5:0.95} $:

$ m_{\mathrm{mAP}_{0.5:0.95}}=\frac{1}{10}\sum\limits_{\xi =0.05i}m_{\mathrm{mAP}_{\xi }}, i=10, 11, \cdots , 19 $

(31)
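The metrics in Eqs. (27)-(31) can be computed as in the sketch below. The area under the PR curve is approximated here with the plain trapezoidal rule, which is an assumption of this example; evaluation toolkits typically use an interpolated variant. The callable `map_at`, which returns the mAP at a given IoU threshold, is a hypothetical stand-in for a full evaluation run.

```python
def precision(tp, fp):
    """Eq. (27)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Eq. (28)."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p, r):
    """Eq. (29)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def average_precision(recalls, precisions):
    """Area under a PR curve by the trapezoidal rule; recalls must be increasing."""
    area = 0.0
    for k in range(1, len(recalls)):
        area += (recalls[k] - recalls[k - 1]) * (precisions[k] + precisions[k - 1]) / 2
    return area


def map_50_95(map_at):
    """Eq. (31): mean of mAP over the thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [round(0.05 * i, 2) for i in range(10, 20)]
    return sum(map_at(t) for t in thresholds) / len(thresholds)
```

For example, 8 true positives with 2 false positives and 8 missed ships give a precision of 0.8 and a recall of 0.5.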

3 Experimental Results and Analysis

This section presents the experimental results and demonstrates the effectiveness of the proposed method.

3.1 Dataset and Experimental Setup

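The SSDD dataset is used to train and test the algorithm. It contains 1 160 images and 2 456 ships. The images come from three sensors, RadarSat-2, TerraSAR, and Sentinel-1, with four polarization modes (HH, HV, VV, and VH) and resolutions between 1 m and 15 m, and they cover scenes such as inshore areas and offshore waters. The dataset is split into a training set of 928 images and a test set of 232 images. All images are first normalized and then flipped horizontally. Each image is then resized, approximately preserving its aspect ratio, so that its sides are integer multiples of 128 pixels and do not exceed $ 640\times 1\,024 $; the training labels are transformed in the same way. A pretrained ResNet152 is the backbone, and Adam is the optimizer with an initial learning rate of 1×10^-5. Because of the image resolution and the limited GPU memory, the batch size is set to 1 and gradients are accumulated over 8 steps to emulate a batch size of 8. During training, the learning rate is adjusted dynamically if the loss has not decreased for more than 3 epochs. The experiments are implemented in PyTorch 1.6 and run on an NVIDIA 2070 Super GPU.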
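The gradient-accumulation trick mentioned above can be illustrated on a toy one-parameter least-squares model. This is a plain-Python sketch with made-up data, not the training code of the paper; it only shows why summing eight single-sample gradients, each scaled by 1/8, matches the gradient of one batch of eight.

```python
def grad_mse(w, batch):
    """Gradient with respect to w of mean((w * x - y) ** 2) over the batch."""
    return sum(2.0 * x * (w * x - y) for x, y in batch) / len(batch)


def accumulated_grad(w, samples, steps):
    """Accumulate `steps` single-sample gradients, each scaled by 1 / steps."""
    total = 0.0
    for x, y in samples[:steps]:
        total += grad_mse(w, [(x, y)]) / steps
    return total


# Hypothetical data: eight (x, y) pairs
samples = [(float(i), 2.0 * i + 1.0) for i in range(1, 9)]
full_batch = grad_mse(0.5, samples)
accumulated = accumulated_grad(0.5, samples, 8)
```

The two values agree up to floating-point rounding for this toy model. In a real network the equivalence is only approximate when layers such as batch normalization compute statistics per micro-batch.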

3.2 Ablation Experiments

Three modules are proposed in this paper for ship detection in SAR images. To analyze these modules and their effect on model performance, several groups of ablation experiments were run, changing one module at a time while keeping everything else fixed. Table 1 gives the quantitative contribution of each module, where Precision, Recall, and $ f_1 $-score are all measured at threshold $ \xi =0.5 $; the corresponding PR curves are shown in Fig. 7.

Table 1 Results of ablation experiment

Fig. 7 PR curve of ablation experiment

As Table 1 shows, both proposed submodules improve detection performance to different degrees. Compared with the baseline network, adding TFPN clearly improves performance: $ \mathrm{mAP}_{0.5} $ and $ \mathrm{mAP}_{0.75} $ rise by 1.860 and 6.174 percentage points, and the $ f_1 $-score rises by 3.137 percentage points. TFPN uses a cross structure so that semantic information flows well through the network, and its weighted feature fusion filters the semantic information effectively. Adding PCCDH to the baseline raises $ \mathrm{mAP}_{0.5} $ and $ \mathrm{mAP}_{0.75} $ by 1.740 and 3.943 percentage points. Because PCCDH targets precise positioning, its gain on $ \mathrm{mAP}_{0.75} $ is more than twice its gain on $ \mathrm{mAP}_{0.5} $, which indicates that the module handles the positioning offset between predicted boxes and the ground truth well.

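Fig. 8 compares the results with and without TFPN. When a scene contains several targets of different scales, the baseline network often misses some of them. In contrast, TFPN handles multi-scale targets well, especially the small targets in a scene. This indicates that, compared with FPN, the proposed TFPN extracts the semantic information of multi-scale objects more effectively.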

Fig. 8 Comparison of results with and without TFPN detection module

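Fig. 9 compares the results with and without PCCDH (see the HTML version on the Computer Engineering website for the color figure). Red boxes mark ships that were missed or wrongly detected, and orange boxes mark imprecise detections, mainly cases where the ship's wake was taken as part of the ship. With PCCDH, the model can tell apart objects in close contact and can also separate a fast-moving target from its wake. With both TFPN and PCCDH, the model performs best, reaching an $ \mathrm{mAP}_{0.5} $ of 94.948% and an $ \mathrm{mAP}_{0.75} $ of 68.121%. TFPN aggregates semantic features effectively and PCCDH positions targets precisely; combined, the two improvements raise network performance further.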

Fig. 9 Comparison of results with or without PCCDH detection module

3.3 Comparison Experiments

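The proposed network, with its two improvements for object detection in SAR images, is compared with existing networks^[49], including FasterRCNN, RetinaNet, and CascadeRCNN. The results are listed in Table 2 and the corresponding PR curves are shown in Fig. 10.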

Table 2 Comparison of quantitative detection performance of different networks (%)

Fig. 10 PR curves of different networks

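As Table 2 shows, the $ f_1 $-score, $ \mathrm{mAP}_{0.5} $, and $ \mathrm{mAP}_{0.75} $ of the proposed network exceed those of the other networks by more than 4, 2, and 1 percentage points, respectively, a clear improvement in ship detection across scenes. For the situations described in Fig. 5, the proposed network also improves performance to some extent, as can be seen in Fig. 10 (which reflects a stricter positioning criterion): its PR curve lies outermost, meaning it performs best. For small multi-scale targets, the proposed network also gives more precise detections.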

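Table 2 also shows that the anchor-free networks FCOS and YOLOv3 perform worse than the anchor-based networks, because preset anchors carry prior information about target size and thus make training easier. The actual detection results of several networks in Table 2 are shown in Fig. 11.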

Fig. 11 Comparison of detection results of different networks

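In inshore scenes, conventional networks are easily disturbed and detect many ships that do not exist, as Fig. 11 shows: Fig. 11(c) and Fig. 11(d) contain many false detections. The other networks also fail to delineate ship boundaries well, so a single ship is detected as several. The result in Fig. 11(e) is not precise enough, as the detection confidences show (the numbers on the boxes are detection confidences). The proposed network handles these problems well. Other scenes are discussed in the next subsection.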

3.4 Analysis of Results

This subsection quantitatively analyzes the model's performance under different conditions and its robustness.

3.4.1 Effect of Background Interference on Model Performance

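Because detection environments are complex, different background scattering affects the model differently^[50-51]. In ship detection, background interference from land is far stronger than that in open-sea areas, so inshore ships are harder to detect than offshore ships. Model performance in the two cases is compared in Table 3. In inshore scenes the model's performance degrades to some extent, but compared with the baseline network the proposed network still improves $ \mathrm{mAP}_{0.5} $ and $ f_1 $-score by 16.75 and 14.65 percentage points. The performance drop in inshore scenes has two main causes:

1) Ships in inshore areas are usually densely packed, so the boundaries between ships are unclear;

2) Scattering from harbors and similar structures in inshore areas makes it harder for the model to distinguish targets.

Table 3 Comparison of detection performance in different scenarios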

3.4.2 Effect of Network Width on Model Performance

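Network width is another hyperparameter that affects model performance. A wider network has more parameters and higher detection accuracy but generalizes less well; a narrower network has fewer parameters and lower accuracy but generalizes better. To balance detection accuracy and generalization, the model width is set to $ 256\alpha $, and the performance curves for different values of $ \alpha $ are given in Fig. 12.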

Fig. 12 Influence of network width on model performance

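As Fig. 12 shows, when $ \alpha $ is less than 1, performance improves as $ \alpha $ increases. When $ \alpha $ is greater than 1, performance drops slightly as $ \alpha $ increases, because the excess parameters cause overfitting. Fig. 12(a) shows that the model performs best at a width of 256. Fig. 12(b) shows that as the width increases, the number of parameters grows and detection takes longer.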

4 Conclusion

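This paper proposes a network for multi-scale target detection in SAR images of complex scenes. The trapezoidal feature pyramid module TFPN, which replaces skip connections with a cross structure, improves generalization and semantic representation. An improved ECA module is embedded in TFPN to improve detection performance. Trainable weight factors let features from different levels fuse better, and deformable convolution is added to the positioning and classification coupling detection head for a second calibration that improves detection accuracy. Experimental results show that, compared with mainstream networks such as FasterRCNN, CascadeRCNN, and RetinaNet, the proposed network significantly improves detection accuracy and robustness. Future work will use model pruning, lightweight network design, and similar methods to speed up SAR ship detection models while preserving accuracy.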

References

[1]	HATEM G M, ABDUL SADAH J W, SALMAN J, et al. Improve CFAR algorithm based on closed loop by neural network[C]//Proceedings of the 4th Scientific International Conference Najaf. Washington D. C., USA: IEEE Press, 2019: 15-19.
[2]	CHEN J X, ZHOU S H, VARSHNEY P K, et al. A sparsity based CFAR algorithm for dense radar targets[C]//Proceedings of IEEE Radar Conference. Washington D. C., USA: IEEE Press, 2020: 1-6.
[3]	AI J Q, PEI Z L, YAO B D, et al. AIS data aided Rayleigh CFAR ship detection algorithm of multiple-target environment in SAR images[J]. IEEE Transactions on Aerospace and Electronic Systems, 2021: 1266-1282.
[4]	AI J Q, MAO Y X, LUO Q W, et al. Robust CFAR ship detector based on bilateral-trimmed-statistics of complex ocean scenes in SAR imagery: a closed-form solution[J]. IEEE Transactions on Aerospace and Electronic Systems, 2021: 1872-1890.
[5]	LIAO M S, WANG C C, WANG Y, et al. Using SAR images to detect ships from sea clutter[J]. IEEE Geoscience and Remote Sensing Letters, 2008: 194-198.
[6]	LI H C, HONG W, WU Y R, et al. On the empirical-statistical modeling of SAR images with generalized gamma distribution[J]. IEEE Journal of Selected Topics in Signal Processing, 2011, 5(3): 386-397. DOI:10.1109/JSTSP.2011.2138675
[7]	GAO G, OUYANG K W, LUO Y B, et al. Scheme of parameter estimation for generalized gamma distribution and its application to ship detection in SAR images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2016: 1812-1832.
[8]	LAW H, DENG J. CornerNet: detecting objects as paired keypoints[C]//Proceedings of the European Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2018: 734-750.
[9]	ZHOU X Y, ZHUO J C, KRÄHENBÜHL P. Bottom-up object detection by grouping extreme and center points[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2019: 850-859.
[10]	DUAN K W, BAI S, XIE L X, et al. CenterNet: keypoint triplets for object detection[C]//Proceedings of IEEE/CVF International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2019: 6568-6577.
[11]	FU J M, SUN X, WANG Z R, et al. An anchor-free method based on feature balancing and refinement network for multiscale ship detection in SAR images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(2): 1331-1344. DOI:10.1109/TGRS.2020.3005151
[12]	GAO F, HE Y S, WANG J, et al. Anchor-free convolutional network with dense attention feature aggregation for ship detection in SAR images[J]. Remote Sensing, 2020, 12(16): 19-24.
[13]	MAO Y X, YANG Y Q, MA Z Y, et al. Efficient low-cost ship detection for SAR imagery based on simplified U-net[J]. IEEE Access, 2020, 8: 69742-69753.
[14]	CUI Z Y, WANG X Y, LIU N Y, et al. Ship detection in large-scale SAR images via spatial shuffle-group enhance attention[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(1): 379-391. DOI:10.1109/TGRS.2020.2997200
[15]	WANG J, CHEN J, WANG P B, et al. An algorithm for azimuth ambiguities detection in SAR images using faster-RCNN[C]//Proceedings of the 6th Asia-Pacific Conference on Synthetic Aperture Radar. Washington D. C., USA: IEEE Press, 2019: 1-5.
[16]	GUI Y C, LI X H, XUE L, et al. A scale transfer convolution network for small ship detection in SAR images[C]//Proceedings of the 8th Joint International Information Technology and Artificial Intelligence Conference. Washington D. C., USA: IEEE Press, 2019: 1845-1849.
[17]	SERMANET P, EIGEN D, ZHANG X, et al. OverFeat: integrated recognition, localization and detection using convolutional networks[EB/OL]. [2021-10-20]. https://arxiv.org/abs/1312.6229.
[18]	ZHOU L, WEI S Y, CUI Z M, et al. Lira-YOLO: a lightweight model for ship detection in radar images[J]. Journal of Systems Engineering and Electronics, 2020, 31(5): 950-956. DOI:10.23919/JSEE.2020.000063
[19]	FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[EB/OL]. [2021-10-20]. https://arxiv.org/abs/1701.06659.
[20]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2016: 21-37.
[21]	TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2020: 10778-10787.
[22]	WANG T, ZHU X G, PANG J M, et al. FCOS3D: fully convolutional one-stage monocular 3D object detection[C]//Proceedings of IEEE/CVF International Conference on Computer Vision Workshops. Washington D. C., USA: IEEE Press, 2021: 913-922.
[23]	REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2017: 6517-652.
[24]	LONG X, DENG K P, WANG G Z, et al. PP-YOLO: an effective and efficient implementation of object detector[EB/OL]. [2021-10-20]. https://arxiv.org/abs/2007.12099.
[25]	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2016: 779-788.
[26]	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//Proceedings of IEEE International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2017: 2999-3007.
[27]	SHRIVASTAVA A, GUPTA A, GIRSHICK R. Training region-based object detectors with online hard example mining[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2016: 761-769.
[28]	HE Y H, ZHU C C, WANG J R, et al. Bounding box regression with uncertainty for accurate object detection[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2019: 2883-2892.
[29]	WU S, LI X, WANG X. IoU-aware single-stage object detector for accurate localization[J]. Image and Vision Computing, 2020, 97(5): 11-20.
[30]	JIANG B, LUO R, MAO J, et al. Acquisition of localization confidence for accurate object detection[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 784-799.
[31]	BODLA N, SINGH B, CHELLAPPA R, et al. Soft-NMS—improving object detection with one line of code[C]//Proceedings of IEEE International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2017: 5562-5570.
[32]	YU J H, JIANG Y N, WANG Z Y, et al. UnitBox: an advanced object detection network[C]//Proceedings of the 24th ACM International Conference on Multimedia. New York, USA: ACM Press, 2016: 516-520.
[33]	TYCHSEN-SMITH L, PETERSSON L. Improving object localization with fitness NMS and bounded IoU loss[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 6877-6885.
[34]	LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2017: 936-944.
[35]	GHIASI G, LIN T Y, LE Q V. NAS-FPN: learning scalable feature pyramid architecture for object detection[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2019: 7029-7038.
[36]	LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 8759-8768.
[37]	KIM S W, KOOK H K, SUN J Y, et al. Parallel feature pyramid network for object detection[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 234-250.
[38]	WOO S, HWANG S, KWEON I S. StairNet: top-down semantic aggregation for accurate one shot detection[C]//Proceedings of 2018 IEEE Winter Conference on Applications of Computer Vision. Washington D. C., USA: IEEE Press, 2018: 1093-1102.
[39]	ZHOU P, NI B B, GENG C, et al. Scale-transferrable object detection[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 528-537.
[40]	KONG T, SUN F C, YAO A B, et al. RON: reverse connection with objectness prior networks for object detection[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2017: 5244-5252.
[41]	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 7132-7141.
[42]	BELLO I, ZOPH B, LE Q, et al. Attention augmented convolutional networks[C]//Proceedings of IEEE/CVF International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2019: 3285-3294.
[43]	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[EB/OL]. [2021-10-20]. https://arxiv.org/abs/1807.06521.
[44]	WANG Q L, WU B G, ZHU P F, et al. ECA-net: efficient channel attention for deep convolutional neural networks[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2020: 11531-11539.
[45]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2016: 770-778.
[46]	CHEN K, CAO Y H, LOY C C, et al. Feature pyramid grids[EB/OL]. [2021-10-20]. https://arxiv.org/abs/2004.03580.
[47]	DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]//Proceedings of IEEE International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2017: 764-773.
[48]	XU Z J, HUANG H. Ship detection in SAR image based on multiple connected features pyramid network[J]. Laser & Optoelectronics Progress, 2021, 58(8): 287-294. (in Chinese)
[49]	HOU X H, JIN G D, TAN L N. Survey of ship detection in SAR images based on deep learning[J]. Laser & Optoelectronics Progress, 2021, 58(4): 53-64. (in Chinese)
[50]	ZHOU X K, LIU C, ZHOU B. Ship detection in SAR images based on multiscale feature fusion and channel relation calibration of features[J]. Journal of Radars, 2021, 10(4): 531-543. (in Chinese)
[51]	SUN Z Z, DAI M C, LEI Y, et al. Fast detection of ship targets for complex large-scene SAR images based on a cascade network[J]. Journal of Signal Processing, 2021, 37(6): 941-951. (in Chinese)