
Computer Engineering (计算机工程)


Real-Time Small Object Detection via Feature Weaving and Bio-inspired Foveal Focusing

  • Published: 2026-05-20

Abstract: Small object detection in complex scenes has long faced two technical bottlenecks: weak object features attenuate easily in deep neural networks, and environmental background noise interferes severely with detection. To address both, this study proposes WF-DETR, an end-to-end real-time small object detection model. In the feature extraction stage, a Feature Weaving Network (WeaveNet) replaces simple hierarchical stacking with a heterogeneous feature weaving strategy. Through a cross-level mutual feature correction mechanism, it tightly interweaves deep semantic information with shallow geometric details and calibrates them bidirectionally. This suppresses the attenuation of spatial information during feature propagation and mitigates the loss of small object features while preserving high-level semantic strength. In the neck network, a FoveaFormer module inspired by the physiology of human vision is proposed. It uses adaptive sparse attention and gating units to simulate the imaging mechanism of the human fovea, dynamically filtering redundant background noise and focusing on high-value target regions, which significantly improves feature purity. In addition, a Haar Wavelet Downsample (HWD) operator is introduced to restructure the downsampling process. From a frequency-domain perspective, it overcomes the irreversible loss of high-frequency texture details caused by traditional pooling and further improves the discriminability of small object features. Experiments on the VisDrone2019 benchmark dataset show that the model achieves an mAP@0.5:0.95 of 23.7% at an inference speed of 166.3 FPS. These results validate the real-time performance and superiority of WF-DETR for small object detection against complex backgrounds.
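
The contrast the abstract draws between Haar wavelet downsampling and traditional pooling can be illustrated with a minimal sketch. It assumes the standard orthonormal 2×2 Haar transform applied to a single-channel grid. The function names `haar_downsample`, `haar_upsample`, and `avg_pool` are hypothetical, and this is not the HWD operator as implemented in WF-DETR, whose details (for example, any channel projection applied after the transform) the abstract does not give. Pure Python is used only to keep the example dependency-free.

```python
def haar_downsample(x):
    """Split an H x W grid (H and W even) into four half-resolution Haar subbands.

    Returns (ll, lh, hl, hh): one low-pass band and three detail bands.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]          # top row of the 2x2 block
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom row of the 2x2 block
            r_ll.append((a + b + c + d) / 2)  # twice the block mean
            r_lh.append((a + b - c - d) / 2)  # top-minus-bottom detail
            r_hl.append((a - b + c - d) / 2)  # left-minus-right detail
            r_hh.append((a - b - c + d) / 2)  # diagonal detail
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def haar_upsample(ll, lh, hl, hh):
    """Invert haar_downsample, rebuilding the full-resolution grid."""
    out = []
    for r_ll, r_lh, r_hl, r_hh in zip(ll, lh, hl, hh):
        top, bottom = [], []
        for s, v, u, q in zip(r_ll, r_lh, r_hl, r_hh):
            top += [(s + v + u + q) / 2, (s + v - u - q) / 2]
            bottom += [(s - v + u - q) / 2, (s - v - u + q) / 2]
        out += [top, bottom]
    return out


def avg_pool(x):
    """2x2 average pooling with stride 2, for comparison."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


# Illustrative 4x4 input (made-up values, not data from the paper).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
bands = haar_downsample(image)     # four 2x2 subbands
restored = haar_upsample(*bands)   # equals the original grid
pooled = avg_pool(image)           # one 2x2 grid; detail bands are gone
```

The four subbands together hold as many values as the input, so the original grid can be rebuilt exactly. Average pooling keeps only a scaled copy of the low-pass band and discards the three detail bands, which is the irreversible loss of high-frequency detail that the abstract refers to.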