
Computer Engineering

   

Document Detection Method with Multi-scale Feature and Semantic Optimization

  

Published: 2026-05-07


Abstract: Document detection suffers from unbalanced multi-scale feature representation, information loss during cross-level feature fusion, and insufficient bounding-box localization accuracy. To address these issues, this paper proposes a document detection method with multi-scale feature and semantic optimization. The method makes three improvements. First, a multi-branch convolutional attention fusion module is constructed: it expands the receptive field with multi-scale strip convolutions and integrates an attention mechanism with the C3k module. Second, a multi-scale neck that coordinates global semantics with high-order correlations is designed: it fuses features through global feature gathering, hypergraph-convolution-based correlation mining, and multi-scale feature scattering. Third, the bounding-box regression loss is optimized with dual-threshold interval mapping, which sharpens the distinction between the losses of different samples. Experiments on the EXAM, CDLA, D4LA, and PubLayNet datasets show that the method achieves significantly higher average detection precision than existing methods. The results indicate that the method overcomes the performance bottleneck of YOLO11n in document detection, improving accuracy while maintaining efficiency, and offers a practical solution for document detection.
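The first improvement expands the receptive field with multi-scale strip convolutions. The abstract gives no kernel sizes or layer layout, so the following is only a minimal, hypothetical sketch and not the authors' module. It pairs a 1×k horizontal strip with a k×1 vertical strip (all-ones weights instead of learned depthwise kernels) and measures the receptive field by convolving a single impulse. The kernel lengths 7, 11, and 21 are borrowed from strip-attention designs such as SegNeXt's MSCA, and fusing branches by plain addition is also an assumption.

```python
def strip_conv(img, k, horizontal):
    """'Same'-size, zero-padded convolution with an all-ones strip kernel of odd length k."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for t in range(-r, r + 1):
                ii, jj = (i, j + t) if horizontal else (i + t, j)
                if 0 <= ii < h and 0 <= jj < w:
                    total += img[ii][jj]
            out[i][j] = total
    return out


def strip_pair(img, k):
    """One branch: a 1xk horizontal strip followed by a kx1 vertical strip."""
    return strip_conv(strip_conv(img, k, horizontal=True), k, horizontal=False)


def multi_branch(img, kernel_sizes=(7, 11, 21)):
    """Sum the outputs of several strip branches (hypothetical fusion by addition)."""
    branches = [strip_pair(img, k) for k in kernel_sizes]
    h, w = len(img), len(img[0])
    return [[sum(b[i][j] for b in branches) for j in range(w)] for i in range(h)]


def receptive_field(out):
    """Count the output positions reached by a single input impulse."""
    return sum(1 for row in out for v in row if v != 0)


if __name__ == "__main__":
    n = 31
    impulse = [[0.0] * n for _ in range(n)]
    impulse[n // 2][n // 2] = 1.0
    for k in (7, 11, 21):
        cells = receptive_field(strip_pair(impulse, k))
        print(f"k={k}: {cells} cells covered with {2 * k} weights")
    print("multi-branch:", receptive_field(multi_branch(impulse)))
```

A 1×k plus k×1 pair covers a full k×k window with 2k weights instead of k². This is a common motivation for strip convolutions on long, thin layout elements such as text lines and table rows.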
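The neck mines high-order correlations with hypergraph convolution. The abstract does not say how hyperedges are built. The sketch below assumes an epsilon-ball construction similar to the one used in Hyper-YOLO: each feature vector spawns one hyperedge containing every feature within distance `eps`. It then performs vertex-to-hyperedge-to-vertex mean aggregation with a residual connection, i.e. X + D_v^{-1} H D_e^{-1} H^T X, omitting the learnable projection. The function names, `eps`, and the toy data are all hypothetical.

```python
import math


def eps_ball_hyperedges(X, eps):
    """One hyperedge per node: every node within Euclidean distance eps of it (assumed)."""
    return [[j for j, xj in enumerate(X) if math.dist(xi, xj) <= eps] for xi in X]


def mean(vectors):
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def hypergraph_conv(X, edges):
    """Vertex -> hyperedge -> vertex mean aggregation plus a residual connection."""
    # D_e^{-1} H^T X: each hyperedge feature is the mean of its member nodes.
    edge_feats = [mean([X[j] for j in e]) for e in edges]
    out = []
    for i, xi in enumerate(X):
        # D_v^{-1} H (...): each node averages the hyperedges it belongs to.
        incident = [edge_feats[k] for k, e in enumerate(edges) if i in e]
        agg = mean(incident)
        out.append([a + b for a, b in zip(xi, agg)])  # residual connection
    return out


if __name__ == "__main__":
    # Two well-separated groups of 2-D features (toy data).
    X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    edges = eps_ball_hyperedges(X, eps=1.0)
    print(edges)  # [[0, 1], [0, 1], [2, 3], [2, 3]]
    print(hypergraph_conv(X, edges))
```

Every node lies within distance 0 of itself, so each node belongs to at least one hyperedge and the vertex-side mean is always defined. Features mix only within their correlated group, which is the intuition behind using hyperedges to model relations among more than two regions at once.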

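The third improvement applies dual-threshold interval mapping to the bounding-box regression loss. One plausible reading, and only an assumption here, is the linear interval mapping used by Focaler-IoU:

- IoU below a lower threshold d maps to 0.
- IoU above an upper threshold u maps to 1.
- IoU in between is rescaled linearly to [0, 1].

The loss is then 1 minus the mapped value. The thresholds d = 0.2 and u = 0.8 and all function names are hypothetical; the abstract gives neither the thresholds nor the base loss.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def interval_map(iou, d, u):
    """Linearly rescale IoU from [d, u] to [0, 1]; clamp outside the interval."""
    if iou < d:
        return 0.0
    if iou > u:
        return 1.0
    return (iou - d) / (u - d)


def interval_iou_loss(pred, target, d=0.2, u=0.8):
    """1 - mapped IoU; the thresholds d < u are hypothetical defaults."""
    return 1.0 - interval_map(box_iou(pred, target), d, u)


if __name__ == "__main__":
    # Two samples whose raw IoU differs by 0.1 (0.5 vs 0.6).
    a_pred, a_gt = (1.0, 0.0, 4.0, 1.0), (0.0, 0.0, 3.0, 1.0)
    b_pred, b_gt = (1.0, 0.0, 5.0, 1.0), (0.0, 0.0, 4.0, 1.0)
    raw_gap = box_iou(b_pred, b_gt) - box_iou(a_pred, a_gt)
    mapped_gap = interval_iou_loss(a_pred, a_gt) - interval_iou_loss(b_pred, b_gt)
    print(f"raw IoU-loss gap: {raw_gap:.3f}, interval-mapped gap: {mapped_gap:.3f}")
```

Because the slope inside [d, u] is 1 / (u - d) > 1, two samples with nearby IoU values receive more clearly separated losses. This matches the abstract's stated goal of enhancing the discrimination of sample losses. Focaler-IoU itself combines the mapped term with existing IoU-based losses such as GIoU or CIoU; whether this method does the same is not stated in the abstract.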