
Just accepted

  • Hao Yaohui, Cai Jintian, Cui Xinyue, Lu Xianling
    Accepted: 2026-02-04
    In mean-field epidemic models used to predict and analyze the spread of public-opinion information, parameters are difficult to correct iteratively within the model itself, which can bias predictions; in addition, the LSTM model shows poor long-term prediction accuracy for public-opinion propagation. To address these problems, the SEI³R-BiLSTM model, which integrates communication dynamics with deep learning, was proposed. Firstly, the SEIR model was improved by classifying user states during the dissemination of online public-opinion information into six categories: S (Information Unaware), E (Information Hesitant), I₁ (Positive Communicator), I₂ (Negative Communicator), I₃ (Neutral Communicator), and R (Information Immune), with the transition relationships between these states clearly defined. Secondly, to improve prediction accuracy, an attention mechanism and residual connections were introduced into a BiLSTM neural network to predict changes in the number of public-opinion communicators. Finally, 659,000 posts were collected from Sina Weibo across three high-profile public-opinion events ("the Jiang Ping Mathematics Competition", "Qin Lang Losing Homework", and "Fat Cat Jumping into the River") for experimental validation and analysis. The results showed that the time-series curves of the three communicator types (I₁, I₂, and I₃) predicted by the SEI³R-BiLSTM model were generally consistent with the actual propagation trends, with high fitting accuracy. Furthermore, SEI³R-BiLSTM outperformed four comparison models, including SEI³R-LSTM and SEI³R-ARIMA, on four evaluation metrics: RMSE (0.162), MAPE (16.6%), Jaccard (0.74), and F1 score (0.72). In addition, an ablation experiment further confirmed the model's rationality and effectiveness. These findings provide a model reference for predicting the development of online public opinion.
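The six-compartment dynamics described above can be sketched with a simple forward-Euler integration. This is a minimal illustrative sketch: all rate constants (contact rate, activation rate, branching probabilities, recovery rate) are arbitrary assumptions, not the paper's fitted parameters.

```python
# Hypothetical sketch of S, E, I1, I2, I3, R compartment dynamics.
# Rate constants are illustrative, not the paper's fitted values.

def sei3r_step(state, params, dt=0.1):
    """One forward-Euler step of the six-compartment model."""
    S, E, I1, I2, I3, R = state
    beta, sigma, p1, p2, gamma = params
    N = S + E + I1 + I2 + I3 + R
    infectious = I1 + I2 + I3
    new_exposed = beta * S * infectious / N          # S -> E on contact
    activations = sigma * E                          # E leaves the hesitant state
    dS = -new_exposed
    dE = new_exposed - activations
    dI1 = p1 * activations - gamma * I1              # positive communicators
    dI2 = p2 * activations - gamma * I2              # negative communicators
    dI3 = (1 - p1 - p2) * activations - gamma * I3   # neutral communicators
    dR = gamma * infectious                          # communicators become immune
    return [x + dt * d for x, d in zip(state, (dS, dE, dI1, dI2, dI3, dR))]

def simulate(state, params, steps=200, dt=0.1):
    for _ in range(steps):
        state = sei3r_step(state, params, dt)
    return state
```

Note that the flows cancel pairwise, so the total population is conserved at every step; a neural component such as BiLSTM would then be fitted to the resulting communicator time series.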
  • LUO Yangxia, YAO Yuanle, LI Xiaoyu, Zhao Jinlong
    Accepted: 2026-02-04
    To address the problem of exponential growth in malware and variants, and the limited capability of traditional detection methods to identify unknown threats, this paper proposes a MobileNetV2_AD detection method combining "multimodal visualization + lightweight" approaches. The main feature is the fusion of multi-source semantic visual information, representing byte entropy, disassembled instruction streams, and API call sequences as RGB three-channel images to achieve "one image integrating three domains." This reveals the complementary discriminative patterns of different semantic modalities in the image space, offering finer-grained feature extraction compared to grayscale images. Secondly, the lightweight backbone with strong scale perception incorporates Atrous Spatial Pyramid Pooling (ASPP) into MobileNetV2, enhancing the model's receptive field and multi-scale feature extraction capabilities. Additionally, a "category-feature" dual decoupled distillation approach is employed, using ResNeXt50 as the teacher model to simultaneously transfer macro classification logic and micro feature distributions. This resolves the "precision-generalization" trade-off issue in lightweight student models, resulting in an 11.7% increase in F1 score on unknown family samples after distillation. Finally, cross-dataset performance validation is conducted on the Kaggle (400 GB) and DataCon (latest attack-defense competition) public benchmarks, achieving accuracy rates of 96.41% and 98.68% respectively for MobileNetV2_AD, which is 6.31% and 4.21% higher than the original MobileNetV2. The inference speed reaches 280 samples per second, meeting the real-time detection requirements of terminal devices. The experimental results demonstrate that the proposed method significantly improves malware detection effectiveness in resource-constrained scenarios, providing an effective technical solution for cybersecurity defense.
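The "one image integrating three domains" idea can be sketched as stacking three independently normalized feature maps into the R, G, and B channels of one image. The map size and min-max normalization here are assumptions for demonstration, not the paper's exact pipeline.

```python
import numpy as np

def to_channel(feat, size=(64, 64)):
    """Normalize a 2D feature map to 0-255 uint8 (shape must match `size`)."""
    feat = np.asarray(feat, dtype=np.float64)
    assert feat.shape == size
    lo, hi = feat.min(), feat.max()
    scaled = (feat - lo) / (hi - lo) if hi > lo else np.zeros_like(feat)
    return (scaled * 255).astype(np.uint8)

def fuse_rgb(entropy_map, opcode_map, api_map, size=(64, 64)):
    """Stack byte-entropy, instruction-stream, and API-call views as R, G, B."""
    return np.dstack([to_channel(m, size) for m in (entropy_map, opcode_map, api_map)])
```

A classifier such as MobileNetV2 can then consume the fused H×W×3 image directly, so each channel contributes a different semantic view of the same binary.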
  • Huang Jianwen, Chen Xuhang , Cheng Lianglun, Huang Jiajie , Huo Yejing
    Accepted: 2026-02-03
    To address the insufficient perception of geometric pose relationships and the difficulty of unified modeling for multiple tooth types in existing three-dimensional dental keypoint detection methods, a quaternion-based geometry-aware and adaptive expert network (QGAE-Net) is proposed. The method introduces a Multi-Scale Quaternion-based Geometric Positional Encoder (MS-QGPE) that combines quaternion representation with geometric shape descriptors to learn local-to-global geometric structures of point clouds and enhance spatial relationship modeling. A Quaternion-Guided Geometric Pose Attention (QG-GPA) module is designed to constrain attention weights using quaternion similarity, allowing feature aggregation according to true geometric correlations. Furthermore, a Classification-Driven Expert Routing Mechanism (CD-ERM) is constructed to achieve unified modeling of heterogeneous tooth types and personalized feature learning through dynamically activated expert subnetworks. Experiments conducted on a clinical dataset containing 19,200 tooth samples demonstrate that the proposed method achieves mean absolute errors of 0.179 mm, 0.233 mm, 0.188 mm, and 0.301 mm for incisors, canines, premolars, and molars, respectively, with corresponding recalls of 85.1%, 87.1%, 91.5%, and 67.5%, and an overall classification accuracy of 97.5%. In addition, experiments on the public Teeth3DS+ and KeypointNet datasets demonstrate consistent performance improvements over existing methods, confirming the model's strong generalization capability on public benchmarks and cross-category scenarios. Overall, QGAE-Net effectively enhances keypoint detection accuracy while maintaining high deployment efficiency and scalability, making it suitable for automatic landmark annotation across diverse dental scenarios.
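The quaternion-similarity constraint on attention can be sketched as follows. The similarity measure (absolute inner product of unit quaternions, which identifies q and -q as the same rotation) is standard; using it to rescale raw attention scores before the softmax is a simplified stand-in for the module described above.

```python
import numpy as np

def quat_similarity(q1, q2):
    """Absolute inner product of two unit quaternions; 1.0 = same rotation."""
    q1 = np.asarray(q1, dtype=float)
    q2 = np.asarray(q2, dtype=float)
    q1, q2 = q1 / np.linalg.norm(q1), q2 / np.linalg.norm(q2)
    return abs(float(np.dot(q1, q2)))

def constrained_attention(scores, sims):
    """Rescale raw attention scores by quaternion similarity, then softmax."""
    z = np.asarray(scores, dtype=float) * np.asarray(sims, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()
```

Under this scheme, neighbors whose estimated poses disagree with the query point receive lower similarity and hence lower attention weight, so aggregation follows geometric correlation rather than raw feature affinity alone.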
  • GONG Hongyi, LU Anwen, TANG Yijun, WANG Xiangxue, XU Jun, JIAO Yiping
    Accepted: 2026-02-03
    Integrating pathological images with genomic data through deep learning can significantly improve the accuracy of cancer prognosis prediction. However, in clinical practice, only a subset of patients have complete genomic sequencing results, which limits the comprehensive application of multimodal models. How to fully leverage limited genomic data to enhance the prognostic capability of pathological models is crucial for improving the clinical applicability and generalization ability of multimodal approaches. To this end, this paper proposes VMEF, a pathology enhancement framework based on the Variational Mixture of Experts (VMoE) module, designed to address training scenarios where pathological images are complete but genomic data is partially missing. The framework learns cross-modal mapping relationships between pathology and genomics using samples with complete modalities, generating imputed features for missing samples to improve overall prognostic performance. VMEF comprises three core modules: (1) a multi-source pathology encoding module that fuses global tissue structure with tumor microenvironment prior information, providing a rich pathological foundation for genomic feature generation; (2) a VMoE-based imputation module that models diverse pathology-to-genomics mapping relationships through a dual-expert structure and dynamic routing mechanism, adaptively generating biologically plausible genomic representations; (3) a prior-guided fusion module that leverages prior features to guide mutual calibration between genomic features and pathological representations, effectively alleviating inter-modal heterogeneity. Experiments on three TCGA cancer datasets demonstrate that when only 60% of training samples have genomic sequencing data, the average C-index reaches 0.6149; under complete modality conditions, the average C-index reaches 0.6370, surpassing existing multimodal methods. 
The experimental results demonstrate the effectiveness and robustness of the VMEF framework for cancer prognosis under modality-missing scenarios, providing strong support for its application in randomly missing data scenarios.
  • Wang Huiyong, Zhou Rumeng, Zhang Yi, Feng Tao, Zhang Xiaoming
    Accepted: 2026-02-03
    In recent years, large language models have demonstrated exceptional performance in natural language processing tasks. However, in domain-specific question answering tasks such as those in the medical field, lightweight large language models lack sufficient support from vertical domain knowledge, resulting in deficiencies in the reliability and accuracy of their generated outputs. To enhance the accuracy of lightweight large language models in medical question answering tasks, this paper proposes a knowledge graph-enhanced medical question answering approach for large language models based on entity recognition and knowledge filtering, named ERKF-MedQA. This approach mainly consists of two components: precise initial entity recognition and knowledge filtering. Entity recognition is implemented using a multi-stage prompting method. First, entity normalization retrieval is performed on the input question. Then, relevance assessment is conducted on the retrieved entities to determine the final valid entities. Knowledge filtering is accomplished using the Multi-Task Semantic Scoring Model (M-TSSM). This model integrates question and path information, scores the initially retrieved knowledge, and filters out the knowledge highly relevant to the question. Finally, the filtered relevant knowledge is integrated into prompts and input into the large language model, which then performs reasoning and generates answers. Experimental results show that the proposed method outperforms all baseline models in terms of BERTScore. Compared with the best-performing baseline model, the proposed method achieves improvements of 0.44%, 0.25%, and 0.34% in Precision, Recall, and F1-Score, respectively.
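The knowledge-filtering step can be sketched as scoring retrieved knowledge paths against the question and keeping only the top-k. The scorer below is a toy lexical-overlap function standing in for the paper's M-TSSM semantic scoring model; the triples and k are illustrative.

```python
def toy_score(question, path):
    """Fraction of path tokens that also appear in the question (stand-in scorer)."""
    q = set(question.lower().split())
    p = path.lower().split()
    return sum(tok in q for tok in p) / max(len(p), 1)

def filter_knowledge(question, paths, k=2):
    """Keep the k highest-scoring knowledge paths for prompt construction."""
    ranked = sorted(paths, key=lambda p: toy_score(question, p), reverse=True)
    return ranked[:k]
```

The surviving paths would then be serialized into the prompt so the lightweight model reasons only over knowledge relevant to the question.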
  • Luo Li, Li Bo, Wu Jiani, Wen Yuan, Dai Lu
    Accepted: 2026-02-02
    Urban underground pipeline defect detection is essential for ensuring the normal operation of underground pipeline systems. Due to the diversity, complex shapes, and varying scales of underground pipeline defects, existing detection methods often suffer from insufficient accuracy, resulting in many false positives and missed detections. This paper proposes an effective underground pipeline defect detection model, MEG-DETR, based on the RT-DETR framework. A Multi-scale Attention-based Intra-scale Feature Interaction (M-AIFI) module is designed, which combines Multi-scale Multi-head Self-attention (M2SA) to establish channel and spatial dependencies within high-level semantic features, enabling the comprehensive capture of fine-grained defect features. A Spatial Prior Multi-scale Feature Pyramid Network (SP-MSFPN) is constructed, introducing Efficient Local Attention (ELA) and adding a shallow feature layer to achieve efficient fusion across different scales, enhancing detection of small defects. Furthermore, a Gated Semantic Enhancement Module (GSEM) is developed, combining a multi-scale convolutional gated linear unit and a GSBottleneck to achieve collaborative enhancement of semantic and structural features, improving representation of complex defect semantics and structural details. Experimental results show that MEG-DETR achieves higher accuracy in underground pipeline defect detection, with an mAP of 83.44%, an improvement of 2.74% over the baseline; Precision and Recall increase by 1.69% and 3.03%, respectively. Compared with mainstream detection models, MEG-DETR demonstrates superior overall performance, verifying its effectiveness in complex defect scenarios.
  • YUN Jian, ZHANG Xueyi
    Accepted: 2026-02-02
    To address challenges in cross-domain collaboration posed by data privacy and compliance constraints, Federated Learning (FL) integrated with blockchain mitigates centralization risks in traditional FL, yet existing solutions face insufficient model update quality assessment and validator trust crises. This paper introduces a decentralized blockchain-based federated learning framework featuring a dynamic closed-loop system that coordinates quality, trust, and equity through three mechanisms: 1) a Validator Quality Score, which quantifies validator performance using multi-round cross-validation and spatiotemporal weighting, converting quality scores into dynamic voting weights to suppress collusion attacks; 2) a Model Quality Factor, which tracks worker nodes' historical contributions via sliding windows and dynamically adjusts update thresholds using validator accuracy to distinguish high-value updates from malicious perturbations; 3) Model Quality-Driven Dynamic Proof-of-Stake, which binds node stakes to contribution quality, ensuring high-stake nodes deliver high-quality outputs. The framework is tested on multiple datasets. Its synergistic mechanisms maintain strong performance under malicious attacks in Non-IID environments. Results show a 12.5% average accuracy gain over baselines. Defense effectiveness on CIFAR-10 improves by up to 38%. The system suppresses malicious nodes' stake to only 1%, far below the 13% baseline level. Communication costs remain comparable. This method successfully solves the consistency problem between model quality and validator performance.
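The quality-weighted validator voting can be sketched as follows: each validator's historical accuracy becomes a voting weight, and a model update is accepted when the weighted approval share clears a threshold. The normalization scheme and threshold are illustrative assumptions, not the framework's exact rule.

```python
def vote_weights(accuracies):
    """Normalize validator quality scores into voting weights that sum to 1."""
    total = sum(accuracies)
    return [a / total for a in accuracies]

def accept_update(votes, accuracies, threshold=0.5):
    """votes[i] is True if validator i approves the candidate model update."""
    w = vote_weights(accuracies)
    approve = sum(wi for wi, v in zip(w, votes) if v)
    return approve > threshold
```

Because low-quality validators carry little weight, a colluding minority with poor validation accuracy cannot push a malicious update past the threshold.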
  • Liu-Chengke, Guan-Donghai, Yuan-Weiwei
    Accepted: 2026-01-30
    Imbalanced time series classification represents a significant challenge in the field of deep learning, especially when critical information is concentrated in the minority class. Conventional data augmentation techniques, such as undersampling and oversampling, are designed to increase the proportion of minority class samples. However, they often give rise to issues including information loss, elevated overfitting risk, and the introduction of noise. While "Dual Augmentation Joint Label Learning" (JobDA) has been proven effective in alleviating such problems to a certain extent, it still lacks explicit mechanisms tailored to the minority class. To address this issue, this study proposes a novel approach named "Dual Augmentation with Minority Class Label Merging" (DAMLM). Specifically, this method first expands the training set through dual augmentation of samples and labels, and then uses a label mapping mechanism to merge the minority class labels, which effectively increases the proportion of minority class samples compared with JobDA. In detail, the method performs sample augmentation by repeating the original data, thus avoiding noise introduction. Meanwhile, during the training process, it adopts joint labels for the majority class and retains the original labels for the minority class—this forms clearer classification boundaries compared with other methods. On 38 imbalanced datasets from the UCR archive, we conducted experiments with six time-series classification models and compared methods by averaging the results across these models. Compared with seven representative baseline augmentation methods, DAMLM improves the mean F1 score by 1.24–6.27 percentage points and achieves the best performance on G-mean and other metrics.
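The label-merging step can be sketched as a mapping that collapses all majority-class labels into one joint label while each minority class keeps its own label, raising the effective minority proportion. The class ids, majority set, and joint label are illustrative assumptions.

```python
def merge_labels(labels, majority_classes, joint_label=0):
    """Map every majority-class label to `joint_label`; keep minority labels."""
    return [joint_label if y in majority_classes else y for y in labels]
```

For example, with classes 1 and 2 as the majority, a label sequence [1, 2, 3, 3, 1] becomes [0, 0, 3, 3, 0]: the minority class 3 now accounts for 2 of 5 samples against a single merged majority label, sharpening its decision boundary.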
  • Wang Liang, Deng Song
    Accepted: 2026-01-30
    As a critical infrastructure, the power system is vulnerable to threats such as equipment failures and malicious data tampering, while the scarcity of abnormal samples restricts the performance of traditional detection models. To address the problem of abnormal data imbalance in the power system, this paper proposes a data augmentation method based on a Mixture of Experts Wasserstein Generative Adversarial Network (LT-MoEWGAN). This method innovatively integrates Long Short-Term Memory (LSTM) and Temporal Convolutional Network (TCN) modules as dual experts, and realizes dynamic weight allocation at the feature level through a gating network to construct a multi-scale temporal feature extractor for generating high-quality samples. Simulation experiments based on real power system datasets show that: 1) Based on the Wasserstein distance metric, the distribution difference between the data generated by this method and real samples is the smallest (with medians of 0.043 and 0.135 respectively), and taking WGAN as the baseline, the generation stability is improved by 33%; 2) On classifiers such as XGBoost, LightGBM, Random Forest, Decision Tree, CNN, GAT, and MTGF-Conv, the Area Under the Curve (AUC) of the proposed algorithm is improved by 1.5%–2% compared with baseline methods such as SMOTE, ADASYN, Borderline-SMOTE, GAN, WGAN, WGAN-GP, DCGAN, and WM_CVAE. This method effectively enhances anomaly detection performance through high-quality data augmentation, thus providing a reliable data augmentation solution for anomaly detection in power systems, and its innovative architecture has theoretical reference value for time-series data generation tasks.
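The feature-level gating over the two experts can be sketched as a softmax gate that mixes the two expert feature maps per feature. The arrays below stand in for LSTM and TCN outputs; shapes and the gate itself are toy assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(expert_a, expert_b, gate_logits):
    """gate_logits: (..., 2) logits; return the per-feature weighted mixture."""
    w = softmax(gate_logits, axis=-1)                 # weights sum to 1 per feature
    return w[..., 0] * expert_a + w[..., 1] * expert_b
```

With equal logits the gate averages the experts; a trained gating network would instead push weight toward whichever expert (long-range LSTM vs. local TCN) better models each feature.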
  • Zhu Guozheng, Peng Wanda, Zhang Shuo, Cheng Xinru, Zhang Liye, Li Pengfei
    Accepted: 2026-01-30
    Cross-domain transfer of image models has become an effective paradigm for video understanding, but current approaches leave room for improvement: full fine-tuning is computationally expensive and prone to performance fluctuations, and most parameter-efficient transfer learning (PETL) schemes rely on a single adapter, limiting their spatio-temporal representation capability under long-range temporal dependencies and in few-shot scenarios. More critically, existing methods generally depend on implicit temporal modeling while ignoring explicit motion priors, making it difficult to fully capture complex motion patterns. To this end, this paper proposes FDA4Video, a structured adapter framework that efficiently adapts image models under the PETL paradigm. It designs a decoupled dual-path adapter architecture that simultaneously captures local action details and long-range temporal correlations; proposes an optical-flow-shift collaborative attention mechanism that deeply fuses explicit motion representations into the temporal modeling process to strengthen cross-frame dependencies; and introduces learnable temporal position embeddings to provide a temporal coordinate reference, with a staged residual fusion strategy preserving representation integrity. Experiments show that the framework achieves accuracies of 85.6%, 98.2%, and 83.9% on Kinetics-400, UCF101, and HMDB51, respectively, improving average accuracy by 1.6%–2.2% over baseline methods while reducing newly added parameters by about 26%. Its overall performance is comparable to advanced PETL strategies, providing a technical path for video adaptation of image models that balances accuracy, lightness, and efficiency.
  • Wang Yaoning, Wang Zhirui, Li Yun
    Accepted: 2026-01-30
    Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal understanding and generation tasks. However, recent studies have revealed that these models exhibit significant vulnerability when exposed to adversarial attacks. Although several targeted black-box attack methods have been proposed to enhance the cross-model transferability of adversarial examples against LVLMs, their effectiveness and stability remain far from satisfactory. To address this issue, we propose a novel black-box targeted attack method with high transferability, termed Intermediate-Guided Transfer Attack (IGTA). The core idea of IGTA is to leverage a pre-trained vision encoder as a surrogate model and align the intermediate-layer features of the adversarial example with those of a target image. This intermediate-layer alignment strategy enables more direct and fine-grained manipulation of the model’s visual semantic understanding and high-level decision-making processes. Moreover, to further enhance transferability, the method incorporates fine-grained data augmentation techniques during optimization. Extensive black-box attack experiments on various mainstream LVLMs demonstrate that IGTA can efficiently generate highly transferable adversarial examples across different model architectures and task scenarios, significantly outperforming existing baseline approaches. Our findings reveal critical security risks in the visual reasoning components of current LVLMs and provide valuable insights for developing more robust multimodal models and corresponding defense mechanisms.
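The intermediate-feature alignment idea can be sketched with a toy linear surrogate encoder, so the gradient is analytic: projected gradient descent perturbs the source image (within an L-infinity budget) until its intermediate features approach those of the target image. The encoder, step size, and budget here are all illustrative assumptions; the actual method backpropagates through a pre-trained vision encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))     # toy stand-in for an intermediate layer

def feat(x):
    """Intermediate-layer features of the toy surrogate encoder."""
    return W @ x

def align_attack(x_src, x_tgt, eps=0.5, lr=0.005, steps=300):
    """Perturb x_src within an L-inf ball of radius eps so that its
    intermediate features move toward those of x_tgt."""
    delta = np.zeros_like(x_src)
    f_tgt = feat(x_tgt)
    for _ in range(steps):
        # analytic gradient of ||feat(x_src + delta) - f_tgt||^2 w.r.t. delta
        grad = 2 * W.T @ (feat(x_src + delta) - f_tgt)
        delta = np.clip(delta - lr * grad, -eps, eps)   # project into the ball
    return x_src + delta
```

The resulting perturbation aligns high-level semantics rather than pixels, which is what makes such examples transfer across black-box LVLMs in the method described above.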
  • Shuni Chang, Miaomiao Yan, Yongqiang Cheng, Runfang Hao, Yiwei Shi, Yangyang Wei, Lei Zhao, Yuxuan Fan
    Accepted: 2026-01-30
    Pneumoconiosis is a chronic progressive interstitial lung disease. Accurate staging plays an important role in diagnosis, treatment planning, and prognosis evaluation. To solve the problem that single-branch deep learning models fail to capture both global and local features, this paper proposes a two-stage dual-branch feature fusion model for intelligent staging. In the first stage, the method preprocesses chest X-ray images by using a DualAttention-Net++ segmentation network. The network applies channel–spatial attention to remove cardiac and mediastinal interference. Biorthogonal wavelet reconstruction and spatial texture fusion are used to enhance small-sample class data. In the second stage, a Dual-Branch Feature Fusion Network (DBFF-Net) is designed. The main branch based on EfficientNetV2 extracts global morphological features, while the auxiliary branch based on InceptionV3 captures multi-scale local lesion features. An adaptive feature fusion module combines complementary features from both branches. Lung regions are divided according to the GBZ70-2015 standard, and the KD-Tree algorithm locates key points to achieve accurate lung field partitioning. The method is tested on 3006 multi-center chest X-ray images, including normal and stage I–III pneumoconiosis. The model achieves 86.2% accuracy, 89% precision, 88.5% recall, 98.5% specificity, 86.1% F1-score, and 92.4% AUC. The results show that the method improves the accuracy and robustness of pneumoconiosis staging and provides a practical approach for intelligent clinical diagnosis.
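The KD-tree key-point location step can be sketched as indexing candidate contour points once and snapping each reference landmark to its nearest candidate; the coordinates below are illustrative, not anatomical landmarks.

```python
import numpy as np
from scipy.spatial import cKDTree

def locate_keypoints(candidates, landmarks):
    """Return, for each landmark, the nearest candidate point."""
    tree = cKDTree(candidates)          # build the KD-tree once
    _, idx = tree.query(landmarks)      # nearest-neighbor lookup per landmark
    return np.asarray(candidates)[idx]
```

The located key points would then delimit the lung-field regions required by the staging standard before region-wise classification.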
  • LIU Teng, CHEN Xingyu, LI Fengyong
    Accepted: 2026-01-30
    Aiming at the problems of difficult identification of micro-scale tampering and insufficient fusion of multi-domain features, this paper proposes an image tampering detection framework that integrates Frequency Domain Enhancement (FDE) and a bottleneck aggregated attention module (CFAM). The method adopts an RGB+DCT dual stream: FDE divides features into low/mid/high frequency sub-bands in the frequency domain, uses band attention to suppress redundant low frequencies and enhance tamper-sensitive bands, and uses multi-scale convolution to capture boundaries and mid/high frequency disturbances; the enhanced features are then mapped back via the inverse DCT (IDCT) to achieve spatial-frequency complementarity. During the fusion stage, CFAM models channel importance and spatial significance in parallel within a 1×1 bottleneck and aligns the two types of attention through linear aggregation in the same domain. Unlike the serial or single-dimensional modeling of existing attention mechanisms (such as SE and CBAM), this both reduces computational overhead and decreases information loss, significantly improving the response to small targets and weak boundaries. Weighted loss and perturbation augmentation are introduced in training to alleviate class imbalance and strengthen robustness. Evaluations under a unified protocol and ablation experiments on multiple public datasets show that this method outperforms recent comparable methods in accuracy, robustness, and cross-domain generalization, and that FDE and CFAM yield synergistic gains. It can still generate high-precision tamper masks in strongly perturbed scenarios such as recompression, blurring, and scaling, and has good efficiency and deployability.
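The low/mid/high band split at the heart of the FDE idea can be sketched on a single 2D block: the DCT coefficients are partitioned by their (u+v) frequency radius, each band could then be re-weighted, and the inverse DCT maps the result back to the spatial domain. The band boundaries are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.fft import dctn, idctn

def split_bands(block, low=8, mid=24):
    """Return (low, mid, high) spatial-domain components of a 2D block."""
    c = dctn(block, norm='ortho')
    u, v = np.indices(c.shape)
    radius = u + v                       # coarse frequency radius per coefficient
    bands = []
    for mask in (radius < low, (radius >= low) & (radius < mid), radius >= mid):
        # zero out everything outside the band, then invert back to pixels
        bands.append(idctn(np.where(mask, c, 0.0), norm='ortho'))
    return bands
```

Because the DCT is linear and the three masks partition the spectrum, the three components sum back to the original block exactly; a band-attention module would scale them unequally before recombination.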
  • ZHOU Yan, ZHAO Xiaole
    Accepted: 2026-01-30
    Image complexity prediction holds significant research importance in the fields of visual cognition and computer vision. Existing methods still face challenges in effectively simulating the human visual system, balancing model complexity and efficiency, and generating interpretable pixel-level heatmaps. To address these issues, an Interpretable Global-Local Complexity Fusion Network (IGLCFN) is proposed. IGLCFN primarily consists of three key modules: a complexity-aware encoding block, a cross-domain interaction module, and an interpretable image complexity heatmap generation module. The complexity-aware encoding block adopts a dual-branch structure, integrating the global semantic modeling capabilities of Vision Transformers with the local micro-modeling capabilities of Convolutional Neural Networks. This design simulates the multi-level perception process of the human visual system for image complexity. The cross-domain interaction module fully considers the characteristics of features extracted by different branches and is responsible for aligning Transformer sequence features with Convolutional Neural Network spatial features. Furthermore, the interpretable image complexity heatmap generation module generates pixel-level heatmaps that align with human visual system perception and are interpretable, by constructing a pseudo-label dataset based on the accumulation of local perceptual scores for supervised learning. Quantitative experimental results on the IC9600 dataset demonstrate that IGLCFN achieves state-of-the-art performance across all key metrics. Compared with various mainstream baselines, including image quality assessment and image complexity prediction models, IGLCFN achieves the highest prediction performance while maintaining low computational resource consumption. Additionally, experiments on the SAVOIAS dataset further validate the model's generalization ability and stability. 
Ablation studies further confirm the rationality and effectiveness of key modules such as the complexity-aware encoding block and the cross-domain interaction module. Qualitative analysis indicates that the heatmaps generated by IGLCFN can more accurately focus on human visual perception regions.
  • ZHANG Xuan, CAO Suzhen, LIU Guorui, HAN Lei, ZHANG Tianhao, YANG Xiaodong
    Accepted: 2026-01-30
    Under the digital transformation of supply chains, multi-source data exchange and cross-enterprise sharing face security risks such as data leakage and policy exposure. Traditional attribute-based encryption schemes suffer from inefficiency and insufficient dynamic control, making them unsuitable for such scenarios. To address these challenges, this paper proposes a revocable ciphertext search scheme with fully hidden access policies under a cloud-edge collaborative architecture, designed to meet the demands for efficient data interaction and privacy protection in complex supply chain environments. The scheme integrates the real-time processing capabilities of edge computing with the robust computational power of cloud computing. By leveraging collaborative ciphertext caching and pre-decryption services at the edge, it significantly reduces decryption latency and local computational load on users. Access policies are fully hidden through a combination of multi-value attributes with wildcard support, gate-based access structures, and Bloom filters, effectively preventing sensitive policy leakage during data transmission and storage. Furthermore, blockchain and smart contracts are introduced to enable efficient search and dynamic revocation: a "search contract" automates keyword trapdoor validation to shorten matching time, while a "revocation contract" dynamically updates permissions for attribute revocation. Under the Decisional Bilinear Diffie–Hellman (DBDH) assumption, the scheme achieves indistinguishability against chosen-plaintext attacks. Performance analysis shows that it incurs low computational overhead; in particular, during the search phase the computational cost is reduced by nearly two-thirds compared with the BADS scheme, thereby providing efficient and secure data retrieval for collaborative supply chain management.
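The policy-hiding role of the Bloom filter can be sketched with a minimal implementation: attribute strings are hashed into a bit array, so membership can be tested without ever storing the attributes themselves (at the cost of a small false-positive rate). The array size, hash count, and attribute strings below are illustrative assumptions.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k salted SHA-256 hashes over an m-bit array."""

    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = 0                      # bit array stored as one big int

    def _positions(self, item):
        for i in range(self.k):            # k independent (salted) positions
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))
```

A verifier holding only the filter can check whether a user's attribute satisfies the policy, while the filter itself reveals none of the policy's attribute values.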
  • Chen Xin, Wang Mingwen
    Accepted: 2026-01-30
    Existing self-supervised monocular depth estimation models typically use convolutional neural networks and Transformers for feature encoding and decoding. However, these architectures struggle to flexibly and efficiently capture geometric features of irregular and complex objects in scenes. Moreover, as the network deepens, high-frequency edge information in the image is progressively weakened, resulting in depth features lacking crucial edge details and ultimately degrading model performance. To address these issues, this paper proposes a self-supervised monocular depth estimation model that integrates graph neural networks with a Laplacian pyramid. Firstly, a Vision Graph Neural Network (ViG) is employed as the backbone to model global topological structure relationships within the scene. Secondly, a Laplacian Residual Fusion module is designed. It first concatenates the Laplacian pyramid residuals with the encoded and decoded features along the spatial dimension and then employs channel attention to recalibrate the channel weights. This achieves efficient fusion of the Laplacian pyramid in both spatial and channel dimensions, thereby enhancing the edge details of the decoded features. Finally, an Edge-Guided Graph Reasoning module is proposed, which treats pixels at object boundaries as graph nodes and then performs explicit graph reasoning to enhance the quality of depth estimation in these boundary regions. Experimental results compared with the baseline method Monodepth2 on the KITTI dataset demonstrate that the proposed model achieves a 12.2% reduction in the absolute relative error Abs Rel, a 21.4% reduction in the squared relative error Sq Rel, and a threshold accuracy of 89.6% at a threshold value of 1.25. Furthermore, experimental results on the Make3D dataset demonstrate that the model also achieves good depth estimation performance on unseen scenes. 
Qualitative visualizations also indicate that the proposed model achieves superior performance in predicting depth maps with sharper edges and richer detail.
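The Laplacian residual that such a fusion module consumes can be sketched at a single pyramid level: the residual is the image minus its Gaussian-blurred (low-pass) version, so it isolates exactly the high-frequency edge detail that deep features tend to lose, and blurred plus residual reconstructs the level exactly. The blur sigma is an illustrative assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian_residual(img, sigma=1.0):
    """One Laplacian-pyramid level: (high-frequency residual, low-pass image)."""
    low = gaussian_filter(img, sigma=sigma)   # Gaussian low-pass
    return img - low, low                     # residual carries the edges
```

In the model above, residuals like this one are concatenated with encoder/decoder features and re-weighted by channel attention to restore edge sharpness in the predicted depth.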
  • Miao Qingqing, Xu Ming
    Accepted: 2026-01-30
    In the process of maritime informatization, underwater acoustic communication faces challenges such as Doppler shift, limited bandwidth resources, and quantum attacks. Traditional key encapsulation mechanisms cannot meet the complex demands of underwater acoustic communication when addressing these issues. To effectively resolve the aforementioned challenges, an Indistinguishability under Adaptive Chosen-Ciphertext Attack (IND-CCA2) secure N-th Truncated Polynomial Ring Unit (NTRU)-based underwater multicarrier key encapsulation mechanism (DTRM) is first proposed by combining Orthogonal Frequency Division Multiplexing (OFDM) technology with an NTRU dual encryption scheme, thereby achieving resistance against quantum attacks in the process of underwater acoustic communication. Secondly, to address the challenge of limited bandwidth resources in underwater acoustic communication, a small ciphertext expansion OFDM multicarrier fragmented transmission scheme is proposed, significantly improving ciphertext transmission efficiency under limited bandwidth. Additionally, in response to the complex attack environment in underwater acoustic communication networks, a Latin square session key structure based on ocean noise is designed during the key generation phase to implement a subsequent dynamic key update mechanism. This mechanism enables secure updates of session keys without recovering the session key and even when partial key fragments are lost, thereby significantly enhancing the system's forward security and robustness. Furthermore, the IND-CCA2 security of the scheme is formally proven. Finally, through experimental verification and analysis, DTRM has improved communication efficiency and achieved quantum-resistant security, significantly enhancing the overall performance of the underwater acoustic encrypted communication system.
  • LI Liang, XIAO Mingzhi, CHEN Xi
    Accepted: 2026-01-27
    To address the single point of failure, tampering risks, opaque verification, and the spread of misinformation in centralized news architectures, this paper proposes and implements a blockchain-based decentralized news retrieval and aggregation architecture that enables trustworthy storage, verifiable retrieval, and transparent governance of multi-source news data. The architecture integrates blockchain, smart contracts, and distributed storage to form an integrated chain–contract–storage trust system; a consensus mechanism ensures trustworthy data sources, and smart contracts automatically enforce governance rules to achieve traceability and high trustworthiness. An MVF robust task allocation algorithm and a KMPT/TMPT dual-layer verifiable indexing mechanism are proposed to optimize task scheduling and index verification, thereby improving retrieval and verification efficiency. The system incorporates Merkle tree–based integrity verification and a multi-source reputation weighting mechanism to enable adaptive adjustment of source reputation, enhancing retrieval accuracy and system robustness. The system is deployed in a distributed manner on an OpenStack private cloud and experimentally validated using 106,532 news records collected in 2024. Experimental results show that, compared with traditional solutions, trustworthy verification accuracy improves by 15.6% (P<0.01), the anti-tampering detection success rate reaches 99.6%, and the fake news suppression rate reaches 92.4%. The deep integration of retrieval and verification processes further improves the overall effectiveness of trustworthy retrieval by 22.3% (P<0.05). These findings verify the feasibility and engineering effectiveness of blockchain in trustworthy retrieval and data governance, providing theoretical support and engineering reference for building a highly transparent and auditable news ecosystem.
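The Merkle-tree integrity check underpinning the verifiable indexing can be sketched as follows: leaves are hashes of news records, each parent hashes the concatenation of its children, and tampering with any record changes the root. Duplicating the last node on odd levels is one common convention, assumed here for illustration.

```python
import hashlib

def sha(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(records):
    """Compute the Merkle root of a list of string records."""
    level = [sha(r.encode()) for r in records]
    if not level:
        return sha(b"")
    while len(level) > 1:
        if len(level) % 2:                 # odd level: duplicate the last node
            level.append(level[-1])
        level = [sha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

A retrieval client that knows the on-chain root can thus detect any modified record without re-downloading the whole dataset, which is the basis of the anti-tampering detection reported above.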
  • Zhang Sen, Li Jie, Zhang Jianwei, Li Hui, Wu Shouying, Wu Fengguo
    Accepted: 2026-01-27
    Recent advancements in Deep Reinforcement Learning (DRL) have shown strong capabilities in Unmanned Aerial Vehicle (UAV) decision-making, achieving significant results in various UAV control tasks. However, existing DRL studies on UAV control mostly assume an idealized “zero-delay” environment, overlooking the signal delays ubiquitous in real-world links. Directly transferring generic delay-handling methods validated on MuJoCo-like benchmarks to highly dynamic UAV platforms often fails to reproduce their original benefits and can even aggravate performance degradation. Through simulated experiments, this study verifies the significant performance decline experienced by DRL-trained UAV agents under signal delays and provides an in-depth analysis of its causes. We propose the State-Prediction and Adaptive Decision-enhanced (SPADE) method, which effectively handles both fixed and non-fixed delays. In a 1v1 close-range combat task, SPADE achieves control performance comparable to that of a delay-free system. Within a delay range of 300–2400 milliseconds, SPADE demonstrates an average win rate improvement of 19.4% (for fixed delays) and 13.0% (for non-fixed delays) over baseline methods. It significantly compensates for the delay-induced performance degradation of the Soft Actor-Critic (SAC) algorithm and outperforms existing methods aimed at mitigating delay effects in deep reinforcement learning for UAV control. In general, this research highlights the negative impact of signal delay on UAV control systems and introduces SPADE as a robust solution to these challenges, significantly improving UAV control performance under delay conditions.
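The state-prediction side of a delay-compensation scheme like SPADE can be illustrated generically: under a d-step delay the agent only observes an old state, so it rolls a dynamics model forward through the actions issued since that observation and acts on the predicted current state. The toy point-mass dynamics below stand in for a learned model; this is a sketch of the general idea, not the paper's implementation.

```python
def step(state, action):
    """Toy known dynamics: a 1-D point mass (stand-in for a learned dynamics model)."""
    pos, vel = state
    return (pos + 0.1 * vel, vel + 0.1 * action)

def predict_current_state(delayed_state, pending_actions):
    """Roll the model forward over the actions issued since the delayed observation,
    so the policy can condition on an estimate of the *current* state."""
    s = delayed_state
    for a in pending_actions:
        s = step(s, a)
    return s
```

With a perfect model the prediction matches the true current state; with a learned model the residual error is what the adaptive-decision component must absorb.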
  • ZHONG Zhifeng, PENG Zhaikun, HUANG Peipei, WANG Junyi, WANG Huifang, SONG Yufan
    Accepted: 2026-01-27
    Aiming at the existing problems of multi-scale feature decoupling, insufficient representation of deformable objects, and limited retention of shallow features in UAV-based small object detection, this paper proposes an improved YOLOv8s-based detection algorithm named AC-YOLO (Accurate-YOLO). The proposed algorithm introduces a Multi-scale Dilated Convolution Residual (MDCR) mechanism into the backbone network to enhance the convolutional module and architecture. By designing parallel convolutional structures with varying dilation rates, the model simultaneously expands the receptive field and strengthens local detail perception, while reducing redundant parameters. In the feature fusion stage, the neck structure is reconstructed by integrating a lightweight architecture (Slim-neck) and an additional P2 detection layer, which effectively improves the utilization of shallow features and significantly reduces the missed detection rate. To address challenges such as complex object contours and large-scale background variations, an improved deformable convolution-based dynamic detection head, DCNv3-dyhead (Deformable Convolutional Networks v3 dynamic head), is proposed. This module learns adaptive sampling positions in key regions, enabling better perception and representation of irregularly shaped objects. In addition, the Inner-IoU concept is introduced into the construction of the Shape-IoU loss function, replacing the original loss with Inner-shapeIoU to further enhance localization accuracy and bounding box regression performance. Comparative experiments conducted on the VisDrone-2019 dataset demonstrate that the improved model achieves a 12.8 percentage point increase in mAP50, reaching 52.6%, and a 9.1 percentage point increase in mAP50-95, reaching 32.6%. On the Flow-Img dataset, the model achieves an mAP50 of 84.5%. These results indicate that the proposed model exhibits superior accuracy and strong generalization capability for small object detection tasks from UAV perspectives.
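The Inner-IoU idea referenced above can be sketched independently of the full Inner-shapeIoU loss: the IoU is computed on auxiliary boxes shrunk about their centres by a ratio, which yields more informative overlap values for near-matching boxes and therefore better regression gradients. A minimal sketch; the `ratio` value and the plain-IoU base are assumptions, and the paper's combination with Shape-IoU is not reproduced.

```python
def iou(a, b):
    """IoU of axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def shrink(box, ratio):
    """Scale a box about its centre by `ratio` (ratio < 1 gives the 'inner' box)."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    hw, hh = (box[2] - box[0]) / 2 * ratio, (box[3] - box[1]) / 2 * ratio
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def inner_iou(a, b, ratio=0.8):
    """IoU of the shrunken boxes: stricter overlap measure for near matches."""
    return iou(shrink(a, ratio), shrink(b, ratio))
```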
  • Du Xiaogang, Zhang Cui, Wang Yingbo, Liu Tongfei, Lei Tao
    Accepted: 2026-01-27
    Medical image segmentation plays an important role in disease diagnosis, treatment planning, and surgical navigation. Traditional methods rely on access to large amounts of annotated source data, which conflicts with privacy protection requirements. Source-free domain adaptation (SFDA) has become a research focus because it does not require source data and fits privacy constraints. However, SFDA still faces challenges such as lack of spatial consistency, insufficient multi-scale feature representation, and significant pseudo-label noise. To address these issues, we propose a source-free domain adaptation segmentation network based on Multi-branch Collaborative Calibration and Reliable Weighted Consistency (MCC-RWC). MCC-RWC offers two key advantages. First, we design a multi-branch collaborative calibration module, which utilizes multi-decoder prediction and inverse transform calibration to generate high-quality, spatially consistent probability predictions, and captures detailed anatomical structures and long-range feature dependencies through an embedded hierarchical feature aggregation module to enhance multi-scale feature representations. Second, we design a reliable weighted consistency module that generates high-quality pseudo labels through three rounds of differential forward propagation and confidence selection, and suppresses noise using weighted loss and consistency constraints to enhance the robustness of the model. Experiments on multi-center cardiac MRI and polyp datasets demonstrate that MCC-RWC outperforms existing popular methods. MCC-RWC provides an efficient and privacy-preserving solution for cross-center clinical segmentation tasks.
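The confidence-selection step of reliable pseudo-labeling, multiple forward passes followed by agreement and confidence filtering, can be sketched generically: a pixel receives a pseudo label only if all passes agree on the class and the mean confidence clears a threshold, and that confidence then weights the loss. Function names and the threshold are illustrative assumptions, not the paper's exact procedure.

```python
def select_pseudo_labels(prob_rounds, conf_thresh=0.9):
    """prob_rounds[r][p] = class-probability vector for pixel p in forward pass r.
    A pixel is pseudo-labeled only when every pass agrees on the argmax class and
    the mean confidence clears the threshold; that confidence becomes its loss weight."""
    n_pixels = len(prob_rounds[0])
    selected = {}  # pixel index -> (label, weight)
    for p in range(n_pixels):
        preds = [max(range(len(r[p])), key=r[p].__getitem__) for r in prob_rounds]
        confs = [max(r[p]) for r in prob_rounds]
        mean_conf = sum(confs) / len(confs)
        if len(set(preds)) == 1 and mean_conf >= conf_thresh:
            selected[p] = (preds[0], mean_conf)
    return selected
```

Pixels where the passes disagree, or where confidence is low, simply contribute nothing, which is how pseudo-label noise is suppressed.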
  • Chunying Luo, Shifei Ding, Jian Zhang, Xuan Li, Wei Du
    Accepted: 2026-01-27
    Although value decomposition methods are widely used in multi-agent reinforcement learning, bias propagation caused by bootstrapping and maximization often leads to overestimation of Q-values, causing agents to get stuck in suboptimal strategies and resulting in significant fluctuations between success and failure in training. Traditional exploration strategies struggle to address this issue because they cannot guide agents out of suboptimal strategies. To solve this, we propose the Quest method, which dynamically underestimates Q-values to break the balance of suboptimal convergence in training and uses asymmetric bias to guide agents in more effective policy search. The key contribution of this paper is overcoming the limitations of traditional methods that optimize Q-network decomposition to improve performance. We propose an external intervention mechanism that dynamically guides the agent's exploration, bypassing the bottleneck of complex decomposition structures and effectively enhancing the agent's performance. We tested Quest in the StarCraft II multi-agent scenarios, where it showed a 130% improvement in robustness against suboptimal convergence and a 190% increase in win rate in complex scenarios like 6h_vs_8z. These results show that Quest improves exploration, stability, and performance in multi-agent scenarios.
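The external-intervention idea described above, dynamically underestimating bootstrapped Q-values to break out of suboptimal convergence, can be sketched as a pessimism term subtracted inside the TD target and scheduled from outside the Q-network. This is a hedged illustration of the mechanism as the abstract describes it; `quest_target` and `schedule_pessimism` are hypothetical names and the schedule is an assumption.

```python
def quest_target(reward, next_qs, gamma=0.99, pessimism=0.0):
    """Standard max-Q bootstrap target, minus an externally scheduled pessimism
    term that deliberately underestimates the next-state value."""
    return reward + gamma * (max(next_qs) - pessimism)

def schedule_pessimism(pessimism, win_rate_stalled, step_up=0.1, decay=0.99):
    """External intervention: raise the underestimation while training stagnates,
    decay it back toward zero once progress resumes."""
    return pessimism + step_up if win_rate_stalled else pessimism * decay
```

Because the bias is asymmetric (always downward), it counteracts the overestimation introduced by bootstrapping and maximization rather than compounding it.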
  • Xiaoning Wu, Siqi Du, Yaxin Luopan, Rui Han, Ke Qiu, Haiting Hou, Zhen Chen, Yu Zhao, Shuo Wang
    Accepted: 2026-01-23
    With the rapid informatization of maritime defense, federated learning becomes crucial for secure distributed data sharing. However, maritime defense sites face two challenges: 1) significant discrepancies in local data and tasks, leading to negative knowledge transfer, and 2) difficulty sharing knowledge due to the limited communication bandwidth of remotely located maritime defense sites when continuously updating local models. Existing federated continual learning (FCL) methods primarily focus on extracting and aggregating the latest local model, yet lack in-depth research at the task level. This paper proposes FedTask, a task-grained FCL method designed to adapt to multimodal maritime defense data (images, text, tabular data) while supporting task-level knowledge extraction and compact weight compression for narrow-bandwidth links. Through identification of similar task pairs among multiple clients (coastal defense sites), FedTask also enables positive knowledge transfer in both centralized and decentralized scenarios. Meanwhile, a unified architecture for federated continual learning is established. The system provides an integrated data processing pipeline and a scalable algorithm container, supporting low-code collaborative training across distributed nodes. Evaluations on real maritime defense datasets demonstrate that, compared to nine state-of-the-art FCL methods, FedTask improves accuracy by 23% and 5% in centralized and decentralized scenarios, respectively, and reduces communication overhead by 75% and 85% under equivalent training time. The code is available at: https://github.com/LINC-BIT/FCLOnMDefenseData.
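Identifying similar task pairs before transferring knowledge can be sketched as thresholded similarity over task embeddings: only pairs above the threshold exchange knowledge, which is one simple way to avoid negative transfer between dissimilar tasks. The embedding source, the site/task names, and the threshold below are illustrative assumptions, not FedTask's actual criterion.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def similar_task_pairs(task_embeddings, threshold=0.8):
    """All task pairs whose embedding similarity clears the threshold;
    only these pairs would exchange knowledge."""
    names = list(task_embeddings)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if cosine(task_embeddings[a], task_embeddings[b]) >= threshold]
```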
  • Xuanxuan Ou, Wenhan Hou, Xinhao Hu, Jindong Zhao
    Accepted: 2026-01-23
    Federated learning improves the ability to protect data security by locally training models on distributed devices and sharing parameter updates, but it still faces challenges such as privacy leakage and malicious client attacks. Traditional secure aggregation protocols suffer from client privacy leakage caused by server inference attacks. On the other hand, poisoning attacks by malicious clients can lead to performance degradation, slowed convergence, and even targeted misclassification by the global model. To address these issues, a bidirectional defense federated learning framework is proposed: clients upload the fully connected layers from their local models together with the ciphertext of local model parameters obtained after adding noise. The fully connected layer parameters are used by the server to detect poisoning attacks, while the encrypted model parameters prevent server inference attacks. The noise added by clients automatically cancels out during the server's aggregation operation, ensuring the correctness of the aggregation result without affecting its accuracy. Under this unified framework, both client privacy protection and defense against malicious client poisoning attacks are achieved. Analysis shows that using an optimized elliptic curve key generation algorithm achieves privacy protection with reduced overhead. Meanwhile, experiments demonstrate that this method achieves high detection accuracy for four types of poisoning attacks on the MNIST and CIFAR-10 datasets, with a false positive rate below 6%. This provides an efficient and robust bidirectional privacy and security defense solution for federated learning.
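The noise-cancellation property described above matches the classic pairwise-masking construction for secure aggregation: each client pair shares a mask that one adds and the other subtracts, so the masks vanish in the server's sum while individual uploads stay hidden. A minimal sketch; the string-seeded mask derivation below stands in for the paper's elliptic-curve key agreement.

```python
import random

def masked_updates(updates, seed=42):
    """For each ordered client pair (i, j) with i < j, derive a shared pseudorandom
    mask; client i adds it and client j subtracts it, so the masks cancel in the sum."""
    n, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            # Stand-in for a pairwise shared secret established via key agreement.
            rng = random.Random(f"{seed}-{i}-{j}")
            mask = [rng.uniform(-1, 1) for _ in range(dim)]
            for d in range(dim):
                masked[i][d] += mask[d]
                masked[j][d] -= mask[d]
    return masked

def aggregate(masked):
    """Server-side sum of the masked vectors equals the sum of the true updates."""
    return [sum(col) for col in zip(*masked)]
```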
  • QIN Na, SONG Menghao, LIU Yuan, ZHAO Yijing
    Accepted: 2026-01-23
    As important intellectual property, deep learning models require secure and trustworthy deployment, which is of great significance for promoting the application and innovation of artificial intelligence. Current model watermarking methods based on the response of a single trigger set have security flaws: first, the static decision boundary of the trigger set is vulnerable to adversarial perturbation attacks, resulting in a sudden drop in verification accuracy; second, watermarking mechanisms based on model features are highly sensitive to dynamic reconstruction of the model structure and high-proportion parameter pruning. A double cross-validation watermarking (DCVW) framework is proposed for these problems. First, an adversarial trigger sample set synthesized by projected gradient descent is adopted as the first watermark. Then, a deep feature extraction network is used to obtain high-order implicit representations of the model's application scenarios, and a Bloom filter is utilized to generate a dynamic hash chain that constructs the second watermark; the watermarks and keys are deposited with a third-party institution as the model fingerprint. During verification, an ownership claim must simultaneously satisfy trigger-set response matching and zero-watermark hash similarity of the model features. Robustness evaluations demonstrate that the DCVW scheme effectively preserves model accuracy even under 75% structured pruning. Under ambiguity attacks, the bit difference rate and false detection rate improve by 5.15% and 1.04%, respectively, compared with contrast algorithms. The double cross-validation mechanism enhances the unforgeability of the model watermark, offering a reliable solution for copyright protection of deep learning models.
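The Bloom filter used in building the second watermark is a standard structure: k hash probes per item into an m-bit array give a compact, fast membership test with a tunable false-positive rate and no false negatives. A minimal generic sketch; how the paper chains the filter outputs into a dynamic hash chain is not reproduced.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k salted SHA-256 probes per item into an m-bit array."""
    def __init__(self, m=1024, k=4):
        self.m, self.k, self.bits = m, k, 0

    def _probes(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._probes(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        # No false negatives; false positives occur with tunable probability.
        return all(self.bits >> p & 1 for p in self._probes(item))
```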
  • Hanbing Xue, Chen Ni, Yuying Li, Jia Guan, Kai Fang, Wenqian Cui
    Accepted: 2026-01-23
    Entity redundancy, where multiple nodes represent the same real-world entity due to heterogeneous data sources or extraction errors, severely impacts knowledge graph quality and utility. To address entity canonicalization within a single knowledge graph, we propose a two-stage framework named CRGC-SRM. The core innovations are threefold: (1) A Contrastive Representation-Guided Clustering (CRGC) method that combines dual-view (context and definition) contrastive learning and employs the Minimum Description Length (MDL) principle for adaptive hierarchical clustering, eliminating manual thresholding; (2) A Submodular Redundancy Minimization (SRM) algorithm that models representative selection as submodular coverage maximization under partition matroid constraints, explicitly balancing knowledge coverage and redundancy with theoretical approximation guarantees; (3) Task-specific enhancements including type-consistency penalty and hard-negative mining to mitigate over-merging of polysemous entities. Experiments on multiple public and internal datasets demonstrate that CRGC-SRM improves clustering quality by approximately 2.7 percentage points over the strongest baselines, subsequently reducing entity redundancy from 29.7% to 7.8% on average (a relative redundancy reduction of 73.7%) while maintaining ≥98% knowledge coverage. Furthermore, it significantly improves query performance, increasing Mean Reciprocal Rank (MRR) by approximately 15.4%, Hits@1 by approximately 18.5%, and reducing the 95th percentile (P95) query latency by 27.7–35.9%. Our framework offers an efficient, theoretically-grounded, and practical solution for single-graph entity canonicalization.
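The SRM step rests on a standard property: coverage functions are monotone submodular, so greedily picking the representative with the largest marginal coverage achieves a (1 - 1/e) approximation to the optimal coverage under a cardinality constraint. A minimal sketch of that greedy loop, with hypothetical entity and fact names (the matroid-constrained variant and redundancy term are not reproduced).

```python
def greedy_representatives(candidates, k):
    """Greedy maximization of the coverage function F(S) = |union of facts covered by S|.
    candidates: dict mapping entity name -> set of facts it covers."""
    chosen, covered = [], set()
    pool = dict(candidates)
    while pool and len(chosen) < k:
        # Pick the entity with the largest marginal coverage gain.
        best = max(pool, key=lambda name: len(pool[name] - covered))
        if not pool[best] - covered:
            break  # remaining candidates add nothing new
        chosen.append(best)
        covered |= pool.pop(best)
    return chosen, covered
```

Note how the redundant entity is never selected: once its facts are covered, its marginal gain is zero.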
  • Wang Yu, Yang Jun
    Accepted: 2026-01-22
    Mobile Edge Computing (MEC) serves as a key technology to meet the low-latency and low-energy requirements of computation-intensive applications by offloading tasks from user devices to nearby edge servers. However, in heterogeneous multi-server environments, traditional heuristic methods and single-agent Deep Reinforcement Learning (DRL) algorithms suffer from disconnection between perception and decision-making, difficulty in learning high-dimensional action spaces, and inefficiency in constraint handling, resulting in slow convergence and poor adaptability. To address these issues, an Efficient Coupled Collaborative Computing Multi-Agent Deep Reinforcement Learning framework (ECCC-MADRL) is proposed to optimize task offloading and resource allocation in heterogeneous multi-server scenarios. The proposed framework adopts a dual-agent collaborative architecture composed of client and master agents. It integrates an efficient coupled feature extraction module to capture the multidimensional correlation between task and resource features and employs a Per-action DQN decision mechanism to decompose high-dimensional combinatorial actions, enabling dynamic cooperation among multiple users and servers. A “constraint internalization” dimensionality reduction strategy is designed to exclude subchannel identifiers from the state and action spaces, significantly reducing action dimensionality. Furthermore, a heterogeneous multi-server collaborative model is established based on feature matching and load balancing mechanisms to achieve dynamic cross-server resource scheduling. Experimental results show that ECCC-MADRL achieves a 30–37% improvement in reward performance and reduces task deadline violations by 25–55% compared with MAPPO and MADDPG-based baselines across multiple representative scenarios. In energy-constrained settings, it further decreases battery-level violations by around 40%, demonstrating clear advantages in convergence, efficiency, and robustness. 
The findings indicate that the ECCC-MADRL framework provides an efficient and robust solution for task offloading in heterogeneous edge environments and offers valuable insights for the design and optimization of intelligent edge computing systems.
  • Huang Chen, Gao Yiyu, Jiang Cuiling, Wan Yongjing
    Accepted: 2026-01-21
    Fine-grained semantic segmentation plays an important role in obtaining accurate object boundaries. However, in real imaging scenarios, object edge regions often appear blurry, and insufficient modeling of edge features easily leads to inaccurate details in the segmentation results. Existing methods usually fail to pay sufficient attention to edge regions or rely on extra steps to extract edge information, which increases processing complexity. To this end, this paper proposes an edge-aware semantic segmentation network, EAM-UNet (Edge Aware Mamba-UNet). The network uses an improved Visual Mamba to capture long-range dependencies and reduces the computation of existing Visual Mamba modules through a bidirectional dilated selective scanning mechanism. It then uses a spatially guided dynamic upsampling module to dynamically control the upsampling process in edge regions and ensure accurate segmentation of edge details. Meanwhile, the network introduces an edge aggregation–aware module that extracts and aggregates edge features from semantic features and enhances representation in edge regions. Experimental results show that EAM-UNet performs excellently in scenarios that require high edge segmentation accuracy. On the medical image segmentation datasets ISIC 2017 and ISIC 2018, the method achieves mIoU of 82.52% and 84.07%, accurately depicting lesion boundaries and helping improve diagnostic reliability. On the industrial eyeglass-frame segmentation dataset GIS, the method achieves an mIoU of 98.37% and significantly improves the reliability of virtual try-on for frames. In addition, the method also outperforms existing approaches on Boundary IoU, a metric that focuses on edge segmentation quality.
  • MENG Kun, LI Mingxing, DING Jianwen, ZHOU Huachun
    Accepted: 2026-01-16
    With the in-depth digital transformation of enterprises, more and more enterprises have migrated their core businesses to the cloud. While the elastic scaling and "pay-as-you-go" features of cloud computing have significantly improved operational efficiency, risks such as natural disasters, cyberattacks, human operational errors, and hardware failures have also intensified. Once these risks materialize, they lead to cloud-based business interruptions and the loss of critical data, causing huge economic losses to enterprises. Therefore, Cloud Disaster Recovery (CDR) technology has become a core link in ensuring the stability of enterprises' information technology architectures and business continuity. CDR technology has gone through multiple stages of evolution, from early high-cost on-premises tape backups and self-built data centers, to gradual exploration combined with virtualization technology, and now to diversified disaster recovery solutions based on cloud computing. It has spawned various business types such as cloud-based, hybrid cloud, and multi-cloud, with obvious differences among them in technical index requirements such as RTO (Recovery Time Objective) and RPO (Recovery Point Objective). However, systematic sorting and integrated research on the CDR technology system in the current industry are still relatively insufficient. Based on the current development status of CDR, this paper first sorts out the key nodes in its development history, clarifies the core concepts of backup and disaster tolerance, and outlines typical application scenarios in finance, manufacturing, medical care, etc. Then, it focuses on the mainstream "two locations and three centers" architecture to deeply analyze the research progress of key technologies such as data synchronization, distributed consistency verification, and fault detection.
Finally, it summarizes existing challenges such as heterogeneous cloud resource synchronization and cluster split-brain recovery, and points out future research directions such as intelligent fault prediction using AI, providing technical references for enterprises to formulate disaster recovery and cloud migration strategies.
  • WANG Mingjun, LI Chaofeng
    Accepted: 2026-01-16
    The openness and broadcast nature of wireless communication make transmitted signals vulnerable to illegal eavesdropping and malicious attacks. Physical layer security becomes an effective way to enhance confidentiality, but existing methods have limited key space, fixed encryption parameters, and insufficient secrecy performance, so they cannot counter stronger attackers. To improve secure transmission at the physical layer, this paper proposes a novel multi-domain joint modulation secure communication method with dynamic control by a three-dimensional chaotic map. The method introduces sine, cosine, and exponential nonlinear terms into the Hénon map and constructs a three-dimensional Hénon-Sine-Cosine-Exponent chaotic mapping (3D-HSCE). The paper verifies its chaotic properties by bifurcation diagrams, Lyapunov exponent spectra, and phase portraits. The method uses polarization modulation (PM) to expand the modulation dimension and combines it with the Generalized Multi-Parameter Weighted Fractional Fourier Transform (GMPWFRFT) to scramble the signal constellation. The method applies 3D-HSCE to encrypt the constellation amplitude and phase and designs a state-feedback update mechanism for the initial values to dynamically regulate the key parameters. The system thus achieves dynamic constellation encryption. Simulation results show that the encrypted signal exhibits Gaussian-like characteristics. The key space reaches 2³²⁶ and the scheme shows high sensitivity and strong security. Even when the eavesdropper key differs from the correct key by 10⁻¹⁶ or 10⁻¹⁵, the bit error rate remains close to 0.5 and no valid information is recoverable. The method significantly strengthens resistance to parameter-scanning and modulation-recognition attacks. Compared with the 2D-HCE-DDL-GMPWFRFT method, the key space increases by 2¹⁰⁸ times. When SNR = 10 dB, the system bit error rate decreases by about two orders of magnitude.
Under three parameter conditions, the system secrecy capacity increases by about 332.5%, 45.8%, and 6.7%. The proposed method therefore enables secure transmission at the physical layer.
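The key-sensitivity claim, that a key differing by 10⁻¹⁵ yields a bit error rate near 0.5, rests on the exponential divergence of chaotic orbits. The sketch below demonstrates that property on the classic 2D Hénon map, since the 3D-HSCE equations are not given in the abstract and are not reproduced here: two orbits starting 10⁻¹⁵ apart separate to macroscopic distance within a few hundred iterations, while identical keys stay identical.

```python
def diverged(s0, s1, steps=300):
    """Iterate two orbits of the classic Hénon map x' = 1 - 1.4*x^2 + y, y' = 0.3*x,
    and return the maximum x-separation observed along the way."""
    (x1, y1), (x2, y2) = s0, s1
    sep = 0.0
    for _ in range(steps):
        x1, y1 = 1 - 1.4 * x1 * x1 + y1, 0.3 * x1
        x2, y2 = 1 - 1.4 * x2 * x2 + y2, 0.3 * x2
        sep = max(sep, abs(x1 - x2))
    return sep
```

This is exactly why a near-miss key cannot progressively "home in" on the correct decryption: below the sensitivity scale the output is indistinguishable from a wrong key.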
  • DING Tongguang, DU Shengdong, ZHAO Han, QIAN Wei, GUO Chushan, LIU Fan
    Accepted: 2026-01-14
    Short-term photovoltaic power forecasting is the foundation for optimizing power dispatch in power systems, and its accuracy directly impacts overall system efficiency. However, photovoltaic power, affected by multiple meteorological factors, exhibits short-term volatility and randomness, posing severe challenges to high-precision forecasting. In recent years, with the rapid development of deep learning, it has shown excellent ability in mining intrinsically correlated features, offering a promising technical path for precise forecasting. This paper systematically reviews the application of deep learning in short-term photovoltaic power forecasting. First, the typical application paradigms of mainstream deep learning models are described. Then, the application status and performance of deep neural networks, such as convolutional neural networks, recurrent neural networks, Transformer, and graph neural networks, are discussed under the ideal scenario of sufficient available data that is static and free of incremental changes. Furthermore, the strategies and progress of deep neural networks based on technologies such as deep data augmentation, transfer learning, federated learning, and online learning are analyzed in addressing real-world challenges such as data scarcity and limited access. Finally, the challenges regarding robustness, generalization, and adaptation faced by existing research are discussed, along with future research routes from the perspectives of forecasting architecture, optimization strategies, and so on.
  • Huang Ying, Ouyang Peng, Hou Yingwei, Wu Weigang
    Accepted: 2026-01-14
    In real cloud-edge-end hierarchical federated learning scenarios (such as Internet of Vehicles environments), terminal devices switch edge servers as their physical location changes. After a move, the gradient a device uploads to its newly connected edge server is based on a different model version from a different edge node, which biases aggregation results and degrades training efficiency. Existing studies only consider reallocating devices to edge nodes after global aggregation, while traditional asynchronous federated learning algorithms struggle to adapt to frequent device movement and to solve the cross-domain problem of mobile terminals. To address this, this research proposes FedSAQ (Federated Learning with Staleness-aware Adaptive Quantization), a mobility-oriented hierarchical federated compression algorithm for cloud-edge-end architectures. First, a staleness coefficient is computed from the round difference between when a terminal device downloads the model and when it uploads its gradient, measuring the gap between the local model and the edge node's model; the aggregation weight at the edge server is then adjusted according to this coefficient. A staleness-based adaptive quantization gradient compression algorithm is further adopted, which effectively exploits cross-domain training results while reducing communication overhead. Compared with benchmark algorithms, model accuracy improves by 0.6%-5%, and communication overhead is reduced by up to 50%.
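The staleness coefficient and weight adjustment described above can be sketched generically: staleness is the round gap between model download and gradient upload, and the edge server scales each update's aggregation weight by a decreasing function of it, so stale cross-domain updates still contribute but proportionally less. The weight function (1 + s)^(-alpha) below is an illustrative choice, not necessarily FedSAQ's.

```python
def staleness(download_round, upload_round):
    """Rounds elapsed between when a device fetched the model and when its gradient
    arrives; this grows when the device roams across edge servers mid-round."""
    return max(0, upload_round - download_round)

def aggregate_with_staleness(updates, alpha=0.5):
    """updates: list of (vector, staleness). Weighted average with weights
    (1 + staleness)^(-alpha): fresher updates dominate, stale ones are damped."""
    weights = [(1 + s) ** (-alpha) for _, s in updates]
    total = sum(weights)
    dim = len(updates[0][0])
    return [sum(w * u[d] for (u, _), w in zip(updates, weights)) / total
            for d in range(dim)]
```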
  • WU Jingjing, JIA Xiangdong
    Accepted: 2026-01-14
    The inherent broadcast nature of wireless signals makes communication systems vulnerable to eavesdropping and interference threats. This is especially true in Integrated Sensing and Communication (ISAC) systems, where radar detection signals often convey sensitive information from communication users, complicating security concerns even further. To tackle this issue, this paper investigates a secure ISAC system supported by an Active Reconfigurable Intelligent Surface (RIS) and proposes a joint secure transmission mechanism that employs artificial noise (AN) to enhance physical layer security. In multiuser communication and sensing scenarios with a potential eavesdropper, we achieve coordinated optimization of communication, sensing, and security by jointly designing the base station (BS) precoding, AN interference signals, and RIS reflection coefficients. This approach minimizes the Cramér–Rao bound (CRB) for parameter estimation while satisfying the communication Quality of Service (QoS) requirements for legitimate users. To address the resulting multivariate coupled nonconvex problem, an efficient algorithm based on Alternating Optimization (AO), Semidefinite Relaxation (SDR), and Majorization Minimization (MM) is proposed, which transforms the original problem into two subproblems for more efficient solution. Simulation results indicate that, compared to the AN-free scheme, the proposed approach reduces the Signal-to-Interference-Plus-Noise Ratio (SINR) by approximately 20.3 dB (a decrease of about 99.07%) while sacrificing only around 0.5 dB (approximately 11.28%) in sensing performance, achieving a balance between security and sensing accuracy. Additionally, the active RIS-assisted approach markedly enhances security performance over passive RIS, validating the effectiveness of "enhancing physical layer security through perception" and providing a practical method for the secure design of ISAC systems.
  • Zhang Anqin, Li Zijian, Xue Mei
    Accepted: 2026-01-13
    With the increasing frequency and stealth of network attacks, traditional defense mechanisms struggle to identify unknown threats in a timely manner. Network intrusion detection, as a core component of cybersecurity, enables early anomaly identification and alerting, playing a crucial role in building intelligent and proactive defense systems. Existing intrusion detection methods still face limitations in modeling higher-order topological dependencies, coordinating global and local information, and maintaining robustness under adversarial perturbations, making it challenging to balance detection accuracy and generalization capability. To address these issues, this paper proposes a Multi-Scale Graph Diffusion Contrastive Learning-based Network Intrusion Detection model (MGDCL-IDS). The model establishes a task-oriented multi-scale graph representation learning framework and designs feature enhancement and information coordination mechanisms tailored to the characteristics of attack patterns. By leveraging topology-aware feature optimization and hierarchical contrastive learning, MGDCL-IDS achieves unified structural and semantic representations with high robustness. On a private real-world network intrusion dataset, the model achieves an accuracy of 98.57%, F1-score of 98.68%, precision of 98.41%, and area under the ROC curve (AUC) of 98.75%. On the NF_CSE_CIC_IDS2018 dataset, it outperforms recent methods, improving accuracy by 2.21%, F1-score by 2.08%, precision by 1.79%, and AUC by 0.74%. Experimental results demonstrate that MGDCL-IDS effectively enhances higher-order dependency modeling and structural robustness, achieving superior detection accuracy and false positive control, and providing a viable solution for building efficient and reliable intrusion detection systems.
  • Yan Yue, Xu Guiqiong, Li Weimin
    Accepted: 2026-01-13
    In recent years, traditional graph neural networks have been widely applied to rumor detection tasks, with their core advantage lying in effectively capturing the structural characteristics of rumor propagation. However, most existing models focus only on the explicit interactions between tweets during propagation, neglecting the deep interaction between users and tweets as well as the semantic relationships of tweet content, which restricts further improvement of rumor detection performance. To address these issues, this paper proposes a dual-hypergraph neural network model, UPST-HGNN, which integrates user, propagation, semantic, and temporal features. Specifically, the model first incorporates user features and high-order propagation features to construct a "user–propagation" hypergraph. At the same time, tweet semantic features are introduced and semantic similarity is calculated to construct a "semantic–temporal" hypergraph. Hypergraph convolutional networks combined with graph attention networks are utilized to extract hypergraph features, and feature representations are dynamically fused based on attention mechanisms. Finally, the fused feature vector is input into a classifier to complete rumor detection. Experimental results show that the UPST-HGNN model achieves accuracy of 86.27% and 94.10% on the publicly available PHEME and WEIBO datasets, respectively, 1.67% and 2.8% higher than the best baseline model. These results confirm that the model can more comprehensively capture rumor-related information and better understand the diversity and complexity of the propagation process, effectively enhancing detection performance and providing new insights for rumor detection research.
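Building the "semantic-temporal" hypergraph from pairwise semantic similarity can be sketched as follows: each tweet anchors a hyperedge joining every tweet whose embedding similarity clears a threshold. Cosine similarity and the threshold are illustrative assumptions; the paper's actual construction also incorporates temporal features, which are omitted here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two tweet embeddings."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def semantic_hyperedges(embeddings, threshold=0.7):
    """One hyperedge per anchor tweet, joining it with every tweet whose semantic
    similarity clears the threshold; singleton edges are dropped and duplicates merged."""
    edges = set()
    for i, u in enumerate(embeddings):
        members = [j for j, v in enumerate(embeddings)
                   if i != j and cosine(u, v) >= threshold]
        if members:
            edges.add(tuple(sorted([i] + members)))
    return sorted(edges)
```

Unlike ordinary graph edges, each resulting hyperedge can connect more than two tweets at once, which is what lets a hypergraph capture group-level semantic relations.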
  • Sun Jing, Shang Kefeng, Meng Lichao, Wu Kangkai, Li jingjing
    Accepted: 2026-01-13
    As mega-constellations gradually become the core infrastructure of space–air–ground integrated networks, their resource scheduling faces multiple challenges, including high-dimensional constraints, dynamic task allocation, and multi-objective optimization. Intelligent scheduling methods in this field can be broadly classified into three categories: model-driven approaches, heuristic algorithms, and methods based on deep learning and reinforcement learning. Model-driven approaches leverage tools such as mixed-integer programming and graph-theoretical modeling to construct optimization models, describing the constraints and objective functions of resource scheduling through precise mathematical formulations. These methods can provide theoretically optimal solutions in static scenarios but suffer from exponential growth in computational complexity as the problem size increases, making them difficult to apply to large-scale dynamic scheduling. Heuristic algorithms, inspired by biological mechanisms, can rapidly generate approximate solutions and demonstrate high efficiency and flexibility in handling medium-scale problems. However, the quality of solutions is sensitive to parameter settings, and they generally lack guarantees of global optimality. Deep learning and reinforcement learning methods, driven by data and interactive learning mechanisms, can extract hidden patterns from massive scheduling datasets and continuously optimize decision strategies through agent–environment interaction. These approaches exhibit unique advantages in complex scenarios such as dynamic topologies and unexpected tasks. Nevertheless, they are highly dependent on training data, and the interpretability of their decision processes remains limited. Current research still falls short in areas such as cross-layer collaborative scheduling, robustness optimization, and heterogeneous resource integration. 
Future efforts should further explore multimodal learning and adaptive decision-making mechanisms, driving mega-constellation resource scheduling toward greater intelligence, efficiency, and reliability, and providing technological support for the large-scale deployment and application of space–air–ground integrated networks.
  • Kuo Du, Junfen Chen, Boyun Xie, Yan Li
    Accepted: 2026-01-13
    To address the challenges of large errors in 2D keypoint detection and the insufficient ability to model the spatial structural relationships among different joints, we propose a 3D human pose estimation model based on Graph Convolutional Cross-Fusion Attention (GCCFA). The model first introduces a GCN module to capture skeletal topology information from both local and global perspectives, thereby enhancing the structural constraints and representation capability among joints. Then, a learnable query fusion module is incorporated to dynamically select and fuse keypoint features through cross-attention, improving feature discriminability and robustness. Finally, a Transformer-based bone length correction post-processing method is proposed to adaptively learn the distribution of bone lengths from training data, refining the initial 3D estimations and effectively mitigating pose deviations caused by 2D detection errors. Experiments on the Human3.6M dataset demonstrate that, after bone length correction, our model achieves a P1 error of 38.4 mm and a P2 error of 30.4 mm, reaching state-of-the-art performance. Additional evaluations on the MPI-INF-3DHP dataset further verify the effectiveness of the proposed method.
  • PANG Ruixiang, WAN Li, ZHANG Zhi, WU Lulu, ZHOU Youlong
    Accepted: 2026-01-09
    To address the demand for multi-algorithm adaptability, a reconfigurable application-specific instruction set processor is designed to efficiently support block ciphers and hash functions. The architecture adopts a Very Long Instruction Word (VLIW) structure, combined with symmetric clustered execution units and a cross-cluster register access mechanism, enabling parallel processing of logic operations, shifts, and table lookups. In instruction set design, fused logic–lookup instructions, multi-mode shift instructions, and vector operations are introduced to reduce pipeline stalls and enhance instruction density. The pipeline is organized into three stages—fetch, decode, and execute—while a bypass mechanism is employed to resolve data hazards and shorten the critical path. In algorithm mapping and optimization, block ciphers such as SM4 and AES leverage T-box lookups and four-cluster parallel scheduling, reducing each round to 4 and 7 cycles, respectively; hash functions such as SHA-256 and SM3 utilize multi-mode shift and fused Boolean logic instructions, achieving 8 cycles per round; SHA-3 is mapped through a three-phase strategy that reorganizes its five steps into three pipelined stages, effectively mitigating dependency-induced stalls. For hardware implementation, synthesis is carried out on the Xilinx Kintex-7 FPGA (XC7K325TFFG676-2), consuming 11,105 look-up tables (LUTs), 1,564 flip-flops (FFs), and 25 block RAMs (BRAMs), operating at a frequency of 125 MHz. Under these conditions, the processor achieves throughputs of 125 Mbps for SM4, 228.6 Mbps for AES, 125 Mbps for SHA-256, 125 Mbps for SM3, and 75.6 Mbps for SHA-3. The experimental results demonstrate that this architecture achieves unified acceleration of multiple algorithms with low resource overhead, outperforming general-purpose processor extensions, while offering high flexibility and scalability.
  • Xunsheng Ji, Kaixuan Fu
    Accepted: 2026-01-09
    Time series forecasting has been widely applied in fields such as finance, meteorology, and transportation. In multi-resolution forecasting scenarios, the demand for predictions at different temporal granularities is increasingly prominent. Traditional deterministic models struggle to capture the uncertainty of future sequences, while existing generative models, such as variational autoencoders, often suffer from limitations in generation quality and modeling flexibility. To address these challenges, we propose a Covariate-Conditioned Diffusion Model for Multi-Resolution Time Series Forecasting (MrC²DM). The model takes historical time series and future covariates as conditional inputs, introduces a resolution-category embedding to control prediction granularity, and employs a diffusion-based generative mechanism to progressively denoise and reconstruct future sequences from noise, thereby enabling uncertainty modeling and high-quality forecasting of future dynamics. Experimental results show that MrC²DM outperforms the best deterministic baselines by 5.4% (MAE) and 13.5% (MSE), and surpasses the best generative models by 28.1% (CRPS) across seven public datasets. Moreover, MrC²DM maintains higher stability and generalization ability in cross-resolution forecasting tasks.
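The diffusion mechanism such models rely on starts from the standard DDPM forward process. The sketch below (the noise schedule and toy sequence are illustrative assumptions, not the paper's model) shows the closed-form noising step x_t = √ᾱ_t·x₀ + √(1−ᾱ_t)·ε that a learned reverse denoiser is trained to invert:

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    """Closed-form DDPM forward step: sample x_t directly from x_0.
    x0: (T,) a future sequence; t: timestep index; betas: noise schedule."""
    alpha_bar = np.cumprod(1.0 - betas)[t]   # cumulative signal-retention factor
    eps = rng.standard_normal(x0.shape)      # Gaussian noise added at this step
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps, alpha_bar

rng = np.random.default_rng(0)
x0 = np.sin(np.linspace(0.0, 6.28, 24))   # toy "future sequence" to be modeled
betas = np.full(100, 0.02)                # illustrative constant schedule
xt, eps, ab = forward_noise(x0, 50, betas, rng)

# Given the true eps, x0 is exactly recoverable -- the identity a denoiser
# approximates by predicting eps from (xt, t, conditioning):
x0_hat = (xt - np.sqrt(1.0 - ab) * eps) / np.sqrt(ab)
```

In a conditional forecaster, the denoising network would additionally receive the history and future covariates as conditioning when predicting the noise.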
  • Yang Chunxia, Wang Yulong , Wang Xin'ao
    Accepted: 2026-01-06
    With the rapid advancement of urbanization and industrialization, air pollution has become an increasingly severe issue. Accurate prediction of the Air Quality Index (AQI) is of great significance for public health and environmental protection. However, existing spatiotemporal graph neural network-based methods for air quality prediction still exhibit notable limitations. On one hand, due to structural constraints, these models struggle to effectively capture the influence of other stations on target stations through complex spatiotemporal propagation paths over long-term historical data. On the other hand, current dynamic graph learning approaches primarily rely on short-term sequences, failing to extract more representative spatial dependency patterns from long-term observational data. To address these issues, this paper proposes a Spatiotemporal Context-Aware Graph Network (ST-CAGN). The model incorporates a long-sequence spatiotemporal context extraction module based on a pre-trained encoder, which encodes lengthy historical data into low-dimensional representations rich in semantic information and efficiently captures long-range spatiotemporal dependencies across stations. Additionally, a multi-scale dynamic graph learning mechanism based on long sequences is introduced to overcome the limitations of constructing dynamic graphs solely from short-term sequences. This mechanism extracts steady-state spatial dependency features from low-dimensional representations of long-term historical sequences and adaptively integrates them with transient spatial correlations captured from recent fluctuations, thereby more accurately modeling the complex dynamic spatial dependencies between stations. Experimental results demonstrate that ST-CAGN significantly outperforms mainstream baseline models on three real-world air quality datasets. 
For 6-hour, 12-hour, and 24-hour prediction tasks, the MAE decreased by an average of 4.19%, 5.47%, and 6.53%, respectively, while the RMSE was reduced by an average of 2.10%, 3.14%, and 3.95%, validating the effectiveness and superiority of the proposed model in long-sequence spatiotemporal forecasting tasks.
  • WANG Qijun , LIU Qingcheng , GU Yang , YU Yanheng
    Accepted: 2026-01-06
    Object detection in Unmanned Aerial Vehicle (UAV) imagery is severely challenged by tiny object sizes and strong background clutter, where traditional algorithms often suffer from feature degradation and information loss, leading to a decline in accuracy. To address these challenges, this paper proposes a tiny object detection algorithm based on hybrid dynamic reparameterization, termed HDR-YOLO. First, to overcome the limitations of conventional convolutions in extracting features from tiny objects, the C3K2-PC module is reconstructed by incorporating the Pinwheel-shaped Convolution (PConv), which significantly enhances the backbone network's ability to perceive and capture fine-grained details. Second, to tackle the problem of information degradation during multi-scale fusion, this work designs the Hybrid Dynamic Reparameterization Module (HDRep), which achieves high-fidelity multi-scale feature reconstruction through a combination of low-distortion scale transformation and deep feature refinement. Building upon this, a Multi-Scale Feature Fusion Neck (MSFPN) is introduced, which optimizes cross-scale information flow to effectively boost the model's robustness in complex backgrounds. Experimental results on the VisDrone-2019 dataset demonstrate that HDR-YOLO achieves an mAP@50 of 43.7% and an mAP@50:95 of 26.5%, outperforming the YOLOv11n baseline by 10.2% and 7.0%, respectively. Furthermore, experiments on the public AI-TOD dataset and a self-built HVL-Cond dataset validate the superior generalization and stability of the proposed algorithm.
  • ZHAI Jie, MENG Tian-xin, RUAN Tong, LIU Jing-Ping, LI Bin-bin
    Accepted: 2026-01-05
    Online "Light Consultation" decision trees are designed to provide patients who have minor health issues with guidance on appropriate departments, preliminary diagnoses, or treatment suggestions. However, constructing such decision trees based solely on medical literature fails to meet the diverse needs of patients in real-world light consultation scenarios and suffers from delays in reflecting the latest advances in specific disease areas. While medical experts can manually construct these trees, the process is inefficient and lacks standardized representation due to reliance on individual experience. To address these limitations, this paper proposes a novel task: generating decision trees from online light consultation dialogue texts (DTGOLC). For this task, we introduce two methods: a large language model-based approach for light consultation decision text summarization generation (LCDTSG-LLM), and a medical decision path fusion method for decision tree generation (MDPFDT). Our study constructs 5,547 decision paths and generates nearly 30 light consultation decision trees. Finally, these decision trees are integrated as an external knowledge base in a retrieval-augmented generation (RAG) framework. Experimental results demonstrate that the proposed decision trees significantly outperform baseline models in assisting lightweight diagnostic decision-making tasks, achieving an average improvement in F1-score of 27.58% compared to the baseline using original consultation dialogues as the knowledge base.
  • GUO Wei, FAN Zixi, QU Haicheng
    Accepted: 2026-01-05
    This paper proposes a time–frequency collaborative attention algorithm for insulator defect detection in UAV power inspection. The method addresses the high missed-detection rate of small defects caused by large differences in target scales and the low detection accuracy under complex backgrounds. First, a Wavelet Transform Convolution Module (WTCM) is integrated into the backbone network to enlarge the receptive field and enhance the extraction of low-frequency information. Building on this, a Multi-scale Convolutional Attention Augmentation Module (MCAAM) is designed. It combines channel and spatial attention mechanisms to further suppress interference from complex backgrounds. Second, a Frequency-domain Modulation Attention Mechanism (FMAM) is introduced to improve the model’s robustness in complex environments. This mechanism fuses frequency and spatial information, enabling the model to perceive image features more comprehensively and ensuring detection stability. Finally, an Adaptive Weighted Feature Fusion (AWFF) module is designed. It dynamically adjusts feature fusion weights to enhance cross-dimensional feature interaction, which further improves the network's representation capability. Experimental results show that the proposed algorithm achieves 92.4% in mAP50, an improvement of 4.8% over the baseline model. The recall rate for small defects increases by 5.2%. The inference speed (FPS) increases from 112 to 132. Furthermore, the AP values for three defect categories—insulator damage, hammer, and flashover—improve by 7.6%, 1.7%, and 9.8%, respectively. Compared to the original YOLO11n model, the improved model demonstrates superior performance in both detection accuracy and inference efficiency.
  • XU Zhixia, WANG Rui, SHEN Xiaowei, HE Bing, KANG Weijie
    Accepted: 2026-01-05
    Networked radar jamming resource allocation is a typical NP-hard problem and a significant challenge, requiring various optimization algorithms to solve. To address the slow computational speed and poor adaptability of traditional jamming resource allocation optimization algorithms, the progress of intelligent optimization algorithms in this field was reviewed. Firstly, a mathematical model and solution framework for networked radar jamming resource allocation were constructed, the difficulties in solving this model were analyzed, and the clear advantages of intelligent optimization algorithms in terms of computational efficiency, global optimization capability, and robustness were emphasized. Then, taking the genetic algorithm, particle swarm optimization, ant colony optimization, and their various improved variants as typical examples, the implementation processes, solution effectiveness, and strengths and weaknesses of intelligent optimization algorithms in networked radar jamming resource allocation were analyzed in detail. Additionally, the application of fusion algorithms and other bionic or machine learning-based intelligent optimization algorithms in this field was summarized, and the advantages and disadvantages of the various algorithms were compared from aspects such as adaptability, convergence, and global search capability, fully demonstrating the current state of development of intelligent optimization algorithms in this application direction. Finally, in light of the multiple challenges currently encountered in networked radar jamming resource allocation, future development directions of intelligent optimization algorithms were prospected from four aspects: algorithm comparison, optimization speed, fusion innovation, and dynamic adaptability. This review provides a valuable reference for the research and engineering practice of intelligent optimization algorithms in networked radar jamming resource allocation.
  • Hu jing, Zhao xinyu, Peng mingchao
    Accepted: 2026-01-05
    Cross-modal image-text retrieval, a core task in multimodal understanding, faces inherent heterogeneity between images and texts in modal expression, semantic abstraction levels, and structural organization. How to achieve high-precision semantic alignment and bridge the cross-modal gap is a key challenge in current research. To address this, this paper proposes DPNet, an image-text retrieval model based on cross-domain feature decoupling and semantic prototype guidance, aiming to enhance fine-grained image-text matching and retrieval robustness in complex scenarios. The model is designed with frequency-spatial joint decoupling, hierarchical semantic enhancement, and dual-modal interactive attention mechanisms, realizing the structured reconstruction of cross-modal features and the enhancement of discriminative expression. To tackle the modeling flaw that traditional methods struggle to balance spatial structure and frequency-domain texture modeling, the proposed frequency-spatial decoupling module adopts a heterogeneous multi-head attention mechanism. It preserves local spatial semantics while mining global periodic patterns, achieving multi-dimensional collaborative expression of visual features. To compensate for the imbalance between local vocabulary and global semantic alignment, the semantic enhancement module integrates part-of-speech tagging and depthwise separable convolution, guiding the model to focus on key semantic regions and improving its ability to model semantic patterns like factual descriptions and subjective evaluations. Additionally, to address imbalanced training samples and noise sensitivity, the proposed dynamic boundary triplet loss adaptively adjusts the similarity discrimination boundary. Combined with semantic prototype contrastive learning, it further enhances intra-class compactness and inter-class separability. 
Experimental results on Flickr30K and MSCOCO show that, for fine-grained image-text retrieval, the proposed method achieves improvements of 1.0%, 0.1%, and 0.2% in R@1, R@5, and R@10 on Flickr30K, and of 1.4%, 0.6%, and 0.3% on MSCOCO, outperforming existing state-of-the-art methods. This study provides an efficient and feasible solution for high-precision and real-time retrieval in complex cross-modal scenarios.
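An adaptive similarity boundary of the kind the abstract describes can be sketched as a hinge triplet loss whose margin grows with the similarity of the negative pair, so that hard negatives are penalized more strongly. The scaling rule and parameter values below are illustrative assumptions, not DPNet's actual loss:

```python
import numpy as np

def dynamic_margin_triplet_loss(sim_pos, sim_neg, base_margin=0.2, alpha=0.5):
    """Hinge triplet loss on cosine similarities with an adaptive boundary:
    harder negatives (higher sim_neg) are pushed away with a larger margin.
    sim_pos: similarity of the matched image-text pair.
    sim_neg: similarity of the hardest non-matching pair in the batch."""
    margin = base_margin * (1.0 + alpha * np.clip(sim_neg, 0.0, 1.0))
    return float(np.maximum(0.0, margin + sim_neg - sim_pos))

easy = dynamic_margin_triplet_loss(sim_pos=0.8, sim_neg=0.3)  # well-separated pair
hard = dynamic_margin_triplet_loss(sim_pos=0.8, sim_neg=0.7)  # near-boundary negative
```

With a fixed margin, both cases would move toward the same boundary; here the easy negative incurs zero loss while the hard one is penalized with an enlarged margin of 0.27 instead of 0.2.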
  • Ouyang Ling, Li Hui, Lan Ju Long, Wu Jiang Xing
    Accepted: 2026-01-05
    The Dynamic Heterogeneous Redundancy (DHR) architecture uses multi-dimensional dynamic reconfiguration to achieve heterogeneity and redundancy among executors, while closed-loop iteration based on policy adjudication keeps the system dynamically updated, endowing it with inherent security properties and natural proactive defense capabilities. However, DHR usually requires large heterogeneity among executors to prevent attack escape through shared vulnerabilities, and the very differences that heterogeneity introduces can lead to inconsistent application state transitions and inconsistent encrypted outputs across executors, so that the outputs cannot be adjudicated. To address these problems, this paper proposes a hidden-leader distributed consensus algorithm based on distributed consensus theory. The algorithm adopts a relative-time-based program process synchronization method to resolve the out-of-step running states of heterogeneous executors, and a secret-source normalization strategy to eliminate the differences in data encryption and random numbers in the messages of heterogeneous executors. The operating mechanism of the algorithm is described in detail and the algorithm flow is given. Finally, a verification platform is built to test the effectiveness of the algorithm. Test results show that in complex process scheduling scenarios, compared with existing algorithms, this method improves process synchronization success rates by 0.82% and 5.65%, respectively, and achieves correct adjudication of encrypted data. Compared with the ciphertext adjudication method based on encryption and decryption, throughput is improved by approximately 68.38%.
  • Xu Chongcong, Zhou Zhifeng
    Accepted: 2026-01-05
    The diagnosis of scoliosis relies on precise measurement of the Cobb angle. Traditional manual measurement suffers from strong subjectivity, low efficiency, and poor consistency, making it difficult to meet clinical requirements for standardization and efficiency. To solve this problem, this study proposes an automatic Cobb angle measurement method based on a geometric-constraint hybrid-attention SwinUNet (GHA-SwinUNet). The method builds on the U-Net architecture, introduces the Swin Transformer module to enhance global structural modeling, incorporates Mixed Local Channel Attention (MLCA) to improve the perception of local vertebral details, and designs a geometric-constraint post-processing strategy to resolve vertebral adhesion. In the Cobb angle calculation stage, an endplate line-fitting method is adopted to avoid the geometric bias of the traditional midpoint method. The experimental results show that the method achieves excellent segmentation performance on a self-built spinal X-ray dataset: the Dice similarity coefficient (DSC) reached 0.9483, precision 0.9504, and mean Intersection over Union (mIoU) 0.9483, exceeding the DSC of the traditional U-Net by 1.11% and that of MA-Net by 0.27%. Meanwhile, in cross-validation on the public Synapse and AASCE2019 datasets, the model maintained stable performance (DSC values of 0.9512 and 0.9425, respectively). The intraclass correlation coefficient (ICC) between automatic and manual Cobb angle measurements exceeds 0.90, and the mean absolute deviation (MAD) is approximately 3°, indicating good consistency. In conclusion, this method ensures the accuracy of segmentation and measurement while maintaining efficiency. 
Moreover, it generalizes well across multi-source images, providing reliable technical support for the quantitative assessment and clinical auxiliary diagnosis of scoliosis.
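The endplate line-fitting approach to Cobb angle calculation can be sketched as follows; the least-squares fit and the two synthetic endplates are illustrative assumptions, not the paper's pipeline:

```python
import numpy as np

def fit_endplate_slope(points):
    """Least-squares slope of a line fitted through (x, y) endplate contour points."""
    pts = np.asarray(points, dtype=float)
    A = np.vstack([pts[:, 0], np.ones(len(pts))]).T   # design matrix for y = m*x + b
    slope, _intercept = np.linalg.lstsq(A, pts[:, 1], rcond=None)[0]
    return slope

def cobb_angle_deg(slope_upper, slope_lower):
    """Cobb angle = angle between the two fitted endplate lines, in degrees."""
    return abs(np.degrees(np.arctan(slope_upper) - np.arctan(slope_lower)))

# Two synthetic endplates tilted +10 deg and -10 deg -> expected Cobb angle of 20 deg
xs = np.linspace(0.0, 10.0, 20)
upper = np.column_stack([xs, np.tan(np.radians(10.0)) * xs + 5.0])
lower = np.column_stack([xs, np.tan(np.radians(-10.0)) * xs - 5.0])
angle = cobb_angle_deg(fit_endplate_slope(upper), fit_endplate_slope(lower))
```

Fitting a line through all endplate contour points, rather than through two endpoint midpoints, averages out local contour noise, which is the bias the midpoint method suffers from.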
  • LAN Chenxi, SHEN Zongliang, FENG Jianzhou, ZHANG Hua
    Accepted: 2025-12-30
    Large Language Models exhibit powerful in-context learning and text generation capabilities, showing significant potential in tasks such as information retrieval and presentation writing. However, their ability is often insufficient when dealing with tasks that demand high timeliness, truthfulness, and specificity/format requirements, such as generating formatted documents in specific domains, where effective methods are still lacking. Consequently, it is necessary to integrate both agent technology and model fine-tuning techniques. This paper proposes a formatted document generation method that combines an LLM-based Agent architecture with Large Language Model fine-tuning. The LLM-based Agent architecture is utilized to acquire and verify real-time news information, which in turn is used to construct a domain-specific LLM fine-tuning dataset. Subsequently, fine-tuning techniques are employed to enhance the model's ability to generate style-compliant (normative) text. The method was tested, optimized, and validated using datasets from different domains. Experimental results demonstrate that the proposed method outperforms baseline approaches across evaluation metrics such as semantic similarity and text similarity. This indicates that the proposed method effectively strengthens the model's understanding and text generation capabilities for specific domains, and provides reliable guarantees for the timeliness and truthfulness of the generated text.
  • GAO Liulong, HUANG Zhengkun, JIANG Xiaowei, SUN Gongxing, LI Jiafeng
    Accepted: 2025-12-30
    In recent years, deep learning has achieved tremendous success in application fields such as computer vision and natural language processing. This has led researchers in high-energy physics to also turn their attention to deep learning technologies and explore their application in hadronic jet tagging tasks. Initially, researchers converted jet data into image and sequence data, and used convolutional neural networks and recurrent neural networks to tag jets. However, these approaches suffered from problems such as low computational efficiency and poor interpretability. To address these issues, researchers have made improvements to network architectures from multiple perspectives and conducted training on various constructed jet tagging datasets, thereby enhancing the classification performance of the models. This paper provides an in-depth analytical review of the key modules of new network models, including methods for representing jets based on sets, the application of equivariant neural networks, and the exploration of jet foundation models. Meanwhile, the paper analyzes and compares various tagging classifiers, evaluates the performance of different network architectures, analyzes and summarizes the current status of relevant models, and discusses the application prospects of deep learning models in jet tagging tasks.
  • Zhen Han , Li Yu
    Accepted: 2025-12-30
    Small object detection in remote sensing images faces challenges because of weak feature representation, complex background interference, and multi-scale variations. These challenges are more severe in resource-constrained environments, where both detection accuracy and model efficiency are required. This paper proposes an efficient detection framework named Multi-Scale Spatial Attention YOLO (MSSA-YOLO). The framework uses three lightweight modules to improve performance. The Hierarchical Feature Block (HFBlock) enhances small object features by dynamic scale selection and dual-axis multi-scale convolution. The Lightweight Downsampling Module (LDSample) applies efficient downsampling with residual connections to retain critical information. The Focal-WIoU Loss refines bounding box regression by adaptive weighting and gradient suppression. Experiments are conducted on three public datasets, VEDAI, VisDrone2019, and AI-TOD. MSSA-YOLO achieves mAP50 values of 0.754, 0.436, and 0.519. Compared with YOLOv11s, the parameter count is reduced by 8.9%, while detection accuracy improves by 7%, 4.4%, and 18.5%. The framework also outperforms advanced models such as SP-YOLOv8s and SMN-YOLO. The results show that MSSA-YOLO achieves a balanced trade-off between accuracy and efficiency. The method is suitable for real-time small object detection and generalizes well to objects of different scales in remote sensing scenarios.