Just accepted

  • XIE Binhong, SUN Xiaosong, ZHANG Rui
    Accepted: 2026-05-20
    Small object detection in complex scenarios has long grappled with two major technical bottlenecks: the propensity for weak object features to attenuate within deep neural networks, and the severe interference caused by environmental background noise. To address these challenges, this study proposes WF-DETR, an end-to-end real-time small object detection model. In the feature extraction stage, a Feature Weaving Network (WeaveNet) is designed. Diverging from simple hierarchical stacking, WeaveNet employs a heterogeneous feature weaving strategy. Leveraging a cross-level feature mutual correction mechanism, it tightly interweaves and bidirectionally calibrates deep semantic information with shallow geometric details. This approach effectively suppresses the attenuation of spatial information during feature transmission and mitigates small object feature loss, all while maintaining high-level semantic strength. Inspired by human visual physiological mechanisms, the neck network incorporates a FoveaFormer module. By simulating the human foveal imaging mechanism via adaptive sparse attention and gating units, this module dynamically filters redundant background noise and focuses on high-value target regions, significantly enhancing feature purity. Furthermore, a Haar Wavelet Downsample (HWD) operator is introduced to reconstruct the downsampling process. From a frequency domain perspective, this overcomes the irreversible loss of high-frequency texture details caused by traditional pooling, further augmenting the discriminability of small object features. Experimental results on the VisDrone2019 benchmark dataset demonstrate that the proposed model achieves mAP@0.5:0.95 of 23.7% and an inference speed of 166.3 FPS. These results fully validate the real-time performance and superiority of WF-DETR in small object detection tasks within complex backgrounds.
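The frequency-domain argument for Haar wavelet downsampling can be illustrated with a minimal sketch. It assumes a single-channel image stored as a list of rows with even height and width, and uses the orthonormal Haar filters; the function name `haar_downsample` is hypothetical, and how the published HWD operator combines the sub-bands afterwards is not stated in the abstract.

```python
def haar_downsample(img):
    """Single-level 2D Haar transform of an even-sized grayscale image.

    Each 2x2 block [[a, b], [c, d]] is mapped to four coefficients:
        LL = (a + b + c + d) / 2   low-pass average
        LH = (a + b - c - d) / 2   difference between the two rows
        HL = (a - b + c - d) / 2   difference between the two columns
        HH = (a - b - c + d) / 2   diagonal difference
    Returns four half-resolution sub-bands (LL, LH, HL, HH).
    """
    height, width = len(img), len(img[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must both be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, height, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, width, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            row_ll.append((a + b + c + d) / 2)
            row_lh.append((a + b - c - d) / 2)
            row_hl.append((a - b + c - d) / 2)
            row_hh.append((a - b - c + d) / 2)
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh
```

Unlike max or average pooling, which keep one value per 2x2 block, the four sub-bands together retain every input value in transformed form, so the spatial resolution halves without discarding the high-frequency detail.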
  • HE Ruiying, TIAN Youliang, XIANG Axin, ZHOU Feng, LIU Kaiqi
    Accepted: 2026-05-20
    Cloud computing offers efficient data storage and management, facilitating convenient data sharing and access. However, ensuring data security and user privacy in open cloud environments remains a critical challenge. Ciphertext-policy attribute-based encryption (CP-ABE) has been widely adopted to enforce fine-grained access control over data stored in cloud servers. Nevertheless, existing schemes still face limitations in handling hierarchical data and tracing malicious ciphertexts, making it difficult to simultaneously achieve efficient multilevel access and data provenance assurance. To address these challenges, this paper proposes a hierarchical attribute-based access control scheme with traceability over ciphertext. First, a hierarchical CP-ABE framework is employed to construct an efficient multilevel access mechanism. By integrating multiple hierarchical access trees into a unified structure, the scheme enables encryption and decryption of data at different levels under a single policy, significantly reducing computational overhead. Second, a zero-knowledge proof-based signature mechanism is introduced to securely bind ciphertexts with their creators while preserving data owner anonymity, enabling accurate tracing of malicious ciphertext sources. Finally, security analysis demonstrates that the proposed scheme can effectively resist chosen-plaintext attacks. Experimental evaluation shows that, compared with existing approaches, the scheme achieves lower encryption and decryption overhead, making it well suited for secure, efficient, and traceable data sharing in cloud environments.
  • ZHU Yanbin, ZHANG Hanling, WANG Runmin
    Accepted: 2026-05-20
    Micro-expressions are fleeting, involuntary facial muscle movements that can reveal genuine emotions individuals attempt to conceal. However, micro-expression recognition faces numerous challenges, including short duration, low intensity, prominent local features, limited scale of public datasets, and significant individual differences, which constrain the recognition accuracy and generalization capability of traditional methods. To address these issues, this study proposes a single-stream fine-grained micro-expression recognition method based on dynamic routing experts. Inspired by the mixture-of-experts model, this method replaces the traditional multi-head self-attention layer in Transformers with a dynamic routing expert mechanism. It dynamically selects expert networks through a sparse activation strategy and leverages a collaboration mechanism among experts to enhance feature representation capability, thereby improving model representational capacity while maintaining computational efficiency. Additionally, a multi-grained asymmetric aggregation module is designed, which integrates orientation-aware convolution and channel attention to effectively decouple spatial features and adaptively adjust feature granularity at different network levels, enabling more precise capture of subtle directional movements and local texture variations in micro-expressions. Experiments conducted on three public datasets, SAMM, SMIC, and CASME II, demonstrate that the proposed method significantly outperforms mainstream approaches. On the composite dataset, the method achieves an unweighted average recall of 87.65% and an unweighted F1-score of 87.21%. The experimental results validate the effectiveness of this method in capturing subtle dynamic features of micro-expressions, providing reliable technical support for emotion recognition in complex scenarios.
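The sparse activation idea behind mixture-of-experts routing can be sketched as follows. This is a generic top-k gate, not the paper's dynamic routing expert mechanism: the names `route_top_k` and `moe_forward`, the scalar experts, and the choice of k are all assumptions made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits, k):
    """Pick the k highest-scoring experts and renormalise their weights.

    Returns a dict {expert_index: weight} whose weights sum to 1.
    """
    order = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = order[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return dict(zip(chosen, weights))


def moe_forward(x, experts, gate_logits, k=2):
    """Evaluate only the selected experts and mix their outputs."""
    routing = route_top_k(gate_logits, k)
    return sum(weight * experts[i](x) for i, weight in routing.items())
```

Only k experts are evaluated per input, which is why capacity can grow with the number of experts while the per-sample cost stays roughly constant.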
  • Nie Zeli, Sun Danfeng, Zhao Jianyong, Wu Huifeng
    Accepted: 2026-05-19
    The widespread application of robots and vision systems in factories has promoted mixed-line production characterized by small batches and multiple product varieties, while also sharply increasing the diversity of target size specifications and the uncertainty of arrival sequences, thereby making stacking tasks at many transition stages of production lines still highly challenging. With the increase in the number of targets in the sequence, it becomes difficult to guarantee both the solution time and the solution quality of the stacking task. To address the above issues, a hybrid optimization algorithm with stimulus memory for sequence stacking tasks is proposed. The algorithm decomposes the sequence stacking task into two subtasks: combinational block knowledge base construction and stacking decision optimization. First, basic target combinations satisfying quality thresholds are searched for in the initial sequence of targets to be stacked, so as to construct a combinational block knowledge base. During this process, a stimulus memory mechanism is introduced to dynamically update the existing combinational knowledge. Subsequently, each combinational block is equivalently treated as a macro-target, and the placement sequence and placement orientation of all targets are jointly optimized. Comparative experimental results on datasets with different size distributions show that, compared with the baseline algorithms, the proposed algorithm can reduce the solution time of stacking plans by at least 4.94% while achieving the optimal average filling rate of stacking space, which verifies its effectiveness in sequence stacking tasks. The ablation experimental results show that the proposed complete algorithm achieves the best performance in terms of solution time, which validates the rationality of the proposed algorithmic architecture.
  • Lu Shibo, Li Jing
    Accepted: 2026-05-19
    To address the problem that, in radar emitter individual identification, a single continuous-pulse model cannot simultaneously capture global temporal information and fine-grained single-pulse features, while a single-pulse model lacks global dynamic information, thereby limiting recognition performance in complex electromagnetic environments, this paper proposes a dual-branch lightweight fusion recognition method. First, the original pulse sequence is segmented into two types of data, namely continuous pulse sequences and single pulses, through continuous pulse segmentation. Corresponding datasets are constructed for the inter-pulse sequence branch and the single-pulse branch, and a continuous-sequence model and a single-pulse model are trained separately to extract inter-pulse temporal features and fine-grained intra-pulse features, thus enabling complementary modeling of the two types of information. Subsequently, two fusion strategies, namely feature-level fusion and decision-level fusion, are designed. In the feature-level fusion strategy, a gating mechanism is introduced to learn the importance weights of features from different branches, so that continuous-pulse features and single-pulse features can be adaptively weighted to construct a joint feature representation. In the decision-level fusion strategy, the probability outputs of the two models are integrated by soft voting to improve recognition stability. To verify the effectiveness of the proposed method, comparative experiments and ablation studies are conducted on a measured radar dataset. The results show that both fusion strategies outperform the individual models. Specifically, decision-level fusion improves the recognition accuracy by approximately 8 percentage points over the single continuous-pulse model and by about 3 percentage points over the single single-pulse model. 
    Moreover, feature-level fusion achieves the best recognition performance while reducing the number of model parameters by two orders of magnitude compared with the baseline model. The results demonstrate that the proposed method can maintain high recognition accuracy while also exhibiting favorable lightweight characteristics and strong potential for engineering applications.
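Decision-level fusion by soft voting can be sketched in a few lines. The function name `soft_vote` and the equal default weighting are assumptions; the abstract does not say how the two branches are weighted.

```python
def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Fuse two class-probability vectors by weighted averaging.

    prob_a and prob_b are the per-class probabilities from the two branches
    (for example the continuous-sequence model and the single-pulse model).
    Returns the fused probabilities and the index of the predicted class.
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same set of classes")
    fused = [weight_a * pa + (1.0 - weight_a) * pb for pa, pb in zip(prob_a, prob_b)]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted
```

Averaging probabilities rather than hard labels lets a confident branch outvote an uncertain one, which is one plausible reason fused predictions are more stable than either model alone.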
  • KANG Panpan, CAO Yuecheng, TENG Liping, CHEN Junjie, LI Hongjun
    Accepted: 2026-05-19
    In recent years, although self-supervised skeleton-based action recognition has made progress, it still faces two types of training bias under strong augmentations: imbalance in local perturbation allocation can easily lead to over-perturbation of critical motion segments and insufficient variation in low-dynamic regions; in multi-positive contrastive learning, non-target positive samples participate in normalization competition, which can easily cause target conflicts and weaken representation aggregation. To address this issue, this paper proposes DCD-CLR, a self-supervised contrastive learning framework for collaborative optimization of view construction and objective construction, namely Dual-end Collaborative Debiasing Contrastive Representation Learning, to improve the quality of skeleton representation learning from the two aspects of augmentation allocation and contrastive objective. On the view side, Continuous Dynamic Saliency Augmentation (CDSA) is designed to integrate frame-difference energy and data-level joint motion priors, construct a frame-joint dynamic intensity map, and perform continuous, region-level, and sample-adaptive scheduling of spatiotemporal perturbation magnitudes, thereby improving view diversity while preserving critical motion segments. On the objective side, Target-Isolated InfoNCE (TI-InfoNCE) is proposed as a target-isolated debiased multi-positive contrastive objective, which removes the remaining positive samples when computing the normalization term of the target positive sample, so as to reduce competition interference among positive samples and improve the boundary clarity of the representation distribution. 
    Under the linear evaluation setting, the proposed method achieves recognition accuracies of 85.9%, 79.6%, and 92.6% on NTU60 xsub, NTU120 xset, and PKU-MMD I, respectively. Representation distribution visualization, transfer evaluation, and noise interference experiments further show that the proposed method has good stability, generalization ability, and robustness.
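The target-isolation idea can be sketched by comparing two normalization terms. This is one reading of the abstract, not the paper's exact loss: the function names, the use of precomputed similarity scores, and the temperature value are assumptions.

```python
import math


def multi_positive_info_nce(pos_sims, neg_sims, temperature=0.1):
    """Baseline: every positive competes with all other positives and negatives."""
    pos_exp = [math.exp(s / temperature) for s in pos_sims]
    neg_mass = sum(math.exp(s / temperature) for s in neg_sims)
    total = sum(pos_exp) + neg_mass
    return sum(-math.log(p / total) for p in pos_exp) / len(pos_exp)


def ti_info_nce(pos_sims, neg_sims, temperature=0.1):
    """Target-isolated variant: each positive is normalised against negatives only.

    The remaining positives are removed from the denominator, so positives
    no longer push each other down.
    """
    neg_mass = sum(math.exp(s / temperature) for s in neg_sims)
    losses = []
    for s in pos_sims:
        target = math.exp(s / temperature)
        losses.append(-math.log(target / (target + neg_mass)))
    return sum(losses) / len(losses)
```

With a single positive the two losses coincide; with several positives the isolated form is strictly smaller, because the other positives no longer inflate the denominator.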
  • Chen Hong, Wang Jinwei, Jin Haibo, Wu Cong, Yang Zi
    Accepted: 2026-05-19
    With cyberattacks becoming increasingly sophisticated and covert, improving the representation and recognition of complex traffic patterns has become an important issue in intrusion detection. Although existing methods have improved detection performance, directly modeling complex network traffic still suffers from insufficient feature representation. To enhance local correlations and structural information among features, many studies transform one-dimensional traffic features into two-dimensional image-like representations for deep feature learning. However, limited by feature dimensionality and encoding schemes, such traffic images are usually small and structurally constrained, making fixed enhancement strategies insufficient for capturing differences among attack patterns. Meanwhile, class imbalance further restricts the recognition of minority attack classes. To address these issues, this paper proposes a network intrusion detection method based on dynamic selective feature enhancement. At the representation level, a multi-scale feature enhancement module adaptively fuses features with different receptive fields to alleviate the representation limitations of small traffic images. At the decision level, a dynamic adaptive module combined with minority-class attention selectively strengthens key responses to improve minority-class recognition. Experimental results show that the proposed method achieves 96.49% accuracy, 95.11% precision, 96.32% recall, and 95.50% F1-score on NSL-KDD. It also maintains good detection performance on UNSW-NB15 and shows stable performance in a simulated streaming environment built on TON-IoT-Network.
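The step of turning a one-dimensional traffic feature vector into a small image-like grid can be sketched as below. The row-major layout, zero padding, and the name `features_to_image` are assumptions; the abstract does not specify the encoding scheme.

```python
import math


def features_to_image(features, pad_value=0.0):
    """Reshape a 1D feature vector into the smallest square grid that holds it.

    Features are laid out row by row; unused cells are filled with pad_value.
    """
    side = math.isqrt(len(features))
    if side * side < len(features):
        side += 1
    padded = list(features) + [pad_value] * (side * side - len(features))
    return [padded[r * side:(r + 1) * side] for r in range(side)]
```

A vector of a few dozen features yields a grid only a handful of pixels wide, which illustrates why such traffic images are small and structurally constrained.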
  • CAO Qi, LI Shaodong, LU Shuaiyan, ZHANG Zhehao, YANG Guokai
    Accepted: 2026-05-15
    In recent years, RGB-based hand mesh reconstruction has attracted extensive attention. Existing methods mainly rely on stacking complex visual modules to improve reconstruction accuracy, but this often incurs high computational cost and makes it difficult to satisfy the requirements of real-time applications. To address this issue, this paper introduces natural language information during training, injecting high-level prior knowledge into the network to enhance visual feature representation. Since the text branch is used only for supervision during training, it does not increase the number of parameters of the main network, thereby preserving real-time performance. To further enhance visual representation, a dual-scale text generation module is proposed to describe hand features from both global and local perspectives. Specifically, the global text prompt models the overall hand pose based on the bending degree of each finger, while the local text prompt describes local hand features according to the spatial positions of individual joints. In addition, contrastive learning is employed to enforce consistency between multi-scale text features and image features in a shared semantic space. Considering that the CLIP model is highly sensitive to textual formulation, manually designing prompts usually requires extensive tuning and still cannot guarantee sufficient alignment with image features. To this end, this paper adopts a combination of fixed text prompts and learnable word vectors, where the fixed prompts are used to summarize the main semantic information, and the learnable word vectors are used to adaptively refine the prompts, thereby improving the suitability of the text descriptions for the hand mesh reconstruction task. Experimental results show that, compared with real-time methods, the proposed method achieves excellent reconstruction accuracy while maintaining real-time performance. 
    On the FreiHAND dataset, the PA-MPJPE and PA-MPVPE reach 5.5 mm and 5.8 mm, respectively; on the DexYCB dataset, they reach 5.4 mm and 5.2 mm, respectively. The inference speed reaches 68 fps. Ablation studies further demonstrate that the dual-scale text prompts play a key role in hand mesh reconstruction.
  • Song Chengchen, Wu Qi, Miao Wang
    Accepted: 2026-05-15
    With the proliferation of digital platforms, the forms of offensive memes have become increasingly complex and diverse. This phenomenon has exacerbated the scarcity of high-quality annotated data, making modal semantic alignment bias under small-sample conditions a core issue constraining detection performance. To address this issue, this study proposes an offensive meme detection method via Cross-Modal Meta-Learning with Unimodal Rectification (CMML-UR). This method uses a cross-modal dual-gradient meta-learning framework, leverages hierarchical image features to provide multi-level visual semantics, and combines them with low-noise textual representations generated through multi-regularized modeling. At the decision fusion stage, by evaluating the output confidence of each modality at the sample level, the method introduces a unimodal confidence-gated rectification mechanism to dynamically calibrate the final prediction. Experimental results on the MultiOFF dataset demonstrate that the proposed method achieves a weighted F1-score of 74.6%, which is an improvement of 4.3 percentage points over the state-of-the-art (SOTA) model. In few-shot generalization tests, it maintains a weighted F1-score of 69.3% (5.6 percentage points higher than the baseline model at 63.7%), verifying its efficiency in complex cross-modal semantic understanding and robustness in noise suppression within few-shot scenarios.
  • YUN Jian, WANG Songnan, ZHANG Xueyi
    Accepted: 2026-05-15
    This paper addresses the dual challenges of system heterogeneity and data heterogeneity faced by federated learning in medical image classification, and proposes SEFedProX, an adaptive federated optimization framework based on reinforcement learning. In a heterogeneous environment, the framework uses the Soft Actor-Critic (SAC) algorithm to dynamically adjust the proximal term coefficients in a continuous action space, based on key state features such as client data distribution and performance feedback. This overcomes the quantization errors and model oscillation caused by discrete action spaces and achieves precise, smooth control of the local training intensity. In addition, an EfficientNetV2B2 pre-trained on ImageNet is introduced as the feature extraction network, which improves the model's representation efficiency and discrimination ability while significantly reducing the deployment requirements for resource-constrained medical edge devices and alleviating the overfitting risk on small-scale medical data. Systematic experiments across four system heterogeneity settings, four medical image datasets, and one general dataset show that SEFedProX significantly outperforms existing baseline methods in terms of classification accuracy, convergence speed, stability, and robustness. Ablation experiments further verify the effectiveness of the SAC continuous regulation mechanism and the EfficientNetV2B2 network, as well as their collaborative enhancement effect in the framework. This research provides a stable, efficient, and highly adaptive technical solution for the construction of distributed intelligent diagnostic systems in heterogeneous medical environments.
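The role of the proximal term coefficient can be sketched with one local update in the standard FedProx form, where the client objective adds (mu / 2) * ||w - w_global||^2 to the local loss. The function name and the flat list of weights are illustrative assumptions; in the framework described above, `mu` would be supplied by the reinforcement learning controller rather than fixed by hand.

```python
def fedprox_local_step(weights, global_weights, grads, mu, lr):
    """One gradient step on the local loss plus the FedProx proximal term.

    The proximal term contributes mu * (w - w_global) to the gradient, so a
    larger mu pulls the client model back toward the global model harder.
    """
    return [
        w - lr * (g + mu * (w - wg))
        for w, wg, g in zip(weights, global_weights, grads)
    ]
```

Setting `mu` to zero recovers plain local SGD, while a large `mu` keeps clients close to the global model, which is the trade-off a continuous-valued controller can tune smoothly.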
  • Kedong Zhang, Xusheng Qian, Zhiyong Zhou, Yakang Dai
    Accepted: 2026-05-15
    Multimodal vision-language foundation models show great potential in the medical domain, yet face notable limitations due to complex medical semantics and challenging cross-modal modeling. Patient-level rigid alignment ignores semantic similarity, causing unreasonable negative repulsion and degrading learning, while the lack of unified hierarchical modeling between reports and images hinders fine-grained cross-modal alignment. To address these issues, this paper proposes a global-local collaborative alignment (GLCA) method that yields an improved medical vision-language classification model. GLCA consists of two main components: semantic-driven cross-patient soft global alignment and progressive three-granularity intra-patient local alignment. The semantic-driven cross-patient soft global alignment leverages cross-patient semantic sample pair mining and a correlation-weighted contrastive penalty to construct a more continuous feature space that better reflects authentic semantic relationships. The progressive three-granularity intra-patient local alignment aligns visual and textual features at three levels, coarse (report-image), mid (sentence-region), and fine (word-patch), via progressive query fusion, enabling effective cross-modal interaction. Global-local collaborative alignment first builds a semantically consistent feature space through inter-patient soft global alignment, then performs layer-wise matching via intra-patient multi-granularity alignment, ensuring continuous and precise cross-modal semantic correspondence. Extensive experiments are conducted on four chest X-ray datasets. The results demonstrate that GLCA significantly outperforms existing methods in both zero-shot prediction classification and few-shot fine-tuning classification tasks. On the public 14-class ChestXray14 dataset, the zero-shot prediction classification achieves improvements of 1.2%, 2.0%, and 2.2% over the second-best method in terms of AUC, F1, and ACC, respectively.
  • ZHONG Hang, ZHANG Qinghua, LUO Nanfang, GUO Ruili
    Accepted: 2026-05-15
    Multimodal emotion recognition in conversations integrates language, acoustic, and visual information to automatically identify the emotions in dialogues, thereby enhancing the naturalness and emotional understanding in human-computer interaction. However, existing methods have limitations in modeling multi-layer contextual dependencies of emotions. Multimodal feature fusion often introduces redundant information and noise, and these methods cannot effectively capture the uncertainty of emotions, which limits the recognition of complex emotional categories. To address these issues, this paper proposes a multimodal emotion recognition model that combines hybrid encoding and fuzzy modeling. The model uses a hybrid encoding module to capture both global dialogue context and local utterance-level dependencies, which strengthens the representation of emotional temporal features. In addition, a hierarchical gated fusion mechanism integrates features from different modalities and layers with dynamic weighting to suppress redundancy and noise and improve multimodal feature discrimination. For emotion classification, a fuzzy neural network initialized with linearly spaced parameters models the boundaries of emotion categories using fuzzy membership functions, capturing the uncertainty and fuzziness of emotional expression. Experimental results show that the proposed model outperforms baseline methods on all metrics across the IEMOCAP, MELD, and CMU-MOSEI datasets. It achieves an accuracy of 72.67% on IEMOCAP, 67.37% on MELD, and 54.96% for 7-class accuracy and 86.78% for 2-class accuracy on CMU-MOSEI, respectively, which validates the effectiveness of the proposed method in multimodal sentiment analysis.
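Fuzzy membership functions with linearly spaced initial parameters can be sketched as follows. The Gaussian shape, the shared width `sigma`, and both function names are assumptions; the abstract states only that the fuzzy neural network is initialized with linearly spaced parameters.

```python
import math


def linspace(low, high, count):
    """Return `count` evenly spaced values from low to high inclusive."""
    if count == 1:
        return [float(low)]
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


def gaussian_memberships(x, centers, sigma):
    """Degree in [0, 1] to which x belongs to each fuzzy set.

    Each fuzzy set is a Gaussian bump centred on one entry of `centers`.
    """
    return [math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]
```

Because neighbouring bumps overlap, an input near a category boundary receives partial membership in both sets, which is how a fuzzy layer can represent ambiguous emotional expression instead of forcing a hard split.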
  • LIU Xiangbin, ZHU Youhua, PENG Feng
    Accepted: 2026-05-15
    Handwritten mathematical expression recognition is an important task in computer vision and plays a significant role in intelligent education, industrial applications, and related fields. Existing encoder-decoder-based methods typically rely on standard convolutions and conventional attention mechanisms for feature extraction. However, the fixed-grid sampling of standard convolution cannot effectively adapt to the geometric deformations of handwritten symbols, which often leads to confusion between visually similar characters. In addition, traditional attention mechanisms usually involve limited cross-dimensional interaction, making it difficult to capture long-range structural dependencies in complex mathematical expressions. To address these issues, this paper proposes a handwritten mathematical expression recognition model based on an encoder-decoder architecture, termed DDTAFF, which integrates deformable dilated convolution and triplet attention feature fusion. Specifically, deformable dilated convolution incorporates learnable dilation rates into both the offset learning process and the customized convolution operation of deformable convolution, enabling more accurate offset prediction and adaptive expansion of the receptive field. Meanwhile, triplet attention feature fusion adopts a similarity-guided dynamic fusion strategy to enhance cross-dimensional feature interaction and improve the extraction of discriminative features. In the encoder, deformable dilated convolution is used to capture multi-scale features and broader contextual information, while triplet attention feature fusion effectively fuses features at different levels to strengthen the representation of critical regions. In the decoder, a Transformer-based structure is introduced to enhance long-range dependency modeling. 
    Experimental results on the CROHME 2014, CROHME 2016, CROHME 2019, and HME100K datasets show that the proposed model achieves recognition accuracies of 59.34%, 59.77%, 59.63%, and 68.94%, respectively, representing improvements of 2.34%, 3.71%, 4.75%, and 1.63% over the baseline model. These results demonstrate the effectiveness and superiority of the proposed method.
  • Wu Mingjie, Wang Cheng, Pang Yuqing, Shi Wenya, Lin Zhiquan, Huan Zhan
    Accepted: 2026-05-12
    In Industrial Internet of Things (IIoT) scenarios, Time-Sensitive Networking (TSN) is required to guarantee high reliability, determinism, and low latency for data transmission. However, the traditional Cyclic Queuing and Forwarding (CQF) model faces challenges such as severe resource preemption, load imbalance, and insufficient network resource utilization when processing mixed traffic. To address these issues, this paper proposes a Deep Noisy Q-Network-based Multi-CQF scheduling algorithm (DNQN-MCQF). First, the algorithm constructs a four-channel Multi-Cyclic Queuing and Forwarding (Multi-CQF) architecture, establishing a dedicated queue to guarantee the deterministic transmission of high-priority traffic, and using weights determined by Particle Swarm Optimization (PSO) to calculate dynamic sorting scores that optimize the transmission sequence. Second, a hybrid feature extraction framework is constructed, employing a Convolutional Neural Network-Gated Recurrent Unit (CNN-GRU) to capture spatiotemporal dynamic features of slot utilization and queue states, combined with a Graph Convolutional Network (GCN) to aggregate static global topological information, and generating state embeddings via an Attention Fusion Mechanism. Finally, in the scheduling decision phase, the Deep Noisy Q-Network is integrated to inject parameter space noise, achieving adaptive policy exploration and slot optimization. Simulation results show that under different network topologies and time slot conditions, DNQN-MCQF improves the average scheduling success rate and load balance by 14.7% and 19.2%, respectively, compared with baseline algorithms.
  • Li Yamin, Xiang Wen, Chai Li, Xiang Yao
    Accepted: 2026-05-12
    Low-light image enhancement (LLIE) is crucial in computer vision for restoring rich visual information from corrupted low-light images. However, existing LLIE methods often suffer from color bias due to color space sensitivity, and they typically fail to balance denoising and color fidelity within a single-stage framework. To address these challenges, this research introduces a novel Dual-Stage HVI-based Transformer Network (DHTNet) for LLIE. DHTNet significantly improves the quality of low-light images by decoupling intensity (I) and color (HV) maps, enabling their independent yet synergistic optimization. In the first stage, a hierarchical Transformer network equipped with an Adaptive Guidance Interaction Module (AGIM) models long-range dependencies between I and HV features. This stage achieves global noise suppression and accurate color calibration. In the second stage, the Multi-Scale Enhanced Synergistic Attention (MESA) module enhances localized color and feature representation through synergetic optimization across I and HV branches. This dual-stage framework addresses the limitations of existing LLIE approaches by retaining complex image details while enhancing visual realism. Experimental results show that DHTNet achieves the highest PSNR on both the SICE and SID datasets, surpassing the second-best method by 0.717 dB and 1.897 dB, respectively. In addition, DHTNet attains PSNR values of 28.756 dB, 24.683 dB, and 25.950 dB on the LOLv1, LOLv2-Real, and LOLv2-Synthetic datasets, respectively, outperforming existing methods such as Retinexformer, CIDNet and other models.
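The PSNR figures quoted above follow the standard definition, 10 * log10(MAX^2 / MSE). A minimal sketch, assuming single-channel images stored as lists of rows and an 8-bit peak value of 255 (the paper's exact evaluation protocol is not given in the abstract):

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref_pixels = [p for row in reference for p in row]
    est_pixels = [p for row in estimate for p in row]
    if len(ref_pixels) != len(est_pixels):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(ref_pixels, est_pixels)) / len(ref_pixels)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to halving the mean squared error.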
  • LI Yang, ZHANG Boyang, LI Lihong
    Accepted: 2026-05-12
    Backdoor attacks on deep neural networks manipulate model behavior by implanting stealthy triggers into the training data, posing a serious threat to model security. However, most existing imperceptible backdoor attacks focus only on invisibility in the spatial domain while overlooking anomalies in frequency-domain features. These methods often introduce noticeable high-frequency artifacts or stable spectral residual patterns in the frequency domain, making them vulnerable to detection by frequency-based defense techniques. To address this issue, a dual-domain imperceptible backdoor attack method based on wavelet packet decomposition and fast Fourier transform is proposed. First, wavelet packet decomposition is employed to select carrier sub-bands according to the energy distribution characteristics of the target class, and an energy-aware adaptive trigger embedding strategy is applied to balance attack effectiveness and stealthiness. Then, fast Fourier transform is used for spectral reconstruction by combining the amplitude spectrum of clean samples with the phase spectrum of poisoned samples, thereby reducing detectable traces in the frequency domain. Comparative experiments are conducted on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets using PreAct-ResNet18 and VGG19-BN models. The results show that the proposed method maintains high attack effectiveness while significantly improving dual-domain stealthiness and robustness against defenses. On CIFAR-10, it achieves an attack success rate of 99.94% and demonstrates strong evasive capability against tested defenses such as FTD, Neural Cleanse, and STRIP.
  • Lin Yong, Liao Miao, Gong Shiyuan
    Accepted: 2026-05-07
    Liver segmentation is an important prerequisite for liver disease diagnosis, three-dimensional reconstruction, and surgical planning. To overcome the segmentation challenges caused by the complex structure, weak boundaries, and large individual differences of the liver in abdominal CT volumes, an automatic segmentation method based on multi-view information fusion is proposed. First, a 2D U-shaped convolutional network based on atrous spatial pyramid pooling is designed to extract multi-scale features and enlarge the receptive field without increasing the number of network parameters. The proposed 2D U-Net is then applied to slice-wise segmentation from multiple view directions, including axial, sagittal, and coronal planes, thereby compensating for the inability of single-view models to capture inter-slice contextual information. Subsequently, a lightweight 3D convolutional network is built to fuse the segmentation results from the multiple viewing directions, enabling three-dimensional liver segmentation under limited computational resources and yielding voxel-wise liver probability maps and label assignments for the CT volumes. Finally, the resulting probabilities and labels are used to construct the graph cut energy function that refines the segmentation results, effectively alleviating over-segmentation and under-segmentation. The proposed method thus extracts 3D features of the CT volume indirectly by fusing the segmentation results obtained from different viewing directions, and improves segmentation accuracy by introducing graph cuts. Experiments are carried out on two public datasets, 3DIRCADb and LiTS; the Dice scores obtained on the test sets are 0.947 and 0.962, respectively, exceeding those of many existing segmentation methods.
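The Dice score used to report these results is 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. A minimal sketch, assuming binary masks flattened to lists of 0/1 values (the function name is illustrative):

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as flat sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * intersection / total
```

Over-segmentation inflates |A| and under-segmentation shrinks the intersection, so both error types lower the score.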
  • ZHANG Chi, ZHOU Shibing, JU Jialin, JIANG Min
    Accepted: 2026-05-07
    To address the issues of unbalanced multi-scale feature expression, cross-level fusion loss, and insufficient bounding box localization accuracy in document detection, a document detection method with multi-scale feature and semantic optimization is proposed. This method includes three parts of design and improvement: first, a multi-branch convolutional attention fusion module is constructed, which expands the receptive field via multi-scale strip convolution and integrates the attention mechanism with the C3k module; second, a multi-scale neck coordinated with global semantics and high-order correlation is designed, which achieves fusion through global feature collection, hypergraph convolution-based correlation mining, and multi-scale scattering; third, the bounding box regression loss is optimized, and dual-threshold interval mapping is adopted to enhance the discrimination of sample losses. Experimental results on the EXAM, CDLA, D4LA, and PubLayNet datasets show that the average detection accuracy of this method is significantly higher than that of existing methods. Experimental results indicate that this method can break through the performance bottleneck of YOLO11n in the field of document detection, improve accuracy while ensuring efficiency, and provide a scientific and feasible application scheme for document detection.
  • Qiang Zhenqian, Zhang Yupeng, Wang Danping
    Accepted: 2026-05-07
    To address the challenges of low-rank degradation and partial observation in spatio-temporal data caused by unstructured occlusion, motion distortion, and multi-source noise coupling in complex dynamic scenes, this paper proposes a motion discontinuous spatio-temporal behavior understanding framework that integrates non-convex temporal difference low-rank constraints and hierarchical trajectory-behavior semantic mapping. Firstly, a temporal difference low-rank recovery model based on the non-convex Schatten-p norm is constructed, and the Alternating Direction Method of Multipliers (ADMM) is employed to reconstruct motion data under high missing rates and noise pollution. Secondly, based on the recovered data, structured trajectory clusters are built by combining multi-object tracking, and trajectory neighborhood interaction features are extracted. Furthermore, a three-level behavior understanding model is proposed: behavior primitive classification based on multilayer perceptrons, interaction pattern recognition based on graph attention networks, and semantic fusion and behavior narrative generation incorporating spatio-temporal context, achieving end-to-end mapping from trajectories to high-level semantics. Experiments show that the proposed method significantly outperforms baseline approaches in data recovery quality under a 60% high missing rate, achieving behavior recognition accuracies of 92.7% on both the NTU RGB+D (X-Sub) and the self-built motion dataset BAS, which is 5.6 percentage points higher than the best comparative method. Ablation studies further validate the effectiveness of each module: the NTDLR recovery module improves the recognition rate from 78.3% to 86.7% under 60% missing data, trajectory neighborhood encoding enhances it to 88.2%, and the complete three-level model achieves optimal performance through synergistic interaction. 
    The results of interaction pattern recognition and semantic description generation also notably surpass those of mainstream graph convolutional networks and their variants. This research provides an interpretable and scalable algorithmic framework for discontinuous and interactive motion behavior understanding in complex dynamic scenes.
  • SHEN Xueli, QIN Qingjie
    Accepted: 2026-05-07
    To address the challenges of low contrast, large-scale variation, and limited computational resources in industrial steel surface defect inspection, this paper proposes an efficient detection network with full-path collaborative enhancement, termed ISA-DETR. First, an Information-Preserving Downsampling (IPD) structure is constructed, which adopts a spatial–channel rearrangement-based pixel reorganization strategy to replace conventional strided downsampling. This design effectively preserves fine-grained spatial information while reducing feature map resolution, thereby alleviating the information loss of small defects during feature extraction. Second, a Large Separable Kernel Attention-Hybrid Group Block (SLK-HG) module is developed by integrating large-kernel separable attention mechanisms. Through the collaborative optimization of group convolution and separable convolution, the module builds an ultra-large receptive field with near-linear computational complexity, significantly enhancing the network’s ability to model long-range spatial dependencies and irregular defect patterns. Furthermore, an Adaptive Dynamic Sampling (ADS) operator is introduced to achieve precise cross-scale feature alignment via content-driven offset prediction, reducing localization deviations in complex backgrounds and improving detection robustness. Experimental results on the NEU-DET steel surface defect dataset demonstrate that ISA-DETR achieves an mAP@0.5 of 75.2% with only 20.67M parameters and 77.5 GFLOPs. Compared with the baseline model, the proposed method reduces the number of parameters and computational cost by 35.4% and 25.1%, respectively, while improving detection accuracy by 3.2%. In addition, transfer experiments on the PCB defect dataset further verify the strong generalization capability of the proposed approach. 
    Overall, the proposed method achieves an effective balance between detection performance and deployment efficiency, providing a practical and reliable solution for intelligent quality inspection in industrial edge scenarios.
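    One common way to downsample by rearranging pixels rather than discarding them is a space-to-depth operation, sketched below. Whether the IPD structure uses exactly this layout is an assumption; the function name `space_to_depth` and the (C, H, W) array convention are chosen only for the example.

```python
import numpy as np


def space_to_depth(x: np.ndarray, r: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) map into (C*r*r, H//r, W//r) without dropping values.

    Hypothetical helper for illustration: each r-by-r spatial block is moved
    into the channel dimension, so resolution falls while every value is kept.
    """
    c, h, w = x.shape
    if h % r or w % r:
        raise ValueError("H and W must be divisible by r")
    x = x.reshape(c, h // r, r, w // r, r)
    x = x.transpose(0, 2, 4, 1, 3)  # -> (C, r, r, H/r, W/r)
    return x.reshape(c * r * r, h // r, w // r)
```

    Unlike a stride-2 convolution or pooling, this step is lossless: the original map can be recovered by reversing the reshape and transpose.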
  • Ji Kang, Wei Songjie, Li Meng
    Accepted: 2026-04-29
    In the high-risk petrochemical industry, operational environments involve complex factors such as high temperature, high pressure, and flammable, explosive, or toxic substances. Even minor deviations in worker behavior can lead to severe accidents, resulting in irreversible casualties and property losses. Traditional supervision methods, which rely heavily on manual inspection, are not only inefficient but also struggle to cover multi-worker and multi-equipment collaborative scenarios, and are highly susceptible to subjective interference. The core challenges in recognizing group collaborative behaviors lie in modeling complex human–object interactions, capturing dynamic features of multiple targets, and bridging the ambiguous mapping between macroscopic group intentions and microscopic individual actions. To address these issues, this paper proposes a graph neural network-based method for group collaborative behavior recognition. By constructing a unified interactive graph structure, entities such as workers and equipment are encoded as nodes, and multimodal perceptual features are integrated to enable end-to-end reasoning of interpersonal and human–object interactions. A hierarchical graph network architecture is further designed to model the correlation and evolution from individual actions to group behaviors, achieving accurate recognition and understanding of multi-target group behaviors in complex operational scenarios. Comparative experimental results show that the proposed method improves the MCA/MPCA metrics by 3.91% and 2.86%, respectively, over the next best method on a self-built dataset. On the public Volleyball dataset, the MCA/MPCA metrics are improved by 0.26% and 0.21%, respectively, compared to the next best method, verifying the method's effectiveness and robustness.
  • LI Ziyang, ZHENG Jiong, MA Jie, LI Shishen, QIN Jiwei
    Accepted: 2026-04-29
    Multi-interest sequential recommendation models extract multiple user interests through a dynamic routing mechanism to achieve personalized recommendations. However, during the interest extraction phase, improper item–interest routing weight allocation may cause multiple interest representations to become excessively similar, leading to the multi-interest collapse issue. In the prediction phase, neglecting users’ preference intensity across different interests may assign comparable or even greater influence to low-preference interests than to high-preference ones, resulting in the multi-interest preference weight imbalance issue. To address these issues, we propose a multi-interest sequential recommendation model based on disentangled feature representation and adaptive weight fusion (DMIAFRec). First, the model partitions items according to their co-occurrence relationships, grouping frequently co-occurring items with complementary semantic characteristics into the same interest group. This partition serves as a structural guidance mechanism for routing weight allocation, encouraging each group to focus on relatively independent user interests, thereby achieving disentangled multi-interest representations and preventing excessive convergence among interest vectors. Furthermore, a time-decay mechanism and a multi-interest attention fusion strategy are introduced to adaptively assign weights according to users’ preference intensity for each interest. The weighted aggregation of multiple interest representations produces a unified user preference representation that reflects heterogeneous interest importance, thus enhancing personalized recommendation performance. 
    Experimental results on three public datasets show that, compared with the best baseline model, the proposed model achieves average improvements of 6.2%, 4.98%, and 4.07% in R@20, NDCG@20, and HR@20, respectively, on the Retail Rocket, Gowalla, and Books datasets, demonstrating its effectiveness in improving recommendation performance and addressing the aforementioned issues.
  • LIANG Zefeng, QIAO Jie, CAI Ruichu, HAO Zhifeng
    Accepted: 2026-04-29
    Deep neural networks have been widely applied to time-series critical tasks such as medical diagnosis, intelligent sensing, and autonomous driving, yet their security vulnerabilities have gradually emerged. Existing studies show that deep time-series models are also susceptible to adversarial attacks. However, most existing adversarial attack methods for time-series models mainly focus on norm-bounded numerical perturbations and often ignore the inherent causal dependencies and dynamic evolution in the data generation process. As a result, the generated adversarial examples may deviate from feasible system dynamics and lack practicality in real-world scenarios. Therefore, generating effective adversarial examples while adhering to temporal causal dynamics has become an important challenge in time-series adversarial research. To address this challenge, this paper proposes TCADE (Temporal Causal ADversarial Examples), a novel method that explicitly models causal structures in time-series data and performs counterfactual reasoning under causal intervention constraints. By formulating adversarial attacks as feasible interventions on the underlying system, TCADE generates adversarial examples that can effectively mislead model predictions while remaining consistent with the system’s causal relationships and dynamic evolution. Experimental results demonstrate that TCADE achieves significant attack effectiveness in black-box settings, and the generated adversarial sequences conform to the causal generation mechanisms. This work provides a systematic evaluation of the vulnerability of time-series models under realistic and feasible black-box attacks, and offers practical insights for improving model robustness.
  • Jinglin Huang, Maoqiang Wu, Siming Wang, Yue Lai, Rong Yu
    Accepted: 2026-04-28
    Federated learning is a distributed machine learning paradigm that leverages decentralized data resources while ensuring data privacy. However, in real-world scenarios, data across clients are often non-IID (Independent and Identically Distributed), leading to label shift and class imbalance issues, which hinder convergence of global models and degrade generalization performance. To address the impact of such data heterogeneity on model performance, we propose a cross-client data augmentation and classification framework based on diffusion models. In this framework, each client trains an initial diffusion model based on local data and uploads its model parameters to the server. The server aggregates these parameters to construct a global diffusion model, which is then downlinked to all clients. Clients use the global diffusion model to generate supplementary samples, which are uploaded to the server for data augmentation to balance the local class distribution, thereby improving classifier performance. Ultimately, the classification model is trained through federated learning by receiving both local data and generated samples, and is deployed to clients for image classification and recognition. To generate high-quality images, a denoising diffusion probabilistic model is used as the generation backbone, while a ResNet-18 architecture is employed for the federated classification model. Experimental results show that the fine-tuned global diffusion model can generate images that are more consistent with the real data distribution. By augmenting the data through generated samples, the local data distribution on clients becomes more balanced, significantly improving global classification accuracy. 
    Under the non-IID condition with a Dirichlet coefficient α=0.1, the accuracy on CIFAR-10 and CIFAR-100 increased from 46.76% and 21.31% to 54.64% and 25.57%, respectively, demonstrating the effectiveness of the proposed data augmentation strategy in mitigating class imbalance.
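    A non-IID split controlled by a Dirichlet coefficient is commonly simulated by drawing, for each class, a vector of client proportions from a Dirichlet distribution. The sketch below shows that widely used recipe; the function name `dirichlet_partition` and its parameters are assumptions, and the abstract does not state that the authors partition data exactly this way.

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha=0.1, seed=0):
    """Split sample indices across clients with Dirichlet-distributed class shares.

    Hypothetical helper for illustration. Smaller `alpha` gives more skewed
    (more strongly non-IID) label distributions per client.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]

    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Share of this class that each client receives.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, chunk in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(chunk.tolist())

    return client_indices
```

    With α=0.1 most clients end up holding only a few classes, which is the kind of label shift and class imbalance the generated samples are meant to offset.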
  • SUN Yunlei, Xu Ke
    Accepted: 2026-04-28

    This paper proposes a 3D digital rock reconstruction framework based on diffusion prior guidance and multi-scale residual fusion Implicit Neural Representation (INR) to address challenges such as structural discontinuity, topological fracture, and the difficulty in balancing cross-scale microscopic details under sparse 2D slice conditions. The framework introduces the Score Distillation Sampling (SDS) mechanism to transform the geometric and topological priors in pre-trained diffusion models into continuous gradient guidance. It combines the measurement consistency loss for collaborative constraints to achieve the consistent restoration of local fine features and global topological structures under extremely sparse slice constraints. Meanwhile, the framework utilizes multi-scale residual structures to enhance the representation ability of INR for complex pores and improves the generalization performance of the model under different voxel sizes. Experimental results show that this method accurately restores complex pore spaces on various digital rock datasets. The reconstructed structures maintain high consistency with the ground truth in key physical indicators such as porosity distribution and geometric connectivity. In the 256³ scale reconstruction task, the Dice Similarity Coefficient (Ddice) reaches 97.01%, which is 2.6% higher than the baseline INR model. As the reconstruction scale further expands, the Ddice remains at 95.97% and 92.88% in the 512³ and 1024³ high voxel size tasks, respectively, demonstrating excellent stability in large-scale reconstruction. In the cross-sample generalization tests for Berea sandstone and Ketton limestone, the Ddice reaches 93.44% and 95.51%, respectively. This study addresses the problem of stable reconstruction for complex porous media in data-limited scenarios and provides a physically reliable and continuous technical solution for refined geological modeling.
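    For reference, the Dice Similarity Coefficient reported above is, in its standard form, twice the overlap between two binary volumes divided by the sum of their sizes. The sketch below computes it for voxel grids; the function name `dice_coefficient` and the convention for two empty volumes are assumptions for the example.

```python
import numpy as np


def dice_coefficient(pred, target) -> float:
    """Standard Dice score 2|A∩B| / (|A| + |B|) for two binary voxel volumes."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both volumes empty: treated as perfect agreement (a convention).
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / denom)
```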

  • Zhu Li, Cui Botao, Zhu Chunqiang, Mi Lugema, Xu Wanru, Wang Jing, Wang Pei
    Accepted: 2026-04-28
    Accurate short-term power load forecasting is crucial for the safe operation and optimal scheduling of power systems. Existing forecasting methods based on decomposition rely on fixed prior knowledge, which leads to rigid decomposition patterns. As a result, they struggle to handle load data with multiple periodicities and strong non-stationarity. Meanwhile, these methods find it difficult to balance computational complexity and prediction accuracy. To address these issues, this paper proposes a forecasting model based on learnable wavelet decomposition and KAN-Mixer, called LWKAN-Mixer. First, a learnable wavelet decomposition module is used to decompose the original load sequence into wavelet components of different frequency bands. Next, the Fast Fourier Transform (FFT) is applied to extract the dominant period of each component. Based on these dominant periods, patches of corresponding sizes are created for each component. Then, a multi-scale time-frequency fusion module is used to model each component independently to capture time-frequency features. The KAN-Mixer and a dual interactive convolution block are used to capture sequence representations and temporal dependencies, respectively. A multi-scale hybrid loss function is also introduced to constrain the quality of decomposition and reconstruction during training. This helps alleviate error accumulation and improve prediction accuracy. Experimental results on three real-world load datasets show that the proposed model reduces MAE by 1.10%–9.37% on the Australia dataset and by 4.97%–17.36% on the Morocco dataset compared to the latest baseline models. On the Cele dataset, the model achieves the second-best MAE. These results demonstrate that LWKAN-Mixer can effectively model complex nonlinearity and non-stationarity in load sequences, achieving strong performance in short-term load forecasting tasks.
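    The FFT step that extracts a dominant period for each component can be illustrated as follows. This is a minimal sketch assuming a uniformly sampled 1D series; the function name `dominant_period` is hypothetical, and how LWKAN-Mixer maps periods to patch sizes is not specified here.

```python
import numpy as np


def dominant_period(x) -> float:
    """Return the period (in samples) of the strongest non-DC frequency.

    Hypothetical helper for illustration.
    """
    x = np.asarray(x, dtype=float)
    amplitude = np.abs(np.fft.rfft(x - x.mean()))
    amplitude[0] = 0.0               # ignore the DC (zero-frequency) bin
    k = int(np.argmax(amplitude))    # index of the dominant frequency bin
    if k == 0:
        return float(len(x))         # no oscillation found: fall back to full length
    return len(x) / k
```

    For an hourly load series with a daily cycle, this would return a value near 24, which could then serve as the patch length for that component.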
  • Cao Tianya, Wang Zhixin, Shi Pengju, Li Kang, Li Shuang
    Accepted: 2026-04-28
    With the continuous advancement of social media and online service platforms, user reviews have increasingly become a critical source of information influencing consumer decisions and product evaluations. Aspect-based sentiment analysis (ABSA), as a key research direction in fine-grained sentiment computing, still faces significant challenges in practical applications, particularly prominent semantic ambiguity in textual content and insufficient extraction of sentiment cues. To address these limitations, this paper proposes CKMA, a novel aspect-based sentiment analysis model that integrates knowledge graph embeddings with a multi-channel attention mechanism. The CKMA model first leverages knowledge graph embedding techniques to map entities and their relationships from external knowledge bases into low-dimensional semantic vectors, which are then fused with textual representations to alleviate semantic ambiguity commonly observed in user reviews. Building upon this knowledge-enhanced representation, we design a parallel multi-channel feature extraction framework comprising three distinct channels: a structured information channel, a context-aware channel, and an aspect-focused channel. Through a staged fusion strategy, this framework enables collaborative modeling of diverse semantic and syntactic signals, thereby enhancing the model’s capacity to capture aspect-relevant sentiment features. To mitigate the loss of original semantic information during deep feature learning, we further introduce a joint fusion mechanism that combines the knowledge-enhanced word-level representations with the outputs of the multi-channel attention modules, thereby improving the completeness and robustness of the final feature representation. Extensive experiments conducted on four widely used benchmark datasets (Restaurant14, Restaurant16, Laptop14, and Twitter) demonstrate that the proposed method achieves superior performance in terms of both accuracy and Macro-F1 scores.
    Notably, CKMA exhibits more pronounced advantages on datasets characterized by complex syntactic structures, validating the effectiveness of our synergistic modeling strategy that jointly exploits structural and semantic information for aspect-based sentiment analysis.
  • Jinhao Hu, Dongfen Li, Jinbo Wang, Jinshan Lai
    Accepted: 2026-04-22
    Federated learning (FL) features two core advantages: keeping raw data local and enabling collaborative training across participants, which safeguards data privacy and facilitates distributed model collaboration. However, its architecture still confronts two key security threats—malicious client selection manipulation and server-side gradient tampering—rooted in the contradiction between distributed training and centralized aggregation. Specifically, malicious servers can rig client selection to skew the aggregated model, and the server’s absolute control over gradient aggregation creates a trust bottleneck for tampering, as client authentication relies on server trust and gradient aggregation lacks decentralized verification. To tackle these issues, this paper proposes a client-verifiable FL framework integrating Verifiable Random Functions (VRF) and lightweight Message Authentication Codes (MAC). In client selection, a VRF-based dynamic protocol ensures unforgeable participant identities and publicly verifiable selection results, preventing undetectable server tampering. In gradient aggregation, an innovative lightweight MAC mechanism with auxiliary node collaboration enables trustless tampering detection via gradient-sensitive parameters. Experiments demonstrate that the VRF-based selection maintains performance close to the theoretical benchmark of unmanipulated scenarios, reducing the malicious node selection rate by over 33% compared with traditional FedAvg. Meanwhile, the MAC-based gradient verification mechanism cuts communication overhead by around 24% relative to the baseline VerifyNet.
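    The basic idea of MAC-based tamper detection on a gradient can be illustrated with a standard HMAC. The sketch below uses HMAC-SHA256 from the Python standard library purely as a stand-in: it is not the lightweight MAC with auxiliary node collaboration proposed in the paper, and the helper names and the flat list-of-floats gradient are assumptions.

```python
import hashlib
import hmac
import struct


def serialize(gradient) -> bytes:
    """Pack a flat list of floats as little-endian doubles (illustrative encoding)."""
    return struct.pack(f"<{len(gradient)}d", *gradient)


def tag_gradient(key: bytes, gradient) -> bytes:
    """Compute an HMAC-SHA256 tag over the serialized gradient."""
    return hmac.new(key, serialize(gradient), hashlib.sha256).digest()


def verify_gradient(key: bytes, gradient, tag: bytes) -> bool:
    """Check the tag in constant time; any change to the gradient invalidates it."""
    return hmac.compare_digest(tag_gradient(key, gradient), tag)
```

    A client that tags its update before upload can later detect whether the values used in aggregation differ from what it sent, provided the key is not known to the party being checked.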
  • CHEN Mingyun, YU Xin, WEI Zhipeng, ZHANG Jinxiong
    Accepted: 2026-04-22
    This paper proposes a novel distributed neurodynamic optimization algorithm, designed by combining multi-agent theory and the penalty function method, which ensures fixed-time consensus for distributed nonconvex problems subject to local inequality constraints. The initial conditions of the algorithm can be chosen arbitrarily. By appropriately designing the penalty mechanism, it is guaranteed that the algorithm’s state variables enter the feasible region defined by the constraints within a finite time and remain therein thereafter. The consensus term of the algorithm combines a dynamic switching function with a sign function to achieve fixed-time consensus independent of initial conditions, thereby improving the efficiency and controllability of the optimization process. Based on Lyapunov theory, it is proven that, under appropriate assumptions, the algorithm’s state variables remain bounded, enter the feasible region of inequalities in finite time, achieve fixed-time consensus, and ultimately converge to the set of critical points of the nonconvex problem. Compared with existing distributed algorithms, the proposed algorithm adopts a single-layer differential inclusion framework and integrates a penalty mechanism that avoids complicated penalty-parameter tuning with an advanced fixed-time consensus control strategy. This design ensures highly controllable convergence time while preserving structural simplicity, low computational overhead, and flexibility in selecting initial points. The effectiveness and practicality of the algorithm are demonstrated through two simulation studies and an application to optimal facility location.
  • Liu Yongchang, Yin Yanchao, Chen Hailong
    Accepted: 2026-04-22
    In complex process manufacturing, high process coupling, intricate multi-step coordination, and significant nonlinear relationships between product quality and process parameters pose challenges for quality control. To address these issues, this study proposes a segmented multi-process quality prediction method integrating multi-layer neural networks and ensemble learning. The approach first establishes an overall prediction model and segmented prediction models. The overall model employs Random Forest (RF), LightGBM, and KNN algorithms, overcoming the limitations of single-model generalization through ensemble learning strategies while leveraging multi-algorithm differences to extract multidimensional data features. The segmented model utilizes LSTM-KAN networks, where Long Short-Term Memory (LSTM) captures long-term dependencies between process quality and feature variables, while Kolmogorov-Arnold Networks (KAN) enhance nonlinear mapping capabilities. Subsequently, the XGBoost ensemble learning algorithm integrates both models to achieve complementary advantages. Finally, a case study of predicting the moisture content of materials at the exit of the tobacco dryer in tobacco production is conducted for verification. As a core quality characterization indicator in tobacco primary processing, the stability of the exit material moisture content is directly related to the material softening effect of loose conditioning, the liquid absorption efficiency of leaf moistening and feeding, and the drying uniformity of thin-plate tobacco drying. Comprehensive control of the multi-process quality can be achieved through the accurate prediction of this single indicator. The results show that the fusion model is significantly superior to traditional single models and comparative models in key indicators such as mean absolute error (MAE=0.0072), root mean square error (RMSE=0.0096), mean absolute percentage error (MAPE=0.0566%), and goodness of fit (R²=0.9890).
    This verifies the effectiveness of the proposed method in handling nonlinear relationships and time-series characteristics, as well as its advantages in prediction accuracy and generalization performance, making it suitable for complex multi-process scenarios in tobacco primary processing.
  • YU Hang, ZHU Hongqing
    Accepted: 2026-04-22
    Magnetic resonance imaging is an important tool for clinical auxiliary diagnosis and lesion detection. Currently, most MRI reconstruction methods are based on global feature modeling, utilizing transformers to achieve high-quality reconstruction. However, these methods often perform dense feature dependency calculations in the spatial domain, which may introduce redundant information and noise from irrelevant areas. Additionally, existing methods require separate training of models for different sampling patterns, resulting in inefficiency and limited generalization capabilities. To address these issues, this paper proposes the Dual-domain Adaptive Transformer Prompt Network (DATP-Net), a unified reconstruction framework that efficiently models feature relationships and reconstructs images from various sampling patterns simultaneously. The proposed network includes several core designs: (1) A deep feature convolution mixer that performs convolution operations in both spatial and frequency domains to enhance the representation of deep features; (2) An adaptive mixing transformer that combines adaptive self-attention and a fine-grained feedforward network, using dual-branch self-attention computation and fine feature elimination to enhance potentially useful feature relationships; (3) A degradation prompt module that injects learnable prior degradation information flow at the reconstruction end to guide feature reconstruction, enabling the network to integrate MR image reconstruction from multiple sampling patterns and enhance the model's generalization ability. Extensive experiments conducted on public IXI and fastMRI datasets demonstrate that the proposed method significantly outperforms state-of-the-art methods with lower computational costs. At a 4x random sampling rate, the model achieves an average PSNR of 39.82 and an SSIM exceeding 0.96, successfully reconstructing images with high clarity and detail restoration.
  • FAN Tianhao, QI Lianyong, YANG Yijie, LI Chong, SONG Te, ZHANG Dejiang
    Accepted: 2026-04-21
    Partial label learning is a typical weakly supervised learning paradigm in which each training instance is assigned a candidate label set that contains the true label. The goal of partial label learning is to identify the ground-truth label from the candidate set for each instance. In real-world applications, partial label data usually exhibit class imbalance. This makes learning methods based on prediction confidence and label refinement prone to bias and thus degrades classification performance. This issue is more severe in long-tailed scenarios, where head classes dominate the disambiguation process and tail classes are insufficiently learned. Moreover, existing optimal transport–based label refinement methods still suffer from systematic bias in imbalanced scenarios. To address these issues, this paper proposes a method named C2DOT-PLL for long-tailed partial label learning. While preserving the global consistency advantage of optimal transport, the method first employs a dynamic confidence calibration mechanism to alleviate unfair comparisons caused by inconsistent confidence scales across classes and to reduce the impact of class imbalance on instance-level label competition. Then, an unbiased optimal transport scheme is introduced in the pseudo-label refinement stage to correct the systematic bias induced by entropic regularization, thereby producing more accurate pseudo labels. Experiments are conducted on multiple benchmark datasets with different imbalance levels. The results show that, compared with existing partial label learning methods, C2DOT-PLL achieves the best overall classification accuracy.
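    As background for the entropic regularization mentioned above, the sketch below shows the standard Sinkhorn iteration for entropy-regularized optimal transport. It is the generic baseline, not the unbiased scheme of C2DOT-PLL. In a pseudo-labeling setting, rows could stand for instances and columns for classes, with the column marginal acting as a class prior; that mapping, the function name `sinkhorn`, and its parameters are assumptions.

```python
import numpy as np


def sinkhorn(cost, row_marginal, col_marginal, reg=0.1, num_iters=200):
    """Entropy-regularized OT plan via Sinkhorn scaling (generic baseline).

    Hypothetical helper for illustration. Returns a non-negative matrix whose
    row sums match `row_marginal` and whose column sums approach `col_marginal`.
    """
    cost = np.asarray(cost, dtype=float)
    a = np.asarray(row_marginal, dtype=float)
    b = np.asarray(col_marginal, dtype=float)

    kernel = np.exp(-cost / reg)      # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(num_iters):
        v = b / (kernel.T @ u)        # rescale to match column marginals
        u = a / (kernel @ v)          # rescale to match row marginals
    return u[:, None] * kernel * v[None, :]
```

    The regularization strength `reg` trades sharpness for stability: larger values blur the plan toward the product of the marginals, which is one source of the bias the abstract refers to.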
  • ZHANG Haicang, TANG Shibao, HUO Jiuyuan
    Accepted: 2026-04-21
    Accurate traffic flow prediction can provide scientific decision support for traffic management departments, which is crucial for alleviating urban traffic congestion, improving overall network operation efficiency, and enhancing service levels. Addressing the issue of insufficient exploration of periodic spatio-temporal features in existing traffic flow prediction models, this paper proposes a Multi-Period Spatio-Temporal Gated Network (MPSTG) method for traffic flow prediction. The MPSTG method first designs decoupled parallel multi-period feature extraction branches to model spatio-temporal features under different periods in independent subspaces, considering the multi-period characteristics embedded in traffic flow data. Then, within each individual period branch, a spatio-temporal feature extraction module combining a gating mechanism and graph attention diffusion convolution is introduced to enhance the model’s ability to capture dynamic spatial correlations and temporal dependencies. Finally, a bidirectional feature fusion strategy is constructed to achieve efficient collaborative expression of multi-period information for features of different granularities. Experiments on three public traffic flow datasets show that the proposed method outperforms baseline models. In terms of MAE, it reduces the error by 2.0%, 3.4%, and 3.6% in the 60-minute prediction task on the three datasets, demonstrating its accuracy, adaptability, and robustness in complex traffic scenarios.
  • WANG Peng, JIANG Shaohua, ZHANG Yiwen, WANG Wanyu, ZHANG Lianming
    Accepted: 2026-04-21
    Stance detection is a core task in social media public opinion analysis and plays a crucial role in understanding the distribution of public opinions. However, existing methods perform poorly in multi-turn dialogue scenarios, with a significant decline in modeling capability especially when dealing with deep-level comments. The main bottlenecks lie in the lack of a logical reasoning chain for implicit knowledge and the stance formation process, as well as insufficient target-dependent multi-granularity context modeling. To address these issues, this paper proposes a Chain-of-Thought enhanced Context Modeling method (CoT-CM) to improve the accuracy and robustness of stance detection in multi-turn dialogues. Leveraging the external knowledge of large language models, this method guides chain-of-thought reasoning through prompt design, extracts stance-related intermediate variables, and integrates them interactively with dialogue semantics, thereby depicting the reasoning process of the stance formation logic. Meanwhile, a multi-level dialogue semantic framework is designed to model the historical dialogue context from global, local, and relational perspectives, and a target-guided multi-hop attention mechanism is introduced to capture the most relevant information. In addition, a structural consistency contrastive learning mechanism is proposed, which effectively enhances the discriminative ability between different stances by jointly optimizing classification and contrastive losses. Experiments on Chinese multi-turn dialogue stance detection datasets C-MTCSD and ZS-CSD show that CoT-CM achieves an average F1 improvement of 2.97% and 1.36% respectively.
  • Liu Mingkai, He Peiwen, Liu Mengchi
    Accepted: 2026-04-21
    The Text-to-SQL task aims to convert natural language queries (NLQ) into Structured Query Language (SQL). Although the rise of Large Language Models (LLMs) has redefined the paradigm of this task, most existing studies focus on optimizing the model's schema awareness and SQL generation capabilities through prompt engineering, while often neglecting the prevalent semantic ambiguity in natural language. This neglect leads to comprehension biases when models handle complex scenarios. To address this, we propose a Text-to-SQL framework with Disambiguation, Analysis, Refinement, and Election (DARE-SQL). The framework first leverages the semantic reasoning capabilities of LLMs to construct a semantic expansion module, which generates an expanded set of questions covering the user's potential intent space to explicate and capture fuzzy semantics. Subsequently, differentiated generation strategies are applied to questions from various sources, and a refinement mechanism based on execution feedback is introduced to optimize the results, thereby building a high-quality set of candidate SQLs. Finally, a two-stage selection strategy based on question consensus is employed to filter for the optimal solution that balances both accuracy and execution performance. Experimental results demonstrate that DARE-SQL achieves an Execution Accuracy (EX) of 71.71% and a Valid Efficiency Score (VES) of 70.41 on the challenging BIRD benchmark, and reaches 88.10% EX on the classic Spider dataset. These results validate the effectiveness of explicit ambiguity modeling in enhancing performance for complex Text-to-SQL tasks.
  • LIANG Yu, MA Jiayan, HU Xiyuan, WANG Ziheng, LIU Wen, PENG Tianhao, LI Ying
    Accepted: 2026-04-20
    With the rapid development of the internet and social media, the speed of information generation and dissemination has reached an unprecedented level. The proliferation of misinformation, rumors, and other misleading content has become increasingly prominent, posing significant threats to social governance order, harmony, and stability. In rumor detection, the low proportion of rumor samples leads to data imbalance, while existing text augmentation techniques struggle to enhance detection performance due to their lack of specificity to rumor styles and low generation quality. Additionally, although pre-trained language models excel at capturing global dependencies in text, they often fall short in focusing on key local features of rumors. To address these challenges, this study proposes a rumor detection framework based on large-model data augmentation and multi-granularity feature fusion. First, a rumor generation method integrating a rumor-style lexicon and large language models is proposed. Based on publicly available rumor datasets, a style lexicon is constructed to guide large language models in generating semantically coherent and rumor-style consistent minority-class samples. This approach alleviates data imbalance while ensuring the quality of augmented samples. Second, this study introduces a multi-granularity contextual feature extractor. It combines the strengths of pre-trained language models with disentangled attention mechanisms in capturing global dependencies and the focus of convolutional sub-layers on local features. This enables the simultaneous capture of long-distance logical associations and fine-grained linguistic clues in rumor semantics, effectively mitigating the inherent limitations of such pre-trained models in capturing key local features. Experimental results demonstrate that the proposed detection method achieves accuracy rates of 82.24% and 93.91% on the BuzzFeed and PolitiFact datasets, respectively.
  • Wang Xinyue, Sun Zhigang, Quan We, Huang Rong
    Accepted: 2026-04-20
    Time-Sensitive Networking (TSN), a real-time Ethernet technology with deterministic transmission characteristics, is increasingly applied in safety-critical scenarios such as automotive and aerospace systems. In these scenarios, link failures caused by random environmental factors may interrupt TSN connections and thereby invalidate static configurations such as TSN time synchronization trees. Real-time maintenance of the network topology is therefore key to ensuring system reliability in safety-critical scenarios. However, there is relatively little research on TSN topology state monitoring, and existing approaches struggle to meet the strict real-time requirements that TSN systems place on network monitoring. This paper first compares and analyzes, from the perspective of real-time performance, the problems and challenges of existing TSN topology state monitoring methods in safety-critical scenarios. Based on this analysis, it proposes FTDP, a fast TSN topology state discovery protocol for safety-critical scenarios. In FTDP, each node explicitly plans the monitoring path and uses the source routing paradigm to guide the monitoring probe along it, so that a single probe suffices to collect information from the entire network, which reduces the delay of topology state discovery. Finally, tests in a real hardware environment show that the topology monitoring delay for networks of up to 10 nodes does not exceed 100 microseconds, confirming that FTDP can collect the network topology fast enough for real-time monitoring. A comparison with existing methods further confirms the real-time advantage of FTDP.
  • Yuxin LIU, Hui LI, Jianwei ZHANG
    Accepted: 2026-04-20
    Autonomous path planning is key to the success of drone missions in complex environments: the drone must both plan globally efficient flight paths and respond to changes in the local environment. Planning complete paths for different combinations of start and end points in the initial static environment, while adjusting for obstacle avoidance in local areas, requires an effective balance between global path optimality and local obstacle avoidance capability. In complex three-dimensional environments, the search time of existing heuristic algorithms grows exponentially with spatial resolution, making it difficult to meet real-time requirements. Gradient-based deep reinforcement learning methods, on the other hand, often encounter the "perceptual aliasing" problem in unstructured mountainous terrain because they lack local perception guidance, which leads to unstable training convergence and susceptibility to local extrema. To address these problems, a proximal policy optimization algorithm based on local information enhancement (LIE-PPO) is proposed. A state space integrating global position information, relative target information, and a local perception window is designed so that the agent can balance long-term planning and local decision-making, thereby addressing path planning in high-dimensional feature spaces. The algorithm adopts a 26-neighborhood discrete action space and a multi-objective reward function that jointly considers path smoothness, safety, and efficiency. This guides the agent to learn an efficient and safe path selection strategy, enabling a pre-trained model to generate feasible, near-optimal paths online between any given start and end points. Experimental results show that, over multiple tests with random start and end points in a static environment, the proposed algorithm is approximately globally optimal, with an average path length difference of less than 7% relative to the A* algorithm. Compared with the standard proximal policy optimization algorithm, convergence is approximately 1.6 times faster and training is more stable. In the presence of unknown obstacles, feasible paths can still be planned, demonstrating good environmental adaptability.
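As an illustration of the 26-neighborhood discrete action space mentioned in the abstract above, the following minimal sketch enumerates the moves on a 3D grid. It assumes unit steps on an integer lattice; the names `ACTIONS` and `step` are hypothetical, and the abstract does not specify how the authors encode actions.

```python
from itertools import product

# Every offset (dx, dy, dz) with components in {-1, 0, 1}, except staying in place.
# That gives 3**3 - 1 = 26 possible moves from any grid cell.
ACTIONS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def step(position, action_index):
    """Apply one discrete action to an integer grid position (x, y, z)."""
    dx, dy, dz = ACTIONS[action_index]
    x, y, z = position
    return (x + dx, y + dy, z + dz)
```

A policy network would then output a distribution over the 26 indices; bounds checking and collision handling are left out of this sketch.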
  • ZHANG Shenghao, HAN Weili
    Accepted: 2026-04-20
    Passwords remain the most critical factor in identity verification and are widely used in various security scenarios. Enhancing password security relies heavily on the simulation and study of password guessing. In practice, data-driven credential tweaking attacks are highly constrained by the quantity and quality of training samples, and existing few-shot password guessing frameworks are not suitable for credential tweaking attacks. To address these issues, this paper proposes a few-shot credential tweaking attack method based on large language models and data augmentation. The method aims to automatically generate pseudo-aligned password data from a minimal number of high-quality samples, thereby reducing the heavy dependence of credential tweaking attacks on data quantity and quality. The contributions of this paper are as follows: 1) Based on reinforcement learning, a credential tweaking attack framework named PasswordRL is proposed. 2) Based on augmentation techniques, the few-shot credential tweaking attack framework PasswordRL-FS is proposed. Using four mainstream guessing methods as baselines, comparative experiments are conducted for both frameworks on two real leaked password datasets. In real-world few-shot scenarios (number of training samples = 1000), with guess budgets of 5, 10, and 100, the hit rates of the proposed attack framework exceed those of the second-best baseline by 39.54%, 23.72%, and 42.40%, respectively. In data-rich scenarios (number of training samples > 10^7), the hit rates reach 83.72%, 81.85%, and 93.68%. These experiments demonstrate the effectiveness of the proposed method.
  • Yuan Shuai, Miao Disheng, Zhang Haonan
    Accepted: 2026-04-20
    Nonlinear state estimation is a core technology in fields such as radar target tracking and robot localization. However, in practical applications, model uncertainties and unknown or time-varying noise covariance matrices (NCMs) cause traditional filtering algorithms to exhibit increased estimation errors or even divergence. Existing adaptive filtering methods often struggle to balance estimation accuracy and computational efficiency. To address these challenges, this paper proposes a Robust Sliding Window Variational Adaptive Cubature Kalman Filter (RSWVACKF). First, variational Bayesian inference (VBI) is integrated with the cubature integration rule to derive a joint recursive solution for the state vector, the process noise covariance matrix (PNCM), and the measurement noise covariance matrix (MNCM), enhancing the algorithm's applicability to nonlinear systems. Second, a sliding-window-based noise covariance estimator is designed. This estimator uses a cubature Kalman smoother (CKS) to backward-smooth the state vectors within the sliding window, enabling online estimation of the NCMs while avoiding fixed-point iterations and improving computational efficiency. Finally, a multiple-fading-factor strong tracking filter (MSTF) is incorporated. The NCMs estimated online guide the MSTF in adjusting the prediction error covariance matrix (PECM), thereby enhancing the algorithm's robustness. Multiple simulations validate the effectiveness of the proposed RSWVACKF. The results demonstrate that the proposed method has significant advantages over existing state-of-the-art approaches in both estimation accuracy and computational efficiency.
  • Yaxin Li, Jingling Yuan, Xian Zhong
    Accepted: 2026-04-20
    Video analytics extracts high-value information from video streams and plays a crucial role in applications such as intelligent transportation and public safety. Although traditional cloud-based video analytics offers powerful computational capabilities, uploading massive amounts of video data incurs high bandwidth consumption and network latency. Edge computing reduces network latency by processing video data near the cameras, but it still faces two major challenges: first, frame-by-frame analysis leads to redundant inference, and existing frame reuse methods cannot fully exploit local similarities in historical frames; second, uneven core workload arises because task allocation across big and LITTLE cores lacks real-time load awareness. To address these issues, this paper proposes Vable, an efficient video analytics system for big.LITTLE edge devices. Vable employs a multi-historical frame, block-level frame reuse mechanism, which partitions video frames into fine-grained blocks and employs a tree-based storage structure combined with locality-sensitive hashing for similarity matching, enabling efficient cross-frame computation reuse and significantly reducing redundant inference overhead. In addition, Vable introduces a core workload-aware list-based DAG partitioning algorithm, which dynamically allocates analysis tasks by monitoring the real-time load of big and LITTLE cores, balancing computation and communication overhead while avoiding latency increases caused by load imbalance. A prototype of Vable is implemented and evaluated on two real-world datasets. Experimental results show that Vable reduces end-to-end latency by 59.23% and 45.83%, respectively, while maintaining high throughput.
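To make the idea of locality-sensitive hashing for block-level reuse more concrete, here is a minimal sketch using random-hyperplane (sign) hashing. This is a generic LSH variant chosen for illustration, not necessarily the one used in Vable; the flat dictionary stands in for the tree-based storage structure described in the abstract, and all names (`ReuseCache`, `signature`, `make_hyperplanes`) are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes in a dim-dimensional feature space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(block, hyperplanes):
    """Hash a block feature vector to a bit tuple: one sign bit per hyperplane."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, block))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


class ReuseCache:
    """Cache inference results keyed by the LSH signature of a block."""

    def __init__(self, dim, n_bits=16, seed=0):
        self.planes = make_hyperplanes(dim, n_bits, seed)
        self.table = {}

    def lookup(self, block):
        """Return a cached result for a similar block, or None on a miss."""
        return self.table.get(signature(block, self.planes))

    def store(self, block, result):
        self.table[signature(block, self.planes)] = result
```

Blocks whose feature vectors point in nearly the same direction tend to share a signature, so a hit lets the system skip inference for that block. A real system would also verify the match and bound the cache size, which this sketch omits.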
  • WU Jiaheng, DUAN Jiancheng, ZHANG Ronghui, CHEN Junzhou
    Accepted: 2026-04-20
    In complex road traffic scenarios, vehicle detection faces significant challenges, including large variations in object scale, frequent occlusions, and the difficulty of simultaneously achieving high accuracy and real-time performance. To address these issues, an improved vehicle detection algorithm, termed YOLOv13n-FCM, is proposed based on the YOLOv13n baseline. First, Frequency Dynamic Convolution (FDConv) is introduced into the backbone to strengthen the modeling of multi-frequency information, thereby enhancing the representation of vehicle edge structures and fine-grained details. Second, a Channel–Spatial Fusion (CSF) module is designed to jointly model channel-wise and spatial features, enabling the network to focus on salient vehicle regions while effectively suppressing background interference in complex scenes. Finally, a Multi-Branch Fusion (MBF) module is incorporated into the detection head to perform adaptive, weighted multi-scale feature fusion, further improving detection performance for vehicles at different scales. Experimental results on the public Vehicle Detection Dataset and BITVehicle dataset show that YOLOv13n-FCM achieves good detection performance in various road vehicle scenarios. Specifically, on the Vehicle Detection Dataset, mAP50 reaches 60.1% and mAP50:95 reaches 42.6%, which are 2.7% and 2.6% higher than those of the original YOLOv13n model, respectively, and 2.7% and 1.8% higher than those of the best competing method. On BITVehicle, the proposed method also outperforms the baseline model, indicating a degree of cross-scenario adaptability. In addition, after hardware acceleration on an NVIDIA Jetson AGX Orin edge device, YOLOv13n-FCM runs at 78.5 FPS with an input resolution of 640×640. Overall, the proposed method substantially improves detection accuracy while maintaining real-time performance, demonstrating strong practicality for engineering applications.
  • DONG Xianzhe, WANG Xiaoheng, LI Jing
    Accepted: 2026-04-15
    In recent years, Multimodal Large Language Models (MLLMs) have advanced rapidly, making the deployment of efficient inference services increasingly challenging. Existing online inference scheduling strategies, such as continuous batching and stall-free scheduling, are primarily designed for text-only large language models. They typically merge the encoding and prefill stages of a request into a single scheduling unit. However, multimodal inputs require significantly longer and more variable processing times during the encoding stage. Such coarse-grained scheduling can easily leave computational resources idle, increase inference latency, and ultimately constrain the overall effective throughput of the system. To address this issue, this study proposes an online inference scheduling strategy named STEP (Stage-based Time Estimation Priority Scheduling), aimed at enhancing the effective throughput of MLLM serving. The key innovation of STEP lies in fine-grained stage decoupling and scheduling of the inference process. Specifically, the multimodal inference pipeline is decomposed into three independently schedulable stages: encoding, prefill, and decoding. Furthermore, STEP employs a lightweight execution-time prediction model, trained on historical profiling data, to accurately estimate batch execution time under TPOT (Time Per Output Token) requirements. Finally, a priority-based scheduling mechanism is introduced to accommodate diverse TTFT (Time To First Token) requirements across requests. Experiments were conducted on five open-source multimodal datasets covering tasks such as visual question answering and image understanding, with comparisons against several baseline methods. The results demonstrate that, through stage-aware fine-grained scheduling and execution time prediction, STEP effectively adapts to the inference characteristics of MLLMs and significantly improves the effective throughput of online inference systems.
  • CHEN Wenjie, LIANG Yin, DU Mingjing, HUANG Yaosheng, LIU Yanjie
    Accepted: 2026-04-14
    Aiming at the problems of limited pixel resolution, significant scale variation, and dense distribution of small objects in UAV-aerial images, an improved algorithm named SAM-YOLOv12n based on YOLOv12n is proposed. In the backbone network, a Dual-Attention Coupled C2f for Small Objects (DA-C2f-S) module is designed. By introducing a multilevel feature extraction structure and a dual attention mechanism, the module effectively enhances the ability to capture fine features such as edges and textures of small objects. A Multi-Scale Fusion Convolution (MSFConv) module is constructed, which takes Dilated Depthwise Separable Convolution (DDSConv) as the core and designs differentiated branches with various dilation rates. This achieves cooperative modeling of local details and global contextual features, compensating for the limitations of a single-scale receptive field, and better adapting to the scale fluctuation characteristics of small aerial objects. Experimental results on the VisDrone2019 dataset show that the improved method achieves improvements of 9.9% in mAP@0.5 and 7.2% in mAP@0.5:0.95 compared with the baseline YOLOv12n, validating its effectiveness for small object detection in complex aerial scenarios. Generalization experiments conducted on the TinyPerson ultra-small object dataset and HIT-UAV infrared aerial dataset verify the cross-domain adaptability of the proposed method across different aerial scenes. Its core advantage lies in effectively balancing detection accuracy, model complexity, and inference efficiency, providing reliable technical support for real-time object detection tasks in UAV aerial imaging.
  • Kangyi Zheng, Ji Zhang, Bingyu Lin, Tian Yang, Ningyi Liu
    Accepted: 2026-04-14
    Semi-supervised feature selection is a powerful tool in machine learning for processing large-scale partially labeled data. However, most existing feature selection algorithms are hindered by insufficient computational efficiency, limited scalability, and inadequate accuracy. Related family is a high-efficiency feature selection framework based on granular computing; while it excels in large-scale data scenarios, it cannot handle partially labeled data. To address this, this paper proposes a semi-supervised algorithm based on related family (SRF). First, a redundancy-free granulation method, termed consistent granulation, and an importance degree matrix are introduced to construct a novel related family. This facilitates the design of a semi-supervised feature evaluation method that reduces the complexity from quadratic to linear, effectively overcoming bottlenecks in computational efficiency and scale. Second, to further enhance classification performance, three strategies are implemented: 1) strengthening the data representation capability of information granules; 2) balancing the consistency and the quality of information granules, which are jointly used to evaluate feature importance; and 3) predicting pseudo-labels based on the selected high-quality feature subset to reduce noise interference. Experimental results on 12 public datasets demonstrate that, compared with four representative algorithms (SemiFREE, Semi2MNR, LMSFS, and GMSFS), SRF improves classification accuracy by 0.88%, 2.34%, 2.81%, and 2.58%, respectively. Meanwhile, it improves computational efficiency by factors of 36.70, 841.56, 6.52, and 17.04, respectively. These results verify the effectiveness and efficiency of the proposed method in handling large-scale partially labeled data.
  • LIU Jiaqi, CHENG Xiaona
    Accepted: 2026-04-14
    Federated learning achieves privacy preservation and collaborative modeling through the distributed paradigm of “data staying local and model being shared.” However, existing schemes show clear limitations in client selection efficiency, malicious node defense, and fairness of incentive allocation. This paper proposes a dynamic malicious node identification mechanism, named GIFL, to jointly optimize malicious node detection, efficient client selection, and dynamic incentive allocation. GIFL adopts a lightweight greedy screening strategy to filter low-contribution and high-cost clients. An influence factor dynamic updating mechanism based on model parameter deviation is used to accurately identify and remove malicious nodes. A dynamic reward payment strategy is designed by jointly considering historical and real-time contributions. Experiments on the Fashion-MNIST, CIFAR-10 and Tiny-ImageNet datasets demonstrate that in cross-device federated learning scenarios where the proportion of malicious nodes is 5%-30%, GIFL significantly outperforms five benchmark methods, including FedAvg and IAFL. The malicious node identification accuracy is improved by 5.4% to 23.9%. Compared with QAIM, the pre-selection time is reduced by an average of 86.1%. Model convergence stability and social welfare are significantly enhanced. Under the condition that model accuracy is not lower than 92% (Fashion-MNIST, CIFAR-10) and 88% (Tiny-ImageNet), the average server cost is reduced by 16.94%. The results indicate that GIFL provides an effective and reliable solution for federated learning in mobile edge networks.
  • Zhang Peng, Zhao Guosheng, Wu Xiaosheng
    Accepted: 2026-04-14
    To address issues such as limited adaptive capacity, insufficient adversarial robustness, and inadequate consideration of defense costs in dynamic defense models, an asynchronous advantage actor-critic adaptive dynamic defense model that integrates meta-learning and adversarial training is proposed. The model formalizes the defense process as a partially observable Markov decision process (POMDP), designs a reward function that incorporates penalties for false positives and false negatives as well as operational costs, and constructs a three-layer collaborative optimization framework. The inner layer implements efficient strategy search based on the asynchronous advantage actor-critic algorithm. The middle layer introduces projected gradient descent adversarial training to enhance robustness under adversarial perturbations through a minimax game. The outer layer employs model-agnostic meta-learning to construct a meta-optimizer, enabling the model to adapt quickly to new attacks from a small number of samples. Experiments on the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets show that the model achieves an optimal defense decision rate (ODR) exceeding 92%, with an average reduction in defense resource consumption of approximately 60%. Under high-intensity perturbations, the attack success rate (ASR) remains below 38.2%, with no performance collapse; the detection accuracy for zero-day attacks can be improved to over 88%. This research provides a feasible path for constructing an intelligent dynamic defense system with high adaptability, strong robustness, and high efficiency.
  • LIU Jiale, DENG Weisi, HU Jiaqiu, JING Zhaoxia, ZOU Wenzhong
    Accepted: 2026-04-14
    In new energy power generation systems, missing data severely constrains the reliability of equipment condition assessment and fault prediction. The data in such scenarios typically exhibit high complexity, long-term dependencies, and strong volatility, making conventional imputation techniques inadequate in terms of both accuracy and generalization. To address these limitations, this paper proposes AFMFormer, an adaptive frequency-aware multi-scale transformer designed for imputation in new energy systems. First, Pearson correlation coefficients and maximal information coefficients are employed to select informative multivariate features, thereby enhancing the relevance and quality of the input data. AFMFormer then integrates an adaptive frequency-domain feature enhancement module that performs frequency decomposition and dominant frequency amplification, emphasizing critical components within complex long sequences. Furthermore, two parallel temporal branches (a Patch-based Transformer for short-term dynamics and a Standard Transformer for long-term dependencies) jointly capture comprehensive temporal representations. Finally, a feature fusion mechanism combines the outputs of both branches to generate the imputed sequences. Experimental results show that the proposed model significantly outperforms the baseline methods on all evaluation metrics; compared with the best baseline model, it reduces the mean squared error on the wind and PV datasets by 49.3% and 31.5%, respectively.
  • WANG Jiongjiong, ZHANG Shufen, DAI Jiajia, ZHANG Hanrui, ZHANG Yi
    Accepted: 2026-04-14
    Federated learning trains models by sharing model parameters rather than raw data, but it remains vulnerable to inference attacks, which motivates the integration of differential privacy techniques. To address the limitations of static parameter partitioning and uniform noise injection in conventional Differentially Private Federated Learning (DP-FL), this paper proposes an adaptive differentially private federated learning framework with parameter personalization, termed DP-FedADC. The framework introduces Adaptive Parameter Partitioning (APP) to dynamically analyze model parameters and to separate personalized parameters from shared parameters according to their importance. Based on this partitioning, a Differentiated Parameter Update (DPU) strategy is designed to apply distinct regularization constraints to different parameter types, which stabilizes critical parameter updates and mitigates the distortion of optimization directions caused by gradient clipping. In addition, a Client-level Adaptive Privacy Budget Allocation (CAPBA) strategy is proposed to dynamically adjust privacy budgets according to the proportion of personalized parameters at each client, enabling stronger protection for high-sensitivity clients while avoiding excessive noise perturbation on parameters that dominate global convergence. Experiments conducted on MNIST, CIFAR-10, and Fashion-MNIST demonstrate that, under strict differential privacy constraints, DP-FedADC consistently improves classification accuracy, convergence speed, and training stability. Compared with existing baselines, the proposed method improves test accuracy by 2%–4% and converges to a lower loss range, validating its effectiveness and robustness in differentially private federated learning scenarios.
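For readers unfamiliar with the baseline that the abstract above improves on, the sketch below shows the conventional clip-then-add-Gaussian-noise step applied uniformly to a client update. It illustrates only the standard mechanism, not DP-FedADC itself; the function names and the `noise_multiplier` parameter are assumptions made for this example.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def privatize(update, clip_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise with std = noise_multiplier * clip_norm."""
    clipped = clip_update(update, clip_norm)
    sigma = noise_multiplier * clip_norm
    return [u + rng.gauss(0.0, sigma) for u in clipped]
```

Clipping changes the length of the update vector, and uniform noise perturbs every coordinate equally, which is the behavior the adaptive partitioning and budget allocation described above aim to refine.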
  • CAO Fu, XING Wenbin, ZUO Yong, ZHANG Ronghui, CHEN Junzhou
    Accepted: 2026-04-14
    Unstructured road segmentation is a crucial component of environmental perception for autonomous driving, facing challenges such as the integrity of global topological modeling, the preservation of boundary details, and the trade-off between model efficiency and accuracy. To address these challenges, this paper proposes a Lightweight Axial Context Network (AXON-Net). Employing an encoder-decoder architecture, the network introduces a Channel-and-Spatial Attention Block (CASAB) in the encoder, which adaptively recalibrates feature weights by aggregating multi-dimensional statistical information to effectively suppress environmental noise, thereby enhancing feature discriminability in complex backgrounds. A Lightweight Partial Context Transformer (LightPCT) is designed at the bottleneck, utilizing a partial channel interaction strategy to reduce computational redundancy and efficiently capture long-range dependencies to restore road topological connectivity. Furthermore, the decoder integrates Dual-Path Channel Fusion (DPCF) and Thin Structure Enhancer (TSE) modules, aiming to bridge the feature semantic gap and explicitly enhance axial geometric features for the refined recovery of blurred road edges. Experimental results on unstructured road datasets constructed from the India Driving Dataset (IDD) and the Off-Road Freespace Detection (ORFD) dataset show that AXON-Net achieves road Intersection over Union (IoU) scores of 95.3% and 88.1%, respectively, with only 8.49 M parameters, achieving a superior balance between segmentation accuracy and model efficiency. Ablation studies further validate the synergistic effectiveness of the proposed modules, demonstrating the network's potential application in unstructured road perception tasks.
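The road Intersection over Union (IoU) metric reported above can be computed as in the following minimal sketch. It assumes binary masks given as nested lists in which 1 marks road pixels; the function name `road_iou` and the convention of returning 1.0 when both masks are empty are choices made for this example.

```python
def road_iou(pred, target):
    """Intersection over Union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly; treat that case as IoU = 1.0 by convention.
    return intersection / union if union else 1.0
```

For instance, a prediction that overlaps the ground truth on one pixel while the two masks together cover three pixels scores 1/3.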