
Just accepted

  • YU Yang, QU Haicheng and LIU Lamei
    Accepted: 2026-03-20
    To address the challenges of label scarcity and fine-grained feature alignment in rolling bearing fault diagnosis under variable speed conditions, this paper proposes a Category-Aware Contrastive Learning (CACL) method driven by coupled time-frequency attention for unsupervised cross-domain diagnosis. First, for feature extraction, a coupled time-frequency attention module is constructed to simultaneously extract discriminative features from both time and frequency domains of fault signals while enhancing sensitivity to long-tail distributions and incipient faults. Second, the extracted deep discriminative features are fed into a graph convolutional network with multiple receptive fields, where a graph generation layer constructs adaptive topological relationships among samples, and deep feature modeling and optimization are performed on the constructed sample topology. Finally, to explicitly optimize the structural consistency and categorical discriminability of the graph feature space, a cross-domain category-aware contrastive learning mechanism is designed. By constructing positive contrastive relationships among cross-domain intra-class samples and negative contrastive relationships among inter-class samples, fine-grained alignment of feature distributions and semantically consistent cross-domain transfer are achieved for samples of the same category from source and target domains. The proposed method achieves average accuracies of 90.67% and 93.67% on the public CWRU and JNU datasets, respectively, representing improvements of 4.68 and 1.69 percentage points over the second-best comparative methods, thereby validating its effectiveness for unsupervised fault diagnosis across multiple variable speed cross-domain transfer tasks.
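The cross-domain category-aware contrast described above can be sketched as an InfoNCE-style loss in which an anchor's positives are same-class samples from the other domain and its negatives are all samples of other classes. This is an illustrative reconstruction under stated assumptions (name `cacl_loss`, temperature value, L2-normalized feature vectors), not the authors' exact formulation:

```python
import math

def cacl_loss(feats, labels, domains, temperature=0.1):
    """Category-aware contrastive loss sketch: for each anchor, positives are
    same-class samples from the *other* domain; negatives are all samples of
    a different class. Feature vectors are assumed L2-normalized."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    total, count = 0.0, 0
    for i, (f_i, y_i, d_i) in enumerate(zip(feats, labels, domains)):
        pos = [j for j in range(len(feats))
               if j != i and labels[j] == y_i and domains[j] != d_i]
        neg = [j for j in range(len(feats)) if labels[j] != y_i]
        if not pos or not neg:
            continue
        # Softmax denominator over cross-domain positives plus all negatives.
        exps = {j: math.exp(dot(f_i, feats[j]) / temperature) for j in pos + neg}
        denom = sum(exps.values())
        for j in pos:
            total += -math.log(exps[j] / denom)
            count += 1
    return total / max(count, 1)
```

On toy 2-D features, the loss is near zero when same-class samples from both domains align, and large when they are orthogonal, which is the alignment pressure the abstract describes.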
  • ZHAO Wangpeng, CHEN Tao, LI Wei, NAN Longmei, DU Yiran
    Accepted: 2026-03-19
    Polynomial multiplication accounts for more than 80% of the computational time in lattice-based cryptographic operations. The Number Theoretic Transform (NTT) can reduce its computational complexity from O(n²) to O(n log n). However, compared with other implementation methods, NTT-based polynomial multiplication involves more complex data scheduling and more difficult memory mapping. At present, memory mapping schemes tailored for specific algorithms are limited by algorithm parameters and hardware characteristics, resulting in poor scalability, while memory mapping schemes for reconfigurable polynomial multiplication incur significant overhead in control and storage units, leading to low area efficiency of polynomial multiplication architectures. To address these issues, this paper proposes a conflict-free memory mapping scheme based on partial constant geometry transformation, which supports lattice-based cryptographic polynomial multiplication operations whose parameters satisfy the required NTT-compatibility condition. A conflict-free data scheduling scheme is proposed to avoid write-write conflicts during the mode transition of polynomial multiplication and data conflicts in the polynomial point multiplication stage. In addition, to avoid read-write conflicts in memory units during data scheduling, a multi-bank storage scheme with cyclic shift storage is proposed, which reduces control complexity and cuts storage capacity by 37.5% compared with the classic ping-pong storage method. To further demonstrate its performance advantages, the polynomial multiplication architecture based on the conflict-free memory mapping scheme was experimentally verified on the FPGA xc7v2000tflg1925. Compared with the relevant literature, the conflict-free memory mapping scheme proposed in this paper exhibits higher area efficiency.
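The O(n²) → O(n log n) claim can be illustrated with a minimal radix-2 NTT. This sketch uses toy parameters (q = 257, ω = 64, n = 8; these are illustrative choices, not the paper's) and shows the pointwise-multiplication stage that the conflict-free scheduling scheme must feed; it says nothing about the hardware memory mapping itself:

```python
def ntt(a, omega, q):
    """Recursive radix-2 NTT over Z_q; omega is a primitive len(a)-th
    root of unity mod q, and len(a) is a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % q, q)
    odd = ntt(a[1::2], omega * omega % q, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q          # butterfly: top output
        out[k + n // 2] = (even[k] - t) % q  # butterfly: bottom output
        w = w * omega % q
    return out

def intt(a, omega, q):
    """Inverse NTT: forward transform with omega^(-1), scaled by n^(-1)."""
    n = len(a)
    inv_n = pow(n, q - 2, q)
    return [x * inv_n % q for x in ntt(a, pow(omega, q - 2, q), q)]

def poly_mul_cyclic(a, b, omega, q):
    """Cyclic polynomial multiplication: NTT both operands, multiply
    pointwise (the point-multiplication stage), then inverse-transform."""
    A, B = ntt(a, omega, q), ntt(b, omega, q)
    return intt([x * y % q for x, y in zip(A, B)], omega, q)
```

With q = 257 and ω = 64 (a primitive 8th root of unity mod 257), multiplying (1 + 2x + 3x² + 4x³)(5 + 6x + 7x²) via three transforms and one pointwise pass reproduces the schoolbook result.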
  • Wu Wenxin, Xu Guotian, Zhu Guangrui
    Accepted: 2026-03-18
    While V2Ray-type encrypted proxy protocols, a novel class of mainstream domestic tools, protect user privacy, they also provide covert channels for cybercrime. Accurate identification of such traffic has become a new research hotspot in cyberspace governance. To evade regulation, these protocols often employ traffic variant techniques, making them better camouflaged and difficult for existing methods to detect effectively. To address this issue, an encrypted proxy traffic detection model, AG-CTNet, based on dynamic fusion of multimodal features is proposed to identify V2Ray-type encrypted proxy traffic employing various camouflage strategies. To address the scarcity of existing public datasets, an encrypted proxy traffic sample library is constructed through independent data collection, and data augmentation strategies are introduced to improve model robustness. For the traffic variant camouflage problem, a parallel fusion architecture of 2D-CNN and Transformer is adopted, innovatively introducing cross-modal attention and dynamic gating mechanisms to achieve adaptive fusion of multimodal features. Experimental results show that the model achieves an accuracy of 98.62% and a precision of 98.41% in identifying V2Ray-type encrypted proxy traffic, effectively improving the accuracy of traffic identification.
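The dynamic gating idea can be sketched as a per-dimension learned gate that blends two modality features. The function name `gated_fusion` and its scalar parameters are illustrative assumptions for exposition, not AG-CTNet's actual layers:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(feat_a, feat_b, w_gate, bias):
    """Dynamic gating sketch: a gate in (0, 1), computed from both modality
    features, decides per dimension how much of each modality to keep.
    w_gate and bias stand in for (hypothetical) learned parameters."""
    fused = []
    for a, b, w, c in zip(feat_a, feat_b, w_gate, bias):
        g = sigmoid(w * (a - b) + c)       # gate conditioned on both inputs
        fused.append(g * a + (1 - g) * b)  # convex blend of the modalities
    return fused
```

At zero weights the gate is 0.5 and the fusion degenerates to an average; a strongly positive weight lets the gate saturate toward whichever modality dominates, which is the "adaptive" behavior a trained gate would exhibit.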
  • Chen Qiongbin, He Yulin, Cui Laizhong, Huang Zhexue
    Accepted: 2026-03-18
    Time series mining plays a pivotal role in domains such as renewable energy, meteorology, and finance, with growing interest in the analysis of multivariate multi-step time series. Existing deep neural network-based approaches for multivariate multi-step time series forecasting often suffer from complex model architectures and large-scale parameterization. These characteristics lead to substantial computational demands and high training costs. Moreover, most current prediction models focus predominantly on the time domain, processing either channel-independent or channel-mixer information, which limits their ability to simultaneously capture both correlated and independent channel features. This restriction can lead to reduced prediction accuracy, particularly when training data is scarce. To overcome these limitations, we propose a lightweight dual-channel time-frequency cross-attention network for multivariate multi-step time series forecasting. The network extracts both independent and mixed channel representations in the frequency domain and integrates them with the original time-domain signals via an attention-based fusion mechanism. This design enables the model to jointly leverage time-domain and frequency-domain information, thereby capturing global spatiotemporal dependencies more comprehensively. We evaluate the proposed method against eight state-of-the-art time series forecasting models on eight publicly available datasets. Experimental results show that, for example, on the representative ECL dataset, our model achieves improvements over Autoformer (NeurIPS 2021) of 17.55%, 12.87%, and 14.72% in MSE, MAE, and SMAPE, respectively. Furthermore, compared with Crossformer (ICLR 2023), our approach reduces the number of parameters by 30.82%, and achieves a 66.07% reduction in training time relative to Pyraformer (ICLR 2022).
These results demonstrate that the proposed network is an effective and efficient solution for multivariate multi-step time series forecasting.
  • LU Anwen, ZENG Tianhao, JIAO Yiping, LIU Mingxin, GONG Hongyi, CHEN Jun, XU Jun
    Accepted: 2026-03-18
    Primary liver cancer is a highly prevalent digestive system malignancy worldwide, predominantly comprising intrahepatic cholangiocarcinoma (ICC) and hepatocellular carcinoma (HCC). Clinical practice demonstrates that precise histological subtyping and clinical staging of these subtypes are critical for guiding personalized treatment strategies and prognosis evaluation. However, effectively exploiting cross-scale features for multi-task pathological analysis remains challenging due to the high heterogeneity of liver cancer and the complex coexistence of macroscopic tissue structures and microscopic nuclei in whole slide images (WSIs). To address this problem, this study proposes a weakly supervised Dual-Branch Multi-Source Feature Fusion (DBMSF) model. This model integrates multi-scale deep features extracted by the CHIEF foundation model and handcrafted features derived from HoVer-NeXt nuclei segmentation. Specifically, the deep branch employs a multi-scale alignment module for feature interaction, while the handcrafted branch utilizes a graph convolutional network (GCN) to dynamically aggregate nuclei information, capturing a comprehensive representation of the tumor microenvironment. Finally, a multi-source fusion module dynamically integrates these features. Multi-task evaluations on a private ICC cohort and the public TCGA-LIHC cohort demonstrated that DBMSF achieved an AUC of 88.5% and accuracy of 75.6% for ICC subtyping, and an AUC of 82.4% and accuracy of 71.5% for HCC T-stage prediction. These experimental results indicate that DBMSF significantly outperforms state-of-the-art methods, demonstrating robust effectiveness and promising clinical application potential for multi-task pathology analysis.
  • LI Hao, MA Zhenzhe, CHENG Lan, XU Xinying
    Accepted: 2026-03-18
    Pedestrian detection on unloading platforms in waste incineration power plants remains challenging due to complex lighting interference and significant variations in pedestrian scales. Existing pedestrian detection methods exhibit limitations in shallow edge feature extraction, multi-scale feature fusion, and lightweight detection head design. To address these issues, this paper proposes a pedestrian detection model named MS-ADFF, which is based on multi-scale aggregation-diffusion feature fusion. Firstly, an edge feature enhancement module is developed. By reinforcing contour information within shallow features, this module effectively mitigates the adverse impact of image detail blurring under complex lighting conditions. Secondly, a multi-scale aggregation-diffusion feature fusion network is constructed, performing two rounds of feature aggregation and diffusion operations on the P3, P4, and P5 feature levels, which effectively integrates multi-scale semantic features through aggregation and diffusion mechanisms, thereby enhancing the model's capability to perceive pedestrian targets of different scales. Finally, a lightweight shared detection head constructed using deep convolution and group convolution is proposed, which replaces the traditional dual-branch structure with a shared feature extraction mechanism, effectively suppressing redundant parameters while maintaining detection accuracy. Experimental results show that, with YOLOv11s as the baseline model, the proposed MS-ADFF model achieves a detection accuracy of 92.7% on the self-built WIPPID dataset, with Recall and mAP@0.5 improved by 4.6% and 1.5%, respectively, compared to the baseline model, while reducing floating-point operations by 0.7 GFLOPs. On the public CityPersons dataset, the MS-ADFF model improves detection precision by 1.9% over the baseline model, with a reduction of 0.7 GFLOPs. These results demonstrate that, with overall floating-point operations lower than those of the baseline model, the proposed method effectively enhances pedestrian detection accuracy on unloading platforms of waste incineration power plants, while also exhibiting strong generalization ability and robustness in street-scene pedestrian detection tasks.
  • Wei Wei, Yu Chenchen, Wang Di
    Accepted: 2026-03-17
    Visual Simultaneous Localization and Mapping (VSLAM) is a core technology in the field of mobile robotics. Traditional VSLAM methods primarily rely on manually designed features and geometric constraints, facing numerous challenges in complex environments. In recent years, deep learning-based approaches have provided new solutions to address these challenges. This paper reviews the research progress of deep learning-based VSLAM from a problem-driven perspective. Firstly, the basic system framework of VSLAM is introduced, and the main challenges it faces are analyzed. The review focuses on three key issues: for dynamic interference, it analyzes dynamic detection methods based on semantic segmentation and semantic-geometry fusion; for illumination variations, it systematically reviews robust frontend designs based on image enhancement, exposure control, and learned feature extraction; for lightweight and real-time deployment requirements, it discusses the application of network model compression and hardware acceleration techniques on edge devices. It also briefly discusses representative solutions for challenges such as texture deficiency, fast motion, scale uncertainty, large-scale environments, and long-term operation. This paper starts from the key issues that restrict the performance of VSLAM in practical applications, constructs a problem-driven analysis framework, and reveals the differences in the applicability of different technical routes in complex scenarios. Finally, it summarizes common evaluation metrics and public datasets, and provides a conclusion with outlooks on future research directions.
  • LI Pu-cong, JIANG Rui, WANG Si-zhe, YAN Wen-jun
    Accepted: 2026-03-17
    Click-Through Rate (CTR) prediction is a core task in recommender systems and online advertising, and its performance highly depends on effective feature interaction modeling. Existing methods suffer from several limitations when modeling higher-order feature interactions, including the neglect of domain-level semantic information, the introduction of redundant noise by higher-order interactions, and excessive sharing of input feature representations, which jointly restrict further performance improvement. To address these issues, this paper proposes a CTR prediction model that integrates gated field-aware interactions with soft feature selection. Specifically, a soft feature selection layer is first employed to adaptively reweight embedded features through continuous learnable weights, enabling better adaptation to different interaction networks. Then, a field-aware interaction module is introduced to explicitly model higher-order feature interactions at the field level, so as to preserve domain-level semantic information. Meanwhile, an information gating component is incorporated to dynamically filter key interaction features, effectively suppressing redundant noise. Experimental results on four public datasets, including Criteo, Avazu, MovieLens, and Frappe, show that the proposed model achieves consistent improvements in terms of AUC and LogLoss. For example, compared with the best-performing baseline methods on each dataset, the proposed model improves AUC by 0.12% and 0.13% and reduces LogLoss by 0.11% and 0.14% on Criteo and Avazu, respectively, while maintaining comparable model parameter size and training efficiency. These results demonstrate that the proposed model achieves a favorable balance between prediction accuracy and computational efficiency, indicating strong potential for practical applications.
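The soft feature selection layer can be sketched as continuous learnable weights rescaling each field embedding rather than hard keep/drop decisions. The 2·sigmoid parameterization (which makes the identity reachable at logit 0) is an assumption chosen for illustration, not necessarily the paper's exact form:

```python
import math

def soft_feature_select(embeddings, logits):
    """Soft feature selection sketch: each field embedding is rescaled by a
    continuous weight in (0, 2) derived from a (hypothetical) learned logit.
    Logit 0 gives weight 1, i.e. the embedding passes through unchanged."""
    out = []
    for emb, s in zip(embeddings, logits):
        w = 2.0 / (1.0 + math.exp(-s))  # 2 * sigmoid(s), differentiable
        out.append([w * x for x in emb])
    return out
```

Because the weights are continuous, different downstream interaction networks can each learn their own reweighting of the shared embeddings, which is the "excessive sharing" problem the abstract targets.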
  • ZHANG Yuzhang, TIAN Le, WEI Huali, LIN Yumao, LV Shibin, GUO Maozu
    Accepted: 2026-03-17
    In cloud computing environments, workloads and resource states change continuously over time, which often causes reinforcement-learning-based scheduling policies to suffer from unstable randomness during online execution, leading to increased energy consumption or degraded response time. Conventional Soft Actor–Critic (SAC) mainly relies on temperature tuning during training to control policy randomness, and thus struggles to adapt promptly to non-stationary workloads in real systems. To address this issue, this paper proposes an entropy-supervised Soft Actor–Critic algorithm for online cloud task scheduling, referred to as ESAC. Without altering the original training structure, ESAC introduces a policy entropy supervision mechanism during inference to monitor policy randomness in real time and triggers lightweight entropy feedback fine-tuning when the entropy deviates from a stable range, enabling fast correction with constant computational cost. In addition, sliding-window reward normalization and periodic incremental updates are employed to alleviate numerical instability caused by reward scale drift under dynamic workloads. Experiments based on dynamic workload simulations constructed from the Alibaba Cluster Trace 2018 demonstrate that ESAC consistently outperforms several representative scheduling algorithms under different load intensities and burst scenarios, reducing the average energy consumption per task by about 1.8% and the average response time by up to 3.01%. Compared with the A2C baseline, ESAC achieves improvements of 70.7%, 76.0%, and 76.2% in the composite performance metric under three load scenarios, while maintaining acceptable online scheduling overhead. These results verify the effectiveness of the proposed method in enhancing the stability and adaptability of online scheduling in non-stationary cloud environments.
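The inference-time entropy supervision can be sketched as a monitor that compares the current policy's entropy against a stable band and signals lightweight fine-tuning only when it drifts out. The band endpoints and return labels here are illustrative assumptions, not ESAC's actual interface:

```python
import math

def policy_entropy(probs):
    """Shannon entropy of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_supervisor(probs, low, high):
    """Entropy supervision sketch: constant-cost check of policy randomness.
    Outside the stable band [low, high], a corrective signal is emitted that
    a lightweight fine-tuning step could act on; inside the band, no-op."""
    h = policy_entropy(probs)
    if h < low:
        return "increase_exploration"  # policy too deterministic
    if h > high:
        return "reduce_randomness"     # policy too random
    return "ok"
```

The check costs O(|actions|) per decision, which matches the abstract's claim of constant computational overhead during online scheduling.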
  • HE Yu-lin, HE Jia-hao, MO Pei-heng, KAN Zheng, CUI Lai-zhong, HUANG Zhe-xue
    Accepted: 2026-03-17
    Big data processing frameworks like Apache Spark have gained significant attention due to their widespread applications in large-scale data analysis. However, it is difficult to balance computing costs and runtime performance by relying solely on a single deployment mode (e.g., on-premises or cloud-based), especially when handling data-intensive tasks. Hybrid cloud deployment combines local resources and public cloud resources to offer a flexible and efficient solution that balances cost and performance. Job scheduling in hybrid cloud environments faces numerous challenges, including optimizing resource utilization and job execution costs. Existing scheduling algorithms often fail to fully account for the directed acyclic graph (DAG) structure of Spark jobs and the characteristics of multi-stage scheduling. This leads to prolonged job execution times in scenarios with parallel jobs and an inability to reduce costs effectively. To address these issues, this paper proposes an innovative cost-aware particle swarm optimization (CA-PSO) scheduling algorithm for Spark jobs. By incorporating a cost model, the algorithm includes the rental costs of virtual machine (VM) instances in its optimization objectives and dynamically adjusts resource allocation strategies to minimize resource usage while meeting performance requirements, thereby reducing cluster operational costs. Additionally, the scheduling algorithm leverages the DAG dependency structure of Spark jobs and introduces a multi-Spark-job, multi-stage scheduling mechanism to optimize resource allocation strategies and stage execution order. This approach not only effectively reduces cluster costs but also significantly improves the overall performance of multi-job scheduling in a hybrid cloud environment.
Simulation and real-cluster experimental results demonstrate that, compared to existing scheduling algorithms, the CA-PSO Spark job scheduling algorithm exhibits excellent scalability, adapts to different VM pricing models and various Spark job types, and can reduce the usage cost of hybrid clusters.
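A plain particle swarm optimizer over a continuous allocation vector gives the flavor of the search underlying such schedulers. This is generic PSO under assumed hyper-parameters (inertia 0.7, cognitive/social weights 1.5), without CA-PSO's VM cost model or DAG-aware multi-stage mechanism; the objective passed in could be, e.g., VM rental cost plus a runtime penalty:

```python
import random

def pso_minimize(cost, dim, bounds, n_particles=20, iters=60, seed=0):
    """Generic PSO sketch: particles explore a box-constrained vector space,
    tracking personal bests and a global best; `cost` is any objective."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive, social coefficients
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            v = cost(pos[i])
            if v < pbest_val[i]:        # update personal best
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:       # and possibly the global best
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val
```

Swapping the toy objective for a model of VM rental price and stage execution time is the step that turns this skeleton into a cost-aware scheduler.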
  • Tian Feng, Li Xiang, Liu Fang, Zhang Yan, Xie Hongtao, Han Yuxiang, Fang Chao
    Accepted: 2026-03-17
    The rapid development of deepfake technology in recent years has brought new opportunities in fields such as entertainment and education, but has also caused serious cybersecurity and privacy issues. Current deepfake video detection methods face two main challenges. First, encoding artifacts and noise in low-quality and highly compressed videos can hide subtle forgery traces. Second, existing approaches have difficulty modeling temporal inconsistencies between video frames and lack deep fusion of spatiotemporal features. To solve these problems, this paper proposes a detection model called MSST based on multi-scale spatiotemporal feature fusion. The method builds a complete framework with multi-scale spatial feature extraction, frequency-domain feature enhancement, and multi-scale temporal feature extraction. First, a multi-scale Transformer encoder extracts spatial features at different levels. A learnable frequency-domain filter is used to improve the detection of high-frequency forgery traces. At the same time, a multi-scale temporal Transformer models temporal inconsistencies between frames to capture short- and long-range dynamic anomalies. The model also designs a gated cross-attention module to fuse spatiotemporal features. This module enables dynamic cross-modal interaction and produces more discriminative fused features. Tests on the FF++ (LQ), Celeb-DF, and DFDC datasets show that MSST achieves ACC scores of 92.73%, 96.61%, and 95.15%, and AUC scores of 0.965, 0.981, and 0.976. Compared to current mainstream methods, the proposed approach delivers better accuracy and generalization.
  • Duan Yaning, Guo Shuai, Chen Tao, Sun Yongqiang, Zhang Weishan
    Accepted: 2026-03-16
    Digital twin systems for Industrial Internet of Things (IIoT) operating under federated learning face dual challenges: catastrophic forgetting caused by continuously evolving data distributions and model knowledge erosion resulting from intermittent device offline behavior. To address these issues, this paper proposes a Knowledge-Persistent Federated Evolutionary Learning (KPFEL) framework that systematically mitigates knowledge forgetting through a coordinated "Storage-Constraint-Inheritance" mechanism. The framework comprises three core modules: (1) A knowledge persistence storage module that maintains independent storage units for each edge device on the server side, employing a momentum-based update strategy to preserve historical knowledge contributions from offline devices; (2) A knowledge-constrained aggregation module that treats historical gradient update directions as optimization constraints and efficiently computes global update trajectories compatible with historical knowledge via quadratic programming; (3) A generator knowledge inheritance module that synthesizes high-quality historical-class samples for data-free knowledge replay by integrating parameter inheritance, knowledge alignment, and adversarial training. Theoretical analysis establishes a convergence-rate guarantee for the framework. Experiments on CIFAR-100, Tiny-ImageNet, and Stanford Cars datasets demonstrate that the proposed method yields an average improvement of 3.07 percentage points in classification accuracy and a reduction of 3.79 percentage points in forgetting rate over state-of-the-art baselines. Under extreme settings with only 20% client participation, the accuracy drop is limited to 5.21% compared to 15.84% for the baseline, exhibiting strong robustness against intermittent device offline behavior and providing an effective solution for privacy-constrained IIoT digital twin applications with continuously expanding categories.
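The momentum-based update for per-device knowledge storage can be sketched in a few lines; the momentum value 0.9 is illustrative, not the paper's setting:

```python
def momentum_update(store, new, m=0.9):
    """Momentum-based knowledge-store update sketch: the server keeps one
    vector per device and blends in fresh contributions. Because the stored
    value decays geometrically rather than being overwritten, knowledge from
    a device that later goes offline fades slowly instead of vanishing."""
    return [m * s + (1 - m) * n for s, n in zip(store, new)]
```

Applied each round a device reports in, the store is an exponential moving average of that device's contributions; when the device goes offline, the last stored state simply persists for aggregation.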
  • JIA Xiao, LUO Hao, ZHANG Xinyue, YU Jiaheng, ZHU Kai, LI Jing
    Accepted: 2026-03-12
    Sequential recommendation effectively captures the dynamic evolution of user interests. However, systems relying on single-domain data often face challenges like data sparsity and recommendation homogeneity. Cross-domain sequential recommendation was proposed to address these issues by integrating user behavior sequences from multiple domains, which alleviates data sparsity and enables a more comprehensive modeling of user interest dynamics. However, existing methods often employ a uniform global strategy when fusing cross-domain interaction information, neglecting the diversity and complexity of user interests. Moreover, simple graph structures are insufficient to capture complex high-order interaction features between users and items, resulting in incomplete representation of cross-domain interaction information. To address these issues, this paper proposes an interest-enhanced cross-domain sequential recommendation model based on graph and hypergraph fusion. To tackle the problem of insufficient mining of deep-seated user preferences, a capsule network structure is introduced in the private domain. Its dynamic routing mechanism adaptively aggregates contextual information from item embeddings in sequences, extracting multiple potential user interest points to supplement single-domain user profiling. In the shared domain, a hybrid architecture combining Graph Neural Networks and Hypergraph Neural Networks is proposed to overcome the limitations of traditional graph structures in capturing complex group associations and higher-order interactions. This design enables comprehensive capture of user preference features across different dimensions through multi-level feature interactions, enhancing the representational capacity for cross-domain behavioral dependencies. Subsequently, the user's unique preferences and general preferences are deeply integrated through a sequence relation learning module and a contrastive learning module, generating a complete user preference embedding. Experimental validation on the Hvideo and Amazon datasets showed that, compared to the strongest baseline models, the proposed HGIE-CDSR model achieved average improvements of 4.95% and 8.39% in MRR, and 3.58% and 14.37% in NDCG, respectively. Ablation study results further verified the effectiveness of each module within the model.
  • Luo Hao, Yiran Xin, Yunqi Tang
    Accepted: 2026-03-11
    In recent years, generative image technology based on diffusion models has achieved breakthrough progress, with text-to-image models represented by Stable Diffusion, DALL-E, and Midjourney being widely applied in commercial and creative fields. However, highly realistic AI-generated images have also brought challenges to information authenticity, giving rise to social issues such as misinformation dissemination and copyright infringement. To effectively address these challenges, this paper systematically reviews the latest research progress in detection technologies for images generated by diffusion models. First, it outlines the development trajectory of diffusion models from principles and basic frameworks to large-scale applications. Second, it summarizes the evolution of dataset construction, pointing out that dataset development is progressing from using few generators and low resolutions toward multi-model integration and high-quality multi-level filtering. Third, it analyzes the three mainstream approaches in detection technology: detection based on implicit features, detection based on explicit features, and detection based on hybrid features. Finally, it analyzes the main challenges facing current detection technologies and provides an outlook on future research directions. This review offers researchers and practitioners a comprehensive technical landscape and reference for development trends.
  • HUO Jiuyuan, KAN Jiayun, YANG Jiguang, ZHENG Shannong, CAO Fang
    Accepted: 2026-03-11
    To address the problem of uneven cluster head load caused by traditional clustering methods in wireless sensor networks (WSNs), this paper proposes a clustering algorithm for WSNs constrained by Vertex-Sum Reducible Edge Coloring (VSRECUC). From a graph-theoretic perspective, the node-to-cluster association and cluster head load are modeled by abstracting the network clustering structure as a multi-star graph. The theory of vertex-sum reducible edge coloring is introduced, where the node association cost is mapped to edge coloring values, and the chromatic sum of each cluster head is used to characterize its communication load, thereby theoretically constraining the load balance among different cluster heads. In the cluster head election stage, residual energy and local node density are jointly considered to construct a candidate cluster head selection function, which is combined with a competition radius mechanism to effectively alleviate the “hotspot problem” caused by cluster head overload near the sink node. In the clustering stage, a node reassignment strategy constrained by vertex-sum reducible edge coloring is proposed. The CRITIC method is employed to determine the weights of competition radius and residual energy, dynamically calculate the cluster head load threshold, and guide nodes to be reasonably reassigned among different cluster heads, ensuring that the load of each cluster head matches its resource capability. Simulation results demonstrate that, in terms of network lifetime, the proposed VSRECUC algorithm extends the lifetime by 369.1%, 59.9%, 116.1%, 57.2%, and 55.7% compared with MH-LEACH, ESPC, EEUC, FSCVG, and BEBMCR, respectively. Moreover, it exhibits significant advantages in performance metrics such as cluster head number control and energy consumption balance. 
The results indicate that introducing vertex-sum reducible edge coloring theory into WSN clustering design provides a novel modeling perspective and an effective approach for achieving load balancing and network lifetime optimization.
  • Dawei Zhang, Kangbo Kou, Yi Liu, Wei Guo, Yang Yu
    Accepted: 2026-03-11
    High-precision semantic segmentation enables autonomous vehicles to obtain detailed environmental perception. To address the limitations of traditional methods on fisheye images, such as poor edge segmentation, low accuracy, and insufficient training data, we propose RSCAMamba, a model specifically designed for fisheye image segmentation. A zoom augmentation method is employed to transform standard datasets into fisheye datasets, allowing effective modeling of fisheye distortions and ensuring robustness across diverse scenarios. RSCAMamba first adopts a Swin Transformer encoder to capture global feature representations. Second, we propose the restricted spatial-channel attention module. By integrating one-dimensional and two-dimensional restricted deformable convolutions, the module adaptively models distortion-aware nonlinear features and effectively captures anisotropic deformations. Consequently, it provides more accurate representations of strip-like structures and irregular edges. In addition, a channel reduced and edge increased module further enhances edge details, alleviating distortion-induced degradation. Finally, the Mamba module fuses global features, captures long-range dependencies, and reduces redundancy across scales. This helps the model detect complete objects and preserve spatial continuity. Experimental results indicate that, compared with Mask2Former, RSCAMamba achieves a 1.88% improvement in mIoU on the WoodScape public dataset and a 3.30% improvement on the CityScapesFisheye synthetic dataset, demonstrating superior segmentation performance.
  • ZHANG Xin, YI Huawei, ZHAO Mengyuan, WANG Yanfei, LAN Jie
    Accepted: 2026-03-11
    Blind image super-resolution reconstruction aims to restore clear high-resolution images from blurred and degraded images in real-world scenarios. Although deep learning-based reconstruction methods have achieved some progress, the degradation models they rely on still have certain limitations. First, the blurring and noise-adding operations in the degradation process lack adaptability; second, the simulation of the degradation process is insufficient. To address these issues, this paper proposes a hybrid-order adaptive multi-dimensional degradation model. The model employs a hybrid-order degradation approach overall, consisting of two stages. The first stage is the adaptive degradation stage, which utilizes dynamic convolution to perform adaptive blurring and noise addition on high-resolution images. The second stage is the multi-dimensional degradation stage, which further processes the images generated in the first stage through distortion, brightness adjustment, rotation, and down-sampling. The proposed degradation model is integrated with classical super-resolution reconstruction networks to develop a blind image super-resolution reconstruction algorithm based on the hybrid-order adaptive multi-dimensional degradation model. To verify the effectiveness of the proposed method, comparative experiments were conducted on the Set14, BSD100, and DRealSR datasets. The results show that, compared to the PDM-SRGAN baseline method, the proposed method achieves improvements in peak signal-to-noise ratio (PSNR) of 0.84 dB, 0.63 dB, and 1.06 dB on the three datasets, respectively, in 4× super-resolution reconstruction tasks. This demonstrates that the proposed degradation model can effectively enhance the reconstruction performance and real-world adaptability of super-resolution algorithms, enabling the generation of higher-quality images.
  • Zhang Anqing, Zhuang Zhiqi, Li Zijian, Zhang Ting
    Accepted: 2026-03-11
In recent years, cyberattacks have become increasingly frequent and sophisticated, causing economic losses and security risks for both nations and enterprises. Traditional attack detection methods analyze attack behaviors by constructing source graphs, but this approach loses some semantic information when describing attack behaviors as simple graphs, leading to poor detection performance. This study proposes a network intrusion detection model based on temporal information graph autoencoders, abbreviated as TIGAE. TIGAE generates multiple source graphs through a refined graph construction method, comprehensively recording the interaction behaviors of system entities. Subsequently, an improved linear graph algorithm is devised to transform complex graphs into simpler ones, enhancing the graph structure while preserving the original system behavior information. A graph autoencoder is then employed to learn benign system behavior. Experimental results on three datasets show that the Precision increases by an average of 0.65%, the F1-Score increases by an average of 0.68%, the Recall increases by an average of 1.07%, and the FPR decreases by an average of 0.40%. Experiments demonstrate that TIGAE outperforms existing state-of-the-art methods across multiple attack detection metrics.
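The core idea of scoring edges by how poorly a model of benign structure reconstructs them can be sketched with a linear stand-in: here a truncated SVD plays the role of the trained graph autoencoder, which is an assumption for illustration only.

```python
import numpy as np

def edge_anomaly_scores(adj: np.ndarray, rank: int = 2) -> np.ndarray:
    """Score edges by reconstruction error under a low-rank model.

    A linear stand-in for the graph-autoencoder idea: a model fitted
    to benign structure reconstructs normal edges well, so edges with
    high reconstruction error are flagged as anomalous.  TIGAE's
    learned encoder is replaced here by a truncated SVD.
    """
    u, s, vt = np.linalg.svd(adj)
    recon = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return np.abs(adj - recon)
```

An adjacency matrix made of two dense communities is reconstructed almost perfectly, while a single edge bridging them stands out with a much larger error, mirroring how autoencoder-based detectors separate benign and anomalous interactions.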
• Li Zongmin, Wang Xingyu, Ma Jinyue, Bai Yun
    Accepted: 2026-03-11
    Addressing the limitations of existing lightweight Vision Transformers (ViTs), specifically the lack of explicit structural and spectral priors during token construction which leads to the loss of local high-frequency details and constrained representation efficiency, this paper proposes a novel framework named OFT-Former (Orientation- and Frequency-Aware Token Interaction Transformer). First, an Orientation-Aware Patch Embedding (OAPE) module is designed to explicitly inject horizontal and vertical spatial structural priors during initialization, thereby mitigating the insufficient geometric perception inherent in traditional embedding methods. Second, a Frequency-Enhanced Token Refinement (FETR) module is proposed, which leverages Fast Fourier Transform (FFT) to decouple frequency-domain features and integrates multi-scale convolutions to specifically enhance the preservation of high-frequency details. Furthermore, a Bidirectional Gated Token Modulation (BGTM) mechanism is constructed to establish bidirectional interaction pathways between local and global features, facilitating adaptive fusion of cross-scale representations via dynamic gating. Experimental results demonstrate that OFT-Former achieves a Top-1 accuracy of 81.4% on ImageNet-1K with only 12.8M parameters and 1.8 GFLOPs. Additionally, the model exhibits superior performance on CIFAR-100 classification and COCO object detection tasks, verifying the effectiveness of the proposed method.
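The FFT-based frequency decoupling that modules like FETR rely on can be sketched as a radial split in the Fourier domain; the cutoff fraction below is an illustrative choice, not the paper's setting.

```python
import numpy as np

def split_frequency(feat: np.ndarray, cutoff: float = 0.25):
    """Split a 2-D feature map into low- and high-frequency parts.

    A radial mask in the (shifted) Fourier domain separates smooth
    structure from fine detail; the two parts sum back to the input.
    The cutoff is a fraction of the normalised frequency radius.
    """
    h, w = feat.shape
    F = np.fft.fftshift(np.fft.fft2(feat))
    cy, cx = h // 2, w // 2
    ys, xs = np.mgrid[0:h, 0:w]
    r = np.sqrt(((ys - cy) / h) ** 2 + ((xs - cx) / w) ** 2)
    low_mask = (r <= cutoff).astype(float)
    low = np.fft.ifft2(np.fft.ifftshift(F * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(F * (1 - low_mask))).real
    return low, high
```

A frequency-enhancing module would process the `high` branch (e.g. with multi-scale convolutions) before recombining, so that fine edges and textures are not washed out by downsampling.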
• Jia Xinyuan, Qin Jiwei, Ma Jie
    Accepted: 2026-03-04
The dynamic graph anomaly detection method based on graph convolution utilizes graph modeling strategies to capture information about anomalous nodes or edges, and has wide applications in fields such as network security, social networks, and recommendation systems. However, these methods face two main challenges: first, it is difficult to fully learn discriminative knowledge from dynamic graphs where the graph structure and temporal information are coupled, and second, they are ineffective in detecting anomalies in nodes with no attributes. To address these challenges, a novel dynamic graph anomaly detection framework is proposed: the Bidirectional Encoder Representations from Transformers for Graph & Temporal Anomaly Detection (GTBAD). This method first designs a subgraph sampling module based on edges, which centers on target edges and constructs local substructures across multiple time slices, thereby enhancing the contextual awareness of anomaly detection. It then designs an encoding module that comprehensively considers both the graph structure and temporal aspects, aiming to better extract the structural and temporal features of each node in dynamic graphs. Additionally, BERT is employed in the downstream encoder to further extract information from dynamic graphs, enabling the model to effectively handle dynamic graphs whose nodes lack attributes. Finally, a discriminative anomaly detector is introduced to compute the anomaly scores of edges. Extensive experiments were conducted on four real-world datasets, with the area under the receiver operating characteristic curve (AUC) as the evaluation metric. The experimental results demonstrate that the proposed GTBAD framework outperforms existing frameworks in dynamic graph anomaly detection tasks, achieving higher AUC values, thereby providing a novel solution and approach for dynamic graph anomaly detection.
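Edge-centred subgraph sampling across time slices can be sketched as follows; the snapshot representation (a set of edge tuples per time slice) and the hop count are illustrative assumptions, and GTBAD's sampler is richer than this.

```python
def edge_subgraph(snapshots, edge, hops=1):
    """Collect the subgraph around a target edge across time slices.

    For each snapshot (a set of undirected edges), gather the k-hop
    neighbourhood of the target edge's endpoints and keep the edges
    that fall entirely inside that neighbourhood.
    """
    u, v = edge
    subgraphs = []
    for edges in snapshots:
        frontier = {u, v}
        nodes = {u, v}
        for _ in range(hops):
            nxt = set()
            for a, b in edges:
                if a in frontier:
                    nxt.add(b)
                if b in frontier:
                    nxt.add(a)
            frontier = nxt - nodes  # expand outward next round
            nodes |= nxt
        subgraphs.append({(a, b) for a, b in edges if a in nodes and b in nodes})
    return subgraphs
```

The per-slice subgraphs give an anomaly detector the local structural context of the target edge at each point in time, rather than a single static view.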
• Ding Li, Yang Jun
    Accepted: 2026-03-04
To address the core challenges of task offloading decisions in UAV-assisted mobile edge computing systems, such as multi-dimensional temporal coupling, dynamic environment adaptation, and insufficient policy robustness, this paper proposes HTAN-TD3, a twin delayed deep deterministic policy gradient algorithm that integrates a hierarchical temporal attention mechanism. The contributions of this study are threefold. First, a composite optimization objective integrating total system latency, worst-case user experience, and multi-user fairness is constructed, overcoming the limitations of traditional single-objective modeling. Second, a hierarchical temporal attention network (HTAN) with macro-micro dual-stream temporal analysis capability is designed; through heterogeneous collaboration of LSTM and GRU units and attention-weighted fusion, it achieves accurate perception and deep mining of dynamic features at multiple time scales in the system state. Third, temporally correlated Ornstein-Uhlenbeck exploration noise and a dynamically adaptive Huber loss function are introduced, systematically strengthening the algorithm along two dimensions: smoothness of policy exploration and robustness of the training process. In complex edge scenarios simulating high load, strong occlusion, and multi-user competition, HTAN-TD3 significantly outperforms mainstream baseline algorithms such as DDPG, TD3, and MATOPO in key indicators including total system delay and user fairness, demonstrating excellent environmental adaptability and decision-making intelligence. This study provides a useful reference for improving the autonomous decision-making ability of intelligent edge computing systems in dynamic and complex environments.
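The two generic robustness ingredients named in the abstract, temporally correlated Ornstein-Uhlenbeck exploration noise and the Huber loss, can be sketched in a few lines; `theta`, `sigma`, `dt`, and `delta` are common textbook defaults, not the paper's values.

```python
import numpy as np

class OUNoise:
    """Temporally correlated Ornstein-Uhlenbeck exploration noise.

    Successive samples are correlated, which makes exploration in
    continuous action spaces smoother than i.i.d. Gaussian noise.
    """
    def __init__(self, dim, theta=0.15, sigma=0.2, dt=1e-2, seed=0):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.state = np.zeros(dim)
        self.rng = np.random.default_rng(seed)

    def sample(self):
        dx = (-self.theta * self.state * self.dt
              + self.sigma * np.sqrt(self.dt)
              * self.rng.standard_normal(self.state.shape))
        self.state = self.state + dx
        return self.state

def huber(err, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails,
    so large TD errors do not dominate the critic update."""
    a = np.abs(err)
    return np.where(a <= delta, 0.5 * err ** 2, delta * (a - 0.5 * delta))
```

In a TD3-style agent, the noise is added to the actor's action during training, and `huber` replaces the squared error in the critic loss.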
  • Jiang Xiao, Qin Tuanfa, Sun Hongmin, Zhou Huayang, Gu Weiyu, Wang Suhong
    Accepted: 2026-03-04
In remote and disaster-stricken areas, ground Internet of Things (IoT) devices are constrained by limited computing capabilities and insufficient communication infrastructure, making it difficult to support a large number of emergency tasks with stringent latency requirements within a short time. Existing studies mainly adopt single unmanned aerial vehicle (UAV) or low Earth orbit (LEO) satellite architectures, or treat UAVs merely as communication relay nodes, with optimization objectives that primarily focus on minimizing system latency or a weighted sum of latency and energy consumption. Such designs fail to fully exploit the cooperative computing potential of multiple UAVs and multiple LEO satellites, and cannot satisfy the heterogeneous quality-of-service (QoS) requirements arising from different task priorities and latency constraints. Therefore, this paper proposes a multi-agent deep reinforcement learning-based task offloading and adaptive resource allocation strategy, termed TOARA. First, a space-air-ground integrated network (SAGIN) architecture with cooperative multiple UAVs and multiple LEO satellites is constructed and integrated with edge computing technologies to effectively alleviate ground resource limitations. In this architecture, UAVs collect ground tasks and make intelligent offloading decisions, dynamically assigning tasks to local edge nodes or LEO satellite nodes for execution. Then, the joint task offloading and resource allocation problem is formulated as a decentralized partially observable Markov decision process and solved using a multi-agent deep deterministic policy gradient (MADDPG) algorithm under a centralized training and decentralized execution framework, enabling agents to autonomously learn efficient offloading decisions and adaptive resource allocation strategies that jointly optimize task processing latency, system energy consumption, and the completion rates of tasks with different priority levels.
Finally, simulation results demonstrate that, compared with several baseline strategies, the proposed algorithm reduces the average task processing latency and system energy consumption by at least 26.09% and 27.53%, respectively, while improving the completion rate of high-priority tasks by at least 22.24%, validating its effectiveness in learning efficient task offloading and resource allocation decisions in dynamic and complex environments.
  • Wang Hongyu, Cui Mingzhu, Cheng Li, Luo Weili, Dang Zheng, Shi Hanqi, Ye Hongyuan, Zhao Jintao
    Accepted: 2026-03-03
    Existing methods for detecting small targets in UAV applications suffer from limitations in feature representation and fusion capabilities, struggling to effectively handle complex backgrounds and small-scale objects due to challenges such as low pixel density, significant size variations, and susceptibility to background interference. To address these issues, VD-YOLOv11, an improved algorithm tailored for drone-captured scenes, is proposed. First, a Multi-Scale Feature Enhancement (MSFE) module augments the model’s perception of tiny objects by incorporating multi-scale contextual information and an edge detail reinforcement mechanism. Second, a Multi-Scale Feature Fusion (MSFF) module enhances small object representation through hierarchical integration of semantic and spatial features, improving detection accuracy in complex backgrounds and multi-scale scenarios. Additionally, a Receptive-Field Attention Head (RFAHead) enables dynamic interaction across multi-level features and adaptive allocation of receptive field weights, employing an attention-guided mechanism to refine focus on fine-grained small object regions. Finally, a dedicated small object detection layer is integrated with an optimized neck network, supplemented by an additional detection head to mitigate feature loss and strengthen recognition capability. Experimental results demonstrate that VD-YOLOv11 achieves 42.1% mAP50 on the VisDrone2019 dataset, surpassing the baseline YOLOv11n by 7.4%. On the PDT dataset, it achieves a mAP50 of 94.8% with a computational cost of 19.1 GFLOPs and 3.3M parameters. VD-YOLOv11 achieves an effective balance in detection accuracy, computational complexity, and model size, validating its effectiveness and practicality for UAV-based small object detection.
  • HOU Linchao, XU Yanyan, PAN Shaoming
    Accepted: 2026-03-03
As a core component of intelligent transportation systems, the efficiency and reliability of routing algorithms in Vehicular Ad Hoc Networks (VANETs) directly impact critical applications such as traffic safety warning, autonomous driving, and intelligent traffic management. In complex traffic, interactions among vehicle nodes make topology changes intricate and link stability fragile, posing serious challenges for routing algorithms; constructing adaptable routing algorithms is therefore crucial for VANET communication. To this end, we propose a novel Neighborhood Potential-based Link-aware Routing (NPLAR) algorithm. NPLAR innovatively constructs a neighborhood potential energy model that comprehensively quantifies the impact of static and dynamic features of the neighborhood environment on link stability, offering a more accurate basis for routing decisions. By integrating complex network theory and graph neural networks, it effectively captures the multi-hop propagation mechanism of neighborhood potential energy, better predicting network changes and enabling efficient routing path selection. Moreover, NPLAR integrates link stability indices with network link QoS metrics to build a multi-dimensional routing decision framework that achieves adaptive decision optimization in highly dynamic environments, significantly enhancing overall routing performance. Experimental results show that, compared with existing VANET routing algorithms, NPLAR increases average throughput by 8.3%-35.7%, reduces the packet loss rate by 6%-50.4%, and reduces communication delay by 11.3%-39%, clearly demonstrating its superiority in enhancing network performance.
  • Gu Yudi, Di Yicheng, Di Lan
    Accepted: 2026-03-03
Existing click-through rate (CTR) prediction methods typically rely on centralized data storage and modeling. However, due to high user privacy sensitivity and data protection regulations, user behavior data across different platforms cannot be directly shared or aggregated. At the same time, most CTR prediction models adopt deep architectures with large parameter scales, leading to high communication and computation costs that limit their practical application. To address these problems, this paper proposes an efficient Federated Recommendation System (FedRSS) based on a Slim Module and a Saliency-Aware Module. FedRSS aggregates cross-platform feature representations within a federated learning framework while preserving privacy. The Slim Module replaces the traditional Hadamard product with an inner product to reduce model complexity and stacks compression layers to decrease parameters, while the Saliency-Aware Module employs a bit-level attention mechanism to dynamically assign feature weights and enhance the modeling of important features. In addition, FedRSS introduces a local differential privacy mechanism to further protect user information. Experiments on three public datasets, Criteo, Avazu, and MovieLens, show that FedRSS achieves notable improvements in both performance and efficiency, with RelaImpr increases of 11.04%, 3.38%, and 4.82%, respectively, and significantly reduced training time. The results demonstrate that FedRSS achieves efficient CTR prediction under privacy constraints and provides a promising direction for developing low-overhead federated recommendation systems.
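The parameter saving behind an inner-product interaction, and a bit-level (per-dimension) attention gate of the kind the abstract describes, can be sketched as follows; the shapes and the sigmoid gating form are illustrative assumptions, not FedRSS's exact design.

```python
import numpy as np

def slim_interaction(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Inner-product feature interaction (one scalar per pair).

    A Hadamard product keeps a d-dimensional vector per field pair;
    an inner product compresses it to a single scalar, which is the
    complexity reduction a slim interaction module exploits.
    """
    return np.sum(a * b, axis=-1, keepdims=True)

def bit_level_attention(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Sigmoid gate that reweights each bit (dimension) of a feature,
    letting the model emphasise informative dimensions."""
    gate = 1.0 / (1.0 + np.exp(-(x @ w)))
    return x * gate
```

With embedding dimension d and m field pairs, Hadamard interactions emit m*d values while inner products emit m, which directly shrinks the downstream layers a federated client must train and transmit.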
  • LI Bin, FAN Jiawei
    Accepted: 2026-03-03
To address the insufficient cross-domain generalization ability of existing ship target detection models and their poor detection stability under extreme noise and complex sea surface conditions in Synthetic Aperture Radar (SAR) imagery, this paper proposes an improved ship target detection algorithm, CK-YOLO, based on YOLOv12. The proposed method aims to enhance the model’s robustness and adaptability in SAR data. First, to improve the extraction of ship boundary features and strengthen contextual modeling capability, an SKC3k2 module is designed. This module enhances boundary feature representation by incorporating a Kolmogorov–Arnold Network (KAN) layer with residual connections into the original C3k2 structure, and introduces a switchable atrous convolution (SAConv) mechanism to adaptively adjust the receptive field for better multi-scale feature extraction. Furthermore, to improve the model’s dynamic modeling capacity and its ability to extract high-level semantic information, a CST module is developed. The CST module consists of a local convolution branch for spatial modeling and a sparse dynamic branch based on a Liquid Neural Network (LNN), which leverages temporal modeling advantages to enhance high-order semantic feature extraction. To validate the effectiveness of the proposed method, experiments were conducted on SAR datasets provided by the China Centre for Resource Satellite Data and Application and the LS-SSDD dataset. The results demonstrate that CK-YOLO achieves improvements of 0.8% and 1.3% in mAP@50 over YOLOv12n on the two datasets, respectively, thereby exhibiting the best performance among all compared models. In addition, cross-domain generalization experiments using the LS-SSDD and MMShip datasets demonstrate that CK-YOLO achieves the best overall performance among the YOLO series models, showing superior robustness and generalization ability in both intra-domain SAR detection and cross-modal detection tasks. 
Finally, ablation studies further confirm the effectiveness and contribution of the proposed modules. The CK-YOLO model maintains a lightweight architecture while effectively reducing missed detections and false alarms in SAR images with strong noise and complex sea surface conditions.
  • Zeng Wenyan, Zhang Lei, Liu Bailong, Meng Xiang, Zhang Xuefei
    Accepted: 2026-02-12
    Accurate traffic speed prediction is critical for enhancing the efficiency of Intelligent Transportation Systems (ITS). However, contemporary end-to-end prediction models are often constrained by training data from specific regions or time periods, leading to limited generalization capabilities. Furthermore, most existing methods employ static network structures and parameter-sharing mechanisms, which struggle to capture dynamic traffic characteristics and the inherent diversity across different nodes. To address these two challenges, this paper proposes Adaptive Spatial-Temporal Masking Pre-training for Traffic Speed Prediction (ASTMP), which is divided into a pre-training stage and a prediction stage. In the pre-training stage, a dynamic adaptive graph convolutional layer is designed to provide unique weight and bias parameters for each node. By constructing an adaptive graph based on a node embedding matrix containing individual node attributes, the unique properties of nodes and the dynamic patterns governing their inter-relationships can be deeply explored. Subsequently, a spatial-temporal masking encoding layer is developed to perform random masking on long-term traffic speed sequences. A corresponding decoding layer then utilizes mask tokens to replace data at masked positions, reconstructing the original information based on contextual cues to enhance the model's adaptability and generalization performance. In the prediction stage, the dynamic spatial-temporal representations learned from long-term sequences are integrated with a short-term traffic speed predictor to achieve more precise and efficient forecasting. Experimental results on the METR-LA and PEMS-BAY datasets demonstrate that ASTMP outperforms state-of-the-art baseline methods, validating the feasibility and effectiveness of the proposed approach.
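Random masking of a traffic-speed sequence for masked pre-training can be sketched as below; zeroing the masked positions is a stand-in for ASTMP's learned mask token, and the mask ratio is an illustrative choice.

```python
import numpy as np

def random_mask(seq: np.ndarray, ratio: float = 0.5, rng=None):
    """Randomly mask time steps and return (masked_seq, mask).

    Masked positions are zeroed (a stand-in for a learned mask
    token); a pre-training objective would then reconstruct the
    original values at masked positions from the surviving context.
    """
    rng = np.random.default_rng(rng)
    t = seq.shape[0]
    n_mask = int(round(t * ratio))
    idx = rng.choice(t, size=n_mask, replace=False)
    mask = np.zeros(t, dtype=bool)
    mask[idx] = True
    masked = seq.copy()
    masked[mask] = 0.0
    return masked, mask
```

The reconstruction loss is computed only at positions where `mask` is true, which forces the encoder to infer missing speeds from temporal context rather than copy its input.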
  • WANG Shixin, LI Jun, ZHAO Ning, NIE Jun, LIU Shengqiang
    Accepted: 2026-02-12
In order to meet the needs of efficient and accurate multi-object tracking (MOT) in campus scenarios, a solution based on the improved YOLOv8 object detection algorithm and the OC-SORT multi-object tracking algorithm is proposed. In view of the complex background and crowd distribution of the campus scenarios, a dataset with specific scene features is constructed to optimize the performance of the algorithm. In order to improve the accuracy of pedestrian small object detection, an efficient multi-scale attention (EMA) module is introduced, and the self-calibrated convolutions (SCConv) module is used to replace the cross-stage partial fusion (C2f) module in YOLOv8, which effectively improves the detection effect. In multi-object tracking, an innovative solution is proposed to address the problems of low association accuracy and high computational overhead. Firstly, an ID initialization (IIR) strategy based on person re-identification (ReID) is proposed, which effectively solves the problem of ID inconsistency when pedestrians reappear after leaving for a short time. Secondly, a data association strategy combining shape similarity between frames (SSF) and object box intersection over union (IoU) is designed to further improve the accuracy of object matching between consecutive frames. Finally, in order to improve the efficiency of appearance similarity calculation, a stage-wise data association (SDA) strategy is proposed, which reduces the computational overhead while ensuring high accuracy. Experimental results show that the proposed method effectively improves the accuracy of pedestrian detection and tracking in campus scenarios and exhibits good robustness and a high frame rate in complex backgrounds, providing efficient and reliable technical support for smart campus security and crowd behavior analysis.
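A matching cost that blends IoU with a between-frame shape-similarity term, in the spirit of the SSF+IoU association described above, can be sketched as follows; the blending weight `alpha` and the exponential similarity form are illustrative, not taken from the paper.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def shape_similarity(box_a, box_b):
    """Similarity of box width/height across frames (1 = identical)."""
    wa, ha = box_a[2] - box_a[0], box_a[3] - box_a[1]
    wb, hb = box_b[2] - box_b[0], box_b[3] - box_b[1]
    return np.exp(-(abs(wa - wb) / max(wa, wb) + abs(ha - hb) / max(ha, hb)))

def association_cost(box_a, box_b, alpha=0.5):
    """Blend IoU and shape similarity into one matching cost
    (lower is better) for frame-to-frame assignment."""
    return 1.0 - (alpha * iou(box_a, box_b)
                  + (1 - alpha) * shape_similarity(box_a, box_b))
```

Feeding this pairwise cost into a Hungarian assignment over detections and tracks keeps matches that overlap well *and* keep a consistent shape, which is the rationale for adding a shape term to plain IoU.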
  • WANG Kai, YUAN Shaojiang, CHEN Chenglizhao, WANG Shuo, ZHANG Yingchao, ZHANG Huaye, SUI Ruoyu, Wang Xi
    Accepted: 2026-02-12
Water sample bag impurities refer to tiny foreign objects accidentally entering the bags during industrial production, such as iron filings, hair, and soil particles. Due to their small size, complex background, and significant interference from text labels, traditional detection methods struggle to meet the strict quality control requirements of industrial production. To address this issue, a water sample bag impurity detection method for complex industrial scenarios is proposed, which innovates at both the data and model levels. At the data level, an automated collection and detection device based on dual-view cross-validation is designed; this device uses dual industrial cameras and an electromagnetic control system to realize automatic double-sided detection and intelligent sorting of water sample bags, and based on this device, a dedicated dataset of 3,000 images, WBID-3K, is constructed, covering all types of impurities that may appear in real industrial scenarios. At the model level, based on this dataset, a model named WBID-DETR for cross-domain feature enhancement and hierarchical information fusion is proposed. This model strengthens the high-frequency feature expression of tiny targets through a fine-grained frequency-domain feature optimizer, suppresses text label interference via a multi-scale global feature fusion module, and complements missing information using a complementary feature fusion module, thereby achieving accurate localization and identification of various tiny impurities. Experimental results show that on the self-built WBID-3K dataset, WBID-DETR achieves 4.2% and 3.5% improvements in accuracy and mAP50 respectively compared to the baseline model; on the public VisDrone2019 dataset containing complex backgrounds and dense small targets, WBID-DETR achieves 2.5% and 3.4% improvements in accuracy and mAP50 respectively compared to the baseline model. 
This fully demonstrates the generalization and robustness of the proposed method for small target detection tasks, providing an effective solution for automated industrial quality inspection.
  • WANG Jun, ZHANG Shengjun, ZUO Zengqiang
    Accepted: 2026-02-12
Millimeter-wave radar offers distinct advantages for human activity recognition, including robustness in complex environments and inherent privacy preservation. However, existing recognition methods confront several challenges: low accuracy, insufficient data representation, difficulty in modeling temporal dependencies, and high computational costs. To address these issues, this paper proposes a lightweight human activity recognition method based on a novel Time-series Capture and Enhancement Module (TCM-TMEM). The proposed architecture comprises two primary components. First, the Temporal Capture Module (TCM) employs causal convolution to enhance sensitivity to local temporal patterns, while its simplified network design minimizes computational overhead. Second, the Temporal Enhancement Module (TMEM) is constructed using a parameter-efficient Transformer encoder. This module strengthens the network's ability to model long-range, global temporal correlations while preserving the model's lightweight characteristics. Furthermore, to mitigate the representational limitations of traditional range-Doppler maps, an enhanced 11-dimensional feature set is introduced. This set incorporates critical dimensions such as range, Doppler shift, and signal energy, thereby significantly improving the completeness of data representation. Experimental evaluations were conducted on the self-collected PACT dataset and the public R-IHB dataset. The proposed method achieved recognition accuracies of 89.86% and 86.63%, respectively. Importantly, the entire TCM-TMEM model contains only 0.12 million parameters. These results substantiate the effectiveness of the proposed feature construction scheme and model architecture in improving recognition accuracy, effectively capturing temporal dependencies, and substantially reducing computational resource consumption.
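Causal convolution, the building block the TCM relies on, can be sketched in a few lines: the output at step t depends only on the current and past inputs, never on future samples.

```python
import numpy as np

def causal_conv1d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """1-D causal convolution: y[t] = sum_i kernel[i] * x[t - i].

    Left-padding with zeros ensures each output depends solely on
    current and past samples, preserving temporal order.
    """
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    # kernel is reversed so kernel[0] weights the current sample.
    return np.array([padded[t:t + k] @ kernel[::-1] for t in range(len(x))])
```

Stacking such layers (with dilation) grows the temporal receptive field cheaply, which is why causal convolutions are a common lightweight alternative to recurrent layers for streaming sensor data.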
  • WANG Hebin, YANG Wenjun, MO Xiuliang
    Accepted: 2026-02-12
With the wide application of the Internet of Things (IoT), a large number of devices access the network, and their security vulnerabilities are easily exploited by attackers, seriously threatening network and data security. It is therefore particularly important to deploy intrusion detection systems in the IoT environment to detect and defend against abnormal traffic and intrusion behavior. However, IoT devices usually have limited computing power and insufficient storage resources, making existing deep learning-based intrusion detection models difficult to deploy directly. To solve these problems, this paper proposes a customized lightweight intrusion detection model named FDRBT, which aims to achieve accurate detection of IoT attacks under resource-constrained conditions. The Pearson Correlation Coefficient (PCC) and Principal Component Analysis (PCA) are combined for feature dimensionality reduction, and a progressive module replacement method gradually replaces the Transformer-based teacher model with a more concise Poolformer structure. To compensate for the loss of representation ability during knowledge distillation, the Dynamic tanh (DyT) activation function is introduced, and the traditional normalization layer in Poolformer is replaced by a DyT layer. This design enables the model to automatically adjust its activation properties according to the input feature distribution, achieving a normalization-like function without computing activation statistics. Experimental results on the TON-IoT and CIC-BCCC-NRC-2024 datasets show that the FDRBT model achieves 99.91% and 99.96% accuracy, respectively. The model also maintains a small size and low computational overhead, making it suitable for resource-constrained IoT intrusion detection scenarios.
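The PCC-filter-then-PCA reduction step can be sketched as follows; the correlation threshold and component count are illustrative defaults, not FDRBT's settings, and the PCA is computed directly via SVD for self-containment.

```python
import numpy as np

def pcc_pca_reduce(X: np.ndarray, y: np.ndarray,
                   corr_thresh: float = 0.1, n_components: int = 2):
    """Filter features by |Pearson r| with the label, then apply PCA.

    Returns the projected data and the boolean keep-mask.  Constant
    (zero-variance) features get r = 0 and are dropped.
    """
    # Pearson correlation of each feature column with the label.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    r = (Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    keep = np.abs(r) >= corr_thresh
    Xk = Xc[:, keep]
    # PCA via SVD of the centred, filtered matrix.
    _, _, vt = np.linalg.svd(Xk, full_matrices=False)
    return Xk @ vt[:n_components].T, keep
```

Filtering first with a cheap univariate criterion and only then rotating with PCA keeps the preprocessing light enough for constrained IoT hardware, which is the motivation for fusing the two steps.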
  • CHEN Yuang, SHI Lei, TANG Zhiqing
    Accepted: 2026-02-12
As 6G evolves toward extremely large antenna arrays (ELAA) and high-frequency bands, the near-field region in communication scenarios expands significantly. However, existing research on physical layer security for reconfigurable intelligent surface (RIS)-assisted non-orthogonal multiple access (NOMA) systems is largely confined to far-field communication scenarios. Moreover, the computational complexity is often high, which limits practical application in near-field large-scale systems. For RIS-assisted near-field uplink NOMA systems, this paper considers an uplink system comprising an access point (AP), RIS, a far user, a near user, and an eavesdropper (Eve). By jointly optimizing AP beamforming and RIS phase shifts, the system achieves the maximum secrecy sum-rate. This problem is non-convex and challenging due to the Euclidean-norm and unit-modulus constraints, necessitating an efficient resource allocation strategy. To address this, this paper proposes a low-complexity block coordinate descent (BCD) algorithm that decomposes the original problem into two subproblems. First, a closed-form solution for AP beamforming is derived, and manifold optimization is then applied to obtain the RIS phase shifts. MATLAB simulation results demonstrate that, under the default parameter settings, compared to random phase shift, maximum ratio transmission (MRT), and orthogonal multiple access (OMA) schemes, the proposed scheme enhances the secrecy sum-rate of the system by approximately 4.4 bps/Hz, 10%, and 15%, respectively. Furthermore, the proposed scheme achieves comparable performance to semi-definite relaxation (SDR) schemes while exhibiting lower computational complexity.
  • BAI Yang, PEI Mengxuan, SHI Fangyuan
    Accepted: 2026-02-12
    Genomic Structural Variations (SVs), which alter the three-dimensional conformation and regulatory networks of the genome through insertions, deletions, inversions, or translocations of large DNA fragments, are key pathogenic variants in various complex diseases. In recent years, breakthroughs in long-read sequencing and 3D genomics have significantly improved the detection capability of SVs. However, due to the complexity of SVs and the scarcity of functional annotations, predicting their pathogenicity remains a major challenge. Several methods have been developed to decipher the pathogenic mechanisms of SVs and reveal their impact on gene expression and phenotypes by integrating multi-modal data such as chromatin interactions, epigenetic modifications, and single-cell transcriptomics. However, there is still a lack of systematic summary of such methods. Therefore, this article systematically reviews methods for predicting the pathogenicity of SVs based on high-throughput sequencing data, including knowledge-driven methods, traditional machine learning methods, deep learning methods, and large model methods. By summarizing the limitations of existing methods, including low sensitivity in predicting rare variants, insufficient functional annotation databases, and limited generalizability of 3D models, this article proposes potential future directions to advance the field through multimodal data fusion, causal inference models, and spatial omics technologies. It aims to provide a theoretical reference for the functional interpretation of genomic structural variations.
  • Yang Xingyu, LIU Yi, HUANG Xumin, KANG Jiawen
    Accepted: 2026-02-11
Envisioning the sixth-generation (6G) satellite-terrestrial integrated network, Low Earth Orbit (LEO) satellite Mobile Edge Computing (MEC) is key to achieving seamless global coverage. However, existing studies struggle to effectively address the strong coupling among computation offloading, multi-hop routing, and resource allocation variables, as well as the high-dimensional non-convex optimization challenges caused by highly dynamic topologies and limited onboard resources. To address this, we establish a three-layer collaboration architecture comprising MEO satellites, LEO satellites, and ground users, and propose a Hierarchical Soft Actor-Critic (H-SAC) hybrid optimization framework to minimize the weighted sum of system latency and energy consumption. To reduce the complexity of solving the hybrid non-convex problem, H-SAC adopts a hierarchical decoupling strategy: the upper layer utilizes the maximum entropy mechanism of the SAC agent to fully explore the discrete offloading space, effectively avoiding local optima; the lower layer embeds efficient traditional algorithms to solve the sub-problems of continuous resource allocation and routing planning under the given offloading policy. Additionally, a dynamic weight adjustment mechanism is introduced to adaptively balance latency and energy objectives based on real-time service states. Simulation experiments demonstrate that H-SAC significantly outperforms H-TD3 and H-DDPG in key metrics, with final rewards improving by approximately 7.2% and 10%, respectively. Ablation studies verify the necessity of ISL support and flexible offloading, which contribute approximately 18% and 15% performance gains, respectively. Furthermore, H-SAC reduces inference latency by approximately 73% compared to T-DRL. Overall, the framework achieves efficient and robust resource scheduling in dynamic satellite edge computing scenarios.
  • YU Chuangyu, HUANG Zhiqiang, XUN Chao, SHEN Yu, LIU Lin, CHEN Yantao, XU Yanyan, PAN Shaoming
    Accepted: 2026-02-11
    Power data prediction is the foundation for situational awareness and dispatch decision-making in power systems. However, existing power prediction methods still face significant challenges in multi-scale temporal feature modeling and effective integration of unstructured domain knowledge, which limit the prediction accuracy and generalization capability of models in complex power system scenarios. To address these issues, this paper proposes an intelligent power data prediction method named LLM-KGAP (LLM enhanced Knowledge Graph Augmented Power prediction), which integrates large language models with knowledge graphs to construct a data-knowledge dual-driven collaborative prediction framework. First, a large language model is employed to automatically extract key entities and causal relationships from power-related documents to construct a heterogeneous knowledge graph. Second, a knowledge mapping mechanism based on semantic confidence is designed to transform multi-path semantic relationships in the knowledge graph into a weighted prior adjacency matrix, providing knowledge-guided structural prior information for the prediction model. Finally, an Adaptive Spatio-Temporal Information Extraction Network with Mixed Adjacency Matrix (ASIEN-MAM) is proposed. This network employs a progressive segmentation strategy to achieve multi-scale temporal window partitioning and designs a Sparse Attention-xLSTM (SA-xLSTM) module to filter key temporal segments and extract multi-scale features in the temporal dimension, while integrating prior knowledge with data-driven mixed adjacency matrices to accurately characterize complex spatio-temporal dependencies in power systems. Experimental results demonstrate that the proposed method significantly outperforms comparative methods on both public photovoltaic datasets and regional load datasets, reducing the mean absolute error by 11.9%–44.3% and the mean absolute percentage error by 7.0%–27.3%.
  • HE Yuhong, WANG Wei, ZHAI Pengling, HU Jiayi, LI Yueqi
    Accepted: 2026-02-11
    In single image super-resolution (SISR) tasks, although the Transformer can effectively capture global dependencies through the self-attention (SA) mechanism, its high complexity, information redundancy, and large parameter count limit its applicability on low-power devices. To address these issues, a lightweight feature aggregation Transformer (FATNet) model is proposed. The model synergistically aggregates dual-dimensional features by applying spatial and channel self-attention in succession. It also introduces sparsification strategies to adaptively screen critical information across the spatial and channel dimensions, thereby optimizing SA computational efficiency. Before the attention matrix is computed, depthwise convolution is used to enhance local context modeling, and channel splitting together with depthwise separable convolution is used to design a lightweight feedforward network (SFFN) that reduces the parameter count while preserving nonlinear expressive capacity. Experimental results on five commonly used datasets show that, compared to representative lightweight SISR models such as SMFANet and CATANet, FATNet better balances model parameters and reconstruction performance. Compared to the MAN-light model, FATNet reduces the number of parameters by 48% and 47% at magnification factors of ×2 and ×3, respectively, while achieving better reconstruction results. Compared with the latest lightweight super-resolution model (CATANet), FATNet achieves a maximum improvement of 0.15 dB in peak signal-to-noise ratio (PSNR) and 0.0029 in structural similarity (SSIM) with a reduced parameter count, demonstrating better reconstruction performance.
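The parameter savings from the depthwise separable design mentioned above can be checked by simple counting; the sketch below is illustrative arithmetic, not FATNet's actual SFFN (the 64-channel width is an assumed example):

```python
def conv_params(cin, cout, k):
    """Parameters in a dense k x k convolution (weights + biases)."""
    return cin * cout * k * k + cout

def dw_separable_params(cin, cout, k):
    """A depthwise k x k conv (one filter per input channel) followed by
    a 1 x 1 pointwise conv, i.e. a depthwise separable convolution."""
    depthwise = cin * k * k + cin   # one k x k filter per channel
    pointwise = cin * cout + cout   # 1 x 1 conv mixing channels
    return depthwise + pointwise

print(conv_params(64, 64, 3))          # 36928
print(dw_separable_params(64, 64, 3))  # 4800
```

For a 64-to-64-channel 3×3 layer, the separable form needs roughly 7.7× fewer parameters, which is the mechanism behind SFFN-style parameter reduction.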
  • Anjie Luo, Xiaoning Wu, Rui Han, Haiting Hou, Ke Qiu, Chi Harold Liu, Jing Chen
    Accepted: 2026-02-11
    With the widespread adoption of edge intelligence in industrial inspection, mobile devices, and smart security applications, deep learning models are increasingly shifting from cloud-centric to edge-side deployment. However, when cloud-pretrained models are fine-tuned on edge devices, limited computational and memory resources often lead to degraded training efficiency. To maintain high model accuracy and training performance, it becomes essential to jointly optimize both model hyperparameters and resource-related parameters. Existing methods predominantly focus on hyperparameter optimization in server-side environments and insufficiently account for the heterogeneity of edge devices and the dynamic variability of edge resources, which may result in substantial drops in accuracy and training efficiency during edge-side fine-tuning. To address these challenges, this paper proposes a Heterogeneity-Aware Hyperparameter Optimization method (H2PO). Leveraging Infrastructure as Code (IaC), H2PO provides a unified interface for managing heterogeneous devices and dynamically changing resources. Additionally, a lightweight prediction model is integrated to guide online hyperparameter adjustment during training, enabling hyperparameters to adapt in real time to fluctuations in resource availability. Experiments on heterogeneous devices show that H2PO can effectively improve model accuracy by 2.5% and resource utilization by 4.2% under resource-constrained conditions. Compared with existing methods, it reduces training time overhead by up to 71.3% and is applicable to different deep learning models.
  • Jiao Mengru, Liu Yaoyang, Liu Bosheng, Wu Jigang
    Accepted: 2026-02-11
    Block floating point (BFP), with its distinctive data representation, has been extensively applied in convolution calculations for convolutional neural networks. In particular, frequency-domain convolution transforms spatial-domain convolution into complex multiplications in the frequency domain, significantly reducing computational complexity and enabling efficient neural network deployment. However, existing studies mainly focus on BFP-based convolution acceleration in the spatial domain or fixed-point acceleration in the frequency domain, leaving the potential of combining the BFP numerical format with frequency-domain convolution underexplored in terms of inference latency reduction and resource efficiency optimization. In this work, we propose a BFP-based frequency-domain processing unit that exploits the structural characteristics of digital signal processing blocks in field-programmable gate arrays. By leveraging the exponent-sharing mechanism of the BFP format, the proposed design enables packed execution of multiple complex multiplications, thereby improving overall computational performance. Furthermore, we introduce a dataflow mapping method tailored for BFP-based frequency-domain convolution, which maximizes the reuse of both exponent and mantissa components of BFP data during frequency-domain processing. We conduct a systematic evaluation of the proposed frequency-domain BFP acceleration design on representative convolutional neural network benchmarks. Experimental results demonstrate that the proposed approach achieves up to 5.4× speedup in inference latency and 8.5× gain in resource efficiency, compared with state-of-the-art BFP-based spatial-domain convolution acceleration baselines.
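The exponent-sharing idea at the core of BFP can be illustrated with a minimal quantizer; this is a generic sketch of the format only (the bit width and rounding choices here are assumptions, not the paper's FPGA design):

```python
import numpy as np

def to_bfp(block, mantissa_bits=8):
    """Quantize a block of floats to block floating point: every element
    shares one exponent, derived from the largest magnitude, and each
    mantissa is rounded to a signed `mantissa_bits`-bit integer."""
    max_abs = float(np.max(np.abs(block)))
    if max_abs == 0.0:
        return np.zeros(len(block), dtype=np.int32), 0
    # shared exponent so max_abs maps near the top of the mantissa range
    shared_exp = int(np.floor(np.log2(max_abs))) - (mantissa_bits - 2)
    lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(block / 2.0 ** shared_exp), lo, hi)
    return mantissas.astype(np.int32), shared_exp

def from_bfp(mantissas, shared_exp):
    """Dequantize: scale integer mantissas by the shared power-of-two exponent."""
    return mantissas.astype(np.float64) * 2.0 ** shared_exp
```

Because all elements of a block carry the same exponent, a dot product of two BFP blocks reduces to integer multiply-accumulates plus a single exponent addition, which is what makes packing several multiplications into one DSP block attractive.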
  • YANG Gang, CUI Yunhe, CHEN Yi, GUO Chun
    Accepted: 2026-02-11
    Probing attacks in Software-Defined Networking (SDN) are stealthy attacks that probe switches for sensitive configurations and states. Their low rate, small volume, and high stealth make detection difficult. Existing detection methods show limited recognition ability. Graph Neural Network (GNN) methods designed for overt attacks such as controller saturation or flow-table overflow often rely on dense topological interactions or strong signal features. These methods fail to capture sparse correlations among probing flows and implicit host-level structures, which restricts their performance. This paper proposes HSENet, a detection method for SDN probing attacks. The method first designs a Heterogeneous Semantic Hypergraph generation algorithm called HSHG. The algorithm encodes micro-level communication semantics and macro-level host behavior semantics of network flows. The method then builds a Heterogeneous-Edge Convolutional GCN called HEC-GCN. The network performs adaptive convolution and fusion over different semantic relations and produces more discriminative node embeddings. Experiments on two network flow datasets show that HSENet significantly outperforms multiple GNN and traditional machine-learning baselines on Accuracy, Weighted-F1, and Macro-F1. Compared with the best baseline, Accuracy increases by 2.65% and 3.34%, Weighted-F1 by 2.64% and 2.48%, and Macro-F1 by 2.91% and 11.37%. These results indicate that the method strengthens the identification of low-rate, small-volume, and highly covert sniffing flows and provides a practical and efficient solution for early threat discovery in SDN.
  • You Lujie, Li Yanghui, Chen Jichang, Xiao Tianhang, Wang Chenyi, Chen Si, Tong Mingbo
    Accepted: 2026-02-11
    In the context of predicting the water-entry trajectory of structures, the challenges of high computational cost and slow response time are prevalent. This paper presents an innovative rapid solution method based on engineering algorithms, designed to improve the trade-off between computational efficiency and accuracy. Unlike traditional Computational Fluid Dynamics (CFD) approaches, the proposed method leverages simplified water-entry theory, empirical formulas, and semi-empirical hydrodynamic models, thereby developing a rapid prediction framework that obviates the need for large-scale fluid equation solvers. By employing a purely elastic contact model and engineering-optimized computational strategies, the method significantly enhances computational efficiency while maintaining a computational error within 25% of experimental data. This results in a substantial reduction in computation time from the hours required by conventional CFD methods to mere seconds. The test results demonstrate that, without relying on extensive databases or complex tabular data, the proposed method exhibits a distinct advantage in error control when compared to machine learning and surrogate model methods, providing an efficient and reliable solution for the rapid prediction of the water-entry trajectory and attitude of deployed objects. Additionally, this paper systematically analyzes the impact of key factors, including initial velocity, seawater flow speed, and initial attitude, on the water-entry trajectory and landing point. It elucidates the influence of initial velocity and seawater flow speed on the landing point position, with particular emphasis on the increased sensitivity of the landing point to seawater flow speed under high flow-speed conditions, thereby highlighting the risks associated with deployment operations under specific flow-speed scenarios.
  • Hao Yaohui, Cai Jintian, Cui Xinyue, Lu Xianling
    Accepted: 2026-02-04
    In the prediction and analysis of public-opinion information spreading based on mean-field epidemic models, parameters are difficult to correct iteratively within the model itself, which leads to prediction bias; in addition, the LSTB model shows poor long-term prediction accuracy for public-opinion information propagation. To address these problems, the SEI³R-BiLSTM model, which integrates communication dynamics and deep learning technology, was proposed. Firstly, the SEIR model was improved by classifying user states during the dissemination of online public-opinion information into six categories: S (Information Unaware), E (Information Hesitant), I₁ (Positive Communicator), I₂ (Negative Communicator), I₃ (Neutral Communicator), and R (Information Immune), with clear definitions of the transition relationships between these states. Secondly, to enhance the model's accuracy, an attention mechanism and residual connections were introduced into the BiLSTM neural network, enabling the prediction of changes in the number of public-opinion information communicators. Finally, 659,000 posts were collected from Sina Weibo across three high-profile public-opinion events, including "the Jiang Ping Mathematics Competition", "Qin Lang Losing Homework", and "Fat Cat Jumping into the River", for experimental validation and analysis. The results showed that the time-series curves of the number of the three types of communicators (I₁, I₂, and I₃) predicted by the SEI³R-BiLSTM model were generally consistent with the actual propagation trends, with high fitting accuracy. Furthermore, the SEI³R-BiLSTM model outperformed four models, including SEI³R-LSTM and SEI³R-ARIMA, on four evaluation metrics: RMSE (0.162), MAPE (16.6%), Jaccard (0.74), and F1 score (0.72). In addition, the ablation experiment further confirmed the model's rationality and effectiveness. These findings provide a model reference for predicting the development of online public opinion.
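A mean-field model over the six user states above evolves by coupled ODEs; the sketch below is a forward-Euler toy with made-up rate constants (the paper's actual transition rates and functional forms are not reproduced here):

```python
# Hypothetical rate constants; the paper's fitted values are not given here.
BETA = 0.6                 # S -> E: contact rate with any communicator
SIGMA = 0.5                # rate at which hesitant users E leave that state
SPLIT = (0.4, 0.35, 0.25)  # shares of E becoming I1 / I2 / I3 (sum to 1)
GAMMA = 0.2                # I_k -> R: rate of becoming information-immune

def sei3r_step(state, dt=0.1):
    """One forward-Euler step of a six-compartment SEI3R mean-field model."""
    S, E, I1, I2, I3, R = state
    infectious = I1 + I2 + I3
    new_exposed = BETA * S * infectious
    dS = -new_exposed
    dE = new_exposed - SIGMA * E
    dI1 = SPLIT[0] * SIGMA * E - GAMMA * I1
    dI2 = SPLIT[1] * SIGMA * E - GAMMA * I2
    dI3 = SPLIT[2] * SIGMA * E - GAMMA * I3
    dR = GAMMA * infectious
    return tuple(x + dt * d for x, d in zip(state, (dS, dE, dI1, dI2, dI3, dR)))
```

Because the three splitting fractions sum to one, the total population share is conserved at every step; the curves of I₁, I₂, and I₃ produced by such a core are what the deep-learning component is fitted against.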
  • LUO Yangxia, YAO Yuanle, LI Xiaoyu, ZHAO Jinlong
    Accepted: 2026-02-04
    To address the problem of exponential growth in malware and variants, and the limited capability of traditional detection methods to identify unknown threats, this paper proposes a MobileNetV2_AD detection method combining "multimodal visualization + lightweight" approaches. The main feature is the fusion of multi-source semantic visual information, representing byte entropy, disassembled instruction streams, and API call sequences as RGB three-channel images to achieve "one image integrating three domains." This reveals the complementary discriminative patterns of different semantic modalities in the image space, offering finer-grained feature extraction compared to grayscale images. Secondly, the lightweight backbone with strong scale perception incorporates Atrous Spatial Pyramid Pooling (ASPP) into MobileNetV2, enhancing the model's receptive field and multi-scale feature extraction capabilities. Additionally, a "category-feature" dual decoupled distillation approach is employed, using ResNeXt50 as the teacher model to simultaneously transfer macro classification logic and micro feature distributions. This resolves the "precision-generalization" trade-off issue in lightweight student models, resulting in an 11.7% increase in F1 score on unknown family samples after distillation. Finally, cross-dataset performance validation is conducted on the Kaggle (400 GB) and DataCon (latest attack-defense competition) public benchmarks, with MobileNetV2_AD achieving accuracy rates of 96.41% and 98.68% respectively, which are 6.31% and 4.21% higher than the original MobileNetV2. The inference speed reaches 280 samples per second, meeting the real-time detection requirements of terminal devices. The experimental results demonstrate that the proposed method significantly improves malware detection effectiveness in resource-constrained scenarios, providing an effective technical solution for cybersecurity defense.
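The "category" half of the dual decoupled distillation is, at its core, logit distillation from teacher to student; a minimal temperature-softened KL term is sketched below (the paper's actual loss also transfers feature distributions, which this sketch omits):

```python
import numpy as np

def softmax(z, t=1.0):
    """Temperature-scaled softmax with the usual max-subtraction for stability."""
    z = np.asarray(z, dtype=float) / t
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

def kd_soft_loss(student_logits, teacher_logits, t=4.0):
    """Classic knowledge-distillation term: KL divergence between the
    temperature-softened teacher and student distributions, scaled by t^2."""
    p = softmax(teacher_logits, t)  # teacher's soft targets
    q = softmax(student_logits, t)
    return float(np.sum(p * (np.log(p) - np.log(q)))) * t * t
```

The loss is zero when the student reproduces the teacher's logits and grows as the class rankings diverge, which is the sense in which a ResNeXt50 teacher's "macro classification logic" can be transferred to a lightweight student.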
  • Huang Jianwen, Chen Xuhang, Cheng Lianglun, Huang Jiajie, Huo Yejing
    Accepted: 2026-02-03
    To address the insufficient perception of geometric pose relationships and the difficulty of unified modeling for multiple tooth types in existing three-dimensional dental keypoint detection methods, a quaternion-based geometry-aware and adaptive expert network (QGAE-Net) is proposed. The method introduces a Multi-Scale Quaternion-based Geometric Positional Encoder (MS-QGPE) that combines quaternion representation with geometric shape descriptors to learn local-to-global geometric structures of point clouds and enhance spatial relationship modeling. A Quaternion-Guided Geometric Pose Attention (QG-GPA) module is designed to constrain attention weights using quaternion similarity, allowing feature aggregation according to true geometric correlations. Furthermore, a Classification-Driven Expert Routing Mechanism (CD-ERM) is constructed to achieve unified modeling of heterogeneous tooth types and personalized feature learning through dynamically activated expert subnetworks. Experiments conducted on a clinical dataset containing 19,200 tooth samples demonstrate that the proposed method achieves mean absolute errors of 0.179 mm, 0.233 mm, 0.188 mm, and 0.301 mm for incisors, canines, premolars, and molars, respectively, with corresponding recalls of 85.1%, 87.1%, 91.5%, and 67.5%, and an overall classification accuracy of 97.5%. In addition, experiments on the public Teeth3DS+ and KeypointNet datasets demonstrate consistent performance improvements over existing methods, confirming the model's strong generalization capability on public benchmarks and in cross-category scenarios. Overall, QGAE-Net effectively enhances keypoint detection accuracy while maintaining high deployment efficiency and scalability, making it suitable for automatic landmark annotation across diverse dental scenarios.
  • GONG Hongyi, LU Anwen, TANG Yijun, WANG Xiangxue, XU Jun, JIAO Yiping
    Accepted: 2026-02-03
    Integrating pathological images with genomic data through deep learning can significantly improve the accuracy of cancer prognosis prediction. However, in clinical practice, only a subset of patients have complete genomic sequencing results, which limits the comprehensive application of multimodal models. Fully leveraging limited genomic data to enhance the prognostic capability of pathological models is therefore crucial for improving the clinical applicability and generalization ability of multimodal approaches. To this end, this paper proposes VMEF, a pathology enhancement framework based on the Variational Mixture of Experts (VMoE) module, designed to address training scenarios where pathological images are complete but genomic data is partially missing. The framework learns cross-modal mapping relationships between pathology and genomics using samples with complete modalities, generating imputed features for missing samples to improve overall prognostic performance. VMEF comprises three core modules: (1) a multi-source pathology encoding module that fuses global tissue structure with tumor microenvironment prior information, providing a rich pathological foundation for genomic feature generation; (2) a VMoE-based imputation module that models diverse pathology-to-genomics mapping relationships through a dual-expert structure and dynamic routing mechanism, adaptively generating biologically plausible genomic representations; (3) a prior-guided fusion module that leverages prior features to guide mutual calibration between genomic features and pathological representations, effectively alleviating inter-modal heterogeneity. Experiments on three TCGA cancer datasets demonstrate that when only 60% of training samples have genomic sequencing data, the average C-index reaches 0.6149; under complete modality conditions, the average C-index reaches 0.6370, surpassing existing multimodal methods. The experimental results demonstrate the effectiveness and robustness of the VMEF framework for cancer prognosis under modality-missing scenarios, providing strong support for its application in randomly missing data scenarios.
  • Wang Huiyong, Zhou Rumeng, Zhang Yi, Feng Tao, Zhang Xiaoming
    Accepted: 2026-02-03
    In recent years, large language models have demonstrated exceptional performance in natural language processing tasks. However, in domain-specific question answering tasks such as those in the medical field, lightweight large language models lack sufficient support from vertical domain knowledge, resulting in deficiencies in the reliability and accuracy of their generated outputs. To enhance the accuracy of lightweight large language models in medical question answering tasks, this paper proposes a knowledge graph-enhanced medical question answering approach for large language models based on entity recognition and knowledge filtering, named ERKF-MedQA. This approach mainly consists of two components: precise initial entity recognition and knowledge filtering. Entity recognition is implemented using a multi-stage prompting method. First, entity normalization retrieval is performed on the input question. Then, relevance assessment is conducted on the retrieved entities to determine the final valid entities. Knowledge filtering is accomplished using the Multi-Task Semantic Scoring Model (M-TSSM). This model integrates question and path information, scores the initially retrieved knowledge, and filters out the knowledge highly relevant to the question. Finally, the filtered relevant knowledge is integrated into prompts and input into the large language model, which then performs reasoning and generates answers. Experimental results show that the proposed method outperforms all baseline models in terms of BERTScore. Compared with the best-performing baseline model, the proposed method achieves improvements of 0.44%, 0.25%, and 0.34% in Precision, Recall, and F1-Score, respectively.
  • Luo Li, Li Bo, Wu Jiani, Wen Yuan, Dai Lu
    Accepted: 2026-02-02
    Urban underground pipeline defect detection is essential for ensuring the normal operation of underground pipeline systems. Due to the diversity, complex shapes, and varying scales of underground pipeline defects, existing detection methods often suffer from insufficient accuracy, resulting in many false positives and missed detections. This paper proposes an effective underground pipeline defect detection model, MEG-DETR, based on the RT-DETR framework. A Multi-scale Attention-based Intra-scale Feature Interaction (M-AIFI) module is designed, which combines Multi-scale Multi-head Self-attention (M2SA) to establish channel and spatial dependencies within high-level semantic features, enabling the comprehensive capture of fine-grained defect features. A Spatial Prior Multi-scale Feature Pyramid Network (SP-MSFPN) is constructed, introducing Efficient Local Attention (ELA) and adding a shallow feature layer to achieve efficient fusion across different scales, enhancing detection of small defects. Furthermore, a Gated Semantic Enhancement Module (GSEM) is developed, combining a multi-scale convolutional gated linear unit and a GSBottleneck to achieve collaborative enhancement of semantic and structural features, improving representation of complex defect semantics and structural details. Experimental results show that MEG-DETR achieves higher accuracy in underground pipeline defect detection, with an mAP of 83.44%, an improvement of 2.74% over the baseline; Precision and Recall increase by 1.69% and 3.03%, respectively. Compared with mainstream detection models, MEG-DETR demonstrates superior overall performance, verifying its effectiveness in complex defect scenarios.
  • YUN Jian, ZHANG Xueyi
    Accepted: 2026-02-02
    To address challenges in cross-domain collaboration posed by data privacy and compliance constraints, Federated Learning (FL) integrated with blockchain mitigates the centralization risks of traditional FL; yet existing solutions suffer from insufficient model-update quality assessment and validator trust crises. This paper introduces a decentralized blockchain-based federated learning framework featuring a dynamic closed-loop system that coordinates quality, trust, and equity. It works through three mechanisms: 1) a Validator Quality Score, which quantifies validator performance using multi-round cross-validation and spatiotemporal weighting, converting quality scores into dynamic voting weights to suppress collusion attacks; 2) a Model Quality Factor, which tracks worker nodes' historical contributions via sliding windows and dynamically adjusts update thresholds using validator accuracy to distinguish high-value updates from malicious perturbations; and 3) Model Quality-Driven Dynamic Proof-of-Stake, which binds node stakes to contribution quality, ensuring that high-stake nodes deliver high-quality outputs. The framework is tested on multiple datasets. Its synergistic mechanisms maintain strong performance under malicious attacks in Non-IID environments. Results show a 12.5% average accuracy gain over baselines. Defense effectiveness on CIFAR-10 improves by up to 38%. The system suppresses malicious nodes' stake to only 1%, far below the 13% baseline level. Communication costs remain comparable. This method successfully resolves the consistency problem between model quality and validator performance.
  • Liu Chengke, Guan Donghai, Yuan Weiwei
    Accepted: 2026-01-30
    Imbalanced time series classification represents a significant challenge in the field of deep learning, especially when critical information is concentrated in the minority class. Conventional data augmentation techniques, such as undersampling and oversampling, are designed to increase the proportion of minority class samples. However, they often give rise to issues including information loss, elevated overfitting risk, and the introduction of noise. While "Dual Augmentation Joint Label Learning" (JobDA) has been proven effective in alleviating such problems to a certain extent, it still lacks explicit mechanisms tailored to the minority class. To address this issue, this study proposes a novel approach named "Dual Augmentation with Minority Class Label Merging" (DAMLM). Specifically, this method first expands the training set through dual augmentation of samples and labels, and then uses a label mapping mechanism to merge the minority class labels, which effectively increases the proportion of minority class samples compared with JobDA. In detail, the method performs sample augmentation by repeating the original data, thus avoiding noise introduction. Meanwhile, during the training process, it adopts joint labels for the majority class and retains the original labels for the minority class—this forms clearer classification boundaries compared with other methods. On 38 imbalanced datasets from the UCR archive, we conducted experiments with six time-series classification models and compared methods by averaging the results across these models. Compared with seven representative baseline augmentation methods, DAMLM improves the mean F1 score by 1.24–6.27 percentage points and achieves the best performance on G-mean and other metrics.
  • Wang Liang, Deng Song
    Accepted: 2026-01-30
    As a critical infrastructure, the power system is vulnerable to threats such as equipment failures and malicious data tampering, while the scarcity of abnormal samples restricts the performance of traditional detection models. To address the problem of abnormal data imbalance in the power system, this paper proposes a data augmentation method based on the Mixture of Experts Wasserstein Generative Adversarial Network (LT-MoEWGAN). This method innovatively integrates Long Short-Term Memory (LSTM) and Temporal Convolutional Network (TCN) as dual expert modules, and realizes dynamic weight allocation at the feature level through a gating network to construct a multi-scale temporal feature extractor for generating high-quality samples. Simulation experiments based on real power system datasets show that: 1) Based on the Wasserstein distance metric, the distribution difference between the data generated by this method and real samples is the smallest (with medians of 0.043 and 0.135, respectively), and taking WGAN as the baseline, the generation stability is improved by 33%; 2) On classifiers such as XGBoost, LightGBM, Random Forest, Decision Tree, CNN, GAT, and MTGF-Conv, the Area Under the Curve (AUC) of the proposed algorithm is improved by 1.5%–2% compared with baseline methods such as SMOTE, ADASYN, Borderline-SMOTE, GAN, WGAN, WGAN-GP, DCGAN, and WM_CVAE. This method effectively enhances anomaly detection performance through high-quality data augmentation, thus providing a reliable data augmentation solution for anomaly detection in power systems, and its innovative architecture has theoretical reference value for time-series data generation tasks.
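The gating network's feature-level weight allocation can be pictured as a softmax-weighted blend of the experts' outputs; the snippet below is a generic mixture-of-experts sketch, not the exact LT-MoEWGAN wiring:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of gate logits."""
    z = np.asarray(z, dtype=float)
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

def moe_combine(expert_outputs, gate_logits):
    """Blend expert feature maps with gating weights that sum to 1,
    so the gate can dynamically favor e.g. an LSTM or a TCN expert."""
    w = softmax(gate_logits)
    return sum(wi * out for wi, out in zip(w, expert_outputs))
```

With equal gate logits the experts are simply averaged; as the gate learns, the weights shift toward whichever expert's temporal scale best fits the current input.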
  • Zhu Guozheng, Peng Wanda, Zhang Shuo, Cheng Xinru, Zhang Liye, Li Pengfei
    Accepted: 2026-01-30
    Cross-domain transfer of image models has become an effective paradigm for video understanding, but current approaches leave room for improvement: full fine-tuning is computationally expensive and prone to performance fluctuations, while most parameter-efficient transfer learning (PETL) schemes rely on a single adapter, whose spatio-temporal representation capacity is limited under long-range temporal dependencies and few-shot scenarios. More critically, existing methods generally depend on implicit temporal modeling and ignore explicit motion priors, making it difficult to fully capture complex motion patterns. To this end, this paper proposes FDA4Video, a structured adapter framework that efficiently adapts image models under the PETL paradigm: a decoupled dual-path adapter architecture is designed to simultaneously capture local action details and long-range temporal correlations; an optical-flow-shift collaborative attention mechanism is proposed to deeply fuse explicit motion representations into the temporal modeling process and strengthen cross-frame dependencies; and learnable temporal position embeddings are introduced to provide a temporal coordinate reference, with a staged residual fusion strategy preserving representation integrity. Experiments show that the framework achieves accuracies of 85.6%, 98.2%, and 83.9% on Kinetics-400, UCF101, and HMDB51, respectively, improving average accuracy by 1.6%–2.2% over baseline methods while reducing newly added parameters by about 26%. Its overall performance is comparable to advanced PETL strategies, providing a technical path for video adaptation of image models that balances accuracy, lightweight design, and efficiency.