
Just accepted

  • YIN Weiliang, Liu Bing, Luo Shanjun, Huang Liang, Chen Xiaohui
    Accepted: 2026-04-02
    Person Re-identification (Re-ID) is frequently challenged by complex factors such as variations in viewpoint, pose, and occlusion. Existing mainstream deep learning methods primarily rely on the statistical similarity of visual features for matching. While these methods perform well in general scenarios, they often lack high-level semantic understanding and logical reasoning mechanisms. Consequently, they struggle to capture fine-grained differences when distinguishing "hard samples" with similar appearances, leading to accuracy bottlenecks. To address these issues, this paper proposes a two-stage Re-ID method featuring a collaboration between small and large models, designed to integrate the efficiency of specialized small models with the robust discriminative power of general Multimodal Large Language Models (MLLMs). The first stage is a rapid recall phase, where a lightweight deep learning model is combined with the K-reciprocal nearest neighbor algorithm to retrieve candidates. This stage filters a small set of highly relevant candidates from the massive gallery, significantly reducing the data scale for subsequent processing while ensuring a high recall rate. The second stage is a precise refinement phase, where a pre-trained MLLM serves as a discriminator to accurately screen the candidate set by leveraging its powerful multimodal understanding capabilities. This collaborative two-stage approach effectively balances inference speed and recognition accuracy. Experimental results on the Market-1501 and DukeMTMC-reID datasets demonstrate that the proposed method achieves Rank-1 accuracies of 98.5% and 96.5%, respectively. These results represent significant improvements of 2.8% and 6.5% over the CLIP-ReID method, fully validating the effectiveness of the proposed approach.
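The first-stage recall described above can be illustrated with a minimal k-reciprocal nearest-neighbour filter. The sketch below assumes cosine similarity over embedding vectors and a single query; it is an illustration of the k-reciprocal idea, not the paper's implementation, and the function name and threshold logic are our own.

```python
import numpy as np

def k_reciprocal_candidates(query_feat, gallery_feats, k=5):
    """Return indices of gallery items that are k-reciprocal neighbours
    of the query under cosine similarity."""
    q = query_feat / np.linalg.norm(query_feat)
    G = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = G @ q
    topk = np.argsort(-sims)[:k]              # query's top-k gallery matches
    candidates = []
    for g in topk:
        # Rank the query inside gallery item g's own neighbourhood
        # (excluding g itself); keep g only if the query makes g's top-k.
        sims_g = G @ G[g]
        sims_g[g] = -np.inf
        if np.sum(sims_g > G[g] @ q) < k:     # reciprocal condition
            candidates.append(int(g))
    return candidates
```

The mutual-neighbourhood test is what shrinks the candidate set: a gallery item survives only if the query would also rank highly from that item's point of view, which discards one-sided matches before the MLLM refinement stage.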
  • Zhou Zesheng, Li Ping
    Accepted: 2026-04-02
    To address the performance degradation of efficient Transformer models in noisy text classification scenarios, this study proposes a robust and efficient classification method that integrates a dynamic low-rank attention mechanism with a dual-view consistency constraint. The proposed approach adaptively adjusts the attention rank based on the variance of input features, allocating higher ranks to semantically complex samples to enhance representation capacity and lower ranks to simpler samples to maintain near-linear computational complexity, thus achieving a dynamic balance between expressiveness and efficiency. During training, a dual-view consistency mechanism is introduced by constructing clean and perturbed text views and enforcing consistency between their semantic representations, which suppresses noise-induced shifts in the decision boundary and further improves robustness. Extensive experiments on multiple Chinese and English text classification datasets — including sentiment analysis, topic identification, and fine-grained emotion classification — demonstrate that the proposed method outperforms fixed-rank baselines in terms of accuracy and exhibits more stable performance across various noise types and intensities. This study provides a novel solution for achieving efficient and robust text classification in complex noisy environments.
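The variance-driven rank selection can be sketched in a few lines. The thresholds and candidate ranks below are illustrative placeholders, not values from the paper; only the rule itself (higher feature variance maps to a higher attention rank) follows the abstract.

```python
import numpy as np

def dynamic_rank(x, ranks=(8, 16, 32), thresholds=(0.5, 1.5)):
    """Pick a low-rank attention rank per sample from feature variance.

    x: (batch, seq_len, dim) token embeddings. Samples with variance
    below thresholds[0] get the smallest rank, above thresholds[1] the
    largest, and the middle rank otherwise.
    """
    var = x.reshape(x.shape[0], -1).var(axis=1)   # one scalar per sample
    rank = np.full(var.shape, ranks[1])
    rank[var < thresholds[0]] = ranks[0]          # simple sample: cheap attention
    rank[var > thresholds[1]] = ranks[2]          # complex sample: more capacity
    return rank
```

In the full model the selected rank would control the dimensionality of the low-rank query/key projections, keeping complexity near-linear for simple inputs.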
  • MA Handa, OUYANG Tao
    Accepted: 2026-04-02
    To address the limitations of existing relation triplet extraction methods in complex contexts, including insufficient multi-relation semantic representation and difficulty in extracting implicit relations, this paper proposes a dual-channel joint encoding model with attention mechanisms, termed AMJERE (Attention-Mechanism Joint Encoding for Relation Extraction). The model constructs independent yet interactive sentence and relation encoding channels to enhance the completeness and discriminative ability of relation semantic representations. Specifically, AMJERE employs a sentence–relation dual-channel independent encoding architecture to separately represent input sentences and candidate relations, reducing semantic interference. A relationship fusion module based on self-attention is introduced to enhance implicit relation modeling by incorporating sentence contextual information. Furthermore, a cross-channel attention mechanism enables deep semantic interaction between sentence and relation representations, capturing latent dependencies between entities and relations and producing compact joint representations. Finally, multiple linear classifiers are used to perform relation prediction and entity label identification, achieving joint extraction of relation triplets. Experimental results on the NYT and WebNLG datasets demonstrate that AMJERE outperforms several baseline models in terms of precision, recall, and F1 score, achieving F1 values of 93.3% and 93.5%, respectively. Ablation studies and qualitative analyses further verify the effectiveness of the proposed model.
  • Long Haiqing, Li Mao
    Accepted: 2026-04-02
    Interactive image retrieval breaks the traditional single-query, single-response paradigm by reshaping retrieval into a multi-turn iterative dialogue, allowing users to dynamically guide and refine their intent based on preliminary results. Text and sketch, as two intuitive and complementary query modalities, offer significant advantages in scene-level image retrieval by effectively expressing complex visual requirements. However, existing methods often rely on a "latest-is-best" interaction assumption, and their evaluation metrics typically focus only on whether the target is retrieved in any round, ignoring real-world challenges such as noisy feedback, evolving user intent, and insufficient ranking stability. Moreover, sketches are highly abstract and drawn with user-dependent uncertainty, and existing static retrieval models lack the ability to refine ambiguous or incomplete initial inputs through interaction, limiting their practicality and robustness. To address these issues, this paper proposes IScene, an interactive text-and-sketch-based scene-level image retrieval framework. The framework comprises three core modules, dialogue rewriting, similarity-optimized selection, and visual extension, which together construct a retrieval pipeline that progressively refines semantics, maintains discriminative stability, and enhances visual representation. Additionally, to support interactive research, the first multi-turn dialogue dataset for this task is constructed. Experimental results demonstrate that IScene significantly outperforms existing baselines in retrieval accuracy and stability across multiple datasets, providing an effective solution for more natural and robust interactive scene retrieval.
  • HAO Guanyi, SUN Jingchao
    Accepted: 2026-04-01
    In the digital era, the complex interactions between modalities such as text, images, and audio have given rise to multimodal misinformation. Its propagation speed and concealment level far exceed those of traditional unimodal misinformation, posing severe challenges to information security and social governance. However, research in this field is relatively scarce in China, and a comprehensive framework has yet to be established. Therefore, this study systematically reviews the research status and development trajectory of multimodal misinformation detection, providing a comprehensive summary of this field. Based on a clear understanding of the core concepts and task spectrum of multimodal misinformation detection, the study details the characteristics of datasets and evaluation metrics. It also analyzes the applicability and detection performance of different multimodal methods and models, such as SAFE, CAFE, CFFN, SSA-MFND, PSCC-Net, DGM4, CCN, SNIFFER, and KGAlign. The study summarizes three core detection methods: cross-modal consistency, anomaly feature recognition, and external fact-driven approaches. Furthermore, it explores the interpretability and generalization robustness of multimodal misinformation detection. With the rise of large-scale visual-language models (LVLM), their application in multimodal misinformation detection is continuously deepening. This study reviews various application scenarios, advantages, and limitations of LVLMs in this domain. Finally, the paper outlines future research directions in multimodal misinformation detection, aiming to provide insights and inspiration for the further development of this field.
  • Tiejun Wang, Ziyi Lu, Xiaoyan Hu, Mengyang Kang, Wenhao Wang, Kaiyan Wang, Chengjie Xu
    Accepted: 2026-03-30
    Existing methods for inpainting bamboo slip text images struggle with structural-texture confusion, complex degradation, and low text-background contrast, often causing structural damage, instability, and artifacts. This paper proposes AmdmaNet, a multi-granularity feature-guided inpainting network. It separately reconstructs texture and structural features to avoid semantic confusion. A Multi-scale Dynamic-range Map Attention (Mdma) mechanism classifies pixels by degradation level, preventing over/under-inpainting. An Adaptive Mask-aware Pixel-shuffle Downsampling (Ampd) method weights damaged pixels using surrounding information and guides downsampling to prevent mask shift, reducing artifacts, blur, and mosaics. Experiments on a custom dataset show our method outperforms state-of-the-art approaches in both visual quality and metrics, demonstrating superior robustness for complex cases like broken strokes and background noise.
  • LIN Suqing, WU Jingheng, CHEN Qixuan, YAN Ming
    Accepted: 2026-03-30
    The rapid growth of tourism makes personalized POI recommendation essential for user experience, but recommendation faces feature-extraction obstacles caused by extreme interaction sparsity and semantic fragmentation in short reviews. Traditional probabilistic topic models struggle to capture latent semantic correlations due to their reliance on word co-occurrence statistics, while iterative deep learning methods based on back-propagation are prone to gradient instability and training inefficiency. This paper proposes DeepTSN, a deep learning recommendation framework integrating semantic-enhanced topic modeling. By introducing the semantic clustering-enhanced topic model SynTopic, short-text representation is enhanced via an LLM-constructed topic library; redundancy is removed and similar topics are merged using BERT-Chinese-based clustering. This process extracts latent topic features to compensate for missing data. High-dimensional vectors are constructed through feature integration to capture non-linear interactions, and a sampling network is integrated to reconstruct the data distribution via adaptive probability-density sampling. By employing a constructive learning mechanism to analytically determine network weights, the proposed method effectively mitigates interference from missing data and resolves convergence challenges, significantly enhancing both recommendation accuracy and training efficiency. Experiments on multi-source datasets demonstrate that DeepTSN outperforms baselines across real-world and public scenarios with varying interaction densities: the model reduces MAE by up to 21.34% and 12.72%, and MSE by 22.89% and 7.32%, respectively. Furthermore, it cuts runtime by approximately 61.69% and peak memory by 72.87%.
  • ZHANG Ke, Li Fei
    Accepted: 2026-03-30
    To address the insufficient representation of original sequence features and the information loss caused by the decomposition strategy of existing "decomposition-ensemble" forecasting models in long-term time series prediction tasks, this paper proposes a High-Dimensional Feature Series Enhancement Network (HDFSENet) incorporating an attention mechanism. The network integrates embedding techniques, the Mixture of Experts Decomposition (MOEDecomp) block, and the Feature Series Enhancement (FSE) block to capture the inherent characteristics of time series while reducing information loss in decomposition strategies. Firstly, the method strengthens the feature information of the original time series through three embedding techniques: value embedding, position embedding, and temporal embedding. Secondly, the enhanced time series is decomposed into trend feature series and seasonal feature series via the MOEDecomp block. Subsequently, an FSE block based on the attention mechanism is constructed to capture the interactions between the decomposed trend and seasonal feature series, thereby improving the representation capability of these features. Afterwards, these interaction features are integrated into the model as key variables to further enhance forecasting accuracy. Finally, the effectiveness of the model is verified on multiple benchmark datasets. Experimental results demonstrate that HDFSENet significantly outperforms several benchmark models in evaluation metrics such as MSE and MAE, indicating that the proposed model provides a reliable approach for more accurate time series forecasting.
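The mixture-of-experts decomposition step can be made concrete with a small sketch: several moving-average "experts" with different window sizes each propose a trend, and their weighted mixture defines the trend/seasonal split. In HDFSENet the mixing weights are learned; here they are fixed and uniform, which is our simplification.

```python
import numpy as np

def moe_decompose(x, windows=(3, 5, 9), weights=None):
    """Split a 1-D series into trend and seasonal parts using a mixture
    of moving-average experts with different window sizes."""
    if weights is None:
        weights = np.ones(len(windows)) / len(windows)
    trends = []
    for w in windows:
        # Pad at both ends so each moving average keeps the series length.
        pad = w // 2
        xp = np.pad(x, (pad, w - 1 - pad), mode="edge")
        trends.append(np.convolve(xp, np.ones(w) / w, mode="valid"))
    trend = np.average(np.stack(trends), axis=0, weights=weights)
    seasonal = x - trend          # residual after removing the trend
    return trend, seasonal
```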
  • JU Hongzheng, TANG Jianhang, ZHANG Yang, JING Kebing
    Accepted: 2026-03-30
    In recent years, an increasing number of studies have focused on modeling users’ multiple interests from their behavioral sequences in order to better capture complex user preferences. However, in implicit modeling scenarios where external auxiliary information such as item categories is unavailable, existing multi-interest models often struggle to accurately determine the interest attribution of individual behaviors. As a result, items that are weakly related or even irrelevant to the target interest are easily aggregated into the same interest representation, leading to the introduction of interest-specific noise. To address this issue, we propose a two-stage denoising multi-interest recommendation algorithm, termed DMIRec, which suppresses interest-specific noise at both the item-feature level and the interest-representation level. In the item denoising stage, learnable adaptive filters are employed to filter out irrelevant item features within each interest, yielding denoised behavior sequences for different interests. In the interest denoising stage, a conditional diffusion model is introduced, where items highly related to the current interest serve as guidance signals to iteratively remove noise components from the corresponding interest representations. Furthermore, to enhance the overall denoising effectiveness, we design a target-guided multi-interest loss that explicitly incorporates the recommendation target into the multi-interest learning process. This loss encourages appropriate responsibility assignment among different interests and reduces the influence of interest-specific noise from an optimization perspective. 
Experiments conducted on three real-world datasets, Book, Beauty, and Retail Rocket, show that, compared with the best Top-50 recommendation results among baseline models, the proposed method achieves improvements of 8.84%, 2.03%, and 2.27% in Recall; 9.78%, 0.95%, and 0.72% in Hit Rate (HR); and 9.07%, 3.87%, and 2.49% in Normalized Discounted Cumulative Gain (NDCG), respectively. These results demonstrate the effectiveness and robustness of the proposed approach.
  • Liang Hao, Bohejun Su, Jinghua Wang, Yong Xu
    Accepted: 2026-03-27
    Model quantization technology effectively reduces model storage and computational overhead by mapping high-precision floating-point data to low-bit discrete spaces. A core focus of model quantization research is how to rationally account for the characteristics of parameter distributions to construct superior mapping schemes. Existing Post-Training Quantization (PTQ) schemes nearly universally assume that the data distribution of non-activation layers follows a symmetric bell-shaped curve, but overlook the fact that small biases introduced by the model’s activation layers and inputs induce distributional asymmetry. Consequently, the resulting quantization mapping is skewed to one side due to this subtle asymmetry, leading to significant approximation loss. This paper investigates quantization schemes for image super-resolution and proposes improvements to the widely recognized two-stage post-training quantization scheme. First, the max-min-based equal partitioning employed in the pre-search for quantization bounds is modified to a sorting-based non-uniform partitioning approach. Second, a bias term is introduced during the pseudo-quantization process, where a portion of the data and its mean are adaptively shifted to mitigate estimation loss caused by data bias. The improved scheme outperforms the original counterpart across almost all performance metrics while retaining the same high compression ratio and acceleration ratio: compared to the original SwinIR-light model, it reduces parameter count by approximately 67.4% and accelerates the super-resolution process by 3.99×.
  • Lin Cao, Zhanqi Zhang, Benkui Zhang, Ying Chang, Zhizhe Liu, Kangning Du, Yanan Guo
    Accepted: 2026-03-27
    With the rapid growth of cyber-physical systems, massive time series data are continuously collected by sensors. Timely and accurate anomaly detection in such data is crucial for maintaining system stability and preventing potential risks. Due to the scarcity and imbalance of anomalous samples, time series anomaly detection is often modeled as an unsupervised learning task. In particular, contrastive learning leverages the latent consistency shared by normal samples across different views. By minimizing the representation distance between different augmented views of the same sample, it constructs a more compact and discriminative normal feature space. This significantly enhances the separability between normal and abnormal patterns, making it a highly promising mainstream paradigm in the field. Although contrastive learning–based methods have achieved notable progress, they still struggle to capture complex contextual variations in time series, limiting detection performance. To address this challenge, we propose Dual-Branch Intra- and Inter-Sample Representation Learning for Time Series Anomaly Contrastive Detection (I2CD). The framework explores hierarchical contextual dependencies within samples while leveraging inter-sample information to enhance normal variation patterns, enabling more discriminative representations for abnormal changes. Specifically, we design a multi-expert temporal pyramid module to adaptively capture hierarchical dependencies in multivariate sequences. In addition, we introduce a prototype-guided normal pattern enhancement module that builds inter-sample information interactions using representative prototypes of normal patterns, suppressing anomalous variations and enlarging the representational gap between normal and abnormal samples. Experiments on six real-world benchmark datasets demonstrate the effectiveness and robustness of our approach in time series anomaly detection.
  • Lu Xiaochen, Wang Shenglan, Zhong Yan, Zhang Jingjing, Zhang Lei
    Accepted: 2026-03-27
    In recent years, deep learning has achieved growing success across research fields such as computer vision, in which activation functions play an important role in enhancing the nonlinear fitting capability of deep neural networks. However, existing activation functions such as ReLU and SiLU have revealed increasing issues as research progresses, such as vanishing gradients, dead neurons, and the lack of adaptive regulation in the negative region. This paper proposes a new activation function, Adaptive Parametric Softplus-Sigmoid (APSS), for salient-feature preservation and suppression in common object detection and recognition tasks. It aims to extract and learn multi-scale collaborative features from complex backgrounds. The activation function is based on the base-gate combination mechanism found in biological neuroscience: the base unit ensures the learnability of basic features and gradient stability, while the gate unit suppresses invalid features by dynamically adjusting the response intensity in the negative region. Combining the two units helps the network balance retaining and suppressing features. To verify the advantages of this activation function, comparative experiments are conducted with several typical object detection and recognition networks on three datasets: SoccerNet, UA-DETRAC, and BEEF24. The results show that the proposed APSS activation function significantly outperforms the activation functions in the original network models, with better target feature extraction and fitting capabilities.
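The abstract names a softplus base and a sigmoid gate but does not give the exact functional form, so the sketch below is a plausible base-gate combination of our own construction: the exact expression and the parameters alpha and beta are assumptions, not the paper's definition.

```python
import numpy as np

def apss(x, alpha=1.0, beta=1.0):
    """Illustrative base-gate activation in the spirit of APSS.

    base: a softplus term, smooth everywhere with a stable gradient.
    gate: a sigmoid term that damps the response in the negative region.
    alpha and beta stand in for the learnable parameters; their roles
    here are assumed, not taken from the paper.
    """
    base = np.logaddexp(0.0, beta * x) / beta      # softplus (overflow-safe)
    gate = 1.0 / (1.0 + np.exp(-alpha * x))        # sigmoid
    return base * gate
```

For large positive inputs the function approaches the identity (base ≈ x, gate ≈ 1), while negative inputs are attenuated by the gate rather than hard-clipped as in ReLU.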
  • Anbo Huang, Haicheng Qu
    Accepted: 2026-03-24
    The rapid growth of the open-source ecosystem has accelerated the spread of software vulnerabilities, posing significant threats to information security. Sequence-based deep learning methods struggle to capture the structural characteristics of source code, while existing graph neural network-based approaches fail to sufficiently integrate topological structures with node features. To address these challenges and overcome the limitations of current deep learning-based techniques, we propose MVGE-Net, a source code vulnerability detection method that integrates multi-view graph representations with edge-type information. In MVGE-Net, source code is first transformed into a graph representation. Then, depending on the semantic richness of the nodes, different pretrained models are utilized to obtain node embeddings. Subsequently, topology graphs, feature graphs, and shared graphs are constructed from multiple perspectives to capture complementary information. Meanwhile, edge-type information is incorporated into node features to enhance representational capability. Finally, a lightweight gating mechanism fuses the extracted features to generate the final vulnerability prediction. Experiments conducted on two benchmark datasets show that our method achieves improvements of 9.14, 9.13, 1.75, and 5.74 percentage points in Accuracy, Precision, Recall, and F1 score, respectively, compared with the baseline method Devign. Both qualitative and quantitative analyses confirm the effectiveness of the proposed approach. Overall, MVGE-Net addresses the limitations of existing GNN-based methods and provides a more robust and efficient solution for vulnerability detection.
  • Huang Tianyi, Zhang Cong, Liu Shiyi, Zuo Jiayi, Wang Zheng
    Accepted: 2026-03-24
    Fine-grained image-text matching achieves high-quality image-text matching by aligning visual-semantic fragments such as regions in images and words in sentences. Although existing studies have made significant progress at the region-word alignment level, in the text-word aggregation stage the aggregation strategy still struggles to adapt to text length and the semantic distribution of words, which leads to the loss of semantic information and ultimately reduces overall matching accuracy. To solve this problem, this study proposes a Lightweight Dynamic Aggregator (LDA). The LDA consists of a micro neural network and a Softmax function, and dynamically generates the weights for summation and mean aggregation by analyzing text length and the semantic distribution of words. The LDA network first projects the input text features into a high-dimensional space and performs a nonlinear transformation to capture complex interactions, then maps them back to a low-dimensional space to compress the features. To prevent the loss of feature information during the transformation, the network uses residual connections to enhance information flow, and finally normalizes through the Softmax function to stabilize the weights. Experimental results show that the proposed method outperforms existing advanced algorithms on public datasets. On the Flickr30K dataset, it achieves the best overall score and top performance on all metrics in the text-to-image retrieval direction, with a 2.1% improvement on R@1. On the 1K and 5K test sets of the MS-COCO dataset, it achieves the best total retrieval score and comparable or superior performance on all metrics in both directions, while introducing only negligible additional computational overhead.
This work not only verifies the significance of the joint optimization of text length and semantic distribution in the aggregation stage, but also provides an efficient and robust new aggregation idea for fine-grained image-text matching.
  • Zhong Junjian, Chen Weigang
    Accepted: 2026-03-24
    Event cameras record brightness changes in a scene as asynchronous event streams, featuring low latency and high dynamic range. However, because they perceive only brightness changes rather than the full visual appearance of the scene, the lack of static texture information can degrade the performance of object detection systems that take event streams as input. To address this issue, this paper exploits features extracted by image reconstruction networks to enhance the accuracy of event-based object detection. A sparsity-driven channel attention module is proposed to preliminarily filter and enhance the features extracted by an image reconstruction network. A cross-modal fusion mechanism is constructed in which event features are the primary modality and reconstructed image features serve as modulation signals; spatially adaptive normalization parameters are employed to achieve effective fusion of the two modalities. Experimental results demonstrate that the proposed method outperforms existing event-based object detection approaches on the Gen1 and 1 Mpx datasets, achieving mAP improvements of 1.3% and 0.6%, respectively. By introducing reconstructed image features and combining them with event features through a sparsity-driven channel attention mechanism, this paper achieves efficient cross-modal feature fusion and enhances the performance of event-camera-based object detection systems. The proposed method provides an effective way to achieve high-precision event-based perception in complex scenarios and has practical application value.
  • YU Yang, QU Haicheng and LIU Lamei
    Accepted: 2026-03-20
    To address the challenges of label scarcity and fine-grained feature alignment in rolling bearing fault diagnosis under variable speed conditions, this paper proposes a Category-Aware Contrastive Learning (CACL) method driven by coupled time-frequency attention for unsupervised cross-domain diagnosis. First, for feature extraction, a coupled time-frequency attention module is constructed to simultaneously extract discriminative features from both time and frequency domains of fault signals while enhancing sensitivity to long-tail distributions and incipient faults. Second, the extracted deep discriminative features are fed into a graph convolutional network with multiple receptive fields, where a graph generation layer constructs adaptive topological relationships among samples, and deep feature modeling and optimization are performed on the constructed sample topology. Finally, to explicitly optimize the structural consistency and categorical discriminability of the graph feature space, a cross-domain category-aware contrastive learning mechanism is designed. By constructing positive contrastive relationships among cross-domain intra-class samples and negative contrastive relationships among inter-class samples, fine-grained alignment of feature distributions and semantically consistent cross-domain transfer are achieved for samples of the same category from source and target domains. The proposed method achieves average accuracies of 90.67% and 93.67% on the public CWRU and JNU datasets, respectively, representing improvements of 4.68 and 1.69 percentage points over the second-best comparative methods, thereby validating its effectiveness for unsupervised fault diagnosis across multiple variable speed cross-domain transfer tasks.
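The category-aware contrastive mechanism (cross-domain same-class pairs as positives, different-class pairs as negatives) can be sketched as an InfoNCE-style supervised contrastive loss. CACL's exact formulation is not given in the abstract, so the loss below is a standard stand-in that matches the stated positive/negative construction.

```python
import numpy as np

def category_contrastive_loss(src, tgt, src_labels, tgt_labels, tau=0.1):
    """Cross-domain category-aware contrastive loss sketch.

    For each source-domain feature, target-domain features of the same
    class are positives and all other target features act as negatives.
    """
    # L2-normalise so dot products are cosine similarities.
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sims = np.exp(src @ tgt.T / tau)          # (n_src, n_tgt) similarity kernel
    loss = 0.0
    for i, y in enumerate(src_labels):
        pos = sims[i][tgt_labels == y].sum()  # same-class cross-domain mass
        loss += -np.log(pos / sims[i].sum())
    return loss / len(src_labels)
```

Minimising this pulls same-class source and target features together while pushing different classes apart, which is the fine-grained alignment the abstract describes.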
  • ZHAO Wangpeng, CHEN Tao, LI Wei, NAN Longmei, DU Yiran
    Accepted: 2026-03-19
    Polynomial multiplication accounts for more than 80% of the computational time in lattice-based cryptographic operations. Polynomial multiplication based on the Number Theoretic Transform (NTT) reduces the computational complexity of polynomial multiplication from O(n²) to O(n log n). However, compared with other implementation methods, NTT-based polynomial multiplication involves more complex data scheduling and more difficult memory mapping. At present, memory mapping schemes tailored to specific algorithms are limited by algorithm parameters and hardware characteristics, resulting in poor scalability, while memory mapping schemes for reconfigurable polynomial multiplication incur significant overhead in control and storage units, leading to low area efficiency of polynomial multiplication architectures. To address these issues, this paper proposes a conflict-free memory mapping scheme based on partial constant-geometry transformation, which can support lattice-based cryptographic polynomial multiplication operations that satisfy the required parameter condition. A conflict-free data scheduling scheme is proposed to avoid write-write conflicts during the mode transition of polynomial multiplication and data conflicts in the polynomial point-multiplication stage. In addition, to avoid read-write conflicts in memory units during data scheduling, a multi-bank storage scheme with cyclic-shift storage is proposed, which reduces control complexity and cuts storage capacity by 37.5% compared with the classic ping-pong storage method. To further demonstrate its performance advantages, the polynomial multiplication architecture based on the conflict-free memory mapping scheme was verified experimentally on the FPGA xc7v2000tflg1925. Compared with the relevant literature, the proposed conflict-free memory mapping scheme exhibits higher area efficiency.
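The complexity gain from the NTT can be made concrete with a toy example. The sketch below multiplies two length-4 polynomials modulo (x⁴ − 1, q) using a recursive NTT: pointwise products in the transform domain replace the O(n²) schoolbook convolution, giving O(n log n) overall. The modulus q = 17 and 4th root of unity 13 are illustrative choices for this tiny size, not parameters from the paper; real lattice schemes use much larger n and q, typically with negacyclic reduction modulo xⁿ + 1.

```python
def ntt(a, root, q):
    """Recursive Cooley-Tukey number-theoretic transform (len(a) a power of two)."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], root * root % q, q)
    odd = ntt(a[1::2], root * root % q, q)
    out, w = [0] * n, 1
    for k in range(n // 2):
        t = w * odd[k] % q               # butterfly: twiddle the odd half
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
        w = w * root % q
    return out

def poly_mul_ntt(a, b, q=17, root=13):
    """Multiply two length-n polynomials modulo (x^n - 1, q) via the NTT.
    q = 17 and root = 13 (a primitive 4th root of unity mod 17) only
    work for n = 4; they are toy values for demonstration."""
    n = len(a)
    A, B = ntt(a, root, q), ntt(b, root, q)
    C = [x * y % q for x, y in zip(A, B)]        # O(n) pointwise stage
    inv_root = pow(root, q - 2, q)               # modular inverses via Fermat
    inv_n = pow(n, q - 2, q)
    return [x * inv_n % q for x in ntt(C, inv_root, q)]
```

The pointwise-multiplication stage in the middle is exactly the "point multiplication" step whose data conflicts the proposed scheduling scheme targets.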
  • Wu Wenxin, Xu Guotian, Zhu Guangrui
    Accepted: 2026-03-18
    While V2Ray-type encrypted proxy protocols, now mainstream in China, protect user privacy, they also provide covert channels for cybercrime, and accurate identification of such traffic has become a new research hotspot in cyberspace governance. To evade regulation, these protocols often employ traffic-variant techniques, making them better camouflaged and difficult for existing methods to detect effectively. To address this issue, an encrypted proxy traffic detection model, AG-CTNet, is proposed based on dynamic fusion of multimodal features, to identify V2Ray-type encrypted proxy traffic employing various camouflage strategies. To address the scarcity of public datasets, an encrypted proxy traffic sample library is constructed through independent data collection, and data augmentation strategies are introduced to improve model robustness. For the traffic-variant camouflage problem, a parallel fusion architecture of a 2D-CNN and a Transformer is adopted, innovatively introducing cross-modal attention and dynamic gating mechanisms to achieve adaptive fusion of multimodal features. Experimental results show that the proposed model achieves an accuracy of 98.62% and a precision of 98.41% in identifying V2Ray-type encrypted proxy traffic, effectively improving the accuracy of traffic identification.
  • Chen Qiongbin, He Yulin, Cui Laizhong, Huang Zhexue
    Accepted: 2026-03-18
    Time series mining plays a pivotal role in domains such as renewable energy, meteorology, and finance, with growing interest in the analysis of multivariate multi-step time series. Existing deep neural network-based approaches for multivariate multi-step time series forecasting often suffer from complex model architectures and large-scale parameterization. These characteristics lead to substantial computational demands and high training costs. Moreover, most current prediction models focus predominantly on the time domain, processing either channel-independent or channel-mixed information, which limits their ability to simultaneously capture both correlated and independent channel features. This restriction can reduce prediction accuracy, particularly when training data is scarce. To overcome these limitations, we propose a lightweight dual-channel time-frequency cross-attention network for multivariate multi-step time series forecasting. The network extracts both independent and mixed channel representations in the frequency domain and integrates them with the original time-domain signals via an attention-based fusion mechanism. This design enables the model to jointly leverage time-domain and frequency-domain information, thereby capturing global spatiotemporal dependencies more comprehensively. We evaluate the proposed method against eight state-of-the-art time series forecasting models on eight publicly available datasets. Experimental results show that, for example, on the representative ECL dataset, our model achieves improvements over Autoformer (NeurIPS 2022) of 17.55%, 12.87%, and 14.72% in MSE, MAE, and SMAPE, respectively. Furthermore, compared with Crossformer (ICLR 2023), our approach reduces the number of parameters by 30.82%, and achieves a 66.07% reduction in training time relative to Pyraformer (ICLR 2021).
These results demonstrate that the proposed network is an effective and efficient solution for multivariate multi-step time series forecasting.
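    The dual time-frequency representation described above can be illustrated with a toy sketch: per channel, compute frequency-domain magnitudes (here via a naive DFT rather than the paper's unspecified transform) and concatenate them with the raw time-domain values, producing the kind of paired representation an attention fusion would then operate on. Function names are illustrative assumptions:

```python
import math

def dft_magnitudes(signal):
    """Naive DFT: magnitude of each frequency bin of a real 1-D signal."""
    n = len(signal)
    mags = []
    for k in range(n):
        re = sum(x * math.cos(-2 * math.pi * k * t / n) for t, x in enumerate(signal))
        im = sum(x * math.sin(-2 * math.pi * k * t / n) for t, x in enumerate(signal))
        mags.append(math.hypot(re, im))
    return mags

def time_frequency_features(channels):
    """Concatenate each channel's raw time-domain values with its
    frequency-domain magnitudes (toy dual representation)."""
    return [ch + dft_magnitudes(ch) for ch in channels]
```

    A constant signal concentrates all its energy in the zero-frequency bin, which is what the test below checks.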
  • LU Anwen, ZENG Tianhao, JIAO Yiping, LIU Mingxin, GONG Hongyi, CHEN Jun, XU Jun
    Accepted: 2026-03-18
    Primary liver cancer is a highly prevalent digestive system malignancy worldwide, predominantly comprising intrahepatic cholangiocarcinoma (ICC) and hepatocellular carcinoma (HCC). Clinical practice demonstrates that precise histological subtyping and clinical staging of these subtypes are critical for guiding personalized treatment strategies and prognosis evaluation. However, effectively exploiting cross-scale features for multi-task pathological analysis remains challenging due to the high heterogeneity of liver cancer and the complex coexistence of macroscopic tissue structures and microscopic nuclei in whole slide images (WSIs). To address this problem, this study proposes a weakly supervised Dual-Branch Multi-Source Feature Fusion (DBMSF) model. This model integrates multi-scale deep features extracted by the CHIEF foundation model and handcrafted features derived from HoVer-NeXt nuclei segmentation. Specifically, the deep branch employs a multi-scale alignment module for feature interaction, while the handcrafted branch utilizes a graph convolutional network (GCN) to dynamically aggregate nuclei information, capturing a comprehensive representation of the tumor microenvironment. Finally, a multi-source fusion module dynamically integrates these features. Multi-task evaluations on a private ICC cohort and the public TCGA-LIHC cohort demonstrated that DBMSF achieved an AUC of 88.5% and accuracy of 75.6% for ICC subtyping, and an AUC of 82.4% and accuracy of 71.5% for HCC T-stage prediction. These experimental results indicate that DBMSF significantly outperforms state-of-the-art methods, demonstrating robust effectiveness and promising clinical application potential for multi-task pathology analysis.
  • LI Hao, MA Zhenzhe, CHENG Lan, XU Xinying
    Accepted: 2026-03-18
    Pedestrian detection on unloading platforms in waste incineration power plants remains challenging due to complex lighting interference and significant variations in pedestrian scale. Existing pedestrian detection methods exhibit limitations in shallow edge feature extraction, multi-scale feature fusion, and lightweight detection head design. To address these issues, this paper proposes a pedestrian detection model named MS-ADFF, based on multi-scale aggregation-diffusion feature fusion. Firstly, an edge feature enhancement module is developed; by reinforcing contour information within shallow features, this module effectively mitigates the adverse impact of image detail blurring under complex lighting conditions. Secondly, a multi-scale aggregation-diffusion feature fusion network is constructed, performing two rounds of feature aggregation and diffusion on the P3, P4, and P5 feature levels; this effectively integrates multi-scale semantic features and enhances the model's ability to perceive pedestrian targets of different scales. Finally, a lightweight shared detection head built from depthwise convolution and group convolution is proposed, which replaces the traditional dual-branch structure with a shared feature extraction mechanism, effectively suppressing redundant parameters while maintaining detection accuracy. Experimental results show that, with YOLOv11s as the baseline model, the proposed MS-ADFF model achieves a detection accuracy of 92.7% on the self-built WIPPID dataset, with Recall and mAP@0.5 improved by 4.6% and 1.5% respectively over the baseline, while reducing floating-point operations by 0.7 GFLOPs. On the public CityPersons dataset, MS-ADFF improves detection precision by 1.9% over the baseline, also with a 0.7 GFLOPs reduction. These results demonstrate that, while requiring fewer floating-point operations than the baseline model, the proposed method effectively enhances pedestrian detection accuracy on unloading platforms of waste incineration power plants, and also exhibits strong generalization ability and robustness in street-scene pedestrian detection tasks.
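    Why a depthwise-plus-grouped head suppresses parameters can be seen from a simple parameter count. The channel widths below (256 in/out, group size 4) are illustrative assumptions, not the paper's exact head configuration:

```python
def conv_params(c_in, c_out, k, groups=1, bias=True):
    """Parameter count of a 2-D convolution with optional grouping."""
    assert c_in % groups == 0 and c_out % groups == 0
    weights = (c_in // groups) * c_out * k * k
    return weights + (c_out if bias else 0)

# A 3x3 standard conv vs a depthwise 3x3 followed by a grouped 1x1,
# both mapping 256 -> 256 channels (illustrative numbers only).
standard = conv_params(256, 256, 3)
depthwise = conv_params(256, 256, 3, groups=256)   # depthwise 3x3
pointwise = conv_params(256, 256, 1, groups=4)     # grouped 1x1
lightweight = depthwise + pointwise
```

    Here the depthwise-plus-grouped replacement uses roughly 3% of the standard convolution's parameters, which is the kind of saving a shared lightweight head exploits.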
  • Wei Wei, Yu Chenchen, Wang Di
    Accepted: 2026-03-17
    Visual Simultaneous Localization and Mapping (VSLAM) is a core technology in the field of mobile robotics. Traditional VSLAM methods primarily rely on manually designed features and geometric constraints, facing numerous challenges in complex environments. In recent years, deep learning-based approaches have provided new solutions to address these challenges. This paper reviews the research progress of deep learning-based VSLAM from a problem-driven perspective. Firstly, the basic system framework of VSLAM is introduced, and the main challenges it faces are analyzed. The review focuses on three key issues: for dynamic interference, it analyzes dynamic detection methods based on semantic segmentation and semantic-geometry fusion; for illumination variations, it systematically reviews robust frontend designs based on image enhancement, exposure control, and learned feature extraction; for lightweight and real-time deployment requirements, it discusses the application of network model compression and hardware acceleration techniques on edge devices. It also briefly discusses representative solutions for challenges such as texture deficiency, fast motion, scale uncertainty, large-scale environments, and long-term operation. This paper starts from the key issues that restrict the performance of VSLAM in practical applications, constructs a problem-driven analysis framework, and reveals the differences in the applicability of different technical routes in complex scenarios. Finally, it summarizes common evaluation metrics and public datasets, and provides a conclusion with outlooks on future research directions.
  • LI Pu-cong, JIANG Rui, WANG Si-zhe, YAN Wen-jun
    Accepted: 2026-03-17
    Click-Through Rate (CTR) prediction is a core task in recommender systems and online advertising, and its performance highly depends on effective feature interaction modeling. Existing methods suffer from several limitations when modeling higher-order feature interactions, including the neglect of domain-level semantic information, the introduction of redundant noise by higher-order interactions, and excessive sharing of input feature representations, which jointly restrict further performance improvement. To address these issues, this paper proposes a CTR prediction model that integrates gated field-aware interactions with soft feature selection. Specifically, a soft feature selection layer is first employed to adaptively reweight embedded features through continuous learnable weights, enabling better adaptation to different interaction networks. Then, a field-aware interaction module is introduced to explicitly model higher-order feature interactions at the field level, so as to preserve domain-level semantic information. Meanwhile, an information gating component is incorporated to dynamically filter key interaction features, effectively suppressing redundant noise. Experimental results on four public datasets, including Criteo, Avazu, MovieLens, and Frappe, show that the proposed model achieves consistent improvements in terms of AUC and LogLoss. For example, compared with the best-performing baseline methods on each dataset, the proposed model improves AUC by 0.12% and 0.13% and reduces LogLoss by 0.11% and 0.14% on Criteo and Avazu, respectively, while maintaining comparable model parameter size and training efficiency. These results demonstrate that the proposed model achieves a favorable balance between prediction accuracy and computational efficiency, indicating strong potential for practical applications.
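    The soft feature selection layer above reweights field embeddings with continuous learnable weights. A minimal sketch of one plausible parameterization (the 2*sigmoid form, which lets a field be both suppressed and amplified, is an assumption; the paper's exact form is not given in the abstract):

```python
import math

def soft_feature_selection(field_embeddings, logits):
    """Reweight each field's embedding by a continuous weight in (0, 2),
    parameterised here as 2*sigmoid(logit) so the layer can suppress
    (<1) or amplify (>1) a field without hard pruning. (Sketch only.)"""
    weights = [2.0 / (1.0 + math.exp(-z)) for z in logits]
    return [[w * x for x in emb] for w, emb in zip(weights, field_embeddings)]
```

    At zero logits every weight is exactly 1 and the layer is an identity, a convenient initialization so training starts from the unmodified embeddings.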
  • ZHANG Yuzhang, TIAN Le, WEI Huali, LIN Yumao, LV Shibin, GUO Maozu
    Accepted: 2026-03-17
    In cloud computing environments, workloads and resource states change continuously over time, which often causes reinforcement-learning-based scheduling policies to suffer from unstable randomness during online execution, leading to increased energy consumption or degraded response time. Conventional Soft Actor–Critic (SAC) mainly relies on temperature tuning during training to control policy randomness, and thus struggles to adapt promptly to non-stationary workloads in real systems. To address this issue, this paper proposes an entropy-supervised Soft Actor–Critic algorithm for online cloud task scheduling, referred to as ESAC. Without altering the original training structure, ESAC introduces a policy entropy supervision mechanism during inference to monitor policy randomness in real time and triggers lightweight entropy feedback fine-tuning when the entropy deviates from a stable range, enabling fast correction with constant computational cost. In addition, sliding-window reward normalization and periodic incremental updates are employed to alleviate numerical instability caused by reward scale drift under dynamic workloads. Experiments based on dynamic workload simulations constructed from the Alibaba Cluster Trace 2018 demonstrate that ESAC consistently outperforms several representative scheduling algorithms under different load intensities and burst scenarios, reducing the average energy consumption per task by about 1.8% and the average response time by up to 3.01%. Compared with the A2C baseline, ESAC achieves improvements of 70.7%, 76.0%, and 76.2% in the composite performance metric under three load scenarios, while maintaining acceptable online scheduling overhead. These results verify the effectiveness of the proposed method in enhancing the stability and adaptability of online scheduling in non-stationary cloud environments.
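    The entropy supervision idea above (monitor policy entropy online, trigger a correction when it leaves a stable band) can be sketched for a discrete action distribution. The band-edge deviation returned here as the trigger signal is an illustrative assumption about how ESAC's feedback might be shaped:

```python
import math

def policy_entropy(probs):
    """Shannon entropy of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_supervisor(probs, low, high):
    """Return 0 while entropy stays inside the stable band [low, high];
    otherwise return the signed deviation from the nearest band edge,
    which would trigger lightweight fine-tuning. (Sketch only.)"""
    h = policy_entropy(probs)
    if h < low:
        return h - low   # too deterministic: push entropy up
    if h > high:
        return h - high  # too random: push entropy down
    return 0.0
```

    A uniform policy over four actions has entropy ln(4) and sits inside a band like [0.5, 2.0], whereas a collapsed one-hot policy has entropy 0 and fires a corrective signal.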
  • HE Yu-lin, HE Jia-hao, MO Pei-heng, KAN Zheng, CUI Lai-zhong, HUANG Zhe-xue
    Accepted: 2026-03-17
    Big data processing frameworks such as Apache Spark have gained significant attention due to their widespread application in large-scale data analysis. However, it is difficult to balance computing costs and runtime performance by relying solely on a single deployment mode (e.g., on-premises or cloud-based), especially when handling data-intensive tasks. Hybrid cloud deployment combines local resources with public cloud resources to offer a flexible and efficient solution that balances cost and performance. Job scheduling in hybrid cloud environments nevertheless faces numerous challenges, including optimizing resource utilization and job execution costs. Existing scheduling algorithms often fail to fully account for the directed acyclic graph (DAG) structure of Spark jobs and the characteristics of multi-stage scheduling, which leads to prolonged job execution times in scenarios with parallel jobs and an inability to reduce costs effectively. To address these issues, this paper proposes an innovative cost-aware particle swarm optimization (CA-PSO) scheduling algorithm for Spark jobs. By incorporating a cost model, the algorithm includes the rental costs of virtual machine (VM) instances in its optimization objectives and dynamically adjusts resource allocation strategies to minimize resource usage while meeting performance requirements, thereby reducing cluster operational costs. Additionally, the algorithm leverages the DAG dependency structure of Spark jobs and introduces a multi-job, multi-stage scheduling mechanism to optimize resource allocation strategies and stage execution order. This approach not only effectively reduces cluster costs but also significantly improves the overall performance of multi-job scheduling in a hybrid cloud environment. Simulation and real-cluster experimental results demonstrate that, compared to existing scheduling algorithms, the CA-PSO Spark job scheduling algorithm exhibits excellent scalability, adapts to different VM pricing models and various Spark job types, and reduces the usage cost of hybrid clouds.
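    A cost model of the kind CA-PSO incorporates can be sketched as a fitness function a particle (a stage-to-VM assignment) would be scored by. The billing rule, the `weight` trade-off, and all names are illustrative assumptions, not the paper's actual model:

```python
def schedule_cost(stage_assignment, vm_prices, stage_runtime):
    """Total VM rental cost of a candidate assignment: each stage is
    billed as its runtime times the hourly price of its VM type."""
    return sum(stage_runtime[s] * vm_prices[vm]
               for s, vm in stage_assignment.items())

def fitness(stage_assignment, vm_prices, stage_runtime, makespan, weight=0.5):
    """Weighted objective a cost-aware PSO could minimise: rental cost
    plus makespan, with `weight` trading one off against the other."""
    cost = schedule_cost(stage_assignment, vm_prices, stage_runtime)
    return weight * cost + (1.0 - weight) * makespan
```

    A swarm would then move particles toward assignments with lower fitness, which simultaneously pushes down rental cost and execution time.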
  • Tian Feng, Li Xiang , Liu Fang, Zhang Yan, Xie Hongtao, Han Yuxiang, Fang Chao
    Accepted: 2026-03-17
    The rapid development of deepfake technology in recent years brings new opportunities in fields such as entertainment and education, but also causes serious cybersecurity and privacy issues. Current deepfake video detection methods face two main challenges. First, encoding artifacts and noise in low-quality and highly compressed videos can hide subtle forgery traces. Second, existing approaches have difficulty modeling temporal inconsistencies between video frames and lack deep fusion of spatiotemporal features. To solve these problems, this paper proposes a detection model called MSST based on multi-scale spatiotemporal feature fusion. The method builds a complete framework with multi-scale spatial feature extraction, frequency-domain feature enhancement, and multi-scale temporal feature extraction. First, a multi-scale Transformer encoder extracts spatial features at different levels. A learnable frequency-domain filter is used to improve the detection of high-frequency forgery traces. At the same time, a multi-scale temporal Transformer models temporal inconsistencies between frames to capture short- and long-range dynamic anomalies. The model also designs a gated cross-attention module to fuse spatiotemporal features. This module enables dynamic cross-modal interaction and produces more discriminative fused features. Tests on the FF++ (LQ), Celeb-DF, and DFDC datasets show that MSST achieves ACC scores of 92.73%, 96.61%, and 95.15%, and AUC scores of 0.965, 0.981, and 0.976. Compared to current mainstream methods, the proposed approach gives better accuracy and generalization.
  • Duan Yaning, Guo Shuai, Chen Tao, Sun Yongqiang, Zhang Weishan
    Accepted: 2026-03-16
    Digital twin systems for Industrial Internet of Things (IIoT) operating under federated learning face dual challenges: catastrophic forgetting caused by continuously evolving data distributions and model knowledge erosion resulting from intermittent device offline behavior. To address these issues, this paper proposes a Knowledge-Persistent Federated Evolutionary Learning (KPFEL) framework that systematically mitigates knowledge forgetting through a coordinated "Storage-Constraint-Inheritance" mechanism. The framework comprises three core modules: (1) A knowledge persistence storage module that maintains independent storage units for each edge device on the server side, employing a momentum-based update strategy to preserve historical knowledge contributions from offline devices; (2) A knowledge-constrained aggregation module that treats historical gradient update directions as optimization constraints and efficiently computes global update trajectories compatible with historical knowledge via quadratic programming; (3) A generator knowledge inheritance module that synthesizes high-quality historical-class samples for data-free knowledge replay by integrating parameter inheritance, knowledge alignment, and adversarial training. Theoretical analysis establishes the convergence rate of the framework. Experiments on CIFAR-100, Tiny-ImageNet, and Stanford Cars datasets demonstrate that the proposed method yields an average improvement of 3.07 percentage points in classification accuracy and a reduction of 3.79 percentage points in forgetting rate over state-of-the-art baselines. Under extreme settings with only 20% client participation, the accuracy drop is limited to 5.21% compared to 15.84% for the baseline, exhibiting strong robustness against intermittent device offline behavior and providing an effective solution for privacy-constrained IIoT digital twin applications with continuously expanding categories.
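    The momentum-based update in module (1) can be sketched in one line per coordinate: blend the stored historical vector with a fresh contribution. The momentum value 0.9 and the function name are illustrative assumptions:

```python
def momentum_update(stored, new, m=0.9):
    """Momentum-style update of a device's server-side knowledge storage
    unit: keep a fraction m of the historical vector and blend in the
    fresh contribution. When a device goes offline, `stored` simply
    stops being updated but remains available for aggregation."""
    return [m * s + (1.0 - m) * n for s, n in zip(stored, new)]
```

    The high momentum makes the stored knowledge decay slowly, which is what lets contributions from offline devices persist across rounds.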
  • JIA Xiao, LUO Hao, ZHANG Xinyue, YU Jiaheng, ZHU Kai, LI Jing
    Accepted: 2026-03-12
    Sequential recommendation effectively captures the dynamic evolution of user interests. However, systems relying on single-domain data often face challenges such as data sparsity and recommendation homogeneity. Cross-domain sequential recommendation was proposed to address these issues by integrating user behavior sequences from multiple domains, which alleviates data sparsity and enables more comprehensive modeling of user interest dynamics. However, existing methods often employ a uniform global strategy when fusing cross-domain interaction information, neglecting the diversity and complexity of user interests. Moreover, simple graph structures are insufficient to capture complex high-order interaction features between users and items, resulting in incomplete representation of cross-domain interaction information. To address these issues, this paper proposes an interest-enhanced cross-domain sequential recommendation model based on graph and hypergraph fusion. To tackle insufficient mining of deep-seated user preferences, a capsule network structure is introduced in the private domain; its dynamic routing mechanism adaptively aggregates contextual information from item embeddings in sequences, extracting multiple potential user interest points to supplement single-domain user profiling. In the shared domain, a hybrid architecture combining Graph Neural Networks and Hypergraph Neural Networks is proposed to overcome the limitations of traditional graph structures in capturing complex group associations and higher-order interactions. This design enables comprehensive capture of user preference features across different dimensions through multi-level feature interactions, enhancing the representational capacity for cross-domain behavioral dependencies. Subsequently, the user's unique preferences and general preferences are deeply integrated through a sequence relation learning module and a contrastive learning module, generating a complete user preference embedding. Experimental validation on the Hvideo and Amazon datasets shows that, compared to the strongest baseline models, the proposed HGIE-CDSR model achieves average improvements of 4.95% and 8.39% in MRR, and 3.58% and 14.37% in NDCG, respectively. Ablation study results further verify the effectiveness of each module within the model.
  • Luo Hao, Yiran Xin, Yunqi Tang
    Accepted: 2026-03-11
    In recent years, generative image technology based on diffusion models has achieved breakthrough progress, with text-to-image models represented by Stable Diffusion, DALL-E, and Midjourney being widely applied in commercial and creative fields. However, highly realistic AI-generated images have also brought challenges to information authenticity, giving rise to social issues such as misinformation dissemination and copyright infringement. To effectively address these challenges, this paper systematically reviews the latest research progress in detection technologies for images generated by diffusion models. First, it outlines the development trajectory of diffusion models from principles and basic frameworks to large-scale applications. Second, it summarizes the evolution of dataset construction, pointing out that dataset development is progressing from using few generators and low resolutions toward multi-model integration and high-quality multi-level filtering. Third, it analyzes three mainstream approaches in detection technology: detection technologies based on implicit features, detection technologies based on explicit features, and detection technologies based on hybrid features. Finally, it analyzes the main challenges facing current detection technologies and provides an outlook on future research directions. This review offers researchers and practitioners a comprehensive technical landscape and reference for development trends.
  • HUO Jiuyuan, KAN Jiayun, YANG Jiguang, ZHENG Shannong, CAO Fang
    Accepted: 2026-03-11
    To address the problem of uneven cluster head load caused by traditional clustering methods in wireless sensor networks (WSNs), this paper proposes a clustering algorithm for WSNs constrained by Vertex-Sum Reducible Edge Coloring (VSRECUC). From a graph-theoretic perspective, the node-to-cluster association and cluster head load are modeled by abstracting the network clustering structure as a multi-star graph. The theory of vertex-sum reducible edge coloring is introduced, where the node association cost is mapped to edge coloring values, and the chromatic sum of each cluster head is used to characterize its communication load, thereby theoretically constraining the load balance among different cluster heads. In the cluster head election stage, residual energy and local node density are jointly considered to construct a candidate cluster head selection function, which is combined with a competition radius mechanism to effectively alleviate the "hotspot problem" caused by cluster head overload near the sink node. In the clustering stage, a node reassignment strategy constrained by vertex-sum reducible edge coloring is proposed. The CRITIC method is employed to determine the weights of competition radius and residual energy, dynamically calculate the cluster head load threshold, and guide nodes to be reasonably reassigned among different cluster heads, ensuring that the load of each cluster head matches its resource capability. Simulation results demonstrate that, in terms of network lifetime, the proposed VSRECUC algorithm extends the lifetime by 369.1%, 59.9%, 116.1%, 57.2%, and 55.7% compared with MH-LEACH, ESPC, EEUC, FSCVG, and BEBMCR, respectively. Moreover, it exhibits significant advantages in performance metrics such as cluster head number control and energy consumption balance. The results indicate that introducing vertex-sum reducible edge coloring theory into WSN clustering design provides a novel modeling perspective and an effective approach for achieving load balancing and network lifetime optimization.
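    The chromatic-sum load measure and threshold-driven reassignment described above can be sketched as follows. The greedy move-cheapest-member rule is an illustrative assumption standing in for the paper's CRITIC-weighted strategy:

```python
def chromatic_sums(assignment, edge_color):
    """Per-cluster-head chromatic sum: the sum of the edge coloring
    values (association costs) of the edges from members to their head."""
    sums = {}
    for node, head in assignment.items():
        sums[head] = sums.get(head, 0) + edge_color[node]
    return sums

def reassign_overloaded(assignment, edge_color, threshold):
    """Greedy sketch of the reassignment constraint: move the cheapest
    member of any head whose chromatic sum exceeds the threshold to the
    currently least-loaded head. (Not the paper's exact strategy.)"""
    sums = chromatic_sums(assignment, edge_color)
    for head, load in list(sums.items()):
        if load > threshold:
            members = [n for n, h in assignment.items() if h == head]
            victim = min(members, key=lambda n: edge_color[n])
            target = min(sums, key=sums.get)
            if target != head:
                assignment[victim] = target
    return assignment
```

    Bounding each head's chromatic sum by a threshold is what keeps every head's communication load matched to its resource capability.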
  • Dawei Zhang, Kangbo Kou, Yi Liu, Wei Guo, Yang Yu
    Accepted: 2026-03-11
    High-precision semantic segmentation enables autonomous vehicles to obtain detailed environmental perception. To address the limitations of traditional methods on fisheye images, such as poor edge segmentation, low accuracy, and insufficient training data, we propose RSCAMamba, a model specifically designed for fisheye image segmentation. A zoom augmentation method is employed to transform standard datasets into fisheye datasets, allowing effective modeling of fisheye distortions and ensuring robustness across diverse scenarios. RSCAMamba first adopts a Swin Transformer encoder to capture global feature representations. Second, we propose the restricted spatial-channel attention module. By integrating one-dimensional and two-dimensional restricted deformable convolutions, the module adaptively models distortion-aware nonlinear features and effectively captures anisotropic deformations. Consequently, it provides more accurate representations of strip-like structures and irregular edges. In addition, a channel reduced and edge increased module further enhances edge details, alleviating distortion-induced degradation. Finally, the Mamba module fuses global features, captures long-range dependencies, and reduces redundancy across scales. This helps the model detect complete objects and preserve spatial continuity. Experimental results indicate that, compared with Mask2Former, RSCAMamba achieves a 1.88% improvement in mIoU on the WoodScape public dataset and a 3.30% improvement on the CityScapesFisheye synthetic dataset, demonstrating superior segmentation performance.
  • ZHANG Xin, YI Huawei, ZHAO Mengyuan, WANG Yanfei, LAN Jie
    Accepted: 2026-03-11
    Blind image super-resolution reconstruction aims to restore clear high-resolution images from blurred and degraded images in real-world scenarios. Although deep learning-based reconstruction methods have achieved some progress, the degradation models they rely on still have certain limitations. First, the blurring and noise-adding operations in the degradation process lack adaptability; second, the simulation of the degradation process is insufficient. To address these issues, this paper proposes a hybrid-order adaptive multi-dimensional degradation model. The model employs a hybrid-order degradation approach overall, consisting of two stages. The first stage is the adaptive degradation stage, which utilizes dynamic convolution to perform adaptive blurring and noise addition on high-resolution images. The second stage is the multi-dimensional degradation stage, which further processes the images generated in the first stage through distortion, brightness adjustment, rotation, and down-sampling. The proposed degradation model is integrated with classical super-resolution reconstruction networks to develop a blind image super-resolution reconstruction algorithm based on the hybrid-order adaptive multi-dimensional degradation model. To verify the effectiveness of the proposed method, comparative experiments were conducted on the Set14, BSD100, and DRealSR datasets. The results show that, compared to the PDM-SRGAN baseline method, the proposed method achieves improvements in peak signal-to-noise ratio (PSNR) of 0.84 dB, 0.63 dB, and 1.06 dB on the three datasets, respectively, in 4× super-resolution reconstruction tasks. This demonstrates that the proposed degradation model can effectively enhance the reconstruction performance and real-world adaptability of super-resolution algorithms, enabling the generation of higher-quality images.
  • Zhang Anqing, Zhuang Zhiqi, Li Zijian, Zhang Ting
    Accepted: 2026-03-11
    In recent years, cyberattacks have become increasingly frequent and sophisticated, causing economic losses and security risks for both nations and enterprises. Traditional attack detection methods analyze attack behaviors by constructing provenance graphs, but this approach loses some semantic information when describing attack behaviors as simple graphs, leading to poor detection performance. This study proposes a network intrusion detection model based on temporal information graph autoencoders, abbreviated as TIGAE. TIGAE generates multiple provenance graphs through a refined graph construction method, comprehensively recording the interaction behaviors of system entities. Subsequently, an improved line graph algorithm is devised to transform complex graphs into simpler ones, enhancing the graph structure while preserving the original system behavior information. A graph autoencoder is then employed to learn benign system behavior. Experimental results on three datasets show that, compared with existing methods, Precision increases by an average of 0.65%, F1-Score by an average of 0.68%, and Recall by an average of 1.07%, while the FPR decreases by an average of 0.40%. The experiments demonstrate that TIGAE outperforms existing state-of-the-art methods across multiple attack detection metrics.
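    The graph transformation above (assuming the "linear graph algorithm" refers to a line-graph-style construction, which the abstract does not spell out) turns each edge into a node so that edge semantics survive as node attributes; a minimal sketch:

```python
def line_graph(edges):
    """Line-graph construction: each original edge becomes a node, and
    two edge-nodes are adjacent when the original edges share an
    endpoint. Edge attributes (e.g. interaction semantics) can then
    live on nodes instead of being flattened away. (Sketch only.)"""
    nodes = list(edges)
    adj = {e: set() for e in nodes}
    for i, (u1, v1) in enumerate(nodes):
        for u2, v2 in nodes[i + 1:]:
            if {u1, v1} & {u2, v2}:
                adj[(u1, v1)].add((u2, v2))
                adj[(u2, v2)].add((u1, v1))
    return adj
```

    An autoencoder trained on such edge-centric graphs of benign activity can then score unseen edges by reconstruction error.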
  • Li Zongmin, Wang Xingyu, Ma Jinyue, Bai Yun
    Accepted: 2026-03-11
    Addressing the limitations of existing lightweight Vision Transformers (ViTs), specifically the lack of explicit structural and spectral priors during token construction which leads to the loss of local high-frequency details and constrained representation efficiency, this paper proposes a novel framework named OFT-Former (Orientation- and Frequency-Aware Token Interaction Transformer). First, an Orientation-Aware Patch Embedding (OAPE) module is designed to explicitly inject horizontal and vertical spatial structural priors during initialization, thereby mitigating the insufficient geometric perception inherent in traditional embedding methods. Second, a Frequency-Enhanced Token Refinement (FETR) module is proposed, which leverages Fast Fourier Transform (FFT) to decouple frequency-domain features and integrates multi-scale convolutions to specifically enhance the preservation of high-frequency details. Furthermore, a Bidirectional Gated Token Modulation (BGTM) mechanism is constructed to establish bidirectional interaction pathways between local and global features, facilitating adaptive fusion of cross-scale representations via dynamic gating. Experimental results demonstrate that OFT-Former achieves a Top-1 accuracy of 81.4% on ImageNet-1K with only 12.8M parameters and 1.8 GFLOPs. Additionally, the model exhibits superior performance on CIFAR-100 classification and COCO object detection tasks, verifying the effectiveness of the proposed method.
  • Jia Xinyuan, Qin Jiwei, Ma Jie
    Accepted: 2026-03-04
    The dynamic graph anomaly detection method based on graph convolution utilizes graph modeling strategies to capture information about anomalous nodes or edges, and has wide applications in fields such as network security, social networks, and recommendation systems. However, these methods face two main challenges: first, it is difficult to fully learn discriminative knowledge from dynamic graphs where the graph structure and temporal information are coupled, and second, they are ineffective in detecting anomalies in nodes with no attributes. To address these challenges, a novel dynamic graph anomaly detection framework is proposed: the Bidirectional Encoder Representations from Transformers for Graph & Temporal Anomaly Detection (GTBAD). This method first designs an edge-based subgraph sampling module, which centers on target edges and constructs local substructures across multiple time slices, thereby enhancing the contextual awareness of anomaly detection. It then designs an encoding module that comprehensively considers both graph structure and temporal aspects, aiming to better extract the structural and temporal features of each node in dynamic graphs. Additionally, BERT is employed in the downstream encoder to further extract information from dynamic graphs, enabling the model to effectively handle dynamic graphs whose nodes have no attributes. Finally, a discriminative anomaly detector is introduced to compute the anomaly scores of edges. Extensive experiments were conducted on four real-world datasets, with the area under the receiver operating characteristic curve (AUC) as the evaluation metric. The experimental results demonstrate that the proposed GTBAD framework outperforms existing frameworks in dynamic graph anomaly detection tasks, achieving higher AUC values and thereby providing a novel solution for dynamic graph anomaly detection.
  • Ding Li, Yang Jun
    Accepted: 2026-03-04
    In order to address the core challenges faced by task offloading decisions in UAV-assisted mobile edge computing systems, such as multi-dimensional temporal coupling, dynamic environment adaptation, and insufficient policy robustness, this paper proposes a twin delayed deep deterministic policy gradient algorithm that integrates a hierarchical temporal attention mechanism (HTAN-TD3). The contributions of this study are reflected in three aspects. First, a composite optimization objective integrating total system latency, worst-case user experience, and multi-user fairness is constructed, breaking through the limitations of traditional single-objective modeling. Second, a hierarchical temporal attention network (HTAN) with macro-micro dual-stream temporal analysis capability is designed; through heterogeneous collaboration of LSTM and GRU units and attention-weighted fusion, it achieves accurate perception and deep mining of dynamic features at multiple time scales in the system state. Third, temporally correlated Ornstein-Uhlenbeck exploration noise and a dynamically adaptive Huber loss function are introduced, systematically enhancing the algorithm along two dimensions: smoothness of policy exploration and robustness of the training process. In complex edge scenarios simulating high load, strong occlusion, and multi-user competition, HTAN-TD3 significantly outperforms mainstream baseline algorithms such as DDPG, TD3, and MATOPO in key indicators such as total system latency and user fairness, demonstrating excellent environmental adaptability and decision-making intelligence. This study provides a useful reference for improving the autonomous decision-making capability of intelligent edge computing systems in dynamic and complex environments.
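    The temporally correlated Ornstein-Uhlenbeck exploration noise mentioned above is a standard construction; a minimal self-contained sketch (the default theta/sigma values are common conventions, not the paper's tuned settings):

```python
import random

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise: each sample reverts toward
    `mu` at rate `theta` while a Gaussian term of scale `sigma` perturbs
    it, producing smooth, temporally correlated action noise."""
    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = mu
        self.rng = random.Random(seed)

    def sample(self):
        dx = self.theta * (self.mu - self.x) * self.dt
        dx += self.sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0)
        self.x += dx
        return self.x
```

    With sigma set to zero the process reduces to pure mean reversion, which makes its smoothing behavior easy to verify by hand.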
  • Jiang Xiao, Qin Tuanfa, Sun Hongmin, Zhou Huayang, Gu Weiyu, Wang Suhong
    Accepted: 2026-03-04
    In remote and disaster-stricken areas, ground Internet of Things (IoT) devices are constrained by limited computing capabilities and insufficient communication infrastructure, making it difficult to support a large number of emergency tasks with stringent latency requirements within a short time. Existing studies mainly adopt single unmanned aerial vehicle (UAV) or low Earth orbit (LEO) satellite architectures, or treat UAVs merely as communication relay nodes, and their optimization objectives primarily focus on minimizing system latency or a weighted sum of latency and energy consumption, failing to fully exploit the cooperative computing potential of multiple UAVs and multiple LEO satellites as well as to satisfy the heterogeneous quality-of-service (QoS) requirements arising from different task priorities and latency constraints. Therefore, this paper proposes a multi-agent deep reinforcement learning–based task offloading and adaptive resource allocation strategy, termed TOARA. First, a space–air–ground integrated network (SAGIN) architecture with cooperative multiple UAVs and multiple LEO satellites is constructed and integrated with edge computing technologies to effectively alleviate ground resource limitations. In this architecture, UAVs collect ground tasks and make intelligent offloading decisions, dynamically assigning tasks to local edge nodes or LEO satellite nodes for execution. Then, the joint task offloading and resource allocation problem is formulated as a decentralized partially observable Markov decision process and solved using a multi-agent deep deterministic policy gradient (MADDPG) algorithm under a centralized training and decentralized execution framework, enabling agents to autonomously learn efficient offloading decisions and adaptive resource allocation strategies to jointly optimize task processing latency, system energy consumption, and the completion rates of tasks with different priority levels. Finally, simulation results demonstrate that, compared with several baseline strategies, the proposed algorithm reduces the average task processing latency and system energy consumption by at least 26.09% and 27.53%, respectively, while improving the completion rate of high-priority tasks by at least 22.24%, validating its effectiveness in learning efficient task offloading and resource allocation decisions in dynamic and complex environments.
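One hedged way to read the joint objective is as a scalarized cost over latency, energy, and priority-weighted completion rates; the function below is purely illustrative (the weights, the priority scheme, and the function name are not from the paper).

```python
def toara_objective(latency, energy, completion, w_lat=0.4, w_en=0.3, w_comp=0.3,
                    priority_weights=(3.0, 2.0, 1.0)):
    """Scalarized cost: latency and energy are penalized, priority-weighted
    completion is rewarded.  All weights are illustrative placeholders.

    `completion` maps priority level (0 = highest) to its completion rate.
    """
    pw = priority_weights
    # higher-priority tasks contribute more to the completion term
    comp = sum(pw[p] * rate for p, rate in completion.items()) / sum(pw)
    return w_lat * latency + w_en * energy - w_comp * comp
```

An RL agent minimizing such a cost is pushed toward exactly the trade-off the abstract describes: lower latency and energy, with high-priority completion weighted most heavily.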
  • Wang Hongyu, Cui Mingzhu, Cheng Li, Luo Weili, Dang Zheng, Shi Hanqi, Ye Hongyuan, Zhao Jintao
    Accepted: 2026-03-03
    Existing methods for detecting small targets in UAV applications suffer from limitations in feature representation and fusion capabilities, struggling to effectively handle complex backgrounds and small-scale objects due to challenges such as low pixel density, significant size variations, and susceptibility to background interference. To address these issues, VD-YOLOv11, an improved algorithm tailored for drone-captured scenes, is proposed. First, a Multi-Scale Feature Enhancement (MSFE) module augments the model’s perception of tiny objects by incorporating multi-scale contextual information and an edge detail reinforcement mechanism. Second, a Multi-Scale Feature Fusion (MSFF) module enhances small object representation through hierarchical integration of semantic and spatial features, improving detection accuracy in complex backgrounds and multi-scale scenarios. Additionally, a Receptive-Field Attention Head (RFAHead) enables dynamic interaction across multi-level features and adaptive allocation of receptive field weights, employing an attention-guided mechanism to refine focus on fine-grained small object regions. Finally, a dedicated small object detection layer is integrated with an optimized neck network, supplemented by an additional detection head to mitigate feature loss and strengthen recognition capability. Experimental results demonstrate that VD-YOLOv11 achieves 42.1% mAP50 on the VisDrone2019 dataset, surpassing the baseline YOLOv11n by 7.4%. On the PDT dataset, it achieves a mAP50 of 94.8% with a computational cost of 19.1 GFLOPs and 3.3M parameters. VD-YOLOv11 achieves an effective balance in detection accuracy, computational complexity, and model size, validating its effectiveness and practicality for UAV-based small object detection.
  • HOU Linchao, XU Yanyan, PAN Shaoming
    Accepted: 2026-03-03
    As a core component of intelligent transportation systems, the efficiency and reliability of routing algorithms in Vehicular Ad Hoc Networks (VANETs) directly impact critical applications such as traffic safety warning, autonomous driving, and intelligent traffic management. In complex traffic, interactions among vehicle nodes make VANET topology changes intricate and link stability fragile, posing challenges for routing algorithms. Constructing adaptable routing algorithms is therefore crucial for VANET communication. To this end, we propose a novel Neighborhood Potential-based Link-aware Routing (NPLAR) algorithm. NPLAR innovatively constructs a neighborhood potential energy model that comprehensively quantifies the impact of the static and dynamic features of the neighborhood environment on link stability, offering a more accurate basis for routing decisions. By integrating complex network theory and graph neural networks, it effectively captures the multi-hop propagation mechanism of neighborhood potential energy, better predicting network changes and enabling efficient routing path selection. Moreover, NPLAR integrates link stability indices with network link QoS metrics to build a multi-dimensional routing decision framework. This framework achieves adaptive decision optimization in highly dynamic environments, significantly enhancing the routing algorithm's overall performance. Experimental results show that, compared with existing VANET routing algorithms, NPLAR increases average throughput by 8.3%-35.7%, reduces the packet loss rate by 6%-50.4%, and reduces communication delay by 11.3%-39%. These results clearly demonstrate NPLAR's superiority in enhancing network performance.
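A toy version of multi-hop potential propagation, assuming each hop blends a node's potential with the mean potential of its neighbors; the blending rule and coefficient are one plausible reading of the abstract, not NPLAR's exact model.

```python
def propagate_potential(adj, phi, hops=2, alpha=0.5):
    """Multi-hop diffusion of node potentials.

    `adj` maps each node to its neighbor list, `phi` maps each node to an
    initial potential.  Each hop, a node's potential becomes a blend of its
    own value and the mean of its neighbors' values (weight `alpha`).
    """
    current = dict(phi)
    for _ in range(hops):
        nxt = {}
        for node, neighbors in adj.items():
            nb_mean = (sum(current[n] for n in neighbors) / len(neighbors)
                       if neighbors else 0.0)
            nxt[node] = (1 - alpha) * current[node] + alpha * nb_mean
        current = nxt
    return current
```

On a 3-node path with potential concentrated at one end, a single hop spreads half of it toward the middle node, which is the qualitative behavior a GNN-style aggregator would learn.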
  • Gu Yudi, Di Yicheng, Di Lan
    Accepted: 2026-03-03
    Existing click-through rate (CTR) prediction methods typically rely on centralized data storage and modeling. However, due to high user privacy sensitivity and data protection regulations, user behavior data across different platforms cannot be directly shared or aggregated. At the same time, most CTR prediction models adopt deep architectures with large parameter scales, leading to high communication and computation costs that limit their practical application. To address these problems, this paper proposes an efficient Federated Recommendation System (FedRSS) based on a Slim Module and a Saliency-Aware Module. FedRSS aggregates cross-platform feature representations within a federated learning framework while preserving privacy. The Slim Module replaces the traditional Hadamard product with an inner product to reduce model complexity and stacks compression layers to decrease parameters, while the Saliency-Aware Module employs a bit-level attention mechanism to dynamically assign feature weights and enhance the modeling of important features. In addition, FedRSS introduces a local differential privacy mechanism to further protect user information. Experiments on three public datasets, Criteo, Avazu, and MovieLens, show that FedRSS achieves notable improvements in both performance and efficiency, with RelaImpr increases of 11.04%, 3.38%, and 4.82%, respectively, and significantly reduced training time. The results demonstrate that FedRSS achieves efficient CTR prediction under privacy constraints and provides a promising direction for developing low-overhead federated recommendation systems.
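The parameter-saving intuition of the Slim Module and the bit-level gating of the Saliency-Aware Module can be illustrated in a few lines; the function names and the sigmoid gating choice are assumptions for illustration, not the paper's exact operators.

```python
import math

def hadamard(a, b):
    """Element-wise interaction: output keeps the full feature dimension."""
    return [x * y for x, y in zip(a, b)]

def inner(a, b):
    """Inner-product interaction: collapses a feature pair to one scalar,
    shrinking the interaction output (and downstream parameters)."""
    return sum(x * y for x, y in zip(a, b))

def bit_attention(x, scores):
    """Bit-level (per-dimension) gating: one sigmoid weight per feature bit."""
    gates = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    return [g * xi for g, xi in zip(gates, x)]
```

The contrast is dimensional: the Hadamard product of two d-dimensional embeddings feeds d values to the next layer, while the inner product feeds one, which is where the complexity reduction comes from.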
  • LI Bin, FAN Jiawei
    Accepted: 2026-03-03
    To address the insufficient cross-domain generalization ability of existing ship target detection models and their poor detection stability under extreme noise and complex sea surface conditions in Synthetic Aperture Radar (SAR) imagery, this paper proposes an improved ship target detection algorithm, CK-YOLO, based on YOLOv12. The proposed method aims to enhance the model’s robustness and adaptability on SAR data. First, to improve the extraction of ship boundary features and strengthen contextual modeling capability, an SKC3k2 module is designed. This module enhances boundary feature representation by incorporating a Kolmogorov–Arnold Network (KAN) layer with residual connections into the original C3k2 structure, and introduces a switchable atrous convolution (SAConv) mechanism to adaptively adjust the receptive field for better multi-scale feature extraction. Furthermore, to improve the model’s dynamic modeling capacity and its ability to extract high-level semantic information, a CST module is developed. The CST module consists of a local convolution branch for spatial modeling and a sparse dynamic branch based on a Liquid Neural Network (LNN), which leverages temporal modeling advantages to enhance high-order semantic feature extraction. To validate the effectiveness of the proposed method, experiments were conducted on SAR datasets provided by the China Centre for Resource Satellite Data and Application and on the LS-SSDD dataset. The results demonstrate that CK-YOLO achieves improvements of 0.8% and 1.3% in mAP@50 over YOLOv12n on the two datasets, respectively, exhibiting the best performance among all compared models. In addition, cross-domain generalization experiments using the LS-SSDD and MMShip datasets demonstrate that CK-YOLO achieves the best overall performance among the YOLO series models, showing superior robustness and generalization ability in both intra-domain SAR detection and cross-modal detection tasks. Finally, ablation studies further confirm the effectiveness and contribution of the proposed modules. CK-YOLO maintains a lightweight architecture while effectively reducing missed detections and false alarms in SAR images with strong noise and complex sea surface conditions.
  • Zeng Wenyan, Zhang Lei, Liu Bailong, Meng Xiang, Zhang Xuefei
    Accepted: 2026-02-12
    Accurate traffic speed prediction is critical for enhancing the efficiency of Intelligent Transportation Systems (ITS). However, contemporary end-to-end prediction models are often constrained by training data from specific regions or time periods, leading to limited generalization capabilities. Furthermore, most existing methods employ static network structures and parameter-sharing mechanisms, which struggle to capture dynamic traffic characteristics and the inherent diversity across different nodes. To address these two challenges, this paper proposes Adaptive Spatial-Temporal Masking Pre-training for Traffic Speed Prediction (ASTMP), which is divided into a pre-training stage and a prediction stage. In the pre-training stage, a dynamic adaptive graph convolutional layer is designed to provide unique weight and bias parameters for each node. By constructing an adaptive graph based on a node embedding matrix containing individual node attributes, the unique properties of nodes and the dynamic patterns governing their inter-relationships can be deeply explored. Subsequently, a spatial-temporal masking encoding layer is developed to perform random masking on long-term traffic speed sequences. A corresponding decoding layer then utilizes mask tokens to replace data at masked positions, reconstructing the original information based on contextual cues to enhance the model's adaptability and generalization performance. In the prediction stage, the dynamic spatial-temporal representations learned from long-term sequences are integrated with a short-term traffic speed predictor to achieve more precise and efficient forecasting. Experimental results on the METR-LA and PEMS-BAY datasets demonstrate that ASTMP outperforms state-of-the-art baseline methods, validating the feasibility and effectiveness of the proposed approach.
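The random-masking step of the pre-training stage can be sketched as follows; the mask ratio, mask token, and return format are illustrative choices rather than ASTMP's exact settings. The reconstruction objective would then compare the model's predictions at the returned indices against the original values.

```python
import random

def mask_sequence(seq, mask_ratio=0.5, mask_token=None, seed=0):
    """Randomly replace a fraction of positions with a mask token.

    Returns the masked sequence and the sorted masked indices; the
    pre-training decoder reconstructs the original values at those
    positions from the surrounding context.
    """
    rng = random.Random(seed)
    n_mask = max(1, int(len(seq) * mask_ratio))
    masked_idx = sorted(rng.sample(range(len(seq)), n_mask))
    idx_set = set(masked_idx)
    masked = [mask_token if i in idx_set else v for i, v in enumerate(seq)]
    return masked, masked_idx
```

Unmasked positions pass through unchanged, so the loss can be restricted to the masked indices, which is the standard masked-reconstruction setup the abstract describes.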
  • WANG Shixin, LI Jun, ZHAO Ning, NIE Jun, LIU Shengqiang
    Accepted: 2026-02-12
    In order to meet the needs of efficient and accurate multi-object tracking (MOT) in campus scenarios, a solution based on the improved YOLOv8 object detection algorithm and the OC-SORT multi-object tracking algorithm is proposed. In view of the complex backgrounds and crowd distributions of campus scenarios, a dataset with specific scene features is constructed to optimize the performance of the algorithm. To improve the accuracy of small-object pedestrian detection, an efficient multi-scale attention (EMA) module is introduced, and the self-calibrated convolutions (SCConv) module is used to replace the cross-stage partial fusion (C2f) module in YOLOv8, which effectively improves detection performance. In multi-object tracking, an innovative solution is proposed to address the problems of low association accuracy and high computational overhead. First, an ID initialization (IIR) strategy based on person re-identification (ReID) is proposed, which effectively solves the problem of ID inconsistency when pedestrians reappear after leaving for a short time. Second, a data association strategy combining shape similarity between frames (SSF) and object box intersection over union (IoU) is designed to further improve the accuracy of object matching between consecutive frames. Finally, to improve the efficiency of appearance similarity calculation, a stage-wise data association (SDA) strategy is proposed, which reduces computational overhead while ensuring high accuracy. Experimental results show that the proposed method effectively improves the accuracy of pedestrian detection and tracking in campus scenarios and exhibits good robustness and a high frame rate in complex backgrounds, providing efficient and reliable technical support for smart campus security and crowd behavior analysis.
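The IoU-plus-shape-similarity association cost can be illustrated on axis-aligned boxes; the 50/50 weighting and the width/height-ratio shape measure are assumptions, not the paper's exact SSF definition.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def shape_similarity(a, b):
    """Compare box widths and heights; 1.0 for identically shaped boxes."""
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]
    return (min(wa, wb) / max(wa, wb)) * (min(ha, hb) / max(ha, hb))

def association_cost(a, b, w=0.5):
    """Blend of IoU and shape similarity; the 50/50 weight is an assumption."""
    return 1.0 - (w * iou(a, b) + (1 - w) * shape_similarity(a, b))
```

Shape similarity is position-independent, so it can still discriminate matches when boxes barely overlap between frames, which is the gap the SSF term is meant to cover.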
  • WANG Kai, YUAN Shaojiang, CHEN Chenglizhao, WANG Shuo, ZHANG Yingchao, ZHANG Huaye, SUI Ruoyu, Wang Xi
    Accepted: 2026-02-12
    Water sample bag impurities refer to tiny foreign objects accidentally entering the bags during industrial production, such as iron filings, hair, and soil particles. Due to their small size, complex backgrounds, and significant interference from text labels, traditional detection methods struggle to meet the strict quality control requirements of industrial production. To address this issue, a water sample bag impurity detection method for complex industrial scenarios is proposed, which innovates at both the data and model levels. At the data level, an automated collection and detection device based on dual-view cross-validation is designed; this device uses dual industrial cameras and an electromagnetic control system to realize automatic double-sided detection and intelligent sorting of water sample bags, and based on this device, a dedicated dataset of 3,000 images, WBID-3K, is constructed, covering all types of impurities that may appear in real industrial scenarios. At the model level, based on this dataset, a model named WBID-DETR for cross-domain feature enhancement and hierarchical information fusion is proposed. This model strengthens the high-frequency feature expression of tiny targets through a fine-grained frequency-domain feature optimizer, suppresses text label interference via a multi-scale global feature fusion module, and complements missing information using a complementary feature fusion module, thereby achieving accurate localization and identification of various tiny impurities. Experimental results show that on the self-built WBID-3K dataset, WBID-DETR achieves 4.2% and 3.5% improvements in accuracy and mAP50, respectively, compared to the baseline model; on the public VisDrone2019 dataset containing complex backgrounds and dense small targets, WBID-DETR achieves 2.5% and 3.4% improvements in accuracy and mAP50, respectively, compared to the baseline model. This fully demonstrates the generalization and robustness of the proposed method for small target detection tasks, providing an effective solution for automated industrial quality inspection.
  • WANG Jun, ZHANG Shengjun, ZUO Zengqiang
    Accepted: 2026-02-12
    Millimeter-wave radar offers distinct advantages for human activity recognition, including robustness in complex environments and inherent privacy preservation. However, existing recognition methods confront several challenges: low accuracy, insufficient data representation, difficulty in modeling temporal dependencies, and high computational costs. To address these issues, this paper proposes a lightweight human activity recognition method based on a novel Time-series Capture and Enhancement Module (TCM-TMEM). The proposed architecture comprises two primary components. First, the Temporal Capture Module (TCM) employs causal convolution to enhance sensitivity to local temporal patterns, while its simplified network design minimizes computational overhead. Second, the Temporal Enhancement Module (TMEM) is constructed using a parameter-efficient Transformer encoder. This module strengthens the network's ability to model long-range, global temporal correlations while preserving the model's lightweight characteristics. Furthermore, to mitigate the representational limitations of traditional range-Doppler maps, an enhanced 11-dimensional feature set is introduced. This set incorporates critical dimensions such as range, Doppler shift, and signal energy, thereby significantly improving the completeness of data representation. Experimental evaluations were conducted on the self-collected PACT dataset and the public R-IHB dataset. The proposed method achieved recognition accuracies of 89.86% and 86.63%, respectively. Importantly, the entire TCM-TMEM model contains only 0.12 million parameters. These results substantiate the effectiveness of the proposed feature construction scheme and model architecture in improving recognition accuracy, effectively capturing temporal dependencies, and substantially reducing computational resource consumption.
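Causal convolution, the core operation of the TCM, can be shown in a few lines: left-padding the input makes the output at step t depend only on inputs up to t, never on future samples.

```python
def causal_conv1d(x, kernel):
    """1-D causal convolution: output at step t depends only on x[:t+1].

    Implemented by left-padding the sequence with zeros so that the
    kernel's receptive field never extends past the current sample.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [sum(kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(x))]
```

With kernel `[1, 1]` each output is the sum of the current and previous samples; a kernel of `[0, 1]` reproduces the input, confirming no future leakage.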
  • WANG Hebin, YANG Wenjun, MO Xiuliang
    Accepted: 2026-02-12
    With the wide application of the Internet of Things (IoT), large numbers of devices connect to the network, and their security vulnerabilities are easily exploited by attackers, seriously threatening network and data security. Deploying intrusion detection systems in IoT environments to detect abnormal traffic and intrusion behavior is therefore particularly important. However, IoT devices usually have limited computing power and insufficient storage resources, making existing deep learning-based intrusion detection models difficult to deploy directly. To solve these problems, this paper proposes a customized lightweight intrusion detection model named FDRBT, which aims to achieve accurate detection of IoT attacks under resource-constrained conditions. The Pearson Correlation Coefficient (PCC) and Principal Component Analysis (PCA) are combined for feature dimensionality reduction, and a progressive module replacement method gradually replaces the Transformer-based teacher model with a more compact Poolformer structure. To compensate for the loss of representation ability during knowledge distillation, the Dynamic Tanh (DyT) activation function is introduced to enhance the model, replacing the traditional normalization layers in Poolformer with DyT layers. This design enables the model to automatically adjust its activation properties according to the input feature distribution, achieving a normalization-like function without computing activation statistics. Experimental results on the TON-IoT and CIC-BCCC-NRC-2024 datasets show that the FDRBT model achieves 99.91% and 99.96% accuracy, respectively. The model also maintains a small size and low computational overhead, making it suitable for resource-constrained IoT intrusion detection scenarios.
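The DyT idea is simple enough to state directly: replace normalization with a bounded, learnable squashing function. The scalar parameterization below, DyT(x) = gamma * tanh(alpha * x) + beta, is a simplification of the per-channel version; the parameter values are illustrative.

```python
import math

def dyt(x, alpha=1.0, gamma=1.0, beta=0.0):
    """Dynamic tanh layer: gamma * tanh(alpha * x) + beta.

    A learnable alpha squashes activations into a bounded range without
    computing batch statistics, which is why it can stand in for a
    normalization layer.
    """
    return [gamma * math.tanh(alpha * xi) + beta for xi in x]
```

Because tanh saturates, even extreme inputs map into [-gamma + beta, gamma + beta], bounding the activation scale without any statistics over the batch.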
  • CHEN Yuang, SHI Lei, TANG Zhiqing
    Accepted: 2026-02-12
    As 6G evolves toward extremely large antenna arrays (ELAA) and high-frequency bands, the near-field region in communication scenarios expands significantly. However, existing research on physical layer security for reconfigurable intelligent surface (RIS)-assisted non-orthogonal multiple access (NOMA) systems is largely confined to far-field communication scenarios. Moreover, the computational complexity is often high, which limits practical application in near-field large-scale systems. For RIS-assisted near-field uplink NOMA systems, this paper considers an uplink system comprising an access point (AP), an RIS, a far user, a near user, and an eavesdropper (Eve). By jointly optimizing AP beamforming and RIS phase shifts, the system maximizes the secrecy sum-rate. This problem is non-convex and challenging due to the Euclidean-norm and unit-modulus constraints, necessitating an efficient resource allocation strategy. To address this, this paper proposes a low-complexity block coordinate descent (BCD) algorithm that decomposes the original problem into two subproblems. First, a closed-form solution for AP beamforming is derived, and then manifold optimization is applied to obtain the RIS phase shifts. MATLAB simulation results demonstrate that, under the default parameter settings, compared to random phase shift, maximum ratio transmission (MRT), and orthogonal multiple access (OMA) schemes, the proposed scheme enhances the secrecy sum-rate of the system by approximately 4.4 bps/Hz, 10%, and 15%, respectively. Furthermore, the proposed scheme achieves comparable performance to semi-definite relaxation (SDR) schemes while exhibiting lower computational complexity.
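The BCD pattern, alternating closed-form block updates until a fixed point, can be demonstrated on a toy smooth objective; the quadratic below only mirrors the alternation structure (closed-form step per block), not the actual secrecy-rate problem or its constraints.

```python
def bcd_toy(iters=50):
    """Block coordinate descent on f(x, y) = (x-1)**2 + (y+2)**2 + 0.1*x*y.

    Each block update sets the corresponding partial derivative to zero in
    closed form, mirroring the alternation between a closed-form
    beamforming step and a phase-shift step (the objective here is a toy).
    """
    x, y = 0.0, 0.0
    for _ in range(iters):
        x = 1.0 - 0.05 * y        # argmin over x with y held fixed
        y = -2.0 - 0.05 * x       # argmin over y with x held fixed
    return x, y
```

Because each block subproblem is solved exactly, the objective is non-increasing across updates, and the iterates converge to a point satisfying both first-order conditions simultaneously.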
  • BAI Yang, PEI Mengxuan, SHI Fangyuan
    Accepted: 2026-02-12
    Genomic Structural Variations (SVs), which alter the three-dimensional conformation and regulatory networks of the genome through insertions, deletions, inversions, or translocations of large DNA fragments, are key pathogenic variants in various complex diseases. In recent years, breakthroughs in long-read sequencing and 3D genomics have significantly improved the detection capability of SVs. However, due to the complexity of SVs and the scarcity of functional annotations, predicting their pathogenicity remains a major challenge. Several methods have been developed to decipher the pathogenic mechanisms of SVs and reveal their impact on gene expression and phenotypes by integrating multi-modal data such as chromatin interactions, epigenetic modifications, and single-cell transcriptomics. However, there is still a lack of systematic summary of such methods. Therefore, this article systematically reviews methods for predicting the pathogenicity of SVs based on high-throughput sequencing data, including knowledge-driven methods, traditional machine learning methods, deep learning methods, and large model methods. By summarizing the limitations of existing methods, including low sensitivity in predicting rare variants, insufficient functional annotation databases, and limited generalizability of 3D models, this article proposes potential future directions to advance the field through multimodal data fusion, causal inference models, and spatial omics technologies. It aims to provide a theoretical reference for the functional interpretation of genomic structural variations.
  • Yang Xingyu, LIU Yi, HUANG Xumin, KANG Jiawen
    Accepted: 2026-02-11
    Envisioning the sixth-generation (6G) satellite-terrestrial integrated network, Low Earth Orbit (LEO) satellite Mobile Edge Computing (MEC) is key to achieving seamless global coverage. However, existing studies struggle to effectively address the strong coupling among computation offloading, multi-hop routing, and resource allocation variables, as well as the high-dimensional non-convex optimization challenges caused by highly dynamic topologies and limited onboard resources. To address this, we establish a three-layer collaboration architecture comprising medium Earth orbit (MEO) satellites, LEO satellites, and ground users, and propose a Hierarchical Soft Actor-Critic (H-SAC) hybrid optimization framework to minimize the weighted sum of system latency and energy consumption. To reduce the complexity of solving the hybrid non-convex problem, H-SAC adopts a hierarchical decoupling strategy: the upper layer utilizes the maximum entropy mechanism of the SAC agent to fully explore the discrete offloading space, effectively avoiding local optima, while the lower layer embeds efficient traditional algorithms to solve the sub-problems of continuous resource allocation and routing planning under the given offloading policy. Additionally, a dynamic weight adjustment mechanism is introduced to adaptively balance latency and energy objectives based on real-time service states. Simulation experiments demonstrate that H-SAC significantly outperforms H-TD3 and H-DDPG in key metrics, with final rewards improving by approx. 7.2% and 10%, respectively. Ablation studies verify the necessity of ISL support and flexible offloading, which contribute approx. 18% and 15% performance gains. Furthermore, H-SAC reduces inference latency by approx. 73% compared to T-DRL. Overall, the framework achieves efficient and robust resource scheduling in dynamic satellite edge computing scenarios.
  • YU Chuangyu, HUANG Zhiqiang, XUN Chao, SHEN Yu, LIU Lin, Chen Yantao, XU Yanyan, PAN Shaoming
    Accepted: 2026-02-11
    Power data prediction is the foundation for situational awareness and dispatch decision-making in power systems. However, existing power prediction methods still face significant challenges in multi-scale temporal feature modeling and effective integration of unstructured domain knowledge, which limit the prediction accuracy and generalization capability of models in complex power system scenarios. To address these issues, this paper proposes an intelligent power data prediction method named LLM-KGAP (LLM enhanced Knowledge Graph Augmented Power prediction), which integrates large language models with knowledge graphs to construct a data-knowledge dual-driven collaborative prediction framework. First, a large language model is employed to automatically extract key entities and causal relationships from power-related documents to construct a heterogeneous knowledge graph. Second, a knowledge mapping mechanism based on semantic confidence is designed to transform multi-path semantic relationships in the knowledge graph into a weighted prior adjacency matrix, providing knowledge-guided structural prior information for the prediction model. Finally, an Adaptive Spatio-Temporal Information Extraction Network with Mixed Adjacency Matrix (ASIEN-MAM) is proposed. This network employs a progressive segmentation strategy to achieve multi-scale temporal window partitioning and designs a Sparse Attention-xLSTM (SA-xLSTM) module to filter key temporal segments and extract multi-scale features in the temporal dimension, while integrating prior knowledge with data-driven mixed adjacency matrices to accurately characterize complex spatio-temporal dependencies in power systems. Experimental results demonstrate that the proposed method significantly outperforms comparative methods on both public photovoltaic datasets and regional load datasets, reducing the mean absolute error by 11.9%–44.3% and the mean absolute percentage error by 7.0%–27.3%.
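Turning knowledge-graph triples into a weighted prior adjacency matrix can be sketched as below; merging multi-path relations by taking the maximum confidence per node pair is one plausible aggregation rule, not necessarily LLM-KGAP's semantic-confidence mapping.

```python
def prior_adjacency(n_nodes, triples):
    """Build a weighted prior adjacency matrix from knowledge-graph triples.

    `triples` is an iterable of (i, j, confidence).  Relations between the
    same node pair arriving via multiple paths are merged by taking the
    maximum confidence (an illustrative choice), and the matrix is kept
    symmetric so it can serve as a graph prior for the predictor.
    """
    A = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i, j, conf in triples:
        w = max(A[i][j], conf)
        A[i][j] = A[j][i] = w
    return A
```

The resulting matrix can then be mixed with a data-driven adjacency (for example, a learned similarity matrix) before being fed to the spatio-temporal network, which is the role the abstract assigns to the mixed adjacency matrix.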