
Just accepted

  • Ji Kang, Wei Songjie, Li Meng
    Accepted: 2026-04-29
    Abstract: In the high-risk petrochemical industry, operational environments involve complex factors such as high temperature, high pressure, and flammable, explosive, and toxic substances. Even minor deviations in worker behavior can lead to severe accidents, resulting in irreversible casualties and property losses. Traditional supervision methods, which rely heavily on manual inspection, are not only inefficient but also struggle to cover multi-worker, multi-equipment collaborative scenarios, and are highly susceptible to subjective interference. The core challenges in recognizing group collaborative behaviors lie in modeling complex human–object interactions, capturing dynamic features of multiple targets, and bridging the ambiguous mapping between macroscopic group intentions and microscopic individual actions. To address these issues, this paper proposes a graph neural network-based method for group collaborative behavior recognition. By constructing a unified interactive graph structure, entities such as workers and equipment are encoded as nodes, and multimodal perceptual features are integrated to enable end-to-end reasoning about interpersonal and human–object interactions. A hierarchical graph network architecture is further designed to model the correlation and evolution from individual actions to group behaviors, achieving accurate recognition and understanding of multi-target group behaviors in complex operational scenarios. Comparative experiments show that the proposed method improves the MCA/MPCA metrics by 3.91% and 2.86%, respectively, over the next best method on a self-built dataset. On the public open-source Volleyball dataset, the MCA/MPCA metrics improve by 0.26% and 0.21% over the next best method, verifying the method's advancement and robustness.
  • LI Ziyang, ZHENG Jiong, MA Jie, LI Shishen, QIN Jiwei
    Accepted: 2026-04-29
    Multi-interest sequential recommendation models extract multiple user interests through a dynamic routing mechanism to achieve personalized recommendations. However, during the interest extraction phase, improper item–interest routing weight allocation may cause multiple interest representations to become excessively similar, leading to the multi-interest collapse issue. In the prediction phase, neglecting users’ preference intensity across different interests may assign comparable or even greater influence to low-preference interests than to high-preference ones, resulting in the multi-interest preference weight imbalance issue. To address these issues, we propose a multi-interest sequential recommendation model based on disentangled feature representation and adaptive weight fusion (DMIAFRec). First, the model partitions items according to their co-occurrence relationships, grouping frequently co-occurring items with complementary semantic characteristics into the same interest group. This partition serves as a structural guidance mechanism for routing weight allocation, encouraging each group to focus on relatively independent user interests, thereby achieving disentangled multi-interest representations and preventing excessive convergence among interest vectors. Furthermore, a time-decay mechanism and a multi-interest attention fusion strategy are introduced to adaptively assign weights according to users’ preference intensity for each interest. The weighted aggregation of multiple interest representations produces a unified user preference representation that reflects heterogeneous interest importance, thus enhancing personalized recommendation performance. 
Experimental results on three public datasets show that, compared with the best baseline model, the proposed model achieves average improvements of 6.2%, 4.98%, and 4.07% in R@20, NDCG@20, and HR@20, respectively, on the Retail Rocket, Gowalla, and Books datasets, demonstrating its effectiveness in improving recommendation performance and addressing the aforementioned issues.
  • LIANG Zefeng, QIAO Jie, CAI Ruichu, HAO Zhifeng
    Accepted: 2026-04-29
    Deep neural networks have been widely applied to time-series critical tasks such as medical diagnosis, intelligent sensing, and autonomous driving, yet their security vulnerabilities have gradually emerged. Existing studies show that deep time-series models are also susceptible to adversarial attacks. However, most existing adversarial attack methods for time-series models mainly focus on norm-bounded numerical perturbations and often ignore the inherent causal dependencies and dynamic evolution in the data generation process. As a result, the generated adversarial examples may deviate from feasible system dynamics and lack practicality in real-world scenarios. Therefore, generating effective adversarial examples while adhering to temporal causal dynamics has become an important challenge in time-series adversarial research. To address this challenge, this paper proposes TCADE (Temporal Causal ADversarial Examples), a novel method that explicitly models causal structures in time-series data and performs counterfactual reasoning under causal intervention constraints. By formulating adversarial attacks as feasible interventions on the underlying system, TCADE generates adversarial examples that can effectively mislead model predictions while remaining consistent with the system’s causal relationships and dynamic evolution. Experimental results demonstrate that TCADE achieves significant attack effectiveness in black-box settings, and the generated adversarial sequences conform to the causal generation mechanisms. This work provides a systematic evaluation of the vulnerability of time-series models under realistic and feasible black-box attacks, and offers practical insights for improving model robustness.
  • Jinglin Huang, Maoqiang Wu, Siming Wang, Yue Lai, Rong Yu
    Accepted: 2026-04-28
    Federated learning is a distributed machine learning paradigm that leverages decentralized data resources while ensuring data privacy. However, in real-world scenarios, data across clients are often non-IID (not independent and identically distributed), leading to label shift and class imbalance, which hinder convergence of the global model and degrade generalization performance. To address the impact of such data heterogeneity on model performance, we propose a cross-client data augmentation and classification framework based on diffusion models. In this framework, each client trains an initial diffusion model on its local data and uploads the model parameters to the server. The server aggregates these parameters to construct a global diffusion model, which is then sent back to all clients. Clients use the global diffusion model to generate supplementary samples, which are uploaded to the server for data augmentation to balance the local class distribution, thereby improving classifier performance. Finally, the classification model is trained through federated learning on both local data and generated samples, and is deployed to clients for image classification and recognition. To generate high-quality images, a denoising diffusion probabilistic model is used as the generation backbone, while a ResNet-18 architecture is employed for the federated classification model. Experimental results show that the fine-tuned global diffusion model can generate images that are more consistent with the real data distribution. By augmenting the data with generated samples, the local data distribution on clients becomes more balanced, significantly improving global classification accuracy.
Under the non-IID condition with a Dirichlet coefficient α=0.1, the accuracy of CIFAR-10 and CIFAR-100 increased from 46.76% and 21.31% to 54.64% and 25.57%, respectively, demonstrating the effectiveness of the proposed data augmentation strategy in mitigating class imbalance.
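The Dirichlet coefficient α mentioned above is the standard knob for simulating label-skewed non-IID splits in federated learning experiments. A minimal sketch of how such a partition is commonly produced (an illustration of the usual recipe, not this paper's exact setup; function and variable names are assumptions):

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) label skew.

    For each class, a Dirichlet(alpha) draw decides what fraction of that
    class each client receives; smaller alpha means more skewed splits.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        # Convert fractions to cut points and hand each slice to a client
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

labels = [i % 10 for i in range(1000)]  # 10 balanced classes
parts = dirichlet_partition(labels, n_clients=5, alpha=0.1)
# Every index is assigned exactly once across the 5 clients
assert sorted(i for p in parts for i in p) == list(range(1000))
```

With α=0.1, as in the experiment above, most clients end up holding only a few classes each, which is precisely the imbalance the generated samples are meant to offset.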
  • SUN Yunlei, XU Ke
    Accepted: 2026-04-28

    This paper proposes a 3D digital rock reconstruction framework based on diffusion prior guidance and multi-scale residual fusion Implicit Neural Representation (INR) to address challenges such as structural discontinuity, topological fracture, and the difficulty of balancing cross-scale microscopic details under sparse 2D slice conditions. The framework introduces the Score Distillation Sampling (SDS) mechanism to transform the geometric and topological priors in pre-trained diffusion models into continuous gradient guidance, and combines it with a measurement consistency loss as a collaborative constraint to achieve consistent restoration of local fine features and global topological structures under extremely sparse slice constraints. Meanwhile, the framework utilizes multi-scale residual structures to enhance the representation ability of INR for complex pores and to improve the generalization of the model across different voxel sizes. Experimental results show that this method accurately restores complex pore spaces on various digital rock datasets. The reconstructed structures maintain high consistency with the ground truth in key physical indicators such as porosity distribution and geometric connectivity. In the 256³-scale reconstruction task, the Dice similarity coefficient reaches 97.01%, which is 2.6% higher than the baseline INR model. As the reconstruction scale expands further, the Dice coefficient remains at 95.97% and 92.88% for the 512³ and 1024³ high-resolution tasks, respectively, demonstrating excellent stability in large-scale reconstruction. In cross-sample generalization tests on Berea sandstone and Ketton limestone, the Dice coefficient reaches 93.44% and 95.51%, respectively. This study addresses stable reconstruction of complex porous media in data-limited scenarios and provides a physically reliable, continuous technical solution for refined geological modeling.
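The Dice similarity coefficient reported in this abstract is the standard overlap metric for comparing a reconstructed binary pore volume against the ground truth. A minimal sketch of its computation (illustrative only, not the authors' code):

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient for two binary masks (iterables of 0/1)."""
    pred = list(pred)
    truth = list(truth)
    assert len(pred) == len(truth), "masks must have the same number of voxels"
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    if total == 0:
        return 1.0  # both masks empty: conventionally perfect agreement
    return 2.0 * intersection / total

# Toy 1-D example: 3 overlapping voxels, 4 predicted, 4 true -> 2*3/(4+4)
print(dice_coefficient([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 1, 0]))  # 0.75
```

A Dice value of 97.01% therefore means the predicted and true pore voxels overlap almost completely at the 256³ scale.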

  • Zhu Li, Cui Botao, Zhu Chunqiang, Mi Lugema, Xu Wanru, Wang Jing, Wang Pei
    Accepted: 2026-04-28
    Accurate short-term power load forecasting is crucial for the safe operation and optimal scheduling of power systems. Existing forecasting methods based on decomposition rely on fixed prior knowledge, which leads to rigid decomposition patterns. As a result, they struggle to handle load data with multiple periodicities and strong non-stationarity. Meanwhile, these methods find it difficult to balance computational complexity and prediction accuracy. To address these issues, this paper proposes a forecasting model based on learnable wavelet decomposition and KAN-Mixer, called LWKAN-Mixer. First, a learnable wavelet decomposition module is used to decompose the original load sequence into wavelet components of different frequency bands. Next, the Fast Fourier Transform (FFT) is applied to extract the dominant period of each component. Based on these dominant periods, patches of corresponding sizes are created for each component. Then, a multi-scale time-frequency fusion module is used to model each component independently to capture time-frequency features. The KAN-Mixer and a dual interactive convolution block are used to capture sequence representations and temporal dependencies, respectively. A multi-scale hybrid loss function is also introduced to constrain the quality of decomposition and reconstruction during training. This helps alleviate error accumulation and improve prediction accuracy. Experimental results on three real-world load datasets show that the proposed model reduces MAE by 1.10%–9.37% on the Australia dataset and by 4.97%–17.36% on the Morocco dataset compared to the latest baseline models. On the Cele dataset, the model achieves the second-best MAE. These results demonstrate that LWKAN-Mixer can effectively model complex nonlinearity and non-stationarity in load sequences, achieving strong performance in short-term load forecasting tasks.
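Extracting each component's dominant period from the amplitude spectrum, as described in this abstract, is typically done by taking the FFT frequency bin with the largest magnitude. A minimal sketch of that step (hedged: the paper's exact procedure may differ, and the function name is an assumption):

```python
import numpy as np

def dominant_period(x):
    """Return the dominant period of a 1-D sequence via the real FFT.

    The period is len(x) divided by the frequency bin with the largest
    amplitude, ignoring the zero-frequency (mean) component.
    """
    x = np.asarray(x, dtype=float)
    amplitude = np.abs(np.fft.rfft(x))
    amplitude[0] = 0.0                 # drop the DC component
    k = int(np.argmax(amplitude))      # dominant frequency bin
    return len(x) // k

# 240 hourly points with a daily (24-step) cycle plus a weaker 12-step cycle
t = np.arange(240)
load = np.sin(2 * np.pi * t / 24) + 0.1 * np.sin(2 * np.pi * t / 12)
print(dominant_period(load))  # 24
```

The detected period can then size the patches for each wavelet component, as the abstract describes.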
  • Cao Tianya, Wang Zhixin, Shi Pengju, Li Kang, Li Shuang
    Accepted: 2026-04-28
    With the continuous advancement of social media and online service platforms, user reviews have increasingly become a critical source of information influencing consumer decisions and product evaluations. Aspect-based sentiment analysis (ABSA), a key research direction in fine-grained sentiment computing, still faces significant challenges in practical applications, particularly semantic ambiguity in textual content and insufficient extraction of sentiment cues. To address these limitations, this paper proposes CKMA, a novel aspect-based sentiment analysis model that integrates knowledge graph embeddings with a multi-channel attention mechanism. The CKMA model first leverages knowledge graph embedding techniques to map entities and their relationships from external knowledge bases into low-dimensional semantic vectors, which are then fused with textual representations to alleviate the semantic ambiguity commonly observed in user reviews. Building on this knowledge-enhanced representation, we design a parallel multi-channel feature extraction framework comprising three distinct channels: a structured information channel, a context-aware channel, and an aspect-focused channel. Through a staged fusion strategy, this framework enables collaborative modeling of diverse semantic and syntactic signals, enhancing the model's capacity to capture aspect-relevant sentiment features. To mitigate the loss of original semantic information during deep feature learning, we further introduce a joint fusion mechanism that combines the knowledge-enhanced word-level representations with the outputs of the multi-channel attention modules, improving the completeness and robustness of the final feature representation. Extensive experiments on four widely used benchmark datasets (Restaurant14, Restaurant16, Laptop14, and Twitter) demonstrate that the proposed method achieves superior performance in terms of both accuracy and Macro-F1 scores.
Notably, CKMA exhibits more pronounced advantages on datasets characterized by complex syntactic structures, validating the effectiveness of our synergistic modeling strategy that jointly exploits structural and semantic information for aspect-based sentiment analysis.
  • Jinhao Hu, Dongfen Li, Jinbo Wang, Jinshan Lai
    Accepted: 2026-04-22
    Federated learning (FL) features two core advantages: keeping raw data local and enabling collaborative training across participants, which safeguards data privacy and facilitates distributed model collaboration. However, its architecture still confronts two key security threats—malicious client selection manipulation and server-side gradient tampering—rooted in the contradiction between distributed training and centralized aggregation. Specifically, malicious servers can rig client selection to skew the aggregated model, and the server’s absolute control over gradient aggregation creates a trust bottleneck for tampering, as client authentication relies on server trust and gradient aggregation lacks decentralized verification. To tackle these issues, this paper proposes a client-verifiable FL framework integrating Verifiable Random Functions (VRF) and lightweight Message Authentication Codes (MAC). In client selection, a VRF-based dynamic protocol ensures unforgeable participant identities and publicly verifiable selection results, preventing undetectable server tampering. In gradient aggregation, an innovative lightweight MAC mechanism with auxiliary node collaboration enables trustless tampering detection via gradient-sensitive parameters. Experiments demonstrate that the VRF-based selection maintains performance close to the theoretical benchmark of unmanipulated scenarios, reducing the malicious node selection rate by over 33% compared with traditional FedAvg. Meanwhile, the MAC-based gradient verification mechanism cuts communication overhead by around 24% relative to the baseline VerifyNet.
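The VRF-based protocol described above makes each round's client selection deterministic and re-checkable from public inputs. A highly simplified stand-in using a keyed hash (HMAC) in place of a real VRF, just to illustrate the verifiable-draw idea (the function names and lowest-score selection rule are assumptions, not this paper's protocol; a real VRF additionally produces a proof that is verifiable without the secret key):

```python
import hmac
import hashlib

def selection_score(client_key: bytes, round_seed: bytes) -> int:
    """Per-round pseudo-random score derived from a client's key and a public seed.

    Deterministic given (key, seed), so the same inputs always reproduce
    the same draw; unpredictable without the key.
    """
    digest = hmac.new(client_key, round_seed, hashlib.sha256).digest()
    return int.from_bytes(digest, "big")

def select_clients(client_keys: dict, round_seed: bytes, k: int):
    """Pick the k clients with the lowest scores for this round."""
    ranked = sorted(client_keys,
                    key=lambda cid: selection_score(client_keys[cid], round_seed))
    return ranked[:k]

keys = {f"client{i}": f"secret-{i}".encode() for i in range(10)}
chosen = select_clients(keys, b"round-42", k=3)
assert len(chosen) == 3
# Same public seed and keys reproduce the same selection, so a server
# cannot silently substitute a different client set.
assert chosen == select_clients(keys, b"round-42", k=3)
```

The key property mirrored here is that any party holding the round seed can recompute and audit the selection, which is what blocks undetectable server-side manipulation.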
  • CHEN Mingyun, YU Xin, WEI Zhipeng, ZHANG Jinxiong
    Accepted: 2026-04-22
    This paper proposes a novel distributed neurodynamic optimization algorithm, designed by combining multi-agent theory and the penalty function method, which ensures fixed-time consensus for distributed nonconvex problems subject to local inequality constraints. The initial conditions of the algorithm can be chosen arbitrarily. By appropriately designing the penalty mechanism, it is guaranteed that the algorithm’s state variables enter the feasible region defined by the constraints within a finite time and remain therein thereafter. The consensus term of the algorithm combines a dynamic switching function with a sign function to achieve fixed-time consensus independent of initial conditions, thereby improving the efficiency and controllability of the optimization process. Based on Lyapunov theory, it is proven that, under appropriate assumptions, the algorithm’s state variables remain bounded, enter the feasible region of inequalities in finite time, achieve fixed-time consensus, and ultimately converge to the set of critical points of the nonconvex problem. Compared with existing distributed algorithms, the proposed algorithm adopts a single-layer differential inclusion framework and integrates a penalty mechanism that avoids complicated penalty-parameter tuning with an advanced fixed-time consensus control strategy. This design ensures highly controllable convergence time while preserving structural simplicity, low computational overhead, and flexibility in selecting initial points. The effectiveness and practicality of the algorithm are demonstrated through two simulation studies and an application to optimal facility location.
  • Liu Yongchang, Yin Yanchao, Chen Hailong
    Accepted: 2026-04-22
    In complex process manufacturing, high process coupling, intricate multi-step coordination, and significant nonlinear relationships between product quality and process parameters pose challenges for quality control. To address these issues, this study proposes a segmented multi-process quality prediction method integrating multi-layer neural networks and ensemble learning. The approach first establishes an overall prediction model and segmented prediction models. The overall model employs Random Forest (RF), LightGBM, and KNN algorithms, overcoming the limitations of single-model generalization through ensemble learning strategies while leveraging multi-algorithm differences to extract multidimensional data features. The segmented model utilizes LSTM-KAN networks, where Long Short-Term Memory (LSTM) captures long-term dependencies between process quality and feature variables, while Kolmogorov-Arnold Networks (KAN) enhance nonlinear mapping capabilities. Subsequently, the XGBoost ensemble learning algorithm integrates both models to achieve complementary advantages. Finally, a case study of predicting the moisture content of materials at the exit of the tobacco dryer in tobacco production is conducted for verification. As a core quality characterization indicator in tobacco primary processing, the stability of the exit material moisture content is directly related to the material softening effect of loose conditioning, the liquid absorption efficiency of leaf moistening and feeding, and the drying uniformity of thin-plate tobacco drying. Comprehensive control of the multi-process quality can be achieved through the accurate prediction of this single indicator. The results show that the fusion model is significantly superior to traditional single models and comparative models in key indicators such as mean absolute error (MAE=0.0072), root mean square error (RMSE=0.0096), mean absolute percentage error (MAPE=0.0566%), and goodness of fit (R²=0.9890).
This verifies the effectiveness of the proposed method in handling nonlinear relationships and time-series characteristics, as well as its advantages in prediction accuracy and generalization performance, making it suitable for complex multi-process scenarios in tobacco primary processing.
  • YU Hang, ZHU Hongqing
    Accepted: 2026-04-22
    Magnetic resonance imaging is an important tool for clinical auxiliary diagnosis and lesion detection. Currently, most MRI reconstruction methods are based on global feature modeling, utilizing transformers to achieve high-quality reconstruction. However, these methods often perform dense feature dependency calculations in the spatial domain, which may introduce redundant information and noise from irrelevant areas. Additionally, existing methods require separate training of models for different sampling patterns, resulting in inefficiency and limited generalization capabilities. To address these issues, this paper proposes the Dual-domain Adaptive Transformer Prompt Network (DATP-Net), a unified reconstruction framework that efficiently models feature relationships and reconstructs images from various sampling patterns simultaneously. The proposed network includes several core designs: (1) A deep feature convolution mixer that performs convolution operations in both spatial and frequency domains to enhance the representation of deep features; (2) An adaptive mixing transformer that combines adaptive self-attention and a fine-grained feedforward network, using dual-branch self-attention computation and fine feature elimination to enhance potentially useful feature relationships; (3) A degradation prompt module that injects learnable prior degradation information flow at the reconstruction end to guide feature reconstruction, enabling the network to integrate MR image reconstruction from multiple sampling patterns and enhance the model's generalization ability. Extensive experiments conducted on public IXI and fastMRI datasets demonstrate that the proposed method significantly outperforms state-of-the-art methods with lower computational costs. At a 4x random sampling rate, the model achieves an average PSNR of 39.82 and an SSIM exceeding 0.96, successfully reconstructing images with high clarity and detail restoration.
  • FAN Tianhao, QI Lianyong, YANG Yijie, LI Chong, SONG Te, ZHANG Dejiang
    Accepted: 2026-04-21
    Partial label learning is a typical weakly supervised learning paradigm in which each training instance is assigned a candidate label set that contains the true label. The goal of partial label learning is to identify the ground-truth label from the candidate set for each instance. In real-world applications, partial label data usually exhibit class imbalance. This makes learning methods based on prediction confidence and label refinement prone to bias and thus degrades classification performance. This issue is more severe in long-tailed scenarios, where head classes dominate the disambiguation process and tail classes are insufficiently learned. Moreover, existing optimal transport–based label refinement methods still suffer from systematic bias in imbalanced scenarios. To address these issues, this paper proposes a method named C2DOT-PLL for long-tailed partial label learning. While preserving the global consistency advantage of optimal transport, the method first employs a dynamic confidence calibration mechanism to alleviate unfair comparisons caused by inconsistent confidence scales across classes and to reduce the impact of class imbalance on instance-level label competition. Then, an unbiased optimal transport scheme is introduced in the pseudo-label refinement stage to correct the systematic bias induced by entropic regularization, thereby producing more accurate pseudo labels. Experiments are conducted on multiple benchmark datasets with different imbalance levels. The results show that, compared with existing partial label learning methods, C2DOT-PLL achieves the best overall classification accuracy.
  • ZHANG Haicang, TANG Shibao, HUO Jiuyuan
    Accepted: 2026-04-21
    Accurate traffic flow prediction can provide scientific decision support for traffic management departments, which is crucial for alleviating urban traffic congestion, improving overall network operation efficiency, and enhancing service levels. Addressing the issue of insufficient exploration of periodic spatio-temporal features in existing traffic flow prediction models, this paper proposes a Multi-Period Spatio-Temporal Gated Network (MPSTG) method for traffic flow prediction. The MPSTG method first designs decoupled parallel multi-period feature extraction branches to model spatio-temporal features under different periods in independent subspaces, considering the multi-period characteristics embedded in traffic flow data. Then, within each individual period branch, a spatio-temporal feature extraction module combining a gating mechanism and graph attention diffusion convolution is introduced to enhance the model’s ability to capture dynamic spatial correlations and temporal dependencies. Finally, a bidirectional feature fusion strategy is constructed to achieve efficient collaborative expression of multi-period information for features of different granularities. Experiments on three public traffic flow datasets show that the proposed method outperforms baseline models. In terms of MAE, it reduces the error by 2.0%, 3.4%, and 3.6% in the 60-minute prediction task on the three datasets, demonstrating its accuracy, adaptability, and robustness in complex traffic scenarios.
  • WANG Peng, JIANG Shaohua, ZHANG Yiwen, WANG Wanyu, ZHANG Lianming
    Accepted: 2026-04-21
    Stance detection is a core task in social media public opinion analysis and plays a crucial role in understanding the distribution of public opinions. However, existing methods perform poorly in multi-turn dialogue scenarios, with a significant decline in modeling capability especially when dealing with deep-level comments. The main bottlenecks lie in the lack of a logical reasoning chain for implicit knowledge and the stance formation process, as well as insufficient target-dependent multi-granularity context modeling. To address these issues, this paper proposes a Chain-of-Thought enhanced Context Modeling method (CoT-CM) to improve the accuracy and robustness of stance detection in multi-turn dialogues. Leveraging the external knowledge of large language models, this method guides chain-of-thought reasoning through prompt design, extracts stance-related intermediate variables, and integrates them interactively with dialogue semantics, thereby depicting the reasoning process of the stance formation logic. Meanwhile, a multi-level dialogue semantic framework is designed to model the historical dialogue context from global, local, and relational perspectives, and a target-guided multi-hop attention mechanism is introduced to capture the most relevant information. In addition, a structural consistency contrastive learning mechanism is proposed, which effectively enhances the discriminative ability between different stances by jointly optimizing classification and contrastive losses. Experiments on Chinese multi-turn dialogue stance detection datasets C-MTCSD and ZS-CSD show that CoT-CM achieves an average F1 improvement of 2.97% and 1.36% respectively.
  • Liu Mingkai, He Peiwen, Liu Mengchi
    Accepted: 2026-04-21
    The Text-to-SQL task aims to convert natural language queries (NLQ) into Structured Query Language (SQL). Although the rise of Large Language Models (LLMs) has redefined the paradigm of this task, most existing studies focus on optimizing the model's schema awareness and SQL generation capabilities through prompt engineering, while often neglecting the prevalent semantic ambiguity in natural language. This neglect leads to comprehension biases when models handle complex scenarios. To address this, we propose a Text-to-SQL framework with Disambiguation, Analysis, Refinement, and Election (DARE-SQL). The framework first leverages the semantic reasoning capabilities of LLMs to construct a semantic expansion module, which generates an expanded set of questions covering the user's potential intent space to explicate and capture fuzzy semantics. Subsequently, differentiated generation strategies are applied to questions from various sources, and a refinement mechanism based on execution feedback is introduced to optimize the results, thereby building a high-quality set of candidate SQLs. Finally, a two-stage selection strategy based on question consensus is employed to filter for the optimal solution that balances both accuracy and execution performance. Experimental results demonstrate that DARE-SQL achieves an Execution Accuracy (EX) of 71.71% and a Valid Efficiency Score (VES) of 70.41 on the challenging BIRD benchmark, and reaches 88.10% EX on the classic Spider dataset. These results validate the effectiveness of explicit ambiguity modeling in enhancing performance for complex Text-to-SQL tasks.
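The "election" stage described above filters a candidate SQL set by consensus. A minimal sketch of the generic execution-consensus idea that such strategies build on (hedged: DARE-SQL's actual two-stage selection also weighs execution performance; the function names and tie-breaking rule here are assumptions):

```python
from collections import Counter

def elect_sql(candidates, execute):
    """Pick the candidate SQL whose execution result is most common.

    `execute` maps a SQL string to a hashable result (or None on error).
    Ties and all-error cases fall back to the earliest candidate.
    """
    results = {sql: execute(sql) for sql in candidates}
    tally = Counter(r for r in results.values() if r is not None)
    if not tally:
        return candidates[0]
    winning_result, _ = tally.most_common(1)[0]
    for sql in candidates:  # earliest candidate with the winning result
        if results[sql] == winning_result:
            return sql

# Toy stand-in for a database: execution results keyed by query text.
fake_db = {"q1": ("a",), "q2": ("a",), "q3": ("b",)}
print(elect_sql(["q1", "q2", "q3"], fake_db.get))  # q1
```

The intuition is that semantically equivalent candidates, even with different surface forms, converge on the same result set, so agreement is evidence of correctness.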
  • LIANG Yu, MA Jiayan, HU Xiyuan, WANG Ziheng, LIU Wen, PENG Tianhao, LI Ying
    Accepted: 2026-04-20
    With the rapid development of the internet and social media, the speed of information generation and dissemination has reached an unprecedented level. The proliferation of misinformation, rumors, and other misleading content has become increasingly prominent, posing significant threats to social governance order, harmony, and stability. In rumor detection, the low proportion of rumor samples leads to data imbalance, while existing text augmentation techniques struggle to enhance detection performance due to their lack of specificity to rumor styles and low generation quality. Additionally, although pre-trained language models excel at capturing global dependencies in text, they often fall short in focusing on key local features of rumors. To address these challenges, this study proposes a rumor detection framework based on large-model data augmentation and multi-granularity feature fusion. First, a rumor generation method integrating a rumor-style lexicon and large language models is proposed. Based on publicly available rumor datasets, a style lexicon is constructed to guide large language models in generating semantically coherent and rumor-style consistent minority-class samples. This approach alleviates data imbalance while ensuring the quality of augmented samples. Second, this study introduces a multi-granularity contextual feature extractor. It combines the strengths of pre-trained language models with disentangled attention mechanisms in capturing global dependencies and the focus of convolutional sub-layers on local features. This enables the simultaneous capture of long-distance logical associations and fine-grained linguistic clues in rumor semantics, effectively mitigating the inherent limitations of such pre-trained models in capturing key local features. Experimental results demonstrate that the proposed detection method achieves accuracy rates of 82.24% and 93.91% on the BuzzFeed and PolitiFact datasets, respectively.
  • Wang Xinyue, Sun Zhigang, Quan We, Huang Rong
    Accepted: 2026-04-20
    Time-Sensitive Networking (TSN), a real-time Ethernet technology with deterministic transmission characteristics, has gradually been adopted in safety-critical scenarios such as automotive and aerospace systems. In these scenarios, link failures caused by random environmental factors may interrupt TSN connections and thereby invalidate static configurations such as the TSN time-synchronization tree. Real-time maintenance of the network topology is therefore key to ensuring system reliability in safety-critical scenarios. However, research on TSN topology state monitoring remains limited, making it difficult to meet the stringent real-time requirements that TSN systems place on network monitoring. This paper first compares and analyzes the problems and challenges of existing TSN topology state monitoring methods in safety-critical scenarios from the perspective of real-time performance. Based on this analysis, it proposes FTDP, a fast topology state discovery protocol for TSN in safety-critical scenarios. In FTDP, each node explicitly encodes the planned monitoring path into the monitoring probe following a source-routing paradigm, so that a single probe suffices to collect information from the entire network, reducing topology state discovery delay. Finally, tests in a real hardware environment show that the topology monitoring delay for networks of up to 10 nodes does not exceed 100 microseconds, confirming that FTDP can collect the network topology with high real-time performance. Comparison with existing methods further confirms FTDP's real-time advantages.
  • Yuxin LIU, Hui LI, Jianwei ZHANG
    Accepted: 2026-04-20
    Autonomous path planning is key to ensuring that drones complete missions in complex environments: it must produce globally efficient flight paths while responding to local environmental changes. Planning complete paths for arbitrary start-goal combinations in an initially static environment, while adjusting for obstacle avoidance in local areas, requires an effective balance between global path optimality and local obstacle avoidance capability. Existing heuristic algorithms exhibit search times that grow exponentially with spatial resolution in complex three-dimensional environments, making it difficult to meet real-time requirements. Gradient-based deep reinforcement learning methods, on the other hand, often suffer from "perception aliasing" in unstructured mountainous terrain due to the lack of local perception guidance, leading to unstable training convergence and susceptibility to local-optimum traps. A proximal policy optimization algorithm based on local information enhancement (LIE-PPO) is proposed. A state space integrating global position information, relative target information, and a local perception window is designed so that the agent can balance long-term planning and local decision-making, addressing path planning in high-dimensional feature spaces. The algorithm adopts a 26-neighborhood discrete action space and a multi-objective reward function that jointly considers path smoothness, safety, and efficiency, guiding the agent to learn an efficient and safe path selection strategy that can generate feasible, near-optimal paths online between any given start and end points from a pre-trained model. Experimental results over multiple tests with random start and end points show that the proposed algorithm achieves approximate global optimality, with an average path length within 7% of the A* algorithm's results in a static environment. Compared with the standard proximal policy optimization algorithm, convergence is approximately 1.6 times faster, demonstrating higher training stability. In the presence of unknown obstacles, feasible paths can still be planned, demonstrating good environmental adaptability.
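As an illustration of the 26-neighborhood discrete action space mentioned in the abstract above, the sketch below enumerates the 26 unit moves of a cell in a 3-D grid; it is a minimal, generic example, and the function name is illustrative rather than taken from the paper.

```python
from itertools import product

def neighborhood_26():
    """All 26 unit moves in a 3-D grid: every (dx, dy, dz) with
    components in {-1, 0, 1}, excluding the zero (stay-in-place) move."""
    return [m for m in product((-1, 0, 1), repeat=3) if m != (0, 0, 0)]

moves = neighborhood_26()
# applying a move to a grid cell is simple component-wise addition
cell = (4, 7, 2)
successors = [tuple(c + d for c, d in zip(cell, m)) for m in moves]
```

Each state thus has exactly 26 successors (6 face, 12 edge, and 8 corner neighbors), which is the discrete action set the agent selects from at every step.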
  • ZHANG Shenghao, HAN Weili
    Accepted: 2026-04-20
    Passwords remain the most critical factor in identity verification and are widely used in various security scenarios. Enhancing password security relies heavily on the simulation and study of password guessing. In practice, data-driven credential tweaking attacks are highly constrained by the quantity and quality of training samples, and existing few-shot password guessing frameworks are not suitable for credential tweaking attacks. To address these issues, this paper proposes a few-shot credential tweaking attack method based on large language models and data augmentation. The method automatically generates pseudo-aligned password data from a minimal number of high-quality samples, reducing the heavy dependence of credential tweaking attacks on data quantity and quality. The contributions of this paper are as follows: 1) a credential tweaking attack framework named PasswordRL, based on reinforcement learning; 2) a few-shot credential tweaking attack framework, PasswordRL-FS, based on data augmentation techniques. Using four mainstream guessing methods as baselines, comparative experiments on the two frameworks are conducted on two real leaked password datasets. Experiments show that in real-world few-shot scenarios (1,000 training samples), with guess budgets of 5, 10, and 100, the hit rates of the proposed attack framework outperform the second-best baseline by 39.54%, 23.72%, and 42.40%, respectively, and reach 83.72%, 81.85%, and 93.68% in data-rich scenarios (more than 10^7 training samples). These experiments demonstrate the effectiveness of the proposed method.
  • Yuan Shuai, Miao Disheng, Zhang Haonan
    Accepted: 2026-04-20
    Nonlinear state estimation is a core technology in fields such as radar target tracking and robot localization. In practical applications, however, model uncertainties and unknown or time-varying noise covariance matrices (NCMs) cause traditional filtering algorithms to exhibit increased estimation errors or even divergence, and existing adaptive filtering methods often struggle to balance estimation accuracy against computational efficiency. To address these challenges, this paper proposes a Robust Sliding Window Variational Adaptive Cubature Kalman Filter (RSWVACKF). First, variational Bayesian inference (VBI) is integrated with the cubature integration rule to derive a joint recursive solution for the state vector, the process noise covariance matrix (PNCM), and the measurement noise covariance matrix (MNCM), enhancing the algorithm's applicability to nonlinear systems. Second, a sliding-window-based noise covariance estimator is designed: a cubature Kalman smoother (CKS) backward-smooths the state vectors within the sliding window, enabling online estimation of the NCMs while avoiding fixed-point iterations and improving computational efficiency. Finally, a multiple fading factors-based strong tracking filter (MSTF) is incorporated, in which the online-estimated NCMs guide the adjustment of the prediction error covariance matrix (PECM), thereby enhancing the algorithm's robustness. Multiple simulations validate the effectiveness of the proposed RSWVACKF: it exhibits significant advantages over existing state-of-the-art approaches in both estimation accuracy and computational efficiency.
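The cubature integration rule underlying cubature Kalman filtering can be illustrated with the standard third-degree spherical-radial rule, which represents a Gaussian N(m, P) by 2n equally weighted points; the sketch below is a generic textbook construction, not code from the paper, and the function name is illustrative.

```python
import numpy as np

def cubature_points(mean, cov):
    """Third-degree spherical-radial cubature points for N(mean, cov):
    the 2n points mean +/- sqrt(n) * S e_i (S = chol(cov)), each with
    weight 1 / (2n)."""
    n = mean.size
    S = np.linalg.cholesky(cov)
    scaled = np.sqrt(n) * S                      # columns: sqrt(n) * S e_i
    pts = np.hstack([mean[:, None] + scaled, mean[:, None] - scaled])
    return pts, np.full(2 * n, 1.0 / (2 * n))

m = np.array([1.0, -2.0])
P = np.array([[2.0, 0.3], [0.3, 1.0]])
pts, w = cubature_points(m, P)

# the weighted point set reproduces the Gaussian's first two moments exactly
m_hat = pts @ w
P_hat = (pts - m_hat[:, None]) * w @ (pts - m_hat[:, None]).T
```

In a cubature Kalman filter these points are propagated through the nonlinear dynamics or measurement function, and the predicted mean and covariance are recovered as the same weighted sums.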
  • Yaxin Li, Jingling Yuan, Xian Zhong
    Accepted: 2026-04-20
    Video analytics extracts high-value information from video streams and plays a crucial role in applications such as intelligent transportation and public safety. Although traditional cloud-based video analytics offers powerful computational capability, uploading massive amounts of video data incurs high bandwidth consumption and network latency. Edge computing reduces network latency by processing video data near the cameras, but it still faces two major challenges: first, frame-by-frame analysis leads to redundant inference, and existing frame reuse methods cannot fully exploit local similarities in historical frames; second, core workloads become unbalanced because task allocation across big and LITTLE cores lacks real-time load awareness. To address these issues, this paper proposes Vable, an efficient video analytics system for big.LITTLE edge devices. Vable employs a block-level frame reuse mechanism over multiple historical frames: video frames are partitioned into fine-grained blocks, and a tree-based storage structure combined with locality-sensitive hashing performs similarity matching, enabling efficient cross-frame computation reuse and significantly reducing redundant inference overhead. In addition, Vable introduces a core workload-aware, list-based DAG partitioning algorithm that dynamically allocates analysis tasks by monitoring the real-time load of big and LITTLE cores, balancing computation and communication overhead while avoiding latency increases caused by load imbalance. A prototype of Vable is implemented and evaluated on two real-world datasets. Experimental results show that Vable reduces end-to-end latency by 59.23% and 45.83% on the two datasets, respectively, while maintaining high throughput.
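The locality-sensitive hashing step described above can be illustrated with a minimal sign-random-projection scheme over image blocks: similar blocks tend to hash to the same bucket, so a new block can look up reuse candidates among historical blocks. This is a generic sketch under assumed parameters (8×8 blocks, 16 hyperplanes, a flat dict instead of the paper's tree-based store); none of the names come from Vable itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_signature(block, planes):
    """Sign-random-projection hash: one bit per random hyperplane."""
    return tuple((block.ravel() @ planes > 0).astype(int))

planes = rng.standard_normal((64, 16))   # 8x8 blocks -> 16-bit signatures
table = {}                               # bucket -> ids of historical blocks

def insert_block(block, frame_id):
    table.setdefault(lsh_signature(block, planes), []).append(frame_id)

def lookup(block):
    """Ids of historical blocks in the same bucket (reuse candidates)."""
    return table.get(lsh_signature(block, planes), [])

base = rng.standard_normal((8, 8))
insert_block(base, frame_id=0)
near = base + 0.01 * rng.standard_normal((8, 8))  # nearly identical block
```

A block like `near` very likely lands in the same bucket as `base` (each hash bit flips only with probability proportional to the angle between the two blocks), so its inference result can be reused instead of recomputed.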
  • WU Jiaheng, DUAN Jiancheng, ZHANG Ronghui, CHEN Junzhou
    Accepted: 2026-04-20
    In complex road traffic scenarios, vehicle detection faces significant challenges, including large variations in object scale, frequent occlusions, and the difficulty of simultaneously achieving high accuracy and real-time performance. To address these issues, an improved vehicle detection algorithm, YOLOv13n-FCM, is proposed on the YOLOv13n baseline. First, Frequency Dynamic Convolution (FDConv) is introduced into the backbone to strengthen the modeling of multi-frequency information, thereby enhancing the representation of vehicle edge structures and fine-grained details. Second, a Channel–Spatial Fusion (CSF) module is designed to jointly model channel-wise and spatial features, enabling the network to focus on salient vehicle regions while effectively suppressing background interference in complex scenes. Finally, a Multi-Branch Fusion (MBF) module is incorporated into the detection head to perform adaptive, weighted multi-scale feature fusion, further improving detection performance for vehicles at different scales. Experimental results on the public Vehicle Detection Dataset and BITVehicle datasets show that YOLOv13n-FCM achieves good detection performance in various road vehicle scenarios. On the Vehicle Detection Dataset, mAP50 reaches 60.1% and mAP50:95 reaches 42.6%, which are 2.7% and 2.6% higher than the original YOLOv13n model, respectively, and 2.7% and 1.8% higher than the best competing method. On BITVehicle, the proposed method also outperforms the baseline model, indicating a degree of cross-scenario adaptability. In addition, with hardware acceleration on an NVIDIA Jetson AGX Orin edge device, YOLOv13n-FCM runs at 78.5 FPS at an input resolution of 640×640. Overall, the proposed method substantially improves detection accuracy while maintaining real-time performance, demonstrating strong practicality for engineering applications.
  • DONG Xianzhe, WANG Xiaoheng, LI Jing
    Accepted: 2026-04-15
    In recent years, Multimodal Large Language Models (MLLMs) have advanced rapidly, making the deployment of efficient inference services increasingly challenging. Existing online inference scheduling strategies, such as continuous batching and stall-free scheduling, are primarily designed for text-only large language models and typically merge the encoding and prefill stages of a request into a single scheduling unit. However, multimodal inputs require significantly longer and more variable processing times during the encoding stage, so these coarse-grained scheduling approaches can easily lead to idle computational resources, increased inference latency, and ultimately constrained effective throughput. To address this issue, this study proposes STEP (Stage-based Time Estimation Priority Scheduling), an online inference scheduling strategy aimed at enhancing the effective throughput of MLLM serving. The key innovation of STEP lies in fine-grained stage decoupling and scheduling of the inference process: the multimodal inference pipeline is decomposed into three independently schedulable stages, namely encoding, prefill, and decoding. STEP further employs a lightweight execution-time prediction model trained on historical profiling data to accurately estimate batch execution time under TPOT (Time per Output Token) requirements. Finally, a priority-based scheduling mechanism is introduced to accommodate the diverse TTFT (Time to First Token) requirements across requests. Experiments were conducted on five open-source multimodal datasets covering tasks such as visual question answering and image understanding, with comparisons against several baseline methods. The results demonstrate that, through stage-aware fine-grained scheduling and execution-time prediction, STEP effectively adapts to the inference characteristics of MLLMs and significantly improves the effective throughput of online inference systems.
  • CHEN Wenjie, LIANG Yin, DU Mingjing, HUANG Yaosheng, LIU Yanjie
    Accepted: 2026-04-14
    Aiming at the problems of limited pixel resolution, significant scale variation, and densely distributed small objects in UAV aerial images, an improved algorithm named SAM-YOLOv12n, based on YOLOv12n, is proposed. In the backbone network, a Dual-Attention Coupled C2f for Small Objects (DA-C2f-S) module is designed: by introducing a multilevel feature extraction structure and a dual attention mechanism, it effectively enhances the ability to capture fine features such as the edges and textures of small objects. A Multi-Scale Fusion Convolution (MSFConv) module is constructed with Dilated Depthwise Separable Convolution (DDSConv) at its core and differentiated branches with various dilation rates, achieving cooperative modeling of local details and global contextual features, compensating for the limitations of a single-scale receptive field, and better adapting to the scale fluctuations of small aerial objects. Experimental results on the VisDrone2019 dataset show that the improved method achieves gains of 9.9% in mAP@0.5 and 7.2% in mAP@0.5:0.95 over the YOLOv12n baseline, validating its effectiveness for small object detection in complex aerial scenarios. Generalization experiments on the TinyPerson ultra-small object dataset and the HIT-UAV infrared aerial dataset verify the cross-domain adaptability of the proposed method across different aerial scenes. Its core advantage lies in effectively balancing detection accuracy, model complexity, and inference efficiency, providing reliable technical support for real-time object detection in UAV aerial imaging.
  • Kangyi Zheng, Ji Zhang, Bingyu Lin, Tian Yang, Ningyi Liu
    Accepted: 2026-04-14
    Semi-supervised feature selection is a powerful tool in machine learning for processing large-scale, partially labeled data. However, most existing feature selection algorithms are hindered by insufficient computational efficiency, limited scalability, and inadequate accuracy. The related family is a high-efficiency feature selection framework based on granular computing; while it excels in large-scale data scenarios, it cannot handle partially labeled data. To address this, this paper proposes a semi-supervised feature selection algorithm based on the related family (SRF). First, a redundancy-free granulation method, termed consistent granulation, and an importance degree matrix are introduced to construct a novel related family. This enables a semi-supervised feature evaluation method that reduces complexity from quadratic to linear, effectively overcoming bottlenecks in computational efficiency and scale. Second, to further enhance classification performance, three strategies are implemented: 1) strengthening the data representation capability of information granules; 2) balancing the consistency and quality of information granules, which are jointly used to evaluate feature importance; and 3) predicting pseudo-labels from the selected high-quality feature subset to reduce noise interference. Experimental results on 12 public datasets demonstrate that, compared with four representative algorithms (SemiFREE, Semi2MNR, LMSFS, and GMSFS), SRF improves classification accuracy by 0.88%, 2.34%, 2.81%, and 2.58%, respectively, while improving computational efficiency by factors of 36.70, 841.56, 6.52, and 17.04, respectively. These results verify the effectiveness and efficiency of the proposed method in handling large-scale partially labeled data.
  • LIU Jiaqi, CHENG Xiaona
    Accepted: 2026-04-14
    Federated learning achieves privacy preservation and collaborative modeling through the distributed paradigm of “data staying local and model being shared.” However, existing schemes show clear limitations in client selection efficiency, malicious node defense, and fairness of incentive allocation. This paper proposes a dynamic malicious node identification mechanism, named GIFL, to jointly optimize malicious node detection, efficient client selection, and dynamic incentive allocation. GIFL adopts a lightweight greedy screening strategy to filter low-contribution and high-cost clients. An influence factor dynamic updating mechanism based on model parameter deviation is used to accurately identify and remove malicious nodes. A dynamic reward payment strategy is designed by jointly considering historical and real-time contributions. Experiments on the Fashion-MNIST, CIFAR-10 and Tiny-ImageNet datasets demonstrate that in cross-device federated learning scenarios where the proportion of malicious nodes is 5%-30%, GIFL significantly outperforms five benchmark methods, including FedAvg and IAFL. The malicious node identification accuracy is improved by 5.4% to 23.9%. Compared with QAIM, the pre-selection time is reduced by an average of 86.1%. Model convergence stability and social welfare are significantly enhanced. Under the condition that model accuracy is not lower than 92% (Fashion-MNIST, CIFAR-10) and 88% (Tiny-ImageNet), the average server cost is reduced by 16.94%. The results indicate that GIFL provides an effective and reliable solution for federated learning in mobile edge networks.
  • Zhang Peng, Zhao Guosheng , Wu Xiaosheng
    Accepted: 2026-04-14
    To address the limited adaptive capacity, insufficient adversarial robustness, and inadequate consideration of defense costs in dynamic defense models, an asynchronous advantage actor-critic adaptive dynamic defense model integrating meta-learning and adversarial training is proposed. The model formalizes the defense process as a partially observable Markov decision process (POMDP), designs a reward function that incorporates penalties for false positives/negatives and operational costs, and constructs a three-layer collaborative optimization framework: the inner layer performs efficient strategy search based on the asynchronous advantage actor-critic algorithm; the middle layer introduces projected gradient descent adversarial training to enhance robustness under adversarial perturbations through a minimax game; and the outer layer employs model-agnostic meta-learning to construct a meta-optimizer, enabling the model to adapt quickly to new attacks from a small number of samples. Experiments on the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets show that the model achieves an optimal defense decision rate (ODR) exceeding 92%, with an average reduction in defense resource consumption of approximately 60%. Under high-intensity perturbations, the attack success rate (ASR) remains below 38.2% with no performance collapse, and detection accuracy for zero-day attacks improves to over 88%. This research provides a feasible path toward an intelligent dynamic defense system with high adaptability, strong robustness, and high efficiency.
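The projected gradient descent (PGD) adversarial training step mentioned above can be sketched on a toy differentiable loss: ascend the loss by signed gradient steps, then project the perturbation back into an L-infinity ball. This is the standard PGD construction on an assumed toy objective, not the paper's defense model; the function and parameter names are illustrative.

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=0.1, step=0.02, iters=20):
    """PGD: repeated signed-gradient ascent on the loss, each step followed
    by projection of the perturbation onto the L-infinity ball of radius eps."""
    x_adv = x.copy()
    for _ in range(iters):
        x_adv = x_adv + step * np.sign(grad_fn(x_adv))  # ascent step
        x_adv = x + np.clip(x_adv - x, -eps, eps)       # L-inf projection
    return x_adv

# toy quadratic loss L(x) = 0.5 * ||x - t||^2, with gradient x - t
t = np.array([1.0, -1.0, 0.5])
x0 = np.zeros(3)
x_adv = pgd_attack(x0, lambda x: x - t)
```

In adversarial training, the inner loop generates `x_adv` to maximize the loss, and the model parameters are then updated to minimize the loss on these worst-case inputs, i.e. a minimax game.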
  • LIU Jiale, DENG Weisi, HU Jiaqiu, JING Zhaoxia, ZOU Wenzhong
    Accepted: 2026-04-14
    In new energy power generation systems, missing data severely constrains the reliability of equipment condition assessment and fault prediction. Data in such scenarios typically exhibit high complexity, long-term dependencies, and strong volatility, making conventional imputation techniques inadequate in both accuracy and generalization. To address these limitations, this paper proposes AFMFormer, an adaptive frequency-aware multi-scale Transformer designed for imputation in new energy systems. First, Pearson correlation coefficients and maximal information coefficients are employed to select informative multivariate features, enhancing the relevance and quality of the input data. AFMFormer integrates an adaptive frequency-domain feature enhancement module that performs frequency decomposition and dominant-frequency amplification, emphasizing critical components within complex long sequences. Two parallel temporal branches, a patch-based Transformer for short-term dynamics and a standard Transformer for long-term dependencies, jointly capture comprehensive temporal representations. Finally, a feature fusion mechanism combines the outputs of both branches to generate the imputed sequences. Experimental results show that the proposed model significantly outperforms the baselines on all evaluation metrics; in particular, the mean squared errors on the wind and PV datasets are reduced by 49.3% and 31.5%, respectively, compared with the best baseline model, markedly improving imputation quality.
  • WANG Jiongjiong, ZHANG Shufen, DAI Jiajia, ZHANG Hanrui, ZHANG Yi
    Accepted: 2026-04-14
    Federated learning trains models by sharing model parameters rather than raw data, but it remains vulnerable to inference attacks, which motivates the integration of differential privacy techniques. To address the limitations of static parameter partitioning and uniform noise injection in conventional Differentially Private Federated Learning (DP-FL), this paper proposes an adaptive differentially private federated learning framework with parameter personalization, termed DP-FedADC. The framework introduces Adaptive Parameter Partitioning (APP) to dynamically analyze model parameters and to separate personalized parameters from shared parameters according to their importance. Based on this partitioning, a Differentiated Parameter Update (DPU) strategy is designed to apply distinct regularization constraints to different parameter types, which stabilizes critical parameter updates and mitigates the distortion of optimization directions caused by gradient clipping. In addition, a Client-level Adaptive Privacy Budget Allocation (CAPBA) strategy is proposed to dynamically adjust privacy budgets according to the proportion of personalized parameters at each client, enabling stronger protection for high-sensitivity clients while avoiding excessive noise perturbation on parameters that dominate global convergence. Experiments conducted on MNIST, CIFAR-10, and Fashion-MNIST demonstrate that under strict differential privacy constraints, DP-FedADC consistently improves classification accuracy, convergence speed, and training stability. Compared with existing baselines, the proposed method achieves up to a 2%–4% improvement in test accuracy and converges to a lower loss range, validating its effectiveness and robustness in differentially private federated learning scenarios.
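The clipping-and-noise mechanism that underlies differentially private federated aggregation, as discussed in the abstract above, can be sketched generically: each client update is clipped to a fixed L2 norm, the updates are averaged, and Gaussian noise calibrated to the clipping bound is added. This is the standard Gaussian mechanism, not DP-FedADC's adaptive variant; all names and parameters below are illustrative.

```python
import numpy as np

def dp_aggregate(updates, clip_norm, noise_mult, rng):
    """Clip each client update to L2 norm clip_norm, average, then add
    Gaussian noise scaled to the per-client sensitivity clip_norm / n."""
    clipped = []
    for u in updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / max(norm, 1e-12)))
    avg = np.mean(clipped, axis=0)
    sigma = noise_mult * clip_norm / len(updates)  # noise scale for the mean
    return avg + rng.normal(0.0, sigma, size=avg.shape), clipped

rng = np.random.default_rng(0)
updates = [np.array([3.0, 4.0]),   # large update, gets scaled down
           np.array([0.3, 0.4])]   # already within the clip bound
noisy_avg, clipped = dp_aggregate(updates, clip_norm=1.0, noise_mult=1.0, rng=rng)
```

Schemes like the paper's CAPBA would then vary `clip_norm` or `noise_mult` per client rather than using one fixed value for all.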
  • CAO Fu, XING Wenbin, ZUO Yong, ZHANG Ronghui, CHEN Junzhou
    Accepted: 2026-04-14
    Unstructured road segmentation is a crucial component of environmental perception for autonomous driving, facing challenges such as the integrity of global topological modeling, the preservation of boundary details, and the trade-off between model efficiency and accuracy. To address these challenges, this paper proposes a Lightweight Axial Context Network (AXON-Net). Employing an encoder-decoder architecture, the network introduces a Channel-and-Spatial Attention Block (CASAB) in the encoder, which adaptively recalibrates feature weights by aggregating multi-dimensional statistical information to effectively suppress environmental noise, thereby enhancing feature discriminability in complex backgrounds. A Lightweight Partial Context Transformer (LightPCT) is designed at the bottleneck, utilizing a partial channel interaction strategy to reduce computational redundancy and efficiently capture long-range dependencies to restore road topological connectivity. Furthermore, the decoder integrates Dual-Path Channel Fusion (DPCF) and Thin Structure Enhancer (TSE) modules, aiming to bridge the feature semantic gap and explicitly enhance axial geometric features for the refined recovery of blurred road edges. Experimental results on unstructured road datasets constructed from the India Driving Dataset (IDD) and the Off-Road Freespace Detection (ORFD) dataset show that AXON-Net achieves road Intersection over Union (IoU) scores of 95.3% and 88.1%, respectively, with only 8.49 M parameters, achieving a superior balance between segmentation accuracy and model efficiency. Ablation studies further validate the synergistic effectiveness of the proposed modules, demonstrating the network's potential application in unstructured road perception tasks.
  • PAN Yuquan, YUAN Deyu, CHENG Jialin, YE Naifu
    Accepted: 2026-04-13
    Cross-social-network user identity linkage aims to identify whether accounts on different social networks belong to the same natural person. To address the problem that existing methods struggle to overcome the negative impact of positive-negative sample imbalance on user identity linkage, a method based on MH-Node2vec is proposed. First, an efficient node embedding algorithm named MH-Node2vec is introduced, which incorporates Metropolis-Hastings sampling and an adaptive adjustment mechanism for key parameters to process user nodes from different social networks and generate user feature vectors. Second, an input vector concatenation strategy based on an attention mechanism is proposed, which effectively integrates user features from distinct social networks. Finally, based on conclusions drawn from analyzing the simplest social networks, a model named wF-MLP is proposed that incorporates weight factors and Focal loss. Comparative experiments were conducted on the same datasets against existing models such as WLAlign and CrossMNA. The results show that the proposed model achieves improvements of 7.8% and 5.1% in F1 score over the state-of-the-art methods on the two datasets, respectively, and attains the best performance across all evaluation metrics, demonstrating the effectiveness of the model.
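The Focal loss used to counter positive-negative sample imbalance, as in the wF-MLP model above, has a standard closed form: FL = -α_t (1 - p_t)^γ log(p_t), which down-weights easy examples. The sketch below is the generic binary formulation with a class weight factor, not the paper's exact weighting; the names are illustrative.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss with class weight factor alpha:
    FL = -alpha_t * (1 - p_t)**gamma * log(p_t), averaged over samples."""
    p_t = np.where(y == 1, p, 1 - p)          # prob. assigned to true class
    a_t = np.where(y == 1, alpha, 1 - alpha)  # per-class weight factor
    return np.mean(-a_t * (1 - p_t) ** gamma * np.log(p_t))

p = np.array([0.9, 0.2, 0.7])  # predicted probabilities of the positive class
y = np.array([1, 0, 1])        # ground-truth labels
```

With γ = 0 the expression reduces to weighted cross-entropy; γ > 0 shrinks the contribution of well-classified samples, so training focuses on the hard, minority-class examples.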
  • WANG Xiaosheng, FANG Xiaohong, YANG Hao, LIU Yining, GUO Qiaosheng, LIU Chaofei
    Accepted: 2026-04-08
    A challenge in speech enhancement is that existing Transformer-based methods model local features insufficiently, making it difficult to accurately restore high-frequency details and transient components in speech. To address this issue, a U-Net speech enhancement network integrating a time–frequency Transformer was designed, aiming to improve denoising performance by refining the attention mechanism and feature fusion. The network incorporates a parallel time–frequency joint attention module that explicitly distinguishes and processes time-domain and frequency-domain data in parallel. Additionally, a local–global feature collaboration module is introduced at the bottleneck layer, combining the multi-scale local feature extraction capability of densely connected atrous spatial pyramid pooling with the global modeling advantages of the Transformer. This module employs a dynamic feature calibration mechanism to achieve synergy between multi-scale local context and global dependencies, thereby enhancing perception of speech structure. The network adopts a spectral mapping approach, converting speech into a time–frequency representation via the short-time Fourier transform, processing it, and then reconstructing the time-domain signal through the inverse short-time Fourier transform. On a 10-hour training set and a 1-hour validation set constructed from the clean speech dataset LibriSpeech and the noise datasets ESC-50 and the Columbia University noise library, the network achieved excellent performance on multiple objective metrics: PESQ reached 3.37, STOI was 97%, and SI-SDR reached 19.97 dB, surpassing several existing state-of-the-art models.
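The STFT/ISTFT round trip at the core of the spectral mapping pipeline above can be sketched with a minimal overlap-add implementation; the window length 64, hop 32, and function names are illustrative choices, and a periodic Hann window at 50% overlap is used so that the shifted windows sum to one (the COLA condition), making interior samples reconstruct exactly.

```python
import numpy as np

def stft(x, n_fft=64, hop=32):
    """Windowed frames -> one-sided spectra (the spectral mapping input)."""
    win = np.hanning(n_fft + 1)[:-1]   # periodic Hann: COLA at 50% overlap
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def istft(spec, n_fft=64, hop=32):
    """Overlap-add reconstruction; interior samples are recovered exactly
    because the shifted analysis windows sum to one."""
    out = np.zeros(hop * (len(spec) - 1) + n_fft)
    for k, frame in enumerate(spec):
        out[k * hop:k * hop + n_fft] += np.fft.irfft(frame, n_fft)
    return out

x = np.sin(2 * np.pi * 5 * np.arange(512) / 512)   # toy stand-in for speech
rec = istft(stft(x))                               # round trip
```

An enhancement network operates between these two calls, e.g. by predicting a clean magnitude spectrum or a mask applied to `stft(x)` before `istft` resynthesizes the waveform.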
  • CHEN Yuxuan, LIU Yajun, MO Jiaqing, ZHOU Gang
    Accepted: 2026-04-08
    Major depressive disorder is a prevalent and severe mental illness, and early, accurate diagnosis is crucial for treatment intervention. Functional magnetic resonance imaging (fMRI) provides non-invasive neuroimaging evidence for depression diagnosis and supports the construction of detailed brain functional connectivity for diagnostic use. However, when processing brain functional connectivity data, traditional deep learning methods neglect global temporal dynamics and have difficulty modeling high-order interactions among multiple brain regions. To address these issues, an auxiliary diagnosis method for depression based on a spatio-temporal cross-attention hypergraph neural network is proposed. Taking brain functional connectivity graphs constructed from fMRI data as its input, the method captures the temporal dynamics of brain region signals through a temporal branch, models high-order correlations between brain regions via a spatial branch, and achieves deep fusion of the two feature types using a spatio-temporal cross-attention module. Experimental verification on a large-scale multi-center dataset shows that the proposed model achieves an average accuracy of 83.74%, sensitivity of 73.76%, and specificity of 93.39%, a significant improvement over other methods. Ablation experiments verify the effectiveness of the spatial branch, the temporal branch, and the spatio-temporal cross-attention module, providing a new technical solution for clinical auxiliary diagnosis of depression.
  • ZHU Yijian, MAO Ruirui
    Accepted: 2026-04-08
    Generative models have achieved remarkable results owing to their effective data generation capabilities and have been widely applied to recommendation systems in recent years. Generative recommendation directly learns, through probabilistic modeling, the latent distribution of users' historical behaviors and generates possible interactions, breaking through the traditional retrieval paradigm and becoming a research hotspot in recommendation systems. However, existing generative recommendation systems suffer from insufficient stability due to the randomness of the generation process, and their limited representation learning ability affects the accuracy of personalized recommendations. To solve these problems, a generative adversarial recommendation method based on a diffusion model is proposed. Specifically, to alleviate the resource consumption of diffusing over raw vectors directly, the original vectors are first compressed with a Variational Autoencoder (VAE). A diffusion model then performs multi-step noising and denoising in the latent space to learn high-quality user representations. In addition, an adversarial training mechanism is introduced to provide feedback signals for the denoising process, alleviating the uncontrollability of the generation process. Experiments were conducted on three public datasets: Amazon-book, Yelp, and MovieLens-1M. Compared with the baseline models, the proposed method achieves significant improvements in Recall and Normalized Discounted Cumulative Gain (NDCG), indicating that it can effectively predict user behavior and improve recommendation accuracy.
  • Zhongwei Li, Siyuan Nie, Leiquan Wang, Dekun Yuan, Yanping Qi
    Accepted: 2026-04-08
    The Segment Anything Model (SAM) has been widely applied in diverse downstream tasks. However, the complexity of species morphology, high transparency, and varying sizes of marine zooplankton pose significant challenges to the adaptability of existing segmentation models, often resulting in low segmentation accuracy. Moreover, the lack of marine zooplankton image datasets has impeded the exploration of SAM for instance segmentation in this field. To address this, this paper constructs a Marine Zooplankton Instance Segmentation (MZIS) dataset with pixel-level fine-grained annotations, containing 1,908 zooplankton images across 25 species categories. Furthermore, it proposes a SAM-based marine zooplankton instance segmentation framework, called MZIS-SAM. Specifically, to compensate for the lack of semantic category information, MZIS-SAM first introduces a Zooplankton Microimages Adaptive ViT (ZMA-ViT) encoder to extract visual feature prompts of zooplankton and incorporate them into the network. Subsequently, to enhance the multi-scale feature representation of zooplankton, a Multi-Scale Dilated Attention Aggregation Module (MDAAM) is designed to progressively integrate multi-level features from SAM's encoder. Finally, MZIS-SAM devises a Feature Prompt Generation Module (FPGM) to automatically generate visual feature prompts for end-to-end segmentation. Experimental results on the MZIS dataset show that, compared with existing instance segmentation methods, MZIS-SAM achieves state-of-the-art performance with scores of 77.0%, 97.7%, and 85.8% on the three evaluation metrics, respectively.
  • JIANG Wenhao, DING Xue, WANG Xiang, MA Li, MENG Xianghe, HE Xiangzhen
    Accepted: 2026-04-07
    Music generation has witnessed rapid advancement in the age of artificial intelligence, with traditional music creation processes being gradually replaced by deep learning-based generative models. In recent years, in particular, the application of technologies such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), Transformer architectures, diffusion models and large language models has offered entirely new ideas and approaches for music creation. This paper systematically reviews the latest research progress of artificial intelligence in music generation, focusing on the technological evolution from discrete symbolic representation to continuous audio waveform generation, and especially the breakthroughs achieved in multimodal generation, emotional expression, creative control and other aspects. Meanwhile, it elaborates on the practical applications of various generative models in diverse scenarios including entertainment and mass consumption, professional music production, music education, music therapy and health, as well as games and interactive media, and evaluates the advantages, disadvantages and current challenges of different technologies from the perspectives of generation quality, structural consistency, computational efficiency and user controllability. Finally, it discusses the future development trends of artificial intelligence in music creation, such as strategies for improving generation quality, human-machine collaborative creation modes and potential paths for in-depth integration with the music industry, thus providing a reference for further research in this field.
  • YIN Weiliang, Liu Bing, Luo Shanjun, Huang Liang, Chen Xiaohui
    Accepted: 2026-04-02
    Person Re-identification (Re-ID) is frequently challenged by complex factors such as variations in viewpoint, pose, and occlusion. Existing mainstream deep learning methods primarily rely on the statistical similarity of visual features for matching. While these methods perform well in general scenarios, they often lack high-level semantic understanding and logical reasoning mechanisms. Consequently, they struggle to capture fine-grained differences when distinguishing "hard samples" with similar appearances, leading to accuracy bottlenecks. To address these issues, this paper proposes a two-stage Re-ID method featuring a collaboration between small and large models, designed to integrate the efficiency of specialized small models with the robust discriminative power of general Multimodal Large Language Models (MLLMs). The first stage is a rapid recall phase, where a lightweight deep learning model is combined with the K-reciprocal nearest neighbor algorithm to retrieve candidates. This stage filters a small set of highly relevant candidates from the massive gallery, significantly reducing the data scale for subsequent processing while ensuring a high recall rate. The second stage is a precise refinement phase, where a pre-trained MLLM serves as a discriminator to accurately screen the candidate set by leveraging its powerful multimodal understanding capabilities. This collaborative two-stage approach effectively balances inference speed and recognition accuracy. Experimental results on the Market-1501 and DukeMTMC-reID datasets demonstrate that the proposed method achieves Rank-1 accuracies of 98.5% and 96.5%, respectively. These results represent significant improvements of 2.8% and 6.5% over the CLIP-ReID method, fully validating the effectiveness of the proposed approach.
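    The K-reciprocal filtering used in the rapid recall phase can be sketched as follows (a minimal illustration on a precomputed distance matrix; the full K-reciprocal re-ranking method also re-weights distances with Jaccard similarity, which is omitted here):

```python
def knn(dist, i, k):
    """Indices of the k nearest neighbors of item i (excluding itself)."""
    order = sorted(range(len(dist)), key=lambda j: dist[i][j])
    return [j for j in order if j != i][:k]

def k_reciprocal(dist, query, k):
    """Keep only neighbors j of the query such that the query is also
    among j's k nearest neighbors -- a stricter, more reliable candidate set."""
    cands = knn(dist, query, k)
    return [j for j in cands if query in knn(dist, j, k)]

# Toy gallery: items at positions 0, 1, 2 and a distant outlier at 10
pts = [0.0, 1.0, 2.0, 10.0]
D = [[abs(a - b) for b in pts] for a in pts]
cand = k_reciprocal(D, 0, 2)
```

The surviving candidates would then be passed to the MLLM discriminator in the second stage.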
  • Zhou Zesheng, Li Ping
    Accepted: 2026-04-02
    To address the performance degradation of efficient Transformer models in noisy text classification scenarios, this study proposes a robust and efficient classification method that integrates a dynamic low-rank attention mechanism with a dual-view consistency constraint. The proposed approach adaptively adjusts the attention rank based on the variance of input features, allocating higher ranks to semantically complex samples to enhance representation capacity and lower ranks to simpler samples to maintain near-linear computational complexity, thus achieving a dynamic balance between expressiveness and efficiency. During training, a dual-view consistency mechanism is introduced by constructing clean and perturbed text views and enforcing consistency between their semantic representations, which suppresses noise-induced shifts in the decision boundary and further improves robustness. Extensive experiments on multiple Chinese and English text classification datasets — including sentiment analysis, topic identification, and fine-grained emotion classification — demonstrate that the proposed method outperforms fixed-rank baselines in terms of accuracy and exhibits more stable performance across various noise types and intensities. This study provides a novel solution for achieving efficient and robust text classification in complex noisy environments.
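    The variance-to-rank mapping can be sketched as follows (the squashing function and rank bounds are illustrative assumptions; the paper's exact rule is not given in the abstract):

```python
def select_rank(features, r_min=4, r_max=64, tau=1.0):
    """Map input-feature variance to an attention rank: semantically complex
    (high-variance) inputs get a higher rank for more expressive attention,
    simple inputs get a lower rank for near-linear cost."""
    n = len(features)
    mean = sum(features) / n
    var = sum((x - mean) ** 2 for x in features) / n
    # squash variance into [0, 1) and interpolate between the rank bounds
    gate = var / (var + tau)
    return r_min + round((r_max - r_min) * gate)
```

A zero-variance (constant) input falls back to the minimum rank, while highly dispersed features approach the maximum.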
  • MA Handa, OUYANG Tao
    Accepted: 2026-04-02
    To address the limitations of existing relation triplet extraction methods in complex contexts, including insufficient multi-relation semantic representation and difficulty in extracting implicit relations, this paper proposes a dual-channel joint encoding model with attention mechanisms, termed AMJERE (Attention-Mechanism Joint Encoding for Relation Extraction). The model constructs independent yet interactive sentence and relation encoding channels to enhance the completeness and discriminative ability of relation semantic representations. Specifically, AMJERE employs a sentence–relation dual-channel independent encoding architecture to separately represent input sentences and candidate relations, reducing semantic interference. A relationship fusion module based on self-attention is introduced to enhance implicit relation modeling by incorporating sentence contextual information. Furthermore, a cross-channel attention mechanism enables deep semantic interaction between sentence and relation representations, capturing latent dependencies between entities and relations and producing compact joint representations. Finally, multiple linear classifiers are used to perform relation prediction and entity label identification, achieving joint extraction of relation triplets. Experimental results on the NYT and WebNLG datasets demonstrate that AMJERE outperforms several baseline models in terms of precision, recall, and F1 score, achieving F1 values of 93.3% and 93.5%, respectively. Ablation studies and qualitative analyses further verify the effectiveness of the proposed model.
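    The cross-channel interaction described above is a form of scaled dot-product attention; below is a minimal single-head sketch, assuming sentence tokens act as queries over candidate-relation embeddings (the abstract does not fix the query/key roles, so that assignment is an assumption):

```python
import math

def attention(queries, keys, values):
    """Single-head scaled dot-product attention: each query attends over
    all keys and returns a weighted mixture of the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)                       # stabilize the softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        w = [e / z for e in exps]
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

# One sentence token attending over two candidate-relation embeddings
mixed = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [0.0]])
```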
  • Long Haiqing, Li Mao
    Accepted: 2026-04-02
    Interactive image retrieval breaks the traditional single-query-return-results paradigm by reshaping retrieval into a multi-turn iterative dialogue, allowing users to dynamically guide and refine their intentions based on preliminary results. Text and sketch, as two intuitive and complementary query modalities, offer significant advantages in scene-level image retrieval by effectively expressing complex visual requirements. However, existing methods often rely on a "latest-is-best" interaction assumption, and their evaluation metrics typically focus only on whether the target is retrieved in any round, ignoring real-world challenges such as noisy feedback, evolving user intent, and insufficient ranking stability. Moreover, sketches are highly abstract and carry user-dependent drawing uncertainty, and existing static retrieval models lack the ability to refine ambiguous or incomplete initial inputs through interaction, limiting their practicality and robustness. To address these issues, this paper proposes an interactive text-and-sketch-based scene-level image retrieval framework named IScene. The framework designs three core modules: dialogue rewriting, similarity optimization selection, and visual extension, constructing a retrieval pipeline that progressively refines semantics, maintains discriminative stability, and enhances visual representation. Additionally, to support interactive research, the first multi-turn dialogue dataset for this task is constructed. Experimental results demonstrate that IScene significantly outperforms existing baseline methods in retrieval accuracy and stability across multiple datasets, providing an effective solution for more natural and robust interactive scene retrieval.
  • HAO Guanyi, SUN Jingchao
    Accepted: 2026-04-01
    In the digital era, the complex interactions between modalities such as text, images, and audio have given rise to multimodal misinformation. Its propagation speed and concealment level far exceed those of traditional unimodal misinformation, posing severe challenges to information security and social governance. However, research in this field is relatively scarce in China, and a comprehensive framework has yet to be established. Therefore, this study systematically reviews the research status and development trajectory of multimodal misinformation detection, providing a comprehensive summary of this field. Based on a clear understanding of the core concepts and task spectrum of multimodal misinformation detection, the study details the characteristics of datasets and evaluation metrics. It also analyzes the applicability and detection performance of different multimodal methods and models, such as SAFE, CAFE, CFFN, SSA-MFND, PSCC-Net, DGM4, CCN, SNIFFER, and KGAlign. The study summarizes three core detection methods: cross-modal consistency, anomaly feature recognition, and external fact-driven approaches. Furthermore, it explores the interpretability and generalization robustness of multimodal misinformation detection. With the rise of large-scale visual-language models (LVLM), their application in multimodal misinformation detection is continuously deepening. This study reviews various application scenarios, advantages, and limitations of LVLMs in this domain. Finally, the paper outlines future research directions in multimodal misinformation detection, aiming to provide insights and inspiration for the further development of this field.
  • Tiejun Wang, Ziyi Lu, Xiaoyan Hu, Mengyang Kang, Wenhao Wang, Kaiyan Wang, Chengjie Xu
    Accepted: 2026-03-30
    Existing methods for inpainting bamboo slip text images struggle with structural-texture confusion, complex degradation, and low text-background contrast, often causing structural damage, instability, and artifacts. This paper proposes AmdmaNet, a multi-granularity feature-guided inpainting network. It separately reconstructs texture and structural features to avoid semantic confusion. A Multi-scale Dynamic-range Map Attention (Mdma) mechanism classifies pixels by degradation level, preventing over/under-inpainting. An Adaptive Mask-aware Pixel-shuffle Downsampling (Ampd) method weights damaged pixels using surrounding information and guides downsampling to prevent mask shift, reducing artifacts, blur, and mosaics. Experiments on a custom dataset show our method outperforms state-of-the-art approaches in both visual quality and metrics, demonstrating superior robustness for complex cases like broken strokes and background noise.
  • LIN Suqing, WU Jingheng, CHEN Qixuan, YAN Ming
    Accepted: 2026-03-30
    The rapid growth of tourism renders personalized POI recommendation essential for user experience. However, such recommendation encounters feature extraction obstacles caused by extreme interaction sparsity and semantic fragmentation in short reviews. Traditional probabilistic topic models struggle to capture latent semantic correlations due to their reliance on word co-occurrence statistics, and iterative deep learning based on back-propagation is prone to gradient instability and training inefficiency. This paper proposes DeepTSN, a deep learning recommendation framework integrating semantic-enhanced topic modeling. By introducing the semantic-clustering-enhanced topic model SynTopic, short-text representation is enhanced via an LLM-constructed topic library; redundancy is removed and similar topics are merged using BERT-Chinese based clustering. This process extracts latent topic features to compensate for missing data. High-dimensional vectors are constructed through feature integration to capture non-linear interactions, and a sampling network is integrated to reconstruct the data distribution via adaptive probability density sampling. By employing a constructive learning mechanism to analytically determine network weights, the proposed method effectively mitigates interference from missing data and resolves convergence challenges, significantly enhancing both recommendation accuracy and training efficiency. Experiments on multi-source datasets demonstrate that DeepTSN outperforms baselines across real-world and public scenarios with varying interaction densities. The model reduces MAE by up to 21.34% and 12.72%, and MSE by 22.89% and 7.32%, respectively. Furthermore, it cuts runtime by approximately 61.69% and peak memory by 72.87%.
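    Determining network weights analytically, as constructive learning does, typically means solving a regularized least-squares problem for the output layer instead of back-propagating; below is a minimal sketch for a two-column hidden-feature matrix (the explicit 2x2 closed form is for illustration only, not the paper's actual solver):

```python
def analytic_output_weights(H, y, lam=1e-6):
    """Closed-form ridge solution w = (H^T H + lam*I)^{-1} H^T y for a
    two-column feature matrix H: the output weights are computed in one
    shot, avoiding iterative gradient descent entirely."""
    a = sum(h[0] * h[0] for h in H) + lam   # (H^T H)[0][0] + lam
    b = sum(h[0] * h[1] for h in H)         # (H^T H)[0][1]
    d = sum(h[1] * h[1] for h in H) + lam   # (H^T H)[1][1] + lam
    p = sum(h[0] * t for h, t in zip(H, y)) # (H^T y)[0]
    q = sum(h[1] * t for h, t in zip(H, y)) # (H^T y)[1]
    det = a * d - b * b
    return [(d * p - b * q) / det, (a * q - b * p) / det]

# Targets generated by w = [1, 2] are recovered exactly (up to regularization)
w = analytic_output_weights([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.0])
```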
  • ZHANG Ke, Li Fei
    Accepted: 2026-03-30
    To address the insufficient representation of original sequence features and the information loss caused by the decomposition strategy of existing "decomposition-ensemble" forecasting models in long-term time series prediction tasks, this paper proposes a High-Dimensional Feature Series Enhancement Network (HDFSENet) incorporating an attention mechanism. The network integrates embedding techniques, the Mixture of Experts Decomposition (MOEDecomp) block, and the Feature Series Enhancement (FSE) block to capture the inherent characteristics of time series while reducing information loss in decomposition strategies. Firstly, the method strengthens the feature information of the original time series through three embedding techniques: value embedding, position embedding, and temporal embedding. Secondly, the enhanced time series is decomposed into trend feature series and seasonal feature series via the MOEDecomp block. Subsequently, an FSE block based on the attention mechanism is constructed to capture the interactions between the decomposed trend and seasonal feature series, thereby improving the representation capability of these features. Afterwards, these interaction features are integrated into the model as key variables to further enhance forecasting accuracy. Finally, the effectiveness of the model is verified on multiple benchmark datasets. Experimental results demonstrate that HDFSENet significantly outperforms several benchmark models in evaluation metrics such as MSE and MAE, indicating that the proposed model provides a reliable approach for more accurate time series forecasting.
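    The decomposition step can be sketched as a mixture of moving-average experts, with the seasonal part taken as the residual (the kernel sizes and uniform expert weights below are illustrative assumptions; MOEDecomp learns the mixture weights):

```python
def moving_avg(x, k):
    """Moving average with edge padding: the usual trend extractor in
    decomposition-based forecasters."""
    pad = k // 2
    xp = [x[0]] * pad + x + [x[-1]] * (k - 1 - pad)
    return [sum(xp[i:i + k]) / k for i in range(len(x))]

def decompose(x, kernels=(3, 5), weights=None):
    """Mixture-of-experts style decomposition: the trend is a weighted
    average of several moving averages; the seasonal part is the residual,
    so trend + seasonal reconstructs the input exactly."""
    if weights is None:
        weights = [1.0 / len(kernels)] * len(kernels)
    trend = [sum(w * t for w, t in zip(weights, col))
             for col in zip(*(moving_avg(x, k) for k in kernels))]
    seasonal = [a - b for a, b in zip(x, trend)]
    return trend, seasonal
```

The two resulting series would then feed the attention-based FSE block that models their interactions.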
  • JU Hongzheng , TANG Jianhang , ZHANG Yang , JING Kebing
    Accepted: 2026-03-30
    In recent years, an increasing number of studies have focused on modeling users’ multiple interests from their behavioral sequences in order to better capture complex user preferences. However, in implicit modeling scenarios where external auxiliary information such as item categories is unavailable, existing multi-interest models often struggle to accurately determine the interest attribution of individual behaviors. As a result, items that are weakly related or even irrelevant to the target interest are easily aggregated into the same interest representation, leading to the introduction of interest-specific noise. To address this issue, we propose a two-stage denoising multi-interest recommendation algorithm, termed DMIRec, which suppresses interest-specific noise at both the item-feature level and the interest-representation level. In the item denoising stage, learnable adaptive filters are employed to filter out irrelevant item features within each interest, yielding denoised behavior sequences for different interests. In the interest denoising stage, a conditional diffusion model is introduced, where items highly related to the current interest serve as guidance signals to iteratively remove noise components from the corresponding interest representations. Furthermore, to enhance the overall denoising effectiveness, we design a target-guided multi-interest loss that explicitly incorporates the recommendation target into the multi-interest learning process. This loss encourages appropriate responsibility assignment among different interests and reduces the influence of interest-specific noise from an optimization perspective. Experiments conducted on three real-world datasets, Book, Beauty, and Retail Rocket, show that, compared with the best Top-50 recommendation results among baseline models, the proposed method achieves improvements of 8.84%, 2.03%, and 2.27% in Recall; 9.78%, 0.95%, and 0.72% in Hit Rate (HR); and 9.07%, 3.87%, and 2.49% in Normalized Discounted Cumulative Gain (NDCG), respectively. These results demonstrate the effectiveness and robustness of the proposed approach.
  • Liang Hao, Bohejun Su, Jinghua Wang, Yong Xu
    Accepted: 2026-03-27
    Model quantization technology effectively reduces model storage and computational overhead by mapping high-precision floating-point data to low-bit discrete spaces. A core focus of model quantization research is how to rationally account for the characteristics of parameter distributions to construct superior mapping schemes. Existing Post-Training Quantization (PTQ) schemes nearly universally assume that the data distribution of non-activation layers follows a symmetric bell-shaped curve, but overlook the fact that small biases introduced by the model’s activation layers and inputs induce distributional asymmetry. Consequently, the resulting quantization mapping is skewed to one side due to this subtle asymmetry, leading to significant approximation loss. This paper investigates quantization schemes for image super-resolution and proposes improvements to the widely recognized two-stage post-training quantization scheme. First, the max-min-based equal partitioning employed in the pre-search for quantization bounds is modified to a sorting-based non-uniform partitioning approach. Second, a bias term is introduced during the pseudo-quantization process, where a portion of the data and its mean are adaptively shifted to mitigate estimation loss caused by data bias. The improved scheme outperforms the original counterpart across almost all performance metrics while retaining the same high compression ratio and acceleration ratio: compared to the original SwinIR-light model, it reduces parameter count by approximately 67.4% and accelerates the super-resolution process by 3.99×.
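    The bias-shifted quantization idea can be sketched as follows (a simplified uniform quantizer that re-centres the data on its mean before mapping to integer levels; the paper's sorting-based non-uniform bound search is not reproduced here):

```python
def quantize(xs, bits=8):
    """Bias-shifted uniform quantization: subtracting the mean re-centres a
    skewed distribution so the integer grid is not wasted on one tail.
    Returns the integer codes and the dequantized reconstruction."""
    bias = sum(xs) / len(xs)
    shifted = [x - bias for x in xs]
    lo, hi = min(shifted), max(shifted)
    scale = (hi - lo) / (2 ** bits - 1) or 1.0   # guard constant inputs
    codes = [round((x - lo) / scale) for x in shifted]
    deq = [c * scale + lo + bias for c in codes]
    return codes, deq

codes, deq = quantize([0.0, 1.0, 2.0, 3.0])
```

With asymmetric real data, the mean shift reduces the one-sided clipping loss that a symmetric mapping would incur.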
  • Lin Cao, Zhanqi Zhang, Benkui Zhang, Ying Chang, Zhizhe Liu, Kangning Du, Yanan Guo
    Accepted: 2026-03-27
    With the rapid growth of cyber-physical systems, massive time series data are continuously collected by sensors. Timely and accurate anomaly detection in such data is crucial for maintaining system stability and preventing potential risks. Due to the scarcity and imbalance of anomalous samples, time series anomaly detection is often modeled as an unsupervised learning task. In particular, contrastive learning leverages the latent consistency shared by normal samples across different views. By minimizing the representation distance between different augmented views of the same sample, it constructs a more compact and discriminative normal feature space. This significantly enhances the separability between normal and abnormal patterns, making it a highly promising mainstream paradigm in the field. Although contrastive learning–based methods have achieved notable progress, they still struggle to capture complex contextual variations in time series, limiting detection performance. To address this challenge, we propose Dual-Branch Intra- and Inter-Sample Representation Learning for Time Series Anomaly Contrastive Detection (I2CD). The framework explores hierarchical contextual dependencies within samples while leveraging inter-sample information to enhance normal variation patterns, enabling more discriminative representations for abnormal changes. Specifically, we design a multi-expert temporal pyramid module to adaptively capture hierarchical dependencies in multivariate sequences. In addition, we introduce a prototype-guided normal pattern enhancement module that builds inter-sample information interactions using representative prototypes of normal patterns, suppressing anomalous variations and enlarging the representational gap between normal and abnormal samples. Experiments on six real-world benchmark datasets demonstrate the effectiveness and robustness of our approach in time series anomaly detection.
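    The contrastive objective underlying this paradigm, pulling two augmented views of the same sample together while pushing other samples away, can be sketched as an InfoNCE loss (the temperature and cosine similarity below are conventional choices, not taken from the paper):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE: low loss when the anchor is much closer to its augmented
    view (positive) than to other samples (negatives)."""
    logits = [cosine(anchor, positive) / tau] + \
             [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)                                   # log-sum-exp trick
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_z)
```

An anomalous sample, which cannot be mapped close to its own views in the compact normal feature space, yields a high loss that can serve as an anomaly score.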
  • Lu Xiaochen, Wang Shenglan, Zhong Yan, Zhang Jingjing, Zhang Lei
    Accepted: 2026-03-27
    In recent years, deep learning has achieved growing success across research fields such as computer vision, in which activation functions play an important role in enhancing the nonlinear fitting capability of deep neural networks. However, existing activation functions such as ReLU and SiLU have revealed increasing issues as research progresses, including vanishing or dying gradients and the lack of adaptive regulation in the negative region. This paper proposes a new activation function, Adaptive Parametric Softplus-Sigmoid (APSS), for preserving salient features and dropping irrelevant ones in common object detection and recognition tasks, aiming to extract and learn multi-scale collaborative features from complex backgrounds. The activation function is based on the base-gate combination mechanism found in biological neuroscience: the base unit ensures the learnability of basic features and gradient stability, while the gate unit suppresses invalid features by dynamically adjusting the response intensity in the negative region. The combination of the two units helps the network balance the retention and suppression of features. To verify the advantages of this activation function, this paper conducts comparative experiments with several typical object detection and recognition network prototypes on three experimental datasets: SoccerNet, UA-DETRAC, and BEEF24. The results show that the proposed APSS activation function significantly outperforms the activation functions in the original network models, offering better target feature extraction and fitting capabilities.
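    The abstract does not give the APSS formula, so the following is only a plausible base-gate composition consistent with the description: a softplus "base" multiplied by a sigmoid "gate", with hypothetical learnable parameters alpha and beta controlling the negative region:

```python
import math

def softplus(x):
    """Numerically stable softplus: log(1 + e^x)."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def apss(x, alpha=1.0, beta=1.0):
    """Hypothetical base-gate activation (NOT the paper's exact formula):
    the softplus base keeps gradients alive everywhere, while the sigmoid
    gate adaptively damps the negative region; alpha and beta stand in for
    the learnable parameters."""
    return softplus(alpha * x) * sigmoid(beta * x)
```

For large positive inputs the function approaches the identity, while negative inputs are smoothly suppressed rather than hard-zeroed as in ReLU.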
  • Anbo Huang, Haicheng Qu
    Accepted: 2026-03-24
    The rapid growth of the open-source ecosystem has accelerated the spread of software vulnerabilities, posing significant threats to information security. Sequence-based deep learning methods struggle to capture the structural characteristics of source code, while existing graph neural network–based approaches fail to sufficiently integrate topological structures with node features. To address these challenges and overcome the limitations of current deep learning–based techniques, we propose MVGE-Net, a source code vulnerability detection method that integrates multi-view graph representations with edge-type information. In MVGE-Net, source code is first transformed into a graph representation. Then, depending on the semantic richness of the nodes, different pretrained models are utilized to obtain node embeddings. Subsequently, topology graphs, feature graphs, and shared graphs are constructed from multiple perspectives to capture complementary information. Meanwhile, edge-type information is incorporated into node features to enhance representational capability. Finally, a lightweight gating mechanism fuses the extracted features to generate the final vulnerability prediction. Experiments conducted on two benchmark datasets show that our method achieves improvements of 9.14, 9.13, 1.75, and 5.74 percentage points in Accuracy, Precision, Recall, and F1 score, respectively, compared with the baseline method Devign. Both qualitative and quantitative analyses confirm the effectiveness of the proposed approach. Overall, MVGE-Net addresses the limitations of existing GNN-based methods and provides a more robust and efficient solution for vulnerability detection.
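    A lightweight gated fusion of two feature views can be sketched as follows (the scalar gate parameterization here is purely illustrative; MVGE-Net learns its gate from the data):

```python
import math

def gated_fuse(u, v):
    """Blend two feature views element-wise with a single scalar gate
    g = sigmoid(mean(u) - mean(v)); when g -> 1 the fused vector follows
    view u, when g -> 0 it follows view v."""
    g = 1.0 / (1.0 + math.exp(-(sum(u) / len(u) - sum(v) / len(v))))
    return [g * a + (1.0 - g) * b for a, b in zip(u, v)]
```

In MVGE-Net the inputs would be the topology-view and feature-view node representations, and the fused vector feeds the final classifier.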
  • Huang Tianyi, Zhang Cong, Liu Shiyi, Zuo Jiayi, Wang Zheng
    Accepted: 2026-03-24
    Fine-grained image-text matching technology achieves high-quality image-text matching by aligning visual and semantic fragments such as regions in images and words in sentences. Although existing studies have made significant progress at the region-word alignment level, in the text-word aggregation step the aggregation strategy struggles to adapt to the text length and the semantic distribution of words, which leads to the loss of semantic information and ultimately reduces the overall matching accuracy. To solve this problem, this study proposes a Lightweight Dynamic Aggregator (LDA). The LDA consists of a micro neural network and a Softmax function. It dynamically generates the weights for summation and mean aggregation by analyzing the text length and the semantic distribution of words. The LDA network first projects the input text features into a high-dimensional space and performs a nonlinear transformation to capture complex interactions, and then maps them back to a low-dimensional space to compress the features. To prevent the loss of feature information during the transformation, the network uses residual connections to enhance the information flow, and finally normalizes through the Softmax function to stabilize the weights. Experimental results show that the proposed method outperforms existing advanced algorithms on public datasets. On the Flickr30K dataset, it achieves the best overall score and top performance on all metrics in the text-to-image retrieval direction, with a 2.1% improvement on R@1. On the 1K and 5K test sets of the MS-COCO dataset, it achieves the best overall retrieval score and demonstrates comparable or superior performance across all metrics in both directions, while introducing only negligible additional computational overhead. This work not only verifies the significance of jointly optimizing for text length and semantic distribution in the aggregation stage, but also provides an efficient and robust new aggregation idea for fine-grained image-text matching.
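    The weighted blend of sum- and mean-pooling that LDA produces can be sketched as follows (here the two scores are passed in directly; in LDA they come from the micro network conditioned on text length and word-feature statistics):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def aggregate(word_feats, sum_score, mean_score):
    """Blend sum-pooling and mean-pooling of word features using softmax
    weights over the two (hypothetical) scores a micro network would emit."""
    n, d = len(word_feats), len(word_feats[0])
    total = [sum(f[j] for f in word_feats) for j in range(d)]
    mean = [v / n for v in total]
    a, b = softmax([sum_score, mean_score])
    return [a * s + b * m for s, m in zip(total, mean)]

# Two word vectors; equal scores give an even sum/mean blend
agg = aggregate([[1.0, 2.0], [3.0, 4.0]], 0.0, 0.0)
```

Longer texts would typically push the weights toward mean-pooling to avoid magnitude inflation, while short texts can favor the sum.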