Just accepted

  • ZHENG LEYU, LI KE, REN YI, ZHANG LEI
    Accepted: 2025-12-12
    Evolutionary algorithms have demonstrated strong performance in solving constrained multi-objective optimization problems (CMOPs). However, when the unconstrained Pareto front (UPF) and the constrained Pareto front (CPF) do not intersect and lie far apart, the evolutionary process often lacks effective differentiation. This leads to the transfer of negative individuals and a shortage of diverse feasible solutions, which can hinder population convergence and degrade overall optimization performance. To address these issues, this paper proposes a Problem-Type Guided Dynamic Knowledge Transfer Cooperative Evolutionary Algorithm (DKTCEA) consisting of two phases: independent exploration and cooperative evolution. In the independent exploration phase, the main task leverages prior knowledge from the auxiliary task to cross infeasible regions, identify the problem type, and design a differentiated evolutionary strategy that guides population evolution in the next phase. In the cooperative evolution phase, the auxiliary task introduces an improved ε-constraint handling mechanism to enhance solution feasibility. Furthermore, an improved knowledge transfer strategy selects which individuals of the source task are transferred to the target task. This minimizes the transfer of negative individuals, improves population quality, and enhances the global convergence of the main task population. Compared with five state-of-the-art constrained multi-objective optimization algorithms on 23 problems from the MW and DOC test suites, DKTCEA achieved the best Inverted Generational Distance (IGD) on 14 problems and the best Hypervolume (HV) on 11. Ablation experiments further validate the effectiveness of the proposed strategies.
  • TANG Na, LI Hao, LI Jing-Jing, CHEN Wei-Qi, TANG Yong
    Accepted: 2025-12-12
    With the development of mobile terminal positioning technology, the scale of trajectory data has increased dramatically, and the storage and rapid querying of massive trajectory data have become research hotspots. Distributed frameworks can provide efficient data processing capabilities. This paper first proposes a local trajectory index, TRindex, which effectively preserves temporal and spatial proximity and supports spatiotemporal queries. Within TRindex, a multi-layer range circle mapping method maps the spatial minimum bounding rectangle (MBR) of each trajectory to a one-dimensional axis, orders trajectories by their distance to the center of the range circle, and builds a spatial range tree on this order. This design preserves spatial proximity, which benefits range queries. It also establishes an ordering of trajectory-to-reference-point distances, enabling efficient pruning in K-nearest neighbor (KNN) queries and reducing duplicate calculations. Based on TRindex, this paper then constructs a distributed trajectory index, DTRindex, which consists of three main components: data partitioning, local indexing, and global indexing. The global index is a modified R*-tree with a Bloom filter attached to each node, which improves query efficiency. DTRindex supports three spatiotemporal query algorithms: spatiotemporal range queries, KNN queries, and moving object trajectory queries. The Hadoop-based distributed trajectory index HadoopTrajectory, the single-machine index PM-tree, and the NoSQL-based distributed trajectory index TMan were selected as baselines for comparison.
    Simulation experiments show that DTRindex outperforms the baselines across multiple metrics. For spatiotemporal range queries, it achieves average efficiency improvements of approximately 57%, 74%, and 25% over HadoopTrajectory, PM-tree, and TMan, respectively. For KNN queries, performance improves by 40%, 48%, and 20% on average. For moving object trajectory queries, efficiency increases by 50%, 53%, and 30%. Ablation experiments further validate the effectiveness of each core module. The spatial range tree layer contributes most, with an overall average performance improvement of 2.5 times, followed by the temporal index layer with an average improvement of 1.2 times. The moving object doubly linked list yields an average performance improvement of approximately 90%. Its contribution is largest in moving object trajectory queries, where efficiency increases nearly fourfold.
  • HUANG Jie, TANG Jianhang, ZHANG Yang, DU Luole, FENG Yixiong
    Accepted: 2025-12-12
    Smart grids in Industry 5.0 have a rich power infrastructure with many types of widely distributed load detection devices, so the user load data collected by edge load detection devices are highly heterogeneous. Training larger load models with distributed federated learning is therefore prone to unstable convergence. To address this issue, an efficient training method for a split (partitioned) federated learning model for the smart grid is proposed. The method applies neural network model training to the segment from substations to users. A split layer divides the global power load prediction model into a top model and a bottom model. The server first collects resource information from the load detection devices. It then prioritizes devices for the training set using a freshness index of the load prediction model, and allocates appropriate batch sizes to the heterogeneous devices for training the bottom model. The server merges the features of the heterogeneous devices in the training set into a larger mixed feature sequence, which reduces the impact of device heterogeneity on the training data and improves model accuracy. KL divergence is used to measure the distribution difference of the training set, and the batch sizes are fine-tuned to reduce this difference. The method was compared with three baseline methods on a public power load curve dataset. On non-independent and identically distributed (non-IID) data, its accuracy was up to 3.6%, 11.7%, and 12.9% higher than that of the baselines.
  • Chen Haiyun, Deng Zhouyao, Xiang Haorui
    Accepted: 2025-12-12
    Small-target detection in aerial imagery is challenged by tiny object sizes, complex backgrounds, and large scale variations, and existing detectors still underperform in feature extraction, multi-scale fusion, and small-target awareness. To address these limitations, we present MA-DETR, an improved RT-DETR-based aerial small-target detection algorithm. First, a Dual Adaptive Perception Network (DAPN) is embedded in the backbone, leveraging a spatial–scale separation module and a dual adaptive pooling mechanism to enhance perception across diverse scales. Second, an Adaptive Multi-Scale Feature Fusion Network (AMSFN) with a multi-module collaborative architecture establishes a bidirectional, multi-path feature transmission mechanism to strengthen small-target representation. Third, a small-target detection layer based on Adaptive Wavelet Convolution (AWC) is introduced. This layer serially combines wavelet convolution with a remote-sensing anchor attention mechanism to strengthen small-target features in both the frequency and spatial domains. Finally, a CF-CGDL loss that integrates a core focusing mechanism with a corner geometric distance loss is proposed to refine bounding-box regression. Experiments on VisDrone2019 yield 43.5% mAP@50, outperforming the baseline by 6.4% while reducing the parameter count by 1.1 × 10⁶. Generalization tests on DOTA v1.0 and RSOD reach 71.8% and 95.5% mAP@50, gains of 3.1% and 7.1%, respectively, demonstrating the method's effectiveness and robustness.
  • Li Luyang, Yan Jinlong, Fang Zeru, Jin Qiqi, Xue Hongxin
    Accepted: 2025-12-12
    In 3D object detection from point clouds, the inherent sparsity of LiDAR data poses pronounced challenges for small objects. Few effective points lead to weak structural cues and blurry boundaries; limited contextual awareness hinders spatial reasoning and semantic completion, causing localization bias; and the difficulty of precise spatial localization, weak channel expressiveness, and background dominance jointly constrain accuracy. To mitigate the impact of the above issues on detection accuracy, we propose a dynamic-aware 3D detector that integrates dynamic feature extraction with feature-enhancement mapping, targeting the two critical stages of small-object detection—feature extraction and candidate generation. Specifically, we introduce a dynamic point-feature prediction network that adaptively predicts and supplements sampling points to strengthen structural perception of small objects; we then build a feature-enhancement mapping network that deeply fuses the original features with those produced by the dynamic module to yield context-rich 2D feature maps, thereby compensating for contextual deficiency and improving localization; finally, we design a point-cloud feature-enhancement module to sharpen focus on key small-object regions along both channel and spatial dimensions. Experiments on the nuScenes dataset demonstrate that our approach surpasses mainstream detectors: relative to the CenterPoint baseline, mean Average Precision (mAP) increases from 56.1% to 59.4%, and the nuScenes Detection Score (NDS) rises from 64.4% to 67.4%.
  • HUANG Zhengting, CHEN Xuexin, LIN Zhiyong, CAI Ruichu
    Accepted: 2025-12-12
    Predicting synthetic lethality (SL) interactions holds significant promise for anticancer drug discovery. However, existing interpretable SL prediction methods typically assume a fixed number of explanation patterns, limiting their ability to capture the inherent diversity underlying SL mechanisms. In this study, we propose DiSE4SL, a model that formulates the generation of explanatory subgraphs as a stochastic process in function space, thereby addressing the critical challenge of adaptively determining the number of explanatory patterns. Built upon the neural process framework, DiSE4SL first leverages a base SL predictor to obtain prediction scores and node embeddings for gene pairs. A context encoder then integrates structural features with predictive semantics into a unified vector representation, which parameterizes the conditional posterior of a Gaussian Mixture Model (GMM), mapping distinct explanatory patterns to different Gaussian components. During training, latent variables are sampled via the Gumbel–Softmax mechanism, and mode-aware attention weights sparsify local subgraphs to yield explanations. In addition, contrastive loss and Lipschitz regularization are introduced to encourage discriminative yet smooth explanatory patterns across components. Finally, by sampling latent variables and applying clustering without a preset number of clusters, DiSE4SL can adaptively extract multiple explanatory subgraphs for each gene pair. The effectiveness of DiSE4SL is validated on benchmark datasets, where it delivers competitive predictive performance (AUPR 0.9337) against the strongest baseline and significantly enhances explanation diversity and fidelity by 29.1% and 9.5%, respectively, compared to the second-best method.
  • Ren Haimeng, Yu Hongfei, Ai Xin
    Accepted: 2025-12-10
    To address the issues of insufficient feature interaction depth and weak long-term sequence modeling capability in existing trajectory prediction models, a vehicle trajectory prediction model based on coarse and fine-grained feature interaction and long short-term memory enhancement is proposed. This model aims to achieve interactive enhancement of coarse and fine-grained features in the scene, deeply integrating the inherent advantages of dual perspectives. It extracts coarse-grained features such as road structure and traffic flow distribution from the scene center perspective to construct a macroscopic motion framework; and extracts fine-grained features such as the relative motion between the target vehicle and surrounding agents and local interaction relationships from the agent center perspective to depict microscopic behavior details. Through the dynamic constraint and deep interaction of fine-grained features on coarse-grained features, the problem of insufficient feature interaction depth is effectively improved, achieving precise refinement of the end positions of multi-modal predicted trajectories. Meanwhile, to effectively alleviate the weakness in long-term sequence modeling capability, a long short-term memory enhancement module with dual memory units is designed to capture long-distance temporal dependency features. Through feature weighting and trajectory endpoint correction strategies, the model's prediction capability for long-term trajectories is effectively enhanced. Experimental results show that compared with mainstream trajectory prediction models, the proposed method has significant improvements in key indicators. On the Argoverse 1 dataset, the average improvements in the minimum final displacement error, minimum average displacement error, and minimum final displacement error are 4.4%, 5.4%, and 4.9% respectively. On the Argoverse 2 dataset, the corresponding indicators are improved by an average of 5.1%, 6.3%, and 5.8% respectively. 
    These results demonstrate not only the improved trajectory prediction accuracy of the proposed model but also its ability to generalize across different data distributions.
  • Wang Fatang, Song Ran, Huang Yuxin, Xiang Yan
    Accepted: 2025-12-10
    Multi-modal Entity Alignment (MMEA) aims to integrate structural, textual, and visual information to identify nodes in different multi-modal knowledge graphs that represent the same real-world entity. Existing methods often ignore inconsistencies in attribute-type descriptions between knowledge graphs when fusing multi-modal features, which biases entity representations and degrades alignment performance. To address this issue, this paper proposes an MMEA method based on attribute filtering enhancement and multi-round instruction reasoning. The method consists of three main modules. First, multi-modal information is integrated and entity similarities are calculated to obtain candidate entity sequences. Second, in the entity information processing module, an attribute filtering enhancement mechanism selects semantically similar entity attribute types across knowledge graphs. This mitigates the interference caused by differences in attribute descriptions and by redundant information. Finally, the alignment task is modeled as a multiple-choice problem, in which the filtered attributes and natural-language information of entities are combined to fine-tune large language models. During reasoning, a multi-round reasoning strategy divides the large set of candidate entities into subsequences. This enhances the model's ability to distinguish semantic differences among the entities within each subsequence and improves the accuracy of the final alignment. Experiments on the public datasets FB-DB15K, FB-YAGO15K, EN-FR-15K V2, and EN-DE-15K V2 show that the method consistently improves entity alignment performance over the baseline methods.
    Specifically, on the FB-DB15K, EN-FR-15K V2, and EN-DE-15K V2 datasets, our method achieves absolute MRR gains of 2%, 1%, and 0.2%, respectively, over the second-best model. Notably, it also achieves substantial margins of 9.1% in MRR and 7.8% in Hits@1.
  • HAN Song, CHE Chang-chang, WANG He-long
    Accepted: 2025-12-10
    With the rapid development of autonomous driving technology, accurate trajectory prediction has become essential for ensuring driving safety. In this context, Adversarial Multimodal LSTM–Informer for Integrated Driving Intention and Trajectory Prediction (AMLI-DIR) is proposed. The model adopts a hierarchical architecture. In the intention recognition layer, a GATv2-BiLSTM network is constructed to extract the spatial and temporal features of the target vehicle and its surrounding vehicles. Meanwhile, a spatiotemporal cross-attention mechanism is introduced to effectively fuse these features, thereby achieving precise driving intention recognition. In the trajectory prediction layer, independent prediction models are built for lane-keeping and lane-changing scenarios, and a multi-criteria generator is employed to produce accurate predicted trajectories. During the prediction stage, the AMLI-DIR model first identifies the most probable driving intention and then activates the corresponding trajectory prediction model, enabling intention-specific trajectory prediction. The model is trained, validated, and tested using the NGSIM and CQSkyEyeX datasets based on real-world traffic scenarios. Experimental results demonstrate that the AMLI-DIR model outperforms all comparison models across multiple evaluation metrics. Notably, in long-term prediction (3 s), it achieves the lowest RMSE of 1.05 m, which is approximately 22.2% lower than that of the second-best model, STEI. Furthermore, the RMSE of the AMLI-DIR model increases by only 0.26 m from 1 s to 3 s, significantly lower than other models, further validating its effectiveness and superiority in trajectory prediction tasks.
  • CHEN Lingqiang, HU Haifeng, ZHANG Suofei
    Accepted: 2025-12-09
    To address the limited single-round generation quality and low search efficiency of small-scale language models in automated workflow generation, a workflow generation method based on Monte Carlo Tree Search and Self-Refine (WGM-MCTSR) is proposed. The method enhances workflow generation through two core mechanisms. First, a workflow self-refine optimization mechanism runs iterative generation-evaluation-reconstruction cycles. Each cycle uses evaluation feedback to restructure or correct the workflow, compensating for the limited reasoning capabilities of small-scale language models. Second, the selection and backpropagation phases of Monte Carlo Tree Search are improved. An Upper Confidence Bounds applied to Trees (UCT) selection strategy replaces the traditional soft-mix selection probability, and a child-node score backpropagation mechanism dynamically adjusts parent-node selection probabilities, thereby optimizing the search direction. Experiments were conducted on six datasets: GSM8K, MATH, DROP, HotpotQA, HumanEval, and MBPP. The method achieves solve rates of 70.11% and 23.45% on the mathematical reasoning tasks, F1 scores of 54.87% and 52.47% on the question-answering tasks, and pass rates of 81.83% and 58.82% on the code generation tasks. Compared with existing workflow generation methods, it improves performance by 5.4% on GSM8K and 9.6% on MATH and obtains the best results across all task types. These results validate the effectiveness of the improved mechanisms in enhancing workflow generation efficiency and quality for small-scale language models.
  • Li Qian, Liu Peng, Yao Lian, Wu Jigang
    Accepted: 2025-12-08
    The Memristive Crossbar Array (MCA) is the fundamental hardware component of the Computing-in-Memory (CIM) architecture, enabling matrix operations to be performed with O(1) time complexity. However, because of the limited bit-width of the devices, existing methods often need many memory cells to represent numerical values. This increases hardware resource consumption and makes it difficult to achieve both high precision and high energy efficiency. To address this issue, this paper proposes a crossbar-aware mixed-precision quantization method. The method first employs K-means clustering to optimize output channel rearrangement, enhancing the consistency of weight distributions within sublayers to reduce quantization error and improve post-quantization model accuracy. Building on this, sublayers are partitioned according to the physical constraints of the MCA, so that the output channel count matches the parallel processing capacity of the MCA. This reduces the number of dequantization operations and lowers computational complexity. In addition, an array-aware regularization term is introduced that combines the number of MCAs required per sublayer with group Lasso regularization. This term dynamically induces bit-level sparsity in the weights, reducing hardware resource overhead while compressing the bit-width. Experiments on different neural networks (ResNet/VGG) compare the method with traditional quantization methods. It quantizes models to an average of 1.3 bits with no more than 0.2% accuracy loss and reduces hardware area overhead by about 74%. Compared with existing quantization schemes, the proposed method jointly optimizes accuracy and hardware resources at very low bit-widths.
  • ZHAO Peiyuan, GONG Xiaoliang
    Accepted: 2025-12-04
    Existing rehabilitation robot simulation research suffers from a mismatch between biomechanical characteristics and robot control strategies and from insufficient automation in human-robot coupling simulations. To address these issues, this study builds a human-robot joint simulation system for upper limb rehabilitation robots based on OpenSim and MATLAB. The system integrates robot kinematics analysis, training trajectory planning and design, and the biomechanical characteristics of musculoskeletal models, and the study proposes an automated human-robot coupling simulation process. The system enables synchronous joint angle adjustment and motion playback visualization for matched models. The robot simulation layer provides forward and inverse kinematics calculations and offers four trajectory planning algorithms for different application scenarios. Its results are converted into the required format and transmitted to the biomechanical simulation layer. In the biomechanical simulation layer, residual reduction is combined with computed muscle control to compensate for unmodeled external forces and to optimize muscle activation solutions. This compensation indirectly corrects errors in the robot's external force data. The system also visualizes biological information in the simulation results, such as muscle activation levels and muscle fiber lengths, helping rehabilitation physicians assess patient recovery outcomes more accurately. The system is validated through joint kinematic and dynamic simulation experiments of right elbow flexion. Compared with traditional methods, the system significantly improves efficiency and automation while reducing the complexity of cross-platform simulation operations.
  • CUI Haoran, QUAN Ting, CHEN Maowei, DAI Rong
    Accepted: 2025-12-04
    In solving computational fluid dynamics (CFD) problems, the Algebraic Multigrid (AMG) algorithm can effectively accelerate the solution process. OpenFOAM, the most widely used open-source CFD software, employs the Geometric Agglomerated Algebraic Multigrid (GAMG) algorithm based on the Lower-Diagonal-Upper (LDU) matrix format to accelerate flow field solutions on CPUs. In recent years, CPU+GPU heterogeneous parallel computing systems have flourished, and domestic GPGPUs have achieved breakthroughs that enable localized substitution. Extensive research has been conducted on GPU-accelerated CFD algorithms for such heterogeneous systems. Implementing a heterogeneous parallel version of OpenFOAM's GAMG algorithm on domestic platforms can fully leverage domestic computing power and significantly improve simulation efficiency. Targeting a heterogeneous computing platform composed of CPUs and domestic GPGPU accelerator cards, this work designs and implements a parallel acceleration method for the LDU-based GAMG algorithm. By fully exploiting the multithreading capabilities of GPUs, all components of the GAMG algorithm are optimized for parallel execution on the GPU. Benchmark tests on the 3D lid-driven cavity flow and motorBike external flow cases verify the correctness and evaluate the performance of the heterogeneous GAMG algorithm at different problem scales. Experimental results show that the proposed algorithm maintains the same computational accuracy as the original version. The heterogeneous GAMG implementation configured with a Jacobi smoother achieves a 10–27× speedup over the serial CPU implementation configured with a Gauss-Seidel smoother.
    Performance analysis indicates that the computational speed of the time-dominant restriction and smoothing operators has been significantly improved. These results validate the effectiveness and computational potential of the GAMG parallel solver framework on domestic heterogeneous platforms. They also provide a feasible approach and technical foundation for the heterogeneous parallelization and engineering application of CFD solvers on domestic GPGPU systems.
  • WANG Hao, QIN Jin, YANG Changhao
    Accepted: 2025-12-03
    The Ant Colony Optimization (ACO) algorithm is widely employed for combinatorial optimization problems, where effective heuristic information facilitates rapid convergence to high-quality solutions. Existing neural ant colony optimization algorithms, such as Deep Ant Colony Optimization (DeepACO) and Generative Flow Ant Colony Sampler (GFACS), use deep reinforcement learning to design heuristic information automatically. This substantially improves the solution quality of ACO algorithms. However, these algorithms generate heuristic information solely from the static features of problem instances and neglect the temporal characteristics of the partial solutions constructed by individual ants. As a result, the heuristic information cannot effectively guide differentiated search behaviors among ants during exploration, which compromises population diversity. Moreover, when aggregating information via graph neural networks (GNNs), these algorithms process only node features and do not integrate edge features before aggregation, so the GNN captures insufficient information. To address these issues, the Temporal Edge Feature enhanced Neural Ant Colony Optimization (TEF-NACO) algorithm is proposed. TEF-NACO extracts the temporal features of each ant via a Recurrent Neural Network (RNN) and integrates them with global graph structural information. Furthermore, during the node aggregation phase of the GNN, both node and edge features are incorporated to enhance the network's information capture capacity. Additionally, an edge-attention-based regularization term is added to the loss function to improve training stability.
    Experiments on 24 combinatorial optimization tasks show that TEF-NACO achieves the best overall performance. It outperforms ACO, DeepACO, and GFACS on 100%, 87.5%, and 75% of the tasks, respectively, with average accuracy improvements of 21.5%, 3.4%, and 3.2%, respectively.
  • Wu Qingbo, Wu Youxin, Yu Chengyuan
    Accepted: 2025-12-03
    3D Gaussian Splatting (3DGS) has shown remarkable performance in novel view synthesis and high-precision scene reconstruction. However, its excessively high model storage overhead significantly limits its practical applications. To address this issue, a lightweight compression method is proposed to reduce the storage cost of 3DGS models and enhance rendering efficiency. First, an importance score metric based on local color differences and redundancy is introduced to identify and eliminate redundant Gaussian primitives. Furthermore, a progressive training strategy that combines Gaussian filtering and downsampling is proposed to improve the stability and efficiency of training. On this basis, a hybrid quantization scheme is applied to different properties of the Gaussian primitives to further improve the compression ratio. Finally, Morton encoding and residual encoding are utilized to compress the coordinate attributes of the Gaussian primitives, further reducing the model size. To validate the effectiveness of the proposed method, experiments were conducted on multiple real-world datasets and compared with various existing compression models. The results show that the proposed method reduces the model size by 97.8% compared to the original 3DGS, and by an additional 38.8% compared to Reduced-3DGS, while maintaining comparable rendering quality to Reduced-3DGS. It also enhances both training and rendering efficiency, demonstrating significant advantages over other existing compression models. The model achieves a good balance between compression ratio and rendering quality, providing an effective solution for advancing the practical application of 3DGS in 3D scene reconstruction.
  • Wangjing Lv, Zhaobo Qi, Xinyan Liu, Beichen Zhang, Weigang Zhang
    Accepted: 2025-12-02
    Long-term action anticipation is a crucial computer vision task that aims to predict, from first-person video, the sequence of actions a person is likely to perform in the distant future. Its main challenge lies in the inherent uncertainty of future behavior: actors in similar contexts may follow multiple plausible action trajectories, whereas most video samples in existing datasets cover only one. This limits the model's ability to learn action diversity. Moreover, the input video segments are short relative to the long prediction horizon, and this mismatch between limited observations and long-range reasoning further increases the difficulty. To address these challenges, we propose a predictive framework named Vision and LLM Cooperative Network (ViLLCoNet), based on cooperation between a lightweight model and a large-scale model. These two modules are responsible for predictive modeling and for constraining the prediction space, respectively. The lightweight model comprises a visual encoder, a visual auxiliary information extractor, and an action predictor. It encodes the input video, extracts visual auxiliary cues, and generates the future action distribution. The visual auxiliary information extractor uses a cross-attention mechanism that fuses hand cues and object features to capture interactions between hands and object regions. The large-scale auxiliary module, built on a large language model, identifies object nouns that are unlikely in the current scene and uses them to constrain the predictor of the lightweight model. By masking semantically implausible candidates in the prediction space, this mechanism improves both the accuracy and the plausibility of predictions. In addition, a noun temporal smoothing loss is added to the training objective to encourage temporal coherence in the predicted noun distribution.
    The proposed method is evaluated on the Ego4D and 50Salads datasets. Experimental results show that, compared with the baseline model, ViLLCoNet improves noun prediction by 8.9% and verb prediction by 4.2% on the Ego4D dataset.
  • Li Haoxuan, Zhang Zhiyuan, Liu Rui, Xu Peihua, Tian Xin
    Accepted: 2025-11-27
    High-resolution climate data are crucial for local and regional production and livelihoods. Deep learning-based downscaling can effectively bridge the gap between existing low-resolution climate data and application requirements. However, existing methods are often constrained to fixed scaling factors, which leads to high training costs in multi-scale scenarios, and their outputs are usually blurred and inaccurate in high-frequency details. To address these limitations, this study proposes a deep learning super-resolution network that fuses implicit neural representation with adaptive feature encoding for arbitrary-scale climate downscaling. Specifically, a dynamic pixel feature aggregation module uses a learnable modulator to adjust the feature encoding process dynamically, allowing it to adapt to different scaling factors. In addition, an implicit neural representation of the images predicts continuous-domain pixel values. It fuses coordinate-based linear interpolation features with neighborhood nonlinear features through an attention mechanism. The method is trained with a high-order degradation strategy. Experiments on the ECMWF HRES and ERA5 datasets show that, at a ×2 scaling factor, it improves PSNR by at least 0.7 dB over fixed-ratio methods and by at least 0.48 dB over existing arbitrary-ratio methods. These quantitative results show that the approach outperforms existing methods and provides a more flexible and efficient solution for meteorological data processing.
  • SONG Chengqun, ZHANG Ke, YANG Mengjie, CHENG Jun
    Accepted: 2025-11-26
    To address the inefficiency and safety risks of manual patrols in large facilities and complex venues, this study aims to balance global coverage with the prioritization of high-risk areas while improving the efficiency and robustness of path planning. We propose a risk-aware Intelligent Patrol Strategy (IPS) that (i) models patrol as a combination of comprehensive and single patrols; (ii) builds static and dynamic risk heat maps via a Gaussian Mixture Model (GMM); and (iii) uses a tanh-based target-point updating method to suppress clustering of target points and balance risk against spatial distribution. For path generation, we develop a Multi-Target Rapidly-exploring Random Tree (MT-RRT) algorithm comprising Multi-Target Feasible Path Planning (MTFPP) and Information Subset Optimization (ISO). MTFPP estimates feasible inter-point costs with an improved RRT-Connect and determines the visiting order using Ant Colony Optimization (ACO), yielding a single feasible path through all targets. ISO samples within an ellipse-shaped informed subset and applies RRT*-style rewiring to iteratively refine that path into a shorter and smoother one. Simulations show that, compared with Euclidean-distance baselines, the method significantly reduces final path length and improves success rate and convergence under limited iterations; it achieves full-area coverage while assigning higher patrol frequency to high-risk regions, making it suitable for industrial plants, hazardous-material warehouses, and large public buildings.
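    The ellipse-shaped informed subset follows the Informed RRT* idea: once a path of cost c_best exists between two points, only states inside the ellipse whose foci are those points (and whose transverse diameter is c_best) can shorten it. The sketch below draws one uniform sample from that 2-D ellipse, assuming Euclidean path cost; it is not the MT-RRT implementation, and the function name is hypothetical.

```python
import math
import random


def sample_informed_2d(start, goal, c_best, rng=random):
    """Draw one point uniformly from {x : |x - start| + |x - goal| <= c_best}."""
    c_min = math.dist(start, goal)  # straight-line distance between the foci
    if c_best < c_min:
        raise ValueError("c_best must be at least the focal distance")
    # Uniform sample from the unit disk (sqrt keeps the area density uniform).
    r = math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    x, y = r * math.cos(theta), r * math.sin(theta)
    # Stretch the disk to the ellipse's semi-axes.
    a = c_best / 2.0
    b = math.sqrt(c_best ** 2 - c_min ** 2) / 2.0
    x, y = a * x, b * y
    # Rotate so the major axis points from start to goal, then translate to the center.
    phi = math.atan2(goal[1] - start[1], goal[0] - start[0])
    cx, cy = (start[0] + goal[0]) / 2.0, (start[1] + goal[1]) / 2.0
    return (cx + x * math.cos(phi) - y * math.sin(phi),
            cy + x * math.sin(phi) + y * math.cos(phi))
```

    As c_best shrinks toward the straight-line distance, the ellipse collapses onto the segment between the two points, which is why sampling in this subset concentrates effort where improvements are still possible.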
  • ZHANG Longyao, Wen Dongxin, MA Zhuangyu, SHU Yanjun, LI Qing, LIU Mingyi, ZUO Decheng
    Accepted: 2025-11-26
    Large Language Model (LLM)-based Multi-Agent Systems have demonstrated significant potential in handling complex tasks, but their distributed nature and interaction uncertainty can lead to diverse anomalies that threaten system reliability. To systematically identify and classify such anomalies, this study conducts a comprehensive empirical analysis. Seven representative multi-agent systems and their corresponding datasets were selected, yielding 13,418 operational traces, which were analyzed with a hybrid method combining preliminary LLM analysis with expert manual validation. A fine-grained, four-level anomaly classification framework was constructed, encompassing Model Understanding and Perception Anomalies, Agent Interaction Anomalies, Task Execution Anomalies, and External Environment Anomalies, and typical cases were analyzed to reveal the underlying logic and external causes of each anomaly type. Statistical analysis indicates that Model Understanding and Perception Anomalies account for the highest proportion, with "Context Hallucination" and "Task Instruction Misunderstanding" being the primary issues. Agent Interaction Anomalies represent 16.8%, primarily caused by "Information Concealment"; Task Execution Anomalies make up 27.1%, mainly characterized by "Repetitive Decision Errors"; and External Environment Anomalies constitute 18.3%, with "Memory Conflicts" as the predominant factor. In addition, Model Understanding and Perception Anomalies often act as root causes that trigger anomalies at other levels, highlighting the importance of enhancing the model's fundamental capabilities. This classification and root-cause analysis aims to provide theoretical support and practical reference for building highly reliable LLM-based multi-agent systems.
  • Wang Wen, Yang Kuiwu, Tong Songsong, Wei Jianghong, Xue Yan, Zhou Rongkui
    Accepted: 2025-11-26
    Model intellectual property protection has become an issue that cannot be ignored in model security. Watermarking technology, as the core means of model traceability, supports copyright verification by embedding special identifiers into model parameters or generated content. However, trained watermarked models are easily copied and distributed, and attackers can destroy or remove the watermarks embedded in DNN models through techniques such as fine-tuning, pruning, or adversarial example attacks, making it impossible to verify model ownership. To provide a deeper understanding of model watermarking attacks, this paper first introduces them and then classifies them into two categories, white-box and black-box watermarking attacks, according to the attacker's access to and knowledge of the target model. It also reviews and analyzes the motives, hazards, attack principles, and specific implementation methods of DNN model watermarking attacks, and compares and summarizes existing research in terms of attacker capabilities and performance impact. Finally, it explores the potential positive role of neural network model watermarking attacks in future research and offers suggestions for in-depth research on model security and intellectual property protection.
  • ZHANG Junna, WANG Hongzun, DING Chuntao
    Accepted: 2025-11-25
    Post-Training Quantization (PTQ) is an efficient model compression method that converts the parameters of high-precision floating-point models into low-bit integer representations without retraining, using only a small amount of unlabeled calibration data or none at all. It significantly reduces storage and computational overhead while largely preserving the original model's inference accuracy, and it is widely recognized and adopted in both academia and industry. This paper systematically summarizes research progress in PTQ from four dimensions: quantization steps, method classification, tool ecosystem, and application advances. First, a clear framework for the quantization process is constructed, covering dynamic range statistics, quantization parameter calculation, weight and activation quantization, error optimization, and model generation. Second, a complete classification system for quantization methods is proposed, covering quantization granularity, bit width, calibration methods, and structure-guided quantization. Third, the tool ecosystem supporting large-scale application of PTQ is analyzed, with a discussion of its value for hardware adaptation and engineering deployment. Finally, the paper summarizes the integration and application progress of PTQ methods and highlights the challenges faced in practice, especially cross-modal consistency, semantic collapse at extremely low bit widths, and hardware adaptation. These challenges not only reveal the limitations of current technologies but also indicate important directions for future research. This review provides academia and industry with a reference framework for PTQ, facilitating the widespread application of artificial intelligence in resource-constrained scenarios.
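    The "dynamic range statistics" and "quantization parameter calculation" steps can be illustrated with the common per-tensor min-max (asymmetric, affine) scheme, which maps a calibration range [lo, hi] onto the integer grid [0, 2^b - 1]. This is a generic textbook scheme sketched in pure Python, not the API of any particular toolkit; all function names are hypothetical.

```python
def minmax_qparams(values, num_bits=8):
    """Asymmetric quantization parameters from the observed min/max range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The range is widened to include 0 so that 0.0 is exactly representable.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = int(round(qmin - lo / scale))  # Python rounds half to even
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to clamped integers: q = clamp(round(v / scale) + zero_point)."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """Recover approximate floats: v' = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in qvalues]
```

    For values inside the calibration range, the round-trip error is at most about scale / 2; values outside it are clipped, which is why calibration methods that choose [lo, hi] more carefully than raw min/max matter for accuracy.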
  • ZHANG Ke, CHEN Jiahao
    Accepted: 2025-11-21
    Multi-Hop Graph Convolutional Networks (Multi-Hop GCNs) have achieved some success in alleviating the over-compression problem. However, multi-hop propagation incurs parametric information compression loss during information aggregation and is sensitive to local topological structure, which makes it difficult for such models to achieve satisfactory performance on node classification tasks. To address these problems, this paper considers the multi-hop graph convolutional model from intra-layer and inter-layer perspectives, combining a decoupling-based technique inspired by prediction-propagation decoupling with a knowledge jump module, and thereby constructs a new multi-hop graph convolutional network, the Knowledge-Semi-Decoupled Multi-Hop Network DrJK-Net. First, a semi-decoupling technique that retains the activation function is proposed to simplify the intra-layer structure of the multi-hop propagation layer: removing the linear layer in the hidden layer reduces the number of feature transformations during multi-hop propagation and thus the parametric information compression loss. Then, knowledge jump connections are added between propagation layers; by connecting all hidden-layer embeddings, they allow the model to adaptively select among hidden-layer embeddings and reduce its sensitivity to local topological structure. Combining the multi-hop graph convolutional backbone with the semi-decoupling technique (which simplifies intra-layer information propagation) and the knowledge jump connection module (which establishes inter-layer information channels) yields DrJK-Net, a framework with lower parametric information compression loss and stronger adaptability to local topological structure.
Finally, comparative and ablation experiments are carried out on multiple public datasets, including the Citeseer and CoraFull citation networks, the Actor network, and social network datasets. The comparative experiments show that DrJK-Net surpasses most cutting-edge models in node classification accuracy and has a significant advantage in running speed. The ablation experiments further verify the effectiveness of the proposed semi-decoupling technique and knowledge jump connection mechanism, providing new ideas and methods for the development of multi-hop graph convolutional networks.
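    One plausible reading of "multi-hop propagation without per-hop linear layers, keeping the activation, plus jump connections over all hidden embeddings" is sketched below with NumPy. It uses the standard GCN normalization and concatenation-style jumping knowledge; the tanh activation, the concatenation aggregator, and all function names are assumptions for illustration, not the DrJK-Net architecture.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by GCN-style propagation."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def propagate_with_jumps(adj, features, num_hops=3, activation=np.tanh):
    """Propagate features for several hops with no per-hop weight matrix,
    keeping a nonlinearity after each hop, then concatenate the input and
    every hop's embedding (jumping-knowledge style)."""
    a_norm = normalized_adjacency(adj)
    h = features
    hops = [h]
    for _ in range(num_hops):
        h = activation(a_norm @ h)  # assumed activation; no linear layer here
        hops.append(h)
    return np.concatenate(hops, axis=1)  # shape: (nodes, features * (num_hops + 1))
```

    A downstream classifier (not shown) would then learn how much weight to give each hop's slice of the concatenated embedding.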
  • NIU Yan, SUN Yang, LI Jun
    Accepted: 2025-11-21
    Multimodal emotion recognition aims to understand complex human emotional expressions; however, existing methods generally lack accuracy and robustness when dealing with subtle emotional expressions and complex inter-modal interactions. Specifically, traditional speech feature extraction methods struggle to capture emotional information across multiple time scales, existing fusion strategies are limited in how efficiently they integrate complementary information and handle complex inter-modal associations, and class imbalance and boundary samples often degrade model performance. To address these problems, this paper proposes a new method for multimodal emotion recognition from speech and facial images. First, the method introduces a multiscale attention mechanism in the speech feature extraction stage in place of the traditional multilayer perceptron, which adaptively focuses on and captures emotional features ranging from microscopic phoneme changes to macroscopic rhythmic patterns, enabling more comprehensive extraction of emotional information. Second, an adaptive multi-expert collaborative decision-making architecture is designed, which recognizes emotional information through expert networks and an Adaptive Multimodal Expert Coordination Network that efficiently integrates complementary information from different modalities and handles complex interactions between them. Finally, a boundary
  • Guo Wei, Meng Qiaoqiao, Jin Haibo, Tian Congcong
    Accepted: 2025-11-20
    In industrial quality inspection, steel surface defect detection commonly suffers from insufficient fusion of target features, missed detection of fine edge defects, and class imbalance among samples. To address these problems, a steel surface defect detection algorithm based on multi-scale interaction and dynamic collaboration is proposed. In the backbone network, fusing shifted sparse convolution with an inverted residual structure strengthens the interactive fusion of defect features under different receptive fields and improves the feature representation of multi-scale defects, and a large separable kernel attention mechanism dynamically enhances the feature response in fine defect areas, reducing the missed detection rate of cracks and inclusions. In the neck network, the DySample dynamic upsampling strategy performs content-aware upsampling of defect features, which both sharpens the defect contours of small targets and reduces computational redundancy, making the model suitable for deployment on edge devices. In addition, an EMASlideLoss loss function integrating exponential moving average and sliding threshold mechanisms is designed to dynamically balance the learning weights of hard and easy samples, thereby reducing the detection bias caused by the uneven distribution of defect samples. Experiments on the NEU-DET dataset show that the algorithm achieves an mAP50 of 84.4%, 5.8% higher than the original YOLO11n, while precision and recall increase by 5.2% and 4.8%, respectively, and the computational load decreases by 8%. The algorithm thus improves both computational efficiency and detection accuracy, and better meets the detection requirements of industrial scenarios.
  • LIU Ying, ZHANG Runyu, YANG Chaoshu
    Accepted: 2025-11-20
    The Log-Structured Merge tree (LSM-tree) has been widely adopted in key-value storage systems due to its high write performance enabled by sequential write operations. However, it also suffers from issues such as high read/write amplification, significant compaction overhead, and data redundancy. Traditional optimization approaches aim to improve system performance by modifying tree structures, refining compaction strategies, and adopting key-value separation mechanisms. In the era of big data, the rapid growth of data volume leads to increasingly frequent write and compaction operations in LSM-tree systems, placing continuous pressure on CPU computing resources and gradually turning them into performance bottlenecks. Moreover, traditional solutions fail to fundamentally avoid the substantial I/O traffic between the host and storage devices, resulting in high overhead due to redundant data movement. Computational storage technology offers a promising solution to these challenges. By integrating computing resources at the storage layer, it enables task offloading to alleviate the CPU's workload and supports near-data processing to reduce the performance overhead caused by data migration. This survey focuses on optimization strategies for LSM-tree based on computational storage. First, the architecture of computational storage is reviewed. Then, in response to the major bottlenecks under the big data context, existing solutions are classified and compared from two perspectives: compaction optimization and data migration optimization. Finally, potential future research directions are suggested to provide insights in this field.
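    To make "compaction overhead" and "write amplification" concrete, the toy model below buffers writes in a memtable, flushes them as sorted runs, and merges all runs once too many accumulate, counting every entry written to "storage". It is deliberately simplified (a single level, full merges, linear-scan reads); the class and its parameters are hypothetical, and real engines use leveled or tiered compaction, indexes, and Bloom filters.

```python
class ToyLSM:
    """Memtable plus sorted runs; when too many runs accumulate, merge them all.
    Counts entries written to storage to expose write amplification."""

    def __init__(self, memtable_limit=4, max_runs=2):
        self.memtable = {}
        self.runs = []  # each run is a sorted list of (key, value); newest run last
        self.memtable_limit = memtable_limit
        self.max_runs = max_runs
        self.user_writes = 0
        self.storage_writes = 0

    def put(self, key, value):
        self.user_writes += 1
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        run = sorted(self.memtable.items())
        self.runs.append(run)
        self.storage_writes += len(run)  # sequential write of the new run
        self.memtable = {}
        if len(self.runs) > self.max_runs:
            self._compact()

    def _compact(self):
        merged = {}
        for run in self.runs:  # oldest first, so newer values overwrite older ones
            merged.update(run)
        self.runs = [sorted(merged.items())]
        self.storage_writes += len(self.runs[0])  # compaction rewrites live data

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run first
            for k, v in run:  # linear scan keeps the sketch short
                if k == key:
                    return v
        return None

    def write_amplification(self):
        return self.storage_writes / max(1, self.user_writes)
```

    Inserting 20 distinct keys with the defaults writes 52 entries to storage, a write amplification of 2.6; every byte that compaction rewrites also crosses the host-storage boundary, which is the traffic that computational-storage offloading aims to avoid.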
  • Gong Tong, Lu Xiaoli, Sang Yu, Li Siman, Yu Bowen
    Accepted: 2025-11-19
    Nighttime object detection is challenging because targets have low luminance and manually annotating large-scale nighttime datasets is costly, which makes supervised training difficult. To address these issues, DTN-DETR, a domain adaptation method for nighttime object detection based on an improved RT-DETR, is proposed. First, a Photometric Consistency Matching strategy is designed to generate a synthetic dataset resembling the nighttime domain by aligning the photometric properties of the daytime source domain with those of the nighttime target domain. Second, a Bidomain Refinement Module (BRM) is proposed to improve the backbone network; it comprises two key components, the Feature Refinement Module (FRM) and the Bidomain Information Interaction (BII) module. The FRM eliminates redundant information in the feature channels, while the BII module exploits the interaction between the frequency and spatial domains to handle glare and noise with inconsistent frequency characteristics, addressing the coupling of multiple local light sources in nighttime scenes. Finally, a P2 detection head is introduced, which enhances the perception of small objects in nighttime scenes through multi-level feature fusion. Experimental results on the public BDD100K, SODA10M, and Foggy Cityscapes datasets demonstrate that the proposed method significantly outperforms existing state-of-the-art approaches, validating its effectiveness and robustness.
  • TAN Taizhe, YANG Yang, ZHAN Yinwei, YANG Zhuo
    Accepted: 2025-11-14
    The complex lighting conditions in underground coal mines produce images with low contrast and blurred details. Existing image enhancement algorithms have limited feature capture capability and fuse semantic features from different levels inefficiently. This paper proposes an underground coal mine image enhancement method (ICM) that combines convolution with MLLA (Mamba-Like Linear Attention). In the convolution stage, multiple degradation-aware mixture-of-experts modules are stacked so that the model can adaptively restore local texture details lost during enhancement, mitigating artifacts and unclear details. An MLLA module with background perception models long-range dependencies in the image to improve the global structural consistency and texture fidelity of the enhanced output. Interactive fusion branches are introduced to encode the stage-wise correlation between backbone features and reconstructed features, effectively exploiting local and global features to assist enhancement. A segmented loss function sets different loss objectives for different enhancement stages, enabling the network to optimize adaptively at each stage. Compared with recent state-of-the-art deep learning methods, ICM achieves the best results on PSNR, SSIM, NIQE, and LPIPS, with values of 30.524 dB, 0.946, 3.06, and 0.23, respectively. It effectively improves the brightness, contrast, and clarity of low-light coal mine images, providing reliable visual support for mine safety monitoring and intelligent decision-making.
  • Jie DUAN, Lijuan SONG, Zirui MA
    Accepted: 2025-11-13
    Deep learning-based survival prediction has advanced the integration of whole-slide images (WSIs) and genomics, yet the ultra-high resolution of WSIs and the high dimensionality of transcriptomics pose substantial challenges for feature extraction and cross-modal fusion. Although prototype aggregation reduces the computational burden by compressing tiles and gene expressions into morphological and pathway prototypes, two key bottlenecks remain: accurately capturing fine-grained interactions between modality-specific prototypes, and addressing the pronounced representational heterogeneity between WSI morphological prototypes and genomic pathway prototypes. To tackle these issues, we propose MOTSurv, a weakly supervised survival prediction model based on multi-level optimal transport, comprising three synergistic innovations. First, a dual-modality prototype encoder, which integrates a Pyramid Position Encoding Generator (PPEG) in the pathology encoder and models intra-pathway dependencies in the pathway encoder, strengthens intra-modality structure while preserving modality specificity. Second, a cascaded multi-level optimal transport fusion mechanism performs coarse global alignment followed by refined matching with error correction, balancing alignment accuracy and information preservation. Third, an Orthogonal Disentanglement Module (ODM) enforces multi-level constraints (inter-modal specificity orthogonality, intra-modal specificity-shared orthogonality, and global specificity-shared orthogonality) to achieve explicit feature disentanglement and enhance interpretability. Experiments on the TCGA BLCA, BRCA, and LUAD datasets demonstrate that MOTSurv improves the C-index by an average of 4.22% over state-of-the-art methods. Ablation studies further validate the independent and synergistic contributions of each module, highlighting the model's comprehensive advantages in multimodal alignment, structured representation, and biological interpretability.
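    Optimal-transport alignment between two prototype sets is commonly computed with the Sinkhorn algorithm for entropy-regularized OT, which alternately rescales the rows and columns of a Gibbs kernel until the transport plan matches both marginals. The NumPy sketch below shows that single-level step only; the cascaded multi-level scheme and error correction of MOTSurv are not reproduced, and the default `epsilon` and iteration count are illustrative assumptions.

```python
import numpy as np


def sinkhorn(cost, a, b, epsilon=0.1, num_iters=500):
    """Entropy-regularized optimal transport plan between histograms a and b.

    cost: (n, m) pairwise cost matrix; a: (n,) and b: (m,) non-negative
    masses with equal totals. Returns an (n, m) plan whose row sums
    approximate a and whose column sums approximate b.
    """
    kernel = np.exp(-cost / epsilon)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(num_iters):
        u = a / (kernel @ v)  # rescale rows toward marginal a
        v = b / (kernel.T @ u)  # rescale columns toward marginal b
    return u[:, None] * kernel * v[None, :]
```

    In a prototype-alignment setting, `cost` might be one minus the cosine similarity between WSI and pathway prototypes with uniform masses `a` and `b`; both choices are assumptions here, not details stated in the abstract.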
  • WANG Zeyu, JI Genlin, ZHU Wei
    Accepted: 2025-11-13
    Zero-shot skeleton-based action recognition uses textual label descriptions and skeleton action sequences to distinguish between seen and unseen action categories. Existing methods are usually limited by the low quality of generated visual features, which prevents accurate semantic alignment and results in poor performance when distinguishing similar actions. To address this issue, this paper proposes a method based on dual discriminators and spatiotemporal self-calibration (DD-STSC) for visual-semantic alignment. The method combines variational autoencoders and generative adversarial networks, using adversarial training between discriminators and generators to mine the differences among features. At the same time, it better separates useful from useless information during disentanglement, further improving the quality of generated samples. In addition, this paper introduces an action self-calibration module (ASCM), which learns skeleton information along the spatial and temporal dimensions to extract key motion information more effectively and thereby improve classification accuracy. Experiments on the widely used NTU60, NTU120, and PKU51 datasets demonstrate that the proposed method outperforms existing mainstream methods.
  • XU Haizhe, HUANG Lingxiao, YAO Xinbo, GAO Yongzhan, ZHOU Kaiyuan
    Accepted: 2025-11-13
    This study addresses critical challenges in weakly supervised semantic segmentation (WSSS) based on contrastive language-image pre-training (CLIP), such as inadequate fine-grained image-text semantic alignment, limited perception of local details in the text context, and, in pseudo-labels, insufficient local detail and noise propagation. To tackle these issues, we propose the Feature Fusion Contrastive Learning framework (FFCLIP), which uses a frozen CLIP model as the backbone and integrates three modules, Panoramic Perception Attention (PPA), the Rectangular Calibration Module (RCM), and Weighted Cross-modal Fusion (WFF), to enhance cross-modal semantic alignment, refine local boundary perception, and improve the quality of the generated pseudo-labels. This multi-stage WSSS training framework with a CLIP backbone achieves mIoU scores of 76.9% and 77.5% on the VOC2012 validation and test sets, respectively, surpassing the mainstream method CTI by 2.8% and 4.3%. On the COCO2014 dataset, it attains an mIoU of 47.1%, significantly outperforming baseline models such as CPAL. Experimental results demonstrate that FFCLIP substantially improves semantic segmentation accuracy under weak supervision while keeping computational overhead low, with only 6M additional parameters and a peak GPU memory consumption of 6.2 GB, offering a new direction for integrating multimodal learning with weakly supervised segmentation. Code link: https://github.com/xuwudang/FFCLIP
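    mIoU, the metric reported for VOC2012 and COCO2014, averages per-class intersection-over-union, where each class's union is TP + FP + FN. The pure-Python sketch below works on flattened label maps, skips classes absent from both prediction and ground truth, and assumes the VOC convention of 255 as the ignore label; the function name is hypothetical, and benchmark toolkits accumulate counts over the whole dataset before averaging.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union over classes present in pred or target."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # ignored pixels contribute nothing
        if p == t:
            intersection[t] += 1  # true positive for class t
            union[t] += 1
        else:
            union[t] += 1  # false negative for class t
            if 0 <= p < num_classes:
                union[p] += 1  # false positive for class p
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

    For example, predictions [0, 0, 1, 1] against targets [0, 1, 1, 1] give IoUs of 1/2 and 2/3, so the mIoU is 7/12.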
  • SU Na, PEI Houqing, XU Li, WANG Jingjun, JI Shujuan
    Accepted: 2025-11-11
    Existing log anomaly detection techniques often neglect temporal contextual information in semantic modeling, have limited modality fusion capabilities, and generally rely heavily on log parsing. These limitations make it difficult for models to capture complex patterns in which sudden changes in semantic content coexist with temporal behavioral anomalies. To address these challenges, this paper proposes LogSTF (Log Spatio-Temporal Fusion), a model that requires no log parsing. LogSTF employs a dual-branch architecture for semantic and temporal processing: the semantic branch extracts context-aware semantic features, while the temporal branch models both local bursts and global evolution at two granularities, the time level and the sequence level. On this basis, bidirectional cross-attention fuses the two modalities, explicitly establishing fine-grained dependencies between semantics and time and enhancing the model's ability to represent and discriminate complex log behaviors. Experiments on three public log datasets (HDFS, BGL, and Thunderbird) show that LogSTF achieves F1 scores of 99.64%, 98.45%, and 99.67%, respectively. Compared with the two state-of-the-art models LAnoBERT and LogFormer, LogSTF achieves average relative F1 improvements of 5.20% and 2.03%. Ablation experiments validate the critical role of temporal information and modality collaboration in performance enhancement, and robustness tests under lightweight semantic perturbations confirm LogSTF's stability and generalization under suboptimal log conditions. The approach thus achieves high-precision detection of complex anomaly patterns without log parsing.
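    Bidirectional cross-attention can be sketched as two scaled dot-product attentions in which each branch uses its own tokens as queries over the other branch's tokens. The NumPy sketch below omits learned query/key/value projections and multiple heads; the residual connections, mean pooling, and concatenation are illustrative assumptions rather than LogSTF's actual fusion head, and all names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Scaled dot-product attention where one modality queries the other."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return weights @ keys_values


def bidirectional_fusion(semantic, temporal):
    """Each branch attends to the other; the enriched views are pooled and concatenated.

    semantic: (n_sem, d) token features; temporal: (n_tmp, d) token features.
    The two sequences may differ in length because each is mean-pooled.
    """
    sem_enriched = semantic + cross_attention(semantic, temporal)  # residual
    tmp_enriched = temporal + cross_attention(temporal, semantic)
    return np.concatenate([sem_enriched.mean(axis=0), tmp_enriched.mean(axis=0)])
```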
  • Li Xu, Luo Dezhe, Wang Hongjun
    Accepted: 2025-11-10
    With the rapid development of global maritime transportation, ship trajectory prediction plays an important role in shipping safety and management. However, achieving high-precision and physically feasible continuous trajectory prediction remains a key challenge owing to the large scale of ship trajectory data and the uncertainty of complex maritime environments, which traditional prediction methods handle poorly. To address these challenges, this paper proposes a geographically constrained multi-method fusion ship trajectory prediction model. The model introduces a geographical constraint loss function to optimize the accuracy, heading stability, and physical feasibility of predicted trajectories. In addition, a multi-method fusion network structure incorporating bidirectional gated recurrent units, attention mechanisms, and multi-scale convolutions enhances the extraction of temporal features and the integration of multi-scale information. Experimental results demonstrate that the proposed model achieves lower prediction errors than existing models across multiple maritime datasets, with particularly significant advantages in long-term prediction. The study confirms that the model offers high accuracy and stability in ship trajectory prediction, providing effective support for practical maritime applications.
  • YAO Xun, HE Yuan, HU Xinrong, YANG Jie
    Accepted: 2025-11-10
    Sequential recommender systems excel at capturing users' dynamic interests, yet their open nature makes them highly vulnerable to data poisoning attacks. Attackers can effectively manipulate recommendation outcomes by altering the textual descriptions of items, posing a severe challenge to model robustness. Existing defense strategies, which primarily rely on static rules or fixed-intensity perturbations, struggle to counter increasingly complex and variable semantic-level textual attacks. To address this challenge, we propose RADAR, a two-stage collaborative defense framework that combines robustness enhancement at the training stage with real-time protection at the inference stage. First, during training, it employs dynamic adversarial training to strengthen the model's intrinsic resilience against unknown textual perturbations. Second, at inference, it leverages a Large Language Model (LLM) for precise semantic-level anomaly detection and content restoration. Experimental results demonstrate the superior defense performance of RADAR: in attack tests on the Scientific dataset, compared with the strongest baseline model (Cert-LLM), RADAR reduces the exposure increase of malicious items from 3.1796% to just 0.9921%. This result validates the framework's effectiveness in enhancing the security and robustness of sequential recommender systems.
  • GUO Yang, SUN Jing-yu
    Accepted: 2025-11-07
    With the development of quantum computing technology, traditional image encryption algorithms face the challenge of insufficient resistance to quantum attacks, while existing quantum image encryption algorithms suffer from high qubit consumption and the limited parameter space of their chaotic systems. To address these problems, this paper proposes a dual-quantum image encryption algorithm based on a chaotic system, aiming to balance low resource consumption with high security. First, a dual-bit-plane quantum image representation model (DBRQI) is proposed, which requires only 2n+4 qubits to store a grayscale image, reducing qubit consumption by 50% compared with the BRQI model. Second, a 3D hyperchaotic system (3D-CHCMM) is constructed: the parameter space of its 4 control parameters is 33% larger than that of existing systems, and its 3 Lyapunov exponents are all positive. Moreover, the system passes 15 NIST tests, enabling it to generate pseudorandom sequences with high randomness. The algorithm maps quantum states through DBRQI, scrambles pixel information via odd-even bit-plane scrambling and random row-column scrambling, and then XORs the result with the pseudorandom sequences to generate the ciphertext. Experimental results show that the horizontal correlation of the encrypted image is as low as 0.0041, the information entropy reaches 7.9993, and the NPCR is 99.6251%, indicating significantly enhanced attack resistance and anti-interference capability. The algorithm provides an efficient solution for image encryption in current scenarios with limited quantum hardware.
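    Information entropy and NPCR follow the standard definitions used in image-encryption evaluation; the pure-Python sketch below computes them over flattened 8-bit grayscale images. UACI is included only because it is usually reported alongside NPCR, not because the abstract reports it; function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Information entropy in bits of an 8-bit image given as a flat list."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def npcr_uaci(cipher1, cipher2):
    """NPCR and UACI, both in percent, between two equal-size 8-bit cipher images."""
    n = len(cipher1)
    changed = sum(1 for p, q in zip(cipher1, cipher2) if p != q)
    intensity = sum(abs(p - q) for p, q in zip(cipher1, cipher2)) / 255
    return 100.0 * changed / n, 100.0 * intensity / n
```

    For reference, the maximum entropy of a 256-level image is 8 bits (so 7.9993 is close to ideal), and two independent uniformly random 8-bit images differ at a pixel with probability 1 - 1/256, giving an expected NPCR of about 99.6094%.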
  • Zhang Yao, Zhang Junsan, Ma Junpeng, Yao Zongquan, Liu Tianyi
    Accepted: 2025-11-07
    This paper proposes CAFR-YOLO, an improved YOLOv8-based model, to address insufficient cross-level feature interaction and limited feature representation capability in multi-scale object detection under complex scenes. First, a novel cross-scale feature reorganization pipeline is designed, yielding the Channel Attention-guided Feature Reorganization (CAFR) module. Using a specific layer as the fusion backbone and incorporating scale alignment, attention-weighted fusion, and feature subset concatenation strategies, it alleviates the insufficient cross-level interaction of traditional feature pyramid structures. Second, at the local level, the C2f_DCNv3 module is introduced into the backbone network, significantly enhancing the model's geometric adaptability by exploiting the dynamic sampling characteristics of deformable convolution. At the global level, the C2f_SAConv module is constructed by combining Switchable Atrous Convolution (SAC) with the C2f module, optimizing multi-scale semantic feature fusion through dynamic atrous rate adjustment. Together, these two modules enhance the model's robustness to complex scenes. Finally, SPDConv replaces traditional convolution structures, strengthening feature representation through spatial-channel reorganization while reducing computational complexity. Experimental results show that CAFR-YOLO achieves 86.3% mAP@0.5 and 67.2% mAP@0.5:0.95 on the PASCAL VOC dataset at a computational cost comparable to the original model. On the MS COCO dataset, it improves mAP@0.5 and mAP@0.5:0.95 by 3.5% and 3.9%, respectively. Compared with existing state-of-the-art methods, CAFR-YOLO shows significant advantages across multiple metrics, substantially improving multi-scale detection accuracy and robustness while maintaining computational efficiency and providing a new solution for real-time object detection.
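    SPD-Conv replaces strided convolution or pooling with a space-to-depth (SPD) rearrangement followed by a non-strided convolution, so downsampling discards no pixels. The NumPy sketch below shows only the rearrangement on a single (C, H, W) array; the subsequent convolution, batching, and CAFR-YOLO's placement of the layer are not shown, and the function name is hypothetical.

```python
import numpy as np


def space_to_depth(x, scale=2):
    """SPD rearrangement: (C, H, W) -> (C * scale**2, H // scale, W // scale).

    Every input value is kept; spatial resolution is traded for channels.
    """
    c, h, w = x.shape
    if h % scale or w % scale:
        raise ValueError("H and W must be divisible by scale")
    # One sub-map per (row offset, column offset) pair, stacked along channels.
    slices = [x[:, i::scale, j::scale] for i in range(scale) for j in range(scale)]
    return np.concatenate(slices, axis=0)
```

    A stride-2 convolution would instead compute outputs only at every other position, so fine details that fall between sampling points can be lost; the SPD step preserves them for the following convolution to use.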
  • TIAN Hongpeng, LI Zhiqiang, YANG Sai
    Accepted: 2025-11-05
    Lightweight object detection in small UAV images faces common challenges such as low detection accuracy, complex backgrounds, large variations in target scale, dense target distributions, and relatively large numbers of model parameters. This paper therefore proposes an improved RT-DETR algorithm for UAV object detection. First, an enhanced C2f-Heat-Lsk module is developed by integrating the HeatBlock thermal conduction module and the LskBlock spatial selective attention mechanism into the C2f structure; together with the original C2f module, it is used to redesign the RT-DETR backbone network, improving spatial feature extraction while reducing model parameters. Second, a novel feature fusion structure, SOFEP, replaces the original feature pyramid to mitigate detail loss for small objects and enhance their feature representation. Third, a combined Focaler-MPDIoU loss function is constructed from the Focaler-IoU and MPDIoU loss mechanisms, improving bounding box regression accuracy and reducing the missed detection rate. Experimental results on the VisDrone test set show that the improved model has 16.9% fewer parameters than RT-DETR while improving mAP0.5 by 2.6% and mAP0.5:0.95 by 1.9%. The model also outperforms RT-DETR on the DOTAv1.0 and HIT-UAV datasets. These results demonstrate that the proposed method achieves higher detection accuracy with reduced computational complexity, effectively meeting the requirements of small object detection in UAV aerial images.
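    MPDIoU subtracts from IoU the squared distances between corresponding top-left and bottom-right corners, each normalized by the squared image diagonal, while Focaler-IoU linearly remaps IoU over an interval [d, u] so training can emphasize a chosen difficulty range. The pure-Python sketch below combines them as L_MPDIoU + IoU - IoU_focaler, which is our understanding of how the Focaler-IoU formulation extends other IoU losses; the interval defaults d=0.0 and u=0.95 and the function names are hypothetical, not the paper's settings.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def focaler_mpdiou_loss(pred, gt, img_w, img_h, d=0.0, u=0.95):
    """Assumed combination: (1 - MPDIoU) + IoU - IoU_focaler."""
    iou = box_iou(pred, gt)
    norm = img_w ** 2 + img_h ** 2  # squared image diagonal
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left corners
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right corners
    mpdiou = iou - d1 / norm - d2 / norm
    # Focaler-IoU: map IoU on [d, u] linearly to [0, 1], clipping outside.
    iou_focaler = min(1.0, max(0.0, (iou - d) / (u - d)))
    return (1.0 - mpdiou) + iou - iou_focaler
```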
  • Liang Shichao, Wen Wen, Feng Yali, Zheng Jiabi, Hao Zhifeng
    Accepted: 2025-11-05
    How to model and learn users' behavior patterns is a crucial issue in temporal recommendation. However, most existing research centers on pattern learning within a single type of behavior. This limitation prevents models from taking full advantage of the diverse behavior patterns revealed by different types of behaviors, such as clicking, purchasing, and marking items as favorites. As a result, the potential for enhancing recommendation performance remains underexplored. To address this gap, this research examines the multi-seasonal sequential dependencies within individual behaviors and the intricate time-varying dependencies among different types of behaviors. Specifically, we propose a novel model, the multi-seasonal multi-behavior (MSMB) model, for learning temporal patterns across multiple behaviors. The model employs a dual-channel sequence encoder that incorporates a multi-scale exponential moving average (EMA) mechanism to capture the multi-seasonal temporal dependencies within individual behavior sequences. Additionally, a cross-behavior dependency module accounts for different periodic granularities, enabling the model to capture time-variant dependencies across various types of behaviors. Extensive experiments on three benchmark datasets demonstrate the effectiveness and superiority of MSMB in enhancing temporal recommendation performance.
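    The multi-scale EMA idea can be sketched as several exponential moving averages, each with a different smoothing factor, run over the same behavior sequence. Fast-decaying averages track short-term patterns and slow-decaying ones retain longer seasonal trends. The smoothing factors and the click counts below are hypothetical. MSMB's encoder presumably tunes or learns its rates and applies them to embeddings rather than scalars.

```python
def ema(sequence, alpha):
    """Exponential moving average: s_t = alpha * x_t + (1 - alpha) * s_{t-1}, s_0 = x_0."""
    smoothed, state = [], None
    for x in sequence:
        state = x if state is None else alpha * x + (1 - alpha) * state
        smoothed.append(state)
    return smoothed


def multi_scale_ema(sequence, alphas=(0.5, 0.2, 0.05)):
    """Return, for each time step, one EMA value per smoothing factor.

    Large alpha reacts quickly (short-term behavior); small alpha changes
    slowly (long-period, seasonal tendencies).
    """
    tracks = [ema(sequence, a) for a in alphas]
    return [list(step) for step in zip(*tracks)]


# Made-up daily click counts for one user, with a weekend spike.
clicks = [2, 3, 2, 3, 2, 9, 10, 2, 3, 2]
for t, features in enumerate(multi_scale_ema(clicks)):
    print(t, [round(v, 2) for v in features])
```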
  • CHEN Haozhi, CAI Ruichu, LI Zijian, HAO Zhifeng
    Accepted: 2025-11-05
    Time series segmentation, an important task in time series analysis, has been widely applied in fields such as biological behavior analysis and physical system analysis. However, most existing time series segmentation methods fail to account for the nonstationary dynamics of time series induced by distribution shifts, which limits their ability to segment accurately in nonstationary regimes. To solve this problem, this paper first proposes a hypothesized causal data generation process grounded in real-world scenarios. Under this hypothesis, the latent variables underlying the observed data can be decomposed into stationary and nonstationary latent variables: stationary variables represent information that is unchanged or changes periodically, while nonstationary variables represent dynamically changing information. Second, based on this hypothesis, a Stationary Nonstationary Disentangle Model (SNDM) is designed. The model disentangles stationary and nonstationary variables, enabling greater focus on nonstationary dependencies in the time series. Moreover, to disentangle and extract these variables accurately, the evidence lower bound (ELBO) of variational inference is used to construct the model's loss function. Building on this ELBO, stationary and nonstationary prior neural network modules are introduced to improve the accuracy of latent variable disentanglement. Finally, experiments on various benchmark datasets show that the model outperforms several state-of-the-art time series segmentation methods, highlighting its advantages in practical scenarios.
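    The loss structure described above can be sketched as a single-sample Gaussian ELBO in which the stationary and nonstationary latent variables each get their own KL term against their own prior. This assumes diagonal Gaussian posteriors, priors, and likelihood. In SNDM the prior parameters would come from the prior networks, whereas here they are simply passed in. The paper's exact factorization (for example, any temporal transition priors) is not reproduced.

```python
import math


def gaussian_log_likelihood(x, mean, var):
    """log N(x | mean, diag(var)), summed over dimensions."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """Closed-form KL(q || p) between diagonal Gaussians."""
    return sum(0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
               for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p))


def elbo(x, recon_mean, recon_var, q_s, p_s, q_n, p_n):
    """Single-sample ELBO with separate stationary (s) and nonstationary (n) latents.

    Each q_* / p_* is a (means, variances) pair: q from the encoder, p from the
    corresponding prior network. Returns log p(x|z) - KL_s - KL_n.
    """
    recon = gaussian_log_likelihood(x, recon_mean, recon_var)
    return recon - kl_diag_gaussians(*q_s, *p_s) - kl_diag_gaussians(*q_n, *p_n)


# Training would maximize this value (i.e. minimize -elbo).
print(elbo([0.2], [0.0], [1.0], ([0.5], [0.8]), ([0.0], [1.0]), ([1.0], [0.5]), ([0.0], [1.0])))
```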
  • Zhao Weiyue, Wu Jingya, Lu Wenyan, Li Xiaowei and Yan Guihai
    Accepted: 2025-11-05
    Emerging datacenter applications have introduced substantial demand for large-granularity RDMA communication. RDMA relies on physical addresses, and when large-granularity data is accessed, the Page Table Entries (PTEs) required for address translation exceed the cache capacity of hardware devices. Current high-performance commercial solutions store PTEs in host memory. However, in this architecture, large-granularity communication can proceed only after the PTEs are fetched from host memory, which introduces PCIe traversal and host memory access latency, severely degrading address translation efficiency and increasing host CPU overhead. To achieve efficient large-granularity RDMA, this paper designs a configurable high-performance address mapping structure, XiRang. XiRang efficiently extends the access granularity through a streaming prefetch mechanism and a hierarchical cache design, and achieves flexible, high-throughput address translation through a configurable address translation array. The XiRang prototype is implemented on a DPU.
Experiments show that: 1) XiRang effectively offloads the address translation load of the RDMA data plane, decoupling it from the host CPU; 2) the streaming prefetch extension mechanism used by XiRang effectively reduces storage overhead, with cache consumption on the order of only 10 bytes in concurrent modes, making the additional storage overhead of concurrency negligible; 3) under a high number of concurrent memory access requests, XiRang maintains a translation table entry hit rate close to 100%, reducing the idle time of the translation engine by 2 to 3 orders of magnitude compared to the RNIC architecture; 4) the translation throughput of XiRang is more than 60 times that of the RNIC translation architecture and more than 3.5 times that of the basic DPU address mapping structure; 5) in performance enhancement mode, XiRang's address translation speed can support a data transfer bandwidth of 1.4 TB/s.
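    To make the role of streaming prefetch concrete, the toy model below is a hypothetical LRU cache of PTEs. On a miss, it fetches a batch of consecutive entries at once, so a sequential large-granularity transfer pays one host round trip per batch instead of one per page. It is not XiRang's hierarchical cache or configurable translation array. The capacity, prefetch depth, and page-table mapping are all assumptions.

```python
from collections import OrderedDict


class PrefetchingTranslationCache:
    """Toy LRU cache of page-table entries (PTEs) with streaming prefetch.

    On a miss for virtual page v, PTEs for pages v .. v + prefetch_depth - 1
    are fetched together, modelling one round trip to host memory.
    """

    def __init__(self, capacity, prefetch_depth):
        if not 1 <= prefetch_depth <= capacity:
            raise ValueError("need 1 <= prefetch_depth <= capacity")
        self.capacity = capacity
        self.prefetch_depth = prefetch_depth
        self.entries = OrderedDict()  # virtual page -> physical page, LRU order
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _fetch_pte(vpage):
        # Hypothetical page table: a fixed offset stands in for a real lookup.
        return vpage + 0x1000

    def _insert(self, vpage, ppage):
        self.entries[vpage] = ppage
        self.entries.move_to_end(vpage)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used entry

    def translate(self, vpage):
        if vpage in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpage)
            return self.entries[vpage]
        self.misses += 1
        for page in range(vpage, vpage + self.prefetch_depth):
            self._insert(page, self._fetch_pte(page))
        return self.entries[vpage]


cache = PrefetchingTranslationCache(capacity=16, prefetch_depth=8)
for page in range(64):  # one large sequential transfer
    cache.translate(page)
print(cache.hits, cache.misses)  # 56 8
```

    With prefetch_depth=1 the same 64-page transfer would miss on every page, which is the behavior the prefetch is meant to avoid.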
  • Jia Xinglong, Qin Junping, Yan Kai, Liu Zheng, Wang Dan, Shao Xinran, Shao Zezhou
    Accepted: 2025-11-05
    To address the insufficient accuracy of identifying endangered animals against complex backgrounds in the wild, this study improves the YOLOv8 model. First, Dynamic Snake Convolution (DSConv) is introduced into the backbone network to enhance detection performance under occlusion. Second, the global attention mechanism (GAM) is introduced into the neck network to increase the model's attention to information related to endangered animals, suppress irrelevant features such as the environment, and reduce redundant information. Then, a small-target detection head that fuses shallow feature maps is added to the head network to improve the network's perception and localization of small targets. Finally, the bounding box regression loss based on minimum point distance (MPDIoU) replaces the traditional CIoU loss, improving the convergence speed and localization accuracy of the algorithm. Experimental results show that the detection accuracy and average precision of the proposed model for endangered animals in complex backgrounds are 96.2% and 97.2%, respectively, which are 2.1 and 2.4 percentage points higher than those of the baseline YOLOv8n. In comparative experiments on the same dataset, the average precision is 28.7, 22.5, 3.5, and 2.4 percentage points higher than that of Faster-RCNN, SSD, YOLOv5, and YOLOv7, respectively. The experiments show that the improved YOLOv8 model can provide a theoretical basis for detecting endangered animals in complex backgrounds.
  • WU Shixun, TANG Peiyao, LAN Zhangli, Xu Kai, ZHANG Miao
    Accepted: 2025-11-04
    WiFi fingerprint positioning based on received signal strength indication (RSSI) has gained wide attention due to its ease of deployment and cost-effectiveness. However, existing fingerprinting methods typically rely on large-scale training data, while data augmentation often produces virtual samples of uneven quality, thereby limiting positioning accuracy and generalization. To address these issues, this study proposes a multi-parameter optimization WiFi fingerprinting method driven by few-shot learning (FSL). The method integrates an attention-enhanced convolutional neural network (CNN) with a meta-learning framework to enable rapid adaptation under limited data, while particle swarm optimization (PSO) is employed for automated data selection and joint hyperparameter tuning under physical constraints. Experimental results demonstrate that the proposed method achieves average positioning errors of 0.52 m on the CJU dataset and 6.88 m on the public Tampere dataset, improving accuracy by at least 49.5% and 8.7% compared with baseline methods. In addition, a generalization test on the CJU-2024 dataset shows that the model adapts effectively to new environments with only a small amount of data, achieving an average positioning error of 2.17 m and an accuracy improvement of at least 26.7%. These results confirm that the proposed method significantly improves indoor positioning accuracy while maintaining strong generalization capability.
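    The PSO component can be illustrated with a minimal global-best particle swarm optimizer, in which box bounds stand in for the physical constraints mentioned above. The inertia and acceleration coefficients are common textbook values. The objective in the usage line is a hypothetical stand-in for, say, validation positioning error as a function of two hyperparameters. The paper's search space and fitness function are not given here.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=100,
        w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Minimize objective over a box using global-best particle swarm optimization."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamping to the box is where physical constraints are enforced.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best, err = pso(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [(-5.0, 5.0), (-5.0, 5.0)])
print([round(v, 3) for v in best], err)
```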
  • YANG Yingying, CHE Jin, BAI Xuebing, XIAO Long, JIAN Liqiong
    Accepted: 2025-11-04
    Existing unsupervised person Re-ID methods focus mainly on pedestrians’ global features, which causes global feature bias and, together with insufficient data diversity, impairs recognition accuracy. To address this, this paper proposes a ViT-based method (DAFP) that integrates Multi-level Data Augmentation (MDAM) and Feature Purification (FP). First, MDAM, which includes geometric spatial transformations, appearance feature perturbations, and occlusion simulation, expands training sample diversity and enhances the model’s cross-camera robustness. Next, the FP module divides the local features output by the Transformer into upper and lower parts according to spatial position, performs adaptive weighted fusion with global features via a multi-view distance matrix, and generates high-quality pseudo-labels with DBSCAN, effectively alleviating the misclustering of similar pedestrians caused by traditional methods' over-reliance on global features alone. Finally, a global-local clustering contrastive loss dynamically updates global and local cluster centers to strengthen fine-grained feature learning. On Market1501, DukeMTMC-reID, and MSMT17, the method reaches mAP/Rank-1 scores of 90.5%/96.0%, 77.6%/87.6%, and 64.5%/86.0%, respectively, significantly surpassing current state-of-the-art methods and verifying its superior performance.
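    The pseudo-labeling step can be sketched as a weighted fusion of global, upper-body, and lower-body distance matrices, followed by DBSCAN on the fused matrix. The fixed fusion weights, eps, and min_samples below are illustrative assumptions; the paper fuses adaptively via its multi-view distance matrix. DBSCAN is written out in pure Python to keep the example self-contained.

```python
def fuse_distances(d_global, d_upper, d_lower, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of global and part-level distance matrices (fixed weights here)."""
    wg, wu, wl = weights
    n = len(d_global)
    return [[wg * d_global[i][j] + wu * d_upper[i][j] + wl * d_lower[i][j]
             for j in range(n)] for i in range(n)]


def dbscan(dist, eps, min_samples):
    """DBSCAN on a precomputed distance matrix; returns one label per sample, -1 = noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_samples:  # count includes the point itself
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbors) >= min_samples:  # expand only from core points
                queue.extend(j_neighbors)
    return labels


# Toy 1-D "embeddings": two identities plus one outlier.
pts = [0.0, 0.1, 0.2, 5.0, 5.1, 10.0]
d = [[abs(a - b) for b in pts] for a in pts]
print(dbscan(fuse_distances(d, d, d), eps=0.15, min_samples=2))  # [0, 0, 0, 1, 1, -1]
```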
  • Tang Weilin, Wang Junfeng, Ge Wenhan, Zhang Chengcheng, Zhan Weilu
    Accepted: 2025-11-04
    Cyber Threat Intelligence (CTI) plays a pivotal role in mitigating the asymmetry between cyber attacks and defenses. However, current extraction methods for Tactics, Techniques, and Procedures (TTPs) predominantly rely on supervised language models trained on manual annotations, which suffer from inefficiency and inconsistency. Although the MITRE ATT&CK framework has mitigated TTP description problems through standardized classification, existing NLP-based approaches still face three major challenges: insufficient generalization, delayed adaptation to new framework versions, and poor interpretability. To address these, DetecTTive is proposed: a zero-shot TTP extraction method based on large language models (LLMs) that combines the prior knowledge of LLMs with trustworthy external knowledge. The framework uses the official ATT&CK knowledge base as an external knowledge source and combines vector-based semantic retrieval, graph-enhanced association reasoning, and an agent workflow to achieve automated white-box reasoning. This improves zero-shot performance while keeping results traceable. Experiments show that the proposed zero-shot approach achieves an F1 score of 80.02% and a recall of 83.46% on benchmark datasets. The method effectively addresses the data bias and version adaptation issues inherent in conventional models, providing an interpretable and cost-efficient solution for TTP extraction in dynamic threat environments.
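    The retrieval half of such a pipeline can be sketched as ranking ATT&CK technique descriptions by their similarity to a CTI sentence. For self-containment, this uses bag-of-words cosine similarity as a stand-in for dense embeddings. The corpus has three entries keyed by real ATT&CK technique IDs, with short paraphrased descriptions. The graph reasoning and LLM agent steps of DetecTTive are not modeled.

```python
import math
import re
from collections import Counter

# Tiny stand-in knowledge base: real technique IDs, paraphrased descriptions.
TECHNIQUES = {
    "T1566": "phishing email attachment link sent to victims to gain initial access",
    "T1059": "command and scripting interpreter such as powershell or bash used to execute commands",
    "T1003": "os credential dumping from lsass memory to obtain account passwords and hashes",
}


def vectorize(text):
    """Lower-cased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(sentence, k=2):
    """Return the IDs of the k most similar technique descriptions."""
    query = vectorize(sentence)
    scored = sorted(((cosine(query, vectorize(desc)), tid) for tid, desc in TECHNIQUES.items()),
                    reverse=True)
    return [tid for _, tid in scored[:k]]


print(retrieve("The actor dumped credentials from LSASS memory to harvest password hashes", k=1))  # ['T1003']
```

    In a fuller pipeline, the retrieved candidates would be passed to the LLM as grounded context rather than used directly as the answer.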
  • Fan Qinlong, Sun Yepeng, Lu Jicang, Zhu Taojie and Liu Yilin
    Accepted: 2025-11-04
    With the popularization and development of the internet, the massive volume of user-generated comments on trending topics and their widespread dissemination profoundly influence the progression and development of real-world events. Consequently, mining public stances and attitudes toward trending topics holds significant practical value for domains such as online public opinion monitoring and social security governance. Stance detection technology aims to identify user attitudes toward specific targets from user-generated texts. Although numerous studies have proposed diverse task scenarios and technical methodologies, a unified classification framework for stance detection tasks remains elusive. First, this paper presents a comprehensive review of stance detection tasks from two dimensions: task scenarios and technical methodologies, systematically organizing the current research landscape and development trends. From the task scenario perspective, we classify stance detection into three paradigms: target-specific, target transfer, and target generalization, highlighting the field's evolution from domain-specific applications toward broader adaptability. From the methodological perspective, we categorize stance detection approaches into three primary classes: model-based engineering, knowledge-driven engineering, and data-centric engineering, analyzing the strengths and limitations of each. Additionally, we conduct statistical and experimental analyses of publicly available resources across multiple dimensions, revealing key characteristics and developmental trajectories of these benchmark datasets. Finally, the paper concludes with a summary and outlines prospective research directions and persistent challenges.
  • Wu Qiannan, Ding Weiping, Fan Xiaoxue, Ju Hongrong, Zhou Linlin, Wang Jing
    Accepted: 2025-10-31
    Feature selection can effectively identify informative features from complex data to improve information processing efficiency. However, in partially labeled data scenarios, traditional feature selection methods face significant challenges due to inherent label ambiguity, complex inter-sample relationships, and difficulties in feature importance evaluation. To address these challenges, this paper proposes MFG-FS, an effective feature selection framework for partially labeled datasets. First, to tackle label ambiguity, we design an end-to-end disambiguation method based on the MLP-Mixer model and contrastive learning, which optimizes the feature representation space to enhance discriminative power and obtain more reliable label confidence distributions. Second, to accurately characterize complex sample relationships in partially labeled data, we construct fuzzy similarity relations and information granules that integrate multi-source information, effectively combining local feature-space structures, global correlations from disambiguated labels, and label constraints. Subsequently, based on the constructed fuzzy information granules, we define and employ a fuzzy mutual information measure for feature evaluation, which quantifies the relevance between feature subsets and labels while assessing internal redundancy, thereby providing a robust basis for high-quality feature subset selection. Finally, extensive experiments on five synthetic and four real-world datasets demonstrate that MFG-FS can select more discriminative and robust feature subsets, achieving superior performance in partial label disambiguation and classification accuracy.
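    A minimal sketch of fuzzy-granule-based mutual information follows. It uses the classic fuzzy information entropy built from similarity-relation granules, with single features scaled to [0, 1], the similarity 1 - |x_i - x_j|, crisp label relations, and the min t-norm for joint relations. MFG-FS instead builds multi-source granules and uses disambiguated label confidences, so this shows only the general shape of the measure.

```python
import math


def fuzzy_relation(values):
    """Fuzzy similarity R(i, j) = 1 - |x_i - x_j| for a feature scaled to [0, 1]."""
    return [[1.0 - abs(a - b) for b in values] for a in values]


def label_relation(labels):
    """Crisp relation for labels: 1 if same label, else 0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def fuzzy_entropy(rel):
    """H(R) = -(1/n) * sum_i log(|[x_i]_R| / n), with |[x_i]_R| = sum_j R(i, j)."""
    n = len(rel)
    return -sum(math.log(sum(row) / n) for row in rel) / n


def joint_relation(r1, r2):
    """Intersection of two fuzzy relations via the min t-norm."""
    return [[min(a, b) for a, b in zip(row1, row2)] for row1, row2 in zip(r1, r2)]


def fuzzy_mutual_information(r_feature, r_label):
    """I(A; D) = H(A) + H(D) - H(A, D)."""
    return (fuzzy_entropy(r_feature) + fuzzy_entropy(r_label)
            - fuzzy_entropy(joint_relation(r_feature, r_label)))


labels = [0, 0, 1, 1]
mi_rel = fuzzy_mutual_information(fuzzy_relation([0.0, 0.1, 0.9, 1.0]), label_relation(labels))
mi_irr = fuzzy_mutual_information(fuzzy_relation([0.0, 1.0, 0.0, 1.0]), label_relation(labels))
print(round(mi_rel, 3), round(mi_irr, 3))  # label-aligned feature scores higher
```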
  • HUANG Yuqi, YANG Xiaoxia, YANG Ronghao, LIAO Fangzhou, YAN Le, GUO Junqiang, LI Minghan
    Accepted: 2025-10-30
    Object detection for autonomous driving perception aims to locate and identify traffic participants such as motor vehicles, non-motor vehicles, and pedestrians in onboard camera views in real time, providing accurate input for the environmental perception module to support decision-making and control in autonomous driving systems. Complex road backgrounds, diverse object shapes, and large scale variations cause the perception system to produce false and missed detections. Specific challenges include low accuracy in detecting deformed objects, insufficient multi-scale detection, and weak global perception. To address these issues, an improved algorithm named YOLOv8-DDL based on YOLOv8n is proposed. First, deformable attention is introduced to improve the C2f module in the backbone network. It dynamically learns feature offsets to better capture the varied object shapes in traffic scenes, improving the model's adaptability to complex spatial distributions and effectively reducing false detections. Second, large separable kernel attention is integrated into the spatial pyramid pooling fast (SPPF) module, expanding the receptive field through large-kernel convolution to strengthen global context modeling and robustness against complex backgrounds. Finally, a dynamic multi-scale adaptive fusion module and a dynamic feature pyramid network are designed to reconstruct the neck network, dynamically fusing high-level and low-level features to enhance multi-scale feature representation and improve multi-scale object detection performance. Experimental results on the public SODA10M dataset show that, compared with YOLOv8n, YOLOv8-DDL improves precision, recall, F1-score, and mean average precision by 5.9%, 1.3%, 3%, and 1.5%, respectively. Additional validation on the public BDD100K dataset confirms improvements of 2%, 0.6%, 1%, and 2% in these metrics, respectively.
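    The large separable kernel idea rests on factorizing a 2-D kernel into cascaded 1-D horizontal and vertical kernels, which cuts the per-channel weight count from k*k to 2k. The pure-Python check below assumes single-channel "valid" cross-correlation and a made-up 5x5 input. It shows that a 1 x k pass followed by a k x 1 pass equals a single k x k pass with the outer-product kernel. It illustrates only this factorization: LSKA's dilated kernels and attention multiplication are omitted, and a single cascade can express only rank-1 k x k kernels.

```python
def conv2d_valid(img, kernel):
    """Single-channel 'valid' 2-D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(kernel[a][b] * img[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def separable_conv2d(img, col_kernel, row_kernel):
    """Apply a 1 x k (horizontal) kernel, then a k x 1 (vertical) kernel."""
    horizontal = conv2d_valid(img, [row_kernel])
    return conv2d_valid(horizontal, [[v] for v in col_kernel])


img = [[(3 * r + 5 * c) % 7 for c in range(5)] for r in range(5)]
col, row = [1, 2, 1], [1, 0, -1]  # Sobel-x written as an outer product
full = conv2d_valid(img, [[c * r for r in row] for c in col])
print(separable_conv2d(img, col, row) == full)  # True
print(len(col) * len(row), len(col) + len(row))  # 9 6
```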
  • CHEN Junhong, ZHOU Feng, TIAN Youliang, YANG Kedi, ZHANG Qijia
    Accepted: 2025-10-29
    As the demand for training data across industries increases, data has become a key factor of production. Data rights confirmation clarifies data ownership and benefit allocation, preventing unauthorized use. However, existing schemes suffer from problems such as uncontrollable rights and low confirmation efficiency during the collection, storage, and use of data. To address these challenges, this paper proposes a rights-controllable data rights confirmation scheme based on trapdoor hashing. First, to prevent the loss of data rights during data transfer, this paper constructs a rights confirmation model that separates holding, management, and usage rights, achieving a fine-grained allocation of rights. Second, to address the uncontrollable generation of management rights in existing related algorithms, a data rights confirmation algorithm based on trapdoor hashing is proposed, which enables controllable generation of data management rights as they change while also improving efficiency. In addition, combined with blockchain technology, this paper designs an authorization-traceable data transaction mechanism, which achieves non-repudiation and traceability of data transactions by finely controlling data collection and access and uploading evidence records. Finally, security and performance analyses show that, compared with traditional schemes, the proposed scheme has advantages in computation and storage overhead while ensuring that rights signatures cannot be forged.
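    The paper's concrete trapdoor hash is not specified here. For illustration, the sketch below uses the classic Krawczyk-Rabin discrete-log chameleon hash, CH(m, r) = g^m * h^r mod p with h = g^x. This hash lets only the holder of the trapdoor x find a new randomness r' under which a changed rights record (for example, a new manager) keeps the same digest. The tiny group parameters and record strings are toy assumptions and are far too small for real use.

```python
import hashlib
import secrets

# Toy group: safe prime P = 2Q + 1; G = 4 generates the order-Q subgroup.
P, Q, G = 2039, 1019, 4


def to_zq(message: bytes) -> int:
    """Map a message into Z_q via SHA-256."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % Q


def keygen():
    x = secrets.randbelow(Q - 1) + 1  # trapdoor (secret key), 1 <= x < Q
    return x, pow(G, x, P)            # (trapdoor, public key h = g^x)


def chameleon_hash(h, message: bytes, r: int) -> int:
    """CH(m, r) = g^m * h^r mod p."""
    return (pow(G, to_zq(message), P) * pow(h, r, P)) % P


def find_collision(x, message: bytes, r: int, new_message: bytes) -> int:
    """Solve m + x*r = m' + x*r' (mod q) for r'; requires the trapdoor x."""
    m, m_new = to_zq(message), to_zq(new_message)
    return (r + (m - m_new) * pow(x, -1, Q)) % Q


x, h = keygen()
r = 123
digest = chameleon_hash(h, b"data-42|manager=A", r)
r_new = find_collision(x, b"data-42|manager=A", r, b"data-42|manager=B")
print(chameleon_hash(h, b"data-42|manager=B", r_new) == digest)  # True
```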
  • Gao Jianwei, Zhao Shutong, Huang Ningbo
    Accepted: 2025-10-28
    Against the background of rapid progress in artificial intelligence, a collective-intelligence emergency decision-making method based on large language models and retrieval-augmented generation (RAG) is proposed to address the insufficient public participation in current emergency decision-making and its strong dependence on specialized knowledge. The method integrates public social media data with a domain knowledge base to construct a public-expert collaborative multi-attribute decision-making model, improving the scientific rigor and effectiveness of disaster response in emergency management. First, a Python crawler is used to collect public comments from a microblogging platform to form an emergency disaster demand database. Second, an emergency management professional database is integrated via RAG to enhance the model's generation ability; topic classification is guided through prompt engineering, a topic-word co-occurrence network is constructed and clustered with the Louvain algorithm, and, combined with expert review and refinement, the attribute set for emergency decision-making is generated. Third, importance and cohesion factors are synthesized to construct an attribute weight measurement model. Finally, taking the psychological behavior of decision makers into account, the TODIM method is used to rank and select among alternative emergency plans. Taking the 7-20 Henan rainstorm as a case study, the experimental results show that the proposed method generates emergency decision-making topics that meet public demand and performs well in topic consistency and diversity, with scores of 0.583 and 0.943, respectively, verifying the soundness and effectiveness of the method.
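    For reference, the classic TODIM ranking step can be sketched as follows. It assumes benefit criteria already normalized to [0, 1], fixed attribute weights (the paper derives them from importance and cohesion), and a loss attenuation factor theta. The plan scores in the usage lines are made up, and the paper's attribute sets and weights are not reproduced.

```python
import math


def todim(matrix, weights, theta=1.0):
    """Classic TODIM ranking for benefit criteria normalized to [0, 1].

    matrix[i][c] is alternative i's score on criterion c; theta > 0 scales
    losses (smaller theta makes losses weigh more). Returns the global value
    xi_i in [0, 1] for each alternative; higher is better.
    """
    n, m = len(matrix), len(weights)
    w_ref = max(weights)
    rel = [w / w_ref for w in weights]  # weights relative to the reference criterion
    total = sum(rel)

    def phi(i, j, c):
        diff = matrix[i][c] - matrix[j][c]
        if diff > 0:
            return math.sqrt(rel[c] * diff / total)               # gain
        if diff < 0:
            return -math.sqrt(total * (-diff) / rel[c]) / theta   # loss, amplified
        return 0.0

    dominance = [sum(phi(i, j, c) for j in range(n) for c in range(m)) for i in range(n)]
    lo, hi = min(dominance), max(dominance)
    if hi == lo:
        return [1.0] * n
    return [(d - lo) / (hi - lo) for d in dominance]


plans = [[0.9, 0.8], [0.5, 0.6], [0.2, 0.1]]  # made-up normalized scores
scores = todim(plans, [0.6, 0.4], theta=1.0)
print([round(s, 3) for s in scores])  # first plan scores 1.0, last 0.0
```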
  • ZHAO Shuxu, CHEN Yanhong, WANG Xiaolong, JIANG Kaijun
    Accepted: 2025-10-28
    To address resource mismatch, load bottlenecks, and service instability caused by demand fluctuations and large-scale bursty tasks in mobile edge computing, a cooperative supply strategy based on approximate Shapley values (ASVC) is proposed. First, a task allocation model based on bidirectional preference matching is constructed, which considers both the performance requirements of user tasks and the resource status of edge nodes, and the Gale-Shapley algorithm is used to obtain a stable supply-demand matching. Second, to reduce the computational complexity of Shapley value estimation during coalition formation, an adaptive sampling-based optimization scheme is introduced, which significantly reduces the computation time of Shapley values while maintaining accuracy. Finally, task data is allocated in proportion to each node's contribution, improving system fairness and resource utilization efficiency. Simulation results show that, compared with existing algorithms, the proposed ASVC algorithm improves service quality, delay control, task completion rate, and system load balancing by approximately 27.8%, 31.0%, 30.8%, and 21%, respectively.
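    The coalition-value step can be illustrated with plain Monte Carlo permutation sampling of Shapley values, followed by proportional data allocation. This is a generic estimator, not ASVC's adaptive sampling scheme. The additive capacity game in the usage lines is a hypothetical example chosen because its exact Shapley values are known: each node's own capacity.

```python
import random


def shapley_monte_carlo(players, value, n_samples=200, seed=0):
    """Estimate Shapley values by averaging marginal contributions over random orderings.

    value(frozenset) -> float gives a coalition's worth (e.g. served task volume).
    """
    rng = random.Random(seed)
    estimate = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(n_samples):
        rng.shuffle(order)
        coalition = frozenset()
        prev = value(coalition)
        for p in order:
            coalition = coalition | {p}
            cur = value(coalition)
            estimate[p] += cur - prev  # marginal contribution of p in this ordering
            prev = cur
    return {p: s / n_samples for p, s in estimate.items()}


def proportional_allocation(total_data, shapley):
    """Split task data among nodes in proportion to their non-negative contributions."""
    positive = {p: max(0.0, v) for p, v in shapley.items()}
    norm = sum(positive.values())
    if norm == 0:
        return {p: total_data / len(positive) for p in positive}  # fall back to equal split
    return {p: total_data * v / norm for p, v in positive.items()}


capacity = {"edge-a": 1.0, "edge-b": 2.0, "edge-c": 3.0}
phi = shapley_monte_carlo(list(capacity), lambda s: sum(capacity[p] for p in s))
print(proportional_allocation(60.0, phi))  # {'edge-a': 10.0, 'edge-b': 20.0, 'edge-c': 30.0}
```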
  • Yanli Lv, Yiwen Jiang, Hanyu Feng, Zhenqi Guo, Sheng Xiang
    Accepted: 2025-10-28
    As generative AI technologies become increasingly integrated into sensitive industries, the tendency of large generative models to memorize training data during fine-tuning poses a growing risk of privacy leakage, in which user identities, behavioral traces, and other sensitive information may be reconstructed during inference. To address this issue, a novel fine-tuning approach combining Differential Privacy (DP) with Low-Rank Adaptation (LoRA) is proposed. The method freezes the parameters of the pre-trained model and updates only the inserted LoRA modules. In addition, Differentially Private Stochastic Gradient Descent (DP-SGD) is applied, performing per-sample gradient norm clipping and Gaussian noise injection to minimize the model’s dependence on individual training samples. Based on the Qwen2-1.5B language model, a task-specific fine-tuning dataset incorporating user profiles is constructed, and adversarial samples targeting typical sensitive fields, such as identity markers, behavioral characteristics, and location data, are developed to compare the anti-leakage capability of traditional full-parameter fine-tuning with that of the DP-LoRA approach. Experimental results show that fully fine-tuned models exhibit a sensitive-information match rate as high as 73.07% across 130 adversarial samples, indicating severe privacy vulnerabilities. In contrast, DP-LoRA fine-tuned models reduce the match rate to only 1.5%, with generated content showing minimal correlation to the original training data. The approach effectively mitigates the risk of sensitive information disclosure, offering a cost-efficient and highly adaptable training strategy for deploying generative models in real-world scenarios with stringent data security requirements.
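    The per-sample clipping and noise injection of DP-SGD can be sketched on flattened gradient vectors as below. In a DP-LoRA setting, these vectors would hold only the LoRA adapter gradients, while the base weights stay frozen. The clipping norm and noise multiplier are illustrative. Privacy accounting (tracking epsilon and delta across steps) is deliberately omitted.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_update(per_sample_grads, max_norm, noise_multiplier, rng):
    """One DP-SGD aggregation step: clip each example's gradient, sum, add
    Gaussian noise with std noise_multiplier * max_norm, then average."""
    batch = len(per_sample_grads)
    summed = [0.0] * len(per_sample_grads[0])
    for g in per_sample_grads:
        for k, v in enumerate(clip(g, max_norm)):
            summed[k] += v
    sigma = noise_multiplier * max_norm
    return [(s + rng.gauss(0.0, sigma)) / batch for s in summed]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # per-example gradients with norms 5.0 and 0.5
print(dp_sgd_update(grads, max_norm=1.0, noise_multiplier=0.0, rng=rng))  # about [0.45, 0.6]
noisy = dp_sgd_update(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

    Clipping bounds any single example's influence on the update, and the noise scale is tied to that bound, which is what lets the update be analyzed for differential privacy.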