Just accepted

  • Li Xu, Luo Dezhe, Wang Hongjun
    Accepted: 2025-11-10
    With the rapid development of global maritime transportation, ship trajectory prediction plays an important role in shipping safety and management. However, achieving high-precision and physically feasible continuous trajectory prediction remains a key challenge due to the large-scale ship trajectory data and the uncertainty of complex maritime environments. Traditional prediction methods have limitations in handling complex maritime environments and large-scale dynamic data. To address these challenges, this paper proposes a geographically constrained multi-method fusion ship trajectory prediction model. The model introduces a geographical constraint loss function to optimize the accuracy, heading stability, and physical feasibility of trajectory predictions. Additionally, a multi-method fusion network structure is designed, incorporating bidirectional gated recurrent units, attention mechanisms, and multi-scale convolutions, which enhances the ability to extract temporal features and integrate multi-scale information. Experimental results demonstrate that the proposed model achieves lower prediction errors across multiple maritime datasets, with particularly significant advantages in long-term predictions compared to existing models. The study confirms that this model offers high accuracy and stability in ship trajectory prediction, providing effective support for practical applications in the maritime field.
  • YAO Xun, HE Yuan, HU Xinrong, YANG Jie
    Accepted: 2025-11-10
    Sequential recommender systems excel at capturing users' dynamic interests, yet their open nature makes them highly vulnerable to data poisoning attacks. Attackers can effectively manipulate recommendation outcomes by altering the textual descriptions of items, posing a severe challenge to model robustness. Existing defense strategies, which primarily rely on static rules or fixed-intensity perturbations, struggle to counter the growing complexity and variability of semantic-level textual attacks. To address this challenge, we propose RADAR, a two-stage collaborative defense framework. This framework synergizes robustness enhancement at the training stage with real-time protection at the inference stage. First, during training, it employs dynamic adversarial training to bolster the model's intrinsic resilience against unknown textual perturbations. Second, at inference, it leverages a Large Language Model (LLM) for precise semantic-level anomaly detection and content restoration. Experimental results demonstrate the superior defense performance of RADAR. In attack tests on the Scientific dataset, compared to the strongest baseline model (Cert-LLM), RADAR reduces the exposure increase of malicious items from 3.1796% to just 0.9921%. This powerfully validates the framework's effectiveness in enhancing the security and robustness of sequential recommender systems.
  • GUO Yang, SUN Jing-yu
    Accepted: 2025-11-07
    With the development of quantum computing technology, traditional image encryption algorithms are facing the challenge of insufficient quantum attack resistance, while existing quantum image encryption algorithms have problems such as high quantum bit consumption and limited parameter space of chaotic systems. To address the above unsolved problems, this paper proposes a dual-quantum image encryption algorithm based on a chaotic system, aiming to achieve a balance between low resource consumption and high security. Firstly, a dual-bit-plane quantum image representation model (DBRQI) is proposed, which only requires 2n+4 quantum bits to store a grayscale image, reducing quantum bit consumption by 50% compared with the BRQI model. Secondly, a 3D hyperchaotic system (3D-CHCMM) is constructed: the parameter space of its 4 control parameters is increased by 33% compared with existing systems, and its 3 Lyapunov exponents are all positive. Moreover, the system has passed 15 NIST tests, enabling it to generate pseudorandom sequences with high randomness. The algorithm maps quantum states through DBRQI, scrambles pixel information via odd-even bit-plane scrambling and random row-column scrambling, and then performs an XOR operation with the pseudorandom sequences to generate ciphertext. Experimental results show that the horizontal correlation of the encrypted image is as low as 0.0041, the information entropy reaches 7.9993, and the NPCR is 99.6251%, indicating that the algorithm’s attack resistance and anti-interference capability are significantly enhanced. The algorithm in this paper provides an efficient solution for image encryption in current scenarios with limited quantum hardware.
  • Zhang Yao, Zhang Junsan, Ma Junpeng, Yao Zongquan, Liu Tianyi
    Accepted: 2025-11-07
    This paper proposes an improved YOLOv8-based model named CAFR-YOLO to address the issues of insufficient cross-level feature interaction and limited feature representation capability in multi-scale object detection under complex scenes. First, a novel cross-scale feature reorganization pipeline was designed, constructing the Channel Attention-guided Feature Reorganization (CAFR) module. By using a specific layer as the fusion backbone and incorporating scale alignment, attention-weighted fusion, and feature subset splicing strategies, it alleviates insufficient cross-level interaction in traditional feature pyramid structures. Secondly, at the local level, the method introduces the C2f_DCNv3 module into the backbone network, significantly enhancing the model's geometric adaptability by exploiting the dynamic sampling characteristics of deformable convolution. From a global perspective, the C2f_SAConv module is constructed by combining Switchable Atrous Convolution (SAC) with the C2f module, optimizing multi-scale semantic feature fusion through dynamic atrous rate adjustment. These two approaches enhance the model's robustness to complex scenes. Finally, SPDConv replaces traditional convolution structures, strengthening feature representation through spatial-channel reorganization while reducing computational complexity. Experimental results demonstrate that CAFR-YOLO achieves 86.3% mAP@0.5 and 67.2% mAP@0.5:0.95 on the PASCAL VOC dataset with comparable computational costs to the original model. On the MS COCO dataset, it improves mAP@0.5 and mAP@0.5:0.95 by 3.5% and 3.9%, respectively. Compared to existing state-of-the-art methods, CAFR-YOLO exhibits significant advantages across multiple metrics. The proposed CAFR-YOLO model substantially enhances multi-scale object detection accuracy and robustness while maintaining computational efficiency, providing a novel solution for real-time object detection tasks.
  • TIAN Hongpeng, LI Zhiqiang, YANG Sai
    Accepted: 2025-11-05
    In lightweight small UAV image object detection tasks, there are common challenges such as low detection accuracy, complex backgrounds, large variations in target scale, dense target distribution, and a relatively large number of model parameters. Therefore, this paper proposes a novel UAV object detection algorithm based on an improved RT-DETR. First, an enhanced C2f-Heat-Lsk module is developed by integrating the HeatBlock thermal conduction module and the LskBlock spatial selective attention mechanism into the C2f structure. This modified module collaborates with the original C2f module to redesign the RT-DETR backbone network, which improves spatial feature extraction while reducing model parameters. Second, a novel feature fusion structure, SOFEP, replaces the original feature pyramid to mitigate detail loss in small objects and enhance their feature representation. Third, a combined Focaler-MPDIoU loss function is constructed by integrating the Focaler-IoU and MPDIoU loss mechanisms, which improves bounding box regression accuracy and reduces missed detection rates. Experimental results on the VisDrone test set show that the improved model reduces parameter count by 16.9% compared to RT-DETR, while achieving improvements of 2.6% in mAP0.5 and 1.9% in mAP0.5:0.9. The model also outperforms RT-DETR on the DOTAv1.0 and HIT-UAV datasets. These advancements demonstrate that the proposed method achieves higher detection accuracy with reduced computational complexity, effectively meeting the requirements for small object detection in UAV aerial images.
  • Liang Shichao, Wen Wen, Feng Yali, Zheng Jiabi, Hao Zhifeng
    Accepted: 2025-11-05
    Modeling and learning users' behavior patterns is a crucial issue in temporal recommendation. However, the majority of existing research primarily centers on pattern learning within a single type of behavior. This limitation restricts the ability to take full advantage of users' diversified behavior patterns revealed by various types of behaviors, such as clicking, purchasing, and marking as favorite. As a result, the potential for enhancing recommendation performance remains underexplored. To address this gap, this research delves into the multi-seasonal sequential dependencies of individual behaviors and the intricate dependencies among different types of behaviors over time. Specifically, we propose a novel model, named the multi-seasonal multi-behavior (MSMB) model, for learning temporal patterns across multiple behaviors. In the proposed model, a dual-channel sequence encoder is employed, which incorporates a multi-scale exponential moving average (EMA) mechanism to effectively capture the multi-seasonal temporal dependencies within individual behavior sequences. Additionally, a cross-behavior dependency module is introduced to account for different periodic granularities, thereby enabling the model to effectively capture the time-variant dependencies across various types of behaviors. Extensive experiments conducted on three benchmark datasets demonstrate the effectiveness and superiority of the proposed MSMB model in enhancing temporal recommendation performance.
  • CHEN Haozhi, CAI Ruichu, LI Zijian, HAO Zhifeng
    Accepted: 2025-11-05
    Time series segmentation, an important task in time series analysis, has been widely applied in fields such as biological behavior analysis and physical system analysis. However, most existing time series segmentation methods fail to account for the nonstationary dynamics of time series induced by distribution shifts, thereby limiting their ability to achieve accurate segmentation in nonstationary regimes. To solve this problem, this paper first proposes a data causal generation process hypothesis based on real-world scenarios. Under this hypothesis, the latent variables underlying the observed data can be decomposed into stationary and nonstationary latent variables. Here, the stationary variables represent information that is unchanged or changes periodically, while the nonstationary variables represent dynamically changing information. Second, based on this causal generation process hypothesis, a Stationary Nonstationary Disentangle Model (SNDM) is designed. This model disentangles stationary and nonstationary variables, thus enabling enhanced focus on nonstationary dependencies in the time series. Moreover, in order to accurately disentangle and extract variables, the evidence lower bound (ELBO) of variational inference is used to construct the loss function of the model. Leveraging this ELBO, this study introduces stationary and nonstationary prior neural network modules to improve latent variable disentanglement accuracy. Finally, through experiments, we validate that our model outperforms several state-of-the-art time series segmentation methods on various benchmark datasets, thereby highlighting its advantages in practical scenarios.
  • Zhao Weiyue, Wu Jingya, Lu Wenyan, Li Xiaowei and Yan Guihai
    Accepted: 2025-11-05
    Emerging applications in datacenters have introduced a significant amount of large-granularity RDMA communication requirements. RDMA relies on physical addresses, and, when accessing large-granularity data, the Page Table Entries (PTEs) required for address translation exceed the cache capacity of hardware devices. Current high-performance commercial solutions store PTEs in the host memory. However, this architecture requires large-granularity communication to be executed only after fetching the PTEs from the host memory, which introduces PCIe traversal and host memory access latency, severely degrading address translation efficiency and increasing host CPU overhead. To achieve efficient large-granularity RDMA, this paper designs a configurable high-performance address mapping structure: XiRang. XiRang efficiently extends the access granularity through a streaming prefetch mechanism and a hierarchical cache design, and implements flexible and high-throughput address translation performance through a configurable address translation array. The XiRang prototype is implemented based on a DPU. 
Experiments show that: 1) XiRang effectively offloads the address translation load of the RDMA data plane, decoupling it from the host CPU; 2) The streaming prefetch extension mechanism used by XiRang effectively reduces storage overhead, with cache consumption at only the 10-byte level under concurrent modes, and concurrent storage overhead being negligible; 3) Under a high number of concurrent memory access requests, XiRang maintains a translation table entry query hit rate close to 100%, reducing the idle time of the translation engine by 2 to 3 orders of magnitude compared to the RNIC architecture; 4) The translation throughput of XiRang is more than 60 times that of the RNIC translation architecture and more than 3.5 times that of the basic DPU address mapping structure; 5) In performance enhancement mode, XiRang's address translation speed can support a data transfer bandwidth of 1.4 TB/s.
  • Jia Xinglong, Qin Junping, Yan Kai, Liu Zheng, Wang Dan, Shao Xinran, Shao Zezhou
    Accepted: 2025-11-05
    In order to solve the problem of insufficient accuracy in identifying endangered animals in complex backgrounds in the wild, this study improved the YOLOv8 model. First, Dynamic Snake Convolution (DSConv) was introduced in the backbone network to enhance the detection performance of the model under occlusion. Second, the global attention mechanism (GAM) was introduced in the neck network to improve the model's attention to information related to endangered animals, suppress irrelevant features such as the environment, and reduce redundant information. Then, a small target detection head was designed in the head network to fuse shallow feature maps and improve the network's perception and positioning capabilities for small targets. Finally, the bounding box regression loss function based on the minimum point distance (MPDIoU) was used to replace the traditional CIoU loss, thereby improving the convergence speed and positioning accuracy of the algorithm. The experimental results show that the detection accuracy and average precision of the proposed model for endangered animals in complex backgrounds are 96.2% and 97.2%, respectively, which are 2.1 and 2.4 percentage points higher than those of the baseline YOLOv8n. In comparative experiments with different target detection models on the same data set, the average precision is increased by 28.7, 22.5, 3.5, and 2.4 percentage points compared with models such as Faster-RCNN, SSD, YOLOv5, and YOLOv7, respectively. The experiments show that the improved YOLOv8 model can provide a theoretical basis for the detection of endangered animals in complex backgrounds.
  • WU Shixun, TANG Peiyao, LAN Zhangli, Xu Kai, ZHANG Miao
    Accepted: 2025-11-04
    WiFi fingerprint positioning based on received signal strength indication (RSSI) has gained wide attention due to its ease of deployment and cost-effectiveness. However, existing fingerprinting methods typically rely on large-scale training data, while data augmentation often produces virtual samples of uneven quality, thereby limiting positioning accuracy and generalization. To address these issues, this study proposes a multi-parameter optimization WiFi fingerprinting method driven by few-shot learning (FSL). The method integrates an attention-enhanced convolutional neural network (CNN) with a meta-learning framework to enable rapid adaptation under limited data, while particle swarm optimization (PSO) is employed for automated data selection and joint hyperparameter tuning under physical constraints. Experimental results demonstrate that the proposed method achieves average positioning errors of 0.52 m on the CJU dataset and 6.88 m on the public Tampere dataset, improving accuracy by at least 49.5% and 8.7% compared with baseline methods. In addition, a generalization test on the CJU-2024 dataset shows that the model adapts effectively to new environments with only a small amount of data, achieving an average positioning error of 2.17 m and an accuracy improvement of at least 26.7%. These results confirm that the proposed method significantly improves indoor positioning accuracy while maintaining strong generalization capability.
  • YANG Yingying, CHE Jin, BAI Xuebing, XIAO Long, JIAN Liqiong
    Accepted: 2025-11-04
    Existing unsupervised person Re-ID methods focus only on pedestrians’ global features, causing global feature bias and insufficient data diversity that impair recognition accuracy. To address this, this paper proposes an innovative ViT-based method (DAFP) integrating Multi-level Data Augmentation (MDAM) and Feature Purification (FP). First, the MDAM, which includes geometric spatial transformations, appearance feature perturbations, and occlusion simulation, expands training sample diversity and enhances the model’s cross-camera robustness. Additionally, the FP module divides the local features output by the Transformer into upper and lower parts according to spatial positions, performs adaptive weighted fusion with global features via a multi-view distance matrix, and generates high-quality pseudo-labels with DBSCAN, effectively alleviating similar pedestrian misclustering caused by over-reliance on single global features in traditional methods. Finally, a global-local clustering contrastive loss dynamically updates global and local clustering centers to strengthen fine-grained feature learning. Experimental results on Market1501, DukeMTMC-reID, and MSMT17 show that its mAP/Rank-1 reaches 90.5%/96.0%, 77.6%/87.6%, and 64.5%/86.0%, respectively, significantly surpassing the current state-of-the-art methods and fully verifying the superior performance of this method.
  • Tang Weilin, Wang Junfeng, Ge Wenhan, Zhang Chengcheng, Zhan Weilu
    Accepted: 2025-11-04
    Cyber Threat Intelligence (CTI) plays a pivotal role in mitigating the asymmetry between cyber attacks and defenses. However, current extraction methods for Tactics, Techniques, and Procedures (TTPs) predominantly rely on supervised language models with manual annotation, which suffer from inefficiency and inconsistency issues. Although the MITRE ATT&CK framework has mitigated TTP description problems through standardized classification, existing NLP-based approaches still face three major challenges: insufficient generalization capabilities, delayed version adaptation, and poor interpretability. To address this, DetecTTive is proposed—a zero-shot learning-based TTP extraction method for large language models that combines the prior knowledge of large language models with external trustworthy knowledge. This framework innovatively utilizes the ATT&CK official knowledge base as an external knowledge source, combining vector-based semantic retrieval and graph-enhanced association reasoning, along with agent workflow to achieve automated white-box reasoning. This enhances zero-shot performance while ensuring result traceability. Experiments demonstrate that the proposed zero-shot approach achieves an F1 score of 80.02% and a recall of 83.46% in benchmark datasets. This method effectively addresses the data bias and version adaptation issues inherent in conventional models, providing an interpretable and cost-efficient solution for TTP extraction in dynamic threat environments.
  • Fan Qinlong, Sun Yepeng, Lu Jicang, Zhu Taojie and Liu Yilin
    Accepted: 2025-11-04
    With the popularization and development of the internet, the massive volume of user-generated comments on trending topics and their widespread dissemination profoundly influence the progression and development of real-world events. Consequently, mining public stances and attitudes toward trending topics holds significant practical value for domains such as online public opinion monitoring and social security governance. Stance detection technology aims to identify user attitudes toward specific targets from user-generated texts. Although numerous studies have proposed diverse task scenarios and technical methodologies, a unified classification framework for stance detection tasks remains elusive. First, this paper presents a comprehensive review of stance detection tasks from two dimensions: task scenarios and technical methodologies, systematically organizing the current research landscape and development trends. From the task scenario perspective, we classify stance detection into three paradigms: target-specific, target transfer, and target generalization, highlighting the field's evolution from domain-specific applications toward broader adaptability. From the methodological perspective, we categorize stance detection approaches into three primary classes: model-based engineering, knowledge-driven engineering, and data-centric engineering, analyzing the strengths and limitations of each. Additionally, we conduct statistical and experimental analyses of publicly available resources across multiple dimensions, revealing key characteristics and developmental trajectories of these benchmark datasets. Finally, the paper concludes with a summary and outlines prospective research directions and persistent challenges.
  • Wu Qiannan, Ding Weiping, Fan Xiaoxue, Ju Hongrong, Zhou Linlin, Wang Jing
    Accepted: 2025-10-31
    Feature selection can effectively identify informative features from complex data to improve information processing efficiency. However, in partially labeled data scenarios, traditional feature selection methods face significant challenges due to inherent label ambiguity, complex inter-sample relationships, and difficulties in feature importance evaluation. To address these challenges, this paper proposes MFG-FS, an effective feature selection framework for partially labeled datasets. First, to tackle label ambiguity, we design an end-to-end disambiguation method based on the MLP-Mixer model and contrastive learning, which optimizes the feature representation space to enhance discriminative power and obtain more reliable label confidence distributions. Second, to accurately characterize complex sample relationships in partially labeled data, we construct fuzzy similarity relations and information granules that integrate multi-source information, effectively combining local feature-space structures, global correlations from disambiguated labels, and label constraints. Subsequently, based on the constructed fuzzy information granules, we define and employ a fuzzy mutual information measure for feature evaluation, which quantifies the relevance between feature subsets and labels while assessing internal redundancy, thereby providing a robust basis for high-quality feature subset selection. Finally, extensive experiments on five synthetic and four real-world datasets demonstrate that MFG-FS can select more discriminative and robust feature subsets, achieving superior performance in partial label disambiguation and classification accuracy.
  • HUANG Yuqi, YANG Xiaoxia, YANG Ronghao, LIAO Fangzhou, YAN Le, GUO Junqiang, LI Minghan
    Accepted: 2025-10-30
    Object detection for autonomous driving perception aims to locate and identify traffic participants such as motor vehicles, non-motor vehicles, and pedestrians within onboard camera views in real time, providing accurate input for the environmental perception module to support decision-making and control in autonomous driving systems. The perception system suffers from high false and missed detection rates due to complex road backgrounds, diverse object shapes, and large scale variations. Specific challenges include low accuracy in detecting deformed objects, insufficient multi-scale detection, and weak global perception. To address these issues, an improved algorithm named YOLOv8-DDL based on YOLOv8n is proposed. First, deformable attention is introduced to improve the C2f module in the backbone network, which dynamically learns feature offsets to enhance the capture capability for various object shapes in traffic scenes, improving the model's adaptability to complex spatial distributions and effectively reducing false detections. Second, large separable kernel attention is integrated to enhance the spatial pyramid pooling fast module, expanding the receptive field through large-kernel convolution to strengthen global context modeling and robustness in complex backgrounds. Finally, a dynamic multi-scale adaptive fusion module and a dynamic feature pyramid network are designed to reconstruct the neck network, dynamically fusing high-level and low-level features to enhance multi-scale feature representation and improve multi-scale object detection performance. Experimental results on the public SODA10M dataset show that compared to YOLOv8n, YOLOv8-DDL improves precision, recall, F1-score, and mean average precision by 5.9%, 1.3%, 3%, and 1.5%, respectively. Additional validation on the public BDD100K dataset confirms improvements of 2%, 0.6%, 1%, and 2% in these metrics, respectively.
  • CHEN Junhong, ZHOU Feng, TIAN Youliang, YANG Kedi, ZHANG Qijia
    Accepted: 2025-10-29
    As the demand for data training across industries increases, data has become a key factor of production. Data rights confirmation can clarify data ownership and allocate benefits, preventing unauthorized use. However, existing schemes have problems such as uncontrollable rights and low efficiency of rights confirmation in rights collection, storage, and use. To address these challenges, this paper proposes a rights-controllable data rights confirmation scheme based on trapdoor hashing. First, to prevent the loss of data rights during data transfer, this paper constructs a rights confirmation model that separates holding, management, and usage rights, thus achieving a refined allocation of rights. Second, to address the uncontrollable generation of management rights in existing correlation algorithms, a data rights confirmation algorithm based on trapdoor hashing is proposed, which realizes controllable generation of data management rights with changes and improves the efficiency of correlation at the same time. In addition, combined with blockchain technology, this paper designs an authorization-traceable data transaction mechanism, which realizes the non-repudiation and traceability of data transactions by finely controlling the collection and access of data and uploading the corroboration information. Finally, security and performance analyses show that, compared with traditional schemes, the proposed scheme has advantages in computation and storage overhead while ensuring that the rights signatures cannot be forged.
  • Gao Jianwei, Zhao Shutong, Huang Ningbo
    Accepted: 2025-10-28
    Against the background of the rapid development of artificial intelligence, a group intelligent emergency decision-making method based on large language models and retrieval-augmented generation (RAG) technology is proposed to address the problems of insufficient public participation and strong dependence on specialized knowledge in current emergency decision-making. It aims to integrate public social media data and a domain knowledge base, construct a public-expert collaborative multi-attribute decision-making model, improve the scientific rigor and effectiveness of disaster response, and apply it to emergency management. First, we use a Python crawler tool to obtain public comments from a microblogging platform to form the emergency disaster demand database. Second, we integrate the emergency management professional database based on RAG technology to enhance the model's generation ability, guide topic classification through prompt engineering, construct the topic word co-occurrence network, apply Louvain algorithm clustering, and combine the results with expert checking and optimization to generate attribute sets for emergency decision-making. Then, we synthesize the importance and cohesion factors to construct the attribute weight measurement model. Finally, we consider the psychological behavior of decision makers and use the TODIM method to rank and select among the alternative emergency solutions. Taking the 7-20 Henan rainstorm event as an example, the experimental results show that the proposed method is able to generate emergency decision-making topics that meet public demand and performs well in topic consistency and diversity, which are 0.583 and 0.943, respectively, verifying the scientific soundness and effectiveness of the proposed method.
  • ZHAO Shuxu, CHEN Yanhong, WANG Xiaolong, JIANG Kaijun
    Accepted: 2025-10-28
    To address issues such as resource mismatch, load bottlenecks, and service instability caused by demand fluctuations and large-scale bursty tasks in mobile edge computing, a cooperative supply strategy based on approximate Shapley values (ASVC) is proposed. First, a task allocation model based on bidirectional preference matching is constructed, which considers both the performance requirements of user tasks and the resource status of edge nodes. The Gale-Shapley algorithm is used to achieve optimal supply-demand matching. Second, to reduce the computational complexity of Shapley value estimation during coalition formation, an adaptive sampling-based optimization scheme is introduced. This approach significantly reduces the computation time of Shapley values while maintaining accuracy. Finally, task data is allocated according to the proportional contribution of each node, improving system fairness and resource utilization efficiency. Simulation results show that, compared with existing algorithms, the proposed ASVC algorithm improves service quality, delay control, task completion rate, and system load balancing by approximately 27.8%, 31.0%, 30.8%, and 21%, respectively.
  • Yanli Lv, Yiwen Jiang, Hanyu Feng, Zhenqi Guo, Sheng Xiang
    Accepted: 2025-10-28
    As generative AI technologies become increasingly integrated into sensitive industries, the over-reliance of large generative models on memorizing training data during fine-tuning poses a growing risk of privacy leakage, where user identities, behavioral traces, and other sensitive information may be reconstructed during inference. To address this issue, a novel fine-tuning approach combining Differential Privacy (DP) with Low-Rank Adaptation (LoRA) is proposed. This method freezes the parameters of the pre-trained model and updates only the inserted LoRA modules. Additionally, Differential Privacy Stochastic Gradient Descent (DP-SGD) is introduced, implementing gradient norm clipping and Gaussian noise injection on a per-sample basis to minimize the model’s dependence on individual training samples. Based on the Qwen2-1.5B language model, a task-specific fine-tuning dataset incorporating user profiles is constructed, and adversarial samples targeting typical sensitive fields—such as identity markers, behavioral characteristics, and location data—are developed to evaluate the anti-leakage capabilities of traditional full-parameter fine-tuning versus the DP-LoRA approach. Experimental results demonstrate that fully fine-tuned models exhibit a high sensitive-information match rate of 73.07% across 130 adversarial samples, indicating severe privacy vulnerabilities. In contrast, the DP-LoRA fine-tuned models achieve a significantly reduced match rate of only 1.5%, with generated content showing minimal correlation to original training data. This approach effectively mitigates the risk of sensitive information disclosure, offering a cost-efficient and highly adaptable training strategy for deploying generative models in real-world scenarios with stringent data security requirements.
  • Guozheng Yang, Dongzhen Qi, Pan Chen, Zhaobin Shen, Pengyu Yin, Yanlin Huo
    Accepted: 2025-10-27
    Resource Public Key Infrastructure (RPKI) is an important mechanism for safeguarding BGP routing security; it verifies the legitimacy of BGP announcements through Route Origin Authorization (ROA) and Route Origin Validation (ROV). As RPKI deployment advances globally, its deployment status and actual defensive effect have become a focus of research. In recent years, researchers have carried out extensive research on ROA configuration problems and ROV deployment measurements, portraying the operational status and protection capability of RPKI in real networks from different dimensions. Current RPKI-related surveys mainly focus on theoretical research into the RPKI system itself, emphasizing its architectural vulnerabilities, without systematically organizing and summarizing the key challenges and related studies encountered in actual RPKI deployment. This review systematically summarizes recent studies on deployment issues of the RPKI system. It focuses on classifying common types of errors in ROA configuration, including benign ROA conflicts and loose ROA registrations, and analyzes their causes and impacts on routing security. Finally, this review outlines future research directions for RPKI deployment issues, providing a theoretical foundation and methodological reference for subsequent research on RPKI deployment optimization, security assessment, and strategy. This will help promote the widespread adoption of RPKI and strengthen defenses against BGP prefix hijacking.
  • Liu Meigui, Zhang Neng, Li Jiale, Zhao Yuqi, Li Zengyang
    Accepted: 2025-10-27
    Redundant dependencies in software projects can lead to increased build size, performance overhead, and long-term maintenance burden. Although existing studies have investigated redundant dependencies in the Maven ecosystem, there remains a lack of analysis regarding their distribution across different dependency scopes (e.g., compile and test), their evolutionary patterns, and their impact on project popularity. To address this gap, we select 2,214 Java Maven open-source projects from GitHub as our study subjects. We employ an mvn command to identify dependencies that are declared but not actually used, and conduct a quantitative analysis of redundancy ratios based on their scopes. Furthermore, we apply the Mann-Kendall non-parametric trend test on 3,817 historical versions from 698 projects to identify trends in the evolution of redundant dependencies. To assess the relationship between redundant dependencies and project popularity or community activity, we construct five GitHub-based popularity and activity metrics, including star growth rate, fork growth rate, and issue closing rate, and perform Pearson correlation analysis. Experimental results show that redundant dependencies are primarily concentrated in the compile and test scopes, with median redundancy ratios of 33.33% and 30.00%, respectively. In terms of evolutionary trends, 48.1% of the projects maintained a stable redundancy ratio, 36.2% exhibited fluctuations, and a small proportion showed an increasing or decreasing trend. In the correlation analysis, only the issue closing rate shows a statistically significant but weak negative correlation with the redundancy ratio. These findings provide developers with a detailed perspective on dependency management and can help optimize project configurations and improve software maintainability.
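For readers unfamiliar with the Mann-Kendall trend test, a minimal sketch of the statistic is given below. It assumes a series of redundancy ratios ordered by version, uses the normal approximation with continuity correction, and omits the tie correction to the variance; the function name, the `alpha` threshold, and the sample series are illustrative and not taken from the study.

```python
import math


def mann_kendall(series, alpha=0.05):
    """Return (S, Z, two-sided p-value, trend label) for an ordered series."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair
    # Variance of S under the no-trend hypothesis (tie correction omitted).
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal CDF
    if p < alpha:
        trend = "increasing" if z > 0 else "decreasing"
    else:
        trend = "no trend"
    return s, z, p, trend


# Hypothetical redundancy ratios across eight successive versions of a project.
ratios = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]
s_stat, z_stat, p_value, label = mann_kendall(ratios)
```

Because the test depends only on the signs of pairwise differences, it makes no assumption about the distribution of the ratios, which is why it suits short and irregular version histories; for very short series an exact table is normally preferred over the normal approximation used here.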
  • GAO Song, GAO Bo-lin, LU Jian, WU Yue-long, WANG He, XU Yue-yun
    Accepted: 2025-10-27
    Quantifying the discrepancy between different sensor perception algorithms' mapping of the physical world and identifying boundary data is a key challenge in automating the extraction of high-value boundary data. This paper proposes a discrepancy engine based on multi-source sensor data for the autonomous discovery of boundary data. The engine consists of two main modules: the discrepancy cognition module and the discrepancy rate calculation module. In the discrepancy cognition module, a discrepancy rate was defined, and an association model linking the discrepancy rate with perception mapping discrepancies was established. The average discrepancy rate of a dataset was used as the baseline discrepancy rate to quantify mapping discrepancies and identify boundary data. In the experiments, the baseline discrepancy rates of LiDAR, millimeter-wave radar, and vision-based perception algorithms were calculated as 0.17, 0.23, and 0.19, respectively. In the discrepancy rate calculation module, a 2D pixel distance matching strategy combining the chi-square distribution and Welsh loss was used to match camera-detected objects with those detected by LiDAR, millimeter-wave radar, and other cameras. Compared to a fusion algorithm that used only a 3D distance matching strategy, the proposed approach achieved discrepancy rates of 0.16 and 0.14 relative to the ground truth on the test dataset, demonstrating that the improved matching strategy significantly enhanced the accuracy of the fusion algorithm. The results indicate that the discrepancy engine achieves average recognition accuracies of 0.85, 0.74, and 0.82 for the boundary data of LiDAR, millimeter-wave radar, and vision-based perception algorithms. Validation in real-world road scenarios, including straight urban roads, simple intersections, and complex intersections, confirms the engine's effectiveness in identifying perception boundary data.
  • Yu Chengwen, Xie Bin, Zhou BoBo, Li Xiang
    Accepted: 2025-10-27
    Extremely Large-scale Multiple-Input Multiple-Output (XL-MIMO) systems are considered as one of the key technologies to realize 6G communications. However, due to the significant increase in the number of antennas in XL-MIMO systems, the channel exhibits hybrid field characteristics, thus posing a great challenge to channel estimation. To address this problem, this paper proposes a deep learning-based Adaptive Frequency Filter Parallel Joint Convolutional Network (AFF-PJCN) channel estimation algorithm. Firstly, the received signal is processed by the adaptive frequency filter network, which is equipped with learnable filters that can automatically optimize the filtering parameters according to the input data, enabling adaptive signal analysis and modeling within the frequency domain, and effectively filtering out noise interference. Then, through the parallel joint convolutional network, the multi-scale convolutional operation of the parallel structure can effectively capture the global and local features of the received signal, further enhancing the channel estimation performance. To enhance the generalization ability of the model, a segmented hybrid data training strategy is adopted. The training set is constructed by independently sampling randomly in different signal-to-noise ratio intervals, ensuring that the model maintains robust performance under diverse channel conditions. The experimental results show that the proposed AFF-PJCN algorithm not only achieves superior estimation accuracy but also demonstrates stronger generalization and robustness compared with other existing channel estimation schemes in the hybrid field channel model of XL-MIMO systems.
  • FAN Zhengwei, CHANG Daofang, MAN Xingyu, WANG Chongwen
    Accepted: 2025-10-21
    X-ray inspection, as an intuitive means of nondestructive testing (NDT) of pipeline weld defects, plays a key role in preventing pipeline safety accidents. However, it remains challenging to accurately identify tiny defects in low-grayscale, low-contrast, and dark-toned X-ray images. Therefore, a method is proposed to improve the display quality of X-ray images of pipe welds under low-light conditions and to improve the accuracy of defect detection. First, an improved Retinex-Net framework is introduced, in which attention-mechanism residual blocks are added to the network to restore illumination and enhance details of low-light X-ray images, suppress noise and artifacts, and output natural enhanced images without obvious distortion, providing high-quality input for subsequent detection. Second, a weld positioning and feature extraction algorithm based on the drift Gaussian algorithm is designed, which adaptively tracks irregular long welds and automatically crops the weld area, significantly reducing background interference and improving processing efficiency. Finally, the weld defect detection algorithm based on cross-layer feature fusion is optimized: a feature encoder-decoder architecture based on the RSU module is constructed, and the attention mechanism is integrated in the feature extraction stage to strengthen cross-layer multi-scale feature fusion, improving detection accuracy and reducing the missed detection rate. The results show that the proposed method significantly improves the performance indicators on the public GDXray dataset; it not only effectively enhances image quality but also achieves highly automated, fast-responding weld defect detection, demonstrating its efficiency and accuracy in practical application scenarios.
  • ZHANG Bin, LI Run-hao, FENG Chao
    Accepted: 2025-10-20
    Automatic heap memory layout manipulation is the core technology for realizing exploit code generation of software memory corruption vulnerabilities, with the goal of constructing the necessary memory layout conditions for vulnerability exploitation by precisely controlling the allocation state of heap memory. However, existing memory automatic layout manipulation methods based on search and solving exhibit significant limitations in terms of efficiency. To address these challenges, this paper innovatively proposes a Large Language Model (LLM)-based approach for automatic memory layout manipulation. This method first leverages LLMs to automatically learn from the target heap manager's public documentation, source code comments, and analysis materials to acquire the allocator's operational mechanisms and key characteristics. Building on this foundation, the approach employs the powerful reasoning and feedback-driven thinking capabilities of LLMs to adopt an iterative layout strategy of "plan-verify-replan." By continuously incorporating feedback from debugger execution results to refine the layout planning strategy, it ultimately achieves automated memory layout. Experimental validation demonstrates that this solution successfully achieves precise memory layout in 12 real-world Linux user-space vulnerabilities and attains a 94.54% layout success rate on a benchmark comprising 3,735 test samples across six different heap managers. Compared to the search-based Gollum system, it improves layout manipulation speed by 2.33 times. Relative to the solving-based MAZE and BAGUA systems, it reduces the heap allocator behavior learning time from weeks to an average of 7.3 minutes without significantly compromising layout speed. These results verify that the proposed solution balances high efficiency and scalability, offering a new technical paradigm for LLM-based research on automated vulnerability exploitation.
  • Bojia Chen, Tingnian He, Lianjie Zhang, Shu'an Chen
    Accepted: 2025-10-20
    Cross-domain recommendation systems are widely applied in e-commerce and content platforms. Although the dual-target cross-domain recommendation (DTCDR) proposed in recent years has achieved a breakthrough in simultaneously improving the performance of both domains, it still faces two major challenges: 1) the generated user-item representations lack sufficient correlation and diversity; 2) the semantic noise mixed in the shared preferences leads to negative transfer problems. To address these issues, a dual-target cross-domain recommendation model based on heterogeneous graph and hierarchical preference disentanglement (HGPD-DTCDR) is proposed. Its core innovations include: 1) a heterogeneous graph collaborative learning framework is proposed to integrate user-item interactions, user social networks, and item attribute similarities, constructing a multi-relation heterogeneous graph, and generating high-order semantic representations through a relation graph convolutional network (R-GCN) to enhance the diversity and correlation of the representations; 2) a two-stage decoupling process is designed, first separating domain-specific and shared preferences through a variational graph encoder, and then introducing a semantic filtering network to optimize the quality of shared preferences. Experiments on five real cross-domain datasets show that the performance improvement of this model stems from the synergistic effect of heterogeneous graph modeling and hierarchical decoupling mechanisms. Compared with the best baseline, it achieves average improvements of 3.55%, 7.27%, and 15.57% in hit rate, normalized discounted cumulative gain, and mean reciprocal rank, respectively. In data-sparse scenarios, the performance improvement is even more significant, with an average gain of 10.35%. Ablation studies further verify the effectiveness of each technical component and their synergistic effects.
  • Xu Haoyu, Zhang Jing, Zhang Jiamin
    Accepted: 2025-10-20
    To address the challenges of small target scale, complex background, and insufficient feature representation in the detection of potential hazards on high-voltage overhead transmission lines, this paper proposes an improved lightweight real-time detection model, LG-DETR. First, a lightweight backbone network, ResNet-WT, is designed by introducing wavelet transform convolution to enhance multi-scale feature extraction while reducing computational complexity. Meanwhile, a frequency-separated self-attention mechanism is adopted in the feature fusion stage to improve the feature interaction module HL-AIFI, thereby mitigating background interference. Then, a cross-level multi-scale information aggregation feature pyramid network CMIAFPN is proposed to optimize feature transmission paths, combined with a gating module to improve feature retention efficiency and prevent detail loss in high-level features. Furthermore, by incorporating the scaling factor of Focal Loss into Wise-IoU, a novel Focal-WIoU loss function is developed to dynamically adjust the weighting of hard and easy samples, thereby enhancing the detection accuracy of small targets. Experimental results demonstrate that LG-DETR achieves a 6.94 percentage point improvement in and 23.9% reduction in parameters on a high-voltage overhead transmission line hazard dataset, verifying the effectiveness of the proposed improvements.
  • Wang Ruixuan, Li Yan, Zhong Jinghua, Yao Dengfeng, Xu Cheng, Ren Tianyu
    Accepted: 2025-10-17
    Chinese Braille is a script used by people with visual impairment in China and is an important part of the National Commonly-Used Language and Script. Although some methods have been developed for the automatic translation of Chinese text into Braille text, they still have shortcomings. Braille word segmentation is a crucial step in Chinese-Braille translation and strongly affects the final translation result. It is also an important task in research on Braille informatization. Although pre-trained models have been widely used in Chinese natural language processing, they are currently less commonly used in the study of Braille informatization. Braille and Chinese characters are expressions of the same language in different writing systems, so there are similarities and transferability between the two, and pre-trained models have great potential in Braille informatization. This paper introduces the BERT pre-trained model into the Braille word segmentation task. BERT is used to extract feature vectors, which are decoded with a CRF in combination with the whole-word masking strategy, yielding a word segmentation model with an encoder-decoder structure, BERT-CRF-wwm. To address the issue that the original Chinese word segmentation information in the BERT model may interfere with Braille word segmentation, a new Braille embedding is concatenated at the embedding layer, resulting in the BeBERT-CRF-wwm model. On the Chinese-Braille Corpus, it achieves a precision of 98.80% and a recall of 98.71%. Compared with existing Braille word segmentation methods, it achieves better results across the evaluation metrics.
  • Huang Yinglai, Xiong Xueshan, Wan Langyi, He Yang, Yang Liusong
    Computer Engineering.
    Accepted: 2025-10-17
    Accurate classification of brain tumors is essential in medical imaging diagnosis. However, conventional approaches that rely heavily on expert experience suffer from low efficiency, while existing deep learning approaches struggle with modeling long-range dependencies and balancing global modeling with local feature extraction, resulting in suboptimal recognition accuracy. To address these issues, a Hierarchical Collaborative Residual Transformer Network (HCR-TNet) is proposed. First, a Conv-Pool-Transformer Composite Block (CPT-Block) is introduced to enhance local feature extraction and cross-level contextual modeling, thereby improving the representation of heterogeneous tumor regions. Second, a High-frequency Feature Extraction (HFFE) module is incorporated to better capture textural details at tumor boundaries and subtle lesion characteristics while effectively suppressing noise. Finally, a Multi-scale Residual Block (MSRB) is designed to perform residual fusion with the CPT-Block, enabling cross-scale feature optimization from macro to micro structures. Experimental results on a public brain tumor MRI dataset show that the proposed method achieves a classification accuracy of 98.26%, a Kappa coefficient of 97.52%, and an MCC score of 97.52%. Compared to the ViT model, the accuracy is improved by 1.48% and the Kappa coefficient by 2.08%. Ablation studies and comparative experiments confirm the effectiveness of HCR-TNet in brain tumor classification tasks, providing valuable methods and ideas for medical image analysis and automatic diagnosis systems.
  • Lin Hai, Yu Guo, Yin Zeming, Xu Xianchong, Liu Yuhai
    Accepted: 2025-10-17
    In long-context and high-concurrency scenarios, large language models (LLMs) encounter significant challenges during inference due to the quadratic growth of memory footprint caused by the key-value (KV) cache in self-attention mechanisms, leading to excessive GPU memory consumption and limited throughput. Although KV cache sparsification has been proposed to address this issue, existing approaches still suffer from deficiencies in memory footprint, complexity of sliding window design, and computation-memory access overhead. This paper proposes DoubleSparse++, a triple-optimization framework that addresses these limitations through three techniques: (1) a ring buffer-based sliding window decouples KV cache size from text length while reducing buffer update complexity from O(L) to O(1); (2) an exponential decay sparse equilibrium strategy dynamically allocates token sparsity according to layer indices, achieving progressive sparsification across layers; (3) a sparse inference kernel optimized with operator fusion and asynchronous device stream pipelines overlaps computation and memory access in long-context inference scenarios, which significantly enhances computational intensity while reducing memory access frequency. Experimental validations conducted on domestic accelerators and mainstream LLMs (including OPT-6.7B, Vicuna-7B-v1.5, LLaMA-2-7B, LLaMA-3.1-8B, Qwen-2.5-7B) demonstrate that DoubleSparse++ achieves 1.31X inference speedup and 0.72X memory footprint reduction compared to DoubleSparse for 4K token generation tasks. In particular, in 13K token scenarios, the memory footprint is further reduced to 0.56X of the baseline. Comprehensive performance analysis confirms that DoubleSparse++ is an efficient KV cache sparsification method, demonstrating strong applicability for LLM long-context inference and streaming deployment.
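The O(1) update of a ring buffer-based sliding window can be illustrated with a small sketch. This is a generic ring buffer under stated assumptions, not the DoubleSparse++ implementation; the class name `RingKVWindow`, its methods, and the string placeholders used for keys and values are hypothetical.

```python
class RingKVWindow:
    """Fixed-size sliding window over (key, value) entries.

    Appending overwrites the oldest slot in place, so each update is O(1)
    and the memory held is bounded by the window size, not the text length.
    """

    def __init__(self, window):
        self.window = window
        self.slots = [None] * window
        self.count = 0  # total number of tokens appended so far

    def append(self, key, value):
        # The write position wraps around instead of shifting older entries.
        self.slots[self.count % self.window] = (key, value)
        self.count += 1

    def __len__(self):
        return min(self.count, self.window)

    def items(self):
        """Return the cached entries ordered from oldest to newest."""
        n = len(self)
        start = self.count - n
        return [self.slots[(start + i) % self.window] for i in range(n)]


# Stream five hypothetical tokens through a window of size three.
cache = RingKVWindow(window=3)
for t in range(5):
    cache.append(f"k{t}", f"v{t}")
recent = cache.items()
```

A list-based window that drops the oldest entry by shifting the rest would cost O(L) per step; the modulo write index is what removes that shift.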
  • Li Shiyou, Lian Demeng, Zhou Xin, Han Mengzhi
    Accepted: 2025-10-17
    The CUDA sparse matrix template library (CUTLASS-Sparse) in the CUDA linear algebra template library (CUTLASS) is used to build customizable and high-performance sparse matrix-dense matrix multiplication (SpMM) kernels, which play an important role in many fields such as scientific computing and deep learning. However, it is only implemented and optimized for NVIDIA GPUs and cannot be applied to domestic accelerators. To solve this problem, a transplantation and optimization scheme for CUTLASS-Sparse for domestic accelerators is proposed. In the transplantation stage, the data access module, data computation module and data write-back module are adapted to the hardware architecture of domestic accelerators. In the optimization stage, two shared memory data reordering algorithms, a data pipeline strategy based on data prefetching and register double buffering, and a data write-back strategy based on data aggregation are proposed to address the problems of high conflict rate of shared memory physical storage units (bank), low shared memory bandwidth utilization, low data pipeline parallelism and low data write-back efficiency. Experimental results show that all three optimization methods significantly improve the performance of the transplanted CUTLASS-Sparse. For TF32 and FP16 data types, the overall performance of the optimized CUTLASS-Sparse increases by an average of 30% and 115% compared to the unoptimized version, respectively. It reaches an average of 76% and 60% of the performance of CUTLASS-Sparse on NVIDIA GPU L20, respectively. Under two hardware versions, the performance of the transplanted and optimized CUTLASS-Sparse is on average 2.36 times and 3.09 times that of the SPARSE math library on domestic accelerator platforms, respectively. The experimental results verify the effectiveness of the transplantation and optimization scheme.
  • Yue Minghui, He Yuxuan, Ren Yuanxin, ZHANG Liye
    Accepted: 2025-10-16
    Video understanding tasks face two major challenges: insufficient computational resources and scarce video datasets. Current video models are massive and computationally intensive, relying on expensive hardware and lengthy training, and the scarcity of datasets also prevents models from being trained and generalizing adequately. To address these problems, an efficient transfer learning method is introduced: the adapter training strategy. By freezing all the weights of the pre-trained Vision Transformer (ViT) model and only fine-tuning the parameters in the adapter, resource consumption can be significantly reduced while fully retaining the representational advantages of the pre-trained model. Based on the adapter training strategy, a hierarchical adapter and ViT backbone network are designed to jointly construct the Video ViT Adapter (VVA) model. The hierarchical adapter employs three spatiotemporal convolutions with different dimensions, which helps to balance the spatiotemporal relationships between details and the global context. Additionally, the Contrastive Language–Image Pre-training (CLIP) model, which possesses strong cross-modal learning capabilities, is introduced as the pre-trained model. This provides the VVA model with rich feature representations, facilitating effective fusion across different data modalities. VVA achieved excellent results on three standard action recognition datasets with only 9.50M trainable parameters: accuracy rates of 79.32% on Kinetics-400, 97.77% on UCF101, and 81.78% on HMDB51. This performance demonstrates that the efficiency and convenience of the adapter can effectively address these challenges.
  • DING Lin, YANG Yang, GUO Caili, GUO JianZhang, LI Zheng
    Accepted: 2025-10-16
    The text-to-SQL task aims to automatically convert natural language queries into Structured Query Language (SQL), serving as a key technology that enables non-technical users to access databases efficiently and thereby significantly improving data utilization. To address the challenge that large language models insufficiently understand database schema information in prompts for text-to-SQL tasks, this paper proposes a fine-tuning method for large language models based on table creation information. Existing approaches often rely on complex, lengthy prompt templates or extensive fine-tuning data, and face two major bottlenecks: (1) including the complete prompt content in the templates dilutes the few critical cues, leading to attention dispersion in long-context understanding and consequently reducing inference performance; (2) tens of thousands of samples must be manually collected and processed for large-scale fine-tuning before the model achieves stable comprehension in text-to-SQL tasks. To mitigate these issues, we propose a hybrid text-to-SQL generation strategy that integrates prompt engineering with fine-tuning. This method selects semantically relevant table creation information based on question similarity and combines it with concise prompt templates to construct a lightweight, manually curated fine-tuning dataset. Through supervised fine-tuning, the dataset guides large language models to better comprehend table schema information in prompts, enhancing their ability to capture relationships between tables and queries and thereby generating more accurate SQL statements. Experimental results demonstrate that the proposed method effectively reduces the model's reliance on extraneous information in prompt templates and mitigates attention dispersion during reasoning. The generated SQL queries achieve an execution accuracy of 83.37%, representing a 0.49 percentage point improvement over the baseline approach.
  • He Guangcheng, Li Deshi
    Accepted: 2025-10-16
    With the development of the industrial Internet, the traditional best-effort forwarding mode can no longer meet the needs of deterministic-delay communication, and the IEEE 802.1 working group has proposed the Cyclic Queuing and Forwarding (CQF) mechanism to achieve deterministic transmission. However, because CQF forwards in fixed-granularity time slots, it suffers from problems such as excessive resource occupation and a limited deterministic delay range. Therefore, for time-triggered traffic scheduling with strict latency requirements, a hierarchical cyclic queuing and forwarding mechanism is proposed to reduce time-triggered traffic delay and resource occupation through fast forwarding. An optimization model that maximizes network throughput is constructed to determine the forwarding mode and the injection time slot of each flow. Because the problem is NP-hard, a heuristic priority iterative incremental scheduling algorithm is proposed, which adopts traffic clustering, priority order updating, and incremental scheduling to schedule large-scale deterministic traffic. Experimental results show that, compared with the CQF mechanism, the scheduling ability of the proposed mechanism is enhanced, the lower bound of deterministic delay is halved, and resource occupation decreases by 25.77% on average. In multiple sets of experiments involving various topologies and different traffic characteristics and scales, the proposed algorithm outperforms the four comparison schemes in terms of network throughput, with average increases of 3.52%, 2.04%, and 51.77% over Tabu Search, IRFS, and Naive, respectively.
  • Yang Hongju, Liu Na, Li Yao, Cao Fuyuan
    Accepted: 2025-10-16
    Sketch-guided image inpainting holds significant application value in photo restoration and creative editing but faces dual challenges of scarce user sketch data and restoration distortion caused by geometric deviations. Existing methods rely on edge detection to generate pseudo-sketches while neglecting user-drawn deviations (e.g., hand tremors, stroke breaks), leading to structural misalignment and detail blurring in complex scenes. To address these challenges, this study proposes an innovative framework combining a deformable sketch generation network with dual-stage guided inpainting. First, a deformable sketch generation network is constructed to model typical hand-drawn deviations, generating a large-scale sketch-image paired dataset with realistic geometric deformation features, effectively alleviating data scarcity. Second, a two-stage inpainting framework is designed: the first stage corrects geometric misalignment and repairs structural breaks in input sketches to optimize the sketches, while the second stage effectively integrates the optimized sketch information into the inpainting network to achieve collaborative optimization of global structural constraints and local texture generation. Experiments on benchmark datasets validate the method's effectiveness, achieving a peak signal-to-noise ratio (PSNR) of 25.78 dB and a structural similarity index (SSIM) of 0.852 on the CelebA-HQ dataset. The results fully demonstrate that this method effectively addresses the challenges of scarce user sketch data and geometric deviations while significantly improving the structural accuracy and perceptual quality of sketch-guided image inpainting.
  • SUN Wei, CHEN Jun Jie
    Accepted: 2025-10-13
    Maize is a vital economic crop, widely used in industry, animal husbandry, and grain-oil processing. Timely identification of maize diseases is crucial for ensuring stable yield. Currently, deep learning methods such as Convolutional Neural Networks (CNNs) have been widely applied to disease recognition. However, most existing methods rely solely on image information, overlooking features from other modalities. Moreover, their large parameter sizes and high deployment costs hinder practical applications. To address these challenges, we propose a lightweight image-text multimodal cache model, MF-cache, which contains only 0.061M parameters, achieving both low computational cost and high recognition accuracy. The model leverages the multimodal pre-trained model CLIP to extract image and text features, which are fused in parallel to form a key-value cache structure enriched with domain knowledge. Additionally, a weighted two-stage fusion mechanism is introduced to dynamically adjust the contribution of each modality to the classification outcome, enhancing both stability and interpretability. To improve robustness, various data augmentation strategies are employed to increase sample diversity and mitigate overfitting in low-data scenarios. Experimental results on a self-constructed dataset CornI&T and the public PlantVillage dataset demonstrate the effectiveness of the proposed method, achieving 99.72% and 98.80% accuracy, respectively. These results indicate that the method achieves excellent recognition performance while maintaining low computational overhead, offering an efficient and practical solution for crop disease detection. Furthermore, it highlights the potential of combining multimodal pre-trained models with few-shot learning in intelligent agricultural applications.
  • JIANG Yuhong, JIANG Qingquan, Zhang Rui, XI Huijuan, WU Jiongtao
    Accepted: 2025-10-13
    In e-commerce platforms, the volume of user click data is experiencing a rapid increase. Accurately modeling long-term behavior sequences of e-commerce users is crucial for capturing their preferences in recommendation systems. Currently, two-stage Click-through Rate (CTR) prediction models are widely used to forecast the CTR of users with long behavioral sequences. Specifically, the first stage employs approximate retrieval to filter subsequences related to the target item from massive historical behaviors, while the second stage performs fine-grained interest modeling on these subsequences. However, the two-stage model has two key issues: first, the second-stage process pays insufficient attention to the trend characteristics of user behavior; second, there exists a cross-stage semantic mismatch, which causes the second-stage subsequences to fail in fully conveying the users’ true interest structure. To address these issues, we propose a trend-aware probabilistic attention architecture. This model captures temporal trends in user behaviors and unifies interest representations across stages, significantly improving CTR prediction accuracy for long sequences. Experiments on two real-world e-commerce datasets show that our model outperforms state-of-the-art baselines, achieving up to 1.14% improvement in AUC and 4.2% in Logloss. This demonstrates that the model not only can identify the trend characteristics and dynamic preference structures in user behavior, but also verifies the optimization value of cross-stage semantic consistency.
  • YANG Chunxia, WANG Xin'ao, WANG Yulong
    Accepted: 2025-10-11

    High-accuracy air pollution prediction is crucial for environmental management and public health protection. To address spatiotemporal heterogeneity and multi-feature coupling in prediction tasks, this paper proposes a Multi-Decoupled Spatio-Temporal Dynamic Graph Convolutional Network (MD-STDGCN). The model aims to precisely capture the specific temporal patterns of local pollutant emissions and the dynamic interactions of cross-regional pollutant transport. The model first employs a dual-path self-supervised masked pretraining strategy for feature enhancement. The temporal path improves the ability to extract temporal features through local subsequence reconstruction, while the spatial path captures spatial heterogeneity via node sequence reconstruction. This mitigates the representation degradation caused by distribution shift and heterogeneity. Second, the model introduces a multi-level residual decomposition and hierarchical prediction framework to progressively extract global temporal patterns, local spatiotemporal patterns, and short-term disturbances from the spatiotemporal series. The framework integrates channel-independent convolutions and multi-scale causal temporal attention for long-term trend modeling, adaptive weight gating with dynamic graph convolution for directional and lagged transport, and GRUs for short-term fluctuations. Finally, multi-branch predictions are fused with dual-path enhanced representations to achieve end-to-end multi-step forecasting. Experimental results show that MD-STDGCN outperforms all baseline models in prediction accuracy across all datasets: on KnowAir, Yangtze River Delta, and KnowAir_V2, the average MAE is reduced by 7.34%, 1.88%, and 12.57%, and the RMSE by 7.64%, 2.44%, and 11.29%, respectively. By leveraging dual-path feature enhancement, multi-level decoupling, and dynamic graph learning, MD-STDGCN effectively alleviates the impact of feature entanglement and heterogeneity, improving both prediction accuracy and robustness. It can provide reliable support for air quality monitoring and governance decision-making.

  • FENG Guoping, CHEN Zhijian, Lin Zhiyu, HONG Liang
    Accepted: 2025-10-11
    This study explores automatic term recognition in the electric power domain, addressing challenges faced during its digital transformation, such as data silos and knowledge utilization. To improve the identification of specialized and new terms, a dynamic graph-assisted method combining large and small models is proposed. The approach enhances recall and precision through candidate term extraction and term classification. An initial knowledge graph is built using existing term databases. Target text-related nodes are queried and filtered with term features. A retrieval-augmented large language model extracts candidate terms, followed by adversarial training to develop a deep learning model for term classification. The dynamic term knowledge graph is iteratively updated based on classification results, forming a positive feedback loop. Experimental results show that the method's accuracy, recall, and F1 score improve over iterations, reaching 0.8647, 0.8565, and 0.8542, respectively, demonstrating superior performance compared to other term recognition methods.
  • LI Guang, ZHOU Yiqiang, GAO Xindan
    Accepted: 2025-09-29
    RGB-T (RGB-Thermal) semantic segmentation is a solution that enables reliable semantic scene understanding under poor lighting conditions or in complete darkness. Thermal imaging captures object infrared radiation features, providing stable edge detection under low-light conditions. This effectively compensates for the loss of texture details in RGB images under such environments. However, existing RGB-T semantic segmentation methods fail to fully utilize effective cross-modal information during multi-level interactions, leading to inaccurate predictions. To address this issue, this work constructs CMFANet (Cross-Modal Fusion Attention Network). First, it designs a cross-modal fusion module to establish complementary relationships between RGB and thermal features. Second, considering the importance of multi-dimensional and multi-scale information, a multi-dimensional attention module is introduced at the encoder to enhance deep feature extraction, while a multi-scale feature aggregation module is added at the decoder to capture texture details and contour information. Finally, the decoder integrates wavelet transforms with convolutional operations to improve segmentation accuracy. On the MFNet dataset, CMFANet achieves 73.8% in mean accuracy (mAcc) and 59.0% in mean intersection-over-union (mIoU). On the PST900 dataset, it attains 90.71% mAcc and 85.15% mIoU. Compared with existing cutting-edge methods, the model performs particularly well on key targets (such as cars, persons and bikes in MFNet, and survivors and backpacks in PST900). Visualization results verify its ability to effectively fuse RGB and thermal imaging modality information, restore texture details and target contours in low-light scenarios, and demonstrate better segmentation performance and strong generalization capabilities.
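    For readers unfamiliar with the two metrics reported above, mAcc and mIoU can be computed from a per-class confusion matrix as sketched below. This follows the standard definitions and is not the authors' evaluation code; the function name `segmentation_metrics` and the toy matrix are illustrative, and benchmark scripts may treat classes that are absent from an image differently.

```python
def segmentation_metrics(confusion):
    """Return (mAcc, mIoU) from confusion[i][j] = pixels of true class i predicted as j."""
    n = len(confusion)
    accs, ious = [], []
    for c in range(n):
        tp = confusion[c][c]
        gt = sum(confusion[c])                          # pixels whose true class is c
        pred = sum(confusion[r][c] for r in range(n))   # pixels predicted as class c
        union = gt + pred - tp
        if gt > 0:
            accs.append(tp / gt)        # per-class accuracy (recall)
        if union > 0:
            ious.append(tp / union)     # per-class intersection over union
    return sum(accs) / len(accs), sum(ious) / len(ious)


# Toy two-class example with 20 pixels.
confusion = [[8, 2],
             [1, 9]]
macc, miou = segmentation_metrics(confusion)
```

    Because IoU also penalizes false positives, mIoU is never higher than mAcc computed on the same confusion matrix, which is consistent with the gap between the two figures reported for MFNet and PST900.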
  • Xu Dai, Zhang Xiuzai, Yang Changjun, Zhong Yang, Guo Lin
    Accepted: 2025-09-29
    Accurate identification of water bodies in plateau lakes from high-resolution remote sensing images is of great significance for regional ecological protection and water resources management. In plateau scenes, water bodies occupy a small proportion of the image and their fine details are easily lost, so multi-scale feature fusion is insufficient and high-frequency details are attenuated; this leads to blurred boundaries, omission of small water bodies, and mis-segmentation of complex scenes. To address these problems, we propose a two-branch multilevel fusion network based on frequency-domain and spatial-domain synergy, the Wavelet-ResNet-Swin Network (WRS-Net). Adaptive Wavelet Decomposition extracts the low-frequency contour and high-frequency detail features of the water body, while a multi-stage ResNet50 captures spatial semantic information, with high-frequency gating units at the end of each stage enhancing the texture response. A Cross Attention Fusion Module then jointly optimizes multi-scale semantics and details, combined with a Feature Alignment Module that resolves cross-layer feature misalignment. Finally, a Swin Transformer performs global context modeling. Experiments on the self-constructed plateau lake dataset show that WRS-Net achieves an Acc of 96.52% and an mIoU of 93.44%, outperforming the other compared networks and improving the accuracy of water body recognition for plateau lakes in remote sensing images.
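    The split into low-frequency contours and high-frequency details can be illustrated with a single level of a fixed Haar transform. This is only a sketch of the general idea: the paper's Adaptive Wavelet Decomposition is presumably learned and two-dimensional, whereas the functions `haar_step` and `haar_inverse` below are hypothetical, non-adaptive, and operate on one image row.

```python
def haar_step(signal):
    """One level of an unnormalised Haar transform on an even-length sequence."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high  # low: smooth contour, high: local detail (edges)


def haar_inverse(low, high):
    """Reconstruct the original sequence from its low and high bands."""
    out = []
    for a, d in zip(low, high):
        out.extend([a + d, a - d])
    return out


# A row of pixel intensities with one land/water edge in the middle.
row = [4.0, 4.0, 4.0, 10.0, 10.0, 10.0]
low, high = haar_step(row)
```

    Here the high band is non-zero only at the pair that straddles the edge, which is the kind of boundary information a high-frequency branch is meant to preserve.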
  • LI Jie, LI Linsen
    Accepted: 2025-09-29
    With the development of logistics business, the collaborative delivery of unmanned aerial vehicle (UAV) swarms has become a key solution for cost reduction and efficiency improvement. In response to the demands of traditional delivery services and the constraints of UAVs themselves, a green collaborative delivery mechanism for UAV swarms under time window constraints is proposed. Firstly, a multi-task point delivery scenario is constructed, with parameters such as task time windows, task priorities, UAV payload capacity, and flight attitude-related energy consumption set. A multi-constraint model is established with the optimization goals of maximizing task benefits and minimizing energy consumption. Then, by discretizing the Zebra Optimization Algorithm, it is adapted to the discrete problems of UAV swarm path planning and task allocation. An individual coding rule is designed to guide the population to efficiently search in the solution space and generate delivery plans. Finally, simulation environments are built under different task scales and constraint conditions to systematically test and comparatively verify the proposed mechanism. Experimental results show that the proposed mechanism significantly outperforms IGCPA, AGA, and ACO algorithms in terms of energy consumption control, task benefits, and convergence speed. It can enhance delivery efficiency and reduce energy consumption while meeting complex task constraints, demonstrating promising engineering application prospects.
  • ZHANG Lina, ZHANG Chenyu, WANG Boyi, JIANG Tian, SHEN Tengfei
    Accepted: 2025-09-29
    The global spread of cardiovascular disease has made electrocardiogram (ECG) signal analysis a key tool for clinical diagnosis. However, multi-label classification of ECG signals typically relies on the complete 12 leads and faces challenges such as insufficient fusion of spatio-temporal features between leads and class imbalance. To this end, an end-to-end deep learning model based on a reduced set of leads is proposed. The time-domain features of ECG signals are extracted by a lightweight multi-scale inverted residual feature extraction module, and the temporal dependencies in the signals are captured by a temporal convolutional network and a bidirectional gated recurrent unit, improving the model's ability to represent complex spatio-temporal features. To optimize feature fusion, a bidirectional time-temporal cross-attention module is designed, which adaptively fuses multi-lead spatio-temporal information. To address class imbalance, a dynamically weighted focal loss function is designed to improve minority-class recognition by dynamically adjusting sample weights. Experimental results on the CPSC-2018 dataset show that the mean F1-score of the model reaches 0.841 when only the I, II, and V1 lead signals are used, with F1-scores for atrial fibrillation, left bundle branch block, and right bundle branch block of 0.942, 0.906, and 0.951, respectively. The model also performs well on the PTB-XL dataset, confirming its application potential in resource-constrained environments and providing new ideas for ECG multi-label classification under reduced leads.
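    The loss described above can be sketched as a per-label binary focal loss with class weights. This is a minimal illustration under stated assumptions, not the authors' loss: the abstract does not say how the weights are adjusted, so the inverse-frequency rule in `inverse_frequency_weights` is hypothetical, as are the function names and the default `gamma`.

```python
import math


def focal_loss(probs, targets, weights, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over labels; weights[k] scales the term for label k."""
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        p_t = p if y == 1 else 1.0 - p           # probability given to the true outcome
        p_t = min(max(p_t, eps), 1.0 - eps)      # clamp to avoid log(0)
        total += -w * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def inverse_frequency_weights(label_counts):
    """Hypothetical weighting: rarer labels get larger weights, mean weight is 1."""
    inv = [1.0 / max(c, 1) for c in label_counts]
    scale = len(inv) / sum(inv)
    return [v * scale for v in inv]


# Two labels: a common one (900 samples) and a rare one (100 samples).
weights = inverse_frequency_weights([900, 100])
easy = focal_loss([0.95, 0.05], [1, 0], weights)   # confident, correct predictions
hard = focal_loss([0.30, 0.70], [1, 0], weights)   # poorly classified samples
```

    The factor `(1 - p_t) ** gamma` shrinks the contribution of well-classified samples, so training effort concentrates on hard and minority-class examples; with `gamma=0` and unit weights the expression reduces to ordinary binary cross-entropy.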
  • Zhang Dong, Peng Changgen, Tan Weijie, Cai Chuanda
    Accepted: 2025-09-29
    Searchable encryption provides an effective solution for searching encrypted cloud data, alleviating the problem of limited local storage and computing resources. However, most current schemes rely on keyword frequency statistics or on semantic retrieval alone and cannot support retrieval tasks that combine keywords and semantics. In addition, most schemes adopt a tree storage structure, which is inefficient for retrieval over large-scale datasets. Therefore, this paper proposes an efficient hybrid ciphertext retrieval scheme based on the Milvus vector database and its built-in Hierarchical Navigable Small World (HNSW) index structure. The scheme uses BGE-M3, a general text embedding model released by the Beijing Academy of Artificial Intelligence (BAAI), to extract high-quality document semantic vectors and keyword vectors. It encrypts the original vectors with cryptographic techniques such as AES, the HMAC-based Extract-and-Expand Key Derivation Function (HKDF), and random matrix transformation, then uses the encrypted vectors to construct HNSW indexes stored in the Milvus vector database. During retrieval, the semantic and keyword retrieval results are re-ranked through dynamic weighted fusion, achieving real-time and efficient ciphertext retrieval in a large-scale data environment. The scheme also supports dynamic insertion, update, and deletion operations and has good scalability. Experimental results on real datasets show that the proposed scheme improves retrieval efficiency and retrieval accuracy while ensuring data security and reducing computational overhead.
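    The key derivation step mentioned above, HKDF, can be sketched with the Python standard library. The code follows the extract-then-expand construction of RFC 5869 with HMAC-SHA256; how the scheme actually uses the derived keys is not stated in the abstract, so the master secret, salt, and the `info` labels `aes-key` and `matrix-seed` are placeholders chosen for illustration.

```python
import hashlib
import hmac


def hkdf_sha256(ikm, salt, info, length):
    """HKDF (RFC 5869) with HMAC-SHA256: extract a pseudorandom key, then expand it."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long")
    if not salt:
        salt = bytes(hash_len)  # RFC 5869 default: a string of zero bytes
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: chain HMAC blocks T(1), T(2), ...
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Placeholder inputs: derive two independent keys from one master secret.
master = b"demo master secret"
aes_key = hkdf_sha256(master, b"demo-salt", b"aes-key", 32)
matrix_seed = hkdf_sha256(master, b"demo-salt", b"matrix-seed", 32)
```

    Using a different `info` string for each purpose yields separate keys from one secret, so a single master key could plausibly drive both the AES encryption and the generation of a random transformation matrix.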
  • LIU Haonan, ZHOU Gang, LIU Jiangtao, JIA Zhenhong, WANG Jiajia
    Accepted: 2025-09-25
    The population dynamics of various insects during cotton growth directly affect agricultural decisions, making accurate population density data for different insect types a key basis for scientific cotton farming and pest management. In pest detection, current small object detection algorithms can effectively detect small insects but often fail on larger ones. This study therefore proposes MSDSR-YOLO (Multi-scale Dynamic Super-Resolution Reconstruction YOLO), an object detection model that combines image super-resolution with dynamic convolution to enhance the detection of small objects while further improving detection performance at other scales. The model introduces a new feature map super-resolution reconstruction network, SMAR-SRNet (Self-Modulated Attention-Residual Super-Resolution Network), and embeds it into YOLOv11 together with a P5-to-P3 feature fusion strategy. This accurately reconstructs the deep features of the backbone and fuses them across layers with the original shallow features, enhancing both the detection of small objects and the capture of local and non-local features. Omni-dimensional dynamic convolution (ODConv) is then introduced into the backbone and neck of the network and combined with the C3K2 block to build the C3K2-OD module, which improves the model's ability to capture rich contextual cues and enhances the robustness of the network in multi-scale insect detection. Finally, this study constructs XJ-CottonPest2024, a yellow sticky board dataset of cotton field insects from the Xinjiang region that contains seven types of cotton field insects at different scales. Experiments show that the proposed method achieves the best mAP50 values on both the self-built dataset and a public dataset. A comparative analysis of detection results at different scales further shows that the proposed network is well suited to insect detection scenes dominated by small objects with multiple scales coexisting, which favors its application in smart agriculture.
  • JiaKun LI, YanQing LIU, Fang DU, ZhenHua YU, Yu FENG, Hui Wang, XianHao HUO
    Accepted: 2025-09-25
    To address the challenges faced by general-purpose medical large language models (LLMs) in brain tumor care, namely the scarcity of domain-specific data, limited clinical adaptability, and insufficient accuracy of generated content, this paper proposes BrainTumorLLM, a specialized large language model tailored for brain tumor diagnosis and treatment. Built upon the Meta-Llama-3-8B-Instruct foundation model, BrainTumorLLM is optimized through Supervised Fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), and trained using a self-constructed, high-quality dataset named BrainTumorQA. This dataset comprises 11,000 question-answer pairs, encompassing both macro-level medical knowledge (symptoms, diagnostic methods, treatment strategies) and micro-level clinical cases, including 1,252 de-identified real-world brain tumor MRI reports, with privacy safeguarded via anonymization and information constraint strategies. From a technical perspective, Low-Rank Adaptation (LoRA) is employed to enhance training efficiency. A two-tier prompting framework is designed to guide the model in generating domain-specific responses at both macro and micro levels. Furthermore, human feedback learning is integrated through an expert preference-driven optimization mechanism and the Proximal Policy Optimization (PPO) algorithm, reinforcing the clinical consistency of the generated content. Experimental results demonstrate that BrainTumorLLM significantly outperforms both general-purpose and medical-domain models on brain tumor-related question answering tasks. In automatic evaluations, it achieves BLEU-1 and BLEU-2 scores of 0.3383 and 0.2684, respectively, and ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.3237, 0.1466, and 0.2611. Moreover, the model’s perplexity is substantially reduced from 20.362 (base model) to 7.674, highlighting its domain-specific precision, professional accuracy, and potential for clinical application. BrainTumorLLM offers a robust AI-powered tool to support brain tumor diagnosis, treatment planning, and medical research.
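    As background for the perplexity figures above, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below uses the standard definition with natural logarithms; the function name and the toy log-probabilities are illustrative and are not taken from the paper's evaluation.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that assigns probability 1/8 to every token has perplexity 8.
uniform = [math.log(1 / 8)] * 5
ppl = perplexity(uniform)
```

    Read this way, a drop from about 20 to about 8 means the fine-tuned model is, on average, roughly as uncertain as a uniform choice among 8 tokens rather than 20 on the evaluation text.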
  • GENG Yongkang, PANG Chunying, Li Jia, Zhou Weikun, Ma Shengzhe
    Accepted: 2025-09-23
    In recent years, multimodal magnetic resonance imaging (MRI) technology has demonstrated significant advantages in brain disease diagnosis and brain network analysis. However, challenges remain in effectively optimizing and correlating multimodal data such as rs-fMRI (resting-state functional connectivity) and DTI (white matter fiber structure), while extracting low-dimensional brain network features with strong topological representation capabilities. To address feature optimization and the capture and use of topological information in multimodal brain network analysis (rs-fMRI/DTI), this paper proposes a joint optimization framework. First, to mitigate feature distribution shifts and modal heterogeneity, we propose a SPAMS-based multimodal dictionary learning data enhancement strategy. By jointly optimizing functional connectivity brain networks and diffusion tensor brain networks, we construct shared sparse dictionaries to generate anatomically and functionally consistent enhanced data, thereby improving inter-group similarity and feature quality. Second, to effectively capture complex topological information in brain networks, we introduce a Riemannian manifold-constrained loss autoencoder (RM-Loss AE). This model treats the feature space as a manifold of positive definite matrices and incorporates reconstruction losses based on metrics such as the Log-Euclidean metric. Comprehensive experiments on the ADNI (Alzheimer's Disease) and ABIDE-II (Autism Spectrum Disorder) datasets demonstrate that the proposed method significantly improves key metrics including feature separability (Fisher Score), classification performance (AUC), and the coupling strength between rs-fMRI and DTI modalities. The framework provides a new approach to multimodal brain network representation learning and supports its application in precision medicine.
  • YAN Ping, YANG Jielong, HUANG Daoyuan, ZHONG Shifeng
    Accepted: 2025-09-19
    In robot control, reinforcement learning is hampered by the difficulty of designing reward functions, while imitation learning avoids reward engineering but relies on costly expert action data. To this end, this study proposes a zero-motion imitation learning framework for robotic arms based on Predictive-Collaborative Optimization. The method integrates model predictive control (MPC) with maximum a posteriori (MAP) Bayesian correction, achieving precise control of the robotic arm through multi-step action sequence optimization while eliminating the reliance on expert action data and manual reward design. The core of the framework is the rolling optimization mechanism of MPC, which minimizes multi-step state errors, dynamically adjusts the action sequence, and enhances robustness against noise and prediction uncertainty. Within this process, the MAP method is applied to single-step optimization, where each action is corrected using a prior distribution and a likelihood, improving the local rationality and efficiency of action optimization. Unlike traditional methods, this framework relies only on expert states rather than expert actions. It generates the target state through a prediction model, avoiding the difficulty of collecting expert action data while also overcoming the problem of accumulated prediction errors. Experimental results show that this method outperforms existing baseline methods in various robotic arm simulation tasks, with an average return increase of approximately 45.8% and a prediction error reduction of approximately 50.7%. It demonstrates higher action execution accuracy and adaptability to complex environments, and has achieved stable control on a real robotic arm platform, verifying its potential for cross-platform engineering applications.
  • CHEN Ziliang, ZHONG Yuan, LI Ping
    Accepted: 2025-09-19
    Under the federated learning framework, participants collaboratively train a global model by sharing model parameters instead of raw data. This distributed training approach protects data privacy but also brings new security challenges. Because distributed local training is difficult to supervise, federated learning systems are more vulnerable to model poisoning attacks. Most existing model poisoning attacks operate on all parameters of the model, and the resulting significant changes are easily detected through statistical similarity checks. To analyze how such attacks could be made stealthier, this paper investigates FedMSP, a model poisoning attack for federated learning based on sensitive parameter perturbation. The method identifies the sensitive parameters that have a significant impact on model performance by analyzing the gradient changes of the model parameters, and perturbs only these parameters, making locally poisoned models harder to detect while reducing overall model performance. In addition, an attack mechanism based on distance and direction invariance is proposed. By keeping the distance and direction of the attack vectors invariant, this mechanism enables the attacker to circumvent existing defense mechanisms and significantly improves the success rate of the attack. Experimental results on federated prediction models for the Fashion-MNIST and CIFAR-100 datasets show that, without any defense, the attack reduces test accuracy from the original 99.48% and 61.37% to 14.43% and 8.27%, respectively; with a defense mechanism added, accuracy recovers only to 15.75% and 10.87%, still far below the normal level. In addition, FedMSP achieves optimal or near-optimal attack effects under multiple secure aggregation algorithms, demonstrating its ability to reduce model performance and slow convergence, and providing new perspectives and challenges for federated learning security research.
  • YAN Yan, WANG Long, KOU Xinyu
    Accepted: 2025-09-19
    Addressing the issues of low trajectory utility and inadequate privacy protection in existing trajectory privacy protection methods, this paper proposes a generative adversarial network-based trajectory privacy protection scheme utilizing Peephole LSTM. This scheme designs a generator model that integrates a peephole link mechanism, enabling each gate unit to adaptively adjust based on the real-time values of cell states, thereby more effectively perceiving contextual information and capturing dependencies within trajectory sequences; the discriminator uses a long short-term memory network to determine the authenticity of synthesized trajectories. Through adversarial training between the generator and discriminator, trajectory data that aligns with the original statistical features is generated, reducing the probability of attackers identifying users and thereby enhancing the privacy protection of user trajectory information. Given the multidimensional nature of trajectory generation tasks, a new trajectory loss function is designed to measure the similarity loss between synthetic and real trajectories in terms of spatial, temporal, and point-of-interest category dimensions. Experiments conducted on the real-world semantic trajectory dataset Foursquare NYC, including trajectory-user linking tasks, demonstrate that compared to models such as LSTM-TrajGAN and TCAC-GAN, the synthetic trajectories generated by this approach not only reduce the probability of re-identification but also better preserve the spatial, temporal, and POI category attribute features of the original trajectories. This effectively balances the privacy and utility of trajectory data, ensuring its effectiveness in spatio-temporal analysis and geospatial applications.
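    The peephole mechanism described above, in which each gate can see the cell state, can be sketched as one scalar time step. This follows the common peephole LSTM formulation (input and forget gates read the previous cell state, the output gate reads the updated one) and is not the authors' generator; the weight names in the dictionary `p` are hypothetical, and real models use vectors and matrices rather than scalars.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def peephole_lstm_step(x, h_prev, c_prev, p):
    """One scalar peephole LSTM step; p maps hypothetical weight names to floats."""
    # Input and forget gates "peek" at the previous cell state via pi and pf.
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["pi"] * c_prev + p["bi"])
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["pf"] * c_prev + p["bf"])
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate update
    c = f * c_prev + i * g
    # The output gate peeks at the freshly updated cell state via po.
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["po"] * c + p["bo"])
    h = o * math.tanh(c)
    return h, c


names = ["wi", "ui", "pi", "bi", "wf", "uf", "pf", "bf",
         "wg", "ug", "bg", "wo", "uo", "po", "bo"]
zero = dict.fromkeys(names, 0.0)

# With all weights zero every gate sits at 0.5, so half the cell state is kept.
h0, c0 = peephole_lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p=zero)

# A positive forget-gate peephole lets a large cell state keep itself alive.
h1, c1 = peephole_lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, p={**zero, "pf": 10.0})
```

    In a standard LSTM the gates depend only on the input and the hidden state; the extra `pi`, `pf`, and `po` terms are what allow the gates to adapt to the current value of the cell state.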