Just Accepted

Super-Resolution-Driven Climate Downscaling via Implicit Neural Representation

Li Haoxuan, Zhang Zhiyuan, Liu Rui, Xu Peihua, Tian Xin

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252854

Accepted: 2025-11-27

High-resolution climate data is crucial for local and regional production and livelihoods, and deep learning-based downscaling can bridge the gap between existing low-resolution climate data and application requirements. However, existing methods are often constrained by fixed scaling factors, which leads to high training costs in multi-scale scenarios, and their outputs on climate data are usually blurred and inaccurate in high-frequency details. To address these limitations, this study proposes a deep learning super-resolution network that fuses implicit neural representation and adaptive feature encoding for arbitrary-scale climate downscaling. Specifically, a dynamic pixel feature aggregation module adjusts the feature encoding process through a learnable modulator so that it adapts to different scaling factors. In addition, an implicit neural representation of the image predicts continuous-domain pixel values by fusing coordinate linear-difference features and neighborhood nonlinear features via an attention mechanism. Combined with a high-order degradation training strategy, experiments on the ECMWF HRES and ERA5 datasets show that the proposed method improves PSNR by at least 0.7 dB at a ×2 scaling factor compared with fixed-ratio methods and outperforms existing arbitrary-ratio methods by at least 0.48 dB under the same scaling condition. These results indicate that the approach provides a more flexible and efficient solution for meteorological data processing.
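
The gains above are stated in PSNR. As a point of reference, the following minimal sketch shows how PSNR is conventionally computed between a reference grid and a downscaled estimate. The function name `psnr`, the nested-list grid layout, and the `data_range` argument are illustrative assumptions; the abstract does not state which peak value or normalization the authors use.

```python
import math


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio (dB) between two equally sized 2-D grids.

    `data_range` is the assumed peak-to-peak span of the variable
    (for example 1.0 for data normalized to [0, 1]).
    """
    flat_ref = [v for row in reference for v in row]
    flat_est = [v for row in estimate for v in row]
    if len(flat_ref) != len(flat_est) or not flat_ref:
        raise ValueError("grids must be non-empty and have the same size")
    # Mean squared error over all grid cells.
    mse = sum((r - e) ** 2 for r, e in zip(flat_ref, flat_est)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical grids
    return 10.0 * math.log10(data_range ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error (MSE), at a fixed data range a gain of 0.7 dB corresponds to roughly 15% lower MSE, and a gain of 0.48 dB to roughly 10% lower MSE.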

Multi-Target Path Planning Method for Intelligent Mobile Patrol Systems

SONG Chengqun, ZHANG Ke, YANG Mengjie, CHENG Jun

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252655

Accepted: 2025-11-26

To address the inefficiency and safety risks of manual patrols in large facilities and complex venues, this study aims to balance global coverage with the prioritization of high-risk areas while improving the efficiency and robustness of path planning. We propose a risk-aware Intelligent Patrol Strategy (IPS) that (i) models patrol as a combination of comprehensive and single patrols; (ii) builds static and dynamic risk heat maps via a Gaussian Mixture Model (GMM); and (iii) uses a tanh-based target-point updating method to suppress clustering and balance risk against spatial distribution. For path generation, we develop a Multi-Target Rapidly-exploring Random Tree (MT-RRT) algorithm comprising Multi-Target Feasible Path Planning (MTFPP) and Information Subset Optimization (ISO). MTFPP estimates feasible inter-point costs with an improved RRT-Connect and determines the visiting order using Ant Colony Optimization (ACO), yielding a single feasible path through all targets. ISO samples within an ellipse-shaped informed subset and applies RRT*-style rewiring to iteratively refine that path into a shorter and smoother one. Simulations show that, compared with Euclidean-distance baselines, the method significantly reduces final path length and improves success rate and convergence under limited iterations; it achieves full-area coverage while assigning higher patrol frequency to high-risk regions, making it suitable for industrial plants, hazardous-material warehouses, and large public buildings.
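
The ellipse-shaped informed subset used by ISO can be illustrated with the sampling rule familiar from Informed RRT*: once a path of cost `c_best` between two points is known, only points inside the ellipse whose foci are those two points can lie on a shorter path. The sketch below is a 2-D illustration under that assumption, not the authors' implementation; the function name and parameters are hypothetical.

```python
import math
import random


def sample_informed_ellipse(start, goal, c_best, rng=random):
    """Draw one point uniformly from the ellipse with foci `start` and `goal`.

    Every returned point p satisfies |start-p| + |p-goal| <= c_best, so it is
    a candidate for a path no longer than the current best cost `c_best`.
    """
    c_min = math.dist(start, goal)  # straight-line lower bound on path cost
    if c_best < c_min:
        raise ValueError("c_best cannot be shorter than the straight-line distance")
    semi_major = c_best / 2.0
    semi_minor = math.sqrt(c_best ** 2 - c_min ** 2) / 2.0

    # Uniform sample in the unit disk (sqrt keeps the area density uniform).
    radius = math.sqrt(rng.random())
    angle = 2.0 * math.pi * rng.random()
    ux, uy = radius * math.cos(angle), radius * math.sin(angle)

    # Stretch to the ellipse axes, rotate onto the start-goal line, translate.
    ex, ey = semi_major * ux, semi_minor * uy
    heading = math.atan2(goal[1] - start[1], goal[0] - start[0])
    cx, cy = (start[0] + goal[0]) / 2.0, (start[1] + goal[1]) / 2.0
    x = cx + ex * math.cos(heading) - ey * math.sin(heading)
    y = cy + ex * math.sin(heading) + ey * math.cos(heading)
    return (x, y)
```

As `c_best` shrinks toward the straight-line distance, the ellipse collapses onto the segment between the two points, which is why sampling concentrates where improvements are still possible.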

A Review of Anomaly in LLM-Based Multi-Agent Systems

ZHANG Longyao, Wen Dongxin, MA Zhuangyu, SHU Yanjun, LI Qing, LIU Mingyi, ZUO Decheng

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252754

Accepted: 2025-11-26

Large Language Model (LLM)-based multi-agent systems have demonstrated significant potential in handling complex tasks, but their distributed nature and interaction uncertainty can lead to diverse anomalies that threaten system reliability. To systematically identify and classify such anomalies, this study conducts a comprehensive review. Seven representative multi-agent systems and their corresponding datasets were selected, 13,418 operational traces were collected, and a hybrid analysis method combining preliminary LLM analysis with expert manual validation was applied. A fine-grained, four-level anomaly classification framework was constructed, encompassing Model Understanding and Perception Anomalies, Agent Interaction Anomalies, Task Execution Anomalies, and External Environment Anomalies, and typical cases were analyzed to reveal the underlying logic and external causes of each type. Statistical analysis indicates that Model Understanding and Perception Anomalies account for the highest proportion, with "Context Hallucination" and "Task Instruction Misunderstanding" being the primary issues. Agent Interaction Anomalies represent 16.8%, primarily caused by "Information Concealment." Task Execution Anomalies make up 27.1%, mainly characterized by "Repetitive Decision Errors." External Environment Anomalies constitute 18.3%, with "Memory Conflicts" as the predominant factor. In addition, Model Understanding and Perception Anomalies often act as root causes that trigger anomalies at other levels, which highlights the importance of enhancing the fundamental capabilities of the model. This classification and root-cause analysis aims to provide theoretical support and a practical reference for building highly reliable LLM-based multi-agent systems.

Research on Watermarking Attack Technology of Computer Vision Models

Wang Wen, Yang Kuiwu, Tong Songsong, Wei Jianghong, Xue Yan, Zhou Rongkui

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252743

Accepted: 2025-11-26

Protecting the intellectual property of models has become an issue that cannot be ignored in model security. Watermarking, the core means of model traceability, supports copyright verification by embedding special identifiers into model parameters or generated content. However, trained watermarked models are easily copied and distributed, which allows attackers to destroy or remove the watermarks embedded in deep neural network (DNN) models through techniques such as fine-tuning, pruning, or adversarial-example attacks, making it impossible to verify model ownership. To provide a deeper understanding of model watermarking attacks, this paper first introduces them and then classifies them into two categories, white-box and black-box watermarking attacks, according to the attacker's access rights to the target model and ability to obtain information about it. It also reviews and analyzes the motives, hazards, attack principles, and specific implementation methods of DNN model watermarking attacks, and compares and summarizes existing research in terms of attacker capabilities and performance impact. Finally, it explores the potential positive role of neural network model watermarking attacks in future research and offers suggestions for in-depth research on model security and intellectual property protection.

A Survey of Post-Training Quantification Methods

ZHANG Junna, WANG Hongzun, DING Chuntao

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252721

Accepted: 2025-11-25

Post-Training Quantization (PTQ) is an efficient model compression method that converts the parameters of high-precision floating-point models into low-bit integer representations without retraining, using only a small amount of unlabeled calibration data or none at all. It significantly reduces storage and computational overhead while retaining as much of the original model's inference accuracy as possible, and it has been widely recognized and adopted in both academia and industry. This paper systematically summarizes the research progress of PTQ along four dimensions: quantization steps, method classification, tool ecosystem, and application advancements. First, a clear framework for the quantization process is constructed, covering dynamic range statistics, quantization parameter calculation, weight and activation quantization, error optimization, and model generation. Second, a complete classification system for quantization methods is proposed, covering quantization granularity, bit width, calibration methods, and structure-guided quantization. Third, the tool ecosystem that supports large-scale application of PTQ is analyzed, with a discussion of its value in hardware adaptation and engineering deployment. Finally, the paper summarizes the integration and application progress of PTQ methods and highlights the challenges faced in practice, especially those related to cross-modal consistency, semantic collapse at extremely low bit widths, and hardware adaptation. These challenges reveal the limitations of current techniques and point to important directions for future research. This review provides a reference framework for PTQ methods for both academia and industry, facilitating the application of artificial intelligence in resource-constrained scenarios.
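
The first steps of the quantization process listed above (dynamic range statistics, quantization parameter calculation, and weight quantization) can be sketched for the simplest case: per-tensor asymmetric min-max quantization to unsigned 8-bit integers. This is a generic textbook scheme chosen for illustration, not a method attributed to the survey; all function names are hypothetical.

```python
def calibrate_min_max(values):
    """Dynamic range statistics: observed min/max, widened to include zero."""
    return min(min(values), 0.0), max(max(values), 0.0)


def quantization_params(lo, hi, num_bits=8):
    """Compute the scale and integer zero point of an affine quantizer."""
    q_max = 2 ** num_bits - 1
    span = hi - lo
    scale = span / q_max if span > 0 else 1.0  # avoid a zero scale
    zero_point = max(0, min(q_max, round(-lo / scale)))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1], clipping outliers."""
    q_max = 2 ** num_bits - 1
    return [max(0, min(q_max, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Recover approximate floats; the residual is the quantization error."""
    return [scale * (q - zero_point) for q in q_values]


weights = [-0.8, -0.1, 0.0, 0.3, 1.2]
scale, zero_point = quantization_params(*calibrate_min_max(weights))
restored = dequantize(quantize(weights, scale, zero_point), scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Widening the range to include zero lets the value 0.0 map exactly onto the integer zero point, so zero-valued weights and padding survive the round trip without error. Values inside the calibrated range are reconstructed to within about one quantization step (`scale`).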

Multi-hop Graph Convolutional Networks Based on Semi-decoupling Technique and Knowledge Jump Connections

ZHANG Ke, CHEN Jiahao

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252683

Accepted: 2025-11-21

Multi-Hop Graph Convolutional Networks (Multi-Hop GCNs) have achieved some success in alleviating the over-compression problem. However, the multi-hop propagation design incurs a parametric information compression loss during information aggregation and is sensitive to the local topological structure, which makes it difficult for this type of model to achieve ideal prediction performance in node classification tasks. To address these problems, this paper approaches the multi-hop graph convolutional model from the intra-layer and inter-layer perspectives, using a decoupling technique inspired by predictive propagation decoupling together with a knowledge jump module to construct a new multi-hop graph convolutional network, the Knowledge-Semi-Decoupled Multi-Hop Network DrJK-Net. First, a semi-decoupling technique that retains the activation function is proposed to simplify the intra-layer structure of the multi-hop propagation layer. Removing the linear layer in the hidden layer reduces the number of feature transformations during multi-hop propagation and decreases the parametric information compression loss. Second, a knowledge jump connection is added between the propagation layers. Connecting all hidden-layer embeddings improves the model's ability to adaptively select among them and reduces the sensitivity to the local topological structure. Combining the multi-hop graph convolutional skeleton with the semi-decoupling technique, which simplifies intra-layer information propagation, and the knowledge jump connection module, which establishes inter-layer information channels, yields the DrJK-Net framework, which has lower parametric information compression loss and stronger adaptability to the local topological structure. Finally, comparative and ablation experiments are carried out on multiple public paper network datasets such as Citeseer, CoraFull, and Actor, as well as social network datasets. The comparative experiments show that DrJK-Net surpasses most cutting-edge models in node classification accuracy and has a significant advantage in running speed. The ablation experiments further verify the effectiveness of the proposed semi-decoupling technique and the knowledge jump connection mechanism, providing new ideas and methods for the development of multi-hop graph convolutional networks.

Audio and Video Emotion Recognition Based on Multiscale Attention and Multi-Expert Coordinated Decision Making

NIU Yan, SUN Yang, LI Jun

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252724

Accepted: 2025-11-21

Multimodal emotion recognition aims to understand complex human emotional expressions. However, existing methods generally lack accuracy and robustness when dealing with nuanced emotional expressions and complex inter-modal interactions. Specifically, traditional speech feature extraction methods struggle to capture emotional information comprehensively across multiple time scales, existing fusion strategies are limited in how efficiently they integrate complementary information and handle complex inter-modal associations, and class imbalance and boundary samples often degrade model performance. To address these problems, this paper proposes a new method for multimodal emotion recognition using speech and facial images. First, a multiscale attention mechanism replaces the traditional multilayer perceptron in the speech feature extraction stage; it adaptively focuses on and captures emotional features ranging from microscopic phoneme changes to macroscopic rhythmic patterns, enabling more comprehensive extraction of emotional information. Second, an adaptive multi-expert coordinated decision-making architecture is designed, in which expert networks and an Adaptive Multimodal Expert Coordination Network efficiently integrate complementary information from different modalities and handle complex interactions between modalities. Finally, a boundary

Steel surface defect detection based on multi-scale interaction and dynamic collaboration

Guo Wei, Meng Qiaoqiao, Jin Haibo, Tian Congcong

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252887

Accepted: 2025-11-20

In industrial quality inspection, the detection of steel surface defects commonly suffers from insufficient fusion of target features, missed detection of fine edge defects, and class imbalance among samples. To address these problems, a steel surface defect detection algorithm based on multi-scale interaction and dynamic collaboration is proposed. In the backbone network, shifted sparse convolution is fused with an inverted residual structure to strengthen the interactive fusion of defect features under different receptive fields and improve the feature representation of multi-scale defects. A large separable kernel attention mechanism is introduced to dynamically enhance the feature response to fine defect areas and reduce the missed detection rate for cracks and inclusions. In the neck network, the DySample dynamic upsampling strategy provides upsampling based on defect content, which sharpens the contours of small defects while reducing computational redundancy, making the model suitable for deployment on edge devices. In addition, an EMASlideLoss loss function that integrates exponential moving average and sliding threshold mechanisms is designed to dynamically balance the learning weights of hard and easy samples, thereby reducing the detection bias caused by the uneven distribution of defect samples. Experiments on the NEU-DET dataset show that the algorithm reaches a mean average precision (mAP50) of 84.4%, which is 5.8% higher than that of the original YOLO11n; precision and recall increase by 5.2% and 4.8%, respectively, while the computational load decreases by 8%. The algorithm thus improves detection accuracy while reducing computational cost, and is better able to meet detection requirements in industrial scenarios.

A Survey on Optimizing Log-Structured Merge-tree Based on Computational Storage Technology

LIU Ying, ZHANG Runyu, YANG Chaoshu

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252595

Accepted: 2025-11-20

The Log-Structured Merge tree (LSM-tree) has been widely adopted in key-value storage systems due to its high write performance enabled by sequential write operations. However, it also suffers from issues such as high read/write amplification, significant compaction overhead, and data redundancy. Traditional optimization approaches aim to improve system performance by modifying tree structures, refining compaction strategies, and adopting key-value separation mechanisms. In the era of big data, the rapid growth of data volume leads to increasingly frequent write and compaction operations in LSM-tree systems, placing continuous pressure on CPU computing resources and gradually turning them into performance bottlenecks. Moreover, traditional solutions fail to fundamentally avoid the substantial I/O traffic between the host and storage devices, resulting in high overhead due to redundant data movement. Computational storage technology offers a promising solution to these challenges. By integrating computing resources at the storage layer, it enables task offloading to alleviate the CPU's workload and supports near-data processing to reduce the performance overhead caused by data migration. This survey focuses on optimization strategies for LSM-tree based on computational storage. First, the architecture of computational storage is reviewed. Then, in response to the major bottlenecks under the big data context, existing solutions are classified and compared from two perspectives: compaction optimization and data migration optimization. Finally, potential future research directions are suggested to provide insights in this field.
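
To make the trade-offs described above concrete, the following toy sketch shows the basic LSM-tree mechanics on the host side: writes are buffered in a memtable, flushed as immutable sorted runs, looked up from newest to oldest, and merged by compaction. It is a deliberately simplified, in-memory illustration with hypothetical names (`TinyLSM`, `memtable_limit`); it has no levels, no persistence, and no computational-storage offloading.

```python
import bisect


class TinyLSM:
    """Toy LSM-tree: one memtable plus a list of sorted runs (newest first)."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}  # in-memory buffer for recent writes
        self.runs = []      # immutable sorted lists of (key, value)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Writing a whole sorted run at once models the sequential write path.
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # A miss in newer runs forces probes of older ones: read amplification.
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge oldest to newest so newer values overwrite stale ones, then
        # rewrite everything as a single run: the compaction overhead.
        merged = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

In this sketch, overwriting a key leaves a stale copy in an older run until `compact()` rewrites the data, which is the redundancy and compaction cost that the surveyed work aims to move closer to the storage device.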

DTN-DETR: Day-To-Night Domain Adaptation Nighttime Object Detection Transformer

Gong tong, Lu Xiaoli, Sang yu, Li Siman, Yu Bowen

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252921

Accepted: 2025-11-19

Nighttime object detection is challenging because targets have low luminance and because manually annotating large-scale nighttime datasets is costly, which makes supervised training difficult. To address these issues, DTN-DETR, a domain adaptation method for nighttime object detection based on an improved RT-DETR, is proposed. First, Photometric Consistency Matching is designed to generate a synthetic dataset resembling the nighttime domain by aligning the photometric properties of the daytime source domain with those of the nighttime target domain. Second, the backbone network is improved with a Bidomain Refinement Module (BRM), which comprises two key components: the Feature Refinement Module (FRM) and the Bidomain Information Interaction (BII) module. The FRM eliminates redundant information in the feature channels. The BII module leverages the interaction between the frequency and spatial domains to handle glare and noise with inconsistent frequency characteristics, addressing the coupling of multiple local light sources in nighttime scenes. Finally, a P2 detection head is introduced, which enhances the perception of small objects in nighttime scenes through multi-level feature fusion. Experimental results on the public datasets BDD100K, SODA10M, and Foggy Cityscapes demonstrate that the proposed method significantly outperforms existing state-of-the-art approaches in object detection tasks, validating its effectiveness and robustness.

Coal Mine Underground Image Enhancement Method Integrating Convolution and MLLA

TAN Taizhe, YANG Yang, ZHAN Yinwei, YANG Zhuo

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252783

Accepted: 2025-11-14

The complex lighting environment in underground coal mines leads to low contrast and blurry details in images, and existing image enhancement algorithms capture features insufficiently and fuse semantic features at different levels inefficiently. This paper proposes an underground coal mine image enhancement method (ICM) that combines convolution and MLLA (Mamba-Like Linear Attention). In the convolution stage, multiple degradation-aware mixture-of-experts modules are stacked so that the model adaptively restores local texture details lost during image enhancement, which addresses artifacts and unclear details. An MLLA module with background perception capability models long-range dependencies in images to improve the global structural consistency and texture fidelity of the enhanced output. Interactive fusion branches encode the stage-wise correlation between backbone features and reconstructed features, making effective use of local and global features to assist enhancement. A segmented loss function sets different loss objectives at different enhancement stages, enabling the network to optimize adaptively at each stage. Compared with recent high-performing deep learning methods, ICM achieves the best performance on the PSNR, SSIM, NIQE, and LPIPS metrics, with values of 30.524 dB, 0.946, 3.06, and 0.23, respectively. It effectively improves the brightness, contrast, and clarity of low-light coal mine images, providing reliable visual support for mine safety monitoring and intelligent decision-making.

Research on Cancer Survival Prediction Based on Multi-level Optimal Transport

Jie DUAN, Lijuan SONG, Zirui MA

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252923

Accepted: 2025-11-13

Deep learning–based survival prediction has advanced the integration of whole-slide images (WSI) and genomics, yet the ultra–high resolution of WSIs and the high dimensionality of transcriptomics pose substantial challenges for feature extraction and cross-modal fusion. Although prototype aggregation reduces computational burden by compressing tiles and gene expressions into morphological and pathway prototypes, two key bottlenecks remain: accurately capturing fine-grained interactions between modality-specific prototypes, and addressing the pronounced representational heterogeneity between WSI morphological prototypes and genomic pathway prototypes. To tackle these issues, we propose a weakly supervised survival prediction model based on multi-level optimal transport (MOTSurv), comprising three synergistic innovations: first, a dual-modality prototype encoder—integrating a Pyramid Position Encoding Generator (PPEG) in the pathology encoder and modeling intra-pathway dependencies in the pathways encoder—to strengthen intra-modality structure while preserving modality specificity; second, a cascaded multi-level optimal transport fusion mechanism that performs coarse global alignment followed by refined matching with error correction, balancing alignment accuracy and information preservation; and third, an Orthogonal Disentanglement Module (ODM) that enforces multi-level constraints—inter-modal specificity orthogonality, intra-modal specificity–shared orthogonality, and global specificity–shared orthogonality—to achieve explicit feature disentanglement and enhance interpretability. Experiments on the TCGA BLCA, BRCA, and LUAD datasets demonstrate that MOTSurv improves C-index by an average of 4.22% over state-of-the-art methods. Ablation studies further validate the independent and synergistic contributions of each module, highlighting the model’s comprehensive advantages in multimodal alignment, structured representation, and biological interpretability.
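
The reported metric, the concordance index (C-index), measures how often a model ranks patients correctly by risk. A minimal sketch of Harrell's formulation is given below for orientation; the abstract does not specify which C-index variant or tie handling the authors use, so the function name, argument layout, and the half-credit rule for tied risks are assumptions.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : observed follow-up times
    events : 1 if the event (e.g., death) was observed, 0 if censored
    risks  : predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier event got the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so improvements in C-index are improvements in pairwise ordering rather than in predicted survival times themselves.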

Zero-Shot Skeleton Action Recognition via Dual Discriminators and Spatiotemporal Self-Calibration

WANG Zeyu, JI Genlin, ZHU Wei

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252717

Accepted: 2025-11-13

Zero-shot skeleton-based action recognition uses textual label descriptions and skeleton action sequences to recognize actions from both seen and unseen categories. Existing methods are usually limited by the low quality of generated visual features, which prevents accurate alignment with the semantics and results in poor performance when distinguishing similar actions. To address this issue, this paper proposes a method based on dual discriminators and spatiotemporal self-calibration (DD-STSC) to explore visual-semantic alignment. The method combines variational autoencoders and generative adversarial networks, training discriminators and generators adversarially to mine the differential information among different features. It also separates useful from useless information more effectively during disentanglement, which further improves the quality of the generated samples. In addition, an action self-calibration module (ASCM) is introduced; by learning skeleton information along the spatial and temporal dimensions, it captures the key motion information more effectively and thereby improves classification accuracy. Experiments on the widely used NTU60, NTU120, and pku51 datasets demonstrate that the proposed method outperforms existing mainstream methods.

Research on Weakly Supervised Semantic Segmentation Methods Based on Multi-modal Contrastive Learning

XU Haizhe, HUANG Lingxiao, YAO Xinbo, GAO Yongzhan, ZHOU Kaiyuan

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252846

Accepted: 2025-11-13

The study addresses the critical challenges in weakly supervised semantic segmentation (WSSS) based on contrastive language-image pre-training (CLIP), such as inadequate fine-grained semantic alignment of images, limited perception of local details in text context, and insufficient local detail perception along with noise propagation in pseudo-label images. To tackle these issues, we propose the Feature Fusion Contrastive Learning framework (FFCLIP), a novel architecture that leverages a frozen CLIP model as the backbone and integrates three innovative modules—Panoramic Perception Attention (PPA), Rectangular Calibration Module (RCM), and Weighted Cross-modal Fusion (WFF)—to effectively enhance cross-modal semantic alignment, refine local boundary perception, and improve the quality of generated pseudo-labels. The multi-stage weakly supervised semantic segmentation training framework based on the CLIP backbone network achieved mIoU scores of 76.9% and 77.5% on the VOC2012 validation and test sets, respectively, surpassing the mainstream method CTI by 2.8% and 4.3%. On the COCO2014 dataset, it attains an mIoU of 47.1%, significantly outperforming baseline models like CPAL. Experimental results demonstrate that FFCLIP substantially enhances semantic segmentation accuracy under weak supervision while maintaining low computational overhead, with only 6M additional parameters and a peak GPU memory consumption of 6.2GB, thereby offering a novel direction for integrating multi-modal learning with weakly supervised segmentation. Code link: https://github.com/xuwudang/FFCLIP
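
The mIoU figures above average the per-class intersection-over-union between predicted and ground-truth labels. The sketch below illustrates the computation on flattened label lists; it is a generic illustration, and both the function name and the choice to skip classes that appear in neither prediction nor ground truth are assumptions (benchmark toolkits differ on such details).

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes, from flat label lists."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatched pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Skip classes absent from both prediction and ground truth (assumption).
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)
```

Because every class contributes equally to the mean, a small class with poor boundaries lowers mIoU as much as a large one, which is why boundary refinement and pseudo-label quality matter for this metric.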

Log Anomaly Detection Method Based on Dual-Granularity Spatio-Temporal Modeling

SU Na, PEI Houqing, XU Li, WANG Jingjun, JI Shujuan

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252847

Accepted: 2025-11-11

Existing log anomaly detection techniques often neglect temporal contextual information in semantic modeling, exhibit insufficient modality fusion capabilities, and generally over-rely on log parsing. These limitations make it difficult for models to capture complex patterns in which sudden changes in semantic content coexist with temporal behavioral anomalies. To address these challenges, this paper proposes Log Spatio-Temporal Fusion (LogSTF), a model that operates without log parsing. The model employs a dual-branch architecture for semantic and temporal processing. The semantic branch extracts context-aware semantic features, while the temporal branch models both local bursts and global evolution through dual-granularity modeling at the temporal and sequence levels. On this foundation, bidirectional cross-attention performs modality fusion, explicitly establishing fine-grained dependencies between semantics and time and enhancing the model's ability to represent and discern complex log behaviors. Experiments on three public log datasets (HDFS, BGL, and Thunderbird) show that LogSTF achieves F1 scores of 99.64%, 98.45%, and 99.67%, respectively. Compared with the two state-of-the-art models LAnoBERT and LogFormer, LogSTF achieves average relative F1 improvements of 5.20% and 2.03%, respectively. Ablation experiments confirm the critical role of temporal information and modality collaboration in the performance gains. Robustness testing under lightweight semantic perturbations confirms LogSTF's stability and generalization under suboptimal log conditions. The approach achieves high-precision detection of complex anomaly patterns without requiring log parsing.

Research on Ship Trajectory Prediction Based on Geographical Constraints and Multi-Method Fusion

Li Xu, Luo Dezhe, Wang Hongjun

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252642

Accepted: 2025-11-10

With the rapid development of global maritime transportation, ship trajectory prediction plays an important role in shipping safety and management. However, achieving high-precision and physically feasible continuous trajectory prediction remains a key challenge due to the large-scale ship trajectory data and the uncertainty of complex maritime environments. Traditional prediction methods have limitations in handling complex maritime environments and large-scale dynamic data. To address these challenges, this paper proposes a geographically constrained multi-method fusion ship trajectory prediction model. The model introduces a geographical constraint loss function to optimize the accuracy, heading stability, and physical feasibility of trajectory predictions. Additionally, a multi-method fusion network structure is designed, incorporating bidirectional gated recurrent units, attention mechanisms, and multi-scale convolutions, which enhances the ability to extract temporal features and integrate multi-scale information. Experimental results demonstrate that the proposed model achieves lower prediction errors across multiple maritime datasets, with particularly significant advantages in long-term predictions compared to existing models. The study confirms that this model offers high accuracy and stability in ship trajectory prediction, providing effective support for practical applications in the maritime field.

Robustness Enhancement of Recommender Systems Based on a Two-Stage Defense Framework

YAO Xun, HE Yuan, HU Xinrong, YANG Jie

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252806

Accepted: 2025-11-10

Sequential recommender systems excel at capturing users' dynamic interests, yet their open nature makes them highly vulnerable to data poisoning attacks. Attackers can effectively manipulate recommendation outcomes by altering the textual descriptions of items, posing a severe challenge to model robustness. Existing defense strategies, which primarily rely on static rules or fixed-intensity perturbations, struggle to counter the growing complexity and variability of semantic-level textual attacks. To address this challenge, we propose RADAR, a two-stage collaborative defense framework. This framework combines robustness enhancement at the training stage with real-time protection at the inference stage. First, during training, it employs dynamic adversarial training to bolster the model's intrinsic resilience against unknown textual perturbations. Second, at inference, it leverages a Large Language Model (LLM) for semantic-level anomaly detection and content restoration. Experimental results demonstrate the defense performance of RADAR. In attack tests on the Scientific dataset, compared to the strongest baseline model (Cert-LLM), RADAR reduces the exposure increase of malicious items from 3.1796% to 0.9921%. This validates the framework's effectiveness in enhancing the security and robustness of sequential recommender systems.

A double quantum image encryption algorithm based on chaotic system

GUO Yang, SUN Jing-yu

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252605

Accepted: 2025-11-07

With the development of quantum computing technology, traditional image encryption algorithms are facing the challenge of insufficient quantum attack resistance, while existing quantum image encryption algorithms have problems such as high quantum bit consumption and limited parameter space of chaotic systems. To address the above unsolved problems, this paper proposes a dual-quantum image encryption algorithm based on a chaotic system, aiming to achieve a balance between low resource consumption and high security. Firstly, a dual-bit-plane quantum image representation model (DBRQI) is proposed, which only requires 2n+4 quantum bits to store a grayscale image, reducing quantum bit consumption by 50% compared with the BRQI model. Secondly, a 3D hyperchaotic system (3D-CHCMM) is constructed: the parameter space of its 4 control parameters is increased by 33% compared with existing systems, and its 3 Lyapunov exponents are all positive. Moreover, the system has passed 15 NIST tests, enabling it to generate pseudorandom sequences with high randomness. The algorithm maps quantum states through DBRQI, scrambles pixel information via odd-even bit-plane scrambling and random row-column scrambling, and then performs an XOR operation with the pseudorandom sequences to generate ciphertext. Experimental results show that the horizontal correlation of the encrypted image is as low as 0.0041, the information entropy reaches 7.9993, and the NPCR is 99.6251%, indicating that the algorithm’s attack resistance and anti-interference capability are significantly enhanced. The algorithm in this paper provides an efficient solution for image encryption in current scenarios with limited quantum hardware.
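The three reported figures (adjacent-pixel correlation, information entropy, NPCR) are standard cipher-image metrics. As a rough illustration of two of them, the sketch below computes Shannon entropy and NPCR for flat lists of 8-bit pixel values. The function names are hypothetical and this is not the authors' evaluation code; it only shows the textbook definitions. For an ideally random 8-bit image the entropy approaches 8 bits and the expected NPCR is about 99.6094%, which is the context for the reported 7.9993 and 99.6251%.

```python
import math
from collections import Counter


def information_entropy(pixels):
    """Shannon entropy (in bits) of a flat sequence of pixel values."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def npcr(cipher_a, cipher_b):
    """Number of Pixels Change Rate: percentage of positions that differ
    between two cipher images of the same size."""
    if len(cipher_a) != len(cipher_b):
        raise ValueError("cipher images must have the same number of pixels")
    changed = sum(1 for a, b in zip(cipher_a, cipher_b) if a != b)
    return 100.0 * changed / len(cipher_a)
```

In a typical NPCR test, `cipher_a` and `cipher_b` would be the encryptions of two plain images that differ in a single pixel.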

CAFR-YOLO: A Multi-Scale Object Detection Algorithm Based on YOLOv8

Zhang Yao, Zhang Junsan, Ma Junpeng, Yao Zongquan, Liu Tianyi

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252821

Accepted: 2025-11-07

This paper proposes an improved YOLOv8-based model named CAFR-YOLO to address the issues of insufficient cross-level feature interaction and limited feature representation capability in multi-scale object detection under complex scenes. First, a novel cross-scale feature reorganization pipeline was designed, constructing the Channel Attention-guided Feature Reorganization (CAFR) module. By using a specific layer as the fusion backbone and incorporating scale alignment, attention-weighted fusion, and feature subset splicing strategies, it alleviates insufficient cross-level interaction in traditional feature pyramid structures. Secondly, at the local level, the method introduces the C2f_DCNv3 module into the backbone network, significantly enhancing the model's geometric adaptability by exploiting the dynamic sampling characteristics of deformable convolution. From a global perspective, the C2f_SAConv module is constructed by combining Switchable Atrous Convolution (SAC) with the C2f module, optimizing multi-scale semantic feature fusion through dynamic atrous rate adjustment. These two approaches enhance the model's robustness to complex scenes. Finally, SPDConv replaces traditional convolution structures, strengthening feature representation through spatial-channel reorganization while reducing computational complexity. Experimental results demonstrate that CAFR-YOLO achieves 86.3% mAP@0.5 and 67.2% mAP@0.5:0.95 on the PASCAL VOC dataset with comparable computational costs to the original model. On the MS COCO dataset, it improves mAP@0.5 and mAP@0.5:0.95 by 3.5% and 3.9%, respectively. Compared to existing state-of-the-art methods, CAFR-YOLO exhibits significant advantages across multiple metrics. The proposed CAFR-YOLO model substantially enhances multi-scale object detection accuracy and robustness while maintaining computational efficiency, providing a novel solution for real-time object detection tasks.

An Improved Algorithm for Small Object Detection in UAV Aerial Images Based on RT-DETR

TIAN Hongpeng, LI Zhiqiang, YANG Sai

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252661

Accepted: 2025-11-05

Lightweight small-object detection in UAV images faces common challenges such as low detection accuracy, complex backgrounds, large variations in target scale, dense target distribution, and a relatively large number of model parameters. Therefore, this paper proposes an improved RT-DETR-based UAV object detection algorithm. First, an enhanced C2f-Heat-Lsk module is developed by integrating the HeatBlock thermal conduction module and the LskBlock spatial selective attention mechanism into the C2f structure. This modified module works together with the original C2f module to redesign the RT-DETR backbone network, which improves spatial feature extraction while reducing model parameters. Second, a novel feature fusion structure, SOFEP, replaces the original feature pyramid to mitigate detail loss in small objects and enhance their feature representation. Third, a combined Focaler-MPDIoU loss function is constructed by integrating the Focaler-IoU and MPDIoU loss mechanisms, which improves bounding box regression accuracy and reduces the missed detection rate. Experimental results on the VisDrone test set show that the improved model reduces parameter count by 16.9% compared to RT-DETR, while achieving improvements of 2.6% in mAP0.5 and 1.9% in mAP0.5:0.9. The model also outperforms RT-DETR on the DOTAv1.0 and HIT-UAV datasets. These results demonstrate that the proposed method achieves higher detection accuracy with reduced computational complexity, effectively meeting the requirements for small object detection in UAV aerial images.

Multi-Seasonal Multi-Behavior Pattern Learning Method for Temporal Recommendation

Liang Shichao, Wen Wen, Feng Yali, Zheng Jiabi, Hao Zhifeng

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252622

Accepted: 2025-11-05

How to model and learn user’s behavior patterns is a crucial issue in temporal recommendation. However, the majority of existing research primarily centers on pattern learning within a single type of behavior. This limitation restricts the ability to take full advantage of the user's diversified behavior patterns revealed by various types of behaviors, such as clicking, purchasing, marking as favorite, and so on. As a result, the potential for enhancing recommendation performance remains underexplored. To address this gap, this research delves into the multi-seasonal sequential dependencies of individual behaviors and the intricate dependencies among different types of behaviors over time. Specifically, we propose a novel model, named multi-seasonal multi-behavior (MSMB) model, for learning temporal patterns across multiple behaviors. In the proposed model, a dual-channel sequence encoder is employed, which incorporates a multi-scale exponential moving average (EMA) mechanism to effectively capture the multi-seasonal temporal dependencies within individual behavior sequences. Additionally, a cross-behavior dependency module is introduced to account for different periodic granularities, thereby enabling the model to effectively capture the time-variant dependencies across various types of behaviors. Extensive experiments conducted on three benchmark datasets demonstrate the effectiveness and superiority of the proposed MSMB model in enhancing temporal recommendation performance.
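To make the multi-scale EMA idea concrete, the sketch below smooths one behavior sequence with several decay rates, so that each rate tracks a different temporal scale (a large `alpha` follows short-term changes, a small `alpha` follows slow seasonal drift). This is the textbook scalar exponential moving average, not the MSMB encoder itself, whose details the abstract does not give; the function names and the choice of initializing the state with the first element are assumptions.

```python
def ema(seq, alpha):
    """Exponential moving average: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    The state is initialized with the first element (an assumption).
    """
    out = []
    state = seq[0]
    for x in seq:
        state = alpha * x + (1 - alpha) * state
        out.append(state)
    return out


def multi_scale_ema(seq, alphas):
    """Return one smoothed copy of `seq` per decay rate in `alphas`."""
    return [ema(seq, a) for a in alphas]
```

For example, `multi_scale_ema(clicks_per_day, [0.9, 0.3, 0.05])` would yield three views of the same sequence at progressively coarser time scales, where `clicks_per_day` is a hypothetical input.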

Learning Disentangled Representation for Time Series Segmentation

CHEN Haozhi, CAI Ruichu, LI Zijian, HAO Zhifeng

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252573

Accepted: 2025-11-05

Time series segmentation, an important task in time series analysis, has been widely applied in fields such as biological behavior analysis and physical system analysis. However, most existing time series segmentation methods fail to account for the nonstationary dynamics of time series induced by distribution shifts, thereby limiting their ability to achieve accurate segmentation in nonstationary regimes. To solve this problem, this paper first proposes a data causal generation process hypothesis based on real-world scenarios. Under this hypothesis, the latent variables underlying the observed data can be decomposed into stationary and non-stationary latent variables. Here, the stationary variables represent information that is unchanged or changes periodically, while the nonstationary variables represent dynamically changing information. Secondly, based on this causal generation process hypothesis, a Stationary Nonstationary Disentangle Model (SNDM) is designed. This model disentangles stationary and nonstationary variables, thus enabling enhanced focus on non-stationary dependencies in the time series. Moreover, in order to accurately disentangle and extract variables, the evidence lower bound (ELBO) of variational inference is used to construct the loss function of the model. Leveraging this ELBO, this study introduces stationary and nonstationary prior neural network modules to improve latent variable disentanglement accuracy. Finally, through experiments, we validate that our model outperforms several state-of-the-art time series segmentation methods on various benchmark datasets, thereby highlighting its advantages in practical scenarios.

XiRang: Scalable High-Performance Address Mapping Structure

Zhao Weiyue, Wu Jingya, Lu Wenyan, Li Xiaowei and Yan Guihai

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252422

Accepted: 2025-11-05

Emerging applications in datacenters have introduced a significant amount of large-granularity RDMA communication requirements. RDMA relies on physical addresses, and, when accessing large-granularity data, the Page Table Entries (PTEs) required for address translation exceed the cache capacity of hardware devices. Current high-performance commercial solutions store PTEs in the host memory. However, this architecture requires large-granularity communication to be executed only after fetching the PTEs from the host memory, which introduces PCIe traversal and host memory access latency, severely degrading address translation efficiency and increasing host CPU overhead. To achieve efficient large-granularity RDMA, this paper designs a configurable high-performance address mapping structure: XiRang. XiRang efficiently extends the access granularity through a streaming prefetch mechanism and a hierarchical cache design, and implements flexible and high-throughput address translation performance through a configurable address translation array. The XiRang prototype is implemented based on a DPU. 
Experiments show that: 1) XiRang effectively offloads the address translation load of the RDMA data plane, decoupling it from the host CPU; 2) The streaming prefetch extension mechanism used by XiRang effectively reduces storage overhead, with cache consumption at only the 10-byte level under concurrent modes, and concurrent storage overhead being negligible; 3) Under a high number of concurrent memory access requests, XiRang maintains a translation table entry query hit rate close to 100%, reducing the idle time of the translation engine by 2 to 3 orders of magnitude compared to the RNIC architecture; 4) The translation throughput of XiRang is more than 60 times that of the RNIC translation architecture and more than 3.5 times that of the basic DPU address mapping structure; 5) In performance enhancement mode, XiRang's address translation speed can support a data transfer bandwidth of 1.4 TB/s.

Improved YOLOv8-Based Detection Of Endangered Animals In Complex Contexts

Jia Xinglong, Qin Junping, Yan Kai, Liu Zheng, Wang Dan, Shao Xinran, Shao Zezhou

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252811

Accepted: 2025-11-05

To address insufficient accuracy in identifying endangered animals against complex backgrounds in the wild, this study improves the YOLOv8 model. First, Dynamic Snake Convolution (DSConv) is introduced in the backbone network to enhance the detection performance of the model under occlusion. Second, the global attention mechanism (GAM) is introduced in the neck network to improve the model's attention to information related to endangered animals, suppress irrelevant features such as the environment, and reduce redundant information. Then, a small target detection head is designed in the head network to fuse shallow feature maps and improve the network's perception and positioning capabilities for small targets. Finally, a bounding box regression loss based on the minimum point distance (MPDIoU) replaces the traditional CIoU loss, thereby improving the convergence speed and positioning accuracy of the algorithm. The experimental results show that the detection accuracy and average precision of the proposed model for endangered animals in complex backgrounds are 96.2% and 97.2%, respectively, which are 2.1 and 2.4 percentage points higher than those of the baseline YOLOv8n. In comparative experiments with different target detection models on the same dataset, the average precision is 28.7, 22.5, 3.5, and 2.4 percentage points higher than that of Faster-RCNN, SSD, YOLOv5, and YOLOv7, respectively. The experiments show that the improved YOLOv8 model can provide a theoretical basis for the detection of endangered animals in complex backgrounds.
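The minimum-point-distance loss mentioned above can be sketched as follows. The code assumes the commonly cited MPDIoU definition, in which the IoU is penalized by the squared distances between the two boxes' top-left corners and between their bottom-right corners, each normalized by the squared image diagonal. Boxes are `(x1, y1, x2, y2)` tuples; the function names are hypothetical, and this is an illustration rather than the authors' training code.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU = IoU - d1^2 / (w^2 + h^2) - d2^2 / (w^2 + h^2),
    where d1 and d2 are the top-left and bottom-right corner distances
    and w, h are the input image width and height."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection and union areas for the plain IoU term.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared corner distances, normalized by the squared image diagonal.
    d1_sq = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    d2_sq = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2
    norm = img_w ** 2 + img_h ** 2
    return iou - d1_sq / norm - d2_sq / norm


def mpdiou_loss(box_a, box_b, img_w, img_h):
    """Regression loss: zero when the two boxes coincide."""
    return 1.0 - mpdiou(box_a, box_b, img_w, img_h)
```

Unlike plain IoU, the corner-distance terms still give a useful gradient signal when the predicted and ground-truth boxes do not overlap at all.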

Multi-Parameter Optimization WiFi Fingerprint Positioning Method Driven by Few Shot Learning

WU Shixun, TANG Peiyao, LAN Zhangli, Xu Kai, ZHANG Miao

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252666

Accepted: 2025-11-04

WiFi fingerprint positioning based on received signal strength indication (RSSI) has gained wide attention due to its ease of deployment and cost-effectiveness. However, existing fingerprinting methods typically rely on large-scale training data, while data augmentation often produces virtual samples of uneven quality, thereby limiting positioning accuracy and generalization. To address these issues, this study proposes a multi-parameter optimization WiFi fingerprinting method driven by few-shot learning (FSL). The method integrates an attention-enhanced convolutional neural network (CNN) with a meta-learning framework to enable rapid adaptation under limited data, while particle swarm optimization (PSO) is employed for automated data selection and joint hyperparameter tuning under physical constraints. Experimental results demonstrate that the proposed method achieves average positioning errors of 0.52 m on the CJU dataset and 6.88 m on the public Tampere dataset, improving accuracy by at least 49.5% and 8.7% compared with baseline methods. In addition, a generalization test on the CJU-2024 dataset shows that the model adapts effectively to new environments with only a small amount of data, achieving an average positioning error of 2.17 m and an accuracy improvement of at least 26.7%. These results confirm that the proposed method significantly improves indoor positioning accuracy while maintaining strong generalization capability.

Person Re-Identification Method Integrating Data Augmentation and Feature Purification

YANG Yingying, CHE Jin, BAI Xuebing, XIAO Long, JIAN Liqiong

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252842

Accepted: 2025-11-04

Existing unsupervised person Re-ID methods focus only on pedestrians' global features, causing global feature bias and insufficient data diversity that impair recognition accuracy. To address this, this paper proposes a ViT-based method (DAFP) integrating a Multi-level Data Augmentation Module (MDAM) and Feature Purification (FP). First, the MDAM, which includes geometric spatial transformations, appearance feature perturbations, and occlusion simulation, expands training sample diversity and enhances the model's cross-camera robustness. Second, the FP module divides the local features output by the Transformer into upper and lower parts according to spatial position, performs adaptive weighted fusion with global features via a multi-view distance matrix, and generates high-quality pseudo-labels with DBSCAN, alleviating the misclustering of similar pedestrians caused by over-reliance on a single global feature in traditional methods. Finally, a global-local clustering contrastive loss dynamically updates global and local clustering centers to strengthen fine-grained feature learning. Experimental results on Market1501, DukeMTMC-reID, and MSMT17 show that the method's mAP/Rank-1 reaches 90.5%/96.0%, 77.6%/87.6%, and 64.5%/86.0%, respectively, surpassing the current state-of-the-art methods.

A zero-shot learning agent for TTP extraction

Tang Weilin, Wang Junfeng, Ge Wenhan, Zhang Chengcheng, Zhan Weilu

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252621

Accepted: 2025-11-04

Cyber Threat Intelligence (CTI) plays a pivotal role in mitigating the asymmetry between cyber attacks and defenses. However, current extraction methods for Tactics, Techniques, and Procedures (TTPs) predominantly rely on supervised language models with manual annotation, which suffer from inefficiency and inconsistency. Although the MITRE ATT&CK framework has mitigated TTP description problems through standardized classification, existing NLP-based approaches still face three major challenges: insufficient generalization, delayed version adaptation, and poor interpretability. To address these challenges, this paper proposes DetecTTive, a zero-shot TTP extraction method that combines the prior knowledge of large language models with trustworthy external knowledge. The framework uses the official ATT&CK knowledge base as the external knowledge source and combines vector-based semantic retrieval, graph-enhanced association reasoning, and an agent workflow to achieve automated white-box reasoning. This enhances zero-shot performance while ensuring result traceability. Experiments demonstrate that the proposed zero-shot approach achieves an F1 score of 80.02% and a recall of 83.46% on benchmark datasets. The method addresses the data bias and version adaptation issues inherent in conventional models, providing an interpretable and cost-efficient solution for TTP extraction in dynamic threat environments.

A Survey on Text-Level Stance Detection

Fan Qinlong, Sun Yepeng, Lu Jicang, Zhu Taojie and Liu Yilin

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252679

Accepted: 2025-11-04

With the popularization and development of the internet, the massive volume of user-generated comments on trending topics and their widespread dissemination profoundly influence the progression and development of real-world events. Consequently, mining public stances and attitudes toward trending topics holds significant practical value for domains such as online public opinion monitoring and social security governance. Stance detection technology aims to identify user attitudes toward specific targets from user-generated texts. Although numerous studies have proposed diverse task scenarios and technical methodologies, a unified classification framework for stance detection tasks remains elusive. First, this paper presents a comprehensive review of stance detection tasks from two dimensions: task scenarios and technical methodologies, systematically organizing the current research landscape and development trends. From the task scenario perspective, we classify stance detection into three paradigms: target-specific, target transfer, and target generalization, highlighting the field's evolution from domain-specific applications toward broader adaptability. From the methodological perspective, we categorize stance detection approaches into three primary classes: model-based engineering, knowledge-driven engineering, and data-centric engineering, analyzing the strengths and limitations of each. Additionally, we conduct statistical and experimental analyses of publicly available resources across multiple dimensions, revealing key characteristics and developmental trajectories of these benchmark datasets. Finally, the paper concludes with a summary and outlines prospective research directions and persistent challenges.

MFG-FS: Partial Label Feature Selection based on Multi-source Fuzzy Granulation

Wu Qiannan, Ding Weiping, Fan Xiaoxue, Ju Hongrong, Zhou Linlin, Wang Jing

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252706

Accepted: 2025-10-31

Feature selection can effectively identify informative features from complex data to improve information processing efficiency. However, in partially labeled data scenarios, traditional feature selection methods face significant challenges due to inherent label ambiguity, complex inter-sample relationships, and difficulties in feature importance evaluation. To address these challenges, this paper proposes MFG-FS, an effective feature selection framework for partially labeled datasets. First, to tackle label ambiguity, we design an end-to-end disambiguation method based on the MLP-Mixer model and contrastive learning, which optimizes the feature representation space to enhance discriminative power and obtain more reliable label confidence distributions. Second, to accurately characterize complex sample relationships in partially labeled data, we construct fuzzy similarity relations and information granules that integrate multi-source information, effectively combining local feature-space structures, global correlations from disambiguated labels, and label constraints. Subsequently, based on the constructed fuzzy information granules, we define and employ a fuzzy mutual information measure for feature evaluation, which quantifies the relevance between feature subsets and labels while assessing internal redundancy, thereby providing a robust basis for high-quality feature subset selection. Finally, extensive experiments on five synthetic and four real-world datasets demonstrate that MFG-FS can select more discriminative and robust feature subsets, achieving superior performance in partial label disambiguation and classification accuracy.

A Multi-Scale Object Detection Algorithm Oriented to Autonomous Driving

HUANG Yuqi, YANG Xiaoxia, YANG Ronghao, LIAO Fangzhou, YAN Le, GUO Junqiang, LI Minghan

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252697

Accepted: 2025-10-30

Object detection for autonomous driving perception aims to locate and identify traffic participants such as motor vehicles, non-motor vehicles, and pedestrians within onboard camera views in real time, providing accurate input for the environmental perception module to support decision-making and control in autonomous driving systems. The perception system suffers from high false and missed detection rates due to complex road backgrounds, diverse object shapes, and large scale variations. Specific challenges include low accuracy in detecting deformed objects, insufficient multi-scale detection, and weak global perception. To address these issues, an improved algorithm named YOLOv8-DDL, based on YOLOv8n, is proposed. First, deformable attention is introduced to improve the C2f module in the backbone network; it dynamically learns feature offsets to better capture the various object shapes in traffic scenes, improving the model's adaptability to complex spatial distributions and reducing false detections. Second, large separable kernel attention is integrated to enhance the spatial pyramid pooling fast module, expanding the receptive field through large-kernel convolution to strengthen global context modeling and robustness in complex backgrounds. Finally, a dynamic multi-scale adaptive fusion module and a dynamic feature pyramid network are designed to reconstruct the neck network, dynamically fusing high-level and low-level features to enhance multi-scale feature representation and improve multi-scale object detection performance. Experimental results on the public SODA10M dataset show that, compared to YOLOv8n, YOLOv8-DDL improves precision, recall, F1-score, and mean average precision by 5.9%, 1.3%, 3%, and 1.5%, respectively. Additional validation on the public BDD100K dataset confirms improvements of 2%, 0.6%, 1%, and 2% in these metrics, respectively.

Trapdoor Hash-Based Data Confirmation Scheme For Rights-Controllable

CHEN Junhong, ZHOU Feng, TIAN Youliang, YANG Kedi, ZHANG Qijia

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252063

Accepted: 2025-10-29

As the demand for training data across industries increases, data has become a key factor of production. Data rights confirmation can clarify data ownership and allocate benefits, preventing unauthorized use. However, existing schemes have problems such as uncontrollable rights and inefficient rights confirmation during rights collection, storage, and use. To address these challenges, this paper proposes a rights-controllable data confirmation scheme based on trapdoor hashing. First, to prevent the loss of data rights during data transfer, this paper constructs a rights confirmation model that separates holding, management, and usage rights, thus achieving a refined allocation of rights. Second, to address the uncontrollable generation of management rights in existing correlation algorithms, a data confirmation algorithm based on trapdoor hashing is proposed, which realizes controllable generation of data management rights as they change and also improves the efficiency of correlation. In addition, combined with blockchain technology, this paper designs an authorization-traceable data transaction mechanism, which realizes the non-repudiation and traceability of data transactions by finely controlling the collection of and access to data and uploading the corroboration information. Finally, security and performance analyses show that, compared with traditional schemes, the proposed scheme has advantages in computation and storage overhead while ensuring that rights signatures cannot be forged.
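For readers unfamiliar with trapdoor (chameleon) hashing, the sketch below shows the textbook discrete-log construction: the hash is collision-resistant for anyone without the trapdoor, while the trapdoor holder can compute a new randomizer that makes a changed message hash to the same value. This is a generic illustration, not the scheme proposed in the paper; the function names are hypothetical and the group parameters are toy values that offer no security.

```python
# Toy group parameters (assumption, insecure): G has prime order Q modulo P.
P, Q, G = 23, 11, 4


def keygen(trapdoor):
    """Public hash key h = G^x mod P for a trapdoor x in [1, Q - 1]."""
    return pow(G, trapdoor, P)


def trapdoor_hash(h_pub, message, r):
    """CH(m, r) = G^m * h^r mod P, with m and r reduced modulo Q."""
    return (pow(G, message % Q, P) * pow(h_pub, r % Q, P)) % P


def find_collision(trapdoor, message, r, new_message):
    """With the trapdoor x, solve m + x*r = m' + x*r' (mod Q) for r'."""
    x_inv = pow(trapdoor, -1, Q)  # modular inverse, Python 3.8+
    return (r + (message - new_message) * x_inv) % Q
```

The collision property is what allows the hashed value to stay fixed while the underlying content is updated by the authorized party, which is the general reason such hashes are attractive for controllable rights records.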

Research and Application of Group Intelligence Emergency Decision Making Method Based on Large Language Model

Gao Jianwei, Zhao Shutong, Huang Ningbo

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252386

Accepted: 2025-10-28

Against the background of rapid development in artificial intelligence, a group intelligence emergency decision-making method based on large language models and retrieval-augmented generation (RAG) is proposed to address insufficient public participation and strong dependence on specialized knowledge in current emergency decision-making. The method integrates public social media data with a domain knowledge base and constructs a public-expert collaborative multi-attribute decision-making model to improve the scientific rigor and effectiveness of disaster response, and it is applied to emergency management. First, a Python crawler is used to obtain public comments from a microblogging platform to form an emergency disaster demand database. Second, an emergency management professional database is integrated through RAG to enhance the model's generation ability; topic classification is guided by prompt engineering, a topic word co-occurrence network is constructed and clustered with the Louvain algorithm, and the results are checked and optimized by experts to generate the attribute set for emergency decision-making. Then, importance and cohesion factors are synthesized to construct the attribute weight measurement model. Finally, the psychological behavior of decision makers is taken into account, and the TODIM method is used to rank and select among the alternative emergency solutions. Taking the 7-20 Henan rainstorm event as an example, the experimental results show that the proposed method generates emergency decision-making topics that meet public demand and performs well in topic consistency and diversity, with scores of 0.583 and 0.943, respectively, verifying the scientific soundness and effectiveness of the proposed method.

Approximate Shapley Value-Based Cooperative Supply Strategy in Edge Environments

ZHAO Shuxu, CHEN Yanhong, WANG Xiaolong, JIANG Kaijun

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252581

Accepted: 2025-10-28

To address issues such as resource mismatch, load bottlenecks, and service instability caused by demand fluctuations and large-scale bursty tasks in mobile edge computing, a cooperative supply strategy based on approximate Shapley values (ASVC) is proposed. First, a task allocation model based on bidirectional preference matching is constructed, which considers both the performance requirements of user tasks and the resource status of edge nodes. The Gale-Shapley algorithm is used to achieve optimal supply-demand matching. Second, to reduce the computational complexity of Shapley value estimation during coalition formation, an adaptive sampling-based optimization scheme is introduced. This approach significantly reduces the computation time of Shapley values while maintaining accuracy. Finally, task data is allocated according to the proportional contribution of each node, improving system fairness and resource utilization efficiency. Simulation results show that, compared with existing algorithms, the proposed ASVC algorithm improves service quality, delay control, task completion rate, and system load balancing by approximately 27.8%, 31.0%, 30.8%, and 21%, respectively.
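The sampling idea behind approximate Shapley values can be illustrated with plain Monte Carlo permutation sampling: instead of enumerating every coalition, draw random orderings of the nodes and average each node's marginal contribution. The abstract describes an adaptive sampling scheme whose details are not given, so the uniform sampler below is a simpler stand-in. The function names, the `value_fn` interface, and the proportional split are assumptions made for illustration.

```python
import random


def approximate_shapley(players, value_fn, num_samples, seed=0):
    """Estimate Shapley values by averaging marginal contributions over
    `num_samples` uniformly random permutations of `players`.

    `value_fn` maps a frozenset of players to the coalition's value.
    """
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        order = list(players)
        rng.shuffle(order)
        coalition = []
        prev = value_fn(frozenset())
        for p in order:
            coalition.append(p)
            cur = value_fn(frozenset(coalition))
            totals[p] += cur - prev  # marginal contribution of p
            prev = cur
    return {p: t / num_samples for p, t in totals.items()}


def proportional_allocation(shapley, total_load):
    """Split `total_load` across nodes in proportion to their estimates."""
    s = sum(shapley.values())
    return {p: total_load * v / s for p, v in shapley.items()}
```

Each sampled permutation costs one pass over the nodes, so the estimate scales with `num_samples` times the number of nodes rather than with the exponential number of coalitions.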

Differential Low-Rank Adaptation-based Sensitive Information Protection for Large Language Model Training

Yanli Lv, Yiwen Jiang, Hanyu Feng, Zhenqi Guo, Sheng Xiang

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.00252845

Accepted: 2025-10-28

As generative AI technologies become increasingly integrated into sensitive industries, the over-reliance of large generative models on memorizing training data during fine-tuning poses a growing risk of privacy leakage, where user identities, behavioral traces, and other sensitive information may be reconstructed during inference. To address this issue, a novel fine-tuning approach combining Differential Privacy (DP) with Low-Rank Adaptation (LoRA) is proposed. This method freezes the parameters of the pre-trained model and updates only the inserted LoRA modules. Additionally, Differential Privacy Stochastic Gradient Descent (DP-SGD) is introduced, implementing gradient norm clipping and Gaussian noise injection on a per-sample basis to minimize the model’s dependence on individual training samples. Based on the Qwen2-1.5B language model, a task-specific fine-tuning dataset incorporating user profiles is constructed, and adversarial samples targeting typical sensitive fields—such as identity markers, behavioral characteristics, and location data—are developed to evaluate the anti-leakage capabilities of traditional full-parameter fine-tuning versus the DP-LoRA approach. Experimental results demonstrate that fully fine-tuned models exhibit a high sensitive-information match rate of 73.07% across 130 adversarial samples, indicating severe privacy vulnerabilities. In contrast, the DP-LoRA fine-tuned models achieve a significantly reduced match rate of only 1.5%, with generated content showing minimal correlation to original training data. This approach effectively mitigates the risk of sensitive information disclosure, offering a cost-efficient and highly adaptable training strategy for deploying generative models in real-world scenarios with stringent data security requirements.
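
The per-sample clipping and Gaussian noise injection that DP-SGD performs can be sketched in a few lines. This is a generic, minimal illustration in pure Python, not the paper's implementation; the function name `dp_sgd_gradient` and its parameters are assumptions made for the example, and in the DP-LoRA setting the gradients would belong only to the LoRA parameters.

```python
import math
import random
from typing import List


def dp_sgd_gradient(
    per_sample_grads: List[List[float]],
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Clip each sample's gradient to clip_norm, sum the clipped gradients,
    add Gaussian noise scaled to clip_norm, then average over the batch."""
    dim = len(per_sample_grads[0])
    summed = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in grad))
        # Scale down only gradients whose L2 norm exceeds the clip bound.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, x in enumerate(grad):
            summed[i] += x * scale
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


# With the noise disabled, only the clipping effect is visible.
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
clipped_mean = dp_sgd_gradient(grads, 1.0, 0.0, random.Random(0))
```

Clipping bounds how much any single training sample can move the update, and the noise is calibrated to that bound, which is what limits the model's dependence on individual samples.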

Review on Deployment Problems of Resource Public Key Infrastructure

Guozheng Yang, Dongzhen Qi, Pan Chen, Zhaobin Shen, Pengyu Yin, Yanlin Huo

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252551

Accepted: 2025-10-27

Resource Public Key Infrastructure (RPKI) is an important mechanism for safeguarding BGP routing security; it verifies the legitimacy of BGP announcements through Route Origin Authorization (ROA) and Route Origin Validation (ROV). As RPKI continues to advance globally, its deployment status and actual defensive effect have become a focus of research. In recent years, researchers have carried out a great deal of work on ROA configuration problems and ROV deployment measurements, characterizing the operational status and protection capability of RPKI in real networks from different dimensions. Current RPKI-related surveys mainly focus on theoretical research into the RPKI system itself, emphasizing its architectural vulnerabilities, without systematically organizing and summarizing the key challenges and related studies encountered in actual RPKI deployment. This review systematically summarizes recent studies on deployment issues of the RPKI system. It focuses on classifying common types of errors in ROA configuration, including benign ROA conflicts and loose ROA registrations, and analyzes their causes and their impact on routing security. Finally, this review outlines future research directions for RPKI deployment issues, providing a theoretical foundation and methodological reference for subsequent research on RPKI deployment optimization, security assessment, and strategy. This will help promote the widespread adoption of RPKI and strengthen defenses against BGP prefix hijacking.

An Empirical Study of Redundant Dependencies in Open-Source Java Projects

Liu Meigui, Zhang Neng, Li Jiale, Zhao Yuqi, Li Zengyang

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.00 EC0252768

Accepted: 2025-10-27

Redundant dependencies in software projects can lead to increased build size, performance overhead, and long-term maintenance burden. Although existing studies have investigated redundant dependencies in the Maven ecosystem, there remains a lack of analysis regarding their distribution across different dependency scopes (e.g., compile and test), their evolutionary patterns, and their relationship to project popularity. To address this gap, we select 2,214 open-source Java Maven projects from GitHub as our study subjects. We employ an mvn command to identify dependencies that are declared but not actually used, and conduct a quantitative analysis of redundancy ratios by scope. Furthermore, we apply the Mann-Kendall non-parametric trend test to 3,817 historical versions from 698 projects to identify trends in the evolution of redundant dependencies. To assess the relationship between redundant dependencies and project popularity or community activity, we construct five GitHub-based popularity and activity metrics, including star growth rate, fork growth rate, and issue closing rate, and perform Pearson correlation analysis. Experimental results show that redundant dependencies are primarily concentrated in the compile and test scopes, with median redundancy ratios of 33.33% and 30.00%, respectively. In terms of evolutionary trends, 48.1% of the projects maintained a stable redundancy ratio, 36.2% exhibited fluctuations, and a small proportion showed an increasing or decreasing trend. In the correlation analysis, only the issue closing rate shows a statistically significant, weak negative correlation with the redundancy ratio. These findings provide developers with a detailed perspective on dependency management and can help optimize project configurations and improve software maintainability.
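
For readers unfamiliar with the Mann-Kendall test mentioned above, the following is a minimal sketch of the standard statistic with a normal approximation and no tie correction. It is not the authors' analysis script, the version series shown is invented for illustration, and the normal approximation is only rough for series as short as a handful of versions.

```python
import math
from typing import Sequence, Tuple


def mann_kendall(series: Sequence[float]) -> Tuple[int, float, float]:
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Uses the normal approximation and ignores ties in the variance term.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    normal_cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    p_value = 2.0 * (1.0 - normal_cdf)
    return s, z, p_value


# Hypothetical redundancy ratios across eight releases of one project.
ratios = [0.10, 0.12, 0.15, 0.18, 0.20, 0.25, 0.30, 0.33]
s_stat, z_stat, p_val = mann_kendall(ratios)
trend = "increasing" if p_val < 0.05 and s_stat > 0 else "no clear trend"
```

A positive S with a small p-value indicates a monotonic increase; a project whose ratio never changes yields S = 0 and would be classified as stable.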

A Physical World Mapping Discrepancy Engine Based on Multi-Source Sensor Data

GAO Song, GAO Bo-lin, LU Jian, WU Yue-long, WANG He, XU Yue-yun

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252536

Accepted: 2025-10-27

Quantifying the discrepancy between different sensor perception algorithms' mapping of the physical world and identifying boundary data is a key challenge in automating the extraction of high-value boundary data. This paper proposes a discrepancy engine based on multi-source sensor data for the autonomous discovery of boundary data. The engine consists of two main modules: the discrepancy cognition module and the discrepancy rate calculation module. In the discrepancy cognition module, a discrepancy rate was defined, and an association model linking the discrepancy rate with perception mapping discrepancies was established. The average discrepancy rate of a dataset was used as the baseline discrepancy rate to quantify mapping discrepancies and identify boundary data. In the experiments, the baseline discrepancy rates of LiDAR, millimeter-wave radar, and vision-based perception algorithms were calculated as 0.17, 0.23, and 0.19, respectively. In the discrepancy rate calculation module, a 2D pixel distance matching strategy combining the chi-square distribution and Welsh loss was used to match camera-detected objects with those detected by LiDAR, millimeter-wave radar, and other cameras. Compared to a fusion algorithm that used only a 3D distance matching strategy, the proposed approach achieved discrepancy rates of 0.16 and 0.14 relative to the ground truth on the test dataset, demonstrating that the improved matching strategy significantly enhanced the accuracy of the fusion algorithm. The results indicate that the discrepancy engine achieves average recognition accuracies of 0.85, 0.74, and 0.82 for the boundary data of LiDAR, millimeter-wave radar, and vision-based perception algorithms. Validation in real-world road scenarios, including straight urban roads, simple intersections, and complex intersections, confirms the engine's effectiveness in identifying perception boundary data.

Frequency Multi-scale Networks for Extremely Large-Scale MIMO Channel Estimation

Yu Chengwen, Xie Bin, Zhou BoBo, Li Xiang

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252577

Accepted: 2025-10-27

Extremely Large-scale Multiple-Input Multiple-Output (XL-MIMO) systems are considered as one of the key technologies to realize 6G communications. However, due to the significant increase in the number of antennas in XL-MIMO systems, the channel exhibits hybrid field characteristics, thus posing a great challenge to channel estimation. To address this problem, this paper proposes a deep learning-based Adaptive Frequency Filter Parallel Joint Convolutional Network (AFF-PJCN) channel estimation algorithm. Firstly, the received signal is processed by the adaptive frequency filter network, which is equipped with learnable filters that can automatically optimize the filtering parameters according to the input data, enabling adaptive signal analysis and modeling within the frequency domain, and effectively filtering out noise interference. Then, through the parallel joint convolutional network, the multi-scale convolutional operation of the parallel structure can effectively capture the global and local features of the received signal, further enhancing the channel estimation performance. To enhance the generalization ability of the model, a segmented hybrid data training strategy is adopted. The training set is constructed by independently sampling randomly in different signal-to-noise ratio intervals, ensuring that the model maintains robust performance under diverse channel conditions. The experimental results show that the proposed AFF-PJCN algorithm not only achieves superior estimation accuracy but also demonstrates stronger generalization and robustness compared with other existing channel estimation schemes in the hybrid field channel model of XL-MIMO systems.

Two-Stage Retinex Weld Seam Low-Light Image Enhancement and Defect Detection

FAN Zhengwei, CHANG Daofang, MAN Xingyu, WANG Chongwen

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252732

Accepted: 2025-10-21

X-ray inspection, as an intuitive means of nondestructive testing (NDT) of pipeline weld defects, plays a key role in preventing pipeline safety accidents. However, it remains challenging to accurately identify tiny defects in low-grayscale, low-contrast, dark-toned X-ray images. Therefore, a method is proposed to improve the display quality of X-ray images of pipe welds under low-light conditions and to improve the accuracy of defect detection. First, an improved Retinex-Net framework is introduced, with attention-mechanism residual blocks added to the network to restore illumination and enhance details of low-light X-ray images, suppress noise and artifacts, and output natural enhanced images without obvious distortion, providing high-quality input for subsequent detection. Second, a weld positioning and feature extraction algorithm based on a drift Gaussian algorithm is designed, which adaptively tracks irregular long welds and automatically crops the weld area, significantly reducing background interference and improving processing efficiency. Finally, the weld defect detection algorithm based on cross-layer feature fusion is optimized: a feature encoder-decoder architecture based on the RSU module is constructed, and an attention mechanism is integrated in the feature extraction stage to strengthen cross-layer multi-scale feature fusion, improving detection accuracy and reducing the missed detection rate. The results show that the proposed method significantly improves the performance indicators on the public GDXray dataset; it not only effectively enhances image quality but also achieves highly automated, fast-responding weld defect detection, demonstrating its efficiency and accuracy in practical application scenarios.

Automatic Heap Memory Layout Manipulation Method for Software Vulnerability based on Large Language Models

ZHANG Bin, LI Run-hao, FENG Chao

Computer Engineering. https://doi.org/基于大语言模型的软件漏洞堆内存自动布局方法

Accepted: 2025-10-20

Automatic heap memory layout manipulation is the core technology for realizing exploit code generation of software memory corruption vulnerabilities, with the goal of constructing the necessary memory layout conditions for vulnerability exploitation by precisely controlling the allocation state of heap memory. However, existing memory automatic layout manipulation methods based on search and solving exhibit significant limitations in terms of efficiency. To address these challenges, this paper innovatively proposes a Large Language Model (LLM)-based approach for automatic memory layout manipulation. This method first leverages LLMs to automatically learn from the target heap manager's public documentation, source code comments, and analysis materials to acquire the allocator's operational mechanisms and key characteristics. Building on this foundation, the approach employs the powerful reasoning and feedback-driven thinking capabilities of LLMs to adopt an iterative layout strategy of "plan-verify-replan." By continuously incorporating feedback from debugger execution results to refine the layout planning strategy, it ultimately achieves automated memory layout. Experimental validation demonstrates that this solution successfully achieves precise memory layout in 12 real-world Linux user-space vulnerabilities and attains a 94.54% layout success rate on a benchmark comprising 3,735 test samples across six different heap managers. Compared to the search-based Gollum system, it improves layout manipulation speed by 2.33 times. Relative to the solving-based MAZE and BAGUA systems, it reduces the heap allocator behavior learning time from weeks to an average of 7.3 minutes without significantly compromising layout speed. These results verify that the proposed solution balances high efficiency and scalability, offering a new technical paradigm for LLM-based research on automated vulnerability exploitation.

Dual-Target Cross-Domain Recommendation Method Based on Heterogeneous Graphs and Hierarchical Preference Disentanglement

Bojia Chen, Tingnian He, Lianjie Zhang, Shu'an Chen

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252504

Accepted: 2025-10-20

Cross-domain recommendation systems are widely applied in e-commerce and content platforms. Although the dual-target cross-domain recommendation (DTCDR) proposed in recent years has achieved a breakthrough in simultaneously improving the performance of both domains, it still faces two major challenges: 1) the generated user-item representations lack sufficient correlation and diversity; 2) the semantic noise mixed in the shared preferences leads to negative transfer problems. To address these issues, a dual-target cross-domain recommendation model based on heterogeneous graph and hierarchical preference disentanglement (HGPD-DTCDR) is proposed. Its core innovations include: 1) a heterogeneous graph collaborative learning framework is proposed to integrate user-item interactions, user social networks, and item attribute similarities, constructing a multi-relation heterogeneous graph, and generating high-order semantic representations through a relation graph convolutional network (R-GCN) to enhance the diversity and correlation of the representations; 2) a two-stage decoupling process is designed, first separating domain-specific and shared preferences through a variational graph encoder, and then introducing a semantic filtering network to optimize the quality of shared preferences. Experiments on five real cross-domain datasets show that the performance improvement of this model stems from the synergistic effect of heterogeneous graph modeling and hierarchical decoupling mechanisms. Compared with the best baseline, it achieves average improvements of 3.55%, 7.27%, and 15.57% in hit rate, normalized discounted cumulative gain, and mean reciprocal rank, respectively. In data-sparse scenarios, the performance improvement is even more significant, with an average gain of 10.35%. Ablation studies further verify the effectiveness of each technical component and their synergistic effects.

Improved Real-Time Detection Method for Fault Hazards in Overhead Transmission Lines

Xu Haoyu, Zhang Jing, Zhang Jiamin

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252508

Accepted: 2025-10-20

To address the challenges of small target scale, complex background, and insufficient feature representation in the detection of potential hazards on high-voltage overhead transmission lines, this paper proposes an improved lightweight real-time detection model, LG-DETR. First, a lightweight backbone network, ResNet-WT, is designed by introducing wavelet transform convolution to enhance multi-scale feature extraction while reducing computational complexity. Meanwhile, a frequency-separated self-attention mechanism is adopted in the feature fusion stage to improve the feature interaction module HL-AIFI, thereby mitigating background interference. Then, a cross-level multi-scale information aggregation feature pyramid network CMIAFPN is proposed to optimize feature transmission paths, combined with a gating module to improve feature retention efficiency and prevent detail loss in high-level features. Furthermore, by incorporating the scaling factor of Focal Loss into Wise-IoU, a novel Focal-WIoU loss function is developed to dynamically adjust the weighting of hard and easy samples, thereby enhancing the detection accuracy of small targets. Experimental results demonstrate that LG-DETR achieves a 6.94 percentage point improvement in and 23.9% reduction in parameters on a high-voltage overhead transmission line hazard dataset, verifying the effectiveness of the proposed improvements.

Chinese Braille Word Segmentation System Based on BERT

Wang Ruixuan, Li Yan, Zhong Jinghua, Yao Dengfeng, Xu Cheng, Ren Tianyu

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252691

Accepted: 2025-10-17

Chinese Braille is the script used by people with visual impairment in China and is an important part of the National Commonly-Used Language and Script. Although some methods have been developed for automatic translation from Chinese text to Braille text, shortcomings remain. Braille word segmentation is a crucial step in Chinese-Braille translation that strongly affects the final translation result, and it is also an important task in research on Braille informatization. Although pre-trained models have been widely used in Chinese natural language processing, they are still rarely used in the study of Braille informatization. Braille and Chinese characters express the same language in different writing systems, so the two share similarities and knowledge can transfer between them. Pre-trained models therefore have great potential in the field of Braille informatization. This paper introduces the BERT pre-trained model into the Braille word segmentation task. BERT is used to extract feature vectors, which are decoded with a CRF in combination with the whole-word masking strategy, yielding BERT-CRF-wwm, a word segmentation model with an encoder-decoder structure. To address the issue that the original Chinese word segmentation information in the BERT model may interfere with Braille word segmentation, a new Braille embedding is concatenated at the embedding layer, yielding the final BeBERT-CRF-wwm model. On the Chinese-Braille Corpus, it achieves a precision of 98.80% and a recall of 98.71%. Compared with existing Braille word segmentation methods, it achieves better results on various evaluation metrics.

Brain Tumor Classification Method Based on Improved Swin Transformer

Huang Yinglai, Xiong Xueshan, Wan Langyi, He Yang, Yang Liusong

Computer Engineering. https://doi.org/计算机工程

Accepted: 2025-10-17

Accurate classification of brain tumors is essential in medical imaging diagnosis. However, conventional approaches that rely heavily on expert experience suffer from low efficiency, while existing deep learning approaches struggle with modeling long-range dependencies and balancing global modeling with local feature extraction, resulting in suboptimal recognition accuracy. To address these issues, a Hierarchical Collaborative Residual Transformer Network (HCR-TNet) is proposed. First, a Conv-Pool-Transformer Composite Block (CPT-Block) is introduced to enhance local feature extraction and cross-level contextual modeling, thereby improving the representation of heterogeneous tumor regions. Second, a High-frequency Feature Extraction (HFFE) module is incorporated to better capture textural details at tumor boundaries and subtle lesion characteristics while effectively suppressing noise. Finally, a Multi-scale Residual Block (MSRB) is designed to perform residual fusion with the CPT-Block, enabling cross-scale feature optimization from macro to micro structures. Experimental results on a public brain tumor MRI dataset show that the proposed method achieves a classification accuracy of 98.26%, a Kappa coefficient of 97.52%, and an MCC score of 97.52%. Compared to the ViT model, the accuracy is improved by 1.48% and the Kappa coefficient by 2.08%. Ablation studies and comparative experiments confirm the effectiveness of HCR-TNet in brain tumor classification tasks, providing useful methods and ideas for medical image analysis and automatic diagnosis systems.

Efficient KV Cache Sparsification via Ring Buffer-Based Sliding Window and Hierarchical Sparsity Enhancement

Lin Hai, Yu Guo, Yin Zeming, Xu Xianchong, Liu Yuhai

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252452

Accepted: 2025-10-17

In long-context and high-concurrency scenarios, large language models (LLMs) encounter significant challenges during inference due to the quadratic growth of memory footprint caused by the key-value (KV) cache in self-attention mechanisms, leading to excessive GPU memory consumption and limited throughput. Although KV cache sparsification has been proposed to address this issue, existing approaches still suffer from deficiencies in memory footprint, complexity of sliding window design, and computation and memory access overhead. This paper proposes DoubleSparse++, a triple-optimization framework that addresses these limitations through three techniques: (1) a ring buffer-based sliding window that decouples KV cache size from text length while reducing buffer update complexity from O(L) to O(1); (2) an exponential decay sparse equilibrium strategy that dynamically allocates token sparsity according to layer indices, achieving progressive sparsification across layers; (3) an optimized sparse inference kernel that uses operator fusion and asynchronous device stream pipelines to overlap computation and memory access in long-context inference scenarios, which significantly enhances computational intensity while reducing memory access frequency. Experimental validation on domestic accelerators and mainstream LLMs (including OPT-6.7B, Vicuna-7B-v1.5, LLaMA-2-7B, LLaMA-3.1-8B, and Qwen-2.5-7B) demonstrates that, compared to DoubleSparse, DoubleSparse++ achieves a 1.31X inference speedup and reduces the memory footprint to 0.72X for 4K token generation tasks. In 13K token scenarios, the memory footprint is further reduced to 0.56X of the baseline. Comprehensive performance analysis confirms that DoubleSparse++ is an efficient KV cache sparsification method with strong applicability to LLM long-context inference and streaming deployment.
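
The O(1) update claimed for the ring buffer-based sliding window can be illustrated with a small sketch. This is a generic ring buffer in plain Python under assumed names (`RingKVWindow`, `append`, `positions`), not the DoubleSparse++ kernel, which operates on accelerator tensors.

```python
from typing import Any, List, Optional, Tuple


class RingKVWindow:
    """Fixed-size sliding window over (key, value) entries.

    Appending overwrites the oldest slot in place, so each update costs
    O(1) and storage stays at `window` slots regardless of text length.
    """

    def __init__(self, window: int) -> None:
        self.window = window
        self.slots: List[Optional[Tuple[int, Any, Any]]] = [None] * window
        self.count = 0  # total number of tokens seen so far

    def append(self, key: Any, value: Any) -> None:
        # The write index wraps around; no existing entries are shifted.
        self.slots[self.count % self.window] = (self.count, key, value)
        self.count += 1

    def positions(self) -> List[int]:
        """Token positions currently held, oldest first."""
        start = max(0, self.count - self.window)
        return [self.slots[p % self.window][0] for p in range(start, self.count)]


cache = RingKVWindow(window=4)
for t in range(10):
    cache.append(f"k{t}", f"v{t}")  # placeholder keys and values
held = cache.positions()
```

A shifting window would instead move every retained entry on each new token, which is where the O(L) cost comes from; the modular write index avoids that copy.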

Transplantation and Optimization of Sparse Matrix Template Library for Domestic Accelerators

Li Shiyou, Lian Demeng, Zhou Xin, Han Mengzhi

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252670

Accepted: 2025-10-17

The CUDA sparse matrix template library (CUTLASS-Sparse) in the CUDA linear algebra template library (CUTLASS) is used to build customizable and high-performance sparse matrix-dense matrix multiplication (SpMM) kernels, which play an important role in many fields such as scientific computing and deep learning. However, it is only implemented and optimized for NVIDIA GPUs and cannot be applied to domestic accelerators. To solve this problem, a transplantation and optimization scheme for CUTLASS-Sparse for domestic accelerators is proposed. In the transplantation stage, the data access module, data computation module and data write-back module are adapted to the hardware architecture of domestic accelerators. In the optimization stage, two shared memory data reordering algorithms, a data pipeline strategy based on data prefetching and register double buffering, and a data write-back strategy based on data aggregation are proposed to address the problems of high conflict rate of shared memory physical storage units (bank), low shared memory bandwidth utilization, low data pipeline parallelism and low data write-back efficiency. Experimental results show that all three optimization methods significantly improve the performance of the transplanted CUTLASS-Sparse. For TF32 and FP16 data types, the overall performance of the optimized CUTLASS-Sparse increases by an average of 30% and 115% compared to the unoptimized version, respectively. It reaches an average of 76% and 60% of the performance of CUTLASS-Sparse on NVIDIA GPU L20, respectively. Under two hardware versions, the performance of the transplanted and optimized CUTLASS-Sparse is on average 2.36 times and 3.09 times that of the SPARSE math library on domestic accelerator platforms, respectively. The experimental results verify the effectiveness of the transplantation and optimization scheme.

Video ViT Adapter for Action Recognition

Yue Minghui, He Yuxuan, Ren Yuanxin, ZHANG Liye

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252608

Accepted: 2025-10-16

Video understanding tasks face two major challenges: insufficient computational resources and scarce video datasets. Current video models are massive and computationally intensive, relying on expensive equipment and lengthy training periods, and the scarcity of data also prevents models from training and generalizing adequately. To address these problems, an efficient transfer learning method is introduced: the adapter training strategy. By freezing all the weights of the pre-trained Vision Transformer (ViT) model and fine-tuning only the parameters in the adapter, resource consumption can be significantly reduced while fully retaining the representational advantages of the pre-trained model. Based on the adapter training strategy, a hierarchical adapter and a ViT backbone network are designed to jointly construct the Video ViT Adapter (VVA) model. The hierarchical adapter employs three spatiotemporal convolutions with different dimensions, which helps to balance the spatiotemporal relationships between details and the global context. Additionally, the Contrastive Language–Image Pre-training (CLIP) model, which possesses strong cross-modal learning capabilities, is introduced as the pre-trained model. This provides the VVA model with rich feature representations, facilitating effective fusion across different data modalities. With only 9.50M trainable parameters, VVA achieves accuracies of 79.32% on Kinetics-400, 97.77% on UCF101, and 81.78% on HMDB51, three standard action recognition datasets. These results demonstrate that the efficiency and convenience of the adapter approach can effectively address the challenges described above.

Fine-Tuning Large Language Models for Text-to-SQL Using Table Creation Information

DING Lin, YANG Yang, GUO Caili, GUO JianZhang, LI Zheng

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252484

Accepted: 2025-10-16

The text-to-SQL task aims to automatically convert natural language queries into Structured Query Language (SQL), serving as a key technology that enables non-technical users to access databases efficiently, thereby significantly improving data utilization. To address the challenge that large language models insufficiently understand database schema information in prompts for text-to-SQL tasks, this paper proposes a fine-tuning method for large language models based on table creation information. Existing approaches often rely on complex, lengthy prompt templates or extensive fine-tuning data, and face two major bottlenecks: (1) including the complete prompt content in the templates dilutes the few critical cues, leading to attention dispersion in long-context understanding and consequently reducing inference performance; (2) large-scale fine-tuning requires manual collection and processing of tens of thousands of samples before the model achieves stable comprehension in text-to-SQL tasks. To mitigate these issues, we propose a hybrid text-to-SQL generation strategy that integrates prompt engineering with fine-tuning. This method selects semantically relevant table creation information based on question similarity and combines it with concise prompt templates to construct a lightweight, manually curated fine-tuning dataset. Through supervised fine-tuning, the dataset guides large language models to better comprehend table schema information in prompts, enhancing their ability to capture relationships between tables and queries, thereby generating more accurate SQL statements. Experimental results demonstrate that the proposed method effectively reduces the model's reliance on extraneous information in prompt templates and mitigates attention dispersion during reasoning. The generated SQL queries achieve an execution accuracy of 83.37%, representing a 0.49 percentage point improvement over the baseline approach.

Research on Hierarchical Cyclic Queuing Forwarding Mechanism and Flow Scheduling Algorithm

He Guangcheng, Li Deshi

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252617

Accepted: 2025-10-16

With the development of the industrial Internet, the traditional best-effort forwarding mode can no longer meet the needs of deterministic-delay communication, and the IEEE 802.1 working group has proposed the Cyclic Queuing and Forwarding (CQF) mechanism to achieve deterministic transmission. However, because CQF forwards in fixed-granularity time slots, it suffers from problems such as excessive resource occupation and a limited deterministic delay range. Therefore, for time-triggered traffic scheduling with strict latency requirements, a hierarchical cyclic queuing and forwarding mechanism is proposed that uses fast forwarding to reduce both the delay of time-triggered traffic and resource occupation. An optimization model that maximizes network throughput is constructed to determine the forwarding mode and the injection time slot of each flow. Because the problem is NP-hard, a heuristic priority iterative incremental scheduling algorithm is proposed, which adopts traffic clustering, priority order updating, and incremental scheduling to handle large-scale deterministic traffic. Experimental results show that, compared with the CQF mechanism, the proposed mechanism has stronger scheduling ability, halves the lower bound of deterministic delay, and reduces resource occupation by 25.77% on average. In multiple sets of experiments involving various topologies and different traffic characteristics and scales, the proposed algorithm outperforms the four comparison schemes in network throughput, with average increases of 3.52%, 2.04%, and 51.77% over Tabu Search, IRFS, and Naive, respectively.

Deformable Sketch-Guided Image Inpainting Method

Yang Hongju, Liu Na, Li Yao, Cao Fuyuan

Computer Engineering. https://doi.org/10.19678/j.issn.1000-3428.0252645

Accepted: 2025-10-16

Sketch-guided image inpainting has significant application value in photo restoration and creative editing, but it faces two challenges: scarce user sketch data and restoration distortion caused by geometric deviations. Existing methods rely on edge detection to generate pseudo-sketches while neglecting the deviations in user-drawn strokes (e.g., hand tremors, stroke breaks), leading to structural misalignment and detail blurring in complex scenes. To address these challenges, this study proposes a framework that combines a deformable sketch generation network with dual-stage guided inpainting. First, a deformable sketch generation network is constructed to model typical hand-drawn deviations, generating a large-scale sketch-image paired dataset with realistic geometric deformation features and thereby alleviating data scarcity. Second, a two-stage inpainting framework is designed: the first stage corrects geometric misalignment and repairs structural breaks in the input sketches, while the second stage integrates the optimized sketch information into the inpainting network to jointly optimize global structural constraints and local texture generation. Experiments on benchmark datasets validate the method's effectiveness, achieving a peak signal-to-noise ratio (PSNR) of 25.78 dB and a structural similarity index (SSIM) of 0.852 on the CelebA-HQ dataset. The results demonstrate that the method addresses the challenges of scarce user sketch data and geometric deviations while improving the structural accuracy and perceptual quality of sketch-guided image inpainting.
