
Most Downloaded

  • Research Hotspots and Reviews
    SUN Lijun, MENG Fanjun, XU Xingjian
    Computer Engineering. 2025, 51(11): 1-21. https://doi.org/10.19678/j.issn.1000-3428.0069543

    In the context of ongoing advancements in educational informatization, constructing precise and efficient curriculum knowledge graphs has become key to promoting the development of personalized education. As a structured knowledge representation model, curriculum knowledge graphs reveal complex relations between curriculum content and learning objectives, optimize the allocation of educational resources, and tailor personalized learning paths for learners. This survey discusses the techniques used to construct curriculum knowledge graphs, starting with an explanation of the basic concepts, intrinsic connections, and significant differences among general, educational, and curriculum knowledge graphs. It then delves into the key technologies used for building curriculum knowledge graphs, covering aspects such as curriculum ontology design, entity extraction, and relation extraction, and provides a detailed analysis and summary of their evolution, key features, and limitations. Furthermore, it explores the application value of curriculum knowledge graphs in scenarios such as learning resource recommendation, learner behavior profiling and modeling, and multimodal curriculum knowledge graph construction. Finally, it focuses on the challenges in constructing curriculum knowledge graphs, such as data diversity and heterogeneity, difficulties in quality evaluation, and the lack of cross-curriculum integration, and provides future-oriented insights based on cutting-edge technologies such as deep learning and Large Language Models (LLMs).

  • Large Language Models and Generative Artificial Intelligence
    WANG Heqing, WEI Jie, JING Hongyu, SONG Hui, XU Bo
    Computer Engineering. 2026, 52(2): 383-392. https://doi.org/10.19678/j.issn.1000-3428.0070415

    Large Language Models (LLMs) have made significant progress in dialogue, reasoning, and knowledge retention. However, they still face challenges in terms of factual accuracy, knowledge updating, and a lack of high-quality domain datasets for handling knowledge-intensive tasks in the electricity sector. This study addresses these challenges by introducing an improved Retrieval-Augmented Generation (RAG) strategy that combines hybrid retrieval with a fine-tuned generative model for efficient knowledge capture and updating. The Metadata-driven RAG framework (Meta-RAG) is proposed for knowledge Question Answering (QA) tasks in the electricity domain; it comprises data preparation, model fine-tuning, and retrieval-reasoning stages. The data preparation stage involves document conversion, metadata extraction and enhancement, and document parsing, ensuring efficient indexing and structured processing of power regulation documents. The Electricity Question Answering (EleQA) dataset, consisting of 19 560 QA pairs, is constructed specifically for this sector. The model fine-tuning stage uses multi-question generation, chain-of-thought prompting, and supervised instruction fine-tuning to optimize reasoning abilities on specific tasks. The retrieval-reasoning stage employs mixed encoding and re-ranking strategies, combining the retrieval and generation modules to improve answer accuracy and relevance. Experiments validate the effectiveness of Meta-RAG. Compared to baseline models such as Self-RAG, Corrective-RAG, Adaptive-RAG, and RA-ISF, Meta-RAG shows higher answer accuracy and retrieval hit rates. Meta-RAG with the Qwen1.5-14B-Chat model achieves an overall accuracy of 0.804 3, surpassing the other methods. Ablation and document recall experiments indicate that document retrieval significantly impacts framework performance, with a 0.292 8 drop in accuracy when the retrieval capability is removed.
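
    To make the hybrid retrieval stage concrete, below is a minimal sketch that blends dense cosine similarity with a simple term-overlap score and returns the top-k candidates for a second-stage re-ranker. The function names, the scoring blend, and the overlap measure are illustrative assumptions, not Meta-RAG's actual implementation.

```python
import numpy as np

def hybrid_retrieve(query_vec, doc_vecs, query_terms, doc_terms, k=5, alpha=0.5):
    """Blend dense (cosine) and sparse (term-overlap) scores.

    Illustrative stand-in for a hybrid retrieval stage; the paper's
    encoders, ranking models, and weighting are not reproduced here.
    """
    dense = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    sparse = np.array([len(query_terms & t) / (len(query_terms) + 1e-9)
                       for t in doc_terms])
    score = alpha * dense + (1 - alpha) * sparse
    return np.argsort(-score)[:k]

docs = np.random.randn(8, 4)
terms = [{"power"}, {"grid", "rule"}] * 4
top = hybrid_retrieve(np.random.randn(4), docs, {"power", "grid"}, terms, k=3)
# A cross-encoder-style re-ranker would then rescore these k candidates.
```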

  • Mobile Internet and Communication Technology
    WANG Huahua, HUANG Yexia, LI Ling, WANG Jiacheng
    Computer Engineering. 2025, 51(12): 255-267. https://doi.org/10.19678/j.issn.1000-3428.0069877

    When implementing Federated Learning (FL) in a cell-free network environment, user scheduling and resource allocation strategies are crucial for optimizing system time overhead, improving user reachability, and accelerating the FL convergence rate. To address the issue of uneven resource allocation, this study designs an optimization scheme that jointly considers user scheduling, CPU processing frequency, and power allocation. The scheme aims to achieve fair resource allocation by maximizing the minimum user rate in the system, thus enhancing FL performance. The joint optimization problem is decomposed into two subproblems: user scheduling and power allocation. For user scheduling, this study proposes a greedy scheduling algorithm based on k-means clustering that comprehensively evaluates users' channel conditions and data "value" and categorizes users into different groups. A personalized CPU processing frequency allocation plan is then developed for the users within each group based on their resource occupancy. Finally, user scheduling is executed independently within each group, so user selection is performed efficiently and precisely, and its complexity is effectively reduced by the early grouping. For power allocation, this study introduces a Bisection Method-based Power Allocation (BM-PA) algorithm. This algorithm not only considers fairness among users but also prioritizes resource-constrained users to ensure that they obtain superior resource allocations. The BM-PA algorithm achieves fast convergence of power allocation through a low-complexity iterative optimization process, significantly improving resource utilization efficiency without degrading system performance. A reasonable user scheduling strategy serves as the foundation for obtaining optimal solutions to the power allocation subproblem, and an alternating iteration method allows each subproblem to be optimized independently while accounting for the solution of the other. Over multiple rounds of iterative optimization, this interdependence ensures that power resources are allocated to the users who need them the most or are most likely to use them effectively, enhancing overall system performance. Simulation results show that, compared with the baseline algorithm, the proposed algorithm performs outstandingly in terms of downlink achievable rates: the average improvement reaches up to 103.34% under optimal conditions, and the uplink achievable rates improve by up to 102.78%. Furthermore, the proposed algorithm saves 67.44% of the FL task training time on average compared with the baseline algorithm; in particular, when the FL model accuracy reaches 90%, the time overhead of the proposed algorithm is minimal.
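
    The bisection idea behind a max-min power allocation can be sketched in a few lines: bisect on a common target rate and check whether the total power needed to give every user that rate fits the budget. The Shannon-rate model and all parameters below are simplifying assumptions, not the paper's exact system model.

```python
import numpy as np

def maxmin_rate_bisection(gains, p_total, bandwidth=1.0, noise=1.0,
                          tol=1e-6, iters=100):
    """Find the largest common rate t that stays within the power budget.

    Generic max-min sketch in the spirit of BM-PA, not its exact form.
    """
    def power_for_rate(t):
        # p_i needed so that bandwidth * log2(1 + p_i * g_i / noise) == t
        return (2.0 ** (t / bandwidth) - 1.0) * noise / gains

    lo = 0.0
    hi = bandwidth * np.log2(1.0 + p_total * gains.max() / noise)  # upper bound
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if power_for_rate(mid).sum() <= p_total:  # feasible: raise the floor
            lo = mid
        else:                                     # infeasible: lower the target
            hi = mid
        if hi - lo < tol:
            break
    return lo, power_for_rate(lo)

rate, powers = maxmin_rate_bisection(np.array([0.9, 0.5, 0.2]), p_total=10.0)
```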

  • AI-Enabled Vehicular Edge Computing
    QIN Minhao, SUN Weiwei
    Computer Engineering. 2025, 51(9): 1-13. https://doi.org/10.19678/j.issn.1000-3428.0069416

    Traffic signal control plays an important role in alleviating traffic congestion and improving urban commuting efficiency. In recent years, breakthroughs have been made in traffic signal control algorithms based on deep reinforcement learning using real-time traffic data as input. However, traffic data in real-world scenarios often involve distortion. Traditional solutions use reinforcement learning algorithms to control signal lights after repairing the distorted data. However, on the one hand, the dynamic phases of traffic signals introduce additional uncertainty into distortion repair; on the other hand, distortion repair is difficult to combine with deep reinforcement learning frameworks to improve performance. To address these issues, a distorted traffic signal control model based on hidden state prediction, HCRL, is proposed. The HCRL model comprises encoding, control, and encoding prediction sub-models. By introducing a hidden state representation mechanism for signalized intersections, the HCRL model adapts better to deep reinforcement learning frameworks and effectively expresses the control state of signalized intersections. In addition, the HCRL model uses a dedicated transfer training method to prevent data distortion from interfering with the control sub-model. Two real datasets are used to verify the impact of data distortion on intelligent signal control algorithms. The experimental results show that the HCRL model outperforms distortion-completion-based traffic signal control models across all distortion scenarios and distortion rates; furthermore, it demonstrates strong robustness against data distortion compared with other baseline models.

  • Research Hotspots and Reviews
    ZHANG Jin, CHEN Zhu, CHEN Zhaoyun, SHI Yang, CHEN Guanjun
    Computer Engineering. 2025, 51(7): 1-11. https://doi.org/10.19678/j.issn.1000-3428.0068870

    Simulators play an indispensable role in research and development across an array of scientific fields. In architectural design particularly, simulators provide a secure and cost-effective virtual environment, enabling researchers to conduct rapid experimental analyses and evaluations. Simulators also accelerate the chip design and verification processes, thereby saving time and reducing resource expenditure. As processor architecture design evolves, particularly with the flourishing diversity of dedicated processors, the key role that simulators play in providing substantial feedback for architectural design exploration has gained prominence. This survey provides an overview of the current developments and applications of architectural simulators, highlighting a few illustrative examples. Analyzing the techniques employed by simulators dedicated to various processors allows for a deeper understanding of the focal points and technical complexities under different architectures. Moreover, this survey assesses and critiques vital aspects of future architectural simulator development, aiming to forecast the prospects of simulators in processor design research.

  • Research Hotspots and Reviews
    PENG Long, GAO Yuanjun, LIU Xiaodong, YU Jie
    Computer Engineering. 2025, 51(10): 37-52. https://doi.org/10.19678/j.issn.1000-3428.0069708

    Advances in computational power and network technologies have driven robots toward miniaturization, swarm intelligence, and autonomous capabilities. Robot software deployed on robotic hardware must integrate diverse modules from low-level device drivers and controls to high-level motion planning and reasoning, resulting in increasingly complex architectures. A communication and programming framework for multi-robot systems—focusing on standardization, modularization, and platformization—can alleviate the complexity of programming robotic software. The development trends in robotic software and hardware architecture show that a swarm robotic system is a multi-domain, heterogeneous, and distributed system composed of computing nodes, actuators, sensors, and other hardware devices interconnected through wired or wireless networks. The heterogeneity of hardware devices makes it difficult to integrate software components into a single framework. This survey summarizes and analyzes existing robotic communication frameworks in terms of ease of use and portability, comparing their core features, such as programming models, heterogeneous hardware support, communication and coordination mechanisms between components, and programming languages. The survey then highlights the technical trends of advanced topics such as real-time virtualization, component orchestration, and fault tolerance. Moreover, this survey focuses on building a next-generation framework on a meta Operating System (OS) foundation, aiming to build a ubiquitous and integrated multi-robot software architecture for human-machine-object interactions.

  • Research Hotspots and Reviews
    DI Qinbo, CHEN Shaoli, SHI Liangren
    Computer Engineering. 2025, 51(11): 35-44. https://doi.org/10.19678/j.issn.1000-3428.0069780

    As multivariate time series data become increasingly prevalent across various industries, anomaly detection methods that can ensure the stable operation and security of systems have become crucial. The inherent complexity and dynamic nature of multivariate time series data place high demands on anomaly detection algorithms. To address the inefficiencies of existing anomaly detection methods in processing high-dimensional data with complex variable relations, this study proposes an anomaly detection algorithm for multivariate time series data based on Graph Neural Networks (GNNs) and a diffusion model, named GRD. By leveraging node embedding and graph structure learning, the GRD algorithm captures the relations between variables and refines features through a Gated Recurrent Unit (GRU) and a Denoising Diffusion Probabilistic Model (DDPM), thereby facilitating precise anomaly detection. Traditional assessments often employ a Point-Adjustment (PA) protocol that adjusts predictions before scoring, which substantially overestimates an algorithm's capability. To reflect model performance realistically, this work adopts a new evaluation protocol along with new metrics. The GRD algorithm achieves F1@k scores of 0.741 4, 0.801 7, and 0.767 1 on three public datasets. These results indicate that the GRD algorithm consistently outperforms existing methods, with notable advantages in processing high-dimensional data, underscoring its practicality and robustness in real-world anomaly detection applications.
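
    The overestimation introduced by the PA protocol is easy to demonstrate: one lucky detection inside a ground-truth anomaly segment is credited as detecting the entire segment. The snippet below is a generic illustration of the protocol, not the paper's evaluation code.

```python
import numpy as np

def point_adjust(labels, preds):
    """Classic Point-Adjustment (PA): if any point inside a ground-truth
    anomaly segment is flagged, the whole segment counts as detected."""
    adjusted = preds.copy()
    i, n = 0, len(labels)
    while i < n:
        if labels[i] == 1:
            j = i
            while j < n and labels[j] == 1:   # find the segment end
                j += 1
            if adjusted[i:j].any():           # one hit -> whole segment credited
                adjusted[i:j] = 1
            i = j
        else:
            i += 1
    return adjusted

def f1(labels, preds):
    tp = int(((preds == 1) & (labels == 1)).sum())
    fp = int(((preds == 1) & (labels == 0)).sum())
    fn = int(((preds == 0) & (labels == 1)).sum())
    p, r = tp / (tp + fp + 1e-9), tp / (tp + fn + 1e-9)
    return 2 * p * r / (p + r + 1e-9)

labels = np.array([0, 1, 1, 1, 1, 0, 0, 1, 1, 0])
preds  = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0])   # a single lucky hit
print(f1(labels, preds), f1(labels, point_adjust(labels, preds)))
# ~0.29 raw vs 0.80 after adjustment: the protocol inflates the score.
```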

  • Service Computing in the Era of Large Language Models
    ZHANG Junna, WANG Hongzun, DING Chuntao
    Computer Engineering. 2026, 52(1): 33-60. https://doi.org/10.19678/j.issn.1000-3428.0252721

    Post-Training Quantization (PTQ) is an efficient model compression method that converts the parameters of high-precision floating-point models into low-bit integer representations without requiring retraining, using only a small amount of unlabeled calibration data. This method significantly reduces storage and computational overhead while maximizing the retention of the original model's inference accuracy; therefore, it is widely recognized and adopted in both academia and industry. This paper systematically summarizes the progress of research on PTQ from four dimensions: quantization steps, method classification, tool ecosystem, and application advancements. First, a clear framework for the quantization process is constructed, covering steps such as dynamic range statistics, quantization parameter calculation, weight and activation quantization, error optimization, and model generation. Second, a complete classification system for quantization methods is proposed, which includes quantization granularity, bit width, calibration methods, and structure-guided quantization. Third, the tool ecosystem supporting the large-scale application of PTQ is analyzed, and its value in hardware adaptation and engineering deployment is discussed. Finally, this paper summarizes the progress in the integration and application of PTQ methods and highlights practical challenges, particularly those related to cross-modal consistency, extremely low-bit semantic collapse, and hardware adaptation. These challenges not only reveal the limitations of current technologies but also provide important directions for future research. This review provides a reference framework for PTQ methods in academia and industry, thereby facilitating the widespread application of artificial intelligence in resource-constrained scenarios.
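
    The core steps listed above (dynamic range statistics, quantization parameter calculation, weight/activation quantization, and error measurement) can be illustrated with textbook min-max affine quantization; the sketch below is generic and tied to no particular toolkit.

```python
import numpy as np

def calibrate_affine(x, num_bits=8):
    """Min-max calibration: map the observed float range to the int range.
    A textbook PTQ step, not any specific framework's API."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # guard constant tensors
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    q = np.clip(np.round(x / scale) + zero_point, 0, 2 ** num_bits - 1)
    return q.astype(np.uint8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(64, 64).astype(np.float32)       # "calibration" tensor
s, z = calibrate_affine(w)
err = np.abs(dequantize(quantize(w, s, z), s, z) - w).max()  # quantization error
```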

  • Artificial Intelligence and Pattern Recognition
    YUAN Yinghua, JIN Yingran, GAO Yun
    Computer Engineering. 2025, 51(12): 96-108. https://doi.org/10.19678/j.issn.1000-3428.0069871

    The Siamese tracking network is a popular target tracking framework that includes three modules: backbone, fusion, and positioning networks. The Transformer is a relatively new and effective way to implement the fusion network module. The encoder and decoder of the Transformer use a self-attention mechanism to enhance the features of the Convolutional Neural Network (CNN). However, the self-attention mechanism enhances features only in the spatial dimension, without considering feature enhancement in the channel dimension. To enable the self-attention network of the Transformer to enhance features in both the spatial and channel dimensions and provide accurate correlation information for the target localization network, a Transformer tracker based on dual-dimensional feature enhancement is proposed to improve the Transformer fusion network. First, using the third- and fourth-stage features of the backbone network as inputs, channel-dimension feature enhancement is performed via CAE-Net in the self-attention module of the Transformer encoder and decoder to strengthen important channels. Subsequently, two-stage feature-weighted fusion and linear transformation are performed via SAE-Net to obtain the self-attention factors Q, K, and V. Finally, spatial-dimension feature enhancement is performed via a self-attention operation. Experiments conducted on five widely used public benchmark datasets reveal that the improved Transformer feature fusion module improves tracking performance with minimal reduction in tracking speed.
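
    As a rough sketch of the dual-dimension idea, the block below applies an SE-style channel reweighting (standing in for CAE-Net) before spatial self-attention; module names and sizes are illustrative assumptions, not the paper's CAE-Net and SAE-Net definitions.

```python
import torch
import torch.nn as nn

class ChannelEnhance(nn.Module):
    """SE-style channel reweighting (illustrative stand-in for CAE-Net)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                   # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))     # squeeze spatial dims -> (B, C)
        return x * w[:, :, None, None]      # excite per channel

class DualDimBlock(nn.Module):
    """Channel enhancement followed by spatial self-attention."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.chan = ChannelEnhance(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                   # x: (B, C, H, W)
        b, c, h, w = x.shape
        x = self.chan(x)                    # channel-dimension enhancement
        seq = x.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        out, _ = self.attn(seq, seq, seq)   # spatial-dimension enhancement
        return out.transpose(1, 2).reshape(b, c, h, w)

y = DualDimBlock(32)(torch.randn(2, 32, 8, 8))
```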

  • Research Hotspots and Reviews
    LI Jiangxin, WANG Peng, WANG Wei
    Computer Engineering. 2025, 51(7): 47-58. https://doi.org/10.19678/j.issn.1000-3428.0069406

    Industrial time-series forecasting is critical for optimizing production processes and enhancing decision-making. Existing deep learning-based methods often underperform in this context due to a lack of domain knowledge. Prior studies have proposed using mechanistic models to guide deep learning; however, these approaches typically consider only a single mechanistic model, ignoring scenarios with multiple time-series prediction mechanisms in industrial processes and the inherent complexity of industrial time-series (e.g., multiscale dynamics and nonlinearity). To address this issue, this study proposes a Multi-Mechanism-guided Deep Learning for Industrial Time-series Forecasting (M-MDLITF) framework based on attention mechanisms. This framework embeds multiple mechanistic models into a deep industrial time-series prediction network to guide training and integrate the strengths of different mechanisms by focusing on final predictions. As an instantiation of the M-MDLITF, the Multi-mechanism Deep Wiener (M-DeepWiener) method employs contextual sliding windows and a Transformer-encoder architecture to capture complex patterns in industrial time-series. Experimental results from a simulated dataset and two real-world datasets demonstrate that M-DeepWiener achieves high computational efficiency and robustness. It significantly outperforms the single-mechanism Deep Wiener (DeepWiener), classical Wiener mechanistic models, and purely data-driven methods, reducing the prediction error by 20% compared to DeepWiener-M1 on the simulated dataset.

  • Research Hotspots and Reviews
    LU Yue, ZHOU Xiangyu, ZHANG Shizhou, LIANG Guoqiang, XING Yinghui, CHENG De, ZHANG Yanning
    Computer Engineering. 2025, 51(10): 1-17. https://doi.org/10.19678/j.issn.1000-3428.0070575

    Traditional machine learning algorithms perform well only when the training and testing sets are identically distributed, and they cannot learn incrementally for new categories or tasks that were not present in the original training set. Continual learning enables models to adaptively learn new knowledge while preventing the forgetting of old tasks; however, continual learning methods still face challenges related to computation, storage overhead, and performance stability. Recent advances in pre-trained models have provided new research directions for continual learning that are promising for further performance improvements. This survey summarizes existing pre-training-based continual learning methods. According to the anti-forgetting mechanism, they are categorized into five types: methods based on prompt pools, methods with slow parameter updating, methods based on backbone branch extension, methods based on parameter regularization, and methods based on classifier design. Additionally, these methods are classified according to the number of phases, fine-tuning approaches, and use of language modalities. Subsequently, the overall challenges of continual learning methods are analyzed, and the applicable scenarios and limitations of the various methods are summarized, along with their main characteristics and advantages. Comprehensive experiments are conducted on multiple benchmarks, followed by in-depth discussions of the performance gaps among the different methods. Finally, the survey discusses research trends in pre-training-based continual learning methods.

  • Artificial Intelligence and Pattern Recognition
    PENG Juhong, ZHANG Chi, GAO Qian, ZHANG Guangming, TAN Donghua, ZHAO Mingjun
    Computer Engineering. 2025, 51(7): 152-160. https://doi.org/10.19678/j.issn.1000-3428.0069283

    Steel surface defect detection in industrial scenarios is hindered by low detection accuracy and slow convergence speed. To address these issues, this study presents an improved YOLOv8 algorithm, YOLOv8n-MDC. First, a Multi-scale Cross-fusion Network (MCN) is added to the backbone network. Establishing closer connections between the feature layers promotes uniform information transmission and reduces semantic information loss during cross-layer feature fusion, thereby enhancing the model's ability to perceive steel defects. Second, deformable convolution is introduced into the module to adaptively change the shape and position of the convolution kernels, enabling more flexible capture of the edge features of irregular defects, reducing information loss, and improving detection accuracy. Finally, a Coordinate Attention (CA) mechanism is added to embed position information into channel attention, solving the problem of position information loss and enabling the model to perceive the position and morphological features of defects, thereby enhancing detection precision and stability. Experimental results on the NEU-DET dataset show that the YOLOv8n-MDC algorithm achieves an mAP@0.5 of 81.0%, which is 4.2 percentage points higher than that of the original baseline network. The algorithm converges faster and with higher accuracy, meeting the requirements of practical industrial production.
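
    For reference, below is a simplified Coordinate Attention block of the kind embedded here: features are pooled along height and width separately so that the resulting channel weights retain positional information. The reduction ratio and sizes are illustrative.

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Coordinate Attention: direction-aware pooling keeps position
    information inside channel attention. Simplified sketch."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(8, channels // reduction)
        self.shared = nn.Sequential(nn.Conv2d(channels, mid, 1),
                                    nn.BatchNorm2d(mid), nn.ReLU())
        self.to_h = nn.Conv2d(mid, channels, 1)
        self.to_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):                                   # (B, C, H, W)
        b, c, h, w = x.shape
        ph = x.mean(dim=3, keepdim=True)                    # pool width  -> (B, C, H, 1)
        pw = x.mean(dim=2, keepdim=True).transpose(2, 3)    # pool height -> (B, C, W, 1)
        y = self.shared(torch.cat([ph, pw], dim=2))         # joint encoding
        yh, yw = y.split([h, w], dim=2)
        ah = torch.sigmoid(self.to_h(yh))                   # (B, C, H, 1)
        aw = torch.sigmoid(self.to_w(yw)).transpose(2, 3)   # (B, C, 1, W)
        return x * ah * aw                                  # position-aware weights

y = CoordAttention(32)(torch.randn(2, 32, 16, 16))
```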

  • Graphics and Image Processing
    WANG Shumeng, XU Huiying, ZHU Xinzhong, HUANG Xiao, SONG Jie, LI Yi
    Computer Engineering. 2025, 51(9): 280-293. https://doi.org/10.19678/j.issn.1000-3428.0069353

    In Unmanned Aerial Vehicle (UAV) aerial photography, targets are typically small, densely distributed, and visually indistinct, and object scales vary greatly, so missed and false detections easily occur in object detection. To solve these problems, a lightweight small object detection algorithm based on improved YOLOv8n, namely PECS-YOLO, is proposed for aerial photography. By adding a P2 small object detection layer in the Neck part, the algorithm combines shallow and deep feature maps to better capture the details of small targets. A lightweight convolution, PartialConv, is introduced in a new Cross Stage Partial PartialConv (CSPPC) structure that replaces Concatenation with Fusion (C2f) in the Neck network to make the model lightweight. A Spatial Pyramid Pooling with Efficient Layer Aggregation Network (SPPELAN) module is used to capture small object features effectively. By adding a Squeeze-and-Excitation (SE) attention mechanism in front of each detection head in the Neck part, the network can better focus on useful channels and reduce the interference of background noise on small object detection in complex environments. Finally, EfficiCIoU is used as the bounding box loss function; it also takes the shape differences of bounding boxes into account, enhancing the model's ability to detect small targets. Experimental results show that, compared with YOLOv8n, the mean Average Precision at an Intersection over Union (IoU) threshold of 0.5 (mAP@0.5) and at IoU 0.5:0.95 (mAP@0.5:0.95) of the PECS-YOLO object detection algorithm on the VisDrone2019-DET dataset increase by 3.5% and 3.7%, respectively, the number of parameters is reduced by about 25.7%, and the detection speed increases by about 65.2%. In summary, the PECS-YOLO model is suitable for small object detection in UAV aerial photography.

  • PU Zhenyu, LIU Zhiwei, HUANG Bo, HE Shufeng, CHEN Nanxi, HAO Wenzeng
    Accepted: 2025-04-25
    In the modern industrial sector, the perception and analysis of text data have become essential for promoting intelligent manufacturing and optimizing production processes. However, industrial text data are typically highly specialized, diverse, and complex, and annotation costs are high, making traditional large-scale annotation methods unsuitable. Existing few-shot Named Entity Recognition (NER) methods often use prototypical networks to classify entities, where the prototype is the average of the features of all samples belonging to the same category. These methods, however, are highly sensitive to the support set data and prone to sample selection bias. To address this, we propose DC-NER (Distribution Calibration-based Named Entity Recognition), a few-shot NER model based on distribution calibration. The model decomposes the task into two phases: span detection and entity classification. During the entity classification phase, a precise distance measurement function is employed to identify similar categories between the source and target domains. On this basis, the distribution of samples in the target domain is corrected to generate more accurate class prototypes. Experimental results on both an in-domain dataset (Few-NERD) and a cross-domain dataset (Cross-NER) demonstrate that DC-NER significantly outperforms comparative models in terms of F1 score, validating its effectiveness in few-shot NER.
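
    A minimal sketch of the distribution-calibration step: statistics borrowed from the closest source-domain classes correct a prototype estimated from only a few support samples. The distance function and correction rule below are generic assumptions, not DC-NER's exact formulas.

```python
import numpy as np

def calibrate_prototype(support, source_means, source_vars, k=2, alpha=0.2):
    """Correct a few-shot prototype using the k nearest source classes.

    Generic distribution-calibration sketch: `support` is (n_shot, dim),
    `source_means`/`source_vars` are (n_classes, dim) class statistics.
    """
    proto = support.mean(axis=0)                          # naive few-shot prototype
    dists = np.linalg.norm(source_means - proto, axis=1)  # class similarity
    nearest = np.argsort(dists)[:k]
    # Blend the prototype with the nearest source-class means,
    # and borrow their (diagonal) variances, slightly widened by alpha.
    calibrated_mean = (proto + source_means[nearest].sum(axis=0)) / (k + 1)
    calibrated_var = source_vars[nearest].mean(axis=0) + alpha
    return calibrated_mean, calibrated_var

support = np.random.randn(5, 16)                  # 5-shot support features
means, varis = np.random.randn(10, 16), np.ones((10, 16))
mu, var = calibrate_prototype(support, means, varis)
```
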
  • Research Hotspots and Reviews
    LIU Yanghong, FU Yangyouran, DONG Xingping
    Computer Engineering. 2025, 51(10): 18-26. https://doi.org/10.19678/j.issn.1000-3428.0070569

    The generation of High-Definition (HD) environmental semantic maps is indispensable for environmental perception and decision making in autonomous driving systems. To address the modality discrepancy between cameras and LiDARs in perception tasks, this paper proposes an innovative multimodal fusion framework, HDMapFusion, which significantly improves semantic map generation accuracy via feature-level fusion. Unlike traditional methods that directly fuse raw sensor data, our approach innovatively transforms both camera images and LiDAR point cloud features into a unified Bird's-Eye-View (BEV) representation, enabling physically interpretable fusion of multimodal information within a consistent geometric coordinate system. Specifically, this method first extracts visual features from camera images and 3D structural features from LiDAR point clouds using deep learning networks. Subsequently, a differentiable perspective transformation module converts the front-view image features into a BEV space and the LiDAR point clouds are projected into the same BEV space through voxelization. Building on this, an attention-based feature fusion module is designed to adaptively integrate the two modalities using weighted aggregation. Finally, a semantic decoder generates high-precision semantic maps containing lane lines, pedestrian crossings, road boundary lines, and other key elements. Systematic experiments conducted on the nuScenes benchmark dataset demonstrate that HDMapFusion significantly outperforms existing baseline methods in terms of HD map generation accuracy. These results validate the effectiveness and superiority of the proposed method, offering a novel solution to multimodal fusion in autonomous driving perception.

  • FENG Guoping, CHEN Zhijian, LIN Zhiyu, HONG Liang
    Accepted: 2025-10-11
    This study explores automatic term recognition in the electric power domain, addressing challenges faced during its digital transformation, such as data silos and underutilized knowledge. To improve the identification of specialized and newly emerging terms, a dynamic graph-assisted method combining large and small models is proposed. The approach enhances recall and precision through candidate term extraction and term classification. An initial knowledge graph is built using existing term databases. Nodes related to the target text are queried and filtered using term features. A retrieval-augmented large language model extracts candidate terms, followed by adversarial training to develop a deep learning model for term classification. The dynamic term knowledge graph is iteratively updated based on the classification results, forming a positive feedback loop. Experimental results show that the method's accuracy, recall, and F1 score improve over the iterations, reaching 0.864 7, 0.856 5, and 0.854 2, respectively, demonstrating superior performance compared to other term recognition methods.
  • Artificial Intelligence and Pattern Recognition
    HUANG Kun, QI Zhaojian, WANG Juanmin, HU Qian, HU Weichao, PI Jianyong
    Computer Engineering. 2025, 51(5): 133-142. https://doi.org/10.19678/j.issn.1000-3428.0069026

    Pedestrian detection in crowded scenes is a key technology in the intelligent monitoring of public spaces. It enables the intelligent monitoring of crowds, using object detection methods to detect the positions and number of pedestrians in videos. This paper presents Crowd-YOLOv8, an improved version of the YOLOv8 detection model, to address the issue of pedestrians being easily missed owing to occlusion and small target sizes in densely populated areas. First, nostride-Conv-SPD is introduced into the backbone network to enhance its capability to extract fine-grained information, such as small-object features in images. Second, small object detection heads and the CARAFE upsampling operator are introduced into the neck of the YOLOv8 network to fuse features at different scales and improve detection performance for small targets. Experimental results demonstrate that the proposed method achieves an mAP@0.5 of 84.3% and an mAP@0.5:0.95 of 58.2% on the CrowdHuman dataset, improvements of 3.7 and 5.2 percentage points, respectively, over the original YOLOv8n. On the WiderPerson dataset, the proposed method achieves an mAP@0.5 of 88.4% and an mAP@0.5:0.95 of 67.4%, improvements of 1.1 and 1.5 percentage points, respectively.
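
    The space-to-depth idea behind nostride-Conv-SPD can be sketched compactly: downsampling is done by rearranging pixels into channels instead of striding, so fine-grained detail is preserved for a stride-1 convolution. Shapes and names below are illustrative.

```python
import torch
import torch.nn as nn

class ConvSPD(nn.Module):
    """Space-to-depth followed by a stride-1 convolution: halves spatial
    resolution without discarding pixels, which helps small objects.
    A sketch of the nostride-Conv-SPD idea, not the exact module."""
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        self.scale = scale
        self.conv = nn.Conv2d(in_ch * scale * scale, out_ch, 3, 1, 1)

    def forward(self, x):                        # (B, C, H, W)
        s = self.scale
        b, c, h, w = x.shape
        # Rearrange each s x s pixel block into the channel dimension.
        x = x.reshape(b, c, h // s, s, w // s, s)
        x = x.permute(0, 1, 3, 5, 2, 4).reshape(b, c * s * s, h // s, w // s)
        return self.conv(x)                      # no information lost to striding

y = ConvSPD(16, 32)(torch.randn(1, 16, 64, 64))  # -> (1, 32, 32, 32)
```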

  • Graphics and Image Processing
    WANG Guoming, JIA Daiwang
    Computer Engineering. 2025, 51(12): 294-303. https://doi.org/10.19678/j.issn.1000-3428.0070027

    Deep learning-based object detection has significantly improved the detection of medium and large targets. However, when detecting small objects, traditional algorithms often face challenges such as missed detections and false positives owing to the targets' small scale and complex backgrounds. Therefore, this study aims to enhance the accuracy of small object detection by improving the YOLOv8 model. First, the convolutional module in the backbone is replaced with the RFAConv module, which enhances the model's ability to process complex images. Second, a Mixed Local Channel Attention (MLCA) mechanism is introduced in the neck, allowing the model to fuse features from different layers more effectively while maintaining computational efficiency. Third, the Detect head of YOLOv8 is replaced with the Detect_FASFF head to address the inconsistency between different feature scales and improve the model's ability to detect small objects. Finally, the Complete Intersection over Union (CIoU) loss function is replaced with the Focaler-IoU loss function, enabling the model to focus more on small objects that are difficult to locate precisely. Experimental results show that the improved model increases mAP@0.5 by 4.8 percentage points and mAP@0.5:0.95 by 3.0 percentage points on the FloW-Img dataset, where small objects are sparse. On the VisDrone2019 dataset, which has a high density of small objects, mAP@0.5 increases by 5.9 percentage points and mAP@0.5:0.95 improves by 4.0 percentage points. In addition, generalization comparison experiments are conducted on the low-altitude dataset AU-AIR and the pedestrian-dense detection dataset WiderPerson. The optimized model significantly improves the accuracy of small object detection compared with the original model and expands its applicability.
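
    The Focaler-IoU re-mapping, as described in the Focaler-IoU literature, linearly rescales IoU onto an interval [d, u] so that the loss concentrates on a chosen difficulty band; the thresholds below are illustrative defaults.

```python
import torch

def focaler_iou_loss(iou, d=0.0, u=0.95):
    """Focaler-IoU sketch: re-map IoU onto [d, u] and clamp, so gradients
    focus on the selected difficulty band; pair with any IoU variant."""
    iou_f = ((iou - d) / (u - d)).clamp(0.0, 1.0)
    return 1.0 - iou_f

loss = focaler_iou_loss(torch.tensor([0.20, 0.50, 0.97]))
# tensor([0.7895, 0.4737, 0.0000]): easy boxes (IoU > u) stop contributing.
```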

  • Artificial Intelligence and Pattern Recognition
    DENG Zexian, ZHANG Yungui, ZHANG Lin
    Computer Engineering. 2025, 51(5): 154-165. https://doi.org/10.19678/j.issn.1000-3428.0069143

    Multi-dimensional time series classification is widely used in industry, healthcare, finance, and other fields, playing an important role in industrial product quality control, disease prediction, financial risk control, and beyond. Because the temporal and spatial dependencies of multi-dimensional time series are equally important, whereas traditional models focus on only one of these dimensions, this paper proposes a multi-dimensional time series classification model based on a pre-trained recursive Transformer-Mixer (PRTMMTSC). The model is built on a Transformer-Mixer module that can fully learn the temporal and spatial correlations of multi-dimensional time series. To further improve classification performance, inspired by anomaly detection models, the proposed model combines pre-trained hidden-layer features with residual features and uses the PolyLoss loss function for training. To reduce the number of trainable parameters, the Transformer-Mixer module is constructed recursively, so that the multi-layer model has only as many trainable parameters as a single Transformer-Mixer layer. Experimental results on the UEA datasets show that the proposed model outperforms the comparison models. Compared with the TARNet and RLPAM models, the accuracy of the proposed model increases by 3.03% and 4.69%, respectively. Ablation experiments on the UEA datasets and IF steel inclusion defect classification further illustrate the effectiveness of the proposed pre-training method, the Transformer-Mixer module, the residual information, and the PolyLoss loss function.
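
    The recursive construction can be sketched directly: one layer's weights are applied repeatedly, giving multi-layer depth with a single layer's parameter count. A stock Transformer encoder layer stands in for the paper's Transformer-Mixer block.

```python
import torch
import torch.nn as nn

class RecursiveEncoder(nn.Module):
    """Apply one layer recursively: depth-`depth` computation with the
    trainable parameter count of a single layer. Illustrative sketch;
    the paper's Transformer-Mixer block is replaced by a stock layer."""
    def __init__(self, dim, heads=4, depth=3):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.depth = depth

    def forward(self, x):                  # x: (B, T, dim)
        for _ in range(self.depth):        # reuse the same weights each pass
            x = self.layer(x)
        return x

out = RecursiveEncoder(dim=32)(torch.randn(2, 10, 32))
```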

  • Research Hotspots and Reviews
    LIAO Niuyu, TIAN Yun, LI Yansong, XUE Haifeng, DU Changkun, ZHANG Guohua
    Computer Engineering. 2025, 51(12): 1-17. https://doi.org/10.19678/j.issn.1000-3428.0253230

    In recent years, Large Language Models (LLMs) such as GPT, LLaMA, Qwen, and DeepSeek have achieved significant breakthroughs in natural language processing, computer vision, multimodal learning, and other fields. However, constrained by factors such as their reasoning mechanisms, parameter scales, and the knowledge contained in their training data, these models often suffer from issues such as "hallucinations", characterized by inaccurate answers and even factual deviations, when handling complex tasks, addressing questions from professional domains, or generating time-sensitive content. These limitations severely hinder their application in high-reliability scenarios. The "tool learning" paradigm is attracting increasing attention as a promising solution to these capability bottlenecks. Its primary objective is to enable LLMs to understand and utilize external tools to complete specific tasks. By invoking external tools such as databases, search engines, and mathematical tools, LLMs can transcend their parameterized knowledge; enhance their reasoning, decision-making, and execution capabilities; and mitigate hallucination problems. This paper systematically reviews the development and technical advancements of LLM tool learning, analyzes how tools expand LLM capabilities, summarizes tool invocation mechanisms ranging from in-context learning to fine-tuning, and discusses key issues including performance optimization and adaptive tool generation. The paper also analyzes evaluation methods for LLM tool invocation, summarizes the current challenges in tool learning, and outlines future research directions.

  • Research Hotspots and Reviews
    ZHAO Kai, HU Yuhuan, YAN Junqiao, BI Xuehua, ZHANG Linlin
    Computer Engineering. 2025, 51(8): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0069147

    Blockchain, as a distributed and trusted database, has gained significant attention in academic and industrial circles for its effective application in digital copyright protection. Traditional digital copyright protection technologies suffer from issues such as difficulty in tracking infringements, complexity in copyright transactions, and inadequate protection of legitimate rights, which severely hamper the development of digital copyright protection. The immutability, traceability, and decentralization inherent in blockchain technology provide a highly reliable, transparent, and secure solution that mitigates the risks of digital copyright infringement. This overview starts with an introduction to the fundamental principles of blockchain technology. It then discusses the latest research on integrating blockchain with traditional copyright protection technologies to address the problems inherent in traditional copyright protection schemes. Further, it evaluates the practical applications and potential of blockchain, emphasizing its positive impact on the copyright protection ecosystem. Finally, this overview delves into the challenges and future trends of blockchain-based copyright protection, ultimately aiming to establish a more robust and sustainable blockchain copyright protection system.

  • Artificial Intelligence and Pattern Recognition
    WU Donghui, WANG Jinfeng, QIU Sen, LIU Guozhi
    Computer Engineering. 2025, 51(8): 107-119. https://doi.org/10.19678/j.issn.1000-3428.0070202

    Sign language recognition has received widespread attention in recent years. However, existing sign language recognition models face challenges such as long training times and high computational costs. To address these issues, this study proposes the EWBiLSTM-ATT model, a hybrid deep learning method for data obtained from a wearable data glove that integrates an attention mechanism with an Expanded Wide-kernel Deep Convolutional Neural Network (EWDCNN) and a Bidirectional Long Short-Term Memory (BiLSTM) network. First, widening the first convolutional layer reduces the model parameter count and enhances computational speed, while deepening the EWDCNN convolutional layers improves the model's ability to automatically extract sign language features. Second, BiLSTM is introduced as a temporal model to capture the dynamic temporal information of sign language sequences, effectively handling temporal relationships in the sensor data. Finally, the attention mechanism learns a parameter matrix that assigns different weights to the hidden states of the BiLSTM, allowing the model to automatically select key time segments related to gesture actions by calculating attention weights for each time step. This study uses the STM32F103 as the main control module and builds a data glove sign language acquisition platform with MPU6050 and Flex Sensor 4.5 sensors as the core components. Sixteen dynamic sign language actions are selected to construct the GR-Dataset for model training. Under the same experimental conditions, the recognition rate of the EWBiLSTM-ATT model reaches 99.40%, exceeding the CLT-net, CNN-GRU, CLA-net, and CNN-GRU-ATT models by 10.36, 8.41, 3.87, and 3.05 percentage points, respectively, while the total training time is reduced to 57%, 61%, 55%, and 56% of that of the respective comparison models.
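
    A compact sketch of the model's sequence stage: a BiLSTM over glove-sensor features with learned attention pooling across time steps. The widened and deepened EWDCNN front end is omitted, and all layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """BiLSTM over sensor sequences with attention pooling over time;
    illustrative sketch of the EWBiLSTM-ATT tail end (CNN front omitted)."""
    def __init__(self, feat_dim, hidden=64, classes=16):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)      # attention energy per step
        self.head = nn.Linear(2 * hidden, classes)

    def forward(self, x):                          # x: (B, T, feat_dim)
        h, _ = self.lstm(x)                        # (B, T, 2*hidden)
        a = torch.softmax(self.score(h), dim=1)    # weights over time steps
        pooled = (a * h).sum(dim=1)                # attention-weighted sum
        return self.head(pooled)

logits = BiLSTMAttention(12)(torch.randn(4, 50, 12))  # 16 gesture classes
```
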
  • Space-Air-Ground Integrated Computing Power Networks
    LI Bin, SHAN Huimin
    Computer Engineering. 2025, 51(5): 1-8. https://doi.org/10.19678/j.issn.1000-3428.0069423

    To address the challenges of insufficient computing capacity of end users and the unbalanced distribution of computing power among edge nodes in computing power networks, this study proposes an Unmanned Aerial Vehicle (UAV)-assisted Device-to-Device (D2D) edge computing solution based on incentive mechanisms. First, under constraints involving computing resources, transmission power, and the unit pricing of computing resources, a unified optimization problem is formulated to maximize system revenue. The problem jointly optimizes the task offloading ratio, computing resource allocation, UAV trajectory, and the transmission power and unit pricing of computing resources for both UAVs and users. The Proximal Policy Optimization (PPO) algorithm is employed to establish user offloading and purchasing strategies, and an iterative strategy is implemented at each time step to solve the optimization problem and obtain the optimal solution. The simulation results demonstrate that the PPO-based system revenue maximization algorithm exhibits superior convergence and improves overall system revenue compared to the baseline algorithm.

  • Frontier Perspectives and Reviews
    QIN Yingxin, ZHANG Kejia, PAN Haiwei, JU Yahao
    Computer Engineering. 2026, 52(2): 46-68. https://doi.org/10.19678/j.issn.1000-3428.0069826

    Deep learning has driven the development of artificial intelligence and is widely used in computer vision. It has provided breakthroughs and remarkable results in complex tasks such as image recognition, object detection, object tracking, and face recognition, demonstrating excellent recognition and prediction capabilities. However, vulnerabilities and loopholes in deep learning models have gradually been exposed. Deep learning techniques, represented by convolutional neural networks, are extremely sensitive to well-designed adversarial examples, which can easily compromise the security and privacy of the models. This paper first summarizes the concept of adversarial attacks, the reasons adversarial examples exist, and related terms. It outlines several classical adversarial attack strategies in the digital and physical domains and analyzes their advantages and disadvantages. Second, focusing on computer vision, it summarizes the latest research on adversarial attacks in tasks such as object detection, face recognition, object tracking, monocular depth estimation, and optical flow estimation, from both the digital and physical domains, as well as the datasets commonly used in such studies. It also briefly introduces current adversarial example defense and detection methods, summarizes their advantages and disadvantages, and describes applications of adversarial example defense in various visual tasks. Finally, based on this summary of adversarial attack methods, it explores and analyzes the deficiencies and challenges of existing computer vision adversarial attacks.
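
    For concreteness, below is the canonical one-step digital-domain attack, the Fast Gradient Sign Method (FGSM), which perturbs an input along the sign of the loss gradient. It is shown as a representative classical strategy of the kind such surveys cover, not a method proposed in this paper.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """Fast Gradient Sign Method: one gradient step on the input,
    bounded by eps in the L-infinity norm."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()          # move along the loss gradient
    return x_adv.clamp(0.0, 1.0).detach()    # keep a valid image range

# Toy usage on a randomly initialized classifier (illustrative only).
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x, y = torch.rand(1, 1, 28, 28), torch.tensor([3])
x_adv = fgsm(model, x, y)
```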

  • Research Hotspots and Reviews
    PANG Xin, GE Fengpei, LI Yanling
    Computer Engineering. 2025, 51(6): 1-19. https://doi.org/10.19678/j.issn.1000-3428.0069005

    Acoustic Scene Classification (ASC) aims to enable computers to simulate the human auditory system in the task of recognizing various acoustic environments, which is a challenging task in the field of computer audition. With rapid advancements in intelligent audio processing technologies and neural network learning algorithms, a series of new algorithms and technologies for ASC have emerged in recent years. To comprehensively present the technological development trajectory and evolution in this field, this review systematically examines both early work and recent developments in ASC, providing a thorough overview of the field. This review first describes application scenarios and the challenges encountered in ASC and then details the mainstream frameworks in ASC, with a focus on the application of deep learning algorithms in this domain. Subsequently, it systematically summarizes frontier explorations, extension tasks, and publicly available datasets in ASC and finally discusses the prospects for future development trends in ASC.

  • Artificial Intelligence and Pattern Recognition
    WANG Shuai, SHI Yancui
    Computer Engineering. 2025, 51(8): 190-202. https://doi.org/10.19678/j.issn.1000-3428.0069636

    Sequence recommendation algorithms dynamically model users' historical behavior to predict content they may be interested in. This study focuses on the application of contrastive Self-Supervised Learning (SSL) to sequence recommendation, enhancing the model's representation ability in sparse-data scenarios by designing effective self-supervised signals. First, a personalized data augmentation method incorporating user preferences is proposed to address the noise introduced by random data augmentation. This method guides the augmentation process based on user ratings and combines different augmentation methods for short and long sequences to generate augmented sequences that align with user preferences. Second, a mixed-augmentation training approach is designed to address imbalanced feature learning during training. In the early stages of training, augmented sequences are generated using randomly selected methods to improve model performance and generalization. In the later stages, augmented sequences with high similarity to the original sequences are selected so that the model can comprehensively learn users' actual preferences and behavior patterns. Finally, traditional sequence prediction objectives are combined with SSL objectives to infer user representations. Experimental verification is performed on the Beauty, Toys, and Sports datasets. Compared with the best baseline results, the HR@5 of the proposed method increases by 6.61%, 3.11%, and 3.76%, and the NDCG@5 increases by 11.40%, 3.50%, and 2.16%, respectively, on these datasets. These results confirm the rationality and validity of the proposed method.
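
    A rough sketch of preference-guided augmentation in the spirit described above: short histories are cropped to stay coherent, while long histories mask their lowest-rated items first. The concrete rule is an assumption for illustration, not the paper's exact method.

```python
import random

def crop(seq, ratio=0.8):
    """Keep a random contiguous sub-sequence."""
    n = max(1, int(len(seq) * ratio))
    start = random.randint(0, len(seq) - n)
    return seq[start:start + n]

def mask(seq, ratio=0.2, mask_id=0):
    """Replace random items with a mask token."""
    out = list(seq)
    for i in random.sample(range(len(out)), int(len(out) * ratio)):
        out[i] = mask_id
    return out

def preference_aware_augment(seq, ratings, short_len=10, mask_id=0):
    """Illustrative preference-guided rule (assumed, not the paper's):
    crop short histories; for long ones, mask the lowest-rated items,
    so the augmented view keeps the items the user actually valued."""
    if len(seq) <= short_len:
        return crop(seq)
    low = sorted(range(len(seq)), key=lambda i: ratings[i])[: len(seq) // 5]
    out = list(seq)
    for i in low:
        out[i] = mask_id
    return out

aug = preference_aware_augment(list(range(1, 31)),
                               ratings=[i % 5 for i in range(30)])
```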

  • Artificial Intelligence and Pattern Recognition
    CHANG Ru, LIU Yujie, SUN Haojie, DONG Liwei
    Computer Engineering. 2025, 51(9): 110-119. https://doi.org/10.19678/j.issn.1000-3428.0069711

    Aiming at non-affine nonlinear multi-Agent systems with full-state constraints, this study investigates an event-triggered formation control strategy with prescribed performance. The study proposes a barrier function-based nonlinear mapping technique to transform the full-state constraints into the boundedness of mapped variables, thereby eliminating feasibility conditions in the controller design. It then introduces a shift function and a prescribed-time-convergent performance function to constrain the formation tracking error. Consequently, the restriction that the initial value of the formation tracking error must lie within the performance constraint range is eliminated, improving formation performance. The study also designs an event-triggered prescribed performance formation controller that guarantees the Agents achieve the desired formation within a prescribed time and maintain it thereafter, while significantly reducing controller-actuator signal transmissions. Lyapunov stability analysis proves that all signals in the system are semi-globally uniformly ultimately bounded, and the theoretical analysis rules out Zeno behavior. Finally, numerical simulations verify the effectiveness of the proposed method.
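
    For intuition, prescribed-performance designs confine the tracking error within a decaying envelope; a typical prescribed-time form is shown below. The concrete function is illustrative, not the paper's exact performance function.

```latex
% Illustrative prescribed-time performance envelope: the tracking error
% e(t) is required to satisfy  -\rho(t) < e(t) < \rho(t)  for all t >= 0,
% with \rho_0 > \rho_T > 0 and k >= 1 (assumed form, not the paper's).
\[
\rho(t) =
\begin{cases}
(\rho_0 - \rho_T)\bigl(1 - \tfrac{t}{T}\bigr)^{k} + \rho_T, & 0 \le t < T,\\
\rho_T, & t \ge T,
\end{cases}
\]
```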

  • Artificial Intelligence and Pattern Recognition
    ZHOU Shixiang, YU Kai
    Computer Engineering. 2025, 51(11): 144-151. https://doi.org/10.19678/j.issn.1000-3428.0069721

    With the development of social networks, people increasingly express their emotions through multimodal data, such as audio, text, and video. Traditional sentiment analysis methods struggle to process emotional expressions in short videos effectively, and existing multimodal sentiment analysis techniques face issues such as low accuracy and insufficient interaction between modalities. To address these problems, this study proposes a Multimodal Sentiment Analysis method based on Dense Co-Attention (DCA-MSA). First, it utilizes the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, the OpenFace 2.0 model, and the COVAREP tool to extract features from text, video, and audio, respectively. It then employs a Bidirectional Long Short-Term Memory (BiLSTM) network to model the temporal correlations within each feature type separately. Finally, it integrates the different features through a dense co-attention mechanism. The experimental results show that the proposed model is competitive in multimodal sentiment analysis tasks compared with several baseline models: on the CMU-MOSEI dataset, the binary classification accuracy increases by up to 3.7 percentage points and the F1 score by up to 3.1 percentage points; on the CH-SIMS dataset, the binary classification accuracy increases by up to 4.1 percentage points, the three-class classification accuracy by up to 2.8 percentage points, and the F1 score by up to 3.9 percentage points.

  • Multimodal Information Fusion
    LI Jianlang, WU Xindian, CHEN Ling, YANG Bo, TANG Wensheng
    Computer Engineering. 2026, 52(2): 299-310. https://doi.org/10.19678/j.issn.1000-3428.0070113

    This study proposes a Common and Differential Cross-Attention Module-Bird's-Eye View (CDCAM-BEV) algorithm that fuses 4D millimeter-wave radar and vision to improve target detection accuracy for pedestrian and vehicle recognition and localization in autonomous driving scenarios. First, a radar cylinder network is designed to encode the 4D radar point cloud into a pseudo image, and the monocular image is converted into a Bird's-Eye View (BEV) feature through Orthogonal Feature Transformation (OFT). Second, based on the cross-attention mechanism, a Common Information Extraction Module (CICAM) and a Differential Information Extraction Module (DICAM) are used to fully explore the common and differential information between the radar and the images. Finally, a BEV feature fusion module is designed based on CICAM and DICAM to achieve feature-level fusion of the image and radar information in the BEV space. Experiments are conducted on the VOD dataset, and the CDCAM-BEV algorithm is compared with five other 3D object detection algorithms. The experimental results show that CDCAM-BEV achieves better detection performance in multiple modes. In the 3D mode, the average detection accuracy of CDCAM-BEV is 3.65 percentage points higher than that of the second-ranked Part-A2; in the BEV mode, it is 5.04 percentage points higher than that of the second-ranked PointPillars; and in the Average Orientation Similarity (AOS) mode, it is 2.62 percentage points higher than that of the second-ranked Part-A2. These results show that CDCAM-BEV performs excellently in all modes, effectively fusing image and 4D radar point cloud features and significantly improving the accuracy and reliability of object detection.

  • Artificial Intelligence and Pattern Recognition
    SHEN Sitong, WANG Yaowu, XIE Zaipeng, TANG Bin
    Computer Engineering. 2025, 51(6): 102-115. https://doi.org/10.19678/j.issn.1000-3428.0070739

    Multi-Agent Reinforcement Learning (MARL) plays a crucial role in solving complex cooperative tasks. However, traditional methods face significant limitations under dynamic environments and information nonstationarity. To address these challenges, this paper proposes a Role learning-based Multi-Agent reinforcement learning framework (RoMAC). The framework employs role division based on action attributes and uses a role assignment network to dynamically allocate roles to agents, thereby enhancing the efficiency of multi-agent collaboration. The framework adopts a hierarchical communication design, comprising inter-role communication based on attention mechanisms and inter-agent communication guided by mutual information. In inter-role communication, it leverages attention mechanisms to generate efficient messages for coordination between role delegates. In inter-agent communication, it uses mutual information to generate targeted information and improve decision-making quality within role groups. Experiments conducted in the StarCraft Multi-Agent Challenge (SMAC) environment show that RoMAC achieves an average win rate improvement of approximately 8.62 percentage points, a reduction in convergence time of 0.92×10⁶ timesteps, and an average decrease of 28.18 percentage points in communication load. Ablation studies further validate the critical contribution of each module, demonstrating the robustness and flexibility of the model. Overall, the experimental results indicate that RoMAC offers significant advantages in MARL cooperative tasks, providing reliable support for efficiently addressing complex challenges.

  • Artificial Intelligence and Pattern Recognition
    ZHANG Hong, LI Feng, MA Yanhong, JI Wenxuan, ZHENG Qipeng
    Computer Engineering. 2025, 51(10): 140-149. https://doi.org/10.19678/j.issn.1000-3428.0069489

    Accurate photovoltaic power prediction is crucial for enhancing grid stability and improving energy utilization efficiency. To address the limitations of existing methods, which struggle to simultaneously capture both the long-term dependencies and short-term variation patterns of photovoltaic power, this study proposes a novel photovoltaic power prediction method named Solarformer. The method integrates a Pyramid Attention Module (PAM) with a Temporal Convolutional Network (TCN) to optimize the Transformer architecture. First, multiple feature selection mechanisms are employed to screen the input features, enhancing the model's ability to characterize photovoltaic data. Second, a coarse-grained construction module and the PAM are used to optimize the Transformer encoder, capturing the long-term temporal dependencies of photovoltaic power at multiple scales. Third, a constraint mechanism based on the sunrise-sunset effect of photovoltaic power and the TCN are employed to optimize the Transformer decoder, strengthening the model's ability to capture and model the short-term variation patterns of photovoltaic power. Experimental results on the Sanyo dataset from Australia demonstrate that Solarformer effectively improves photovoltaic power forecasting accuracy. Compared with the DLinear model, it reduces the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Symmetric Mean Absolute Percentage Error (SMAPE) by approximately 7.45%, 6.99%, and 14.10%, respectively.

  • Artificial Intelligence and Pattern Recognition
    XU Lei, ZENG Yan, YUAN Junfeng, YUE Lupeng, YIN Yuyu, ZHANG Jilin, XUE Meiting, HAN Meng
    Computer Engineering. 2025, 51(11): 90-99. https://doi.org/10.19678/j.issn.1000-3428.0069678

    As core data for maritime traffic, ship trajectory data can be used for trajectory prediction, early warning, and other tasks with pronounced temporal characteristics. However, owing to factors such as harsh marine environments and poor communication reliability, missing values in ship trajectory data are a common problem, and learning from time series containing missing data can significantly affect the accuracy of time series analysis. The current mainstream solution is to impute approximations of the missing data, mainly with convolutional models that reshape the time series along the timeline to capture its local features; however, their ability to capture the global features of long time series is limited. The Transformer enhances a model's ability to capture the global features of a time series through its core self-attention mechanism, which captures the relationships between all time points. However, because its attention is calculated through matrix multiplication, it ignores the temporal nature of the series, and the resulting global feature weights carry no time-span dependency. To address the capture of global features in long time series, this study proposes GANet, a variant network based on the self-attention mechanism. GANet first obtains a basic global feature weight matrix over the time points through self-attention and then uses gated recurrent units to forget and update this weight matrix along the timeline, yielding a global feature weight matrix with time-span dependency that is then used for data reconstruction to impute the missing values. By combining the self-attention and gating mechanisms, GANet captures global features while accounting for the impact of time span on different time points. Experimental results show that compared with existing models such as Autoformer and Informer, GANet achieves better imputation performance on the Trajectory, ETT, and Electricity datasets.
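
    A minimal sketch of the idea: standard self-attention produces the base global features, and a GRU then forgets and updates them along the timeline so that the result gains time-span dependency. For shape stability, this sketch gates the attended features rather than the raw T×T weight matrix, which is a simplification of GANet.

```python
import torch
import torch.nn as nn

class GatedAttention(nn.Module):
    """Self-attention followed by GRU gating along the time axis;
    a simplified sketch of the GANet idea, not its exact design."""
    def __init__(self, dim):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.gru = nn.GRU(input_size=dim, hidden_size=dim, batch_first=True)

    def forward(self, x):                       # x: (B, T, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        w = torch.softmax(q @ k.transpose(1, 2) / x.shape[-1] ** 0.5, dim=-1)
        ctx = w @ v                             # base global features
        gated, _ = self.gru(ctx)                # forget/update along the timeline
        return gated                            # use for data reconstruction

out = GatedAttention(16)(torch.randn(2, 20, 16))
```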

  • Artificial Intelligence and Pattern Recognition
    LI Bowen, DING Muheng, FANG Meihua, ZHU Guiping, WEI Zhiyong, CHENG Wei, LI Yayun, BIAN Shuangshuang
    Computer Engineering. 2025, 51(10): 87-96. https://doi.org/10.19678/j.issn.1000-3428.0069857

    Driver fatigue is a major cause of traffic accidents, and driver fatigue state classification based on Electroencephalograms (EEGs) is an important task in the field of artificial intelligence. In recent years, deep learning models that incorporate attention mechanisms have been widely applied to EEG-based fatigue recognition. While these approaches have shown promise, several studies disregard the inherent features of the EEG data itself. Additionally, the mechanisms by which attention affects the classifier remain underexplored, so the specific effects of different attention mechanisms on classification performance are left unexplained. Therefore, this study takes the SEED-VIG dataset as the research object and adopts the ReliefF feature selection algorithm to construct optimized Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) network, and Support Vector Machine (SVM) models based on self-attention, multihead attention, channel attention, and spatial attention mechanisms. Experimental results on the EEG data in the SEED-VIG dataset show that the neural network models optimized with these attention mechanisms improve in terms of accuracy, recall, F1 score, and other indicators. Among them, the Convolutional Block Attention Module (CBAM)-CNN model, which enhances both spatial and channel information, achieves the best performance, with a mean accuracy of 84.7% and a standard deviation of 0.66.
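
    CBAM itself is a published, standard module; the PyTorch sketch below shows its two stages (channel attention, then spatial attention) as commonly formulated. How the paper wires it into its EEG classifier is not specified in the abstract and is not assumed here.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed
    by spatial attention, as used by the best-performing CBAM-CNN model."""
    def __init__(self, channels: int, reduction: int = 16, kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(                 # shared MLP for channel attn
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.conv = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # channel attention: squeeze spatial dims by avg and max pooling
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # spatial attention: squeeze channel dim by avg and max pooling
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
        return x * torch.sigmoid(self.conv(s))
```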

  • Graphics and Image Processing
    SHA Yuyang, LU Jingtao, DU Haofan, ZHAI Xiaobing, MENG Weiyu, LIAN Xu, LUO Gang, LI Kefeng
    Computer Engineering. 2025, 51(7): 314-325. https://doi.org/10.19678/j.issn.1000-3428.0068674

    Image segmentation is a crucial technology for environmental perception and is widely used in scenarios such as autonomous driving and virtual reality. With the rapid development of technology, computer vision-based blind guiding systems are attracting increasing attention, as they outperform traditional solutions in terms of accuracy and stability. The semantic segmentation of road images is an essential feature of a visual guiding system. By analyzing the output of the algorithms, the guiding system can understand the current environment and aid blind people in navigating safely, helping them avoid obstacles, move efficiently, and follow an optimal path. Visual blind guiding systems are often used in complex environments, which demand high running efficiency and segmentation accuracy. However, commonly used high-precision semantic segmentation algorithms are unsuitable for blind guiding systems owing to their low running speed and large number of model parameters. To solve this problem, this paper proposes a lightweight road image segmentation algorithm based on multiscale features. Unlike existing methods, the proposed model contains two feature extraction branches, namely, the Detail Branch and the Semantic Branch. The Detail Branch extracts low-level detail information from the image, while the Semantic Branch extracts high-level semantic information. Multiscale features from the two branches are processed and used by the designed feature mapping module, which further improves feature modeling performance. Subsequently, a simple and efficient feature fusion module is designed to fuse features at different scales, enhancing the model's ability to encode contextual information. A large amount of road segmentation data suitable for blind guiding scenarios is collected and labeled, and a corresponding dataset is generated. The model is trained and tested on this dataset. The experimental results show that the mean Intersection over Union (mIoU) of the proposed method is 96.5%, which is better than that of existing image segmentation models. The proposed model achieves a running speed of 201 frames per second on an NVIDIA RTX 3090 Ti, which is higher than that of existing lightweight image segmentation models, and can be deployed on an NVIDIA AGX Xavier at 53 frames per second, meeting the requirements of practical applications.
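
    The paper's fusion module is described only at a high level, so the following is a generic PyTorch sketch of fusing a high-resolution Detail Branch feature with a low-resolution Semantic Branch feature. The class name `FeatureFusion`, the 1x1 projection, and the gating are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Sketch: upsample the coarse semantic map, concatenate it with the
    detail map, project, and cheaply re-weight the fused features."""
    def __init__(self, detail_ch: int, semantic_ch: int, out_ch: int):
        super().__init__()
        self.proj = nn.Conv2d(detail_ch + semantic_ch, out_ch, 1)
        self.gate = nn.Sequential(nn.Conv2d(out_ch, out_ch, 1), nn.Sigmoid())

    def forward(self, detail: torch.Tensor, semantic: torch.Tensor):
        # upsample the coarse semantic map to the detail resolution
        semantic = nn.functional.interpolate(
            semantic, size=detail.shape[2:], mode="bilinear",
            align_corners=False)
        fused = self.proj(torch.cat([detail, semantic], dim=1))
        return fused * self.gate(fused)   # context-aware re-weighting
```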

  • Frontier Perspectives and Reviews
    LI Yiyang, LU Shenglian, WANG Jijie, CHEN Ming
    Computer Engineering. 2026, 52(4): 62-81. https://doi.org/10.19678/j.issn.1000-3428.0069312

    Convolutional Neural Networks (CNNs) are widely used in the field of object detection, earning widespread acclaim in scholarly circles for their precision and scalability. This line of work has spawned numerous notable models, including the Region-based Convolutional Neural Network (R-CNN) series (such as Fast R-CNN and Faster R-CNN) and the You Only Look Once (YOLO) series. After the success of Transformers in natural language processing, researchers began exploring their application in computer vision, leading to the development of visual backbone networks such as the Vision Transformer (ViT) and Swin Transformer. In 2020, a Facebook research team unveiled the DEtection TRansformer (DETR), an end-to-end object detection algorithm based on Transformers, designed to minimize the need for prior knowledge and postprocessing in object detection tasks. Despite the promise shown by DETR in object detection, it has limitations, including slow convergence, relatively low accuracy, and the ambiguous physical significance of its object queries. These issues have spurred a wave of research aimed at refining and enhancing the algorithm. This paper collates, scrutinizes, and synthesizes the various efforts to improve DETR, assessing their respective merits and demerits. Furthermore, it presents a comprehensive overview of state-of-the-art research and specialized application domains that employ DETR, and concludes with a prospective analysis of the future role of DETR in computer vision.

  • Research Hotspots and Reviews
    Mayilamu Musideke, GAO Yuxin, ZHANG Situo, FENG Ke, Abudukelimu Abulizi, Halidanmu Abudukelimu
    Computer Engineering. 2025, 51(8): 16-38. https://doi.org/10.19678/j.issn.1000-3428.0070619

    With the rapid advancement of general artificial intelligence technology, the application of foundational models across various fields has gained increasing attention. In image segmentation, the Segment Anything Model (SAM), as a foundational model, demonstrates notable advantages in enhancing image comprehension and processing efficiency. While SAM achieves state-of-the-art performance in image segmentation, further optimization is required in power consumption, computational efficiency, and cross-domain adaptability. This review provides an in-depth exploration of potential improvements to SAM across several crucial dimensions, such as enhancing speed and computational efficiency, improving model accuracy and robustness, increasing adaptability and generalization, optimizing prompt engineering, and boosting data utilization and transfer learning capabilities. With these enhancements, SAM is expected to sustain high efficiency in highly complex tasks and better meet the requirements of various fields and application contexts. In addition, this review summarizes the practical applications of SAM in various fields, including medical imaging, remote sensing, and the mechanical industry, and demonstrates the suitability and challenges of the model in different scenarios. Moreover, this review provides a detailed overview of commonly used datasets and evaluation metrics in the field of image segmentation. Through experimental comparative analyses, the impact of Vision Transformer (ViT) variants on the performance of SAM is assessed, along with performance evaluations of enhanced models such as EfficientSAM, EfficientViT-SAM, MobileSAM, and RobustSAM. The challenges faced by SAM and its improved models in real-world applications are also discussed, and future research directions are proposed. This review aims to provide researchers with a comprehensive understanding of the advancements and applications of SAM and its variants, offering insights that may inform the development of new models.

  • Space-Air-Ground Integrated Computing Power Networks
    WANG Kewen, ZHANG Weiting, SUN Tong
    Computer Engineering. 2025, 51(5): 52-61. https://doi.org/10.19678/j.issn.1000-3428.0069471

    In response to the increasing demand for fast response and large-scale coverage in application scenarios such as satellite data processing and vehicle remote control, this study uses hierarchical control and artificial intelligence technology to design a resource scheduling mechanism for space-air-ground integrated computing power networks. The air, space, and ground networks are divided into three domains, and domain controllers are deployed for resource management in their corresponding local domains. Areas are divided based on the coverage of satellites and drones to ensure effective service guarantees, efficient data transmission, and task processing. A multi-agent reinforcement learning-based scheduling algorithm is proposed to optimize resource utilization in space-air-ground integrated computing power networks, in which each domain controller is treated as an agent with task scheduling and resource allocation capabilities. Intelligent resource scheduling and efficient resource allocation for computing tasks are realized through collaborative learning and distributed decision-making while satisfying delay and energy consumption constraints. Computing tasks are generated in different scenarios and processed in real time. Simulation results show that the proposed mechanism can effectively improve resource utilization and shorten task response time.
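
    The abstract names the multi-agent structure but not the learning algorithm, so the Python sketch below is a deliberately simplified stand-in: one agent per network domain choosing where to place an arriving task against a scalarized delay-energy cost, with a bandit-style value update instead of the paper's full multi-agent reinforcement learning. The weights, epsilon-greedy rule, and class names are illustrative assumptions.

```python
import random

W_DELAY, W_ENERGY = 0.6, 0.4   # assumed trade-off weights

def cost(delay_s: float, energy_j: float) -> float:
    """Scalarized delay-energy objective: lower is better."""
    return W_DELAY * delay_s + W_ENERGY * energy_j

class DomainAgent:
    """One agent per domain (space / air / ground) controller."""
    def __init__(self, nodes, epsilon: float = 0.1):
        self.nodes = nodes                  # compute nodes in this domain
        self.q = {n: 0.0 for n in nodes}    # running expected cost per node
        self.epsilon = epsilon

    def schedule(self) -> str:
        if random.random() < self.epsilon:  # explore
            return random.choice(self.nodes)
        return min(self.q, key=self.q.get)  # exploit: lowest expected cost

    def update(self, node: str, observed_cost: float, lr: float = 0.1):
        self.q[node] += lr * (observed_cost - self.q[node])

# Example: a ground-domain agent learns which node serves tasks cheapest
agent = DomainAgent(["bs-1", "bs-2", "edge-1"])
node = agent.schedule()
agent.update(node, cost(delay_s=0.12, energy_j=3.4))
```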

  • AI-Enabled Vehicular Edge Computing
    ZHU Siyuan, LI Jiasheng, ZOU Danping, HE Di, YU Wenxian
    Computer Engineering. 2025, 51(9): 14-24. https://doi.org/10.19678/j.issn.1000-3428.0069534

    Detecting defects on unstructured roads is important for road traffic safety; however, the annotated datasets required for such detection are limited. This study proposes the Multi-Augmentation with Memory (MAM) semi-supervised object detection algorithm to address the lack of annotated datasets for unstructured roads and the inability of existing models to learn from unlabeled data. First, a cache mechanism is introduced to store the bounding box regression information of unannotated images and images with pseudo annotations, avoiding the computational waste of repeated matching. Second, the study proposes a hybrid data augmentation strategy that mixes cached pseudo-labeled images with the unlabeled images input to the student model, enhancing the model's generalizability to new data and balancing the scale distribution of images. The MAM algorithm is not tied to a particular object detection model and better maintains the consistency of object bounding boxes, thus avoiding the need to compute a consistency loss. Experimental results show that the MAM algorithm is superior to other fully supervised and semi-supervised learning algorithms. On a self-built unstructured road defect dataset, called Defect, the MAM algorithm improves mean Average Precision (mAP) by 6.8, 11.1, and 6.0 percentage points over the Soft Teacher algorithm at annotation ratios of 10%, 20%, and 30%, respectively. On a self-built unstructured road pothole dataset, called Pothole, it improves mAP by 5.8 and 4.3 percentage points over the Soft Teacher algorithm at annotation ratios of 15% and 30%, respectively.
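
    A minimal Python sketch of the two MAM ingredients described above: a pseudo-label cache that avoids re-matching teacher outputs, and a hybrid batch that mixes cached pseudo-labeled images with fresh unlabeled ones. The class names, cache capacity, eviction policy, and the 50/50 mixing ratio are illustrative assumptions, not the paper's exact settings.

```python
import random

class PseudoLabelCache:
    """Stores (image, pseudo boxes) pairs so matching is done only once."""
    def __init__(self, capacity: int = 512):
        self.capacity = capacity
        self.items = []                      # (image, boxes) pairs

    def add(self, image, boxes):
        if len(self.items) >= self.capacity:
            self.items.pop(0)                # drop the oldest entry
        self.items.append((image, boxes))

    def sample(self, k: int):
        return random.sample(self.items, min(k, len(self.items)))

def hybrid_batch(cache: PseudoLabelCache, unlabeled: list, batch: int = 16):
    """Half cached pseudo-labeled images, half fresh unlabeled images,
    forming the mixed input fed to the student model."""
    cached = cache.sample(batch // 2)
    fresh = [(img, None) for img in random.sample(unlabeled, batch // 2)]
    return cached + fresh
```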

  • Space-Air-Ground Integrated Computing Power Networks
    MO Dingtao, JU Ying, LI Wenjin, ZHANG Yasheng, HE Ci, DONG Feihu
    Computer Engineering. 2025, 51(5): 9-19. https://doi.org/10.19678/j.issn.1000-3428.0069654

    Satellite networks have wide coverage, strong mobility, and ultralow power consumption, which allow them to act as an extension of ground communication networks, thereby promoting the construction of integrated space-ground networks. However, the opening up and popularization of satellite services have increased network traffic and made it more complex, making traffic management and service scheduling challenging. Thus, designing an efficient network traffic classification method and allocating reasonable computing resources to different types of satellite network traffic have become critical to alleviating the pressure on satellite networks. Traditional network traffic classification methods based on ports, payloads, statistics, and behavior have issues concerning effectiveness and privacy, making them inadequate for complex network services, whereas technologies developed for large models, such as the Transformer, are now being widely applied across domains. Therefore, to enhance the operational efficiency of satellite networks and optimize their computing power, this study proposes a network traffic classification method based on a Global Perception Module (GPM) and the Vision Transformer (ViT) model. This method transforms network traffic data into grayscale images and extracts features to fully capture global and local information. The processed data are then input into the ViT model, which leverages its multihead attention mechanism to extract data correlation information and enhance classification capability. Experimental results indicate that the accuracy of the GPM-ViT model reaches 97.86%, a significant improvement over the baseline models.
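
    The traffic-to-image step described above can be sketched in a few lines of NumPy: map the raw bytes of a flow onto a fixed-size grayscale image that a ViT-style classifier can consume. The 28x28 size, zero padding, and truncation policy are illustrative assumptions; the paper's exact preprocessing is not given in the abstract.

```python
import numpy as np

def flow_to_grayscale(payload: bytes, side: int = 28) -> np.ndarray:
    """Map the raw bytes of one traffic flow to a side x side grayscale
    image, truncating long flows and zero-padding short ones."""
    n = side * side
    buf = payload[:n].ljust(n, b"\x00")        # truncate or zero-pad
    img = np.frombuffer(buf, dtype=np.uint8).reshape(side, side)
    return img.astype(np.float32) / 255.0      # normalize to [0, 1]

# Example: a short flow becomes a mostly-black 28x28 image
image = flow_to_grayscale(b"\x45\x00\x00\x3c" * 10)
print(image.shape, image.max())
```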

  • AI-Enabled Vehicular Edge Computing
    CUI Mengmeng, SHI Jingyan, XIANG Haolong
    Computer Engineering. 2025, 51(9): 25-37. https://doi.org/10.19678/j.issn.1000-3428.0069836

    To optimize Quality of Service (QoS), Mobile Edge Computing (MEC) has been deeply integrated into the Internet of Vehicles (IoV) to provide geographically proximal computing resources for vehicles, thereby reducing task processing latency and energy consumption. However, traditional MEC server deployment relies primarily on terrestrial Base Stations (BSs), resulting in high deployment costs and limited coverage, making it difficult to ensure uninterrupted services for all vehicles. Air-ground collaborative IoV technology has emerged as a solution to these challenges. Unmanned Aerial Vehicles (UAVs) can dynamically assist Road-Side Units (RSUs) by exploiting their flexibility and line-of-sight links, providing more flexible computing resources for vehicular users and thereby ensuring the continuity and efficiency of in-vehicle services. Therefore, this study proposes a Dynamic Vehicular Edge Task Offloading Method (DVETOM) based on air-ground collaboration. This method adopts a vehicle-road-air architecture, establishing Vehicle-to-RSU (V2R) and Vehicle-to-UAV (V2U) links. Transmission and computation models are constructed for three modes: local execution of vehicular tasks, offloading tasks to the RSU, and offloading tasks to the UAV. An objective function is established with the joint optimization goal of minimizing system latency and energy consumption. DVETOM transforms the task offloading problem into a Markov Decision Process (MDP) and optimizes the offloading strategy using the Distributed Distributional Deep Deterministic Policy Gradient (D4PG) algorithm from Deep Reinforcement Learning (DRL). Experimental results against five benchmark methods show that DVETOM reduces system latency by 3.45% to 23.7% and system energy consumption by 5.8% to 23.47% relative to existing methods, while improving QoS for vehicular users. In conclusion, DVETOM effectively enhances the offloading of vehicular edge computing tasks within the IoV, offering IoV users a more efficient and energy-conserving solution and showcasing extensive potential for application in intelligent transportation systems.
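
    The joint latency-energy objective described above can be written compactly; the weighted-sum form and the symbols $w_T$, $w_E$, $T_i$, $E_i$ below are assumptions for illustration, since the abstract does not give the paper's exact formulation.

```latex
% Hedged sketch of DVETOM's joint objective: for each task i, choose an
% offloading decision a_i (local, RSU, or UAV); the weights w_T, w_E are
% assumed, not taken from the paper.
\min_{\{a_i\}} \; \sum_{i} \Big( w_T \, T_i(a_i) + w_E \, E_i(a_i) \Big),
\qquad a_i \in \{\text{local}, \text{RSU}, \text{UAV}\},
\quad w_T + w_E = 1
```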

  • Bojia Chen, Tingnian He, Lianjie Zhang, Shu'an Chen
    Accepted: 2025-10-20
    Cross-domain recommendation systems are widely applied in e-commerce and content platforms. Although the Dual-Target Cross-Domain Recommendation (DTCDR) paradigm proposed in recent years has achieved a breakthrough in simultaneously improving the performance of both domains, it still faces two major challenges: 1) the generated user-item representations lack sufficient correlation and diversity; 2) semantic noise mixed into the shared preferences leads to negative transfer. To address these issues, a dual-target cross-domain recommendation model based on heterogeneous graphs and hierarchical preference disentanglement (HGPD-DTCDR) is proposed. Its core innovations are as follows: 1) a heterogeneous graph collaborative learning framework that integrates user-item interactions, user social networks, and item attribute similarities into a multi-relation heterogeneous graph, generating high-order semantic representations through a Relational Graph Convolutional Network (R-GCN) to enhance the diversity and correlation of the representations; 2) a two-stage disentanglement process that first separates domain-specific and shared preferences through a variational graph encoder, and then introduces a semantic filtering network to improve the quality of the shared preferences. Experiments on five real cross-domain datasets show that the performance improvement of this model stems from the synergy between heterogeneous graph modeling and the hierarchical disentanglement mechanism. Compared with the best baseline, it achieves average improvements of 3.55%, 7.27%, and 15.57% in hit rate, normalized discounted cumulative gain, and mean reciprocal rank, respectively. In data-sparse scenarios, the improvement is even more pronounced, with an average gain of 10.35%. Ablation studies further verify the effectiveness of each technical component and their synergistic effects.
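
    The R-GCN layer used to generate the high-order semantic representations follows a standard, published propagation rule (Schlichtkrull et al.); the paper's exact variant over its user, item, and attribute relations may differ:

```latex
% Standard R-GCN propagation: node i aggregates neighbors N_i^r under
% each relation r with per-relation weights W_r and normalizer c_{i,r}.
h_i^{(l+1)} = \sigma\!\Big( W_0^{(l)} h_i^{(l)}
  + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r}
    \frac{1}{c_{i,r}} \, W_r^{(l)} h_j^{(l)} \Big)
```
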
  • Frontier Perspectives and Reviews
    LIAO Yong, HAN Xiaojin, LIU Jinlin, WANG Hao
    Computer Engineering. 2026, 52(3): 41-61. https://doi.org/10.19678/j.issn.1000-3428.0069925

    Artificial intelligence has made remarkable progress across many fields, encouraging countries to attach great importance to its research and development. However, the rapid development of artificial intelligence has also brought about a series of problems and threats, and overreliance on and blind trust in such models can lead to serious risks. Therefore, explainable artificial intelligence has become a key element in building trusted and transparent intelligent systems, and its research and development require immediate attention. This survey summarizes domestic and international research progress on explainable artificial intelligence from multiple dimensions and levels. Based on current research results in the industry, it subdivides the key technologies of explainable artificial intelligence into four categories, namely interpretation models, interpretation methods, safety testing, and experimental verification, with the aim of clarifying the technical focus and development direction of each field. Furthermore, the survey explores specific application examples of explainable artificial intelligence across key industry sectors, including but not limited to education, healthcare, finance, autonomous driving, and justice, demonstrating its significant role in enhancing decision-making transparency. Finally, this survey provides an in-depth analysis of the major technical challenges of explainable artificial intelligence and presents future development trends, in addition to a dedicated investigation and in-depth analysis of the interpretability of large models, which has attracted considerable attention recently.

  • Zhang Yao, Zhang Junsan, Ma Junpeng, Yao Zongquan, Liu Tianyi
    Accepted: 2025-11-07
    This paper proposes an improved YOLOv8-based model named CAFR-YOLO to address the insufficient cross-level feature interaction and limited feature representation capability of multi-scale object detection in complex scenes. First, a novel cross-scale feature reorganization pipeline is designed, constructing the Channel Attention-guided Feature Reorganization (CAFR) module. By using a specific layer as the fusion backbone and incorporating scale alignment, attention-weighted fusion, and feature subset splicing strategies, it alleviates the insufficient cross-level interaction of traditional feature pyramid structures. Second, at the local level, the method introduces the C2f_DCNv3 module into the backbone network, significantly enhancing the model's geometric adaptability by exploiting the dynamic sampling characteristics of deformable convolution. From a global perspective, the C2f_SAConv module is constructed by combining Switchable Atrous Convolution (SAC) with the C2f module, optimizing multi-scale semantic feature fusion through dynamic atrous rate adjustment. Together, these two modules enhance the model's robustness in complex scenes. Finally, SPDConv replaces traditional convolution structures, strengthening feature representation through spatial-channel reorganization while reducing computational complexity. Experimental results demonstrate that CAFR-YOLO achieves 86.3% mAP@0.5 and 67.2% mAP@0.5:0.95 on the PASCAL VOC dataset at a computational cost comparable to that of the original model. On the MS COCO dataset, it improves mAP@0.5 and mAP@0.5:0.95 by 3.5% and 3.9%, respectively. Compared with existing state-of-the-art methods, CAFR-YOLO exhibits significant advantages across multiple metrics, substantially enhancing multi-scale detection accuracy and robustness while maintaining computational efficiency, and providing a novel solution for real-time object detection tasks.
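
    A minimal PyTorch sketch of channel attention-guided multi-scale fusion in the spirit of the CAFR steps named above (scale alignment, attention-weighted fusion, splicing). The class name `ChannelAttnFusion`, the squeeze-style attention, and the assumption that both inputs share a channel count are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ChannelAttnFusion(nn.Module):
    """Sketch: align two feature maps to a common scale, re-weight each
    by channel attention, then splice and project them."""
    def __init__(self, ch: int, reduction: int = 8):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch // reduction, 1),
            nn.ReLU(), nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())
        self.merge = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, backbone: torch.Tensor, lateral: torch.Tensor):
        # scale alignment: resize the lateral map to the backbone map
        lateral = nn.functional.interpolate(
            lateral, size=backbone.shape[2:], mode="nearest")
        # attention-weighted fusion of both aligned feature maps
        fused = torch.cat([backbone * self.attn(backbone),
                           lateral * self.attn(lateral)], dim=1)
        return self.merge(fused)
```
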
  • Artificial Intelligence and Pattern Recognition
    WEI Mingkang, LI Jianan, HAN Lin, GAO Wei, ZHAO Rongcai, WANG Hongsheng
    Computer Engineering. 2025, 51(5): 62-72. https://doi.org/10.19678/j.issn.1000-3428.0069206

    With the increasing demand from major manufacturers for deploying large models, the accuracy of the single-granularity quantization method in the deep learning compiler Tensor Virtual Machine (TVM) has decreased and is no longer sufficient to satisfy deployment requirements. Therefore, this study designs and constructs a flexible-granularity model quantization framework. The framework supports both layer-wise and channel-wise quantization as well as threshold search and adaptive rounding optimization algorithms. First, based on the quantization module "relay.quantize", a framework flow covering information annotation, threshold calibration, and quantization graph realization is constructed, in which a granularity attribute explicitly identifies the quantization method. Second, fine-tuning is applied to threshold calibration and weight rounding to address the ineffective determination of quantization information by predefined calibration methods, thereby improving the accuracy of the quantized model. Experiments on visual networks are conducted using the ImageNet dataset. The results reveal that the new quantization scheme for MobileNetV1 reduces the loss of model accuracy to 2.3% after 8-bit quantization, and this loss falls to 0.7% after tuning. Hence, the multi-granularity quantization framework can effectively reduce quantization error.
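
    Independently of TVM's internal relay.quantize pass, the difference between the two granularities the framework supports can be shown in a few lines of NumPy. The OIHW weight layout and the symmetric int8 scheme are assumptions chosen for illustration.

```python
import numpy as np

def quantize_layerwise(w: np.ndarray, bits: int = 8):
    """One scale for the whole weight tensor (layer-wise granularity)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).astype(np.int8), scale

def quantize_channelwise(w: np.ndarray, bits: int = 8):
    """One scale per output channel (channel-wise granularity); finer
    thresholds typically cut quantization error on conv weights."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=(1, 2, 3), keepdims=True) / qmax
    return np.round(w / scale).astype(np.int8), scale

# Example: per-channel scales track each filter's own dynamic range
w = np.random.randn(64, 32, 3, 3).astype(np.float32)   # OIHW conv weight
q_l, s_l = quantize_layerwise(w)
q_c, s_c = quantize_channelwise(w)
err = lambda q, s: float(np.mean((q * s - w) ** 2))
print(f"layer-wise MSE {err(q_l, s_l):.2e}  channel-wise MSE {err(q_c, s_c):.2e}")
```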

  • Research Hotspots and Reviews
    LI Yakang, LI Jianfang, HU Peng, CHEN Juan, WANG Shengxiang, QI Fazhi, CHEN Gang
    Computer Engineering. 2025, 51(10): 53-70. https://doi.org/10.19678/j.issn.1000-3428.0069651

    This study explores the use of Artificial Intelligence (AI) technology throughout the neutron scattering experiments' lifecycle to determine how AI technology can revolutionize key aspects such as experimental apparatus, data acquisition, and data processing. The study begins by introducing the fundamental principles and experimental procedures of neutron scattering technology before focusing on the multifaceted applications of AI technology in neutron scattering experiments. These applications include optimizing experimental infrastructure, data acquisition, and imaging preprocessing, as well as characterizing experimental samples in neutron diffraction, neutron reflection, and Inelastic Neutron Scattering (INS). This study demonstrates the importance of AI technology in increasing the intelligence level of experiments, accelerating data processing, and improving the accuracy and reliability of data analyses. In addition, an in-depth discussion is held on the future application of AI technology in neutron scattering experiments, indicating that with the continuous advancement of technologies such as multimodal learning, interpretable models, large language models, and AI-Ready databases, AI technology is poised to bring revolutionary changes to neutron scattering experiments, opening up new avenues for revealing the microstructure and properties of complex material systems.

  • Computer Vision and Image Processing
    TANG Ke, WEI Feiming, LI Dongying, YU Wenxian
    Computer Engineering. 2026, 52(3): 97-106. https://doi.org/10.19678/j.issn.1000-3428.0070085

    To address the missed and false detections caused by numerous small target instances and occlusions among targets in drone images, this paper proposes a lightweight small target detection algorithm for Unmanned Aerial Vehicle (UAV) images based on an improved YOLOv8. The Triple Feature Encoder (TFE) and Scale Sequence Feature Fusion (SSFF) modules are introduced in the neck to enhance the network's ability to extract features at different scales. Furthermore, a Small Object Detection Head (SMOH) is designed and fused with the improved neck feature extraction network, and an additional detection head is introduced to reduce the loss of small target features and enhance the network's ability to recognize small targets. Additionally, to remedy the defects of the Complete Intersection over Union (CIoU) loss, a regression loss function, Wise-Inner-MPDIoU, is proposed by combining Wise-IoU, Inner-IoU, and the Minimum Point Distance based IoU (MPDIoU). Finally, to meet the lightweight requirements of mobile and embedded deployment, amplitude-based layer-adaptive sparse pruning is performed to further reduce the model size while preserving accuracy. Experimental results demonstrate that, compared with the original YOLOv8s model, the improved model increases mAP@0.5 by 6.8 percentage points while reducing the number of parameters, the amount of computation, and the model size by 76.4%, 17.1%, and 73.5%, respectively. The proposed model is lightweight, improves detection accuracy, and has strong practical significance.
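
    Of the three combined IoU variants, MPDIoU has a compact closed form: it penalizes the squared distances between the matching corner points of the predicted and ground-truth boxes, normalized by the squared image dimensions. Below is a hedged PyTorch sketch of that term alone, without the Wise-IoU/Inner-IoU weighting that the paper's composite loss adds on top.

```python
import torch

def mpdiou_loss(pred, target, img_w: int, img_h: int) -> torch.Tensor:
    """Minimum Point Distance based IoU loss for (x1, y1, x2, y2) boxes
    of shape (N, 4); one ingredient of the paper's Wise-Inner-MPDIoU."""
    # intersection area
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)
    # squared distances between matching corner points
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1 / norm - d2 / norm
    return (1 - mpdiou).mean()
```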

  • Research Hotspots and Reviews
    DONG Yuze, ZHANG Zhongzhi
    Computer Engineering. 2025, 51(6): 20-28. https://doi.org/10.19678/j.issn.1000-3428.0070532

    This study investigates the consensus problem, a fundamental issue in distributed systems and network control. Consensus studies have traditionally focused on unweighted networks, overlooking the impact of edge weights in real-world networks. However, networks such as transportation systems, social networks, and power networks exhibit significant weighted properties, and unweighted models fail to fully capture their complex interactions. To address this issue, this study examines a family of weighted pseudo-fractal networks to determine how edge weights affect consensus. The Laplacian matrix is used to establish a relationship between the Kirchhoff indices and network consensus, providing an in-depth analysis of consensus behavior in weighted networks. By calculating recursive relations for the various indices across iterations, exact formulas for key quantities such as the multiplicative Kirchhoff index, additive Kirchhoff index, Kirchhoff index, and network coherence are derived. A numerical analysis shows that as the network size increases, consensus in weighted networks converges to a constant, indicating greater resistance to external noise.
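
    The link between the Kirchhoff index and consensus asserted above rests on a standard identity over the nonzero Laplacian eigenvalues; the paper's weighted-network formulas refine this baseline relation:

```latex
% First-order network coherence H^{(1)} and the Kirchhoff index K_f both
% reduce to sums over the nonzero Laplacian eigenvalues \lambda_2..\lambda_N,
% giving H^{(1)} = K_f / (2N^2).
H^{(1)} = \frac{1}{2N} \sum_{i=2}^{N} \frac{1}{\lambda_i},
\qquad
K_f = N \sum_{i=2}^{N} \frac{1}{\lambda_i}
\;\;\Longrightarrow\;\;
H^{(1)} = \frac{K_f}{2N^2}
```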

  • Research Hotspots and Reviews
    WANG Qun, LI Fujuan, MA Zhuo
    Computer Engineering. 2025, 51(8): 39-52. https://doi.org/10.19678/j.issn.1000-3428.0070248

    Autonomous Systems (ASes) that constitute the Border Gateway Protocol (BGP) ecosystem have different interests and route policies. When actual route announcements exceed their expected boundaries, route leaks can occur, leading to network security incidents caused by route redirection. In the propagation of BGP route information, ASes unconditionally trust and accept the routes declared by neighboring ASes. Additionally, each AS independently configures its own local policies and keeps them secret, which complicates the verification of route policies; this has been a persistent and unresolved challenge in the field of BGP security. Blockchain technology, with its inherent characteristics of decentralization, traceability, immutability, and transparency, offers a promising infrastructure for digital resource authentication and trust among ASes, potentially serving as a key technology for addressing the threat of route leaks. This study first clearly defines the relationships between neighboring ASes, as well as between the Gao-Rexford (GR) model and BGP route policies, elucidating the root causes of route leaks and the challenges in their prevention. It then reviews research on traditional solutions to route leaks, focusing on their strengths, weaknesses, and unresolved issues. Subsequently, it presents the advantages and technical approaches of using blockchain technology to defend against BGP route leaks and explores the principles and application characteristics of typical solutions. Finally, it discusses the existing challenges and outlines future research directions.

  • Graphics and Image Processing
    ZHAO Yaoqian, TENG Qizhi, HE Xiaohai, SHUI Ai, CHEN Honggang
    Computer Engineering. 2025, 51(5): 257-265. https://doi.org/10.19678/j.issn.1000-3428.0069822

    Single Image Super-Resolution (SISR) reconstructs High-Resolution (HR) images from Low-Resolution (LR) images. In recent years, deep learning-based SISR methods have achieved outstanding reconstruction results, attracting widespread attention. However, most models suffer from high complexity and large parameter counts, which hinder their practical application. To overcome these issues, this study proposes a module based on self-attention feature distillation, which reduces complexity while fully extracting deep image features, achieving lightweight super-resolution reconstruction. The proposed module has two technical features. First, a feedback network based on asymmetric convolution is proposed for computing global attention, utilizing the superior nonlinear feature extraction capability of asymmetric convolution to compress the input channels and reduce computational costs. Second, a partial channel shifting operation is introduced in the spatial attention module to increase feature diversity by shifting a subset of channels, without increasing computational complexity. In experiments on six commonly used public datasets, the proposed method outperforms representative lightweight SISR methods such as CARN, SMSR, and DLGSANet in terms of Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS). In addition, the subjective visual quality of its reconstruction results is superior. Overall, the proposed method achieves a better balance between model complexity and reconstruction performance.
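
    The partial channel shifting operation lends itself to a very small sketch: spatially shift only a fraction of the channels and leave the rest untouched, so feature diversity grows at zero parameter cost. The shift ratio, axis, and amount below are illustrative assumptions, not the paper's settings.

```python
import torch

def partial_channel_shift(x: torch.Tensor, ratio: float = 0.25,
                          shift: int = 1) -> torch.Tensor:
    """Shift the first `ratio` of the channels along the width axis;
    the remaining channels pass through unchanged."""
    n = int(x.size(1) * ratio)            # number of channels to shift
    shifted = x.clone()
    shifted[:, :n] = torch.roll(x[:, :n], shifts=shift, dims=3)
    return shifted

# Example: only a quarter of the channels move; the rest are identical
x = torch.randn(1, 8, 4, 4)
y = partial_channel_shift(x)
print(torch.equal(x[:, 2:], y[:, 2:]))   # True: unshifted channels intact
```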