
Most Downloaded


  • Research Hotspots and Reviews
    SUN Lijun, MENG Fanjun, XU Xingjian
    Computer Engineering. 2025, 51(11): 1-21. https://doi.org/10.19678/j.issn.1000-3428.0069543

    In the context of ongoing advancements in educational informatization, constructing precise and efficient curriculum knowledge graphs has become key to promoting personalized education. As a structured knowledge representation model, curriculum knowledge graphs reveal complex relations between curriculum content and learning objectives, optimize the allocation of educational resources, and tailor personalized learning paths for learners. This survey discusses the techniques used to construct curriculum knowledge graphs, starting with the basic concepts, intrinsic connections, and significant differences among general, educational, and curriculum knowledge graphs. It then delves into the key technologies for building curriculum knowledge graphs, covering curriculum ontology design, entity extraction, and relation extraction, and provides a detailed analysis and summary of their evolution, key features, and limitations. Furthermore, it explores the application value of curriculum knowledge graphs in scenarios such as learning resource recommendation, learner behavior profiling and modeling, and multimodal curriculum knowledge graph construction. Finally, it focuses on the challenges in constructing curriculum knowledge graphs, such as data diversity and heterogeneity, difficulties in quality evaluation, and the lack of cross-curriculum integration, and provides future-oriented insights based on cutting-edge technologies such as deep learning and Large Language Models (LLMs).
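    As a toy illustration of the triple-based representation such surveys describe (the course names below are invented for illustration, not taken from the paper), a curriculum knowledge graph can be stored as (head, relation, tail) triples and queried for prerequisite chains:

    ```python
    # Minimal sketch: a curriculum knowledge graph as (head, relation, tail)
    # triples, with a traversal that finds everything that transitively
    # depends on a given concept. Names are hypothetical examples.
    from collections import defaultdict

    triples = [
        ("linear_algebra", "prerequisite_of", "machine_learning"),
        ("probability", "prerequisite_of", "machine_learning"),
        ("machine_learning", "prerequisite_of", "deep_learning"),
    ]

    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[(head, rel)].append(tail)

    def reachable(start, rel="prerequisite_of"):
        """All concepts that transitively build on `start`."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in graph[(node, rel)]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen
    ```

    A query such as `reachable("linear_algebra")` then yields every course a learner unlocks by mastering that concept, which is the kind of structure learning-path recommendation builds on.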

  • Large Language Models and Generative Artificial Intelligence
    WANG Heqing, WEI Jie, JING Hongyu, SONG Hui, XU Bo
    Computer Engineering. 2026, 52(2): 383-392. https://doi.org/10.19678/j.issn.1000-3428.0070415

    Large Language Models (LLMs) have made significant progress in dialogue, reasoning, and knowledge retention. However, they still face challenges in terms of factual accuracy, knowledge updates, and a lack of high-quality domain datasets for handling knowledge-intensive tasks in the electricity sector. This study aims to address these challenges by introducing an improved Retrieval-Augmented Generation (RAG) strategy. This strategy combines hybrid retrieval with a fine-tuned generative model for efficient knowledge capture and updating. The Metadata-driven RAG framework (Meta-RAG) is proposed for knowledge Question Answering (QA) tasks in the electricity domain. The framework comprises data preparation, model fine-tuning, and retrieval reasoning stages. The data-preparation stage involves document conversion, metadata extraction and enhancement, and document parsing. These processes ensure efficient indexing and structured processing of power regulation documents. The Electricity Question Answering (EleQA) dataset, consisting of 19 560 QA pairs, is constructed specifically for this sector. The model fine-tuning stage uses multi-question generation, chain-of-thought prompting, and supervised instruction fine-tuning to optimize the reasoning abilities in specific tasks. The retrieval reasoning stage employs mixed encoding and re-ranking strategies, combining retrieval and generation modules to improve answer accuracy and relevance. Experiments validate the effectiveness of Meta-RAG. Compared to baseline models such as Self-RAG, Corrective-RAG, Adaptive-RAG, and RA-ISF, Meta-RAG shows higher answer accuracy and retrieval hit rates. Meta-RAG with the Qwen1.5-14B-Chat model achieves an overall accuracy of 0.804 3, surpassing the other methods. Ablation and document recall experiments indicate that document retrieval significantly impacts the framework performance, with a 0.292 8 drop in accuracy when the retrieval capability is lost.
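    The hybrid retrieval plus re-ranking idea can be sketched as follows (this is an illustrative simplification, not the paper's implementation; the score values are toy numbers): a sparse keyword score and a dense similarity score are fused by a weighted sum, and candidates are re-ranked by the fused score.

    ```python
    # Hedged sketch of hybrid retrieval with re-ranking: fuse a sparse
    # (keyword) score and a dense (embedding) score per document, then
    # re-rank by the weighted sum. `alpha` balances the two retrievers.
    def hybrid_rerank(sparse_scores, dense_scores, alpha=0.5, top_k=2):
        """sparse_scores / dense_scores: dict of doc_id -> score in [0, 1]."""
        fused = {
            doc: alpha * sparse_scores.get(doc, 0.0)
                 + (1 - alpha) * dense_scores.get(doc, 0.0)
            for doc in set(sparse_scores) | set(dense_scores)
        }
        return sorted(fused, key=fused.get, reverse=True)[:top_k]
    ```

    Documents found by only one retriever still compete for the top positions, which is what makes the hybrid scheme more robust than either retriever alone.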

  • Mobile Internet and Communication Technology
    WANG Huahua, HUANG Yexia, LI Ling, WANG Jiacheng
    Computer Engineering. 2025, 51(12): 255-267. https://doi.org/10.19678/j.issn.1000-3428.0069877

    When implementing Federated Learning (FL) in a cell-free network environment, user scheduling and resource allocation strategies are crucial for optimizing system time overhead, improving user reachability, and accelerating the FL convergence rate. To address the issue of uneven resource allocation, this study designs an optimization scheme that combines user scheduling, CPU processing frequency, and power allocation. This scheme aims to achieve fair resource allocation by maximizing the minimum user rate in the system, thus enhancing FL performance. The joint optimization problem is decomposed into two subproblems: user scheduling and power allocation. For user scheduling, this study proposes a greedy scheduling algorithm based on k-means clustering that comprehensively evaluates the channel conditions and data "value" of users and categorizes users into different groups. A personalized CPU processing frequency allocation plan is then developed for the users within each group based on their resource occupancy. Finally, by executing user scheduling independently within each group, user selection is performed efficiently and precisely, and its complexity is effectively reduced through early grouping. For power allocation, this study introduces a Bisection Method-based Power Allocation (BM-PA) algorithm. This algorithm not only considers fairness among users but also prioritizes resource-constrained users to ensure that they obtain superior resource allocation. The BM-PA algorithm achieves fast convergence of power allocation through a low-complexity iterative optimization process, significantly improving resource utilization efficiency without degrading system performance. A reasonable user scheduling strategy serves as the foundation for obtaining optimal solutions to the power allocation subproblem. The study adopts an alternating iteration method that optimizes each subproblem independently while accounting for the solution of the other. Over multiple rounds of iterative optimization, this interdependence ensures that power resources are allocated to the users who need them most or can use them most effectively, thus enhancing overall system performance. Simulation results show that, compared with the baseline algorithm, the proposed algorithm performs outstandingly in terms of downlink achievable rates: the average improvement reaches up to 103.34% under optimal conditions. The uplink achievable rates improve by up to 102.78%. Furthermore, the proposed algorithm saves 67.44% of the FL task training time on average compared with the baseline algorithm; in particular, when the FL model accuracy reaches 90%, the time overhead of the proposed algorithm is minimal.
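    The bisection idea behind max-min power allocation can be sketched in a few lines (an illustrative simplification, not the paper's BM-PA algorithm; the rate model and search interval are assumptions): find the largest common rate r such that the per-user powers needed to reach r = log2(1 + p_i * g_i) fit within the total budget P.

    ```python
    # Sketch of bisection over the common (minimum) user rate. Assumes a
    # normalized-noise rate model r_i = log2(1 + p_i * g_i); the search
    # interval [0, 30] bit/s/Hz is an assumed upper bound, not from the paper.
    import math

    def max_min_rate(gains, P, iters=60):
        lo, hi = 0.0, 30.0
        for _ in range(iters):
            r = (lo + hi) / 2
            # power user i needs to reach rate r: p_i = (2^r - 1) / g_i
            need = sum((2 ** r - 1) / g for g in gains)
            if need <= P:
                lo = r      # r is feasible for everyone; try a higher rate
            else:
                hi = r      # budget exceeded; lower the target rate
        return lo
    ```

    Because feasibility is monotone in r, bisection converges geometrically, which is why this style of power allocation is cheap enough to re-run inside an alternating optimization loop.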

  • AI-Enabled Vehicular Edge Computing
    QIN Minhao, SUN Weiwei
    Computer Engineering. 2025, 51(9): 1-13. https://doi.org/10.19678/j.issn.1000-3428.0069416

    Traffic signal control plays an important role in alleviating traffic congestion and improving urban commuting efficiency. In recent years, breakthroughs have been made in traffic signal control algorithms based on deep reinforcement learning using real-time traffic data as input. However, traffic data in real-world scenarios often involve data distortion. Traditional solutions use reinforcement learning algorithms to control signal lights after repairing distorted data. However, on the one hand, the dynamic phases of traffic signals introduce additional uncertainty into distortion repair, and on the other hand, distortion repair is difficult to combine with deep reinforcement learning frameworks to improve performance. To address these issues, a distorted traffic signal control model based on hidden state prediction, HCRL, is proposed. The HCRL model comprises encoding, control, and encoding prediction sub-models. By introducing a hidden state representation mechanism for signalized intersections, the HCRL model can adapt better to deep reinforcement learning frameworks and effectively express the control state of signalized intersections. In addition, the HCRL model uses a special transfer training method to avoid data distortion interference in the control sub-model. Two real datasets are used to verify the impact of data distortion on intelligent signal control algorithms. The experimental results show that the HCRL model outperforms the distortion-completion-based traffic signal control models in all distortion scenarios and distortion rates; further, it demonstrates strong robustness against data distortion when compared with other baseline models.

  • Research Hotspots and Reviews
    ZHANG Jin, CHEN Zhu, CHEN Zhaoyun, SHI Yang, CHEN Guanjun
    Computer Engineering. 2025, 51(7): 1-11. https://doi.org/10.19678/j.issn.1000-3428.0068870

    Simulators play an indispensable role in research and development across an array of scientific fields. Particularly in architectural design, simulators provide a secure and cost-effective virtual environment, enabling researchers to conduct rapid experimental analyses and evaluations. Simultaneously, simulators accelerate the chip design and verification processes, thereby conserving time and reducing resource expenditure. However, as processor architectural designs evolve, particularly with the flourishing diversity of dedicated processors, the key role played by simulators in providing substantial feedback for architectural design exploration has gained prominence. This review provides an overview of the current developments and applications of architectural simulators, highlighting a few illustrative examples. Analyzing the techniques employed by simulators dedicated to various processors allows a deeper understanding of the focal points and technical complexities of different architectures. Moreover, this review offers speculative assessments and critiques of vital aspects of future architectural simulator development, aiming to forecast their prospects in the field of processor design research.

  • Research Hotspots and Reviews
    PENG Long, GAO Yuanjun, LIU Xiaodong, YU Jie
    Computer Engineering. 2025, 51(10): 37-52. https://doi.org/10.19678/j.issn.1000-3428.0069708

    Advances in computational power and network technologies have driven robots toward miniaturization, swarm intelligence, and autonomous capabilities. Robot software deployed on robotic hardware must integrate diverse modules from low-level device drivers and controls to high-level motion planning and reasoning, resulting in increasingly complex architectures. A communication and programming framework for multi-robot systems—focusing on standardization, modularization, and platformization—can alleviate the complexity of programming robotic software. The development trends in robotic software and hardware architecture show that a swarm robotic system is a multi-domain, heterogeneous, and distributed system composed of computing nodes, actuators, sensors, and other hardware devices interconnected through wired or wireless networks. The heterogeneity of hardware devices makes it difficult to integrate software components into a single framework. This survey summarizes and analyzes existing robotic communication frameworks in terms of ease of use and portability, comparing their core features, such as programming models, heterogeneous hardware support, communication and coordination mechanisms between components, and programming languages. The survey then highlights the technical trends of advanced topics such as real-time virtualization, component orchestration, and fault tolerance. Moreover, this survey focuses on building a next-generation framework on a meta Operating System (OS) foundation, aiming to build a ubiquitous and integrated multi-robot software architecture for human-machine-object interactions.

  • Research Hotspots and Reviews
    DI Qinbo, CHEN Shaoli, SHI Liangren
    Computer Engineering. 2025, 51(11): 35-44. https://doi.org/10.19678/j.issn.1000-3428.0069780

    As multivariate time series data become increasingly prevalent across various industries, anomaly detection methods that can ensure the stable operation and security of systems have become crucial. Owing to the inherent complexity and dynamic nature of multivariate time series data, higher demands are placed on anomaly detection algorithms. To address the inefficiencies of existing anomaly detection methods in processing high-dimensional data with complex variable relations, this study proposes an anomaly detection algorithm for multivariate time series data based on Graph Neural Networks (GNNs) and a diffusion model, named GRD. By leveraging node embedding and graph structure learning, the GRD algorithm proficiently captures the relations between variables and refines features through a Gated Recurrent Unit (GRU) and a Denoising Diffusion Probabilistic Model (DDPM), thereby facilitating precise anomaly detection. Traditional assessments often employ a Point-Adjustment (PA) protocol that adjusts predictions before scoring, substantially overestimating an algorithm's capability. To reflect model performance realistically, this work adopts a new evaluation protocol along with new metrics. The GRD algorithm demonstrates F1@k scores of 0.741 4, 0.801 7, and 0.767 1 on three public datasets. These results indicate that the GRD algorithm consistently outperforms existing methods, with notable advantages in processing high-dimensional data, underscoring its practicality and robustness in real-world anomaly detection applications.
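    The inflation introduced by point adjustment is easy to demonstrate on toy data (this sketch follows the commonly described PA protocol, not the paper's code): if any single point inside a true anomaly segment is flagged, the whole segment is counted as detected.

    ```python
    # Toy illustration of the Point-Adjustment (PA) protocol: one hit
    # anywhere inside a labeled anomaly segment marks the entire segment
    # as detected, which inflates the resulting F1 score.
    def point_adjust(labels, preds):
        adjusted = list(preds)
        n, i = len(labels), 0
        while i < n:
            if labels[i] == 1:
                j = i
                while j < n and labels[j] == 1:
                    j += 1                   # [i, j) is one anomaly segment
                if any(preds[i:j]):          # one hit anywhere in the segment
                    for k in range(i, j):    # ...credits the whole segment
                        adjusted[k] = 1
                i = j
            else:
                i += 1
        return adjusted

    def f1(labels, preds):
        tp = sum(l and p for l, p in zip(labels, preds))
        fp = sum((not l) and p for l, p in zip(labels, preds))
        fn = sum(l and (not p) for l, p in zip(labels, preds))
        return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    ```

    For labels `[0,1,1,1,1,0]` and predictions `[0,0,0,1,0,0]`, the raw F1 is 0.4, but after point adjustment it jumps to 1.0, which is exactly the overestimation the new protocol avoids.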

  • Service Computing in the Era of Large Language Models
    ZHANG Junna, WANG Hongzun, DING Chuntao
    Computer Engineering. 2026, 52(1): 33-60. https://doi.org/10.19678/j.issn.1000-3428.0252721

    Post-Training Quantization (PTQ) is an efficient model compression method that converts the parameters of high-precision floating-point models into low-bit integer representations without requiring retraining, using only a small amount of unlabeled calibration data. This method significantly reduces storage and computational overhead while maximizing the retention of the original model's inference accuracy; therefore, it is widely recognized and adopted in both academia and industry. This paper systematically summarizes the progress of research on PTQ from four dimensions: quantization steps, method classification, tool ecosystem, and application advancements. First, a clear framework for the quantization process is constructed, covering steps such as dynamic range statistics, quantization parameter calculation, weight and activation quantization, error optimization, and model generation. Second, a complete classification system for quantization methods is proposed, which includes quantization granularity, bit width, calibration methods, and structure-guided quantization. Third, the tool ecosystem supporting the large-scale application of PTQ is analyzed, and its value in hardware adaptation and engineering deployment is discussed. Finally, this paper summarizes the progress in the integration and application of PTQ methods and highlights practical challenges, particularly those related to cross-modal consistency, extremely low-bit semantic collapse, and hardware adaptation. These challenges not only reveal the limitations of current technologies but also provide important directions for future research. This review provides a reference framework for PTQ methods in academia and industry, thereby facilitating the widespread application of artificial intelligence in resource-constrained scenarios.
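    The core quantization step the review surveys can be sketched as affine (scale/zero-point) quantization (an illustrative minimal form; bit width, rounding, and calibration details vary across the surveyed methods):

    ```python
    # Minimal sketch of affine post-training quantization: map floats to
    # b-bit unsigned integers with a scale and zero-point computed from the
    # observed dynamic range, then dequantize back to approximate floats.
    def quantize(xs, bits=8):
        lo, hi = min(xs), max(xs)
        qmax = 2 ** bits - 1
        scale = (hi - lo) / qmax or 1.0      # guard against a constant tensor
        zero = round(-lo / scale)            # integer that represents 0.0
        q = [max(0, min(qmax, round(x / scale) + zero)) for x in xs]
        return q, scale, zero

    def dequantize(q, scale, zero):
        return [(v - zero) * scale for v in q]
    ```

    Each reconstructed value differs from the original by at most about one quantization step (the scale), which is why 8-bit PTQ typically preserves accuracy while cutting storage by roughly 4x versus 32-bit floats.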

  • Artificial Intelligence and Pattern Recognition
    YUAN Yinghua, JIN Yingran, GAO Yun
    Computer Engineering. 2025, 51(12): 96-108. https://doi.org/10.19678/j.issn.1000-3428.0069871

    The Siamese tracking network is a popular target tracking framework that includes three modules: backbone, fusion, and positioning networks. The Transformer is a relatively new and effective implementation method for fusion network modules. The encoder and decoder of the Transformer use a self-attention mechanism to enhance the features of the Convolutional Neural Network (CNN). However, the self-attention mechanism can only enhance features in the spatial dimension without considering feature enhancement in the channel dimension. To enable the self-attention network of the Transformer to enhance features in both the spatial and channel dimensions and provide accurate correlation information for the target localization network, a Transformer tracker based on dual-dimensional feature enhancement is proposed to improve the Transformer fusion network. First, using the third- and fourth-stage features of the backbone network as inputs, channel dimension feature enhancement is performed via CAE-Net in the self-attention module of the Transformer encoder and decoder to enhance the importance of the channel. Subsequently, two-stage feature-weighted fusion and linear transformation are performed via SAE-Net to obtain the self-attention factors Q, K, and V. Finally, spatial dimension feature enhancement is performed via a self-attention operation. Experiments conducted on five widely used public benchmark datasets reveal that the improved Transformer feature fusion module can improve the tracking performance of the tracker with minimal reduction in tracking speed.

  • Research Hotspots and Reviews
    LI Jiangxin, WANG Peng, WANG Wei
    Computer Engineering. 2025, 51(7): 47-58. https://doi.org/10.19678/j.issn.1000-3428.0069406

    Industrial time-series forecasting is critical for optimizing production processes and enhancing decision-making. Existing deep learning-based methods often underperform in this context due to a lack of domain knowledge. Prior studies have proposed using mechanistic models to guide deep learning; however, these approaches typically consider only a single mechanistic model, ignoring scenarios with multiple time-series prediction mechanisms in industrial processes and the inherent complexity of industrial time-series (e.g., multiscale dynamics and nonlinearity). To address this issue, this study proposes a Multi-Mechanism-guided Deep Learning for Industrial Time-series Forecasting (M-MDLITF) framework based on attention mechanisms. This framework embeds multiple mechanistic models into a deep industrial time-series prediction network to guide training and integrate the strengths of different mechanisms by focusing on final predictions. As an instantiation of the M-MDLITF, the Multi-mechanism Deep Wiener (M-DeepWiener) method employs contextual sliding windows and a Transformer-encoder architecture to capture complex patterns in industrial time-series. Experimental results from a simulated dataset and two real-world datasets demonstrate that M-DeepWiener achieves high computational efficiency and robustness. It significantly outperforms the single-mechanism Deep Wiener (DeepWiener), classical Wiener mechanistic models, and purely data-driven methods, reducing the prediction error by 20% compared to DeepWiener-M1 on the simulated dataset.
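    The attention-based fusion of multiple mechanistic predictions can be sketched as a softmax-weighted sum (an illustrative simplification; in the actual framework the attention logits would be produced by a learned network, and the values here are toy numbers):

    ```python
    # Hedged sketch of multi-mechanism fusion: softmax over per-mechanism
    # attention logits yields weights, and the final forecast is the
    # weighted sum of the mechanistic models' predictions.
    import math

    def attention_fuse(predictions, scores):
        """predictions: per-mechanism forecasts; scores: raw attention
        logits (in practice, output by a trained scoring network)."""
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        fused = sum(w * p for w, p in zip(weights, predictions))
        return fused, weights
    ```

    With equal logits the mechanisms are averaged; as training sharpens the logits, the fusion leans toward whichever mechanism fits the current operating regime best.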

  • Research Hotspots and Reviews
    LU Yue, ZHOU Xiangyu, ZHANG Shizhou, LIANG Guoqiang, XING Yinghui, CHENG De, ZHANG Yanning
    Computer Engineering. 2025, 51(10): 1-17. https://doi.org/10.19678/j.issn.1000-3428.0070575

    Traditional machine learning algorithms perform well only when the training and testing sets are identically distributed. They cannot perform incremental learning for new categories or tasks that were not present in the original training set. Continual learning enables models to learn new knowledge adaptively while preventing the forgetting of old tasks. However, continual learning methods still face challenges related to computational and storage overhead and performance stability. Recent advances in pre-training models have provided new research directions for continual learning, which are promising for further performance improvements. This survey summarizes existing pre-training-based continual learning methods. According to the anti-forgetting mechanism, they are categorized into five types: methods based on prompt pools, methods with slow parameter updating, methods based on backbone branch extension, methods based on parameter regularization, and methods based on classifier design. Additionally, these methods are classified according to the number of phases, fine-tuning approaches, and use of language modalities. Subsequently, the overall challenges of continual learning methods are analyzed, and the applicable scenarios and limitations of various continual learning methods are summarized. The main characteristics and advantages of each method are also outlined. Comprehensive experiments are conducted on multiple benchmarks, followed by in-depth discussions on the performance gaps among the different methods. Finally, the survey discusses research trends in pre-training-based continual learning methods.

  • Artificial Intelligence and Pattern Recognition
    PENG Juhong, ZHANG Chi, GAO Qian, ZHANG Guangming, TAN Donghua, ZHAO Mingjun
    Computer Engineering. 2025, 51(7): 152-160. https://doi.org/10.19678/j.issn.1000-3428.0069283

    Steel surface defect detection technology in industrial scenarios is hindered by low detection accuracy and slow convergence speed. To address these issues, this study presents an improved YOLOv8 algorithm, namely YOLOv8n-MDC. First, a Multi-scale Cross-fusion Network (MCN) is added to the backbone network. This establishes closer connections between the feature layers, promoting uniform information transmission and reducing semantic information loss during cross-layer feature fusion, thereby enhancing the ability of the model to perceive steel defects. Second, deformable convolution is introduced in the module to adaptively change the shape and position of the convolution kernel, enabling a more flexible capture of the edge features of irregular defects, reducing information loss, and improving detection accuracy. Finally, a Coordinate Attention (CA) mechanism is added to embed position information into channel attention, solving the problem of position information loss and enabling the model to perceive the position and morphological features of defects, thereby enhancing detection precision and stability. Experimental results on the NEU-DET dataset show that the YOLOv8n-MDC algorithm achieves mAP@0.5 of 81.0%, which is 4.2 percentage points higher than that of the original baseline network. The algorithm has a faster convergence speed and higher accuracy; therefore, it meets the requirements of practical industrial production.

  • Graphics and Image Processing
    WANG Shumeng, XU Huiying, ZHU Xinzhong, HUANG Xiao, SONG Jie, LI Yi
    Computer Engineering. 2025, 51(9): 280-293. https://doi.org/10.19678/j.issn.1000-3428.0069353

    In Unmanned Aerial Vehicle (UAV) aerial photography, targets are usually small, densely distributed, and visually indistinct, and object scales vary greatly; consequently, missed and false detections easily occur in object detection. To solve these problems, a lightweight small object detection algorithm based on improved YOLOv8n, namely PECS-YOLO, is proposed for aerial photography. By adding a P2 small object detection layer in the Neck part, the algorithm combines shallow and deep feature maps to better capture the details of small targets. A lightweight convolution, PartialConv, is introduced into a new Cross Stage Partial PartialConv (CSPPC) structure that replaces Concatenation with Fusion (C2f) in the Neck network to make the model lightweight. A Spatial Pyramid Pooling with Efficient Layer Aggregation Network (SPPELAN) module is used to capture small object features effectively. By adding a Squeeze-and-Excitation (SE) attention mechanism in front of each detection head in the Neck part, the network can better focus on useful channels and reduce the interference of background noise on small object detection tasks in complex environments. Finally, EfficiCIoU is used as the bounding box loss function, which also accounts for differences in bounding box shape, enhancing the detection ability of the model for small targets. Experimental results show that, compared with YOLOv8n, the mean Average Precision at an Intersection over Union (IoU) threshold of 0.5 (mAP@0.5) and at IoU 0.5∶0.95 (mAP@0.5∶0.95) of the PECS-YOLO object detection algorithm on the VisDrone2019-DET dataset increase by 3.5% and 3.7%, respectively, the number of parameters is reduced by about 25.7%, and the detection speed increases by about 65.2%. In summary, the PECS-YOLO model is suitable for small object detection in UAV aerial photography.
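    The IoU quantity underlying both the mAP@0.5 metric and IoU-based losses such as EfficiCIoU reduces to a few lines (plain IoU only; the additional center-distance and shape terms of CIoU/EfficiCIoU are omitted in this sketch):

    ```python
    # Plain Intersection over Union between two axis-aligned boxes given as
    # (x1, y1, x2, y2). mAP@0.5 counts a detection as correct when its IoU
    # with a ground-truth box exceeds 0.5.
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        return inter / union if union else 0.0
    ```

    For small objects even a modest localization offset sharply reduces IoU, which is why shape-aware refinements of this loss matter most in the small-object regime.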

  • FENG Guoping, CHEN Zhijian, LIN Zhiyu, HONG Liang
    Accepted: 2025-10-11
    This study explores automatic term recognition in the electric power domain, addressing challenges faced during its digital transformation, such as data silos and knowledge utilization. To improve the identification of specialized and new terms, a dynamic graph-assisted method combining large and small models is proposed. The approach enhances recall and precision through candidate term extraction and term classification. An initial knowledge graph is built using existing term databases. Target text-related nodes are queried and filtered with term features. A retrieval-augmented large language model extracts candidate terms, followed by adversarial training to develop a deep learning model for term classification. The dynamic term knowledge graph is iteratively updated based on classification results, forming a positive feedback loop. Experimental results show that the method's accuracy, recall, and F1 score improve over iterations, reaching 0.8647, 0.8565, and 0.8542, respectively, demonstrating superior performance compared to other term recognition methods.

  • Research Hotspots and Reviews
    LIU Yanghong, FU Yangyouran, DONG Xingping
    Computer Engineering. 2025, 51(10): 18-26. https://doi.org/10.19678/j.issn.1000-3428.0070569

    The generation of High-Definition (HD) environmental semantic maps is indispensable for environmental perception and decision-making in autonomous driving systems. To address the modality discrepancy between cameras and LiDARs in perception tasks, this paper proposes an innovative multimodal fusion framework, HDMapFusion, which significantly improves semantic map generation accuracy via feature-level fusion. Unlike traditional methods that directly fuse raw sensor data, the proposed approach transforms both camera image and LiDAR point cloud features into a unified Bird's-Eye-View (BEV) representation, enabling physically interpretable fusion of multimodal information within a consistent geometric coordinate system. Specifically, this method first extracts visual features from camera images and 3D structural features from LiDAR point clouds using deep learning networks. Subsequently, a differentiable perspective transformation module converts the front-view image features into a BEV space, and the LiDAR point clouds are projected into the same BEV space through voxelization. Building on this, an attention-based feature fusion module is designed to adaptively integrate the two modalities using weighted aggregation. Finally, a semantic decoder generates high-precision semantic maps containing lane lines, pedestrian crossings, road boundary lines, and other key elements. Systematic experiments conducted on the nuScenes benchmark dataset demonstrate that HDMapFusion significantly outperforms existing baseline methods in terms of HD map generation accuracy. These results validate the effectiveness and superiority of the proposed method, offering a novel solution to multimodal fusion in autonomous driving perception.

  • Graphics and Image Processing
    WANG Guoming, JIA Daiwang
    Computer Engineering. 2025, 51(12): 294-303. https://doi.org/10.19678/j.issn.1000-3428.0070027

    Deep learning-based object detection has significantly improved the detection of medium and large targets. However, when detecting small objects, traditional algorithms often face challenges such as missed detections and false positives owing to the inherent issues of small scale and complex backgrounds. Therefore, this study aims to enhance the accuracy of small object detection by improving the YOLOv8 model. First, the convolutional module in the backbone is replaced with the RFAConv module, which enhances the ability of the model to process complex images. Second, a Mixed Local Channel Attention (MLCA) mechanism is introduced in the neck part, allowing the model to fuse features from different layers more efficiently while maintaining computational efficiency. Third, the Detect head of YOLOv8 is replaced with the Detect_FASFF head to address the inconsistency between different feature scales and improve the ability of the model to detect small objects. Finally, the Complete Intersection over Union (CIoU) loss function is replaced with the Focaler-IoU loss function, enabling the model to focus more on small objects that are difficult to locate precisely. Experimental results show that the improved model increases mAP@0.5 by 4.8 percentage points and mAP@0.5:0.95 by 3.0 percentage points on the FloW-Img dataset, in which small objects are sparse. On the VisDrone2019 dataset, which has a high density of small objects, mAP@0.5 increases by 5.9 percentage points and mAP@0.5:0.95 improves by 4.0 percentage points. In addition, generalization comparison experiments are conducted on the low-altitude dataset AU-AIR and the pedestrian-dense detection dataset WiderPerson. The optimized model significantly improves the accuracy of small object detection compared with the original model and expands its applicability.

  • Research Hotspots and Reviews
    LIAO Niuyu, TIAN Yun, LI Yansong, XUE Haifeng, DU Changkun, ZHANG Guohua
    Computer Engineering. 2025, 51(12): 1-17. https://doi.org/10.19678/j.issn.1000-3428.0253230

    In recent years, Large Language Models (LLMs) such as GPT, LLaMA, Qwen, and DeepSeek have achieved significant breakthroughs in natural language processing, computer vision, multimodal learning, and other fields. However, constrained by factors such as their reasoning mechanisms, parameter scales, and the inherent knowledge contained within their training data, these models often suffer from issues such as "hallucinations" (inaccurate answers and even factual deviations) when handling complex tasks, addressing questions from professional domains, or generating time-sensitive content. These limitations severely hinder their application in high-reliability scenarios. The "tool learning" paradigm is attracting increasing attention as a promising solution to these capability bottlenecks. Its primary objective is to enable LLMs to understand and utilize external tools to complete specific tasks. By invoking external tools, such as databases, search engines, and mathematical tools, LLMs can transcend their parameterized knowledge; enhance their reasoning, decision-making, and execution capabilities; and mitigate hallucination problems. This paper systematically reviews the development context and technical advancements in LLM tool learning, analyzes the expansion of LLM capabilities through tools, summarizes tool invocation mechanisms ranging from in-context learning to fine-tuning training, and discusses key issues including performance optimization and adaptive tool generation. The paper also analyzes evaluation methods for LLM tool invocation, summarizes the current challenges in tool learning, and outlines future research directions.

  • Frontier Perspectives and Reviews
    LI Yiyang, LU Shenglian, WANG Jijie, CHEN Ming
    Computer Engineering. 2026, 52(4): 62-81. https://doi.org/10.19678/j.issn.1000-3428.0069312

    Convolutional Neural Networks (CNNs) are widely used in the field of object detection, earning widespread acclaim in scholarly circles for their precision and scalability. Their success has spawned numerous notable models, including the Region-based Convolutional Neural Network (R-CNN) family (such as Fast R-CNN and Faster R-CNN) and the You Only Look Once (YOLO) series. After the success of Transformers in natural language processing, researchers began exploring their application in computer vision, leading to visual backbone networks such as the Vision Transformer (ViT) and Swin Transformer. In 2020, a Facebook research team unveiled the DEtection TRansformer (DETR), an end-to-end object detection algorithm based on Transformers, designed to minimize the need for prior knowledge and postprocessing in object detection tasks. Despite the promise shown by DETR in object detection, it has limitations including slow convergence, relatively low accuracy, and the ambiguous physical significance of its object queries. These issues have spurred a wave of research aimed at refining and enhancing the algorithm. This paper collates, scrutinizes, and synthesizes the various efforts to improve DETR, assessing their respective merits and demerits. Furthermore, it presents a comprehensive overview of state-of-the-art research and specialized application domains that employ DETR and concludes with a prospective analysis of the future role of DETR in computer vision.
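What makes DETR end-to-end is its set prediction step: the fixed set of query outputs is matched one-to-one to ground-truth objects by minimizing a total pairwise cost (class score plus box terms), removing the need for NMS postprocessing. A toy sketch of that matching; DETR uses the Hungarian algorithm, while brute-force enumeration suffices here for small n.

```python
from itertools import permutations

def optimal_assignment(cost):
    """One-to-one matching of predictions to targets minimizing total
    pairwise cost. `cost` is an n x n matrix (e.g. classification plus
    box-distance terms); returns (minimal total cost, assignment tuple
    mapping prediction i to target assignment[i])."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return sum(cost[i][best[i]] for i in range(n)), best
```

In practice the O(n!) enumeration is replaced by `scipy.optimize.linear_sum_assignment`, which solves the same problem in polynomial time.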

  • Research Hotspots and Reviews
    ZHAO Kai, HU Yuhuan, YAN Junqiao, BI Xuehua, ZHANG Linlin
    Computer Engineering. 2025, 51(8): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0069147

    Blockchain, as a distributed and trusted database, has gained significant attention in academic and industrial circles for its effective application in the domain of digital copyright protection. Traditional digital copyright protection technologies suffer from issues such as difficulties in tracking infringements, complexities in copyright transactions, and inadequate protection of legitimate rights, which severely hamper the development of digital copyright protection endeavors. The immutability, traceability, and decentralization inherent in blockchain technology provide a highly reliable, transparent, and secure solution to mitigate the risks associated with digital copyright infringement. This overview starts with an introduction to the fundamental principles of blockchain technology. It then discusses the latest research findings on the integration of blockchain with traditional copyright protection technologies to address the problems inherent in traditional copyright protection schemes. Further, it evaluates the practical applications and potential of blockchain, emphasizing its positive impact on the copyright protection ecosystem. Finally, this overview delves into the challenges and future trends related to blockchain-based copyright protection, ultimately aiming to establish a more robust and sustainable blockchain copyright protection system.
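The immutability and traceability properties that make blockchain attractive for copyright records reduce, at their core, to hash chaining: each record commits to its predecessor's hash, so any retroactive edit breaks every subsequent link. A toy sketch of that core only; a real system would add consensus, digital signatures, and timestamps.

```python
import hashlib
import json

def register_work(chain, fingerprint, owner):
    """Append a copyright record whose hash commits to the previous
    record (fingerprint would be, e.g., a content hash of the work)."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = {"fingerprint": fingerprint, "owner": owner, "prev": prev}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})

def verify(chain):
    """Recompute every hash and link; returns False on any tampering."""
    prev = "0" * 64
    for rec in chain:
        body = {k: v for k, v in rec.items() if k != "hash"}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if body["prev"] != prev or digest != rec["hash"]:
            return False
        prev = rec["hash"]
    return True
```

Rewriting any earlier record, such as its owner field, invalidates the whole suffix of the chain, which is what makes infringement tracking auditable.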

  • Artificial Intelligence and Pattern Recognition
    WU Donghui, WANG Jinfeng, QIU Sen, LIU Guozhi
    Computer Engineering. 2025, 51(8): 107-119. https://doi.org/10.19678/j.issn.1000-3428.0070202
    Sign language recognition has received widespread attention in recent years. However, existing sign language recognition models face challenges such as long training times and high computational costs. To address these issues, this study proposes the EWBiLSTM-ATT model, a hybrid deep learning method that integrates an attention mechanism with an Expanded Wide-kernel Deep Convolutional Neural Network (EWDCNN) and a Bidirectional Long Short-Term Memory (BiLSTM) network, based on data obtained from a wearable data glove. First, widening the first convolutional layer reduces the model parameter count, which enhances computational speed, and deepening the EWDCNN convolutional layers improves the model's ability to automatically extract sign language features. Second, BiLSTM is introduced as a temporal model to capture the dynamic temporal information of sign language sequences, effectively handling temporal relationships in the sensor data. Finally, an attention mechanism learns a parameter matrix that assigns different weights to the hidden states of the BiLSTM, allowing the model to automatically select key time segments related to gesture actions by computing attention weights for each time step and forming their weighted sum. This study uses the STM32F103 as the main control module and builds a data glove sign language acquisition platform with MPU6050 and Flex Sensor 4.5 sensors as the core components. Sixteen dynamic sign language actions are selected to construct the GR-Dataset for model training. Under the same experimental conditions, the recognition rate of the EWBiLSTM-ATT model is 99.40%, which is 10.36, 8.41, 3.87, and 3.05 percentage points higher than that of the CLT-net, CNN-GRU, CLA-net, and CNN-GRU-ATT models, respectively. Further, its total training time is reduced to 57%, 61%, 55%, and 56% of that of the comparison models, respectively.

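The attention step can be sketched as softmax-weighted pooling over per-time-step hidden states. This pure-Python toy uses a single scoring vector as a stand-in for the learned parameter matrix; the shapes and names are illustrative.

```python
import math

def attention_pool(hidden_states, score_w):
    """Score each time step's hidden state, softmax the scores into
    attention weights, and return the weighted sum of hidden states,
    so time segments relevant to the gesture dominate the pooled
    feature. hidden_states: T vectors of equal length."""
    scores = [sum(w * h for w, h in zip(score_w, hs)) for hs in hidden_states]
    m = max(scores)                          # numerically stable softmax
    exp = [math.exp(s - m) for s in scores]
    weights = [e / sum(exp) for e in exp]
    pooled = [sum(wt * hs[d] for wt, hs in zip(weights, hidden_states))
              for d in range(len(hidden_states[0]))]
    return pooled, weights
```

A time step whose hidden state aligns with the scoring vector receives a larger weight, which is how the model "selects" the key segments of a gesture.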
  • Frontier Perspectives and Reviews
    LIAO Yong, HAN Xiaojin, LIU Jinlin, WANG Hao
    Computer Engineering. 2026, 52(3): 41-61. https://doi.org/10.19678/j.issn.1000-3428.0069925

    Artificial intelligence has made remarkable progress across many fields, encouraging countries to attach great importance to its research and development. However, the rapid development of artificial intelligence has also brought about a series of problems and threats, and overreliance on and blind trust in such models can lead to serious risks. Therefore, explainable artificial intelligence has become a key element in building trusted and transparent intelligent systems, and its research and development requires immediate attention. This survey comprehensively summarizes the research progress on explainable artificial intelligence in China and abroad from multiple dimensions and levels. Based on current research results in the industry, this survey subdivides the key technologies of explainable artificial intelligence into four categories: interpretation models, interpretation methods, safety testing, and experimental verification, with the aim of clarifying the technical focus and development direction of each field. Furthermore, the survey explores specific application examples of explainable artificial intelligence across key industry sectors, including but not limited to education, healthcare, finance, autonomous driving, and justice, demonstrating the significant role it plays in enhancing decision-making transparency. Finally, this survey provides an in-depth analysis of the major technical challenges of explainable artificial intelligence and presents future development trends, in addition to a special investigation and in-depth analysis of the interpretability of large models, which has attracted considerable attention recently.

  • Frontier Perspectives and Reviews
    QIN Yingxin, ZHANG Kejia, PAN Haiwei, JU Yahao
    Computer Engineering. 2026, 52(2): 46-68. https://doi.org/10.19678/j.issn.1000-3428.0069826

    Deep learning has driven the development of artificial intelligence and is widely used in computer vision. It provides breakthroughs and remarkable results in complex tasks such as image recognition, object detection, object tracking, and face recognition, demonstrating excellent recognition and prediction capabilities. However, vulnerabilities and loopholes in deep learning models have gradually been exposed. Deep learning techniques, represented by convolutional neural networks, are extremely sensitive to well-designed adversarial examples, which can easily compromise the security and privacy of the models. This paper first summarizes the concept of adversarial attacks, the reasons adversarial examples arise, and related terms. It outlines several types of classical adversarial attack strategies in the digital and physical domains and analyzes their advantages and disadvantages. Second, it focuses on computer vision and summarizes the latest research on adversarial attacks in tasks such as object detection, face recognition, object tracking, monocular depth estimation, and optical flow estimation, from both the digital and physical domains, as well as the datasets commonly used in such studies. It also briefly introduces the current state of adversarial example defense and detection methods, summarizes their advantages and disadvantages, and describes applications of adversarial example defense for various visual tasks. Finally, based on this summary of adversarial attack methods, it explores and analyzes the deficiencies and challenges of existing computer vision adversarial attacks.
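Among the classical digital-domain strategies such surveys cover is the Fast Gradient Sign Method (FGSM), which steps every input feature by a small amount in the sign direction of the loss gradient. A toy sketch on a logistic model, where the cross-entropy input gradient has the closed form (p - y)·w; a deep network would obtain the gradient by backpropagation instead.

```python
import math

def fgsm_perturb(x, w, b, y, eps):
    """FGSM on a logistic model p = sigmoid(w.x + b): perturb each
    feature by eps * sign of the loss gradient, which for binary
    cross-entropy with label y is (p - y) * w."""
    p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]
```

The perturbation is bounded by eps per feature (an L-infinity budget), which is what makes such examples visually near-indistinguishable from the original input.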

  • Research Hotspots and Reviews
    PANG Xin, GE Fengpei, LI Yanling
    Computer Engineering. 2025, 51(6): 1-19. https://doi.org/10.19678/j.issn.1000-3428.0069005

    Acoustic Scene Classification (ASC) aims to enable computers to simulate the human auditory system in the task of recognizing various acoustic environments, which is a challenging task in the field of computer audition. With rapid advancements in intelligent audio processing technologies and neural network learning algorithms, a series of new algorithms and technologies for ASC have emerged in recent years. To comprehensively present the technological development trajectory and evolution in this field, this review systematically examines both early work and recent developments in ASC, providing a thorough overview of the field. This review first describes application scenarios and the challenges encountered in ASC and then details the mainstream frameworks in ASC, with a focus on the application of deep learning algorithms in this domain. Subsequently, it systematically summarizes frontier explorations, extension tasks, and publicly available datasets in ASC and finally discusses the prospects for future development trends in ASC.

  • Artificial Intelligence and Pattern Recognition
    WANG Shuai, SHI Yancui
    Computer Engineering. 2025, 51(8): 190-202. https://doi.org/10.19678/j.issn.1000-3428.0069636

    Sequence recommendation algorithms dynamically model the user's historical behavior to predict the content they may be interested in. This study focuses on the application of contrastive Self-Supervised Learning (SSL) in sequence recommendation, enhancing the model's representation ability in sparse data scenarios by designing effective self-supervised signals. First, a personalized data augmentation method incorporating user preferences is proposed to address the noise introduced by random data augmentation. This method guides the augmentation process based on user ratings and combines different augmentation methods for short and long sequences to generate augmented sequences that align with user preferences. Second, a mixed-augmentation training approach is designed to address imbalanced feature learning during training. In the early stages of training, augmented sequences are generated using randomly selected methods to enhance model performance and generalization. In the later stages, augmented sequences with high similarity to the original sequences are selected so that the model can comprehensively learn the actual preferences and behavior patterns of users. Finally, traditional sequence prediction objectives are combined with SSL objectives to infer user representations. Experimental verification is performed on the Beauty, Toys, and Sports datasets. Compared with the best result among the baseline models, the HR@5 of the proposed method increases by 6.61%, 3.11%, and 3.76%, and the NDCG@5 increases by 11.40%, 3.50%, and 2.16%, respectively, on these datasets. These results confirm the rationality and validity of the proposed method.
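The length-aware augmentation idea can be sketched with two standard sequence operators routed by history length; the ratios and threshold below are illustrative, not the paper's settings.

```python
import random

def crop(seq, ratio=0.6):
    """Keep a random contiguous sub-sequence (suited to long histories)."""
    n = max(1, int(len(seq) * ratio))
    start = random.randrange(len(seq) - n + 1)
    return seq[start:start + n]

def mask(seq, ratio=0.3, pad=0):
    """Replace a random subset of items with a padding token
    (length-preserving, so it suits short histories)."""
    hidden = set(random.sample(range(len(seq)), int(len(seq) * ratio)))
    return [pad if i in hidden else item for i, item in enumerate(seq)]

def augment(seq, short_len=5):
    """Route short sequences to masking and long ones to cropping."""
    return mask(seq) if len(seq) <= short_len else crop(seq)
```

Two independent augmentations of the same history form the positive pair for the contrastive SSL objective, while augmentations of other users' histories serve as negatives.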

  • Artificial Intelligence and Pattern Recognition
    CHANG Ru, LIU Yujie, SUN Haojie, DONG Liwei
    Computer Engineering. 2025, 51(9): 110-119. https://doi.org/10.19678/j.issn.1000-3428.0069711

    Aiming at non-affine nonlinear multi-Agent systems with full-state constraints, this study investigates an event-triggered formation control strategy with prescribed performance. The study proposes a barrier function-based nonlinear mapping technique to transform full-state constraints into the boundedness of mapped variables, thereby eliminating feasibility conditions in the controller design. It then introduces a shift function and a prescribed-time-convergent performance function to constrain the formation tracking error. Consequently, the restriction that the initial value of the formation tracking error must lie within the performance constraint range is eliminated, improving formation performance. The study also designs an event-triggered prescribed performance formation controller to guarantee that Agents achieve the desired formation within a prescribed time and maintain it thereafter, while significantly reducing controller-to-actuator signal transmissions. Lyapunov stability analysis proves that all signals in the system are semi-globally uniformly ultimately bounded, and the theoretical analysis rules out the possibility of Zeno behavior. Finally, numerical simulations verify the effectiveness of the proposed method.
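The transmission-saving mechanism can be written generically as a relative-threshold triggering rule; this is a standard template from the event-triggered control literature, not the paper's specific law, with design constants assumed to satisfy σ ∈ (0, 1) and μ > 0.

```latex
% Generic relative-threshold event-triggering template (illustrative):
% hold the last transmitted control signal between events, and transmit
% again only when the deviation exceeds a state-dependent bound.
u(t) = \hat{u}(t_k), \quad t \in [t_k, t_{k+1}),
\qquad
t_{k+1} = \inf\bigl\{\, t > t_k : \lvert \hat{u}(t) - \hat{u}(t_k) \rvert
          \ge \sigma \lvert u(t) \rvert + \mu \,\bigr\}
```

Under such a rule, excluding Zeno behavior amounts to proving a strictly positive lower bound on the inter-event times $t_{k+1} - t_k$.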

  • Research Hotspots and Reviews
    Mayilamu Musideke, GAO Yuxin, ZHANG Situo, FENG Ke, Abudukelimu Abulizi, Halidanmu Abudukelimu
    Computer Engineering. 2025, 51(8): 16-38. https://doi.org/10.19678/j.issn.1000-3428.0070619

    With the rapid advancement of general artificial intelligence technology, the application of foundational models across various fields has gained increasing attention. In image segmentation, the Segment Anything Model (SAM), as a foundational model, demonstrates notable advantages in enhancing image comprehension and processing efficiency. While SAM achieves state-of-the-art performance in image segmentation, further optimization of power consumption, computational efficiency, and cross-domain adaptability is required. This review provides an in-depth exploration of potential improvements to SAM across several crucial dimensions, such as enhancing speed and computational efficiency, improving accuracy and robustness, increasing adaptability and generalization, optimizing prompt engineering, and boosting data utilization and transfer learning capabilities. With these enhancements, SAM is expected to sustain high efficiency in highly complex tasks and better meet the requirements of various fields and application contexts. In addition, this review summarizes the practical applications of SAM in various fields, including medical imaging, remote sensing, and the mechanical industry, and discusses the suitability and challenges of the model in different scenarios. Moreover, this review provides a detailed overview of commonly used datasets and evaluation metrics in image segmentation. Through comparative experimental analyses, the impact of Vision Transformer (ViT) variants on the performance of SAM is assessed, along with performance evaluations of enhanced models such as EfficientSAM, EfficientViT-SAM, MobileSAM, and RobustSAM. The challenges faced by SAM and its improved models in real-world applications are also discussed, and future research directions are proposed. This review aims to provide researchers with a comprehensive understanding of the advancements and applications of SAM and its variants, offering insights that may inform the development of new models.

  • Artificial Intelligence and Pattern Recognition
    ZHOU Shixiang, YU Kai
    Computer Engineering. 2025, 51(11): 144-151. https://doi.org/10.19678/j.issn.1000-3428.0069721

    With the development of social networks, people increasingly express their emotions through multimodal data such as audio, text, and video. Traditional sentiment analysis methods struggle to process emotional expressions in short videos effectively, and existing multimodal sentiment analysis techniques face issues such as low accuracy and insufficient interaction between modalities. To address these problems, this study proposes a Multimodal Sentiment Analysis method based on Dense Co-Attention (DCA-MSA). First, it utilizes the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, the OpenFace 2.0 model, and the COVAREP tool to extract features from text, video, and audio, respectively. It then employs a Bidirectional Long Short-Term Memory (BiLSTM) network to model the temporal correlations within each feature stream separately. Finally, it integrates the features through a dense co-attention mechanism. The experimental results show that the proposed model is competitive in multimodal sentiment analysis tasks compared with several baseline models: on the CMU-MOSEI dataset, binary classification accuracy increases by up to 3.7 percentage points and the F1 score by up to 3.1 percentage points; on the CH-SIMS dataset, binary classification accuracy increases by up to 4.1 percentage points, three-class classification accuracy by up to 2.8 percentage points, and the F1 score by up to 3.9 percentage points.

  • Multimodal Information Fusion
    LI Jianlang, WU Xindian, CHEN Ling, YANG Bo, TANG Wensheng
    Computer Engineering. 2026, 52(2): 299-310. https://doi.org/10.19678/j.issn.1000-3428.0070113

    This study proposes a Common and Differential Cross-Attention Module-Bird's-Eye View (CDCAM-BEV) algorithm that fuses 4D millimeter-wave radar and vision to improve target detection accuracy for pedestrian and vehicle recognition and localization in autonomous driving scenarios. First, a radar cylinder network is designed to encode the 4D radar point cloud into a pseudo-image, and the monocular image is converted into a Bird's-Eye View (BEV) feature through Orthogonal Feature Transformation (OFT). Second, based on the cross-attention mechanism, a Common Information Extraction Module (CICAM) and a Differential Information Extraction Module (DICAM) are used to fully exploit the common and differential information between radar and images. Finally, a BEV feature fusion module is designed based on the CICAM and DICAM to achieve feature-level fusion of image and radar information in the BEV space. Experiments are conducted on the VOD dataset, comparing the CDCAM-BEV algorithm with five other 3D object detection algorithms. The results show that CDCAM-BEV achieves better detection performance across multiple modes: in the 3D mode, its average detection accuracy is 3.65 percentage points higher than that of the second-ranked Part-A2; in the BEV mode, it is 5.04 percentage points higher than that of the second-ranked PointPillars; and in the Average Orientation Similarity (AOS) mode, it is 2.62 percentage points higher than that of the second-ranked Part-A2. These results show that CDCAM-BEV effectively fuses image and 4D radar point cloud features, significantly improving the accuracy and reliability of object detection.

  • Artificial Intelligence and Pattern Recognition
    SHEN Sitong, WANG Yaowu, XIE Zaipeng, TANG Bin
    Computer Engineering. 2025, 51(6): 102-115. https://doi.org/10.19678/j.issn.1000-3428.0070739

    Multi-Agent Reinforcement Learning (MARL) plays a crucial role in solving complex cooperative tasks. However, traditional methods face significant limitations under dynamic environments and information nonstationarity. To address these challenges, this paper proposes a Role learning-based Multi-Agent reinforcement learning framework (RoMAC). The framework employs role division based on action attributes and uses a role assignment network to dynamically allocate roles to agents, thereby enhancing the efficiency of multi-agent collaboration. It adopts a hierarchical communication design comprising inter-role communication based on attention mechanisms and inter-agent communication guided by mutual information. In inter-role communication, attention mechanisms generate efficient messages for coordination between role delegates; in inter-agent communication, mutual information generates targeted information to improve decision-making quality within role groups. Experiments conducted in the StarCraft Multi-Agent Challenge (SMAC) environment show that RoMAC achieves an average win rate improvement of approximately 8.62 percentage points, reduces convergence time by 0.92×10^6 timesteps, and decreases communication load by an average of 28.18 percentage points. Ablation studies further validate the critical contribution of each module, demonstrating the robustness and flexibility of the model. Overall, the experimental results indicate that RoMAC offers significant advantages in MARL cooperative tasks, providing reliable support for efficiently addressing complex challenges.

  • Artificial Intelligence and Pattern Recognition
    LI Bowen, DING Muheng, FANG Meihua, ZHU Guiping, WEI Zhiyong, CHENG Wei, LI Yayun, BIAN Shuangshuang
    Computer Engineering. 2025, 51(10): 87-96. https://doi.org/10.19678/j.issn.1000-3428.0069857

    Driver fatigue is a major cause of traffic accidents, and driver fatigue state classification based on Electroencephalograms (EEGs) is an important task in the field of artificial intelligence. In recent years, deep learning models that incorporate attention mechanisms have been widely applied to EEG-based fatigue recognition. While these approaches have shown promise, several studies disregard the inherent features of the EEG data itself, and the mechanisms and effects of attention on the classifier remain vaguely explored, leaving the specific effects of different attention configurations on classification performance unexplained. Therefore, this study takes the SEED-VIG dataset as the research object and adopts the ReliefF feature selection algorithm to construct optimized Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) network, and Support Vector Machine (SVM) models based on self-attention, multi-head attention, channel attention, and spatial attention mechanisms. Experimental results on the EEG data in the SEED-VIG dataset show that the neural network models optimized with these attention mechanisms improve in terms of accuracy, recall, F1 score, and other indicators. Among them, the Convolutional Block Attention Module (CBAM)-CNN model, which enhances both spatial and channel information, achieves the best performance, with a mean accuracy of 84.7% and a standard deviation of 0.66.

  • Artificial Intelligence and Pattern Recognition
    XU Lei, ZENG Yan, YUAN Junfeng, YUE Lupeng, YIN Yuyu, ZHANG Jilin, XUE Meiting, HAN Meng
    Computer Engineering. 2025, 51(11): 90-99. https://doi.org/10.19678/j.issn.1000-3428.0069678

    As core data for maritime traffic, ship trajectory data can be used for trajectory prediction, early warning, and other tasks with pronounced temporal characteristics. However, owing to factors such as harsh marine environments and poor communication reliability, missing values in ship trajectory data are a common problem, and learning from time series containing missing data can significantly degrade the accuracy of time series analysis. The current mainstream solution is to approximately impute the missing data, mainly with convolutional models that reshape the time series along the timeline to capture its local features; however, their ability to capture the global features of long time series is limited. The Transformer captures the relationships between time points through its core self-attention mechanism, enhancing a model's ability to capture the global features of a time series. However, because its attention is computed through matrix multiplication, it ignores the temporal ordering of the series, and the resulting global feature weights have no time-span dependency. Therefore, to address the capture of global features in long time series, this study proposes GANet, a variant network based on the self-attention mechanism. GANet first obtains a basic global feature weight matrix from the time series through self-attention and then uses gated recurrent units to forget and update this matrix along the timeline, yielding a global feature weight matrix with time-span dependency, which is then used for data reconstruction to impute the missing values. By combining the self-attention and gating mechanisms, GANet captures global features while accounting for the impact of the time span between time points. Experimental results show that compared with existing models such as Autoformer and Informer, GANet achieves better imputation performance on the Trajectory, ETT, and Electricity datasets.

  • Artificial Intelligence and Pattern Recognition
    ZHANG Hong, LI Feng, MA Yanhong, JI Wenxuan, ZHENG Qipeng
    Computer Engineering. 2025, 51(10): 140-149. https://doi.org/10.19678/j.issn.1000-3428.0069489

    Accurate photovoltaic power prediction is crucial for enhancing grid stability and improving energy utilization efficiency. To address the limitations of existing methods, which struggle to simultaneously consider both the long-term dependencies and short-term variation patterns of photovoltaic power, this study proposes a novel photovoltaic power prediction method named Solarformer. This method integrates a Pyramid Attention Module (PAM) with a Temporal Convolutional Network (TCN) to optimize the Transformer architecture. First, multiple feature selection mechanisms are employed to screen the input features and enhance the model's ability to characterize photovoltaic data. Second, a coarse-grained construction module and the PAM are used to optimize the Transformer encoder, capturing the long-term temporal dependencies of photovoltaic power at multiple scales. Third, a constraint mechanism based on the sunrise-sunset effect of photovoltaic power and the TCN are employed to optimize the Transformer decoder, strengthening the model's ability to capture and model the short-term variation patterns of photovoltaic power. Experimental results on the Sanyo dataset from Australia demonstrate that Solarformer effectively improves photovoltaic power forecasting accuracy. Compared with the DLinear model, it reduces the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Symmetric Mean Absolute Percentage Error (SMAPE) by approximately 7.45%, 6.99%, and 14.10%, respectively.

  • Graphics and Image Processing
    SHA Yuyang, LU Jingtao, DU Haofan, ZHAI Xiaobing, MENG Weiyu, LIAN Xu, LUO Gang, LI Kefeng
    Computer Engineering. 2025, 51(7): 314-325. https://doi.org/10.19678/j.issn.1000-3428.0068674

    Image segmentation is a crucial technology for environmental perception and is widely used in scenarios such as autonomous driving and virtual reality. With the rapid development of technology, computer vision-based blind guiding systems are attracting increasing attention, as they outperform traditional solutions in accuracy and stability. Semantic segmentation of road images is an essential feature of a visual guiding system: by analyzing the algorithm's output, the system can understand the current environment and aid blind people in navigating safely, helping them avoid obstacles, move efficiently, and follow an optimal path. Visual blind guiding systems are often used in complex environments, which demand high running efficiency and segmentation accuracy; however, commonly used high-precision semantic segmentation algorithms are unsuitable for blind guiding systems owing to their low running speed and large number of model parameters. To solve this problem, this paper proposes a lightweight road image segmentation algorithm based on multiscale features. Unlike existing methods, the proposed model contains two feature extraction branches: a Detail Branch, which extracts low-level detail information from the image, and a Semantic Branch, which extracts high-level semantic information. Multiscale features from the two branches are processed by the designed feature mapping module, which further improves feature modeling performance. A simple and efficient feature fusion module is then designed to fuse features of different scales, enhancing the model's ability to encode contextual information. A large amount of road segmentation data suitable for blind guiding scenarios is collected and labeled to form a corresponding dataset, on which the model is trained and tested. The experimental results show that the mean Intersection over Union (mIoU) of the proposed method is 96.5%, which is better than that of existing image segmentation models. The proposed model achieves a running speed of 201 frames per second on an NVIDIA RTX 3090 Ti, which is higher than that of existing lightweight image segmentation models, and can be deployed on an NVIDIA AGX Xavier at 53 frames per second, meeting the requirements of practical applications.
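The reported mIoU figure is the class-averaged intersection-over-union between the predicted and ground-truth label maps; a minimal sketch over flattened masks:

```python
def mean_iou(pred, gt, num_classes):
    """Per-class IoU = TP / (TP + FP + FN) over flattened label maps,
    averaged over the classes that actually appear; this is the mIoU
    figure reported for segmentation models."""
    ious = []
    for c in range(num_classes):
        tp = sum(p == c and g == c for p, g in zip(pred, gt))
        fp = sum(p == c and g != c for p, g in zip(pred, gt))
        fn = sum(p != c and g == c for p, g in zip(pred, gt))
        if tp + fp + fn:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)
```

Averaging per class rather than per pixel keeps small but safety-critical classes (e.g. obstacles) from being drowned out by the dominant road class.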

  • Computer Vision and Image Processing
    TANG Ke, WEI Feiming, LI Dongying, YU Wenxian
    Computer Engineering. 2026, 52(3): 97-106. https://doi.org/10.19678/j.issn.1000-3428.0070085

    In view of missed and false detection phenomena caused by numerous small target instances and occlusions among targets in drone images, this paper proposes a lightweight small target detection algorithm for Unmanned Aerial Vehicle (UAV) images based on an improved YOLOv8. The Triple Feature Encoder (TFE) and Scale Sequence Feature Fusion (SSFF) modules are introduced in the neck to enhance the ability of the network to extract features at different scales. Furthermore, a Small Object Detection Head (SMOH) is designed and fused with the improved neck feature extraction network, and an additional detection head is also introduced to reduce the loss of small target features and enhance the recognition ability of the network for small targets. Additionally, considering the defects of Complete Intersection over Union (CIoU), a regression loss function, Wise-Inner-MPDIoU, is proposed by combining Wise-IoU, Inner-IoU, and Minimum Point Distance based IoU (MPDIoU). Finally, to realize the lightweight application requirements of the algorithm in mobile and embedded systems, amplitude-based layer-adaptive sparse pruning is performed to further reduce the model size while ensuring model accuracy. Experimental results demonstrate that, compared to the original YOLOv8s model, the improved model proposed in this paper improves mAP@0.5 by 6.8 percentage points, while reducing the number of parameters, amount of computation, and model size by 76.4%, 17.1%, and 73.5%, respectively. The proposed model is lightweight, improves detection accuracy, and has strong practical significance.

  • AI-Enabled Vehicular Edge Computing
    ZHU Siyuan, LI Jiasheng, ZOU Danping, HE Di, YU Wenxian
    Computer Engineering. 2025, 51(9): 14-24. https://doi.org/10.19678/j.issn.1000-3428.0069534

Detecting defects on unstructured roads is important for road traffic safety; however, the annotated datasets required for detection are limited. This study proposes the Multi-Augmentation with Memory (MAM) semi-supervised object detection algorithm to address the lack of annotated datasets for unstructured roads and the inability of existing models to learn from unlabeled data. First, a cache mechanism is introduced to store the bounding box regression information for unannotated images and images with pseudo annotations, avoiding the computational waste of subsequent matching. Second, the study proposes a hybrid data augmentation strategy that mixes the cached pseudo-labeled images with the unlabeled images input into the student model, to enhance the model's generalizability to new data and balance the scale distribution of images. The MAM semi-supervised object detection algorithm is not limited to a particular object detection model and better maintains the consistency of object bounding boxes, thus avoiding the need to compute a consistency loss. Experimental results show that the MAM algorithm is superior to other fully supervised and semi-supervised learning algorithms. On a self-built unstructured road defect dataset, called Defect, the MAM algorithm achieves improvements of 6.8, 11.1, and 6.0 percentage points in terms of mean Average Precision (mAP) compared to the Soft Teacher algorithm in scenarios with annotation ratios of 10%, 20%, and 30%, respectively. On a self-built unstructured road pothole dataset, called Pothole, the MAM algorithm achieves mAP improvements of 5.8 and 4.3 percentage points compared to the Soft Teacher algorithm in scenarios with annotation ratios of 15% and 30%, respectively.

  • Bojia Chen, Tingnian He, Lianjie Zhang, Shu'an Chen
    Accepted: 2025-10-20
    Cross-domain recommendation systems are widely applied in e-commerce and content platforms. Although the dual-target cross-domain recommendation (DTCDR) proposed in recent years has achieved a breakthrough in simultaneously improving the performance of both domains, it still faces two major challenges: 1) the generated user-item representations lack sufficient correlation and diversity; 2) the semantic noise mixed in the shared preferences leads to negative transfer problems. To address these issues, a dual-target cross-domain recommendation model based on heterogeneous graph and hierarchical preference disentanglement (HGPD-DTCDR) is proposed. Its core innovations include: 1) a heterogeneous graph collaborative learning framework is proposed to integrate user-item interactions, user social networks, and item attribute similarities, constructing a multi-relation heterogeneous graph, and generating high-order semantic representations through a relation graph convolutional network (R-GCN) to enhance the diversity and correlation of the representations; 2) a two-stage decoupling process is designed, first separating domain-specific and shared preferences through a variational graph encoder, and then introducing a semantic filtering network to optimize the quality of shared preferences. Experiments on five real cross-domain datasets show that the performance improvement of this model stems from the synergistic effect of heterogeneous graph modeling and hierarchical decoupling mechanisms. Compared with the best baseline, it achieves average improvements of 3.55%, 7.27%, and 15.57% in hit rate, normalized discounted cumulative gain, and mean reciprocal rank, respectively. In data-sparse scenarios, the performance improvement is even more significant, with an average gain of 10.35%. Ablation studies further verify the effectiveness of each technical component and their synergistic effects.
  • AI-Enabled Vehicular Edge Computing
    CUI Mengmeng, SHI Jingyan, XIANG Haolong
    Computer Engineering. 2025, 51(9): 25-37. https://doi.org/10.19678/j.issn.1000-3428.0069836

To optimize Quality of Service (QoS), Mobile Edge Computing (MEC) has been deeply integrated into the Internet of Vehicles (IoV) to provide geographically proximal computing resources for vehicles, thereby reducing task processing latency and energy consumption. However, traditional MEC server deployment relies primarily on terrestrial Base Stations (BSs), resulting in high deployment costs and limited coverage, making it difficult to ensure uninterrupted services for all vehicles. Air-ground collaborative IoV technology has emerged as a solution to these challenges. Unmanned Aerial Vehicles (UAVs) can dynamically assist Road-Side Units (RSUs) using the flexibility of their line-of-sight links, providing more flexible computing resources for vehicular users and thereby ensuring the continuity and efficiency of in-vehicle services. Therefore, this study proposes a Dynamic Vehicular Edge Task Offloading Method (DVETOM) based on air-ground collaboration. This method adopts a vehicle-road-air architecture, establishing Vehicle-to-RSU (V2R) and Vehicle-to-UAV (V2U) links. Transmission and computation models are constructed for three modes: local execution of vehicular tasks, offloading tasks to the RSU, and offloading tasks to the UAV. An objective function is established with the joint optimization goal of minimizing system latency and energy consumption. DVETOM transforms the task offloading problem into a Markov Decision Process (MDP) and optimizes the offloading strategy using the Distributed Deep Deterministic Policy Gradient (D4PG) algorithm based on Deep Reinforcement Learning (DRL). Experimental results show that, compared with five benchmark methods, DVETOM reduces system latency by 3.45% to 23.7% and system energy consumption by 5.8% to 23.47% while improving QoS for vehicular users. In conclusion, DVETOM effectively enhances the offloading of vehicular edge computing tasks within the IoV. It offers IoV users a more efficient and energy-conserving solution, showcasing its extensive potential for application in intelligent transportation systems.
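The three execution modes and the joint latency-energy objective described above can be sketched as a toy cost comparison. This is an illustrative placeholder, not the paper's transmission/computation model: the weights, rates, and power figures below are invented, and DVETOM actually learns the policy via D4PG rather than enumerating modes.

```python
def offload_cost(latency_s, energy_j, w_latency=0.5, w_energy=0.5):
    """Weighted sum of latency and energy: the kind of joint objective
    DVETOM minimizes per task (weights here are placeholders)."""
    return w_latency * latency_s + w_energy * energy_j

def best_mode(task_cycles, data_bits, modes):
    """Pick the execution mode (local / RSU / UAV) with minimum cost.
    Each mode supplies a CPU rate (cycles/s), an uplink rate (bit/s,
    None for local execution), and a power draw (W)."""
    best = None
    for name, (cpu_hz, link_bps, power_w) in modes.items():
        t_tx = 0.0 if link_bps is None else data_bits / link_bps
        latency = t_tx + task_cycles / cpu_hz   # transmission + computation
        energy = power_w * latency              # crude energy model
        cost = offload_cost(latency, energy)
        if best is None or cost < best[1]:
            best = (name, cost)
    return best[0]

# Hypothetical vehicle-road-air options: local CPU, edge server via a
# V2R link, and a UAV-borne server via a slower V2U link.
modes = {
    "local": (1e9,  None, 2.0),
    "rsu":   (10e9, 20e6, 1.0),
    "uav":   (5e9,  10e6, 1.2),
}
```

With a compute-heavy, small-payload task the RSU wins; once the payload grows so large that the uplink dominates, local execution becomes cheaper, which is exactly the trade-off the offloading policy has to learn.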

  • Research Hotspots and Reviews
    LI Yakang, LI Jianfang, HU Peng, CHEN Juan, WANG Shengxiang, QI Fazhi, CHEN Gang
    Computer Engineering. 2025, 51(10): 53-70. https://doi.org/10.19678/j.issn.1000-3428.0069651

This study explores the use of Artificial Intelligence (AI) technology throughout the lifecycle of neutron scattering experiments to determine how AI technology can revolutionize key aspects such as experimental apparatus, data acquisition, and data processing. The study begins by introducing the fundamental principles and experimental procedures of neutron scattering technology before focusing on the multifaceted applications of AI technology in neutron scattering experiments. These applications include optimizing experimental infrastructure, data acquisition, and imaging preprocessing, as well as characterizing experimental samples in neutron diffraction, neutron reflection, and Inelastic Neutron Scattering (INS). This study demonstrates the importance of AI technology in increasing the intelligence level of experiments, accelerating data processing, and improving the accuracy and reliability of data analyses. In addition, an in-depth discussion is held on the future application of AI technology in neutron scattering experiments, indicating that with the continuous advancement of technologies such as multimodal learning, interpretable models, large language models, and AI-Ready databases, AI technology is poised to bring revolutionary changes to neutron scattering experiments, opening up new avenues for revealing the microstructure and properties of complex material systems.

  • Zhang Yao, Zhang Junsan, Ma Junpeng, Yao Zongquan, Liu Tianyi
    Accepted: 2025-11-07
This paper proposes an improved YOLOv8-based model named CAFR-YOLO to address the issues of insufficient cross-level feature interaction and limited feature representation capability in multi-scale object detection under complex scenes. First, a novel cross-scale feature reorganization pipeline is designed, constructing the Channel Attention-guided Feature Reorganization (CAFR) module. By using a specific layer as the fusion backbone and incorporating scale alignment, attention-weighted fusion, and feature subset splicing strategies, it alleviates the insufficient cross-level interaction of traditional feature pyramid structures. Second, at the local level, the method introduces the C2f_DCNv3 module into the backbone network, significantly enhancing the model's geometric adaptability by exploiting the dynamic sampling characteristics of deformable convolution. From a global perspective, the C2f_SAConv module is constructed by combining Switchable Atrous Convolution (SAC) with the C2f module, optimizing multi-scale semantic feature fusion through dynamic atrous rate adjustment. These two approaches enhance the model's robustness in complex scenes. Finally, SPDConv replaces traditional convolution structures, strengthening feature representation through spatial-channel reorganization while reducing computational complexity. Experimental results demonstrate that CAFR-YOLO achieves 86.3% mAP@0.5 and 67.2% mAP@0.5:0.95 on the PASCAL VOC dataset with computational costs comparable to the original model. On the MS COCO dataset, it improves mAP@0.5 and mAP@0.5:0.95 by 3.5% and 3.9%, respectively. Compared to existing state-of-the-art methods, CAFR-YOLO exhibits significant advantages across multiple metrics. The proposed CAFR-YOLO model substantially enhances multi-scale object detection accuracy and robustness while maintaining computational efficiency, providing a novel solution for real-time object detection tasks.
  • Research Hotspots and Reviews
    WANG Qun, LI Fujuan, MA Zhuo
    Computer Engineering. 2025, 51(8): 39-52. https://doi.org/10.19678/j.issn.1000-3428.0070248

    Autonomous Systems (ASes) that constitute the Border Gateway Protocol (BGP) have different interests and route policies. When actual route announcements exceed expected boundaries, route leakages can occur, leading to network security incidents caused by route redirection. In the propagation of BGP route information, ASes unconditionally trust and accept the routes declared by neighboring ASes. Additionally, each AS independently configures its own local policies and keeps this information secret, which complicates the verification of this route policy. This has been a persistent and unresolved challenge in the field of BGP security. Blockchain technology, with its inherent characteristics of decentralization, traceability, immutability, and transparency, offers a promising infrastructure for digital resource authentication and trust among ASes, potentially serving as a key technology for addressing the threat of route leakages. This study first clearly defines the relationships between neighboring ASes, as well as between the GR (Gao-Rexford) model and BGP route policies, elucidating the root causes of route leakages and the challenges in their prevention. Additionally, it reviews the research on traditional solutions to route leakages, focusing on their strengths, weaknesses, and unresolved issues. Subsequently, it proposes the advantages and technical approaches of using blockchain technology to defend against BGP route leakages and explores the principles and application characteristics of typical solutions. Finally, it discusses the existing challenges and outlines future research directions.

  • Research Hotspots and Reviews
    DONG Yuze, ZHANG Zhongzhi
    Computer Engineering. 2025, 51(6): 20-28. https://doi.org/10.19678/j.issn.1000-3428.0070532

This study investigates the consensus problem, a fundamental issue in distributed systems and network control. Consensus studies have traditionally focused on unweighted networks, overlooking the impact of edge weights in real-world networks. However, networks such as transportation systems, social networks, and power networks exhibit significant weighted properties, and unweighted models fail to fully capture their complex interactions. To address this issue, this study examines a cluster of pseudo-fractal weighted networks to determine how edge weights affect consensus. The Laplacian matrix is used to establish a relationship between the Kirchhoff indices and network consensus, providing an in-depth analysis of consensus behavior in weighted networks. Through the calculation of recursive relations for various indices across iterations, precise formulas for key quantities such as the multiplicative Kirchhoff index, additive Kirchhoff index, Kirchhoff index, and network coherence are derived. A numerical analysis shows that as the network size increases, consensus in weighted networks converges to a constant, indicating greater resistance to external noise.
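The Laplacian-based link between the Kirchhoff index and network coherence mentioned above rests on two standard spectral identities: for a connected graph on n nodes with nonzero Laplacian eigenvalues μ_i, the Kirchhoff index is Kf = n·Σ 1/μ_i, and first-order coherence is H = (1/2n)·Σ 1/μ_i, so Kf = 2n²H. A small numerical sketch (the example weighted path is ours, not one of the paper's pseudo-fractal networks):

```python
import numpy as np

def kirchhoff_index(L):
    """Kirchhoff index of a connected weighted graph from its Laplacian:
    Kf = n * sum(1/mu) over the nonzero Laplacian eigenvalues, which
    equals the sum of resistance distances over all node pairs."""
    mu = np.linalg.eigvalsh(L)   # ascending; mu[0] ~ 0 for a connected graph
    return L.shape[0] * np.sum(1.0 / mu[1:])

def first_order_coherence(L):
    """First-order network coherence under white noise:
    H = (1/(2n)) * sum(1/mu); smaller H (hence smaller Kf) means
    stronger resistance of the consensus dynamics to noise."""
    mu = np.linalg.eigvalsh(L)
    return np.sum(1.0 / mu[1:]) / (2 * L.shape[0])

# Weighted path a--b--c with edge weights 2 and 1; its resistance
# distances are 1/2, 1, and 3/2, so Kf should be 3.
L = np.array([[ 2., -2.,  0.],
              [-2.,  3., -1.],
              [ 0., -1.,  1.]])
```

The identity Kf = 2n²H is why closed-form Kirchhoff index recursions immediately yield the coherence formulas.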

  • FENG Guang, SU Xu, LIN Yibao, ZHAO Zhiwen, HUANG Junhui, SUN Xiangli, LIAO Beirong
    Accepted: 2025-12-15
Multimodal sentiment analysis leverages the complementary information of speech, text, and visual modalities to enhance emotion recognition accuracy and robustness. However, existing approaches still face three major challenges: (1) the lack of unified modeling for multi-scale emotional dynamics across fast and slow temporal rhythms; (2) the difficulty of explicitly characterizing semantic dominance and subordination among modalities; and (3) the limited ability to adaptively regulate modality intensity and information contribution. To address these issues, this paper proposes a multimodal sentiment analysis framework that integrates multi-scale encoding with a polarity-aware fusion mechanism. Specifically, a Multi-Scale Mamba encoder (MS-Mamba) is introduced for the visual and audio modalities to jointly capture global and local temporal dependencies; a Polarity-Aware Fusion (PAF) module is designed to explicitly model inter-modal enhancement and suppression through semantic residuals and signed weights; and a Polarity-Driven Gating (PDG) mechanism is developed to adaptively control information flow via a saliency–direction disentanglement strategy. These components collaboratively form a closed-loop structure of “temporal modeling–polarity alignment–global gating.” Experimental results on the CMU-MOSI and CMU-MOSEI datasets demonstrate that the proposed model achieves binary classification accuracies of 86.58% and 86.50%, with F1 scores of 86.59% and 86.26%, respectively, yielding an average improvement of approximately 1.3% over mainstream baselines. The results validate the effectiveness and robustness of the proposed method in semantic alignment, temporal modeling, and adaptive fusion.
  • Artificial Intelligence and Pattern Recognition
    SONG Jie, XU Huiying, ZHU Xinzhong, HUANG Xiao, CHEN Chen, WANG Zeyu
    Computer Engineering. 2025, 51(7): 127-139. https://doi.org/10.19678/j.issn.1000-3428.0069257

Existing object detection algorithms suffer from low detection accuracy and poor real-time performance when detecting fall events in indoor scenes, owing to changes in angle and lighting. To address this challenge, this study proposes an improved fall detection algorithm based on YOLOv8, called OEF-YOLO. The C2f module in YOLOv8 is improved using an Omni-dimensional Dynamic Convolution (ODConv) module, optimizing the four dimensions of the kernel space to enhance feature extraction capabilities and effectively reduce the computational burden. Simultaneously, to capture finer-grained features, the Efficient Multi-scale Attention (EMA) module is introduced into the neck network to further aggregate pixel-level features and improve the network's processing ability in fall scenes. Integrating the Focal Loss idea into the Complete Intersection over Union (CIoU) loss function allows the model to pay more attention to difficult-to-classify samples and optimizes overall model performance. Experimental results show that, compared to YOLOv8n, OEF-YOLO achieves improvements of 1.5 and 1.4 percentage points in terms of mAP@0.5 and mAP@0.5:0.95, respectively, while the parameter count and computational complexity are 3.1×10⁶ and 6.5 GFLOPs. Frames Per Second (FPS) increases by 44 on a Graphics Processing Unit (GPU), achieving high-precision detection of fall events while also meeting deployment requirements in low-compute scenarios.
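The "Focal Loss idea" referenced above is a modulating factor that down-weights easy examples so hard ones dominate training. As a reminder of the canonical form (the abstract does not spell out how OEF-YOLO combines it with CIoU, so this is the generic focal loss, not the paper's exact loss):

```python
import math

def focal_loss(p, alpha=0.25, gamma=2.0):
    """Canonical focal loss for a positive example with predicted
    probability p: FL(p) = -alpha * (1 - p)**gamma * log(p).
    Easy examples (p near 1) are suppressed by the (1 - p)**gamma
    factor, so hard, misclassified examples drive the gradient."""
    return -alpha * (1.0 - p) ** gamma * math.log(p)

# A confidently correct prediction contributes far less loss than a
# badly misclassified one:
easy = focal_loss(0.9)   # small
hard = focal_loss(0.1)   # large
```

Applying the same reweighting idea to a box regression loss such as CIoU amounts to scaling each box's loss by a difficulty-dependent factor before averaging.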

  • Development Research and Engineering Application
    ZHU Yazhou, DU Pingchuan, CHAI Zhilei
    Computer Engineering. 2025, 51(12): 337-345. https://doi.org/10.19678/j.issn.1000-3428.0069437

    As a mainstream tool for container orchestration, Kubernetes can support automatic deployment, service discovery, and load balancing. It is known for its high availability and performance. However, scheduling strategies such as the best adaptation algorithm or the minimum negative cut method ignore the heterogeneity and energy differences of nodes. In addition, Kubernetes tools only consider CPU and memory resources and set a unified weight mechanism in advance, which can easily lead to problems such as load imbalance, performance degradation, and the inability to satisfy refined scheduling. To address these existing problems, this study proposes a heterogeneous task scheduling algorithm based on multi-dimensional resources, namely A-KCSS. The A-KCSS algorithm is based on the heterogeneous computing resources of a cluster. It adds disk Input/Output (I/O), network I/O load, and GPU resources as evaluation indicators for filtering and screening, and it more comprehensively considers the heterogeneity of nodes. This study also introduces a weight calculation model based on multi-dimensional resource factors. Based on the resource requirements of the task to be scheduled, the weight value of each dimension of the resource factor of the task to be scheduled is calculated, and the score of each node is calculated based on the real-time resource utilization of the cluster. Nodes are prioritized according to the score, and the node with the highest priority is selected for scheduling. The performance of the A-KCSS algorithm is experimentally verified on the Kubernetes cluster. Compared with the default scheduling algorithm and the KCSS algorithm, the average response time is reduced by 10% and 4%, the throughput is increased by 30% and 15%, the availability is improved by 40% and 30%, and the load balancing performance is increased by 23% and 18%, respectively, thereby improving the overall cluster performance.
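The multi-dimensional weighted scoring described above can be sketched as a toy priority function. This is only an illustration of the shape of the computation: the dimension set matches the abstract (CPU, memory, disk I/O, network I/O, GPU), but the node states, weight values, and the linear scoring rule are placeholders, not A-KCSS's actual weight calculation model.

```python
def node_score(free_ratio, weights):
    """Score a node as a weighted sum of its free-resource ratios across
    dimensions. Higher score = more headroom for this task's profile."""
    return sum(weights[k] * free_ratio[k] for k in weights)

def pick_node(nodes, weights):
    """Return the highest-priority (highest-scoring) node, mirroring the
    'score, rank, pick the top node' step of the scheduler."""
    return max(nodes, key=lambda name: node_score(nodes[name], weights))

# Hypothetical cluster state: per-dimension free-capacity ratios in [0, 1].
nodes = {
    "node-a": {"cpu": 0.2, "mem": 0.8, "disk_io": 0.9, "net_io": 0.7, "gpu": 0.0},
    "node-b": {"cpu": 0.7, "mem": 0.6, "disk_io": 0.5, "net_io": 0.6, "gpu": 0.9},
}
# Task-derived weights: a GPU-heavy task emphasizes the GPU dimension.
gpu_task_weights = {"cpu": 0.2, "mem": 0.1, "disk_io": 0.1, "net_io": 0.1, "gpu": 0.5}
```

Because the weights come from the pending task's resource demands, the same cluster state ranks nodes differently for a GPU-bound job than for an I/O-bound one.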

  • Research Hotspots and Reviews
    LIU Kai, REN Hongyi, LI Ying, JI Yi, LIU Chunping
    Computer Engineering. 2025, 51(6): 49-56. https://doi.org/10.19678/j.issn.1000-3428.0068910

Medical Visual Question Answering (Med-VQA) requires an understanding of content related to both medical images and text-based questions. Therefore, designing effective modal representations and cross-modal fusion methods is crucial for performing well in Med-VQA tasks. Current Med-VQA methods focus only on the global features of medical images and the distribution of attention within a single modality, ignoring the medical information in local image features and cross-modal interactions, thereby limiting the understanding of image content. This study proposes the Cross-Modal Attention-Guided Medical VQA (CMAG-MVQA) model. First, based on U-Net encoding, this method effectively enhances the local features of an image. Second, from the perspective of cross-modal collaboration, a selective guided attention method is proposed to introduce interactive information from other modalities. In addition, a self-attention mechanism is used to further enhance the image representation obtained by selective guided attention. Ablation and comparative experiments on the VQA-RAD medical question-answering dataset show that the proposed method performs well in Med-VQA tasks and improves feature representation compared to similar methods.

  • Research Hotspots and Reviews
    QIN Yongwang, ZHANG Yang, HU Xing, LIU Sheng, LI Shaoqing
    Computer Engineering. 2025, 51(6): 29-37. https://doi.org/10.19678/j.issn.1000-3428.0068882

With the rapid increase in the complexity of integrated circuit design, a trend of globalization and division of labor has emerged, necessitating the involvement of an increasing number of third-party Intellectual Property (IP) core providers. The widespread use of third-party IP cores introduces risks of hardware trojans. To detect and evaluate the presence of hardware trojans and their potential functionalities in third-party IP cores, there is an urgent need to explore feasible hardware security evaluation methods for IP cores. The functional identification of digital circuit modules has garnered significant attention as a fundamental research area in hardware trojan analysis. In this study, the task of circuit function detection is transformed into a multiclassification problem. By leveraging the characteristics of circuit and graph data structures, a gate-level circuit function classification and detection method based on Graph Attention Networks (GAT) is proposed. First, to address the lack of functional identification datasets for gate-level netlists, a representative set of Register Transfer Level (RTL) codes is collected and synthesized to generate gate-level netlists, thereby constructing a gate-level circuit dataset of appropriate scale and diversity. Subsequently, to extract and process the circuit feature information, a software tool based on text recognition is developed. This tool maps the complex interconnections of circuits into a structured and concise JSON (JavaScript Object Notation) format, thereby facilitating neural network processing. Finally, a graph attention neural network is employed to train a multiclassifier using the constructed gate-level netlist dataset. After training, the multiclassifier is capable of classifying and identifying unknown gate-level circuits. The experimental results demonstrate that the classifier, after learning from more than 3 000 netlists in the self-built dataset, achieves a classification accuracy of 90% for 645 netlists across six categories.

  • Graphics and Image Processing
    XIAO Jian, HUANG Bo, CHENG Hongliang, HU Xin, YUAN Ye
    Computer Engineering. 2025, 51(10): 319-326. https://doi.org/10.19678/j.issn.1000-3428.0069182

Traditional face recognition systems use various bionic algorithms combined with Support Vector Machines (SVM) to form a face recognition model for the final face classification problem. This approach selects the optimal SVM parameters through algorithm iteration. However, it is hindered by low classification accuracy, long training times, and a tendency to fall into local optima. This paper proposes a face recognition method that uses an improved Artificial Hummingbird Algorithm (AHA) to optimize SVM. First, AHA is improved by introducing a Tent-map chaotic sequence so that the hummingbird population is initialized more uniformly and the algorithm avoids local optima; second, the improved AHA is introduced into the SVM-based face recognition method. By setting a fixed number of iterations for the algorithm, the optimal parameters for SVM are selected to improve face recognition accuracy. The improved AHA is compared to the Grey Wolf Optimizer (GWO), Sparrow Search Algorithm (SSA), and Whale Optimization Algorithm (WOA), and shows a faster convergence speed in solving benchmark functions. In a face recognition experiment on the ORL face database, the improved AHA combined with SVM is compared to GWO, SSA, and WOA combined with SVM; the improved AHA combined with SVM achieves higher accuracy and recall, with a faster inference speed.
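The Tent-map chaotic initialization mentioned above replaces random seeding with a deterministic chaotic sequence that spreads initial candidates more evenly over the search range. A minimal sketch, assuming the standard tent map and an invented seed and bounds (the paper's parameter choices are not given in the abstract):

```python
def tent_sequence(x0, n, a=0.499):
    """Tent-map chaotic sequence on (0, 1):
        x_{k+1} = x_k / a          if x_k < a
                = (1 - x_k)/(1-a)  otherwise.
    a is kept slightly off 0.5 to avoid short periodic orbits."""
    xs, x = [], x0
    for _ in range(n):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        xs.append(x)
    return xs

def init_population(size, lower, upper, x0=0.37):
    """Map a tent-map sequence onto [lower, upper] to seed the
    hummingbird population more uniformly than clustering around a
    single random point (bounds and seed are placeholders)."""
    return [lower + x * (upper - lower) for x in tent_sequence(x0, size)]
```

The same sequence is reproducible from its seed, which is convenient when comparing optimizer variants under identical starting conditions.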

  • Graphics and Image Processing
    MIAO Ru, LI Yi, ZHOU Ke, ZHANG Yanna, CHANG Ranran, MENG Geng
    Computer Engineering. 2025, 51(8): 292-304. https://doi.org/10.19678/j.issn.1000-3428.0068856

The complex backgrounds, diverse target types, and significant scale variations in remote sensing images lead to target omission and false detection. To address these issues, this study proposes an improved Faster R-CNN multi-object detection model. First, the ResNet-50 backbone network is replaced with the Swin Transformer to enhance the model's feature extraction capability. Second, a Balanced Feature Pyramid (BFP) module is introduced to fuse shallow and deep semantic information, further strengthening the feature fusion effect. Finally, in the classification and regression branches, a dynamic weighting mechanism is incorporated to encourage the network to focus more on high-quality candidate boxes during training, thereby improving the precision of target localization and classification. Experimental results on the RSOD dataset show that the proposed model significantly reduces the number of Floating-Point Operations (FLOPs) compared to the Faster R-CNN model, while achieving an improvement of 10.7 percentage points in mAP@0.5:0.95 and an increase of 10.6 percentage points in Average Recall (AR). Compared to other mainstream detection models, the proposed model achieves higher accuracy while reducing the false detection rate. These results indicate that the proposed model significantly enhances detection accuracy in remote sensing images with complex backgrounds.

  • Interdisciplinary Integration and Engineering Applications
    GU Qun, SUI Siyi, WANG Rui, ZHANG Hai, XU Tianpeng
    Computer Engineering. 2026, 52(3): 429-440. https://doi.org/10.19678/j.issn.1000-3428.0069910

    This study proposes a skin melanoma segmentation algorithm, YOLOv8-Skin, designed to address the issue of imprecise results in existing algorithms caused by diverse shapes and blurred edges. YOLOv8-Skin combines multiscale feature extraction and enhanced edge segmentation based on YOLOv8. First, the backbone network CSPDarkNet53 of YOLOv8 is replaced with U-Net v2, which is more suitable for medical image segmentation. This change introduces rich semantic information into low-level features and refines high-level features, enabling precise delineation of lesion boundaries and effective extraction of small structures in melanoma images. Second, a Deformable-Large Kernel Attention (D-LKA) mechanism is introduced into the neck's C2f, enhancing the model's ability to capture irregular image structures through deformable convolutions and improving multilevel feature fusion using large kernel convolutions. Finally, a Diverse Branch Block (DBB) is incorporated into the head, forming a new segmentation head that enhances the representation capability of single convolutions by combining diverse branches of different scales and complexities. This enriches the feature space and improves feature extraction. Experiments conducted on the ISIC2017, ISIC2018, and PH2 datasets verify the algorithm's effectiveness. On the ISIC2017 dataset, the Dice coefficient, Specificity, Sensitivity, and Accuracy reach 88.86%, 91.34%, 97.24%, and 96.29%, respectively. On the ISIC2018 dataset, they reach 91.64%, 95.42%, 96.69%, and 95.83%, respectively. On the PH2 dataset, they reach 95.92%, 95.43%, 97.02%, and 96.13%, respectively. The algorithm demonstrates stronger segmentation performance and is better suited for melanoma segmentation tasks compared to existing methods.

  • Graphics and Image Processing
    DING Shuai, KUANG Liqun, CAO Yaming, HAN Huiyan, XIONG Fengguang
    Computer Engineering. 2025, 51(11): 283-293. https://doi.org/10.19678/j.issn.1000-3428.0069680

Traditional methods for human behavior recognition based on RGB videos face numerous challenges when dealing with complex backgrounds, lighting effects, and variations in appearance. By contrast, methods that leverage human skeletal information for behavior recognition are less affected by these factors. However, the current mainstream skeleton-based behavior recognition methods struggle to balance accuracy and complexity. To maintain high recognition accuracy while addressing issues such as large model parameter size and high computational complexity, a lightweight network structure comprising three novel encoding blocks is proposed. First, efficient multiscale attention modules are incorporated into the self-attention graph convolutional module for spatial modeling and the multiscale temporal convolutional module for temporal modeling, enhancing the ability of the model to recognize and utilize temporal and spatial feature information, thereby enriching skeletal data features. Second, a multifeature fusion adaptive module is employed to strengthen the feature fusion and generalization capabilities. Finally, an iterative feature fusion enhancement module is utilized to further improve the understanding of complex feature relationships. Experimental results demonstrate that, on the large-scale NTU-RGB+D60 dataset, the proposed method achieves accuracy rates of 91.1% and 95.4% under Cross-Subject (CS) and Cross-View (CV) evaluations, respectively. On the NTU-RGB+D120 dataset, it attains accuracy rates of 87.3% and 88.8% under CS and Cross-Setup (SS) evaluations, respectively, with a parameter count of 0.72×10⁶ and a floating-point operation count of 0.6×10⁹. Comparative experiments indicate that the proposed algorithm outperforms several mainstream algorithms from recent years in terms of parameter size, floating-point operation count, and recognition accuracy, effectively balancing these metrics and providing a lightweight network model for precise human behavior recognition.