
Most Read


  • Graphics and Image Processing
    WANG Shumeng, XU Huiying, ZHU Xinzhong, HUANG Xiao, SONG Jie, LI Yi
    Computer Engineering. 2025, 51(9): 280-293. https://doi.org/10.19678/j.issn.1000-3428.0069353

In Unmanned Aerial Vehicle (UAV) aerial photography, targets are usually small, densely distributed, and weakly featured, and object scales vary greatly, so missed and false detections occur easily in object detection. To solve these problems, a lightweight small object detection algorithm for aerial photography based on improved YOLOv8n, namely PECS-YOLO, is proposed. By adding a P2 small object detection layer in the Neck part, the algorithm combines shallow and deep feature maps to better capture the details of small targets. A lightweight convolution, PartialConv, is introduced into a new Cross Stage Partial PartialConv (CSPPC) structure that replaces Concatenation with Fusion (C2f) in the Neck network, making the model lightweight. A Spatial Pyramid Pooling with Efficient Layer Aggregation Network (SPPELAN) module is used to capture small object features effectively. By adding a Squeeze-and-Excitation (SE) attention mechanism in front of each detection head in the Neck part, the network can better focus on useful channels and reduce the interference of background noise on small object detection in complex environments. Finally, EfficiCIoU is used as the bounding box loss function; it also takes the shape difference of bounding boxes into account, enhancing the model's ability to detect small targets. Experimental results show that, compared with YOLOv8n, the mean Average Precision at an Intersection over Union (IoU) threshold of 0.5 (mAP@0.5) and at IoU thresholds of 0.5∶0.95 (mAP@0.5∶0.95) of the PECS-YOLO object detection algorithm on the VisDrone2019-DET dataset increase by 3.5% and 3.7%, respectively, the number of parameters is reduced by about 25.7%, and detection speed increases by about 65.2%. In summary, the PECS-YOLO model is suitable for small object detection in UAV aerial photography.
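The SE step described above, recalibrating channels so the detection heads focus on informative features, can be sketched in isolation. This is a generic Squeeze-and-Excitation illustration, not code from the paper; the layer sizes, reduction ratio, and random weights are placeholders:

```python
import numpy as np

def se_recalibrate(feat, w1, w2):
    """Squeeze-and-Excitation channel recalibration (illustrative).

    feat: (C, H, W) feature map; w1: (C//r, C) and w2: (C, C//r) are the
    bottleneck fully connected weights (reduction ratio r is a placeholder).
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = feat.mean(axis=(1, 2))
    # Excitation: FC -> ReLU -> FC -> sigmoid, giving per-channel gates in (0, 1)
    s = np.maximum(w1 @ z, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))
    # Reweight each channel of the input feature map by its gate
    return feat * gate[:, None, None]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))       # toy 8-channel feature map
w1 = rng.standard_normal((2, 8)) * 0.1      # reduction to 2 channels
w2 = rng.standard_normal((8, 2)) * 0.1
out = se_recalibrate(feat, w1, w2)
```

Each channel is scaled by a gate in (0, 1), which is how SE suppresses background-noise channels while keeping useful ones.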

  • Artificial Intelligence and Pattern Recognition
    PENG Juhong, ZHANG Chi, GAO Qian, ZHANG Guangming, TAN Donghua, ZHAO Mingjun
    Computer Engineering. 2025, 51(7): 152-160. https://doi.org/10.19678/j.issn.1000-3428.0069283

Steel surface defect detection in industrial scenarios is hindered by low detection accuracy and slow convergence. To address these issues, this study presents an improved YOLOv8 algorithm, namely YOLOv8n-MDC. First, a Multi-scale Cross-fusion Network (MCN) is added to the backbone network. Establishing closer connections between the feature layers promotes uniform information transmission and reduces semantic information loss during cross-layer feature fusion, thereby enhancing the model's ability to perceive steel defects. Second, deformable convolution is introduced into the module to adaptively change the shape and position of the convolution kernel, enabling more flexible capture of the edge features of irregular defects, reducing information loss, and improving detection accuracy. Finally, a Coordinate Attention (CA) mechanism is added to embed position information into channel attention, solving the problem of position information loss and enabling the model to perceive the position and morphological features of defects, thereby enhancing detection precision and stability. Experimental results on the NEU-DET dataset show that the YOLOv8n-MDC algorithm achieves an mAP@0.5 of 81.0%, which is 4.2 percentage points higher than that of the original baseline network. The algorithm converges faster and is more accurate; therefore, it meets the requirements of practical industrial production.

  • 40th Anniversary Celebration of Shanghai Computer Society
    QI Fenglin, SHEN Jiajie, WANG Maoyi, ZHANG Kai, WANG Xin
    Computer Engineering. 2025, 51(4): 1-14. https://doi.org/10.19678/j.issn.1000-3428.0070222

The rapid development of Artificial Intelligence (AI) has empowered numerous fields and significantly impacted society, establishing a solid technological foundation for university informatization services. This study explores the historical development of both AI and university informatization by analyzing their respective trajectories and interconnections. Although universities worldwide may focus on different aspects of AI in their digital transformation efforts, they universally demonstrate the vast potential of AI in enhancing education quality and streamlining management processes. Thus, this study focuses on five core areas: teaching, learning, administration, assessment, and examination. It comprehensively summarizes typical AI-empowered application cases to demonstrate how AI effectively improves educational quality and management efficiency. In addition, this study highlights the potential challenges associated with AI applications in university informatization, such as data privacy protection, algorithmic bias, and technology dependence. Furthermore, common strategies for addressing these issues, such as enhancing data security, optimizing algorithm transparency and fairness, and fostering digital literacy among both teachers and students, are elaborated upon. Based on these analyses, the study explores future research directions for AI in university informatization, emphasizing the balance between technological innovation and ethical standards. It advocates the establishment of interdisciplinary collaboration mechanisms to promote the healthy and sustainable development of AI in university informatization.

  • Multimodal Information Fusion
    LI Jianlang, WU Xindian, CHEN Ling, YANG Bo, TANG Wensheng
    Computer Engineering. 2026, 52(2): 299-310. https://doi.org/10.19678/j.issn.1000-3428.0070113

This study proposes a Common and Differential Cross-Attention Module-Bird's-Eye View (CDCAM-BEV) algorithm that fuses 4D millimeter-wave radar and vision to improve target detection accuracy for pedestrian and vehicle recognition and localization in autonomous driving scenarios. First, a radar cylinder network is designed to encode the 4D radar point cloud into a pseudo image, and the monocular image is converted into a Bird's-Eye View (BEV) feature through Orthogonal Feature Transformation (OFT). Second, based on the cross-attention mechanism, a Common Information Extraction Module (CICAM) and a Differential Information Extraction Module (DICAM) are used to fully explore the common and differential information between radar and images. Finally, a BEV feature fusion module is designed based on CICAM and DICAM to achieve feature-level fusion of image and radar information in the BEV space. Experiments are conducted on the VOD dataset, and the CDCAM-BEV algorithm is compared with five other 3D object detection algorithms. The experimental results show that CDCAM-BEV achieves better detection performance in multiple modes. In the 3D mode, the average detection accuracy of CDCAM-BEV is 3.65 percentage points higher than that of the second-ranked Part-A2; in the BEV mode, it is 5.04 percentage points higher than that of the second-ranked PointPillars; in the Average Orientation Similarity (AOS) mode, it is 2.62 percentage points higher than that of the second-ranked Part-A2. These results show that CDCAM-BEV exhibits excellent performance in all modes, effectively fusing image and 4D radar point cloud features, which significantly improves the accuracy and reliability of object detection.

  • Graphics and Image Processing
    WANG Guoming, JIA Daiwang
    Computer Engineering. 2025, 51(12): 294-303. https://doi.org/10.19678/j.issn.1000-3428.0070027

Deep learning-based object detection has significantly improved the detection of medium and large targets. However, when detecting small objects, traditional algorithms often face challenges such as missed detections and false positives owing to the inherent issues of small scale and complex backgrounds. Therefore, this study aims to enhance the accuracy of small object detection by improving the YOLOv8 model. First, the convolutional module in the backbone is replaced with the RFAConv module, which enhances the ability of the model to process complex images. Second, a Mixed Local Channel Attention (MLCA) mechanism is introduced in the neck part, allowing the model to fuse features from different layers more efficiently while maintaining computational efficiency. Third, the Detect head of YOLOv8 is replaced with the Detect_FASFF head to address the inconsistency between different feature scales and improve the ability of the model to detect small objects. Finally, the Complete Intersection over Union (CIoU) loss function is replaced with the Focaler-IoU loss function, enabling the model to focus more on small objects that are difficult to locate precisely. Experimental results show that the improved model increases mAP@0.5 by 4.8 percentage points and mAP@0.5:0.95 by 3.0 percentage points on the FloW-Img dataset, in which small objects are sparse. On the VisDrone2019 dataset, which has a high density of small objects, mAP@0.5 increases by 5.9 percentage points and mAP@0.5:0.95 improves by 4.0 percentage points. In addition, generalization comparison experiments are conducted on the low-altitude dataset AU-AIR and the pedestrian-dense detection dataset WiderPerson. The optimized model significantly improves the accuracy of small object detection compared with the original model and expands its applicability.

  • Research Hotspots and Reviews
    SUN Lijun, MENG Fanjun, XU Xingjian
    Computer Engineering. 2025, 51(11): 1-21. https://doi.org/10.19678/j.issn.1000-3428.0069543

In the context of ongoing advancements in educational informatization, constructing precise and efficient curriculum knowledge graphs has become key to promoting personalized education. As a structured knowledge representation model, curriculum knowledge graphs reveal complex relations between curriculum content and learning objectives, optimizing the allocation of educational resources and tailoring personalized learning paths for learners. This survey discusses the techniques used to construct curriculum knowledge graphs, starting with an explanation of the basic concepts; intrinsic connections; and significant differences among general, educational, and curriculum knowledge graphs. It then delves into the key technologies used for building curriculum knowledge graphs, covering aspects such as curriculum ontology design, entity extraction, and relation extraction, and provides a detailed analysis and summary of their evolution, key features, and limitations. Furthermore, it explores the application value of curriculum knowledge graphs in scenarios such as learning resource recommendation, learner behavior profiling and modeling, and multimodal curriculum knowledge graph construction. Finally, it focuses on the challenges in constructing curriculum knowledge graphs, such as data diversity and heterogeneity, difficulties in quality evaluation, and the lack of cross-curriculum integration, and provides future-oriented insights based on cutting-edge technologies such as deep learning and Large Language Models (LLMs).

  • Artificial Intelligence and Pattern Recognition
    ZHOU Hanqi, FANG Dongxu, ZHANG Ningbo, SUN Wensheng
    Computer Engineering. 2025, 51(4): 57-65. https://doi.org/10.19678/j.issn.1000-3428.0069100

Unmanned Aerial Vehicle (UAV) Multi-Object Tracking (MOT) technology is widely used in various fields such as traffic operation, safety monitoring, and water area inspection. However, existing MOT algorithms are primarily designed for single-UAV scenarios. The perspective of a single UAV typically has certain limitations, which can lead to tracking failures when objects are occluded, thereby causing ID switching. To address this issue, this paper proposes a Multi-UAV Multi-Object Tracking (MUMTTrack) algorithm. The MUMTTrack network adopts an MOT paradigm based on Tracking By Detection (TBD), utilizing multiple UAVs to track objects simultaneously and compensating for the perspective limitations of a single UAV. Additionally, to effectively integrate the tracking results from multiple UAVs, an ID assignment strategy and an image matching strategy are designed for MUMTTrack based on the Speeded Up Robust Feature (SURF) algorithm. Finally, the performance of MUMTTrack is compared with that of existing widely used single-UAV MOT algorithms on the MDMT dataset. According to the comparative analysis, MUMTTrack demonstrates significant advantages in terms of MOT performance metrics such as the Identity F1 (IDF1) value and Multi-Object Tracking Accuracy (MOTA).
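The cross-UAV ID assignment described above can be illustrated with a minimal greedy matcher. This is a hypothetical sketch, not the paper's algorithm: the similarity matrix stands in for SURF-keypoint match scores between the two views, and the track names and threshold are invented:

```python
def assign_ids(tracks_a, tracks_b, sim, threshold=0.5):
    """Greedy cross-view ID assignment (illustrative).

    Merge the most similar track pairs first, so the same physical target
    seen by two UAVs ends up sharing one global ID.
    """
    pairs = sorted(((sim[i][j], i, j)
                    for i in range(len(tracks_a))
                    for j in range(len(tracks_b))), reverse=True)
    mapping, used_a, used_b = {}, set(), set()
    for s, i, j in pairs:
        # Skip weak matches and tracks that are already merged
        if s >= threshold and i not in used_a and j not in used_b:
            mapping[tracks_b[j]] = tracks_a[i]
            used_a.add(i)
            used_b.add(j)
    return mapping

# Toy similarity matrix between UAV-A tracks (rows) and UAV-B tracks (cols)
sim = [[0.9, 0.1],
       [0.2, 0.7]]
mapping = assign_ids(["A1", "A2"], ["B1", "B2"], sim)
```

A real system would compute `sim` from image matching (e.g., SURF keypoint correspondences) and might use Hungarian assignment instead of this greedy pass.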

  • Artificial Intelligence and Pattern Recognition
    HUANG Kun, QI Zhaojian, WANG Juanmin, HU Qian, HU Weichao, PI Jianyong
    Computer Engineering. 2025, 51(5): 133-142. https://doi.org/10.19678/j.issn.1000-3428.0069026

Pedestrian detection in crowded scenes is a key technology in the intelligent monitoring of public spaces. It enables the intelligent monitoring of crowds, using object detection methods to detect the positions and number of pedestrians in videos. This paper presents Crowd-YOLOv8, an improved version of the YOLOv8 detection model, to address the issue of pedestrians being easily missed owing to occlusion and small target size in densely populated areas. First, nostride-Conv-SPD is introduced into the backbone network to enhance its capability of extracting fine-grained information, such as small object features in images. Second, small object detection heads and the CARAFE upsampling operator are introduced into the neck part of the YOLOv8 network to fuse features at different scales and improve detection performance for small targets. Experimental results demonstrate that the proposed method achieves an mAP@0.5 of 84.3% and an mAP@0.5∶0.95 of 58.2% on the CrowdHuman dataset, an improvement of 3.7 and 5.2 percentage points, respectively, over the original YOLOv8n. On the WiderPerson dataset, the proposed method achieves an mAP@0.5 of 88.4% and an mAP@0.5∶0.95 of 67.4%, an improvement of 1.1 and 1.5 percentage points, respectively, over the original YOLOv8n.

  • Research Hotspots and Reviews
    LU Yue, ZHOU Xiangyu, ZHANG Shizhou, LIANG Guoqiang, XING Yinghui, CHENG De, ZHANG Yanning
    Computer Engineering. 2025, 51(10): 1-17. https://doi.org/10.19678/j.issn.1000-3428.0070575

Traditional machine learning algorithms perform well only when the training and testing sets are identically distributed. They cannot perform incremental learning for new categories or tasks that were not present in the original training set. Continual learning enables models to learn new knowledge adaptively while preventing the forgetting of old tasks. However, continual learning methods still face challenges related to computational overhead, storage overhead, and performance stability. Recent advances in pre-trained models have provided new research directions for continual learning, which are promising for further performance improvements. This survey summarizes existing pre-training-based continual learning methods. According to the anti-forgetting mechanism, they are categorized into five types: methods based on prompt pools, methods with slow parameter updating, methods based on backbone branch extension, methods based on parameter regularization, and methods based on classifier design. Additionally, these methods are classified according to the number of phases, fine-tuning approaches, and use of language modalities. Subsequently, the overall challenges of continual learning methods are analyzed, and the applicable scenarios and limitations of various continual learning methods are summarized. The main characteristics and advantages of each method are also outlined. Comprehensive experiments are conducted on multiple benchmarks, followed by in-depth discussions of the performance gaps among the different methods. Finally, the survey discusses research trends in pre-training-based continual learning methods.

  • Research Hotspots and Reviews
    CI Tianzhao, YANG Hao, ZHOU You, XIE Changsheng, WU Fei
    Computer Engineering. 2025, 51(3): 1-23. https://doi.org/10.19678/j.issn.1000-3428.0068673

Smartphones have become an integral part of modern daily life. The Android operating system currently holds the largest share of the mobile operating system market owing to its open-source nature and comprehensive ecosystem. Within Android smartphones, the storage subsystem plays a pivotal role, exerting a significant influence on the user experience. However, the design of Android mobile storage systems diverges from server scenarios, necessitating the consideration of distinct factors such as resource constraints, cost sensitivity, and foreground application prioritization. Extensive research has been conducted in this area. By summarizing and analyzing the current research status in this field, we categorize the issues experienced by users of Android smartphone storage systems into five categories: host-side write amplification, memory swapping, file system fragmentation, flash device performance, and I/O priority inversion. Subsequently, existing works addressing these five categories of issues are classified, along with commonly used tools for testing and analyzing mobile storage systems. Finally, we conclude by examining existing techniques that safeguard the user experience of Android smartphone storage systems and discuss potential avenues for future investigation.

  • Cyberspace Security
    YAO Yupeng, WEI Lifei, ZHANG Lei
    Computer Engineering. 2025, 51(6): 223-235. https://doi.org/10.19678/j.issn.1000-3428.0069133

Federated learning enables participants to collaboratively build models without revealing their raw data, thereby effectively addressing the privacy issue of distributed data. However, as research advances, federated learning continues to face security concerns such as privacy inference attacks and poisoning attacks by malicious clients. Existing improvements to federated learning mainly focus on either privacy protection or defense against poisoning attacks, without addressing both types of attacks simultaneously. To address both inference and poisoning attacks in federated learning, a privacy-preserving, poisoning-resistant federated learning scheme called APFL is proposed. The scheme includes a model detection algorithm that utilizes Differential Privacy (DP) techniques to assign an aggregation weight to each client based on the cosine similarity between models. Homomorphic encryption techniques are employed for the weighted aggregation of the local models. Experimental evaluations on the MNIST and CIFAR10 datasets demonstrate that APFL effectively filters malicious models and defends against poisoning attacks while ensuring data privacy. When the poisoning ratio is no more than 50%, APFL achieves model performance consistent with that of the Federated Averaging (FedAvg) scheme in a non-poisoned environment. Compared with the Krum and FLTrust schemes, APFL reduces the model test error rate by an average of 19% and 9%, respectively.
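The cosine-similarity weighting idea behind such model detection can be sketched in plaintext, ignoring the DP noise and homomorphic encryption the scheme applies. The clean reference update and the negative-similarity clipping rule are illustrative assumptions, not APFL's exact algorithm:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def aggregation_weights(client_updates, reference):
    # Clip negative similarities to zero: updates pointing away from the
    # reference direction (suspected poisoned models) get zero weight.
    sims = [max(0.0, cosine(u, reference)) for u in client_updates]
    total = sum(sims)
    return [s / total for s in sims] if total > 0 else [1 / len(sims)] * len(sims)

def weighted_average(client_updates, weights):
    return [sum(w * u[i] for w, u in zip(weights, client_updates))
            for i in range(len(client_updates[0]))]

benign_a = [1.0, 1.0, 0.0]
benign_b = [0.9, 1.1, 0.1]
poisoned = [-1.0, -1.0, 0.0]   # pushes opposite to the benign direction
reference = [1.0, 1.0, 0.0]    # e.g. a server-held clean reference update
w = aggregation_weights([benign_a, benign_b, poisoned], reference)
agg = weighted_average([benign_a, benign_b, poisoned], w)
```

The poisoned update receives zero weight, so the aggregate stays close to the benign direction.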

  • Large Language Models and Generative Artificial Intelligence
    WANG Heqing, WEI Jie, JING Hongyu, SONG Hui, XU Bo
    Computer Engineering. 2026, 52(2): 383-392. https://doi.org/10.19678/j.issn.1000-3428.0070415

Large Language Models (LLMs) have made significant progress in dialogue, reasoning, and knowledge retention. However, they still face challenges in terms of factual accuracy, knowledge updates, and a lack of high-quality domain datasets for handling knowledge-intensive tasks in the electricity sector. This study addresses these challenges by introducing an improved Retrieval-Augmented Generation (RAG) strategy that combines hybrid retrieval with a fine-tuned generative model for efficient knowledge capture and updating. The Metadata-driven RAG framework (Meta-RAG) is proposed for knowledge Question Answering (QA) tasks in the electricity domain. The framework comprises data preparation, model fine-tuning, and retrieval reasoning stages. The data preparation stage involves document conversion, metadata extraction and enhancement, and document parsing. These processes ensure efficient indexing and structured processing of power regulation documents. The Electricity Question Answering (EleQA) dataset, consisting of 19 560 QA pairs, is constructed specifically for this sector. The model fine-tuning stage uses multi-question generation, chain-of-thought prompting, and supervised instruction fine-tuning to optimize reasoning abilities on specific tasks. The retrieval reasoning stage employs mixed encoding and re-ranking strategies, combining retrieval and generation modules to improve answer accuracy and relevance. Experiments validate the effectiveness of Meta-RAG. Compared to baseline models such as Self-RAG, Corrective-RAG, Adaptive-RAG, and RA-ISF, Meta-RAG shows higher answer accuracy and retrieval hit rates. Meta-RAG with the Qwen1.5-14B-Chat model achieves an overall accuracy of 0.804 3, surpassing the other methods. Ablation and document recall experiments indicate that document retrieval significantly impacts the framework performance, with a 0.292 8 drop in accuracy when the retrieval capability is lost.
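The mixed encoding and re-ranking idea in the retrieval stage can be sketched with a toy score fusion. This is a generic hybrid-retrieval illustration, not Meta-RAG's actual pipeline; the dense scores are hard-coded stand-ins for embedding similarities, and the documents and weights are invented:

```python
def keyword_score(query, doc):
    # Sparse signal: fraction of query terms that appear in the document
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def hybrid_retrieve(query, docs, dense_scores, k=2, w_dense=0.5):
    # Mix the (precomputed, illustrative) dense score with keyword overlap,
    # then keep the top-k candidates for a downstream re-ranker / generator.
    mixed = [(w_dense * dense_scores[i] + (1 - w_dense) * keyword_score(query, d), d)
             for i, d in enumerate(docs)]
    mixed.sort(key=lambda x: -x[0])
    return [d for _, d in mixed[:k]]

docs = [
    "transformer voltage regulation procedure",
    "substation safety inspection checklist",
    "voltage regulation limits for distribution feeders",
]
dense = [0.7, 0.2, 0.9]   # stand-in for embedding cosine similarities
top = hybrid_retrieve("voltage regulation limits", docs, dense)
```

Combining the two signals lets exact regulation terminology (sparse) correct near-miss semantic matches (dense), which is the usual motivation for hybrid retrieval in domain QA.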

  • Development Research and Engineering Application
    ZHANG Boqiang, CHEN Xinming, FENG Tianpei, WU Lan, LIU Ningning, SUN Peng
    Computer Engineering. 2025, 51(4): 373-382. https://doi.org/10.19678/j.issn.1000-3428.0068338

This paper proposes a path-planning method based on the fusion of hybrid A* and modified Reeds-Shepp (RS) curves to address the issue that unmanned transfer vehicles in confined scenarios cannot maintain a safe distance from surrounding obstacles during path planning, resulting in collisions between vehicles and obstacles. First, a distance cost function based on the KD tree algorithm is proposed and added to the cost function of the hybrid A* algorithm. Second, the expansion strategy of the hybrid A* algorithm is changed by dynamically adjusting the node expansion distance based on the vehicle's surrounding environment, achieving dynamic node expansion and improving the algorithm's node search efficiency. Finally, the RS curve generation mechanism of the hybrid A* algorithm is improved to make the straight part of the generated RS curve parallel to the boundary of the surrounding obstacles, meeting the requirements of road driving in the plant area. Subsequently, the local path is smoothed to ensure curvature continuity under vehicle kinematics constraints, improving the quality of the generated path. The experimental results show that, compared with traditional algorithms, the proposed algorithm reduces the search time by 38.06%, reduces the maximum curvature by 25.2%, and increases the closest distance from the path to obstacles by 51.3%. Thus, the proposed method effectively improves the path-generation quality of the hybrid A* algorithm and can operate well in confined scenarios.
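The distance cost term added to the hybrid A* cost function can be sketched as below. A brute-force nearest-obstacle search stands in for the KD tree query, and the safety margin `d_safe` and weight `w` are illustrative parameters, not the paper's values:

```python
import math

def nearest_obstacle_dist(p, obstacles):
    # Stand-in for a KD tree query; a real planner would build a KD tree
    # (e.g. scipy.spatial.cKDTree) for fast nearest-neighbour lookups.
    return min(math.dist(p, o) for o in obstacles)

def distance_cost(p, obstacles, d_safe=2.0, w=10.0):
    # Penalise expansion nodes closer than d_safe to the nearest obstacle;
    # nodes that keep the safety margin incur no extra cost.
    d = nearest_obstacle_dist(p, obstacles)
    return w * (d_safe - d) if d < d_safe else 0.0

obstacles = [(0.0, 0.0), (5.0, 5.0)]
near = distance_cost((0.5, 0.0), obstacles)   # 0.5 m from an obstacle
far = distance_cost((2.5, 2.5), obstacles)    # beyond d_safe of both
```

Adding this term to the usual path-length and heuristic costs biases hybrid A* toward expansions that keep a safe clearance, which is the effect the abstract describes.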

  • Frontier Perspectives and Reviews
    QIN Yingxin, ZHANG Kejia, PAN Haiwei, JU Yahao
    Computer Engineering. 2026, 52(2): 46-68. https://doi.org/10.19678/j.issn.1000-3428.0069826

Deep learning has driven the development of artificial intelligence and is widely used in computer vision. It provides breakthroughs and remarkable results in complex tasks such as image recognition, object detection, object tracking, and face recognition, demonstrating excellent recognition and prediction capabilities. However, vulnerabilities and loopholes in deep learning models have gradually been exposed. Deep learning techniques, represented by convolutional neural networks, are extremely sensitive to well-designed adversarial examples, which can easily compromise the security and privacy of the models. This paper first summarizes the concept of adversarial attacks, the reasons adversarial examples arise, and related terms. It outlines several types of classical adversarial attack strategies in the digital and physical domains and analyzes their advantages and disadvantages. Second, it focuses on computer vision and summarizes the latest research on adversarial attacks in tasks such as object detection, face recognition, object tracking, monocular depth estimation, and optical flow estimation, from both the digital and physical domains, as well as the various datasets commonly used in these studies. It also briefly introduces the current state of adversarial example defense and detection methods, summarizes their advantages and disadvantages, and describes example applications of adversarial example defense in various visual tasks. Finally, based on the summary of adversarial attack methods, it explores and analyzes the deficiencies and challenges of existing computer vision adversarial attacks.

  • Research Hotspots and Reviews
    DI Qinbo, CHEN Shaoli, SHI Liangren
    Computer Engineering. 2025, 51(11): 35-44. https://doi.org/10.19678/j.issn.1000-3428.0069780

As multivariate time series data become increasingly prevalent across various industries, anomaly detection methods that can ensure the stable operation and security of systems have become crucial. Owing to the inherent complexity and dynamic nature of multivariate time series data, higher demands are placed on anomaly detection algorithms. To address the inefficiencies of existing anomaly detection methods in processing high-dimensional data with complex variable relations, this study proposes an anomaly detection algorithm for multivariate time series data based on Graph Neural Networks (GNNs) and a diffusion model, named GRD. By leveraging node embedding and graph structure learning, the GRD algorithm proficiently captures the relations between variables and refines features through a Gated Recurrent Unit (GRU) and a Denoising Diffusion Probabilistic Model (DDPM), thereby facilitating precise anomaly detection. Traditional assessments often employ a Point-Adjustment (PA) protocol that adjusts predictions before scoring, substantially overestimating an algorithm's capability. To reflect model performance realistically, this work adopts a new evaluation protocol along with new metrics. The GRD algorithm achieves F1@k scores of 0.741 4, 0.801 7, and 0.767 1 on three public datasets. These results indicate that the GRD algorithm consistently outperforms existing methods, with notable advantages in processing high-dimensional data, underscoring its practicality and robustness in real-world anomaly detection applications.
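The Point-Adjustment inflation criticized above is easy to demonstrate: flagging a single point inside a long anomaly segment makes PA credit the whole segment as detected. A minimal sketch with toy labels (standard PA semantics, not the paper's code):

```python
def f1(pred, label):
    tp = sum(p and l for p, l in zip(pred, label))
    fp = sum(p and not l for p, l in zip(pred, label))
    fn = sum((not p) and l for p, l in zip(pred, label))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def point_adjust(pred, label):
    # PA protocol: if any point inside a ground-truth anomaly segment is
    # flagged, every point of that segment is counted as detected.
    adjusted = list(pred)
    i = 0
    while i < len(label):
        if label[i]:
            j = i
            while j < len(label) and label[j]:
                j += 1                     # [i, j) is one anomaly segment
            if any(pred[i:j]):
                for k in range(i, j):
                    adjusted[k] = 1
            i = j
        else:
            i += 1
    return adjusted

label = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
pred  = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]   # catches 1 of 6 anomalous points
raw_f1 = f1(pred, label)                  # point-wise F1
pa_f1 = f1(point_adjust(pred, label), label)
```

Here a single flagged point lifts F1 from 2/7 to 0.8, which is why stricter protocols such as F1@k are used instead.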

  • Research Hotspots and Reviews
    ZHAO Kai, HU Yuhuan, YAN Junqiao, BI Xuehua, ZHANG Linlin
    Computer Engineering. 2025, 51(8): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0069147

Blockchain, as a distributed and trusted database, has gained significant attention in academic and industrial circles for its effective application in the domain of digital copyright protection. Traditional digital copyright protection technologies suffer from issues such as difficulties in tracking infringements, complexities in copyright transactions, and inadequate protection of legitimate rights, severely hampering the development of digital copyright protection endeavors. The immutability, traceability, and decentralization inherent in blockchain technology provide a highly reliable, transparent, and secure solution that mitigates the risks associated with digital copyright infringement. This overview starts with an introduction to the fundamental principles of blockchain technology. It then discusses the latest research findings on integrating blockchain with traditional copyright protection technologies to address the problems inherent in traditional copyright protection schemes. Further, an evaluation of the practical applications and potential of blockchain is conducted, emphasizing its positive impact on the copyright protection ecosystem. Finally, this overview delves into the challenges and future trends of blockchain-based copyright protection, ultimately aiming to establish a more robust and sustainable blockchain copyright protection system.

  • Artificial Intelligence and Pattern Recognition
    WANG Shuai, SHI Yancui
    Computer Engineering. 2025, 51(8): 190-202. https://doi.org/10.19678/j.issn.1000-3428.0069636

Sequence recommendation algorithms dynamically model a user's historical behavior to predict the content they may be interested in. This study focuses on the application of contrastive Self-Supervised Learning (SSL) in sequence recommendation, enhancing the model's representation ability in sparse data scenarios by designing effective self-supervised signals. First, a personalized data augmentation method incorporating user preferences is proposed to address the noise introduced by random data augmentation. This method guides the augmentation process based on user ratings and combines different augmentation methods for short and long sequences to generate augmented sequences that align with user preferences. Second, a mixed-augmentation training approach is designed to address imbalanced feature learning during training. In the early stages of training, augmented sequences are generated using randomly selected methods to enhance model performance and generalization. In the later stages, augmented sequences with high similarity to the original sequences are selected so that the model can comprehensively learn users' actual preferences and behavior patterns. Finally, traditional sequence prediction objectives are combined with SSL objectives to infer user representations. Experimental verification is performed on the Beauty, Toys, and Sports datasets. Compared with the best results among the baseline models, the HR@5 indicator of the proposed method increases by 6.61%, 3.11%, and 3.76%, and the NDCG@5 indicator increases by 11.40%, 3.50%, and 2.16%, respectively, on the aforementioned datasets. These experimental results confirm the rationality and validity of the proposed method.
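The length-aware, rating-guided augmentation idea can be sketched as below. The 80% crop ratio, the short-sequence threshold, and the `[MASK]` token are illustrative assumptions, not the paper's settings:

```python
def augment(seq, ratings, short_len=5):
    """Rating-guided augmentation (illustrative).

    Long sequences: crop away the lowest-rated items (keep ~80%), so the
    augmented view preserves the items the user actually preferred.
    Short sequences: too little data to drop items, so mask the single
    lowest-rated item instead.
    """
    # Item indices ordered by rating, lowest first (stable for ties)
    by_rating = sorted(range(len(seq)), key=lambda i: ratings[i])
    if len(seq) > short_len:
        drop = set(by_rating[: len(seq) // 5])          # drop bottom ~20%
        return [x for i, x in enumerate(seq) if i not in drop]
    low = by_rating[0]
    return [x if i != low else "[MASK]" for i, x in enumerate(seq)]

long_seq = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
long_ratings = [5, 1, 4, 3, 5, 2, 4, 5, 1, 3]
short_seq = ["a", "b", "c"]
short_ratings = [5, 1, 4]
aug_long = augment(long_seq, long_ratings)
aug_short = augment(short_seq, short_ratings)
```

Both views keep the original item order, so the contrastive objective compares preference-preserving variants rather than randomly corrupted ones.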

  • Development Research and Engineering Application
    TANG Jingwen, LAI Huicheng, WANG Tongguan
    Computer Engineering. 2025, 51(4): 303-313. https://doi.org/10.19678/j.issn.1000-3428.0068897

Pedestrian detection in intelligent community scenarios needs to accurately recognize pedestrians under various conditions. However, for persons who are occluded or at long distances, existing detectors exhibit problems such as missed detections, detection errors, and large models. To address these problems, this paper proposes a pedestrian detection algorithm, Multiscale Efficient-YOLO (ME-YOLO), based on YOLOv8. An efficient feature Extraction Module (EM) is designed to improve network learning and the capture of pedestrian features, which reduces the number of network parameters and improves detection accuracy. The reconstructed detection head module reintegrates the detection layer to enhance the network's ability to recognize small targets and effectively detect small-target pedestrians. A Bidirectional Feature Pyramid Network (BiFPN) is introduced to design a new neck network, namely the Bidirectional Dilated Residual-Feature Pyramid Network (BDR-FPN); its dilated residual module and weighted attention mechanism expand the receptive field and emphasize the learning of pedestrian features, thereby alleviating the network's insensitivity to occluded pedestrians. Compared with the original YOLOv8 algorithm, ME-YOLO increases AP50 by 5.6 percentage points, reduces the number of model parameters by 41%, and compresses the model size by 40% after training and verification on the CityPersons dataset. ME-YOLO also increases AP50 by 4.1 percentage points and AP50∶95 by 1.7 percentage points on the TinyPerson dataset. Moreover, the algorithm significantly reduces the number of model parameters and the model size while effectively improving detection accuracy. This method has considerable application value in intelligent community scenarios.

  • AI-Enabled Vehicular Edge Computing
    CUI Mengmeng, SHI Jingyan, XIANG Haolong
    Computer Engineering. 2025, 51(9): 25-37. https://doi.org/10.19678/j.issn.1000-3428.0069836

To optimize Quality of Service (QoS), Mobile Edge Computing (MEC) has been deeply integrated into the Internet of Vehicles (IoV) to provide geographically proximal computing resources for vehicles, thereby reducing task processing latency and energy consumption. However, traditional MEC server deployment relies primarily on terrestrial Base Stations (BSs), resulting in high deployment costs and limited coverage, making it difficult to ensure uninterrupted services for all vehicles. Air-ground collaborative IoV technology has emerged as a solution to these challenges. Unmanned Aerial Vehicles (UAVs) can dynamically assist Road-Side Units (RSUs) by leveraging their flexibility and line-of-sight links, providing more flexible computing resources for vehicular users and thereby ensuring the continuity and efficiency of in-vehicle services. Therefore, this study proposes a Dynamic Vehicular Edge Task Offloading Method (DVETOM) based on air-ground collaboration. This method adopts a vehicle-road-air architecture, establishing Vehicle-to-RSU (V2R) and Vehicle-to-UAV (V2U) links. Transmission and computation models are constructed for three modes: local execution of vehicular tasks, offloading tasks to the RSU, and offloading tasks to the UAV. An objective function is established with the joint optimization goal of minimizing system latency and energy consumption. DVETOM transforms the task offloading problem into a Markov Decision Process (MDP) and optimizes the task offloading strategy using the Distributed Deep Deterministic Policy Gradient (D4PG) algorithm based on Deep Reinforcement Learning (DRL). Experimental results comparing DVETOM with five benchmark methods show that it reduces system latency by 3.45% to 23.7% and system energy consumption by 5.8% to 23.47% while improving QoS for vehicular users. In conclusion, DVETOM effectively enhances the offloading of vehicular edge computing tasks within the IoV. It offers IoV users a more efficient and energy-conserving solution, showcasing its extensive potential for application in intelligent transportation systems.
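The joint latency-energy objective behind DVETOM's three execution modes can be illustrated as a weighted cost minimized over modes; the weights and numbers below are assumptions for illustration, not values from the paper:

```python
def offload_cost(latency, energy, alpha=0.5, beta=0.5):
    """Joint objective: alpha/beta trade off latency against energy (assumed weights)."""
    return alpha * latency + beta * energy

def choose_mode(costs):
    """Pick the execution mode with the smallest joint cost."""
    return min(costs, key=costs.get)

# Hypothetical per-mode latency/energy for one task (normalized units):
modes = {
    "local": offload_cost(latency=0.8, energy=0.3),
    "rsu":   offload_cost(latency=0.4, energy=0.5),
    "uav":   offload_cost(latency=0.5, energy=0.45),
}
print(choose_mode(modes))  # 'rsu' has the lowest joint cost (0.45)
```

In DVETOM this decision is not a one-shot argmin: the MDP formulation lets the D4PG agent learn the offloading policy from state transitions rather than static per-task costs.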

  • Research Hotspots and Reviews
    ZHANG Jin, CHEN Zhu, CHEN Zhaoyun, SHI Yang, CHEN Guanjun
    Computer Engineering. 2025, 51(7): 1-11. https://doi.org/10.19678/j.issn.1000-3428.0068870

Simulators play an indispensable role in research and development across an array of scientific fields. Particularly in architectural design, simulators provide a secure and cost-effective virtual environment, enabling researchers to conduct rapid experimental analyses and evaluations. Simulators also accelerate the chip design and verification processes, thereby conserving time and reducing resource expenditure. However, with the evolution of processor architecture design, in particular the flourishing diversity of dedicated processors, the key role played by simulators in providing substantial feedback for architectural design exploration has gained prominence. This review provides an overview of the current developments and applications of architectural simulators, highlighting a few illustrative examples. Analyzing the techniques employed by simulators dedicated to various processors allows for a deeper understanding of the focal points and technical complexities of different architectures. Moreover, this review offers speculative assessments and critiques of vital aspects of future architectural simulator development, aspiring to forecast their prospects in the field of processor design research.

  • Development Research and Engineering Application
    ZHOU Siyu, XU Huiying, ZHU Xinzhong, HUANG Xiao, SHENG Ke, CAO Yuqi, CHEN Chen
    Computer Engineering. 2025, 51(5): 326-339. https://doi.org/10.19678/j.issn.1000-3428.0069259

As the main window of human-computer interaction, the mobile phone screen has become an important factor affecting the user experience and the overall performance of the terminal. As a result, there is a growing demand for detecting defects in mobile phone screens. To meet this demand, and in view of the low detection accuracy, high missed-detection rate for small-target defects, and slow detection speed of existing defect detection methods for mobile phone screens, a PGS-YOLO algorithm is proposed with YOLOv8n as the baseline model. PGS-YOLO effectively improves the detection of small targets by adding a dedicated small-target detection head combined with the SeaAttention module. The backbone and feature fusion networks incorporate the lightweight PConv and GhostNetV2 modules, respectively, to maintain accuracy, reduce the number of model parameters, and improve the speed and efficiency of defect detection. The experimental results show that, on the Peking University mobile phone screen surface defect dataset, the mAP@0.5 and mAP@0.5∶0.95 of the PGS-YOLO algorithm are 2.5 and 2.2 percentage points higher, respectively, than those of YOLOv8n. The algorithm accurately detects large defects on mobile phone screens while maintaining a reasonable degree of accuracy for small defects. In addition, its detection performance is better than that of most YOLO-series algorithms, such as YOLOv5n and YOLOv8s. Meanwhile, the number of parameters is only 2.0×10⁶, smaller than that of YOLOv8n, meeting the needs of industrial mobile phone screen defect detection scenarios.

  • Graphics and Image Processing
    LIU Chunxia, MENG Jixing, PAN Lihu, GONG Dali
    Computer Engineering. 2025, 51(7): 326-338. https://doi.org/10.19678/j.issn.1000-3428.0069510

A multimodal remote sensing small-target detection method, BFMYOLO, is proposed to address misdetection and omission issues in remote sensing images with complex backgrounds and little effective information. The method utilizes a pixel-level Red-Green-Blue (RGB) and infrared (IR) image fusion module, the Bimodal Fusion Module (BFM), to exploit the complementarity of the two modalities and effectively fuse their information. In addition, a full-scale adaptive updating module, AA, is introduced to resolve multitarget information conflicts during feature fusion. This module incorporates the CARAFE up-sampling operator and shallow features to enhance non-adjacent-layer fusion and enrich the spatial information of small targets. An Improved task-decoupled Detection Head (IDHead) is designed to handle classification and regression tasks separately, thereby reducing the mutual interference between tasks and enhancing detection performance by fusing deeper semantic features. The proposed method adopts the Normalized Wasserstein Distance (NWD) loss function as the localization regression loss to mitigate sensitivity to positional bias. Results of experiments on the VEDAI, NWPU VHR-10, and DIOR datasets demonstrate the superior performance of the model, with mean Average Precision at a threshold of 0.5 (mAP@0.5) values of 78.6%, 95.5%, and 73.3%, respectively. The model thus outperforms other advanced models in remote sensing small-target detection.
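The NWD loss models each box as a 2D Gaussian and exponentially normalizes the Wasserstein distance between the two Gaussians; a minimal sketch, where the constant c is dataset-dependent and the value used here is an assumption:

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between boxes given as (cx, cy, w, h).

    Each box is treated as a 2D Gaussian N((cx, cy), diag((w/2)^2, (h/2)^2));
    for diagonal Gaussians the squared 2-Wasserstein distance has a closed form."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    w2_sq = (ax - bx) ** 2 + (ay - by) ** 2 + ((aw - bw) / 2) ** 2 + ((ah - bh) / 2) ** 2
    return math.exp(-math.sqrt(w2_sq) / c)

def nwd_loss(box_a, box_b, c=12.8):
    """Localization loss: 1 - NWD, so identical boxes give zero loss."""
    return 1.0 - nwd(box_a, box_b, c)

print(nwd((10, 10, 4, 4), (10, 10, 4, 4)))  # 1.0 for identical boxes
```

Unlike IoU, this similarity degrades smoothly for tiny, slightly shifted boxes, which is why it suits small-target regression.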

  • Artificial Intelligence and Pattern Recognition
    DAI Kangjia, XU Huiying, ZHU Xinzhong, LI Xiyu, HUANG Xiao, CHEN Guoqiang, ZHANG Zhixiong
    Computer Engineering. 2025, 51(3): 95-104. https://doi.org/10.19678/j.issn.1000-3428.0068950

Traditional visual Simultaneous Localization And Mapping (SLAM) systems are based on the assumption of a static environment. However, real scenes often contain dynamic objects, which may lead to decreased accuracy, deteriorated robustness, and even tracking loss in SLAM pose estimation and map construction. To address these issues, this study proposes a new semantic SLAM system, YGL-SLAM, based on ORB-SLAM2. The system first uses a lightweight target detection algorithm, YOLOv8n, to track dynamic objects and obtain their semantic information. Subsequently, both point and line features are extracted in the tracking thread, and dynamic features are culled based on the acquired semantic information using the Z-score and epipolar geometry algorithms to improve the performance of SLAM in dynamic scenes. Given that lightweight target detection algorithms suffer from missed detections in consecutive frames when tracking dynamic objects, this study designs a detection compensation method based on neighboring frames. Testing on the public TUM and Bonn datasets reveals that the YGL-SLAM system improves performance by over 90% compared with ORB-SLAM2, while demonstrating superior accuracy and robustness compared with other dynamic SLAM systems.
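Z-score-based culling of dynamic features can be sketched as flagging features whose error deviates strongly from the per-frame mean; the error source (reprojection error) and the threshold below are assumptions, not the paper's exact procedure:

```python
def zscore_cull(reproj_errors, threshold=2.0):
    """Flag features whose error lies more than `threshold` std devs from the mean.

    Features on moving objects tend to violate the static-scene motion model,
    so their errors stand out as statistical outliers."""
    n = len(reproj_errors)
    mean = sum(reproj_errors) / n
    var = sum((e - mean) ** 2 for e in reproj_errors) / n
    std = max(var ** 0.5, 1e-9)  # guard against a zero std dev
    return [abs(e - mean) / std > threshold for e in reproj_errors]

# Nine static features and one feature that likely lies on a moving object:
errors = [0.4, 0.5, 0.6, 0.5, 0.4, 0.6, 0.5, 0.5, 0.5, 6.0]
print(zscore_cull(errors))  # only the last feature is flagged
```

In YGL-SLAM this statistical test is combined with the detector's semantic masks and epipolar-geometry checks, so a feature is culled only when the evidence agrees.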

  • Research Hotspots and Reviews
    PANG Xin, GE Fengpei, LI Yanling
    Computer Engineering. 2025, 51(6): 1-19. https://doi.org/10.19678/j.issn.1000-3428.0069005

Acoustic Scene Classification (ASC) aims to enable computers to emulate the human auditory system in recognizing various acoustic environments, a challenging task in the field of computer audition. With rapid advancements in intelligent audio processing technologies and neural network learning algorithms, a series of new algorithms and technologies for ASC have emerged in recent years. To comprehensively present the technological development trajectory and evolution of this field, this review systematically examines both early work and recent developments in ASC, providing a thorough overview of the field. It first describes the application scenarios and challenges encountered in ASC and then details the mainstream frameworks in ASC, with a focus on the application of deep learning algorithms in this domain. Subsequently, it systematically summarizes frontier explorations, extension tasks, and publicly available datasets in ASC and finally discusses future development trends in ASC.

  • Research Hotspots and Reviews
    LIAO Niuyu, TIAN Yun, LI Yansong, XUE Haifeng, DU Changkun, ZHANG Guohua
    Computer Engineering. 2025, 51(12): 1-17. https://doi.org/10.19678/j.issn.1000-3428.0253230

In recent years, Large Language Models (LLMs) such as GPT, LLaMA, Qwen, and DeepSeek have achieved significant breakthroughs in natural language processing, computer vision, multimodal learning, and other fields. However, constrained by factors such as their reasoning mechanisms, parameter scales, and the knowledge contained in their training data, these models often suffer from issues such as "hallucinations", characterized by inaccurate answers and even factual deviations, when handling complex tasks, addressing questions from professional domains, or generating time-sensitive content. These limitations severely hinder their application in high-reliability scenarios. The "tool learning" paradigm is attracting increasing attention as a promising solution to these capability bottlenecks. Its primary objective is to enable LLMs to understand and utilize external tools to complete specific tasks. By invoking external tools, such as databases, search engines, and mathematical tools, LLMs can transcend their parameterized knowledge; enhance their reasoning, decision-making, and execution capabilities; and mitigate hallucination problems. This paper systematically reviews the development context and technical advancements in LLM tool learning, analyzes the expansion of LLM capabilities through tools, summarizes tool invocation mechanisms ranging from in-context learning to fine-tuning, and discusses key issues including performance optimization and adaptive tool generation. The paper also analyzes evaluation methods for LLM tool invocation, summarizes current challenges in tool learning, and outlines future research directions.

  • Development Research and Engineering Application
    CHEN Ziyan, WANG Xiaolong, HE Di, AN Guocheng
    Computer Engineering. 2025, 51(5): 314-325. https://doi.org/10.19678/j.issn.1000-3428.0069122

Current high-precision vehicle detection models face challenges due to excessive parameterization and computational demands, making them unsuitable for efficient operation on intelligent transportation devices. Conversely, lightweight vehicle detection models often sacrifice accuracy, rendering them unsuitable for practical tasks. In response, an improved lightweight vehicle detection network based on YOLOv8 is proposed. This enhancement involves substituting the backbone network with the FasterNet architecture, which reduces the computational and memory access requirements. Additionally, we replace the neck's feature pyramid with a weighted Bidirectional Feature Pyramid Network (BiFPN) to simplify the feature fusion process. Simultaneously, we introduce a dynamic detection head with a fused attention mechanism to achieve nonredundant integration of the detection head and attention. Furthermore, we address the deficiencies of the Complete Intersection over Union (CIoU) in terms of detection accuracy and convergence speed by proposing a regression loss that combines the Scale-invariant Intersection over Union (SIoU) with the Normalized Gaussian Wasserstein Distance (NWD). Finally, to minimize the computational demands on edge devices, we implement amplitude-based layer-wise adaptive sparsity pruning, which further compresses the model size. Experimental results demonstrate that the proposed improved model, compared with the original YOLOv8s model, achieves a 1.5-percentage-point increase in accuracy, a 78.9% reduction in parameter count, a 67.4% decrease in computational demands, and a 77.8% reduction in model size. This demonstrates the outstanding lightweight effectiveness and practical utility of the proposed model.
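Amplitude (magnitude)-based pruning zeroes the smallest weights in each layer; a minimal per-layer sketch, where the fixed sparsity stands in for the paper's adaptive layer-wise choice:

```python
def prune_layer(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights in one layer."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Magnitude threshold: the k-th smallest absolute value in this layer.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

# Half the weights (the three smallest in magnitude) are removed:
print(prune_layer([0.9, -0.1, 0.05, -0.8, 0.3, -0.02], sparsity=0.5))
```

Computing the threshold per layer, rather than globally, keeps layers with naturally small weights from being pruned away entirely; adaptive schemes additionally tune the sparsity per layer based on sensitivity.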

  • Cyberspace Security
    CAO Bei, ZHAO Kui
    Computer Engineering. 2025, 51(6): 193-203. https://doi.org/10.19678/j.issn.1000-3428.0070158

The accurate recognition of fake news is an important research topic in an online environment characterized by information explosion, where authenticity is difficult to discern. Existing studies mostly use multiple deep learning models to extract multivariate semantic features that capture different levels of semantic information in the text; however, the simple splicing of these features causes information redundancy and noise, limiting detection accuracy and generalization, and effective deep fusion methods are lacking. In addition, existing studies tend to ignore the impact of the dual sentiments co-constructed by news content and its corresponding comments on revealing news authenticity. This paper proposes a Dual Emotion and Multi-feature Fusion based Fake News Detection (DEMF-FND) model to address these problems. First, the emotional features of news and comments are extracted by emotion analysis. Emotional difference features reflecting the correlation between the two are introduced using similarity computation, and a dual emotion feature set is constructed. Subsequently, a fusion mechanism based on multihead attention is used to deeply fuse the global and local semantic features of the news text captured by a Bidirectional Long Short-Term Memory (BiLSTM) network and a designed Integrated Static-Dynamic Embedded Convolutional Neural Network (ISDE-CNN). Finally, the dual emotion feature set is concatenated with the semantic features obtained by deep fusion and fed into a classification layer consisting of a fully connected layer to determine news authenticity. Experimental results show that the proposed method outperforms the baseline methods in terms of benchmark metrics on three real datasets, namely Weibo20, Twitter15, and Twitter16, achieving accuracy improvements of 2.5, 2.3, and 5.5 percentage points, respectively, and highlighting the importance of dual emotion and the deep fusion of semantic features in enhancing fake news detection performance.
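The dual emotion feature set can be illustrated as concatenating the news emotion vector, the comment emotion vector, their difference, and a similarity score; the exact composition in DEMF-FND may differ:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two emotion vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-9)

def dual_emotion_features(news_emo, comment_emo):
    """Build the dual emotion feature set: news, comments, gap, and similarity."""
    gap = [n - c for n, c in zip(news_emo, comment_emo)]
    return news_emo + comment_emo + gap + [cosine_sim(news_emo, comment_emo)]

# Hypothetical 2-dimensional (positive, negative) emotion scores:
feats = dual_emotion_features([0.9, 0.1], [0.2, 0.8])
print(len(feats))  # 7 = 2 + 2 + 2 + 1
```

A large gap between the emotion of a news piece and the emotion of its comments is exactly the kind of signal the dual-emotion design aims to expose to the classifier.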

  • Development Research and Engineering Application
    ZHU Yazhou, DU Pingchuan, CHAI Zhilei
    Computer Engineering. 2025, 51(12): 337-345. https://doi.org/10.19678/j.issn.1000-3428.0069437

As a mainstream tool for container orchestration, Kubernetes supports automatic deployment, service discovery, and load balancing, and is known for its high availability and performance. However, scheduling strategies such as the best-fit algorithm or the minimum negative cut method ignore the heterogeneity and energy differences of nodes. In addition, the Kubernetes scheduler only considers CPU and memory resources and sets a unified weight mechanism in advance, which can easily lead to problems such as load imbalance, performance degradation, and an inability to satisfy fine-grained scheduling. To address these problems, this study proposes a heterogeneous task scheduling algorithm based on multi-dimensional resources, A-KCSS. The A-KCSS algorithm is based on the heterogeneous computing resources of a cluster. It adds disk Input/Output (I/O), network I/O load, and GPU resources as evaluation indicators for filtering and scoring, thereby considering node heterogeneity more comprehensively. This study also introduces a weight calculation model based on multi-dimensional resource factors. Based on the resource requirements of the task to be scheduled, a weight is calculated for each resource dimension, and each node is scored according to the real-time resource utilization of the cluster. Nodes are prioritized by score, and the node with the highest priority is selected for scheduling. The performance of the A-KCSS algorithm is experimentally verified on a Kubernetes cluster. Compared with the default scheduling algorithm and the KCSS algorithm, the average response time is reduced by 10% and 4%, the throughput is increased by 30% and 15%, the availability is improved by 40% and 30%, and the load balancing performance is increased by 23% and 18%, respectively, thereby improving overall cluster performance.
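Multi-dimensional weighted node scoring of this kind can be sketched as a weighted sum of free capacity per resource; the weights and utilization figures below are illustrative assumptions, not the A-KCSS model itself:

```python
def node_score(utilization, weights):
    """Score a node by its weighted free capacity across resource dimensions.

    `utilization` maps each resource (CPU, memory, disk I/O, network I/O, GPU)
    to its current usage in [0, 1]; `weights` reflect the task's demands."""
    return sum(weights[r] * (1.0 - utilization[r]) for r in weights)

def pick_node(nodes, weights):
    """Select the node with the highest score for scheduling."""
    return max(nodes, key=lambda n: node_score(nodes[n], weights))

# A CPU-heavy task (assumed weights) prefers the node with idle CPU:
weights = {"cpu": 0.4, "mem": 0.3, "disk_io": 0.1, "net_io": 0.1, "gpu": 0.1}
nodes = {
    "node-a": {"cpu": 0.9, "mem": 0.5, "disk_io": 0.2, "net_io": 0.3, "gpu": 0.1},
    "node-b": {"cpu": 0.3, "mem": 0.4, "disk_io": 0.6, "net_io": 0.5, "gpu": 0.2},
}
print(pick_node(nodes, weights))  # node-b: its free CPU dominates the score
```

Deriving the weights from each task's resource requests is what lets the same scoring rule behave differently for CPU-bound, I/O-bound, and GPU-bound workloads.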

  • Artificial Intelligence and Pattern Recognition
    SONG Jie, XU Huiying, ZHU Xinzhong, HUANG Xiao, CHEN Chen, WANG Zeyu
    Computer Engineering. 2025, 51(7): 127-139. https://doi.org/10.19678/j.issn.1000-3428.0069257

Existing object detection algorithms suffer from low detection accuracy and poor real-time performance when detecting fall events in indoor scenes, owing to changes in angle and lighting. In response to this challenge, this study proposes an improved fall detection algorithm based on YOLOv8, called OEF-YOLO. The C2f module in YOLOv8 is improved using an Omni-dimensional Dynamic Convolution (ODConv) module, which optimizes the four dimensions of the kernel space to enhance feature extraction capabilities and effectively reduce computational burden. Simultaneously, to capture finer-grained features, the Efficient Multi-scale Attention (EMA) module is introduced into the neck network to further aggregate pixel-level features and improve the network's processing ability in fall scenes. Integrating the Focal Loss idea into the Complete Intersection over Union (CIoU) loss function allows the model to pay more attention to difficult-to-classify samples and optimizes overall model performance. Experimental results show that, compared with YOLOv8n, OEF-YOLO achieves improvements of 1.5 and 1.4 percentage points in mAP@0.5 and mAP@0.5∶0.95, respectively, with 3.1×10⁶ parameters and a computational complexity of 6.5 GFLOPs. Frames Per Second (FPS) increases by 44 on a Graphics Processing Unit (GPU), achieving high-precision detection of fall events while also meeting deployment requirements in low-compute scenarios.
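Folding the Focal Loss idea into an IoU-based regression loss is commonly done by reweighting the loss with a power of the IoU, as in Focal-EIoU; a hedged sketch, since OEF-YOLO's exact formulation may differ:

```python
def focal_iou_loss(iou, base_loss, gamma=0.5):
    """Reweight a CIoU-style box loss by IoU**gamma (Focal-EIoU-style).

    `base_loss` is the unweighted IoU-family loss for one predicted box; the
    exponent gamma controls how strongly the reweighting focuses training."""
    return (iou ** gamma) * base_loss

# A well-aligned box keeps most of its (already small) loss; a box with no
# overlap contributes nothing, steering gradients toward learnable examples:
print(focal_iou_loss(iou=0.9, base_loss=0.1))
print(focal_iou_loss(iou=0.2, base_loss=0.8))
```

The same focusing idea can instead be applied on the classification side with the classic (1 - p)^gamma factor; the abstract does not specify which variant OEF-YOLO uses.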

  • Research Hotspots and Reviews
    TIAN Qing, WANG Bin, ZHOU Zixiao
    Computer Engineering. 2025, 51(7): 12-30. https://doi.org/10.19678/j.issn.1000-3428.0069698

The primary task of person Re-IDentification (ReID) is to identify and track a specific pedestrian across multiple non-overlapping cameras. With the development of deep neural networks and the increasing demand for intelligent video surveillance, ReID has gradually attracted research attention. Most existing ReID methods adopt labeled data for supervised training; however, the high annotation cost makes scaling supervised ReID to large unlabeled datasets challenging. The paradigm of unsupervised ReID can significantly alleviate this issue, improving its applicability to real-life scenarios and enhancing its research potential. Although several ReID surveys have been published, they have primarily focused on supervised methods and their applications. This survey systematically reviews, analyzes, and summarizes existing ReID studies to provide a reference for researchers in this field. First, ReID methods are comprehensively reviewed in the unsupervised setting. Based on the availability of source domain labels, unsupervised ReID methods are categorized into unsupervised domain adaptation methods and fully unsupervised methods, and their merits and drawbacks are discussed. Subsequently, the benchmark datasets widely used in ReID research are summarized, and the performance of different ReID methods on these datasets is compared. Finally, the current challenges in this field are discussed and potential future directions are proposed.

  • Development Research and Engineering Application
    XU Degang, WANG Shuangchen, YIN Kedong, WANG Zaiqing
    Computer Engineering. 2025, 51(11): 377-391. https://doi.org/10.19678/j.issn.1000-3428.0069125

To solve the problems of poor detection performance, high misdetection and omission rates, and weak generalization ability in urban vehicle target detection, this study proposes an improved YOLOv8 urban vehicle target detection algorithm. First, an Efficient Multi-scale Attention (EMA) mechanism is incorporated into the tail of the backbone network, which helps the model better capture the detailed features of target vehicles. Combined with a 160×160 pixel small-target detection layer, it enhances the detection of small targets and aggregates pixel-level features through dimensional interaction to strengthen the mining of target vehicle features. Second, the study designs a new Multi-scale Lightweight Convolution (MLConv) module for the lightweight network and reconstructs the C2f module based on MLConv, which significantly improves the feature extraction capability of the model. Finally, to suppress the harmful gradients generated by low-quality images, the study uses the Wise-Intersection over Union (WIoU) loss function instead of the Complete Intersection over Union (CIoU) to optimize the network's bounding box loss and improve the model's convergence speed and regression accuracy. On the Streets vehicle dataset, the algorithm improves mAP@0.5, mAP@0.5∶0.95, and recall by 1.9, 1.4, and 2.4 percentage points, respectively, compared with the YOLOv8n benchmark model. In validations on a domestic vehicle dataset and the VisDrone2019 small-target dataset, these performance indexes improve to various degrees, proving that the improved algorithm has good generalization and robustness. Compared with other mainstream algorithms, the improved algorithm exhibits higher accuracy and detection rate, indicating better performance in urban vehicle target detection.

  • Graphics and Image Processing
    SHA Yuyang, LU Jingtao, DU Haofan, ZHAI Xiaobing, MENG Weiyu, LIAN Xu, LUO Gang, LI Kefeng
    Computer Engineering. 2025, 51(7): 314-325. https://doi.org/10.19678/j.issn.1000-3428.0068674

Image segmentation is a crucial technology for environmental perception and is widely used in scenarios such as autonomous driving and virtual reality. With the rapid development of technology, computer vision-based blind guiding systems are attracting increasing attention, as they outperform traditional solutions in terms of accuracy and stability. The semantic segmentation of road images is an essential feature of a visual guiding system. By analyzing the output of the algorithms, the guiding system can understand the current environment and aid blind people in safe navigation, helping them avoid obstacles, move efficiently, and find the optimal path. Visual blind guiding systems are often used in complex environments, which require high running efficiency and segmentation accuracy. However, commonly used high-precision semantic segmentation algorithms are unsuitable for blind guiding systems owing to their low running speed and large number of model parameters. To solve this problem, this paper proposes a lightweight road image segmentation algorithm based on multiscale features. Unlike existing methods, the proposed model contains two feature extraction branches: the Detail Branch, which extracts low-level detail information from the image, and the Semantic Branch, which extracts high-level semantic information. Multiscale features from the two branches are processed by the designed feature mapping module, which further improves feature modeling performance. Subsequently, a simple and efficient feature fusion module is designed to fuse features of different scales, enhancing the model's ability to encode contextual information. A large amount of road segmentation data suitable for blind guiding scenarios is collected and labeled, and a corresponding dataset is generated, on which the model is trained and tested. The experimental results show that the mean Intersection over Union (mIoU) of the proposed method is 96.5%, which is better than that of existing image segmentation models. The proposed model achieves a running speed of 201 frames per second on an NVIDIA RTX 3090Ti, higher than that of existing lightweight image segmentation models, and can be deployed on an NVIDIA AGX Xavier at 53 frames per second, meeting the requirements of practical applications.
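The reported mIoU can be grounded in its standard definition over a confusion matrix, averaging per-class TP/(TP + FP + FN); a minimal sketch:

```python
def mean_iou(conf):
    """Mean IoU from a confusion matrix (rows = ground truth, cols = prediction)."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(n)) - tp  # predicted c, actually other
        fn = sum(conf[c]) - tp                       # actually c, predicted other
        denom = tp + fp + fn
        if denom:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious)

# Two-class toy example, e.g. road vs. background pixels:
print(mean_iou([[50, 10], [5, 35]]))
```

In practice the confusion matrix is accumulated pixel-wise over the whole test set before the averaging step.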

  • AI-Enabled Vehicular Edge Computing
    QIN Minhao, SUN Weiwei
    Computer Engineering. 2025, 51(9): 1-13. https://doi.org/10.19678/j.issn.1000-3428.0069416

Traffic signal control plays an important role in alleviating traffic congestion and improving urban commuting efficiency. In recent years, breakthroughs have been made in traffic signal control algorithms based on deep reinforcement learning that use real-time traffic data as input. However, traffic data in real-world scenarios often suffer from distortion. Traditional solutions repair the distorted data first and then use reinforcement learning algorithms to control the signal lights. However, on the one hand, the dynamic phases of traffic signals introduce additional uncertainty into distortion repair; on the other hand, distortion repair is difficult to combine with deep reinforcement learning frameworks to improve performance. To address these issues, a distortion-tolerant traffic signal control model based on hidden state prediction, HCRL, is proposed. The HCRL model comprises encoding, control, and encoding prediction sub-models. By introducing a hidden state representation mechanism for signalized intersections, the HCRL model adapts better to deep reinforcement learning frameworks and effectively expresses the control state of signalized intersections. In addition, the HCRL model uses a dedicated transfer training method to prevent data distortion from interfering with the control sub-model. Two real datasets are used to verify the impact of data distortion on intelligent signal control algorithms. The experimental results show that the HCRL model outperforms distortion-completion-based traffic signal control models in all distortion scenarios and at all distortion rates; furthermore, it demonstrates strong robustness against data distortion compared with other baseline models.

  • Computer Architecture and Software Technology
    MENG Fanfeng, WANG Zicong, ZHANG Jintao, WANG Yanjing, OU Yang, WU Lizhou, XIAO Nong
    Computer Engineering. 2025, 51(3): 180-188. https://doi.org/10.19678/j.issn.1000-3428.0068707

With the advent of the era of big data, the demand for large-scale data storage and high-performance computing in data center applications is rapidly increasing. This growing need has made the access cost of massive data a significant bottleneck affecting application performance. The emergence of the Compute Express Link (CXL) interconnect protocol offers a promising solution to this challenge. This study introduces a design for a CXL extended memory pool. At the hardware level, a CXL extended memory pool system using the CXL extended memory protocol is implemented in gem5. When device memory is exposed to the CPU address space, the CPU can directly access this memory using standard load/store instructions. At the operating system level, the study develops a CXL device driver, which provides a comprehensive software stack for managing and accessing the device. In addition, utilizing the memkind library in user mode, the study integrates host and device memory to deliver a unified memory view to applications. The study builds a complete prototype of the CXL extended memory pool system based on the full system mode of gem5 and conducts a thorough evaluation of its performance. It also compares the latency and bandwidth of host-local Dynamic Random Access Memory (DRAM) and Host-managed Device Memory (HDM) using the membench and STREAM benchmarks. Experimental results show that the latency of HDM is approximately 1.5 times that of DRAM, whereas the bandwidth of HDM ranges from 50% to 63% of that of DRAM across various application scenarios. The study also runs the key-value storage engine Viper on both DRAM and HDM and finds that in scenarios with constrained DRAM capacity, the use of extended HDM can enhance system performance by a factor of 2 to 7.

  • Artificial Intelligence and Pattern Recognition
    YUAN Yinghua, JIN Yingran, GAO Yun
    Computer Engineering. 2025, 51(12): 96-108. https://doi.org/10.19678/j.issn.1000-3428.0069871

The Siamese tracking network is a popular target tracking framework comprising three modules: the backbone, fusion, and positioning networks. The Transformer is a relatively new and effective implementation of the fusion module. The encoder and decoder of the Transformer use a self-attention mechanism to enhance the features of the Convolutional Neural Network (CNN). However, the self-attention mechanism only enhances features in the spatial dimension without considering feature enhancement in the channel dimension. To enable the self-attention network of the Transformer to enhance features in both the spatial and channel dimensions and provide accurate correlation information for the target localization network, a Transformer tracker based on dual-dimensional feature enhancement is proposed to improve the Transformer fusion network. First, using the third- and fourth-stage features of the backbone network as inputs, channel-dimension feature enhancement is performed via CAE-Net in the self-attention module of the Transformer encoder and decoder to enhance the importance of the channels. Subsequently, two-stage feature-weighted fusion and linear transformation are performed via SAE-Net to obtain the self-attention matrices Q, K, and V. Finally, spatial-dimension feature enhancement is performed via the self-attention operation. Experiments conducted on five widely used public benchmark datasets reveal that the improved Transformer feature fusion module improves tracking performance with minimal reduction in tracking speed.
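Enhancing features in both dimensions can be illustrated by gating the query channels before a standard scaled dot-product attention; the gating vector here is a stand-in for the CAE-Net/SAE-Net machinery, whose details are not reproduced:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def channel_weighted_attention(q, k, v, channel_w):
    """Scaled dot-product attention with a per-channel gate on the queries.

    The gate (channel dimension) reweights feature channels before the
    QK similarity; the softmax over positions is the spatial enhancement."""
    d = len(q[0])
    q = [[qi[c] * channel_w[c] for c in range(d)] for qi in q]  # channel gate
    out = []
    for qi in q:
        scores = [sum(qi[c] * kj[c] for c in range(d)) / math.sqrt(d) for kj in k]
        attn = softmax(scores)                                  # spatial attention
        out.append([sum(a * vj[c] for a, vj in zip(attn, v)) for c in range(d)])
    return out

out = channel_weighted_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]],
                                 [[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])
print(out)  # one output row with two channels
```

In the actual tracker the gate is learned from the features themselves, so informative channels are amplified before the spatial correlation is computed.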

  • Graphics and Image Processing
    HU Qian, PI Jianyong, HU Weichao, HUANG Kun, WANG Juanmin
    Computer Engineering. 2025, 51(3): 216-228. https://doi.org/10.19678/j.issn.1000-3428.0068753

    Considering the problem of low accuracy in existing pedestrian detection methods for dense or small target pedestrians, this study proposes a comprehensive improved algorithm model called YOLOv5_Conv-SPD_DAFPN based on You Only Look Once (YOLO) v5, a non-strided Convolution Space-to-Depth (Conv-SPD), and Double Asymptotic Feature Pyramid Network (DAFPN). First, to address the issue of feature information loss for small targets or dense pedestrians, a Conv-SPD network module is introduced into the backbone network to replace the original strided convolution, thereby effectively mitigating the problem of feature information loss. Second, to solve the problem of low feature fusion rates caused by nonadjacent feature maps not directly merging, this study proposes DAFPN to significantly improve the accuracy and precision of pedestrian detection. Finally, based on Efficient Intersection over Union (EIoU) and Complete-IoU (CIoU) losses, this study introduces the EfficiCIoU_Loss location loss function to adjust and accelerate the frame regression rate, thereby promoting faster convergence of the network model. Compared with the original YOLOv5 model on the CrowdHuman and WiderPerson pedestrian datasets, the improved model raises mAP@0.5 by 3.9 and 5.3 percentage points and mAP@0.5∶0.95 by 2.1 percentage points on each dataset, respectively. After introducing EfficiCIoU_Loss, the model convergence speed improved by 11% and 33%, respectively. These innovative improvements have led to significant progress in dense pedestrian detection based on YOLOv5 in terms of feature information retention, multiscale fusion, and loss function optimization, thereby enhancing performance and efficiency in practical applications.
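    EfficiCIoU_Loss builds on the EIoU and CIoU families; its exact formulation is in the paper, but the shared building blocks, plain IoU and the DIoU/CIoU-style center-distance penalty, can be sketched as:

```python
def iou(box_a, box_b):
    """Axis-aligned IoU of boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def center_distance_penalty(box_a, box_b):
    """DIoU/CIoU-style term: squared center distance normalized by the
    squared diagonal of the smallest enclosing box."""
    cax, cay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cbx, cby = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    ex1, ey1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    ex2, ey2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    dist2 = (cax - cbx) ** 2 + (cay - cby) ** 2
    return dist2 / diag2 if diag2 > 0 else 0.0
```

    Losses in this family combine 1 - IoU with penalty terms like the one above (plus aspect-ratio or width/height terms), which is what accelerates box regression for non-overlapping predictions.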

  • Artificial Intelligence and Pattern Recognition
    SUN Ziwen, QIAN Lizhi, YUAN Guanglin, YANG Chuandong, LING Chong
    Computer Engineering. 2025, 51(4): 158-168. https://doi.org/10.19678/j.issn.1000-3428.0068892

    Transformer-based object tracking methods are widely used in the field of computer vision and have achieved excellent results. However, object deformation, occlusion, illumination changes, and rapid motion can change object appearance during actual tracking tasks, and the underutilization of object template change information in existing methods prevents tracking performance from improving. To solve this problem, this paper presents a Transformer object tracking method, TransTRDT, based on real-time dynamic template updating. A dynamic template updating branch is attached to reflect the latest appearance and motion state of an object. The branch determines whether the template should be updated through a template quality scoring head; when it identifies the possibility of an update, it passes the initial template, the dynamic template of the previous frame, and the cropped latest prediction into the dynamic template updating network to update the dynamic template. As a result, the object can be tracked more accurately by obtaining a more reliable template. The tracking performance of TransTRDT on GOT-10k, LaSOT, and TrackingNet is superior to that of algorithms such as SwinTrack and STARK. On the OTB100 dataset, it achieves a tracking success rate of 71.9% at a tracking speed of 36.82 frames per second, reaching the current leading level in the industry.
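    The gating logic of the dynamic template updating branch can be sketched as follows. The threshold, the `fuse` hook, and all names here are illustrative assumptions; the paper's quality scoring head and updating network are learned modules, not simple rules.

```python
def maybe_update_template(dynamic_template, candidate_crop, quality_score,
                          threshold=0.5, fuse=None):
    """Gate sketch for dynamic template updating.

    quality_score: output of a (hypothetical) template quality scoring head.
    fuse: optional stand-in for the template updating network, which combines
    the previous dynamic template with the newly cropped prediction.
    """
    if quality_score < threshold:
        # low-quality frame (occlusion, blur, ...): keep the old template
        return dynamic_template
    if fuse is None:
        return candidate_crop          # replace outright
    return fuse(dynamic_template, candidate_crop)
```

    The point of the gate is that unreliable frames never contaminate the template, which is how the method keeps the template representative of the latest appearance.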

  • Artificial Intelligence and Pattern Recognition
    ZHAI Zhipeng, CAO Yang, SHEN Qinqin, SHI Quan
    Computer Engineering. 2025, 51(9): 139-148. https://doi.org/10.19678/j.issn.1000-3428.0069439

    Accurate traffic flow prediction is a key prerequisite for realizing intelligent transportation systems, and is of great significance for strengthening system simulation and control and improving the decision-making of managers. Most existing Graph Convolutional Network (GCN) models ignore the dynamic spatial and temporal variations in traffic data and underuse node information, which leads to insufficient extraction of spatio-temporal correlations. To address this problem, a traffic flow prediction model based on multiple spatio-temporal graph fusion and dynamic attention is proposed. First, the temporal characteristics of traffic flow data in multi-temporal states are extracted by different convolutional cells. Next, a multiple spatio-temporal graph is constructed to capture the dynamic trend and heterogeneity of nodes in the spatial distribution, and spatial characteristics are extracted through the integrated GCN. Finally, the spatial and temporal characteristics are analyzed and fused using the multi-head self-attention mechanism to output prediction results. Experimental analyses are performed on two public datasets, PeMS04 and PeMS08, against the Attention Based Spatial-Temporal Graph Convolutional Network (ASTGCN), Multiview Spatial-Temporal Transformer Network (MVSTT), Dynamic Spatial-Temporal Aware Graph Neural Network (DSTAGNN), and other benchmark models that utilize spatio-temporal graph convolution. The results show that the Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE) of the proposed model are reduced by 7.10%, 7.22%, and 6.47%, respectively, demonstrating the proposed model's strong adaptability and robustness.
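    The multi-head self-attention used to fuse the spatial and temporal characteristics reduces, per head, to scaled dot-product attention. A single-head, pure-Python sketch (the multi-head version simply runs several of these in parallel and concatenates the outputs):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors.

    For each query, keys are scored by dot product scaled by sqrt(d),
    the scores are softmax-normalized, and the values are averaged
    under those weights.
    """
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out
```

    In the model above, Q, K, and V would come from the temporal and spatial feature branches, so each output position is a data-dependent mixture of both.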

  • Artificial Intelligence and Pattern Recognition
    CAI Ruichu, XU Zunhong, CHEN Daoxin, YANG Zhenhui, LI Zijian, HAO Zhifeng
    Computer Engineering. 2025, 51(3): 105-112. https://doi.org/10.19678/j.issn.1000-3428.0068937

    In the field of quantum chemistry, molecular property prediction is a fundamental and critical task, which is widely used in many fields such as drug discovery and chemical synthesis prediction. With the development of artificial intelligence, deep learning methods have been widely used in this field. However, current methods often adopt two extreme levels of abstraction, namely micro- and macro-views, to model molecular properties, posing challenges in generalizing to out-of-distribution samples. The mesoscopic view of chemistry provides a beneficial intermediate level for describing molecular properties through mesoscopic components containing functional groups associated with these properties. By considering these mesoscopic components and modeling them from a causal perspective, more attention can be paid to the functional groups related to these properties. To achieve this goal, this study proposes a Mesoscopic Component Identification (MCI) model. The model builds on a mesoscopic causal generative process for molecular data within a variational autoencoder framework. The proposed model predicts molecular properties by learning the representation of mesoscopic components related to molecular properties. Initially, the model assumes that the atomic latent variables and semantic latent substructure follow Gaussian and multivariate Bernoulli distributions, respectively. Molecular data are then input into a neural network to identify the atomic latent variables and semantic latent substructure. Next, the identified atomic latent variables and semantic latent substructures are used to predict molecular properties. To identify the substructures of the atomic and semantic latent variables, variational lower bounds and sparse terms are used to construct the loss function of the model. 
Experiments demonstrate that our model not only achieves state-of-the-art performance but also offers in-depth explanations that provide a more comprehensive understanding of model predictions and improve the accuracy and generalization ability of molecular property predictions.
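    The variational lower bound underlying such a loss is the standard VAE evidence lower bound, specialized to the two latent groups described above. This is a generic form, not the paper's exact objective: here z_a denotes the Gaussian atomic latent variables, z_s the multivariate Bernoulli semantic substructure, and the sparsity term Ω(z_s) with weight λ stands in for the paper's sparse terms.

```latex
\log p_\theta(x) \;\ge\;
  \underbrace{\mathbb{E}_{q_\phi(z_a, z_s \mid x)}
      \!\left[\log p_\theta(x \mid z_a, z_s)\right]}_{\text{reconstruction}}
  \;-\;
  \underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z_a, z_s \mid x)\,\middle\|\,
      p(z_a)\,p(z_s)\right)}_{\text{prior matching}},
\qquad
\mathcal{L} \;=\; -\,\mathrm{ELBO} \;+\; \lambda\,\Omega(z_s)
```

    Maximizing the bound (minimizing L) trains the encoder to identify the latent variables while the sparsity term encourages only property-relevant substructures to remain active.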

  • Computer Architecture and Software Technology
    ZHANG Ming, GUO Wenkang, WANG Haifeng
    Computer Engineering. 2025, 51(3): 197-207. https://doi.org/10.19678/j.issn.1000-3428.0068477

    The Graphics Processing Unit (GPU) is not fully utilized when processing large-scale dynamic graphs, and the limitations of GPU-oriented graph partitioning methods lead to performance bottlenecks. To improve the performance of graph computing, a Central Processing Unit (CPU)/GPU Distributed Heterogeneous Engine (DH-Engine) is proposed to fully exploit heterogeneous processors. First, a new heterogeneous graph partitioning algorithm is proposed. It uses a streaming algorithm for graph partitioning as its core to achieve dynamic load balancing between the computing nodes and between the CPU and GPU. The greedy strategy assigns vertices based on the maximum number of neighboring vertices during the initial graph partitioning and dynamically adjusts the vertex position based on the minimum number of connected edges during the iteration. Second, the system introduces a GPU heterogeneous computing model to improve graph computing efficiency through functional parallelism. The experiment used PageRank, Connected Components (CC), Single-Source Shortest Path (SSSP), and k-core as examples to conduct comparative experiments with other graph computing systems. Compared with other graph engines, DH-Engine can better balance the computing load of each node and the load between heterogeneous processors to shorten the delay and accelerate the overall computing speed. The results show that the CPU/GPU load ratio of this system tends to 1, and the heterogeneous computing achieves a speedup of up to 5 times over other graph computing systems. DH-Engine thus provides an improved heterogeneous graph computing scheme.
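    The initial greedy assignment step can be sketched as a streaming partitioner. This is a simplified illustration, not DH-Engine itself: the `balance_slack` capacity cap and the vertex-id arrival order are assumptions, and the paper's dynamic re-adjustment of vertices during iteration is omitted.

```python
def stream_partition(edges, num_vertices, k, balance_slack=1.1):
    """Greedy streaming graph partitioning sketch.

    Each arriving vertex goes to the partition already holding most of its
    placed neighbors, subject to a capacity cap; ties and full partitions
    fall back to the lightest partition.
    """
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    cap = balance_slack * num_vertices / k
    part = [-1] * num_vertices
    loads = [0] * k
    for v in range(num_vertices):          # arrival order = vertex id
        scores = [0] * k
        for n in adj[v]:
            if part[n] != -1:
                scores[part[n]] += 1
        # prefer: has capacity, then most placed neighbors, then lightest load
        best = max(range(k),
                   key=lambda p: (loads[p] < cap, scores[p], -loads[p]))
        part[v] = best
        loads[best] += 1
    return part
```

    Keeping neighbors together reduces cut edges (communication between nodes), while the capacity cap keeps the per-processor load balanced, the two goals the abstract describes.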

  • Mobile Internet and Communication Technology
    WANG Huahua, HUANG Yexia, LI Ling, WANG Jiacheng
    Computer Engineering. 2025, 51(12): 255-267. https://doi.org/10.19678/j.issn.1000-3428.0069877

    When implementing Federated Learning (FL) in a cell-free network environment, user scheduling and resource allocation strategies are crucial for optimizing system time overhead, improving user reachability, and accelerating the FL convergence rate. To address the issue of uneven resource allocation, this study designs an optimization scheme that combines user scheduling, CPU processing frequency, and power allocation. This scheme aims to achieve fair resource allocation by maximizing the minimum user rate in the system, thus enhancing FL performance. The joint optimization problem is decomposed into two subproblems: user scheduling and power allocation. For user scheduling, this study proposes a greedy scheduling algorithm based on k-means clustering to comprehensively evaluate the channel conditions and data "value" of users and categorize users into different groups. Subsequently, a personalized CPU processing frequency allocation plan is developed for the users within each group based on their resource occupancy. Finally, by independently executing user scheduling within each group, user selection is performed efficiently and precisely, and the complexity of user selection is effectively reduced via early grouping. For power allocation, this study introduces a Bisection Method-based Power Allocation (BM-PA) algorithm. This algorithm not only considers fairness among users but also prioritizes resource-constrained users to ensure that they can obtain superior resource allocation. The BM-PA algorithm achieves fast convergence of power allocation using a low-complexity iterative optimization process, significantly improving resource utilization efficiency without deteriorating system performance. In this study, a reasonable user scheduling strategy serves as the foundation for obtaining optimal solutions for the power allocation subproblem. 
    This study adopts an alternating iteration method that allows independent optimization of each subproblem while considering the solution of the other subproblem. Through multiple rounds of iterative optimization, this interdependent relationship ensures that power resources are reasonably allocated to the users who need them the most or are most likely to utilize them effectively, thus enhancing overall system performance. Simulation results show that compared with the baseline algorithm, the proposed algorithm exhibits outstanding performance in terms of downlink achievable rates: the average improvement reaches up to 103.34% under optimal conditions. Additionally, the uplink achievable rates improve by up to 102.78%. Furthermore, the proposed algorithm saves 67.44% of the FL task training time on average compared to the baseline algorithm; in particular, when the FL model accuracy reaches 90%, the time overhead of the proposed algorithm is minimal.
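    The core of a bisection-based max-min power allocation can be sketched as follows. This is a textbook-style illustration under a simplified rate model (rate_i = log2(1 + p_i * g_i), no interference), not the exact BM-PA algorithm; for a common target rate r, the cheapest feasible allocation is p_i = (2^r - 1) / g_i, so the maximum common rate is found by bisecting on r against the total power budget.

```python
def max_min_rate(gains, total_power, iters=60):
    """Bisection sketch for max-min rate power allocation.

    gains: effective channel gain per user (higher is better).
    Returns the max common rate r and the per-user powers achieving it.
    The initial bracket hi=60 bits is assumed large enough for the inputs.
    """
    lo, hi = 0.0, 60.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        # minimum total power needed so every user reaches rate `mid`
        need = sum((2.0 ** mid - 1.0) / g for g in gains)
        if need <= total_power:
            lo = mid       # feasible: try a higher common rate
        else:
            hi = mid       # infeasible: lower the target
    r = lo
    powers = [(2.0 ** r - 1.0) / g for g in gains]
    return r, powers
```

    Note how weak-channel users automatically receive more power (p_i is inversely proportional to g_i), which is the fairness property the abstract emphasizes.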

  • Research Hotspots and Reviews
    YUAN Yajian, MAO Li
    Computer Engineering. 2025, 51(3): 54-63. https://doi.org/10.19678/j.issn.1000-3428.0069042

    Traffic sign detection is crucial for assisted driving and plays a vital role in ensuring driving safety. However, in real-world traffic environments, factors such as darkness and rain create background noise that complicates the detection process. In addition, existing models often struggle to effectively detect small traffic signs from a distance. Furthermore, when a traffic sign detection model is designed, the model size must be considered for practical deployment. To address these challenges, this study proposes a lightweight traffic sign detection model based on YOLOv8 with enhanced foregrounds. First, a lightweight PC2f module is designed to replace a part of the C2f module in the original Backbone. This modification reduces the number of parameters and computational load, enriches the gradient flow, retains more shallow information, and ultimately enhances detection performance while maintaining a lightweight design. Next, the study designs a Foreground Enhancement Module (FEM) and incorporates it into the Neck position to effectively amplify the foreground information and reduce background noise. Finally, the study adds a small-target detection layer to extract shallow features from high-resolution images, thereby improving the ability of the model to detect small-target traffic signs. Experimental results show that the optimized model achieves mAP50 scores of 82.5% and 95.3% on the CCTSDB 2021 and GTSDB datasets, improvements of 3.6 and 1 percentage points over the original model, respectively, while reducing the number of model weights by 0.22×10⁶. These results confirm the effectiveness of the proposed model for practical applications.
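    The effect of a foreground enhancement stage can be illustrated with a simple mask-based reweighting. This sketch is purely illustrative: the FEM in the paper is a learned module, and the foreground mask, gain factor, and function name here are assumptions.

```python
def enhance_foreground(feature, mask, gain=2.0):
    """Foreground-enhancement sketch for a 2D feature map.

    feature: 2D list of activations; mask: 2D list of (hypothetical)
    foreground probabilities in [0, 1]. Responses are amplified up to
    `gain`x where the mask is 1 and left unchanged where it is 0.
    """
    return [[v * (1.0 + (gain - 1.0) * m) for v, m in zip(frow, mrow)]
            for frow, mrow in zip(feature, mask)]
```

    Relative to the background, sign regions end up with stronger activations, which is the "amplify foreground, suppress background noise" behavior described above.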

  • Advanced Computing and Data Processing
    SUN Wenqian, XU Tianchen, YU Peihou, CHEN Yunfang, ZHANG Wei
    Computer Engineering. 2025, 51(12): 189-201. https://doi.org/10.19678/j.issn.1000-3428.0069804

    Data privacy protection has become a focus of social attention, and countries and regions are gradually formulating relevant laws and regulations in this regard. However, because the privacy policies released by App products are long and highly specialized, using automated methods to detect their compliance has become an urgent technical challenge. Machine learning models, the most widely used solutions to this challenge, require annotated datasets for support; however, such datasets for Chinese App privacy policies are currently lacking. Based on an analysis of compliance with the EU General Data Protection Regulation (GDPR), a labeling scheme suitable for China's Personal Information Protection Law is designed, which includes 15 required labels. Subsequently, Chinese privacy policies for 363 Apps in 10 categories are obtained using Web crawlers, and these privacy policies are classified and annotated at the sentence level. A Chinese privacy policy corpus consisting of 104 134 privacy policy statements and labels is constructed. A classifier is trained and tested on the corpus using ERNIE, Baidu's open-source pretrained language model, achieving a detection accuracy of 85.75%.
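    Sentence-level annotation of this kind produces (statement, label) pairs drawn from a closed label set. A minimal sketch of such a corpus entry; the label names below are illustrative placeholders, not the paper's actual 15-label scheme.

```python
# Illustrative label set; the paper defines 15 required labels based on
# China's Personal Information Protection Law, which are not listed here.
LABELS = {"data_collection", "data_sharing", "user_rights"}

def make_entry(sentence: str, label: str) -> dict:
    """Build one sentence-level corpus entry, validating the label."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    return {"text": sentence, "label": label}
```

    Validating against a closed label set at construction time keeps the corpus consistent, which matters when 104 134 statements are annotated by multiple people.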

  • Space-Air-Ground Integrated Computing Power Networks
    WANG Kewen, ZHANG Weiting, SUN Tong
    Computer Engineering. 2025, 51(5): 52-61. https://doi.org/10.19678/j.issn.1000-3428.0069471

    In response to the increasing demand for fast response and large-scale coverage in application scenarios such as satellite data processing and vehicle remote control, this study focuses on utilizing hierarchical control and artificial intelligence technology to design a resource scheduling mechanism for space-air-ground integrated computing power networks. The air, space, and ground networks are divided into three domains, and domain controllers are deployed for resource management in the corresponding local domain. Areas are divided based on the coverage of satellites and drones to ensure effective service guarantees, efficient data transmission, and task processing. A multi-agent reinforcement learning-based scheduling algorithm is proposed to optimize resource utilization in space-air-ground integrated computing power networks, in which each domain controller is treated as an agent with task scheduling and resource allocation capabilities. Intelligent resource scheduling and efficient resource allocation for computing tasks are realized through collaborative learning and distributed decision-making while satisfying delay and energy consumption constraints. Computing tasks are generated in different scenarios and processed in real time. Simulation results show that the proposed mechanism can effectively improve resource utilization and shorten task response time.
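    The scheduling problem the agents learn to solve can be made concrete with a non-learning greedy baseline: assign each task to the domain with the lowest estimated completion time. This sketch is a baseline for intuition only, not the paper's multi-agent reinforcement learning algorithm; domain names and the backlog/capacity cost model are assumptions.

```python
def assign_tasks(tasks, domains):
    """Greedy baseline: route each task to the domain controller with the
    lowest estimated completion time, modeled as (backlog + cost) / capacity.

    tasks: list of (task_id, compute_cost); domains: {name: capacity}.
    """
    backlog = {d: 0.0 for d in domains}
    plan = {}
    for task, cost in tasks:
        best = min(domains, key=lambda d: (backlog[d] + cost) / domains[d])
        plan[task] = best
        backlog[best] += cost
    return plan
```

    A learned multi-agent policy improves on this by anticipating future arrivals and link delays instead of reacting only to the current backlog.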

  • Research Hotspots and Reviews
    LIU Yanghong, FU Yangyouran, DONG Xingping
    Computer Engineering. 2025, 51(10): 18-26. https://doi.org/10.19678/j.issn.1000-3428.0070569

    The generation of High-Definition (HD) environmental semantic maps is indispensable for environmental perception and decision making in autonomous driving systems. To address the modality discrepancy between cameras and LiDARs in perception tasks, this paper proposes an innovative multimodal fusion framework, HDMapFusion, which significantly improves semantic map generation accuracy via feature-level fusion. Unlike traditional methods that directly fuse raw sensor data, our approach transforms both camera image and LiDAR point cloud features into a unified Bird's-Eye-View (BEV) representation, enabling physically interpretable fusion of multimodal information within a consistent geometric coordinate system. Specifically, the method first extracts visual features from camera images and 3D structural features from LiDAR point clouds using deep learning networks. Subsequently, a differentiable perspective transformation module converts the front-view image features into the BEV space, and the LiDAR point clouds are projected into the same BEV space through voxelization. Building on this, an attention-based feature fusion module is designed to adaptively integrate the two modalities using weighted aggregation. Finally, a semantic decoder generates high-precision semantic maps containing lane lines, pedestrian crossings, road boundary lines, and other key elements. Systematic experiments conducted on the nuScenes benchmark dataset demonstrate that HDMapFusion significantly outperforms existing baseline methods in terms of HD map generation accuracy. These results validate the effectiveness and superiority of the proposed method, offering a novel solution to multimodal fusion in autonomous driving perception.
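    The voxelization step that projects LiDAR points into the BEV grid can be sketched directly. This is a minimal occupancy-count version under simplifying assumptions (points outside the range are dropped, height z is ignored); real pipelines typically keep richer per-cell statistics such as height or intensity features.

```python
def points_to_bev(points, x_range, y_range, resolution):
    """Project 3D points (x, y, z) into a BEV occupancy-count grid.

    x_range/y_range: (min, max) extents in meters; resolution: cell size.
    Returns grid[iy][ix] = number of points falling in that cell.
    """
    nx = int((x_range[1] - x_range[0]) / resolution)
    ny = int((y_range[1] - y_range[0]) / resolution)
    grid = [[0] * nx for _ in range(ny)]
    for x, y, z in points:
        if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]:
            ix = int((x - x_range[0]) / resolution)
            iy = int((y - y_range[0]) / resolution)
            grid[iy][ix] += 1
    return grid
```

    Once both the image features (via perspective transformation) and the point cloud (via this kind of grid) live in the same BEV coordinates, cell-wise attention fusion becomes geometrically well-defined.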

  • Research Hotspots and Reviews
    JIANG Qiqi, ZHANG Liang, PENG Lingqi, KAN Haibin
    Computer Engineering. 2025, 51(3): 24-33. https://doi.org/10.19678/j.issn.1000-3428.0069378

    With the advent of the big data era, the proliferation of information types has increased the requirements for controlled data sharing. Decentralized Attribute-Based Encryption (DABE) has been widely studied in this context to enable fine-grained access control among multiple participants. However, the Internet of Things (IoT) data sharing scenario has become mainstream and requires more data features, such as cross-domain access, transparency, trustworthiness, and controllability, whereas traditional Attribute-Based Encryption (ABE) schemes impose a computational burden on resource-constrained IoT devices. To solve these problems, this study proposes an accountable and verifiable outsourced hierarchical attribute-based encryption scheme based on blockchain to support cross-domain data access and improve the transparency and trustworthiness of data sharing. By introducing the concept of Verifiable Credential (VC), the scheme addresses the issue of user identity authentication and offloads the burden of complex encryption and decryption processes to outsourced computing nodes. Finally, using a hierarchical structure, fine-grained data access control is achieved. A security analysis demonstrates that the proposed scheme can withstand chosen-plaintext attacks. Simulation results on small, resource-limited IoT devices using Docker show that the proposed scheme has a lower computational overhead than existing schemes. For up to 30 attributes, the computation cost does not exceed 2.5 s for any of the algorithms, and the average cost is approximately 1 s, making the scheme suitable for resource-constrained IoT devices.

  • Artificial Intelligence and Pattern Recognition
    LI Shuyi, YANG Bo, CHEN Ling, SHEN Ling, TANG Wensheng
    Computer Engineering. 2025, 51(3): 86-94. https://doi.org/10.19678/j.issn.1000-3428.0068626

    Existing surface coverage methods struggle to adapt to surface changes, and their coverage efficiency in robot cleaning operations is low. This paper proposes a surface coverage method based on Proximal Policy Optimization (PPO) with an adaptive reward function, namely SC-SRPPO. First, the target surface is discretized, and the covariance matrix obtained via spherical query is used to solve the normal vectors of the point cloud, from which the 3D surface model is established. Second, a state model is constructed using the coverage state and curvature change features of the local surface point cloud as the observation value of the surface model, which guides the robot to fit the surface during movement and improves the adaptability of the robot to the surface. Subsequently, based on the global coverage of the surface and a time-related exponential model, an adaptive reward function is constructed to guide the robot to move to uncovered areas as soon as possible and improve coverage efficiency. Finally, the local state model and reward function of the surface are combined with the PPO algorithm to train the robot to complete surface coverage path planning. SC-SRPPO achieves an average coverage rate of 90.72% on the sphere, hyperboloid, and heart models. Compared with NSGA-Ⅱ, PPO, and SAC, the coverage rate increases by 4.98%, 14.56%, and 27.11%, respectively, while the coverage completion time is reduced by 15.20%, 67.18%, and 62.64%, respectively. The results show that SC-SRPPO enables the robot to complete the surface-covering task more efficiently than NSGA-Ⅱ and SAC by adapting to surface changes.
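    The shape of an adaptive reward combining newly covered area, global coverage, and a time-related exponential term can be sketched as follows. The functional form and the coefficients alpha, beta, and tau are illustrative assumptions, not the paper's exact reward.

```python
import math

def adaptive_reward(new_area, global_coverage, t,
                    alpha=1.0, beta=2.0, tau=200.0):
    """Adaptive reward sketch for coverage path planning.

    new_area: surface area newly covered at step t; global_coverage: fraction
    of the surface covered so far. The exponential time term makes covering
    new area most valuable early in the episode, pushing the agent toward
    uncovered regions as soon as possible.
    """
    time_term = math.exp(-t / tau)
    return alpha * new_area * time_term + beta * global_coverage
```

    Under a decaying per-step term like this, dawdling in already-covered regions yields almost nothing, so the policy learned by PPO is biased toward fast, complete sweeps.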

  • Computer Vision and Image Processing
    ZHANG Xinjia, WANG Fang
    Computer Engineering. 2026, 52(2): 148-157. https://doi.org/10.19678/j.issn.1000-3428.0069729

    Object detection in Unmanned Aerial Vehicle (UAV) aerial photography images is prone to incorrect or missed detections when targets are small, occluded, or densely distributed at varying scales. To address these challenges, this paper proposes the SNA-YOLOv5s algorithm for small target detection, which is based on YOLOv5s. First, the strided convolution layer in the original model is replaced with the Spatial Depth Transformation Convolution (SPD-Conv) module, eliminating the detail loss caused by strided convolution operations and enhancing the model's ability to extract features from small objects. Second, a novel Average Pyramid Pooling-Fast (AGSPPF) module is designed, in which an average pooling layer is introduced to address the information loss that occurs while extracting feature information, thereby improving the model's feature extraction capability. Third, a new large-scale detection branch specifically for small targets is added to capture rich details in shallow features and enhance the detection capability for small targets. Finally, the Normalized Attention Mechanism (NAM) is embedded in the backbone network, where feature information is weighted to suppress invalid feature information. The proposed algorithm is trained and tested on the VisDrone2019 and NWPU VHR-10 datasets, on which it achieves mean Average Precision (mAP) values of 42.3% and 96.5%, respectively, which are 8.4 and 2.6 percentage points higher than those of the baseline YOLOv5s model. The robustness and accuracy of the proposed model are validated by comparisons with other mainstream deep learning models.
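    The space-to-depth rearrangement at the heart of SPD-Conv can be sketched for a single-channel map. Unlike strided convolution, no pixel is discarded: each block x block neighborhood becomes block*block channels at half the resolution, and a subsequent non-strided convolution (omitted here) then mixes those channels.

```python
def space_to_depth(feature, block=2):
    """Rearrange a single-channel 2D map into block*block channels.

    feature: 2D list of size (h, w) with h and w divisible by `block`.
    Returns a list of block*block maps, each of size (h/block, w/block),
    so spatial resolution drops without losing any values.
    """
    h, w = len(feature), len(feature[0])
    assert h % block == 0 and w % block == 0
    out = []
    for dy in range(block):
        for dx in range(block):
            out.append([[feature[y * block + dy][x * block + dx]
                         for x in range(w // block)]
                        for y in range(h // block)])
    return out
```

    This lossless downsampling is why SPD-style modules help with the tiny objects common in UAV imagery, whose few pixels are otherwise averaged or skipped away.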

  • AI-Enabled Vehicular Edge Computing
    ZHU Siyuan, LI Jiasheng, ZOU Danping, HE Di, YU Wenxian
    Computer Engineering. 2025, 51(9): 14-24. https://doi.org/10.19678/j.issn.1000-3428.0069534

    Detecting defects on unstructured roads is important for road traffic safety; however, the annotated datasets required for detection are limited. This study proposes the Multi-Augmentation with Memory (MAM) semi-supervised object detection algorithm to address the lack of annotated datasets for unstructured roads and the inability of existing models to learn from unlabeled data. First, a cache mechanism is introduced to store the bounding box regression information for unannotated images and images with pseudo annotations, avoiding the computational resource wastage caused by repeated matching. Second, the study proposes a hybrid data augmentation strategy that mixes the cached pseudo-labeled images with the unlabeled images input into the student model, to enhance the model's generalizability to new data and balance the scale distribution of images. The MAM semi-supervised object detection algorithm is not limited to a particular object detection model and better maintains the consistency of object bounding boxes, thus avoiding the need to compute a consistency loss. Experimental results show that the MAM algorithm is superior to other fully supervised and semi-supervised learning algorithms. On a self-built unstructured road defect dataset, called Defect, the MAM algorithm achieves improvements of 6.8, 11.1, and 6.0 percentage points in terms of mean Average Precision (mAP) over the Soft Teacher algorithm in scenarios with annotation ratios of 10%, 20%, and 30%, respectively. On a self-built unstructured road pothole dataset, called Pothole, the MAM algorithm achieves mAP improvements of 5.8 and 4.3 percentage points over the Soft Teacher algorithm in scenarios with annotation ratios of 15% and 30%, respectively.
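    The cache mechanism and the hybrid batch construction can be sketched as a simple store keyed by image id. This is an illustrative data structure, not the MAM implementation: the class name, the interleaving in `mix`, and the tuple box format are all assumptions.

```python
class PseudoLabelCache:
    """Cache sketch for teacher-generated pseudo labels.

    Stores bounding boxes per image id so later training steps can reuse
    them directly instead of re-running the matching step.
    """

    def __init__(self):
        self._store = {}

    def put(self, image_id, boxes):
        self._store[image_id] = list(boxes)

    def get(self, image_id):
        """Return cached boxes, or an empty list for unseen images."""
        return self._store.get(image_id, [])

    def mix(self, pseudo_ids, unlabeled_ids):
        """Hybrid-augmentation helper: interleave cached pseudo-labeled
        image ids with fresh unlabeled ids for a student batch."""
        batch = []
        for a, b in zip(pseudo_ids, unlabeled_ids):
            batch.extend([a, b])
        return batch
```

    Because the cached boxes travel with the image id, the same pseudo boxes are seen across augmentations, which is how box consistency is maintained without an explicit consistency loss.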

  • Graphics and Image Processing
    HAO Hongda, LUO Jianxu
    Computer Engineering. 2025, 51(8): 270-280. https://doi.org/10.19678/j.issn.1000-3428.0069269

    Deep learning has been widely applied to medical imaging, and medical image segmentation models based on attention mechanisms are among the main methods in current research. For the multi-organ segmentation task, most existing 2D segmentation models mainly focus on the overall segmentation effect of slices while ignoring the loss or under-segmentation of small object feature information in slices, which limits the models' segmentation performance. To solve this problem, this study proposes a multi-organ semantic segmentation model, DASC-Net, based on multi-scale feature fusion and an improved attention mechanism. The overall framework of DASC-Net is based on an encoder-decoder architecture. The encoder uses the ResNet-50 network and sets skip connections with the decoder. The attention mechanism is realized using the parallel structure of a Dual Attention Module (DAM) and a Small Object Capture (SOC) module to perform multi-scale regional feature fusion. DASC-Net not only perceives the feature information of larger objects but also retains the feature information of small objects through attention weight reconstruction, which effectively addresses the limitations of the attention module and further improves the segmentation performance of the model. Experimental results on the CHAOS dataset show that DASC-Net achieves 83.72%, 75.79%, 87.75%, 85.63%, and 77.60% in terms of Sensitivity, Jaccard similarity coefficient, Positive Predictive Value (PPV), Dice similarity coefficient, and mean Intersection over Union (mIoU), respectively; the Dice similarity coefficient and 95% Hausdorff Distance (HD95) values on the Synapse dataset are 82.44% and 21.25 mm, respectively. DASC-Net performs better than other segmentation networks on both datasets, demonstrating its reliable and accurate segmentation performance.
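    Why a parallel DAM/SOC structure helps small objects can be illustrated with how the two branch outputs are combined. The elementwise-max fusion below is an illustrative choice, not the paper's attention weight reconstruction: the point is that taking the stronger response per location preserves weak small-object cues that simple averaging would wash out.

```python
def fuse_attention(dam_map, soc_map):
    """Parallel-branch fusion sketch over two 2D attention maps.

    Keeps the stronger response of the dual-attention branch and the
    small-object-capture branch at each location.
    """
    return [[max(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(dam_map, soc_map)]
```

    With averaging, a small organ highlighted only by the SOC branch would have its weight halved; with a max-style (or learned reconstruction) fusion, that response survives into the decoder.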