LI YiYang, LU ShengLian, WANG JiJie, CHEN Ming
Accepted: 2024-12-11
Convolutional Neural Networks (CNNs) have long dominated object detection, widely recognized in the research community for their accuracy and scalability, and the field has produced numerous notable models, including the R-CNN series (Fast R-CNN, Faster R-CNN, and others) and the YOLO series. Following the success of Transformers in natural language processing, researchers began to explore their application in computer vision, leading to visual backbone networks such as ViT and the Swin Transformer. In 2020, the Facebook research team introduced DETR, an end-to-end, Transformer-based object detection algorithm designed to minimize the need for hand-crafted priors and post-processing in detection tasks. Despite its promise, DETR has notable shortcomings, including slow convergence, reduced accuracy, and the unclear physical meaning of its object queries. These issues have spurred a wave of research aimed at refining and enhancing the algorithm. This paper collates, scrutinizes, and synthesizes the various efforts directed towards improving DETR, assessing their respective merits and drawbacks. It further offers a comprehensive overview of state-of-the-art research and specialized application domains that employ DETR, and concludes with a prospective analysis of DETR’s future role in computer vision.