
Most Read

  • Development Research and Engineering Application
    HOU Yutao, Abudukelimu Abulizi, SHI Yaqing, Mayilamu Musideke, Halidanmu Abudukelimu
    Computer Engineering. 2024, 50(4): 332-341. https://doi.org/10.19678/j.issn.1000-3428.0068700
    With the development of the "Belt and Road" initiative, the demand for cross-language communication between the countries and regions along the route has grown, and Machine Translation (MT) technology has gradually become an important means of in-depth exchange. However, owing to the abundance of low-resource languages and the scarcity of language materials in these countries, progress in machine translation research has been relatively slow. This paper proposes a low-resource language machine translation training method based on the NLLB model. An improved training strategy based on a multilingual pre-training model is deployed to optimize the loss function under data augmentation, thereby effectively improving the translation performance of low-resource languages in machine translation tasks. The experimental results show that, compared with the NLLB-600M baseline model, the proposed model achieves average improvements of 1.33 in BiLingual Evaluation Understudy (BLEU) score and 0.82 in chrF++ score on Chinese translation tasks for four low-resource languages, demonstrating the effectiveness of the proposed method. In a further experiment, the ChatGPT and ChatGLM models are used to conduct preliminary studies on Laotian-Chinese and Vietnamese-Chinese translation, respectively. Large Language Models (LLMs) are already capable of translating low-resource languages: in Vietnamese-Chinese translation tasks, the ChatGPT model significantly outperforms traditional Neural Machine Translation (NMT) models, with improvements of 9.28 in BLEU score and 3.12 in chrF++ score, whereas the translation of Laotian requires further improvement.
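The BLEU and chrF++ gains reported above are corpus-level n-gram scores. As a toy illustration of the clipped n-gram precision at the heart of BLEU (not the sacrebleu implementation behind published scores; `ngram_precision` is an illustrative name):

```python
from collections import Counter

def ngram_precision(hypothesis, reference, n):
    """Clipped n-gram precision: each hypothesis n-gram counts only up to
    the number of times it appears in the reference."""
    hyp_tokens = hypothesis.split()
    ref_tokens = reference.split()
    hyp_ngrams = Counter(tuple(hyp_tokens[i:i + n])
                         for i in range(len(hyp_tokens) - n + 1))
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n])
                         for i in range(len(ref_tokens) - n + 1))
    if not hyp_ngrams:
        return 0.0
    # Clip each hypothesis count by the reference count before summing.
    overlap = sum(min(count, ref_ngrams[gram])
                  for gram, count in hyp_ngrams.items())
    return overlap / sum(hyp_ngrams.values())
```

BLEU itself combines these precisions for n = 1..4 with a brevity penalty; chrF++ applies the same idea to character n-grams.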
  • Artificial Intelligence and Pattern Recognition
    ZHAO Jida, ZHEN Guoyong, CHU Chengqun
    Computer Engineering. 2024, 50(4): 113-120. https://doi.org/10.19678/j.issn.1000-3428.0068268
    In the Unmanned Aerial Vehicle (UAV) target detection task, missed and false detections are caused by the small size of the detection targets and the complex backgrounds of the detection images. To address the problem of small target detection, a UAV image target detection algorithm based on improved YOLOv8s is proposed. First, because drone-captured targets are generally small, the number of Backbone layers is reduced and the size of the feature map to be detected is increased, so that the network model can focus more on small targets. Second, because low-quality examples in the dataset commonly degrade the training effect, the Wise-IoU loss function is introduced to enhance training. Third, a context enhancement module is introduced to obtain the feature information of small targets in different receptive fields, improving the localization and classification of small targets in complex environments. Finally, a spatial-channel filtering module is designed to enhance target feature information during convolution and filter out useless interference, addressing the problem of small target features being submerged and lost in the convolution process. Experimental results on the VisDrone2019 dataset demonstrate that the average detection accuracy (mAP@0.5) of the proposed algorithm reaches 45.4%, which is 7.3 percentage points higher than that of the original YOLOv8s algorithm, while the number of parameters is reduced by 26.13%. Under similar experimental conditions, compared with other common small target detection algorithms, both detection accuracy and speed are improved to a certain extent.
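Both the Wise-IoU loss above and conventional IoU-based losses start from the same overlap ratio. A minimal sketch, assuming axis-aligned `(x1, y1, x2, y2)` boxes; `wiou_v1_loss` is a simplified reading of the Wise-IoU v1 distance-based focusing factor, with gradient-detachment details omitted:

```python
import math

def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def wiou_v1_loss(box, gt):
    """Simplified Wise-IoU v1: (1 - IoU) scaled by a distance-based factor
    computed from the centers and the smallest enclosing box, so poorly
    centered predictions are penalized more."""
    cxb, cyb = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    cxg, cyg = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    wg = max(box[2], gt[2]) - min(box[0], gt[0])  # enclosing box width
    hg = max(box[3], gt[3]) - min(box[1], gt[1])  # enclosing box height
    r = math.exp(((cxb - cxg) ** 2 + (cyb - cyg) ** 2) / (wg ** 2 + hg ** 2))
    return r * (1.0 - iou(box, gt))
```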
  • Research Hotspots and Reviews
    XIONG Shiqiang, HE Daojing, WANG Zhendong, DU Runmeng
    Computer Engineering. 2024, 50(5): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0067782
    Federated Learning (FL) is a new distributed machine learning technology that requires only local maintenance of data and can train a common model through the cooperation of all parties, mitigating the data collection and privacy security issues of conventional machine learning. However, as FL is applied and developed, it remains exposed to various attacks. To ensure the security of FL, the attack modes in FL and the corresponding privacy protection technologies must be investigated. Herein, first, the background knowledge and relevant definitions of FL are introduced, and its development process and classification are summarized. Second, the three security elements of FL are expounded, and the security issues and research progress of FL are summarized from two perspectives: security sources and the three security elements. Subsequently, privacy protection technologies are classified, and four technologies commonly used in FL are summarized: Secure Multiparty Computing (SMC), Homomorphic Encryption (HE), Differential Privacy (DP), and Trusted Execution Environment (TEE). Finally, future research directions for FL are discussed.
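Of the four privacy technologies listed, Differential Privacy is the easiest to sketch: clip each client update, then add noise calibrated to the clipping bound. A minimal pure-Python illustration (the function names and the L1-clipping choice are assumptions for this sketch, not the surveyed systems' exact mechanisms):

```python
import math
import random

def _laplace(rng, scale):
    # Inverse-CDF sampling; the stdlib random module has no Laplace draw.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(max(1e-12, 1.0 - 2.0 * abs(u)))

def ldp_perturb(gradient, epsilon, clip_norm=1.0, seed=None):
    """Clip a gradient to a bounded L1 norm, then add Laplace noise
    calibrated to that bound -- the basic DP recipe for protecting
    client updates in federated learning."""
    rng = random.Random(seed)
    norm = sum(abs(g) for g in gradient)
    if norm > clip_norm:
        gradient = [g * clip_norm / norm for g in gradient]
    scale = 2.0 * clip_norm / epsilon  # L1 sensitivity of a clipped vector
    return [g + _laplace(rng, scale) for g in gradient]
```

Smaller `epsilon` means larger noise and stronger privacy; the server aggregates the noisy updates instead of the raw gradients.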
  • Cyberspace Security
    Huawei SONG, Shengqi LI, Fangjie WAN, Yuping WEI
    Computer Engineering. 2024, 50(3): 166-172. https://doi.org/10.19678/j.issn.1000-3428.0067791

    Federated Learning (FL) can collaboratively train global models without compromising data privacy. Nonetheless, this collaborative training approach faces the challenge of non-Independent and Identically Distributed (Non-IID) data in the real world, which causes slow model convergence and low accuracy. Most existing FL methods improve only one perspective, either global model aggregation or local client update, inevitably ignoring the impact of the other perspective and reducing the quality of the global model. In this context, we introduce a hierarchical continuous learning optimization method for FL, denoted as FedMas, based on the idea of hierarchical fusion. First, clients with similar data distributions are divided into layers using the DBSCAN algorithm, and only some clients of a given layer are selected for each round of training, avoiding the weight divergence caused by different data distributions when the server aggregates the global model. Second, because each layer has a different data distribution, the client applies solutions to catastrophic forgetting from continuous learning during its local update, effectively integrating the differences between the data of clients in different layers and thus ensuring the performance of the global model. Experiments on the MNIST and CIFAR-10 standard datasets demonstrate that the global model test accuracy improves by 0.3-2.2 percentage points on average compared with the FedProx, Scaffold, and FedCurv FL algorithms.
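FedMas's first step groups clients with similar data distributions into layers. The paper uses DBSCAN; the greedy threshold grouping below over per-client label-distribution vectors is a simplified stand-in for that clustering step (the function name and `eps` value are illustrative):

```python
def group_clients(distributions, eps=0.3):
    """Group clients whose label distributions are within an L1 distance of
    eps from a layer's representative -- a simplified stand-in for the
    DBSCAN layering step in FedMas."""
    layers = []  # each layer is a list of client indices
    for idx, dist in enumerate(distributions):
        for layer in layers:
            rep = distributions[layer[0]]  # first member represents the layer
            if sum(abs(a - b) for a, b in zip(rep, dist)) <= eps:
                layer.append(idx)
                break
        else:
            layers.append([idx])  # no close layer found: start a new one
    return layers
```

Training then samples clients from one layer at a time, so each aggregation round sees a roughly homogeneous data distribution.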

  • Research Hotspots and Reviews
    WEI Wei, DING Xiangxiang, GUO Mengxing, YANG Zhao, LIU Hui
    Computer Engineering. 2024, 50(9): 18-32. https://doi.org/10.19678/j.issn.1000-3428.0068086

    Text similarity calculation is a component of natural language processing used to measure the similarity between two words, sentences, or texts in many application scenarios, and research on it plays an important role in the development of artificial intelligence. Text similarity calculation was conventionally based on the surface forms of character strings. With the introduction of word vectors, text similarity can be modeled and calculated based on statistics and deep learning, as well as in combination with pre-trained models. First, text similarity calculation methods are divided into five categories: character string-based, word vector-based, pre-trained model-based, deep learning-based, and other methods, and each category is briefly introduced. Subsequently, according to the principles of the different methods, common approaches such as the edit distance, Hamming distance, bag-of-words model, Vector Space Model (VSM), Deep Structured Semantic Model (DSSM), and Simple Contrastive learning of Sentence Embedding (SimCSE) are discussed. Finally, commonly used datasets and evaluation criteria for text similarity calculation are reviewed and analyzed, and future development directions are discussed.
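Two of the surveyed methods are short enough to sketch directly: the edit (Levenshtein) distance via dynamic programming, and cosine similarity over bag-of-words counts:

```python
import math
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cosine_bow(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors."""
    va, vb = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

The deep methods in the survey (DSSM, SimCSE) replace the count vectors with learned sentence embeddings but keep the same cosine comparison.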

  • Graphics and Image Processing
    WANG Shumeng, XU Huiying, ZHU Xinzhong, HUANG Xiao, SONG Jie, LI Yi
    Computer Engineering. 2025, 51(9): 280-293. https://doi.org/10.19678/j.issn.1000-3428.0069353

    In Unmanned Aerial Vehicle (UAV) aerial photography, targets are usually small, densely distributed, and inconspicuous, and object scales vary greatly; consequently, missed and false detections easily occur in object detection. To solve these problems, a lightweight small object detection algorithm for aerial photography based on improved YOLOv8n, namely PECS-YOLO, is proposed. By adding a P2 small object detection layer in the Neck, the algorithm combines shallow and deep feature maps to better capture the details of small targets. A lightweight convolution, PartialConv, is introduced into a new Cross Stage Partial PartialConv (CSPPC) structure that replaces the Concatenation with Fusion (C2f) module in the Neck network to make the model lightweight. A Spatial Pyramid Pooling with Efficient Layer Aggregation Network (SPPELAN) module is used to capture small object features effectively. By adding a Squeeze-and-Excitation (SE) attention mechanism in front of each detection head in the Neck, the network can better focus on useful channels and reduce the interference of background noise on small object detection in complex environments. Finally, EfficiCIoU is used as the bounding box loss function, which also takes the shape difference of the bounding boxes into account, enhancing the model's ability to detect small targets. Experimental results show that, compared with YOLOv8n, the mean Average Precision at an Intersection over Union (IoU) of 0.5 (mAP@0.5) and at IoU of 0.5∶0.95 (mAP@0.5∶0.95) of the PECS-YOLO object detection algorithm on the VisDrone2019-DET dataset increase by 3.5% and 3.7%, respectively, the number of parameters is reduced by about 25.7%, and the detection speed increases by about 65.2%. In summary, the PECS-YOLO model is suitable for small object detection in UAV aerial photography.
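The SE attention placed before each detection head follows a squeeze-excite-rescale pattern: pool each channel to one number, pass the pooled vector through two small fully connected layers with a sigmoid gate, and rescale each channel by its gate. A minimal pure-Python sketch with tiny feature maps and hand-set weights (not the PECS-YOLO implementation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def se_attention(channels, w1, w2):
    """Squeeze-and-Excitation over a list of 2D channel maps.
    w1: weights of the first (ReLU) FC layer, shape hidden x n_channels.
    w2: weights of the second (sigmoid) FC layer, shape n_channels x hidden."""
    # Squeeze: global average pool each channel to a single scalar.
    squeeze = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in channels]
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeeze))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Rescale: multiply every value in a channel by that channel's gate.
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(channels, gates)]
```

In the real network the two FC layers are learned, so useful channels converge to gates near 1 and noisy ones near 0.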

  • Artificial Intelligence and Pattern Recognition
    LI Jingcan, XIAO Cuilin, QIN Xiaoting, XIE Xia
    Computer Engineering. 2024, 50(4): 87-94. https://doi.org/10.19678/j.issn.1000-3428.0068501
    Relation extraction is a basic and important task that aims to extract the relations between entities from unstructured text. Recent developments show that Large Language Models (LLMs) and foundation models can improve the performance of several Natural Language Processing (NLP) tasks. These models utilize the language-representation ability of deep learning and pre-training and can automatically learn the semantic features of relations. However, how to effectively use a large model to solve the problems of entity overlap and unsatisfactory information exchange remains open. Hence, a relation extraction model based on a large language model is proposed. First, the Large Language model Meta AI (LLaMA) is adapted to the task via fine-tuning. To extract relations, the self-attention mechanism is used to enhance the correlation between entity pairs and the information sharing between entities, after which average pooling is performed to generalize over the entire sentence. A filtering matrix is designed for entity pairs, part-of-speech information is introduced to enhance semantics, and invalid triples are filtered out based on the relevance of entity pairs in the filtering matrix. Experimental results show that the F1 scores of the proposed model on the New York Times (NYT) and WebNLG open datasets are 93.1% and 90.4%, respectively. With the fine-tuned LLaMA model as the encoder, the proposed algorithm is superior to the baseline models in terms of accuracy and F1 score, verifying its effectiveness.
  • Artificial Intelligence and Pattern Recognition
    PENG Juhong, ZHANG Chi, GAO Qian, ZHANG Guangming, TAN Donghua, ZHAO Mingjun
    Computer Engineering. 2025, 51(7): 152-160. https://doi.org/10.19678/j.issn.1000-3428.0069283

    Steel surface defect detection in industrial scenarios is hindered by low detection accuracy and slow convergence speed. To address these issues, this study presents an improved YOLOv8 algorithm, namely YOLOv8n-MDC. First, a Multi-scale Cross-fusion Network (MCN) is added to the backbone network; establishing closer connections between the feature layers promotes uniform information transmission and reduces semantic information loss during cross-layer feature fusion, thereby enhancing the model's ability to perceive steel defects. Second, deformable convolution is introduced in the module to adaptively change the shape and position of the convolution kernel, enabling more flexible capture of the edge features of irregular defects, reducing information loss, and improving detection accuracy. Finally, a Coordinate Attention (CA) mechanism is added to embed position information into channel attention, solving the problem of position information loss and enabling the model to perceive the position and morphological features of defects, thereby enhancing detection precision and stability. Experimental results on the NEU-DET dataset show that the YOLOv8n-MDC algorithm achieves an mAP@0.5 of 81.0%, which is 4.2 percentage points higher than that of the original baseline network. The algorithm converges faster and is more accurate, meeting the requirements of practical industrial production.

  • Research Hotspots and Reviews
    REN Shuyu, WANG Xiaoding, LIN Hui
    Computer Engineering. 2024, 50(12): 16-32. https://doi.org/10.19678/j.issn.1000-3428.0068553

    The superior performance of the Transformer in natural language processing has inspired researchers to explore its applications in computer vision tasks. The Transformer-based object detection model, Detection Transformer (DETR), treats object detection as a set prediction problem, introducing the Transformer to this task and eliminating the proposal generation and post-processing steps typical of traditional methods. The original DETR model suffers from slow training convergence and inefficiency in detecting small objects. To address these challenges, researchers have implemented various improvements to enhance DETR's performance. This study conducts an in-depth investigation of both the basic and enhanced modules of DETR, including modifications to the backbone architecture, query design strategies, and improvements to the attention mechanism. Furthermore, it provides a comparative analysis of various detectors and evaluates their performance and network architectures. The potential and application prospects of DETR in computer vision tasks are discussed, along with its current limitations and challenges. Finally, this study analyzes and summarizes related models, assesses the advantages and limitations of attention models in the context of object detection, and outlines future research directions in this field.

  • Smart Education
    Huiqian LI, Baichang ZHONG
    Computer Engineering. 2024, 50(7): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0069539

    The deep integration of knowledge graphs with education has promoted the development of smart education. However, the literature on educational knowledge graphs remains scarce and needs improvement in research rigor and content coverage. A systematic review of 55 important Chinese journal articles from the previous decade yields four conclusions. First, the development of educational knowledge graphs requires five key technologies: ontology construction, knowledge extraction, knowledge representation, knowledge fusion, and knowledge reasoning; deep learning methods are becoming a popular research topic in this context. Second, in terms of applicability, educational knowledge graphs cover six application scenarios: personalized learning recommendation, intelligent Question-Answering (Q&A), teaching resource management, intelligent search, intelligent learning diagnosis, and classroom teaching analysis, and the range of applications continues to expand. Third, regarding application effects, educational knowledge graphs promote personalized learning and fragmented ubiquitous learning, improving students' learning performance as well as the professionalism of teachers. Fourth, educational knowledge graphs face several problems and challenges, such as single data modality, a lack of quality datasets, a low level of automation and borderline technology, difficult knowledge modeling, insufficient attention to competence, a lack of interoperability standards, and a low rate of educational adoption. Hence, future research should refine theory and establish standards, optimize techniques to achieve accurate modeling, and strengthen applications to enhance their effects.

  • Graphics and Image Processing
    ZHAO Nannan, GAO Feichen
    Computer Engineering. 2025, 51(1): 198-207. https://doi.org/10.19678/j.issn.1000-3428.0068677

    An instance segmentation algorithm, DE-YOLO, based on improved YOLOv8 is proposed. To decrease the effect of complex image backgrounds, efficient multiscale attention is introduced, and cross-dimensional interaction ensures an even spatial feature distribution within each feature group. In the backbone network, deformable convolution (DCNv2) is combined with the C2f convolutional layer to overcome the limitations of traditional convolutions and increase flexibility, reducing harmful gradient effects and improving the overall accuracy of the detector. The dynamic nonmonotonic focusing mechanism of Wise-Intersection-over-Union (WIoU) is employed instead of the traditional Complete Intersection-over-Union (CIoU) loss function to evaluate anchor quality, optimize detection frame positioning, and improve segmentation accuracy. Meanwhile, Mixup data augmentation is enabled to enrich the training features of the dataset and improve the learning ability of the model. The experimental results demonstrate that DE-YOLO improves the mean Average Precision of mask (mAPmask) and mAPmask@0.5 by 2.0 and 3.2 percentage points, respectively, compared with the benchmark model YOLOv8n-seg on the Cityscapes urban landscape dataset. Furthermore, DE-YOLO maintains an excellent detection speed and a small parameter count while exhibiting improved accuracy, requiring 2.2-31.3 percentage points fewer parameters than similar models.

  • Research Hotspots and Reviews
    Zhe LIAN, Yanjun YIN, Fei YUN, Min ZHI
    Computer Engineering. 2024, 50(3): 16-27. https://doi.org/10.19678/j.issn.1000-3428.0067427

    Natural scene text detection technology based on deep learning has become a crucial research focus in the fields of computer vision and natural language processing. It not only possesses a wide range of potential applications but also serves as a new platform for researchers to explore neural network models and algorithms. First, this study introduces the relevant concepts, research background, and current developments of natural scene text detection technology. Subsequently, recent deep learning-based text detection methods are analyzed and categorized into four classes: detection box-based, segmentation-based, combined detection box- and segmentation-based, and other methods. The fundamental concepts and main algorithmic processes of the classical and mainstream methods in these four categories are elaborated, and the usage mechanisms, applicable scenarios, advantages, disadvantages, simulation results, and experimental settings of the different methods are summarized, while clarifying their interrelationships. Thereafter, common public datasets and performance evaluation methods for natural scene text detection are introduced. Finally, the major challenges facing current deep learning-based natural scene text detection technology are outlined, and future development directions are discussed.

  • 40th Anniversary Celebration of Shanghai Computer Society
    QI Fenglin, SHEN Jiajie, WANG Maoyi, ZHANG Kai, WANG Xin
    Computer Engineering. 2025, 51(4): 1-14. https://doi.org/10.19678/j.issn.1000-3428.0070222

    The rapid development of Artificial Intelligence (AI) has empowered numerous fields and significantly impacted society, establishing a solid technological foundation for university informatization services. This study explores the historical development of both AI and university informatization by analyzing their respective trajectories and interconnections. Although universities worldwide may focus on different aspects of AI in their digital transformation efforts, they universally demonstrate the vast potential of AI in enhancing education quality and streamlining management processes. Thus, this study focuses on five core areas: teaching, learning, administration, assessment, and examination. It comprehensively summarizes typical AI-empowered application cases to demonstrate how AI effectively improves educational quality and management efficiency. In addition, this study highlights the potential challenges associated with AI applications in university informatization, such as data privacy protection, algorithmic bias, and technology dependence. Furthermore, common strategies for addressing these issues, such as enhancing data security, optimizing algorithm transparency and fairness, and fostering digital literacy among teachers and students, are elaborated upon. Based on these analyses, the study explores future research directions for AI in university informatization, emphasizing the balance between technological innovation and ethical standards, and advocates the establishment of interdisciplinary collaboration mechanisms to promote the healthy and sustainable development of AI in this field.

  • Artificial Intelligence and Pattern Recognition
    Zhite WANG, Liping LUO, Yikui LIAO
    Computer Engineering. 2024, 50(8): 86-101. https://doi.org/10.19678/j.issn.1000-3428.0068483

    To satisfy the performance requirements of robot path planning, an algorithm integrating an improved A* algorithm with an improved Dynamic Window Approach (DWA) is proposed, which shortens the path length and improves search efficiency and path smoothness. To overcome the limitations of the traditional A* algorithm in complex scenarios, a new heuristic function is designed based on the Manhattan distance and the diagonal distance, with dynamically assigned weights, yielding the globally shortest path with the least search time. Next, an improved search strategy based on the 8-neighborhood is proposed, which dynamically assigns the optimal search directions to the current node, improving search efficiency and reducing time consumption compared with the traditional 8-neighborhood, 8-direction search. Subsequently, the Floyd algorithm is employed to remove redundant nodes, reduce the number of steering operations, and shorten the path. Additionally, the traditional DWA faces certain challenges: the path is not globally optimal, path planning may fail, or the path length may increase. To solve these problems, a keypoint densification strategy is proposed to modify the deflected path. Finally, the proposed improved A* algorithm and the fusion algorithm are compared with existing methods. The simulation results show that the improved A* algorithm generates the shortest global path in complex environments, reducing the average number of steering operations by 16.3% and shortening the average path search time by 55.66%. For the fused algorithm, the average path length and average runtime are shortened by 6.1% and 14.7%, respectively, in the temporary-obstacle environment, and by 1.6% and 39.8%, respectively, in the moving-obstacle environment.
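The improved heuristic combines Manhattan and diagonal distance. A sketch for an 8-neighborhood grid: `octile` is the standard diagonal-distance heuristic, and `blended_heuristic` mixes it with the Manhattan distance using a fixed `alpha`, a stand-in since the paper's dynamic weighting rule is not reproduced here:

```python
import math

def octile(a, b):
    """Diagonal-distance heuristic for 8-neighborhood grids: move diagonally
    (cost sqrt(2)) as far as possible, then straight (cost 1)."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return (dx + dy) + (math.sqrt(2) - 2) * min(dx, dy)

def blended_heuristic(node, goal, alpha=0.5):
    """Weighted blend of the Manhattan and diagonal distances; alpha is a
    fixed illustrative weight, not the paper's dynamic assignment."""
    manhattan = abs(node[0] - goal[0]) + abs(node[1] - goal[1])
    return alpha * manhattan + (1 - alpha) * octile(node, goal)
```

With `alpha > 0` the blend overestimates the true 8-neighborhood cost, trading A*'s optimality guarantee for fewer node expansions, which matches the paper's goal of reducing search time.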

  • Cyberspace Security
    WU Ruolan, CHEN Yuling, DOU Hui, ZHANG Yangwen, LONG Zhong
    Computer Engineering. 2025, 51(2): 179-187. https://doi.org/10.19678/j.issn.1000-3428.0068705

    Federated learning is an emerging distributed learning framework that facilitates the collective engagement of multiple clients in global model training without sharing raw data, thereby effectively safeguarding data privacy. However, traditional federated learning still harbors latent security vulnerabilities and is susceptible to poisoning and inference attacks. Enhancing both the security and the model performance of federated learning has therefore become imperative: malicious client behavior must be identified precisely, and gradient noise can be employed as a countermeasure to prevent attackers from gaining access to client data through gradient monitoring. This study proposes a robust federated learning framework that combines a malicious client detection mechanism with Local Differential Privacy (LDP) techniques. The algorithm initially employs gradient similarity to identify and classify potentially malicious clients, minimizing their adverse impact on model training. Subsequently, a dynamic privacy budget based on LDP is designed to accommodate the sensitivity of different queries and individual privacy requirements, with the objective of balancing privacy preservation and data quality. Experimental results on the MNIST, CIFAR-10, and Movie Reviews (MR) text classification datasets demonstrate that, compared with three baseline algorithms, this algorithm increases accuracy by an average of 3 percentage points for sP-type clients, achieving a higher security level with significantly enhanced model performance within the federated learning framework.
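The first stage, gradient-similarity screening, can be sketched with cosine similarity: a client whose update points away from the majority of other updates is flagged. This is a minimal stand-in, not the paper's exact scoring rule (`flag_suspicious` and its zero threshold are illustrative):

```python
import math

def cosine(u, v):
    """Cosine similarity between two gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def flag_suspicious(gradients, threshold=0.0):
    """Flag clients whose gradient has low average cosine similarity to
    every other client's gradient in the round."""
    flagged = []
    for i, g in enumerate(gradients):
        others = [cosine(g, h) for j, h in enumerate(gradients) if j != i]
        if sum(others) / len(others) < threshold:
            flagged.append(i)
    return flagged
```

Flagged updates can then be down-weighted or excluded before the server aggregates, while the remaining updates receive LDP noise as described above.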

  • Multimodal Information Fusion
    LI Jianlang, WU Xindian, CHEN Ling, YANG Bo, TANG Wensheng
    Computer Engineering. 2026, 52(2): 299-310. https://doi.org/10.19678/j.issn.1000-3428.0070113

    This study proposes a Common and Differential Cross-Attention Module-Bird's-Eye View (CDCAM-BEV) algorithm that fuses 4D millimeter-wave radar and vision to improve target detection accuracy for pedestrian and vehicle recognition and localization in autonomous driving scenarios. First, a radar cylinder network is designed to encode the 4D radar point cloud into a pseudo image, and the monocular image is converted into a Bird's-Eye View (BEV) feature through Orthogonal Feature Transformation (OFT). Second, based on the cross-attention mechanism, a Common Information Extraction Module (CICAM) and a Differential Information Extraction Module (DICAM) are used to fully explore the common and differential information between the radar and the images. Finally, a BEV feature fusion module is designed based on CICAM and DICAM to achieve feature-level fusion of image and radar information in the BEV space. Experiments are conducted on the VOD dataset, and the CDCAM-BEV algorithm is compared with five other 3D object detection algorithms. The experimental results show that CDCAM-BEV achieves better detection performance in multiple modes. In the 3D mode, its average detection accuracy is 3.65 percentage points higher than that of the second-ranked Part-A2; in the BEV mode, it is 5.04 percentage points higher than that of the second-ranked PointPillars; and in the Average Directional Similarity (AOS) mode, it is 2.62 percentage points higher than that of the second-ranked Part-A2. These results show that CDCAM-BEV performs well in all modes, effectively fusing image and 4D radar point cloud features and significantly improving the accuracy and reliability of object detection.

  • Research Hotspots and Reviews
    HUANG Kaiji, YANG Hua
    Computer Engineering. 2024, 50(10): 16-34. https://doi.org/10.19678/j.issn.1000-3428.0068580

    The objective of image matching is to establish correspondences between similar structures across two or more images. This task is fundamental to computer vision, with applications in robotics, remote sensing, and autonomous driving. With the advancements in deep learning in recent years, Two-Dimensional (2D) image matching algorithms based on deep learning have seen regular improvements in feature extraction, description, and matching. The performance of these algorithms in terms of matching accuracy and robustness has surpassed that of traditional algorithms, leading to significant advancements. First, this study summarizes 2D image matching algorithms based on deep learning features from the past ten years and categorizes them into three types: two-stage image matching based on local features, image matching of joint detection and description, and image matching without feature detection. Second, the study details the development processes, classification methods, and performance evaluation metrics of these three categories and summarizes their advantages and limitations. Typical application scenarios of 2D image matching algorithms are then introduced, and the effects of research progress in 2D image matching on its application domains are analyzed. Finally, the study summarizes the development trends of 2D image matching algorithms and discusses future prospects.

  • Artificial Intelligence and Pattern Recognition
    Lai QIAN, Weiwei ZHAO
    Computer Engineering. 2024, 50(7): 104-111. https://doi.org/10.19678/j.issn.1000-3428.0068132

    Text classification is a basic task in the field of natural language processing and plays an important role in information retrieval, machine translation, sentiment analysis, and other applications. However, most deep learning models do not fully consider the rich information in training instances during inference, resulting in inadequate text feature learning. To leverage training instance information fully, this paper proposes a text classification method based on contrastive learning and an attention mechanism. First, a supervised contrastive learning training strategy is designed to optimize the retrieval of text vector representations, thereby improving the quality of the retrieved training instances during inference. Second, an attention mechanism is constructed to learn the attention distribution of the obtained training text features, focusing on adjacent instance information with stronger relevance and capturing more implicit similarity features. Finally, the attention mechanism is combined with the model network, fusing information from adjacent training instances to enhance the ability of the model to extract diverse features and achieve global and local feature extraction. The experimental results demonstrate that this method achieves significant improvements on various models, including Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), Graph Convolutional Network (GCN), Bidirectional Encoder Representations from Transformers (BERT), and RoBERTa. For the CNN model, the macro F1 value increases by 4.15, 6.2, and 1.92 percentage points on the THUCNews, Toutiao, and Sogou datasets, respectively. Therefore, this method provides an effective solution for text classification tasks.
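Fusing retrieved neighbor instances with an attention distribution, as described above, reduces to a softmax over similarity scores. A minimal sketch with dot-product scores (the temperature and function name are illustrative, not the paper's exact formulation):

```python
import math

def attend_neighbors(query, neighbors, temperature=1.0):
    """Softmax attention over retrieved training instances: dot-product
    similarity scores become weights, and the neighbor feature vectors
    are fused into a single vector."""
    scores = [sum(q * n for q, n in zip(query, nb)) / temperature
              for nb in neighbors]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    return [sum(w * nb[d] for w, nb in zip(weights, neighbors))
            for d in range(dim)]
```

A lower temperature sharpens the distribution toward the most relevant neighbors, which mirrors the paper's emphasis on adjacent instances with stronger relevance.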

  • Development Research and Engineering Application
    Xiangquan GUI, Shiqing LIU, Li LI, Qingsong QIN, Tangyan LI
    Computer Engineering. 2024, 50(7): 342-351. https://doi.org/10.19678/j.issn.1000-3428.0068125

    To address the low detection accuracy and large parameter counts of current scenic-area pedestrian detection algorithms, as well as the limitations of existing public datasets for small target detection, this study uses the TAPDataset pedestrian detection dataset, which remedies the deficiencies of existing datasets regarding small targets. Based on the YOLOv8 algorithm, a new model with high detection accuracy and low hardware requirements, called YOLOv8-L, is proposed. First, the lightweight convolution module DepthSepConv is introduced to reduce the number of parameters and computations of the model. Second, the BiFormer attention mechanism and CARAFE upsampling operator are used to enhance the model's semantic understanding of images and information fusion capability, significantly improving detection accuracy. Finally, a small target detection layer is added to extract more shallow features, effectively improving the model's performance for small target detection. The effectiveness of the algorithm is verified using the TAPDataset, VOC 2007, and TAP+VOC datasets. The experimental results show that, compared with YOLOv8, the number of model parameters is reduced by 18.06% on the TAPDataset with unchanged FPS, mAP@0.5 improves by 5.51%, and mAP@0.5:0.95 improves by 6.03%. On the VOC 2007 dataset, the number of parameters is reduced by 13.6%, with mAP@0.5 improving by 3.96% and mAP@0.5:0.95 improving by 6.39%. On the TAP+VOC dataset, the number of parameters is reduced by 14.02%, with mAP@0.5 improving by 4.49% and mAP@0.5:0.95 improving by 5.68%. The improved algorithm demonstrates stronger generalization performance and can be better applied to scenic-area pedestrian detection tasks.
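The parameter savings from a DepthSepConv-style module can be illustrated by counting weights. The sketch below compares a standard convolution with a depthwise-separable one; the channel and kernel sizes are arbitrary examples, not the model's actual configuration.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

# For a 3 x 3 layer the ratio is roughly 1/c_out + 1/k^2,
# i.e. close to an order-of-magnitude reduction here.
standard = conv_params(128, 128, 3)                    # 147456 weights
separable = depthwise_separable_params(128, 128, 3)    # 17536 weights
```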

  • Research Hotspots and Reviews
    Baihao JIANG, Jing LIU, Dawei QIU, Liang JIANG
    Computer Engineering. 2024, 50(3): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0067502

    In spinal image segmentation, deep learning algorithms offer strong learning ability, adaptability, and nonlinear mapping capability. Compared with traditional segmentation methods, they can better extract key information from spinal images and suppress irrelevant information, helping doctors accurately locate focal areas and achieve accurate, efficient segmentation. The application status of deep learning in spinal image segmentation is summarized and analyzed with respect to deep learning algorithms, types of spinal diseases, types of images, experimental segmentation results, and performance evaluation indicators. First, the background of deep learning models and spinal image segmentation is described, and the application of deep learning in spinal image segmentation is introduced. Second, several common types of spinal diseases are introduced, the difficulties they pose for image segmentation are described, and common open datasets, the typical segmentation workflow, and segmentation evaluation indicators are presented. Combined with specific experiments, the application progress of the Convolutional Neural Network(CNN) model, the U-Net model, and their improved variants in the segmentation of vertebrae, intervertebral discs, and spinal tumors is summarized and analyzed. Drawing on previous experimental results and the current research progress of deep learning models, this paper summarizes the limitations of current clinical studies and the reasons for insufficient segmentation performance, and proposes corresponding solutions to the existing problems. Finally, prospects for future studies and development are proposed.
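Among the segmentation evaluation indicators such reviews survey, the Dice coefficient is the most common; a minimal numpy implementation for binary masks:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity between two binary masks: 2|A∩B| / (|A| + |B|).

    Returns 1.0 for identical masks and approaches 0.0 for
    disjoint ones; eps guards against empty masks.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```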

  • Intelligent Transportation
    Wei CHEN, Xiaolong WANG, Yanwei ZHANG, Guocheng AN, Bo JIANG
    Computer Engineering. 2024, 50(4): 11-19. https://doi.org/10.19678/j.issn.1000-3428.0068901

    In highway service areas, complex environments such as lighting and weather changes can cause a sharp decline in vehicle detection accuracy. In addition, factors such as the inclination angle and installation height of the camera can increase false-negative and false-positive rates. To this end, a vehicle violation detection algorithm based on an improved YOLOv8 is proposed for highway service areas. First, for the feature pyramid pooling layer of the YOLOv8 network, a Dilated Space Pyramid Pooling(DSPP) module and a DSPP based on branch Attention(DSPPA) module are constructed to reduce the loss of semantic information in the backbone. The Branch Attention(BA) mechanism in DSPPA assigns different weights to branches with varying degrees of contribution, making the model focus more on features suited to the target size. Second, a parking space allocation strategy based on global matching is designed to effectively reduce the false-negative and false-positive rates of illegal parking detection in situations involving tilted views and overlapping vehicles. The experimental results show that the improved algorithm reduces the false-negative rate of parking violation detection from 15% to 8% and the false-positive rate from 7.5% to 6.1%, demonstrating considerable performance improvement in vehicle violation detection.
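The abstract does not detail the global-matching allocation strategy; as a hypothetical sketch, a brute-force global assignment of detected vehicles to parking spaces that minimizes total squared center distance conveys the idea of matching globally rather than greedily per vehicle.

```python
import itertools

def assign_spaces(vehicles, spaces):
    """Globally assign each vehicle to a distinct parking space,
    minimizing the total squared center distance (brute force;
    practical systems would use the Hungarian algorithm instead).

    vehicles, spaces: lists of (x, y) center coordinates.
    Returns best[i] = index of the space assigned to vehicle i.
    """
    best_cost, best = float("inf"), None
    for perm in itertools.permutations(range(len(spaces)), len(vehicles)):
        cost = sum((vx - spaces[j][0]) ** 2 + (vy - spaces[j][1]) ** 2
                   for (vx, vy), j in zip(vehicles, perm))
        if cost < best_cost:
            best_cost, best = cost, list(perm)
    return best
```

A global objective like this is what prevents one overlapping vehicle from "stealing" the space of another under a tilted camera view.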

  • Graphics and Image Processing
    WANG Guoming, JIA Daiwang
    Computer Engineering. 2025, 51(12): 294-303. https://doi.org/10.19678/j.issn.1000-3428.0070027

    Deep learning-based object detection has significantly improved the detection of medium and large targets. However, when detecting small objects, traditional algorithms often suffer from missed detections and false positives owing to the small scale of the targets and complex backgrounds. Therefore, this study aims to enhance the accuracy of small object detection by improving the YOLOv8 model. First, the convolutional module in the backbone is replaced with the RFAConv module, which enhances the ability of the model to process complex images. Second, a Mixed Local Channel Attention (MLCA) mechanism is introduced in the neck, allowing the model to fuse features from different layers more effectively while maintaining computational efficiency. Third, the Detect head of YOLOv8 is replaced with the Detect_FASFF head to address the inconsistency between different feature scales and improve small object detection. Finally, the Complete Intersection over Union (CIoU) loss function is replaced with the Focaler-IoU loss function, enabling the model to focus more on small objects that are difficult to locate precisely. Experimental results show that the improved model increases mAP@0.5 by 4.8 percentage points and mAP@0.5:0.95 by 3.0 percentage points on the FloW-Img dataset, in which small objects are sparse. On the VisDrone2019 dataset, in which small objects are dense, mAP@0.5 increases by 5.9 percentage points and mAP@0.5:0.95 by 4.0 percentage points. In addition, generalization experiments are conducted on the low-altitude dataset AU-AIR and the pedestrian-dense detection dataset WiderPerson. The optimized model significantly improves the accuracy of small object detection compared with the original model and expands its applicability.
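The Focaler-IoU loss is reported to remap IoU linearly so that training emphasizes a chosen difficulty interval. The sketch below pairs a plain IoU computation with such a remapping; the interval bounds d and u are illustrative assumptions, not the paper's settings.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def focaler_iou(i, d=0.0, u=0.95):
    """Linear remapping of IoU onto the interval [d, u], clipped
    to [0, 1]; the loss would then be 1 - focaler_iou(i)."""
    return min(1.0, max(0.0, (i - d) / (u - d)))
```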

  • Artificial Intelligence and Pattern Recognition
    ZHANG Hongchen, LI Linyu, YANG Li, SAN Chenjun, YIN Chunlin, YAN Bing, YU Hong, ZHANG Xuan
    Computer Engineering. 2024, 50(4): 168-176. https://doi.org/10.19678/j.issn.1000-3428.0067543
    A knowledge graph is a structured knowledge base comprising various types of knowledge or data units obtained through extraction and other processes; it describes and represents information such as entities, concepts, facts, and relationships. The limitations of Natural Language Processing(NLP) technology and the presence of noise in source texts affect the accuracy of information extraction. Existing Knowledge Graph Completion(KGC) methods typically consider either structural information or textual semantic information alone, disregarding the combined structural and semantic information of the entire knowledge graph. Hence, a KGC model based on contrastive learning and language-model-enhanced embedding is proposed. The input entities and relationships are encoded with a pretrained language model to obtain their textual semantic information, and the distance scoring function of a translation model is used to capture the structured information in the knowledge graph. Two negative sampling methods are incorporated into contrastive training to improve the model's ability to represent positive and negative samples. Experimental results show that, compared with the Bidirectional Encoder Representations from Transformers for Knowledge Graph completion(KG-BERT) model, this model improves the Hits@10 indicator (the average proportion of triples ranked within the top 10) by 31% and 23% on the WN18RR and FB15K-237 datasets, respectively, demonstrating its superiority over similar models.
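The translation-model distance scoring mentioned above follows the h + r ≈ t assumption; a minimal TransE-style scoring function in numpy (the embeddings here are toy vectors standing in for learned ones):

```python
import numpy as np

def transe_score(h, r, t):
    """Distance-based triple score: a smaller ||h + r - t|| means a
    more plausible (head, relation, tail) triple under the
    translation assumption h + r ≈ t."""
    return np.linalg.norm(h + r - t)
```

The model described above combines this structural score with text embeddings from a pretrained language model, rather than using it alone.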
  • Research Hotspots and Reviews
    LI Shuo, ZHAO Chaoyang, QU Yinxuan, LUO Yaping
    Computer Engineering. 2024, 50(12): 33-47. https://doi.org/10.19678/j.issn.1000-3428.0068276

    Fingerprint recognition is one of the earliest and most mature biometric recognition technologies, widely used in mobile payments, access control, and attendance in the civilian field, as well as in criminal investigation to retrieve clues about suspects. Recently, deep learning technology has achieved excellent results in biometric recognition, providing fingerprint researchers with new methods for automatic processing and for applying fused features to represent fingerprints effectively at every stage of the recognition process. This paper outlines the development history and application background of fingerprint recognition and expounds the main processing flows of its three stages: image preprocessing, feature extraction, and fingerprint matching. It summarizes the application status of deep learning technology in specific steps of these stages and compares the advantages and disadvantages of different deep neural networks in steps such as image segmentation, image enhancement, orientation field estimation, minutiae extraction, and fingerprint matching. Finally, some current problems and challenges in fingerprint recognition are analyzed, and future development directions, such as building public fingerprint datasets, multi-scale fingerprint feature extraction, and training end-to-end fingerprint recognition models, are discussed.

  • Graphics and Image Processing
    WU Xing, YIN Haoyu, YAO Junfeng, LI Weimin, QIAN Quan
    Computer Engineering. 2024, 50(6): 218-227. https://doi.org/10.19678/j.issn.1000-3428.0067874
    Multimodal sentiment analysis aims to extract and integrate semantic information from text, images, and audio data to identify the emotional states of speakers in online videos. Although multimodal fusion methods have shown definite outcomes in this research area, previous studies have not adequately addressed the distribution differences between modalities or the fusion of relational knowledge. Therefore, this study proposes a multimodal sentiment analysis method centered on a Multimodal Prompt Gate (MPG) module. The proposed module converts nonverbal information into prompts that fuse the context, filters the noise of nonverbal signals using text information, and obtains prompts containing rich semantic information to enhance information integration between the modalities. In addition, a contrastive learning framework from instance to label is proposed, which distinguishes different labels in the latent space at the semantic level to further optimize the model output. Experiments on three large-scale sentiment analysis datasets show that the binary classification accuracy of the proposed method improves by approximately 0.7% over the suboptimal model, and the ternary classification accuracy improves by more than 2.5%, reaching 0.671. This method can serve as a reference for introducing multimodal sentiment analysis into user profiling, video understanding, and AI interviews.
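The abstract does not specify the MPG module's internals; as a hypothetical sketch, a sigmoid gate in which the text feature controls how much nonverbal information passes through captures the "text filters nonverbal noise" idea (the function name and weight shapes are assumptions):

```python
import numpy as np

def prompt_gate(text_feat, nonverbal_feat, w, b):
    """Hypothetical MPG-style gate: a sigmoid computed from the text
    feature decides, per dimension, how much of the nonverbal
    feature passes through.

    w: (d_nonverbal, d_text) weight matrix; b: (d_nonverbal,) bias.
    """
    gate = 1.0 / (1.0 + np.exp(-(w @ text_feat + b)))
    return gate * nonverbal_feat
```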
  • Research Hotspots and Reviews
    PANG Wenhao, WANG Jialun, WENG Chuliang
    Computer Engineering. 2024, 50(12): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0068694

    In the context of big data and the rapid advancement of fields such as scientific computing and artificial intelligence, there is an increasing demand for high computational power across various domains. The unique hardware architecture of the Graphics Processing Unit (GPU) makes it suitable for parallel computing. In recent years, the concurrent development of GPUs and fields such as artificial intelligence and scientific computing has enhanced GPU capabilities, leading to the emergence of mature General-Purpose Graphics Processing Units (GPGPUs). Currently, GPGPUs are among the most important co-processors for Central Processing Units (CPUs). However, the fixed hardware configuration of a GPU after delivery and its limited memory capacity can significantly hinder its performance, particularly when dealing with large datasets. To address this issue, Compute Unified Device Architecture (CUDA) 6.0 introduced unified memory, allowing the GPGPU and CPU to share a virtual memory space, thereby simplifying heterogeneous programming and expanding the GPGPU-accessible memory space. Unified memory offers a solution for processing large datasets on GPGPUs and alleviates the constraints of limited GPGPU memory capacity. However, its use introduces performance issues, and effective data management within unified memory is the key to enhancing performance. This article provides an overview of the development and application of CUDA unified memory, covering its features and evolution, advantages and limitations, applications in artificial intelligence and big data processing systems, and prospects. It provides a valuable reference for future work on applying and optimizing CUDA unified memory.

  • Research Hotspots and Reviews
    MA Hengzhi, QIAN Yurong, LENG Hongyong, WU Haipeng, TAO Wenbin, ZHANG Yiyang
    Computer Engineering. 2025, 51(2): 18-34. https://doi.org/10.19678/j.issn.1000-3428.0068386

    With the continuous development of big data and artificial intelligence technologies, knowledge graph embedding is developing rapidly, and knowledge graph applications are becoming increasingly widespread. Knowledge graph embedding improves the efficiency of knowledge representation and reasoning by mapping structured knowledge into a low-dimensional vector space. This study provides a comprehensive overview of knowledge graph embedding technology, including its basic concepts, model categories, evaluation indices, and application prospects. First, the basic concepts and background of knowledge graph embedding are introduced, and the technology is classified into four main categories: embedding models based on translation mechanisms, semantic-matching mechanisms, neural networks, and additional information. The core ideas, scoring functions, advantages and disadvantages, and application scenarios of the related models are meticulously sorted. Second, common datasets and evaluation indices of knowledge graph embedding are summarized, along with application prospects such as link prediction and triple classification; the experimental results are analyzed, and downstream tasks such as question-answering systems and recommender systems are introduced. Finally, knowledge graph embedding technology is reviewed and summarized, outlining its limitations and primary existing problems while discussing the opportunities, challenges, and potential research directions for future work.
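The link-prediction evaluation indices summarized in such surveys (Hits@k and Mean Reciprocal Rank) are simple functions of the rank assigned to each correct entity; a minimal sketch:

```python
def hits_at_k(ranks, k=10):
    """Fraction of test triples whose correct entity is ranked
    within the top k by the embedding model."""
    return sum(r <= k for r in ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over all test triples; rewards models that
    place the correct entity near the top."""
    return sum(1.0 / r for r in ranks) / len(ranks)
```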

  • Graphics and Image Processing
    ZHANG Xu, CHEN Cifa, DONG Fangmin
    Computer Engineering. 2024, 50(12): 318-328. https://doi.org/10.19678/j.issn.1000-3428.0068588

    Achieving enhanced detection accuracy is a challenging task in the field of PCB defect detection. To address this problem, this study proposes a series of improvement methods based on PCB defect detection. First, a novel attention mechanism, referred to as BiFormer, is introduced. This mechanism uses dual-layer routing to achieve dynamic sparse attention, thereby reducing the amount of computation required. Second, an innovative upsampling operator called CARAFE is employed. This operator combines semantic and content information for upsampling, thereby making the upsampling process more comprehensive and efficient. Finally, a new loss function based on the MPDIoU metric, referred to as the LMPDIoU loss function, is adopted. This loss function effectively addresses unbalanced categories, small targets, and denseness problems, thereby further improving image detection performance. The experimental results reveal that the model achieves a significant improvement in mean Average Precision (mAP) with a score of 93.91%, 13.12 percentage points higher than that of the original model. In terms of recognition accuracy, the new model reached a score of 90.55%, representing an improvement of 8.74 percentage points. These results show that the introduction of the BiFormer attention mechanism, CARAFE upsampling operator, and LMPDIoU loss function effectively improves the accuracy and efficiency of PCB defect detection. Thus, the proposed methods provide valuable references for research in industrial inspection, laying the foundation for future research and applications.

  • Computer Architecture and Software Technology
    GAO Qiuchen, HU Yonghua
    Computer Engineering. 2024, 50(9): 189-196. https://doi.org/10.19678/j.issn.1000-3428.0068240

    A System on Chip (SoC) integrates multiple peripheral interfaces, whose verification has become one of the most time-consuming steps in chip development. The PCIe protocol provides high-speed point-to-point serial interconnection services within a system while supporting hot swapping, and it has gradually become a universal bus protocol. When conventional Hardware Description Languages (HDLs) are used to verify PCIe interface designs, problems usually arise, such as difficulty in covering many design scenarios and boundary conditions in a short period, leading to insufficient verification. To address these issues, this study uses the Universal Verification Methodology (UVM) to build a PCIe interface verification platform. The platform adopts a UVM-defined framework and test classes, achieving top-level environment integration and design of test constraints, with strong reusability and comprehensive verification coverage. The implementation includes SoC system-level environment integration, design and connection of the modules under test, implementation of the sequencer and monitor classes in the verification platform, and partial interface design. To ensure that the test cases cover as many design states and paths as possible, functional points are deliberately divided and constraint conditions are designed, and the effectiveness and coverage of the test cases are evaluated using various coverage indicators. The experimental results show that the verification platform can shorten the verification cycle and increase comprehensive coverage by more than 30%.

  • Artificial Intelligence and Pattern Recognition
    SUN Wenjie, LI Zongmin, SUN Haomiao
    Computer Engineering. 2024, 50(5): 62-70. https://doi.org/10.19678/j.issn.1000-3428.0067919
    Collaborative cooperation between agents in partially observable situations is an important problem in Multi-Agent Reinforcement Learning(MARL). The value function factorization approach solves the credit assignment problem and effectively achieves collaborative cooperation between agents. However, existing value function factorization approaches depend only on individual value functions with local information and do not allow explicit information exchange between agents, making them unsuitable for complex scenarios. To address this problem, this study introduces communication in the value function factorization approach to provide effective nonlocal information to agents, helping them understand complex environments. Furthermore, unlike existing communication approaches, the proposed approach uses a multi-layer message passing architecture based on Graph Neural Network(GNN), which extracts useful information that must be exchanged between neighboring agents. Simultaneously, the model realizes the transition from non-communication to full communication and achieves global cooperation with a limited communication range, which is suitable for real-world applications where the communication range is constrained. The results of experiments in the StarCraft II Multi-Agent Challenge(SMAC) and Predator-Prey(PP) environments demonstrate that the average winning rate of this approach improves by 2-40 percentage points compared with those of baseline algorithms, such as QMIX and VBC, in four different scenarios of SMAC. Furthermore, the proposed approach effectively solves the PP problem in non-monotonic environments.
  • Artificial Intelligence and Pattern Recognition
    Jianmin LIU, Hui LIN, Xiaoding WANG
    Computer Engineering. 2024, 50(7): 144-153. https://doi.org/10.19678/j.issn.1000-3428.0068163

    Existing trajectory prediction methods rely heavily on high-definition maps, which are time-consuming, costly, and complex to acquire, making it difficult for them to keep pace with the widespread adoption of intelligent transportation. To address the problem of vehicle trajectory prediction in map-free scenes, a trajectory prediction method based on the spatio-temporal features of multi-modal data is proposed in this paper. Multiple spatio-temporal interaction graphs are constructed from historical trajectories; temporal and spatial attention are cross-utilized and deeply fused to model the spatio-temporal correlations between vehicles on the road; and a residual network is used for multi-objective, multi-modal trajectory generation. The model is trained and tested on the real-world dataset Argoverse 2. The experimental results show that, compared with CRAT-Pred, this model improves the minADE, minFDE, and Miss Rate(MR) metrics in single-modal prediction by 3.86%, 3.89%, and 0.48%, and in multi-modal prediction by 0.78%, 0.96%, and 0.42%. Hence, the proposed method can efficiently capture the temporal and spatial characteristics of vehicle movement trajectories and can be effectively applied in related fields such as autonomous driving.
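The minADE/minFDE metrics reported above score the best of K predicted trajectory modes against the ground truth; a minimal numpy sketch:

```python
import numpy as np

def min_ade_fde(pred_modes, gt):
    """Minimum Average / Final Displacement Error over K modes.

    pred_modes: (K, T, 2) candidate trajectories; gt: (T, 2) ground
    truth. minADE averages the point-wise error of the best mode;
    minFDE uses only the final predicted point.
    """
    errs = np.linalg.norm(pred_modes - gt[None], axis=2)  # (K, T)
    min_ade = errs.mean(axis=1).min()
    min_fde = errs[:, -1].min()
    return min_ade, min_fde
```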

  • Artificial Intelligence and Pattern Recognition
    ZHANG Guosheng, LI Caihong, ZHANG Yaoyu, ZHOU Ruihong, LIANG Zhenying
    Computer Engineering. 2025, 51(1): 88-97. https://doi.org/10.19678/j.issn.1000-3428.0068738

    This study proposes an improved Artificial Potential Field (APF) algorithm (called FC-V-APF) based on Fuzzy Control (FC) and a virtual target point method to solve the local minimum trap and path redundancy issues of the APF method in robot local path planning. First, a virtual target point obstacle avoidance strategy is designed, and the V-APF algorithm is constructed to help the robot overcome local minimum traps by adding an obstacle crossing mechanism and a target point update threshold. Second, a control strategy based on the cumulative angle sum is proposed to assist the robot in exiting a multi-U complex obstacle area. Subsequently, the V-APF and FC algorithms are combined to construct the FC-V-APF algorithm. The corresponding environment is evaluated using real-time data from the radar sensor and designed weight function, and a fuzzy controller is selected to output the auxiliary force to avoid obstacles in advance. Finally, a simulation environment is built on the Robot Operating System (ROS) platform to compare the path planning performance of the FC-V-APF algorithm with that of other algorithms. Considering path length, running time, and speed curves, the designed FC-V-APF algorithm can quickly eliminate traps, reduce redundant paths, improve path smoothness, and reduce planning time.
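The classical APF force computation that FC-V-APF builds on combines an attractive pull toward the goal with repulsion from nearby obstacles; a minimal numpy sketch (the gain values and influence radius are illustrative, not the paper's parameters):

```python
import numpy as np

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=2.0):
    """Resultant APF force on the robot: attractive toward the goal,
    repulsive from obstacles within the influence radius d0."""
    pos, goal = np.asarray(pos, float), np.asarray(goal, float)
    force = k_att * (goal - pos)             # attractive term
    for obs in obstacles:
        diff = pos - np.asarray(obs, float)
        d = np.linalg.norm(diff)
        if 0 < d <= d0:
            # Repulsion grows sharply as the robot nears the obstacle.
            force += k_rep * (1.0 / d - 1.0 / d0) / d**3 * diff
    return force
```

A local minimum trap is exactly the configuration where this resultant force is near zero before the goal is reached, which is what the virtual target point mechanism is designed to escape.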

  • Development Research and Engineering Application
    HU Shuai, LI Hualing, HAO Dechen
    Computer Engineering. 2024, 50(4): 286-293. https://doi.org/10.19678/j.issn.1000-3428.0067779
    Medical image segmentation accuracy plays a key role in clinical diagnosis and treatment. However, because of the complexity of medical images and the diversity of target regions, existing medical image segmentation methods suffer from incomplete edge-region segmentation and insufficient use of image context feature information. An improved U-Net-based medical image segmentation network with Multistage Edge Enhancement(MEE), known as the MDU-Net model, is proposed to solve these problems. First, an MEE module is added to the encoder structure to extract double-layer low-stage feature information, and the rich edge information in the feature layer is obtained through dilated convolution blocks with different dilation rates. Second, a Detailed Feature Association(DFA) module that integrates the feature information of adjacent layers is embedded in the skip connection to obtain deep-stage, multiscale context feature information. Finally, the feature information extracted by the different modules is aggregated in the corresponding feature layer of the decoder structure, and the final segmentation result is obtained by an upsampling operation. The experimental results on two public datasets show that, compared with other models such as TransUNet(Transformers make strong encoders for medical image segmentation), the MDU-Net model can efficiently use the feature information of different feature layers in medical images and achieves improved segmentation in edge regions.
  • Research Hotspots and Reviews
    SUN Lijun, MENG Fanjun, XU Xingjian
    Computer Engineering. 2025, 51(11): 1-21. https://doi.org/10.19678/j.issn.1000-3428.0069543

    In the context of ongoing advancements in educational informatization, constructing precise and efficient curriculum knowledge graphs has become key to promoting personalized education. As a structured knowledge representation model, curriculum knowledge graphs reveal complex relations between curriculum content and learning objectives, optimizing the allocation of educational resources and tailoring personalized learning paths for learners. This survey discusses the techniques used to construct curriculum knowledge graphs, starting with the basic concepts, intrinsic connections, and significant differences among general, educational, and curriculum knowledge graphs. It then delves into the key technologies for building curriculum knowledge graphs, covering curriculum ontology design, entity extraction, and relation extraction, and provides a detailed analysis and summary of their evolution, key features, and limitations. Furthermore, it explores the application value of curriculum knowledge graphs in scenarios such as learning resource recommendation, learner behavior profiling and modeling, and multimodal curriculum knowledge graph construction. Finally, it focuses on the challenges in constructing curriculum knowledge graphs, such as data diversity and heterogeneity, difficulties in quality evaluation, and the lack of cross-curriculum integration, and provides future-oriented insights based on cutting-edge technologies such as deep learning and Large Language Models (LLMs).

  • Artificial Intelligence and Pattern Recognition
    DAI Lei, CAO Lin, GUO Yanan, ZHANG Fan, DU Kangning
    Computer Engineering. 2024, 50(10): 100-109. https://doi.org/10.19678/j.issn.1000-3428.0068106

    To reduce the social risks caused by the abuse of deepfake technology, an active defense method against deep forgery based on a Generative Adversarial Network (GAN) is proposed. Adversarial samples are created by adding imperceptible perturbations to original images, which significantly distort the output of multiple forgery models. The proposed model comprises an adversarial sample generation module and an adversarial sample optimization module. The generation module includes a generator and a discriminator: after the generator receives an original image and produces a perturbation, the spatial distribution of the perturbation is constrained through adversarial training, and reducing the visual perceptibility of the perturbation improves the authenticity of the adversarial sample. The optimization module comprises a basic adversarial watermark, deep forgery models, and discriminators; it simulates black-box scenarios to attack multiple deep forgery models, thereby improving the attack strength and transferability of the adversarial samples. Training and testing are conducted on the commonly used deepfake datasets CelebFaces Attributes (CelebA) and Labeled Faces in the Wild (LFW). Experimental results show that, compared with existing active defense methods, the proposed cross-model method achieves a defense success rate exceeding 85% and generates adversarial samples 20-30 times more efficiently than conventional algorithms.
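The paper trains a generator to produce the perturbation; as a much simpler stand-in, a one-step sign-gradient perturbation clipped to an L-infinity budget illustrates the underlying idea of an imperceptible additive perturbation (the eps value and the [0, 1] pixel range are assumptions, and this is not the paper's method):

```python
import numpy as np

def add_bounded_perturbation(image, grad, eps=0.03):
    """FGSM-style stand-in for a learned perturbation generator:
    take a sign-gradient step of size eps, then clip pixel values
    back to the valid [0, 1] range. The perturbation magnitude per
    pixel never exceeds eps, keeping it visually imperceptible."""
    perturbed = image + eps * np.sign(grad)
    return np.clip(perturbed, 0.0, 1.0)
```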

  • Artificial Intelligence and Pattern Recognition
    CHEN Hao, CHEN Jun, LIU Fei
    Computer Engineering. 2025, 51(1): 60-70. https://doi.org/10.19678/j.issn.1000-3428.0068764

    In path planning for mobile robots, challenges arise when dealing with unknown and dynamically changing environments, such as high collision rates with obstacles and susceptibility to local optima. To address these issues, this paper proposes an improved Twin Delayed Deep Deterministic (TD3) policy gradient algorithm to enhance the path-planning performance of mobile robots in unknown dynamic environments. First, a Long Short-Term Memory (LSTM) neural network is introduced and combined with the TD3 algorithm. Using gate structures, historical state information is filtered to perceive the state changes of obstacles within the sensing range, allowing the robot to better understand the dynamic environment and the movement patterns of obstacles. This enables the mobile robot to accurately predict and respond to the behavior of dynamic obstacles, thereby reducing the collision rate. Second, Ornstein-Uhlenbeck(OU) exploration noise is incorporated to facilitate continuous exploration of the surrounding environment, enhancing the robot's random exploration capability. Additionally, the single experience pool is divided into three separate pools (success, failure, and temporary) to improve the sampling efficiency of effective samples and reduce training time. Finally, simulation experiments are conducted in two different scenarios involving a mixture of dynamic and static obstacles. A comparative analysis of the results demonstrates that in scenario 1, the proposed algorithm converges 100-200 rounds earlier than the Deep Deterministic Policy Gradient (DDPG) and TD3 algorithms, shortens the path length by 0.5-0.8 units, and reduces the planning time by 1-4 s. In scenario 2, the proposed algorithm converges 100-300 rounds earlier than the TD3 algorithm, shortens the path length by 1-3 units, and reduces the planning time by 4-8 s, whereas the DDPG algorithm fails because the mobile robot cannot reach the destination. Therefore, the improved algorithm exhibits superior path-planning performance.
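The OU exploration noise mentioned above is a temporally correlated random process that drifts back toward its mean, which suits continuous-control exploration better than independent Gaussian noise; a minimal sketch (the theta and sigma values are conventional defaults, not the paper's settings):

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: each step adds mean-reverting
    drift theta * (mu - x) plus Gaussian diffusion, producing
    smooth, temporally correlated exploration noise."""

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.full(dim, mu)
        self.rng = np.random.default_rng(seed)

    def sample(self):
        dx = self.theta * (self.mu - self.state) \
             + self.sigma * self.rng.standard_normal(self.state.shape)
        self.state = self.state + dx
        return self.state
```

In a DDPG/TD3 loop, each action is perturbed as `action + noise.sample()` during training only.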

  • Research Hotspots and Reviews
    SUN Renke, XU Jinghao, HUANGFU Zhiyu, LI Zhongnian, XU Xinzheng
    Computer Engineering. 2024, 50(10): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0070036

    In recent years, remarkable advancements in Artificial Intelligence (AI) across unimodal domains, such as computer vision and Natural Language Processing (NLP), have highlighted the growing importance and necessity of multimodal learning. Among the emerging techniques, the Zero-Shot Transfer (ZST) method, based on vision-language pre-trained models, has garnered widespread attention from researchers worldwide. Owing to the robust generalization capabilities of pre-trained models, leveraging vision-language pre-trained models not only enhances the accuracy of zero-shot recognition tasks but also addresses certain zero-shot downstream tasks that are beyond the scope of conventional approaches. This review provides an overview of ZST methods based on vision-language pre-trained models. First, it introduces conventional approaches to Few-Shot Learning (FSL) and summarizes their main forms. It then discusses the distinctions between ZST based on vision-language pre-trained models and FSL, highlighting the new tasks that ZST can address. Subsequently, it explores the application of ZST methods in various downstream tasks, including sample recognition, object detection, semantic segmentation, and cross-modal generation. Finally, it analyzes the challenges of current ZST methods based on vision-language pre-trained models and outlines potential future research directions.

  • Artificial Intelligence and Pattern Recognition
    ZHOU Hanqi, FANG Dongxu, ZHANG Ningbo, SUN Wensheng
    Computer Engineering. 2025, 51(4): 57-65. https://doi.org/10.19678/j.issn.1000-3428.0069100

    Unmanned Aerial Vehicle (UAV) Multi-Object Tracking (MOT) technology is widely used in various fields such as traffic operation, safety monitoring, and water area inspection. However, existing MOT algorithms are primarily designed for single-UAV scenarios. The perspective of a single UAV typically has certain limitations, which can lead to tracking failures when objects are occluded, thereby causing ID switching. To address this issue, this paper proposes a Multi-UAV Multi-Object Tracking (MUMTTrack) algorithm. The MUMTTrack network adopts a Tracking-By-Detection (TBD) MOT paradigm, utilizing multiple UAVs to track objects simultaneously and compensating for the perspective limitations of a single UAV. Additionally, to effectively integrate the tracking results from multiple UAVs, an ID assignment strategy and an image matching strategy based on the Speeded Up Robust Feature (SURF) algorithm are designed for MUMTTrack. Finally, the performance of MUMTTrack is compared with that of widely used single-UAV MOT algorithms on the MDMT dataset. The comparative analysis shows that MUMTTrack has significant advantages in MOT performance metrics such as the Identity F1 (IDF1) value and Multi-Object Tracking Accuracy (MOTA).
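Cross-view integration of the kind described here typically pairs SURF descriptors between two UAV images with nearest-neighbour matching under Lowe's ratio test, then propagates track IDs along the matches. The sketch below implements that matching step in pure Python on toy descriptor vectors; the actual SURF extraction (available in opencv-contrib) and the paper's exact ID assignment rules are not reproduced, and `assign_ids` is a hypothetical helper.

```python
def ratio_test_match(desc_a, desc_b, ratio=0.75):
    """Nearest-neighbour matching with Lowe's ratio test, as commonly used
    after SURF descriptor extraction to pair detections across two views."""
    def dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5
    matches = []
    for i, d in enumerate(desc_a):
        # rank candidates in view B by descriptor distance
        ranked = sorted(range(len(desc_b)), key=lambda j: dist(d, desc_b[j]))
        # accept only if the best match is clearly better than the runner-up
        if len(ranked) < 2 or dist(d, desc_b[ranked[0]]) < ratio * dist(d, desc_b[ranked[1]]):
            matches.append((i, ranked[0]))
    return matches

def assign_ids(matches, ids_a):
    """Propagate track IDs from UAV A to the matched detections in UAV B."""
    return {j: ids_a[i] for i, j in matches}
```

The ratio test is what suppresses ambiguous matches between the two viewpoints, which is essential before sharing IDs across UAVs.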

  • Artificial Intelligence and Pattern Recognition
    YANG Dongju, HUANG Juntao
    Computer Engineering. 2024, 50(9): 113-120. https://doi.org/10.19678/j.issn.1000-3428.0068400

    High-quality annotated data are crucial for Natural Language Processing (NLP) tasks in the field of Chinese scientific literature. An annotation method based on a Large Language Model (LLM) was proposed to address the lack of high-quality annotated corpora and the inconsistency and inefficiency of manual annotation of Chinese scientific literature. First, a fine-grained annotation specification suitable for multi-domain Chinese scientific literature was established to clarify entity types and annotation granularity. Second, a structured text annotation prompt template and a generation parser were designed. The annotation task was set up as a single-stage, single-round question-and-answer process: the annotation specification and the text to be annotated were filled into the corresponding slots of the prompt template to construct the task prompt, the prompt was fed to the LLM to generate output text containing annotation information, and the structured annotation data were finally obtained by the parser. Subsequently, using prompt learning based on the LLM, the Annotated Chinese Scientific Literature (ACSL) entity dataset was generated, containing 10 000 annotated documents and 72 536 annotated entities distributed across 48 disciplines. For ACSL, three baseline models based on RoBERTa-wwm-ext, a configuration of the Robustly optimized Bidirectional Encoder Representations from Transformers (RoBERTa) approach, were proposed. The experimental results demonstrate that the BERT+Span model performs best on long-span entity recognition in Chinese scientific literature, achieving an F1 value of 0.335. These results serve as benchmarks for future research.
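The slot-filling template and generation parser described here can be sketched as follows. The template wording, the line-oriented JSON output convention, and both function names are illustrative assumptions, not the paper's actual prompt.

```python
import json

PROMPT_TEMPLATE = (
    "You are an annotator for Chinese scientific literature.\n"
    "Annotation specification:\n{spec}\n"
    "Text to annotate:\n{text}\n"
    'Return one JSON object per line, e.g. {{"entity": "BERT", "type": "Method"}}'
)

def build_prompt(spec, text):
    # fill the specification and the text to be annotated into the slots
    return PROMPT_TEMPLATE.format(spec=spec, text=text)

def parse_annotations(llm_output):
    """Parse the LLM's line-oriented JSON output into (entity, type) pairs,
    skipping malformed lines rather than failing."""
    entities = []
    for line in llm_output.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            obj = json.loads(line)
            entities.append((obj["entity"], obj["type"]))
        except (json.JSONDecodeError, KeyError):
            continue
    return entities
```

Keeping the exchange to a single question-and-answer round means one prompt and one parse per document, which is what makes large-scale corpus construction tractable.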

  • Artificial Intelligence and Pattern Recognition
    Huayu LI, Zhikang ZHANG, Yang YAN, Yang YUE
    Computer Engineering. 2024, 50(8): 31-39. https://doi.org/10.19678/j.issn.1000-3428.0068225

    Addressing the limitations of Chinese Named Entity Recognition (NER) within specific domains, this paper proposes a model that utilizes domain-specific Knowledge Graphs (KGs) and images to improve entity recognition accuracy in short texts related to computer science. The model employs a Bidirectional Encoder Representations from Transformers (BERT)-Bidirectional Long Short-Term Memory (BiLSTM)-Attention-based model to extract textual features, a ResNet152-based approach to extract image features, and a word segmentation tool to obtain noun entities from sentences. These noun entities are then embedded with KG nodes using BERT. The model uses cosine similarity to determine the KG node most similar to each segmented word in the sentence and retains neighboring nodes at a distance of 1 from this node to generate an optimal matching subgraph for semantic enrichment of the sentence. A Multi-Layer Perceptron (MLP) is employed to map the textual, image, and subgraph features into the same space. A unique gating mechanism is utilized to achieve fine-grained cross-modal fusion of the textual and image features. Finally, the multimodal features are fused with the subgraph features using a cross-attention mechanism and fed into the decoder for entity labeling. Experimental comparisons with relevant baseline models are conducted on Twitter2015, Twitter2017, and a self-constructed computer science dataset. The results indicate that the proposed approach achieves precision, recall, and F1 values of 88.56%, 87.47%, and 88.01%, respectively, on the domain dataset. Compared to the optimal baseline model, its F1 value increases by 1.36 percentage points, demonstrating the effectiveness of incorporating domain KGs for entity recognition.
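The subgraph-matching step described here can be sketched as follows: each segmented word is mapped to its most cosine-similar KG node, and that node's 1-hop neighbours are kept. This is a pure-Python sketch on toy embeddings; the data layout (dicts of vectors and an adjacency dict) is an illustrative assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def match_subgraph(word_vecs, node_vecs, neighbors):
    """For each segmented word, pick the most similar KG node by cosine
    similarity and keep its 1-hop neighbours, forming the matching subgraph."""
    subgraph = set()
    for word, vec in word_vecs.items():
        best = max(node_vecs, key=lambda n: cosine(vec, node_vecs[n]))
        subgraph.add(best)
        subgraph.update(neighbors.get(best, []))  # distance-1 neighbourhood
    return subgraph
```

The retained subgraph is what supplies the domain-specific context that the short input text itself lacks.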

  • Development Research and Engineering Application
    XIE Jing, DENG Yueming, WANG Runmin
    Computer Engineering. 2024, 50(11): 338-349. https://doi.org/10.19678/j.issn.1000-3428.0068742

    To address the low detection accuracy for small targets in complex environments, as well as the false and missed detections of mainstream traffic sign detection algorithms, an improved algorithm based on YOLOv8s is proposed. The algorithm uses PConv convolution in the backbone network and incorporates a C2faster module to achieve a lightweight network structure while maintaining accuracy. To better utilize the information between low- and high-level features and enhance regional context association, the SPPFCSPC module is designed as a spatial pyramid pooling module based on the concept of SPPF. By adding the GAM attention mechanism, the feature extraction capability of the network is further enhanced, effectively improving detection accuracy. To improve the detection of small targets, a four-fold downsampling branch is added at the neck of the network to optimize target positioning. Furthermore, the Focal-EIoU loss function replaces the original CIoU loss function to more accurately characterize the aspect ratio of the prediction box, alleviating the imbalance between positive and negative samples. Experimental results show that on the CCTSDB-2021 traffic sign dataset, the improved algorithm achieves a precision, recall, and mAP@0.5 of 86.1%, 73.0%, and 81.2%, respectively, increases of 0.8, 6.3, and 6.9 percentage points over the original YOLOv8s algorithm. The algorithm significantly reduces false and missed detections in complex weather and harsh environments, offering better overall detection performance than the comparison algorithms, with strong practical value.
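The Focal-EIoU regression loss mentioned here can be sketched from the published EIoU formulation: 1 - IoU plus normalized centre-distance, width, and height penalty terms, weighted by IoU raised to a focusing power. This is a sketch of the standard formulation, not the authors' exact code, and the gamma value and corner-coordinate box format are illustrative assumptions.

```python
def focal_eiou_loss(box_p, box_g, gamma=0.5):
    """Focal-EIoU loss between predicted and ground-truth (x1, y1, x2, y2) boxes."""
    # intersection area
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter)
    # smallest enclosing box, used to normalize the penalty terms
    cw = max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])
    ch = max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])
    # centre-distance penalty
    pcx, pcy = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    gcx, gcy = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    dist = ((pcx - gcx) ** 2 + (pcy - gcy) ** 2) / (cw ** 2 + ch ** 2)
    # separate width and height penalties (the EIoU refinement over CIoU)
    dw = ((box_p[2] - box_p[0]) - (box_g[2] - box_g[0])) ** 2 / cw ** 2
    dh = ((box_p[3] - box_p[1]) - (box_g[3] - box_g[1])) ** 2 / ch ** 2
    eiou = 1.0 - iou + dist + dw + dh
    return (iou ** gamma) * eiou  # focal weighting down-weights low-quality boxes
```

Penalizing width and height separately, rather than through the single aspect-ratio term of CIoU, is what gives the sharper box regression the abstract refers to.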

  • Development Research and Engineering Application
    ZENG Yuqi, LIU Bo, ZHONG Baichang, ZHONG Jin
    Computer Engineering. 2024, 50(9): 344-355. https://doi.org/10.19678/j.issn.1000-3428.0069597

    To accelerate the digital transformation of education, the precise analysis and empirical application of AI technology integrated into the entire process of teaching and learning behaviors have become a current research hotspot. To address the problems of low detection accuracy, high density of bounding boxes, severe overlap and occlusion, large scale variations, and data imbalance in student classroom behavior detection, this paper establishes a student classroom behavior dataset (DBS Dataset). Additionally, it proposes a student classroom behavior detection algorithm, VWE-YOLOv8, based on an improved YOLOv8. First, it introduces the CSWin-Transformer attention mechanism to enhance the model's capability to extract global information from images, improving the network's detection accuracy. Second, it increases the model's recognition capability on multi-scale targets by integrating the Large Separable Kernel Attention (LSKA) module into the SPPF architecture. Additionally, it incorporates an occlusion-aware attention mechanism into the design of the detection head (modifying the original Head structure to SEAMHead) to effectively detect occluded objects. Finally, it introduces a weight adjustment function (Slide Loss) to address the issue of sample imbalance. The experimental results reveal that compared with YOLOv8, the improved VWE-YOLOv8 achieves increases of 1.16 and 1.70 percentage points in mAP@0.50 and of 7.36 and 2.13 percentage points in mAP@0.50∶0.95 on the DBS Dataset and the public SCB Dataset, respectively. Furthermore, it improves the precision by 4.17 and 6.74 percentage points and the recall by 1.96 and 3.13 percentage points on these datasets, respectively. These results indicate that the improved algorithm has higher detection accuracy and stronger generalization capability, and that it can reliably detect students' classroom behaviors, strongly supporting the application of smart education and the digital transformation of education.

  • Cyberspace Security
    LI Yongfei, LI Mingyang, CHANG Xin, CAO Kexin
    Computer Engineering. 2024, 50(6): 179-187. https://doi.org/10.19678/j.issn.1000-3428.0067570
    With the increasing applicability of Internet-of-Things (IoT) technology, the number and types of IoT devices and sensors are continuously increasing. In particular, IoT water quality sensors play a vital role in the field of ecological monitoring and protection. Accordingly, this study proposes an unsupervised anomaly detection algorithm based on explainable deep learning to address the large volume, high dimensionality, and lack of labeling of the monitoring data collected by IoT water quality sensors. The algorithm uses the Auto-Encoder (AE) and SHAP algorithms to detect anomalies in multi-dimensional water quality datasets. The AE model is trained to flag data with significant reconstruction errors, and SHAP is used to interpret the AE and calculate the importance of each feature in the flagged data. Based on these feature importances, the final anomaly score is determined for each flagged sample. Experimental results on an IoT water quality monitoring dataset show that the algorithm can effectively detect anomalous data with an F1 value of 0.875, outperforming existing unsupervised anomaly detection algorithms. Thus, the proposed algorithm has practical value for processing IoT water quality monitoring data. Furthermore, it can be applied to anomaly detection on massive IoT monitoring data in other fields, such as meteorology and the environment.
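The flag-then-attribute logic described in this abstract can be sketched as follows. Per-feature squared reconstruction error stands in here for the SHAP attribution used in the paper (computing true SHAP values requires the trained AE and the `shap` library), and the threshold is an illustrative assumption.

```python
def reconstruction_errors(x, x_hat):
    """Per-feature squared reconstruction error of an autoencoder output."""
    return [(a - b) ** 2 for a, b in zip(x, x_hat)]

def flag_and_explain(x, x_hat, threshold):
    """Flag a sample whose total reconstruction error exceeds the threshold,
    and rank the features by their contribution to that error (the role
    SHAP plays in the paper)."""
    errs = reconstruction_errors(x, x_hat)
    if sum(errs) <= threshold:
        return False, []
    importance = sorted(range(len(errs)), key=lambda i: errs[i], reverse=True)
    return True, importance
```

Ranking features by contribution is what turns a bare "anomalous" flag into an explanation, e.g. pointing at the pH channel rather than turbidity.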
  • Graphics and Image Processing
    Fangxin XU, Rong FAN, Xiaolu MA
    Computer Engineering. 2024, 50(3): 250-258. https://doi.org/10.19678/j.issn.1000-3428.0067741

    To address the missed and false detections that detection algorithms are prone to in crowded pedestrian scenarios, this study proposes an improved YOLOv7 crowded pedestrian detection algorithm. A BiFormer vision Transformer and an improved RepConv and Channel Space Attention Module (CSAM)-based Efficient Layer Aggregation Network (RC-ELAN) module are introduced into the backbone network; the self-attention mechanism and the attention module enable the backbone to focus more on the important features of occluded pedestrians, effectively mitigating the adverse effect of missing target features on detection. An improved neck network based on the Bidirectional Feature Pyramid Network (BiFPN) concept is used, in which transposed convolution and an improved Rep-ELAN-W module enable the model to efficiently utilize small-target feature information in the middle- and low-dimensional feature maps, effectively improving small-target pedestrian detection. The introduction of an Efficient Complete Intersection-over-Union (E-CIoU) loss function allows the model to converge further to a higher accuracy. Experimental results on the WiderPerson dataset, which contains a large number of small and occluded pedestrians, demonstrate that the average accuracies of the improved algorithm at IoU thresholds of 0.5 and 0.5-0.95 are improved by 2.5 and 2.8, 9.9 and 7.1, and 12.3 and 10.7 percentage points compared with the YOLOv7, YOLOv5, and YOLOX algorithms, respectively. The improved algorithm can thus be better applied to crowded pedestrian detection scenarios.

  • Research Hotspots and Reviews
    WANG Zhihao, QIAN Yuntao
    Computer Engineering. 2024, 50(9): 33-45. https://doi.org/10.19678/j.issn.1000-3428.0068296

    The spatiotemporal fusion super-resolution reconstruction of remote sensing images extracts information from low-spatial-resolution images with high temporal frequency and high-spatial-resolution images with low temporal frequency to generate remote sensing images with both high temporal and high spatial resolution. This process directly affects subsequent tasks such as interpretation, detection, and tracking. With the rapid advancement of Convolutional Neural Networks (CNNs), researchers have proposed a series of CNN-based spatiotemporal fusion methods. However, because of the inherent limitations of convolution operations, these methods still face challenges in global information extraction. Inspired by the global modeling capability of the Swin Transformer, this paper proposes a super-resolution reconstruction model based on the Swin Transformer. In the feature extraction stage, a dual-stream structure is introduced, dividing the feature extraction network into two parts that extract temporal and spatial information separately; the global modeling capability of the Swin Transformer enhances model performance. In the feature fusion stage, a Convolutional Block Attention Module (CBAM) combining channel and spatial attention is introduced to enhance important features and improve image reconstruction accuracy. Comparative experiments against various spatiotemporal fusion super-resolution reconstruction models are conducted on the Coleambally Irrigation Area (CIA) and Lower Gwydir Catchment (LGC) datasets. The results show that the proposed model achieves optimal performance across all evaluation metrics, demonstrating superior performance and enhanced generalization capability.
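CBAM, as used in the feature fusion stage here, applies channel attention followed by spatial attention. The sketch below shows that two-stage gating on a `[C][H][W]` nested list in pure Python; as an illustrative simplification, the small shared MLP inside the real channel branch is reduced to a plain sigmoid gate, so this is a sketch of the mechanism, not a faithful CBAM implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmap):
    # global average pooling per channel, then a sigmoid gate per channel
    gates = [sigmoid(sum(sum(row) for row in ch) / (len(ch) * len(ch[0])))
             for ch in fmap]
    return [[[v * g for v in row] for row in ch] for ch, g in zip(fmap, gates)]

def spatial_attention(fmap):
    # mean across channels at each position, then a sigmoid gate per position
    h, w = len(fmap[0]), len(fmap[0][0])
    gate = [[sigmoid(sum(ch[i][j] for ch in fmap) / len(fmap)) for j in range(w)]
            for i in range(h)]
    return [[[ch[i][j] * gate[i][j] for j in range(w)] for i in range(h)]
            for ch in fmap]

def cbam(fmap):
    # CBAM order: channel attention first, then spatial attention
    return spatial_attention(channel_attention(fmap))
```

The channel gate decides *which* feature maps matter and the spatial gate decides *where* they matter, which is why the combination helps emphasize the informative regions during fusion.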

  • Artificial Intelligence and Pattern Recognition
    HUANG Kun, QI Zhaojian, WANG Juanmin, HU Qian, HU Weichao, PI Jianyong
    Computer Engineering. 2025, 51(5): 133-142. https://doi.org/10.19678/j.issn.1000-3428.0069026

    Pedestrian detection in crowded scenes is a key technology in the intelligent monitoring of public spaces: object detection methods are used to detect the positions and number of pedestrians in videos. This paper presents Crowd-YOLOv8, an improved version of the YOLOv8 detection model, to address the issue of pedestrians being easily missed owing to occlusion and small target size in densely populated areas. First, nostride-Conv-SPD is introduced into the backbone network to enhance its capability to extract fine-grained information, such as small-object features in images. Second, small-object detection heads and the CARAFE upsampling operator are introduced into the neck of the YOLOv8 network to fuse features at different scales and improve detection performance for small targets. Experimental results demonstrate that the proposed method achieves an mAP@0.5 of 84.3% and an mAP@0.5∶0.95 of 58.2% on the CrowdHuman dataset, improvements of 3.7 and 5.2 percentage points, respectively, over the original YOLOv8n. On the WiderPerson dataset, the proposed method achieves an mAP@0.5 of 88.4% and an mAP@0.5∶0.95 of 67.4%, improvements of 1.1 and 1.5 percentage points over the original YOLOv8n.

  • Artificial Intelligence and Pattern Recognition
    FU Mingjian, GUO Fuqiang
    Computer Engineering. 2024, 50(5): 91-99. https://doi.org/10.19678/j.issn.1000-3428.0068112
    Left-turn maneuvers at unsignalized intersections are among the most dangerous scenarios in autonomous driving, and achieving efficient and safe left-turn decision-making is highly challenging. Deep Reinforcement Learning (DRL) algorithms have broad prospects in autonomous driving decision-making; however, their sample efficiency is low, and designing reward functions for autonomous driving is difficult. Therefore, a DRL algorithm based on expert priors, abbreviated as CBAM-BC SAC, is proposed to solve these problems. First, the Scalable Multi-Agent RL Training School (SMARTS) simulation platform is used to obtain expert prior knowledge. Subsequently, a Convolutional Block Attention Module (CBAM) is used to improve Behavior Cloning (BC), which pretrains the policy to imitate the expert strategy based on the expert prior knowledge. Finally, the learning process of the DRL algorithm is guided by the imitated expert strategy and verified in a left-turn decision-making scenario at an unsignalized intersection. Experimental results indicate that the DRL algorithm based on expert priors is more advantageous than conventional DRL algorithms: it not only eliminates the workload of manually designing reward functions but also significantly improves sample efficiency and achieves better performance. In the left-turn scenario at an unsignalized intersection, the CBAM-BC SAC algorithm improves the average success rate of passing through the intersection by 14.2 and 2.2 percentage points compared with the conventional DRL algorithm SAC and the classic-BC-based DRL algorithm BC SAC, respectively.
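The behavior cloning pretraining step described here fits a policy to expert (state, action) pairs by supervised regression before reinforcement learning begins. The sketch below does this for a scalar linear policy a = w*s + b trained by per-sample MSE gradient descent; in the paper the policy is a neural network with a CBAM module, so this only illustrates the pretraining idea, with hyperparameters chosen for the toy data.

```python
def bc_pretrain(demos, lr=0.1, epochs=300):
    """Behavior Cloning: fit a scalar linear policy a = w*s + b to expert
    (state, action) demonstrations by SGD on the squared imitation error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for s, a in demos:
            err = (w * s + b) - a          # imitation error on this pair
            w -= lr * err * s              # gradient step on the weight
            b -= lr * err                  # gradient step on the bias
    return w, b
```

Starting RL from a policy that already imitates the expert is what removes the need for a hand-crafted dense reward and cuts the sample count, as the abstract reports.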
  • Research Hotspots and Reviews
    LU Yue, ZHOU Xiangyu, ZHANG Shizhou, LIANG Guoqiang, XING Yinghui, CHENG De, ZHANG Yanning
    Computer Engineering. 2025, 51(10): 1-17. https://doi.org/10.19678/j.issn.1000-3428.0070575

    Traditional machine learning algorithms perform well only when the training and testing sets are identically distributed; they cannot incrementally learn new categories or tasks that were not present in the original training set. Continual learning enables models to adaptively learn new knowledge while preventing the forgetting of old tasks; however, continual learning methods still face challenges related to computational overhead, storage overhead, and performance stability. Recent advances in pre-trained models have provided new research directions for continual learning, which are promising for further performance improvements. This survey summarizes existing pre-training-based continual learning methods. According to the anti-forgetting mechanism, they are categorized into five types: methods based on prompt pools, methods with slow parameter updating, methods based on backbone branch extension, methods based on parameter regularization, and methods based on classifier design. Additionally, these methods are classified according to the number of training phases, fine-tuning approaches, and use of language modalities. Subsequently, the overall challenges of continual learning methods are analyzed, and the applicable scenarios, limitations, main characteristics, and advantages of the various methods are summarized. Comprehensive experiments are conducted on multiple benchmarks, followed by in-depth discussions of the performance gaps among the different methods. Finally, the survey discusses research trends in pre-training-based continual learning.

  • Research Hotspots and Reviews
    CI Tianzhao, YANG Hao, ZHOU You, XIE Changsheng, WU Fei
    Computer Engineering. 2025, 51(3): 1-23. https://doi.org/10.19678/j.issn.1000-3428.0068673

    Smartphones have become an integral part of modern daily life. The Android operating system currently holds the largest share of the mobile operating system market owing to its open-source nature and comprehensive ecosystem. Within Android smartphones, the storage subsystem plays a pivotal role, exerting a significant influence on the user experience. However, the design of Android mobile storage systems diverges from server scenarios, necessitating the consideration of distinct factors, such as resource constraints, cost sensitivity, and foreground application prioritization. Extensive research has been conducted in this area. By summarizing and analyzing the current research status in this field, we categorize the issues experienced by users of Android smartphone storage systems into five categories: host-side write amplification, memory swapping, file system fragmentation, flash device performance, and I/O priority inversion. Subsequently, existing works addressing these five categories of issues are classified, along with commonly used tools for testing and analyzing mobile storage systems. Finally, we conclude by examining existing techniques that ensure the user experience with Android smartphone storage systems and discuss potential avenues for future investigation.

  • Development Research and Engineering Application
    Jiuyuan HUO, Hongyang WANG, Tao JU, Jun HU
    Computer Engineering. 2024, 50(7): 372-380. https://doi.org/10.19678/j.issn.1000-3428.0068282

    To solve the problem of insufficient personalized monitoring in human health assessment methods and to meet the demand for fine-grained health status assessment in different scenarios, a multi-scenario human health status assessment method is needed to achieve long-term automated monitoring. This study proposes a multi-scenario human health assessment model that combines the Analytic Hierarchy Process (AHP) and the Entropy Weight Method (EWM). First, health monitoring index data for four different scenarios (exercise, rest, work/study, and recreation) are collected to construct the corresponding assessment index system. Then, the AHP and EWM weights are calculated for the assessment indicators, and the Quantum-behaved Particle Swarm Optimization (QPSO) algorithm is used to balance the subjective (AHP) and objective (EWM) weights, ensuring the objectivity of the proportions of the evaluation indicators. Finally, the human health state is assessed and quantified using the fuzzy comprehensive evaluation method, and the reliability and stability of the method are verified using actual monitoring data. The experimental results show that the composite scores of the proposed method in the four scenarios (exercise, rest, work/study, and recreation) are 63.78, 59.83, 58.71, and 59.21, respectively, indicating that the model has good accuracy and stability in different scenarios. The physical-state evaluation results of the testers are analyzed, and health suggestions are given. The proposed model can comprehensively determine the health status of the human body in different scenarios and provide scientific health guidance, thus offering a scientific basis for health management and disease prevention.
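The EWM weighting step described here is a concrete computation: indicators whose values vary more across samples carry more information (lower entropy) and receive larger weights. The sketch below implements the standard EWM formula and a simple linear combination with AHP weights; in the paper the combination coefficient is optimized by QPSO, whereas here it is a fixed illustrative value.

```python
import math

def entropy_weights(matrix):
    """Entropy Weight Method. Rows are samples, columns are indicators
    (assumed positive, with at least one non-constant column).
    Lower-entropy, i.e. more discriminative, indicators get larger weights."""
    m, n = len(matrix), len(matrix[0])
    divergence = []
    for j in range(n):
        col = [matrix[i][j] for i in range(m)]
        total = sum(col)
        p = [x / total for x in col]                     # normalized proportions
        e = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(m)
        divergence.append(1 - e)                         # degree of divergence
    s = sum(divergence)
    return [d / s for d in divergence]

def combine_weights(ahp, ewm, alpha=0.5):
    # alpha balances subjective (AHP) and objective (EWM) weights;
    # the paper tunes this balance with QPSO, fixed here for illustration
    return [alpha * a + (1 - alpha) * e for a, e in zip(ahp, ewm)]
```

A constant indicator column has maximum entropy and thus weight near zero, which is exactly the objectivity property the abstract attributes to the EWM side of the model.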