Computer Engineering

Artificial Intelligence and Pattern Recognition

Graph Neural Network Recommendation Algorithm Based on Multimodal Fusion

Zhiqiang WU, Qing XIE, Lin LI, Yongjian LIU

Computer Engineering. 2024, 50(1): 91-100. https://doi.org/10.19678/j.issn.1000-3428.0066929

Many existing Graph Neural Network (GNN) recommendation algorithms train on the node ID information of the user-item interaction graph and learn the high-order connectivity between user and item nodes to enrich their representations. However, they ignore user preferences for different modalities, do not exploit item modal information such as images and text, and fuse different modal features by simple summation without distinguishing user preferences for different modality types. To address this problem, a multimodal fusion GNN recommendation model is proposed. First, for each single modality, a unimodal graph is constructed by combining that modality with the user-item interaction bipartite graph, and the user preference for that modality is learned in the unimodal graph. A Graph ATtention (GAT) network aggregates neighbor information to enrich local node representations, and a Gated Recurrent Unit (GRU) decides whether to aggregate neighbor information, which achieves a denoising effect. Finally, the user and item representations learned from each modal graph are fused by an attention mechanism to obtain the final representations, which are then sent to the prediction module. Experimental results on the MovieLens-20M and H&M datasets show that the multimodal information and the attention fusion mechanism effectively improve recommendation accuracy, and the model significantly outperforms the best baseline algorithm on all three metrics: Precision@K, Recall@K, and NDCG@K. With K = 10, Precision@10, Recall@10, and NDCG@10 increase by 4.67%, 2.42%, and 2.03%, and by 2.49%, 5.24%, and 2.05%, respectively, on the two datasets.
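The final attention-based fusion step can be illustrated with a minimal sketch. It is not the authors' model: it assumes each modality graph has already produced a user embedding, scores each modality by a dot product with a hypothetical learned vector `query`, and turns the scores into weights with a softmax. The function names `softmax` and `attention_fuse` are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    modal_embs: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Fuse one embedding per modality into a single representation.

    Each modality is scored by a dot product with `query` (a hypothetical
    learned vector); softmax turns the scores into weights that sum to 1,
    and the fused vector is the weighted sum of the modality embeddings.
    """
    scores = [sum(a * b for a, b in zip(emb, query)) for emb in modal_embs]
    weights = softmax(scores)
    dim = len(modal_embs[0])
    fused = [sum(w * emb[d] for w, emb in zip(weights, modal_embs)) for d in range(dim)]
    return fused, weights


# One user seen through three modality graphs (e.g. ID, image, text).
embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = attention_fuse(embs, query=[1.0, 0.0])
print([round(w, 3) for w in weights], [round(x, 3) for x in fused])
```

Plain summation, the fusion strategy the abstract criticizes, corresponds to giving every modality weight 1 regardless of the user; the softmax weights instead let the mix differ from user to user.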

Artificial Intelligence and Pattern Recognition

Unmanned Aerial Vehicle Image Target Detection Algorithm Based on YOLOv8

ZHAO Jida, ZHEN Guoyong, CHU Chengqun

Computer Engineering. 2024, 50(4): 113-120. https://doi.org/10.19678/j.issn.1000-3428.0068268

In Unmanned Aerial Vehicle (UAV) target detection tasks, the small size of detection targets and the complex backgrounds of the images cause missed and false detections. To address small target detection, a UAV image target detection algorithm is proposed by improving YOLOv8s. First, because targets in drone footage are generally small, the number of Backbone layers is reduced and the size of the feature map to be detected is increased, so that the network focuses more on small targets. Second, because the low-quality examples commonly present in the dataset degrade training, the Wise-IoU loss function is introduced to improve training. Third, a context enhancement module is introduced to obtain small-target feature information under different receptive fields, improving the network's localization and classification of small targets in complex environments. Finally, a spatial-channel filtering module is designed to enhance target feature information during convolution and filter out useless interference, addressing the problem of small-target feature information being submerged and lost during convolution. Experimental results on the VisDrone2019 dataset demonstrate that the mean Average Precision (mAP@0.5) of the proposed algorithm reaches 45.4%, which is 7.3 percentage points higher than that of the original YOLOv8s algorithm, while the number of parameters is reduced by 26.13%. Under similar experimental conditions, both detection accuracy and speed are improved to a certain extent compared with other common small target detection algorithms.
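The abstract does not state which Wise-IoU version is used, so the following is only a sketch of the basic form (Wise-IoU v1), which multiplies the IoU loss by a distance-based factor R_WIoU = exp(((x − x_gt)² + (y − y_gt)²) / (W_g² + H_g²)), where (x, y) and (x_gt, y_gt) are box centres and W_g, H_g are the width and height of the smallest enclosing box. Later versions add a focusing coefficient on top. Boxes are assumed to be `(x1, y1, x2, y2)` tuples; `iou` and `wiou_v1` are illustrative names.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def wiou_v1(pred: Box, gt: Box) -> float:
    """Wise-IoU v1 loss: R_WIoU * (1 - IoU).

    R_WIoU grows with the distance between box centres, normalised by the
    squared diagonal of the smallest enclosing box. In a training framework
    the denominator is detached from the gradient graph; plain floats have
    no graph, so that detail is only noted here.
    """
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    r_wiou = math.exp(((px - gx) ** 2 + (py - gy) ** 2) / (wg ** 2 + hg ** 2))
    return r_wiou * (1.0 - iou(pred, gt))


print(wiou_v1((0, 0, 2, 2), (1, 1, 3, 3)))
```

For an offset prediction, the factor R_WIoU is greater than 1, so the loss exceeds the plain IoU loss 1 − IoU.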

Research Hotspots and Reviews

Review of Federated Learning and Its Security and Privacy Protection

XIONG Shiqiang, HE Daojing, WANG Zhendong, DU Runmeng

Computer Engineering. 2024, 50(5): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0067782

Federated Learning (FL) is a new distributed machine learning technology that keeps data local and trains a common model through the cooperation of all parties, which mitigates the data collection and privacy security issues of conventional machine learning. However, as FL is applied and developed, it remains exposed to various attacks. To ensure the security of FL, its attack modes and the corresponding privacy protection technologies must be investigated. First, the background knowledge and relevant definitions of FL are introduced, and the development process and classification of FL are summarized. Second, the three security elements of FL are expounded, and the security issues and research progress of FL are summarized from two perspectives: security sources and the three security elements. Subsequently, privacy protection technologies are classified, and four common privacy protection technologies used in FL are summarized: Secure Multiparty Computation (SMC), Homomorphic Encryption (HE), Differential Privacy (DP), and Trusted Execution Environment (TEE). Finally, future research directions for FL are discussed.
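Of the four technologies, DP is the easiest to show in a few lines. The sketch below is one common instantiation, assumed here rather than taken from the review: each client update is clipped to an L2 norm of at most `clip_norm`, and the server adds Gaussian noise with standard deviation `noise_multiplier * clip_norm` to the sum before averaging (central DP in the style of DP-FedAvg). The function and parameter names are hypothetical, and turning the noise level into an (ε, δ) guarantee requires a privacy accountant, which is omitted.

```python
import math
import random
from typing import List, Sequence


def clip_update(update: Sequence[float], clip_norm: float) -> List[float]:
    """Scale an update down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def dp_aggregate(
    updates: Sequence[Sequence[float]],
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Average clipped client updates after adding Gaussian noise to the sum.

    Clipping bounds each client's contribution (sensitivity) by clip_norm,
    which is what the Gaussian noise is calibrated against.
    """
    clipped = [clip_update(u, clip_norm) for u in updates]
    n, dim = len(clipped), len(clipped[0])
    sigma = noise_multiplier * clip_norm
    return [
        (sum(c[d] for c in clipped) + rng.gauss(0.0, sigma)) / n
        for d in range(dim)
    ]


rng = random.Random(42)
client_updates = [[3.0, 4.0], [0.0, 0.5], [-1.0, 1.0]]
print(dp_aggregate(client_updates, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Setting `noise_multiplier=0.0` reduces this to plain averaging of clipped updates, which makes the privacy-utility trade-off explicit: more noise gives stronger privacy at the cost of a noisier global model.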

Artificial Intelligence and Pattern Recognition

Robot Path Planning Based on Improved DQN Algorithm

Qiru LI, Xia GENG

Computer Engineering. 2023, 49(12): 111-120. https://doi.org/10.19678/j.issn.1000-3428.0066348

The Deep Q-Network (DQN) algorithm, which is widely used in mobile robot path planning, solves the dimensionality problem of Q-learning in complex environments by integrating deep neural networks with reinforcement learning. However, the traditional DQN algorithm converges slowly and plans paths poorly, so obtaining the optimal path within a small number of training rounds is challenging. To solve these problems, an improved algorithm, ERDQN, is proposed. The Q value is recalculated by recording how often states are repeated: the more times a state is repeated during network training, the lower the probability that it occurs again. This improves the robot's ability to explore the environment, reduces the risk of the network converging to a local optimum to a certain extent, and reduces the number of training rounds required for convergence. The reward function is redesigned according to the robot's moving direction and its distance from the target point. The robot obtains a positive reward when it approaches the target point and a negative reward when it moves away from it, and the absolute value of the reward is adjusted according to the current moving direction and the distance to the target point; thus, the robot can plan a better path while avoiding obstacles. The experimental results show that, compared with the DQN algorithm, the average score of ERDQN increases by 18.9%, whereas the path length and the number of planning rounds are reduced by approximately 20.1% and 500, respectively. These results show that ERDQN can effectively improve network convergence speed and path planning performance.
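The abstract gives the two ideas of ERDQN but not their formulas, so the sketch below is entirely hypothetical. `RepeatPenalty` lowers a state's Q-value by `beta * log(1 + count)`, so frequently repeated states become less attractive to a greedy policy, and `shaped_reward` returns a reward proportional to how much a move shortens the distance to the goal, positive when approaching and negative when moving away. Both penalty and reward forms, and the parameters `beta` and `k`, are assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Tuple

Point = Tuple[float, float]


class RepeatPenalty:
    """Hypothetical visit-count penalty applied to Q-values."""

    def __init__(self, beta: float = 0.1) -> None:
        self.beta = beta
        self.counts: Dict[Hashable, int] = defaultdict(int)

    def visit(self, state: Hashable) -> None:
        """Record one more occurrence of `state` during training."""
        self.counts[state] += 1

    def adjust(self, state: Hashable, q_value: float) -> float:
        """Lower the Q-value of states that have been repeated often."""
        return q_value - self.beta * math.log1p(self.counts[state])


def shaped_reward(prev: Point, curr: Point, goal: Point, k: float = 1.0) -> float:
    """Hypothetical distance-based reward.

    Positive when the move brings the robot closer to the goal, negative
    when it moves away; the magnitude scales with the change in distance.
    """
    return k * (math.dist(prev, goal) - math.dist(curr, goal))


penalty = RepeatPenalty(beta=0.5)
penalty.visit((2, 3))
print(penalty.adjust((2, 3), 1.0), shaped_reward((0, 0), (1, 0), (3, 0)))
```

The logarithm keeps the penalty growing slowly, so a state that is genuinely on the best path is discouraged only mildly after repeated visits; a linear penalty would be a simpler but harsher alternative.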

Cyberspace Security

Adversarial Example Generation Algorithm Based on Transformer and GAN

Shuaiwei LIU, Zhi LI, Guomei WANG, Li ZHANG

Computer Engineering. 2024, 50(2): 180-187. https://doi.org/10.19678/j.issn.1000-3428.0067077

Adversarial attack and defense is a popular research area in computer security. Trans-GAN, an adversarial example generation algorithm that combines a Transformer with a Generative Adversarial Network (GAN), is proposed to address the poor visual quality of existing gradient-based adversarial example generation methods and the low generation efficiency of optimization-based methods. First, the algorithm exploits the powerful visual representation capability of the Transformer, using it as a reconstruction network that receives clean images and generates adversarial noise. Second, the Transformer reconstruction network serves as the generator and is combined with a deep convolutional network-based discriminator to form a GAN architecture, which improves the authenticity of the generated images and ensures training stability. Meanwhile, an improved attention mechanism, Targeted Self-Attention, is proposed to introduce target labels as prior knowledge during training, guiding the network to learn to generate adversarial perturbations for specific attack targets. Finally, the adversarial noise is added to the clean examples through skip connections to form adversarial examples. Experimental results demonstrate that the proposed algorithm achieves an attack success rate of more than 99.9% on both models used for the MNIST dataset and of 96.36% and 98.47% on the two models used for the CIFAR10 dataset, outperforming current state-of-the-art generative adversarial attack methods. Qualitative results show that, compared with the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) algorithms, the adversarial noise generated by Trans-GAN has smaller perturbations, and the resulting adversarial examples look more natural, meeting the requirements of human vision and being difficult to distinguish.

Cyberspace Security

Federated Learning Optimization Method in Non-IID Scenarios

Huawei SONG, Shengqi LI, Fangjie WAN, Yuping WEI

Computer Engineering. 2024, 50(3): 166-172. https://doi.org/10.19678/j.issn.1000-3428.0067791

Federated Learning (FL) enables collaborative training of global models without compromising data privacy. However, this collaborative training approach faces the challenge of non-Independent and Identically Distributed (Non-IID) data in the real world, which leads to slow model convergence and low accuracy. Many existing FL methods improve only one aspect, either global model aggregation or local client updates, and therefore inevitably suffer from the effects of the other aspect, which reduces the quality of the global model. In this context, a hierarchical continual learning optimization method for FL, FedMas, is introduced based on the idea of hierarchical fusion. First, the DBSCAN algorithm divides clients into layers so that clients with similar data distributions fall into the same layer, and only some of the clients from a single layer are selected for each training round, avoiding the weight differences that different data distributions cause when the server aggregates the global model. Further, because the data distributions of the layers differ, clients incorporate a continual-learning solution to catastrophic forgetting during local updates, effectively integrating the differences between the data of clients in different layers and thus ensuring global model performance. Experiments on the standard MNIST and CIFAR-10 datasets demonstrate that the global model test accuracy improves by 0.3-2.2 percentage points on average compared with the FedProx, Scaffold, and FedCurv FL algorithms.
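The client-layering step can be sketched with a minimal DBSCAN. The abstract does not say which features DBSCAN clusters, so this assumes each client is described by its normalised label histogram; `eps`, `min_pts`, the per-round `fraction`, and the helper `sample_layer` are illustrative assumptions, not FedMas settings.

```python
import math
import random
from typing import List, Optional, Sequence


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Minimal DBSCAN: returns a cluster id per point, -1 marks noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited

    def neighbours(i: int) -> List[int]:
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(j_nbrs)
    return [lab if lab is not None else -1 for lab in labels]


def sample_layer(labels: Sequence[int], layer: int, fraction: float,
                 rng: random.Random) -> List[int]:
    """Pick a fraction of the clients in one layer for this round."""
    members = [i for i, lab in enumerate(labels) if lab == layer]
    k = max(1, int(len(members) * fraction))
    return sorted(rng.sample(members, k))


# Label histograms of 8 clients: four skewed to class 0, four to class 1.
clients = [[0.9, 0.1], [0.85, 0.15], [0.8, 0.2], [0.9, 0.1],
           [0.1, 0.9], [0.15, 0.85], [0.2, 0.8], [0.1, 0.9]]
labels = dbscan(clients, eps=0.15, min_pts=2)
print(labels, sample_layer(labels, layer=1, fraction=0.5, rng=random.Random(0)))
```

Sampling within one layer means the clients aggregated in a round share a similar label distribution, which is the property the abstract relies on to avoid conflicting weights at the server.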

Graphics and Image Processing

Remote Sensing Image Detection Based on Perceptually Enhanced Swin Transformer

Bingyan ZHU, Zhihua CHEN, Bin SHENG

Computer Engineering. 2024, 50(1): 216-223. https://doi.org/10.19678/j.issn.1000-3428.0066941

Owing to the rapid development of remote sensing technology, remote sensing image detection is being used extensively in agriculture, the military, national defense security, and other fields. Remote sensing images are more difficult to detect than conventional images, so researchers have endeavored to detect them efficiently and accurately. To address the high computational complexity, large scale variation, and scale imbalance of remote sensing images, this study proposes a perceptually enhanced Swin Transformer network that improves remote sensing image detection. Building on the hierarchical design and shifted windows of the basic Swin Transformer, the network inserts spatial local perception blocks into each stage, enhancing local feature extraction with a negligible increase in computation. An area-distributed regression loss is introduced that assigns larger weights to small objects to address scale imbalance; additionally, the network is combined with an improved IoU-aware classification loss to eliminate the discrepancy between the classification and regression branches and reduce their losses. Experimental results on the public DOTA dataset show that the proposed network achieves a mean Average Precision (mAP) of 78.47% at a detection speed of 10.8 frame/s, outperforming classical object detection networks (Faster R-CNN and Mask R-CNN) and existing high-performing remote sensing image detection networks. Additionally, the network performs well on all object types at different scales.

Artificial Intelligence and Pattern Recognition

Chinese Cross-modal Entity Alignment Method Based on Multi-modal Knowledge Graph

Huan WANG, Lijuan SONG, Fang DU

Computer Engineering. 2023, 49(12): 88-95. https://doi.org/10.19678/j.issn.1000-3428.0066938

Interactive tasks involving multi-modal data place higher demands on the comprehensive use of knowledge from different modalities, leading to the emergence of multi-modal knowledge graphs. When constructing these graphs, accurately determining whether image and text entities refer to the same object is particularly important for aligning Chinese cross-modal entities. To address this problem, a Chinese cross-modal entity alignment method based on a multi-modal knowledge graph is proposed. Image information is introduced into the entity alignment task, and CCMEA, a pre-trained language model with single- and dual-stream interaction, is designed for domain-specific, fine-grained images and Chinese text. Using self-supervised learning, text-visual features are extracted by a Text-Visual Encoder, and fine-grained modeling is performed with cross-encoders. Finally, contrastive learning is employed to evaluate the degree of alignment between image and text entities. Experimental results show that the Mean Recall (MR) of CCMEA improves by 3.20 and 11.96 percentage points over that of the Wukong_ViT-B baseline model on the MUGE and Flickr30k-CN datasets, respectively. Furthermore, the model achieves an MR of 94.3% on the self-built TEXTILE dataset. These results demonstrate that the proposed method can align Chinese cross-modal entities effectively and with high accuracy in practical applications.
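The contrastive alignment step can be illustrated with a generic image-to-text InfoNCE loss. CCMEA's exact loss is not given in the abstract, so this is an assumed sketch: within a batch, image i and text i form the positive pair, all other texts act as negatives, similarities are cosine similarities divided by a `temperature` (0.07 is an illustrative value), and the loss is the average cross-entropy of picking the correct text. CLIP-style models usually also add the symmetric text-to-image term.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce(img_embs: Sequence[Sequence[float]],
             txt_embs: Sequence[Sequence[float]],
             temperature: float = 0.07) -> float:
    """Image-to-text InfoNCE: pair i is positive, other texts are negatives."""
    n = len(img_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(img_embs[i], t) / temperature for t in txt_embs]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax of the positive pair
    return total / n


imgs = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(imgs, imgs), info_nce(imgs, imgs[::-1]))
```

A well-aligned batch yields a loss near zero, while a batch whose texts are swapped yields a large loss, which is why the loss value can serve as a measure of image-text alignment.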

Research Hotspots and Reviews

Review of Text Similarity Calculation Methods

WEI Wei, DING Xiangxiang, GUO Mengxing, YANG Zhao, LIU Hui

Computer Engineering. 2024, 50(9): 18-32. https://doi.org/10.19678/j.issn.1000-3428.0068086

Text similarity calculation is a part of natural language processing that measures the similarity between two words, sentences, or texts in many application scenarios, and research on it plays an important role in the development of artificial intelligence. Conventionally, text similarity has been calculated from surface character strings. With the introduction of word vectors, text similarity can also be modeled and calculated using statistics and deep learning, as well as with pre-trained models. First, text similarity calculation methods are divided into five categories: character string-based, word vector-based, pre-trained model-based, deep learning-based, and other methods, and each category is briefly introduced. Subsequently, according to their underlying principles, common methods such as the edit distance, Hamming distance, bag-of-words model, Vector Space Model (VSM), Deep Structured Semantic Model (DSSM), and Simple Contrastive learning of Sentence Embeddings (SimCSE) are discussed. Finally, commonly used datasets and evaluation criteria for text similarity calculation are organized and analyzed, and future directions for text similarity calculation are discussed.
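The two character string-based measures named above can be shown concretely. This is a standard textbook sketch, not code from the review: `edit_distance` computes the Levenshtein distance by dynamic programming with a single rolling row, and `hamming_distance` counts differing positions in equal-length strings. The normalisation in `similarity` (1 − distance / longer length) is one common convention, assumed here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))


def similarity(a: str, b: str) -> float:
    """Edit-distance similarity normalised to [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


print(edit_distance("kitten", "sitting"), hamming_distance("karolin", "kathrin"))
```

Hamming distance only handles substitutions, so it suits fixed-length codes; edit distance also covers insertions and deletions, which is why it is the usual choice for free text.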

Research Hotspots and Reviews

Survey of Fatigue Driving Detection Based on Facial Multi-Feature Fusion

Chang WANG, Leixiao LI, Yanyan YANG

Computer Engineering. 2023, 49(11): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0066661

Fatigue driving detection based on computer vision is noninvasive and does not affect driving behavior, which makes it easy to apply in practical scenarios. With the development of computer technology, an increasing number of researchers are studying computer vision-based fatigue driving detection methods. Fatigue driving behavior is mainly reflected in the face and limbs, and in computer vision, facial behavior is easier to capture than body behavior. Therefore, facial-feature-based fatigue driving detection has become an important research direction in the field of fatigue driving detection. Fatigue driving detection methods based on multiple facial features of drivers are analyzed comprehensively, and the latest research results worldwide are summarized. The specific behaviors of different facial features under fatigue are introduced, and the fatigue driving detection process based on multiple facial features is discussed. Research results from around the world are classified by facial feature, and the corresponding feature extraction and state discrimination methods are categorized. The parameters used to distinguish driver fatigue status are summarized according to the behaviors that different features exhibit under fatigue. Furthermore, current research on comprehensive fatigue discrimination using multiple facial features is described, and the similarities and differences between methods are analyzed. On this basis, the shortcomings of current facial multi-feature fusion-based fatigue driving detection are discussed, and future research directions in this field are described.
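As an example of the fatigue-status parameters such surveys summarize, the sketch below computes the widely used Eye Aspect Ratio (EAR) from six eye landmarks p1..p6, EAR = (‖p2 − p6‖ + ‖p3 − p5‖) / (2‖p1 − p4‖), and a PERCLOS-style fraction of frames with closed eyes. Treating "EAR below `threshold`" as "closed" is an approximation of the usual PERCLOS definition, and the 0.2 threshold and landmark coordinates are illustrative assumptions rather than values from this survey.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR from six landmarks ordered p1..p6 around the eye contour."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def perclos(ear_series: Sequence[float], threshold: float = 0.2) -> float:
    """Fraction of frames whose EAR falls below the closed-eye threshold."""
    if not ear_series:
        return 0.0
    return sum(e < threshold for e in ear_series) / len(ear_series)


open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
print(eye_aspect_ratio(open_eye), perclos([0.3, 0.1, 0.15, 0.3]))
```

EAR drops toward zero as the eyelids close, so a sustained high PERCLOS over a time window is a common signal of drowsiness.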

Artificial Intelligence and Pattern Recognition

Text-Relation-Extraction Algorithm Based on Large-Language Model and Semantic Enhancement

LI Jingcan, XIAO Cuilin, QIN Xiaoting, XIE Xia

Computer Engineering. 2024, 50(4): 87-94. https://doi.org/10.19678/j.issn.1000-3428.0068501

Relation extraction is a basic and important task that aims to extract the relations between entities from unstructured text. Recent developments show that Large Language Models (LLMs) and foundation models can improve performance on several Natural Language Processing (NLP) tasks. These models leverage the language-representation ability of deep learning and pre-trained models and can automatically learn the semantic features of relations. However, how to effectively use large models to solve the problems of entity overlap and insufficient information exchange remains an open question. Hence, a relation extraction model based on a large language model is proposed. First, the Large Language Model Meta AI (LLaMA) is adapted to the task via fine-tuning. To extract relations, a self-attention mechanism is used to strengthen the correlation between entity pairs and information sharing between entities. Subsequently, average pooling is performed to generalize over the entire sentence. A filtering matrix is designed for entity pairs, part-of-speech information is introduced to enhance semantics, and invalid triples are filtered out based on the relevance of entity pairs in the filtering matrix. Experimental results show that the proposed model achieves F1 scores of 93.1% and 90.4% on the New York Times (NYT) and WebNLG open datasets, respectively. With the fine-tuned LLaMA model serving as the encoder, the proposed algorithm outperforms the baseline models in both accuracy and F1 score, verifying its effectiveness.

Research Hotspots and Reviews

Review of Attention Mechanisms in Object Detection

REN Shuyu, WANG Xiaoding, LIN Hui

Computer Engineering. 2024, 50(12): 16-32. https://doi.org/10.19678/j.issn.1000-3428.0068553

The superior performance of the Transformer in natural language processing has inspired researchers to explore its applications in computer vision tasks. The Transformer-based object detection model, Detection Transformer (DETR), treats object detection as a set prediction problem, introducing the Transformer model to this task and eliminating the proposal generation and post-processing steps typical of traditional methods. The original DETR model suffers from slow training convergence and poor performance in detecting small objects. To address these challenges, researchers have implemented various improvements to enhance DETR performance. This study conducts an in-depth investigation of both the basic and enhanced modules of DETR, including modifications to the backbone architecture, query design strategies, and improvements to the attention mechanism. Furthermore, it provides a comparative analysis of various detectors and evaluates their performance and network architectures. The potential and application prospects of DETR in computer vision tasks are discussed, along with its current limitations and challenges. Finally, this study analyzes and summarizes related models, assesses the advantages and limitations of attention models in object detection, and outlines future research directions in this field.
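Treating detection as set prediction means each ground-truth object is assigned to exactly one prediction by a minimum-cost bipartite matching, which is what removes the need for post-processing such as non-maximum suppression. The sketch below shows the idea with a simplified cost (L1 distance between `(cx, cy, w, h)` boxes only) and brute-force search over assignments, so it only suits tiny sets. DETR itself solves the assignment with the Hungarian algorithm and also includes class-probability and GIoU terms in the cost; `l1_cost` and `match` are illustrative names.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalised


def l1_cost(p: Box, t: Box) -> float:
    """L1 distance between two boxes."""
    return sum(abs(a - b) for a, b in zip(p, t))


def match(preds: Sequence[Box], targets: Sequence[Box]) -> List[Tuple[int, int]]:
    """One-to-one matching that minimises the total L1 box cost.

    Returns (prediction index, target index) pairs. Unmatched predictions
    would be trained to output the "no object" class.
    """
    best, best_cost = (), float("inf")
    # perm[t] is the prediction assigned to target t.
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(l1_cost(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return [(p, t) for t, p in enumerate(best)]


preds = [(0.1, 0.1, 0.2, 0.2), (0.5, 0.5, 0.2, 0.2), (0.9, 0.9, 0.1, 0.1)]
targets = [(0.52, 0.5, 0.2, 0.2), (0.1, 0.12, 0.2, 0.2)]
print(match(preds, targets))
```

Because each target claims exactly one prediction, duplicate predictions of the same object are penalized during training instead of being filtered afterwards.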

Research Hotspots and Reviews

Review of Natural Scene Text Detection Based on Deep Learning

Zhe LIAN, Yanjun YIN, Fei YUN, Min ZHI

Computer Engineering. 2024, 50(3): 16-27. https://doi.org/10.19678/j.issn.1000-3428.0067427

Natural scene text detection technology based on deep learning has become a crucial research focus in computer vision and natural language processing. It not only has a wide range of potential applications but also serves as a new platform for researchers to explore neural network models and algorithms. First, this study introduces the relevant concepts, research background, and current developments in natural scene text detection technology. Subsequently, recent deep learning-based text detection methods are analyzed and divided into four classes: detection box-based, segmentation-based, hybrid detection box- and segmentation-based, and other methods. The fundamental concepts and main algorithmic processes of classical and mainstream methods in these four categories are elaborated, and the usage mechanisms, applicable scenarios, advantages, disadvantages, simulation results, and experimental environment settings of different methods are summarized, while their interrelationships are clarified. Thereafter, common public datasets and performance evaluation methods for natural scene text detection are introduced. Finally, the major challenges facing current deep learning-based natural scene text detection technology are outlined, and future development directions are discussed.

Research Hotspots and Reviews

Cloud Computing Resource Load Prediction Based on Improved Informer

Haoyang LI, Xiaowei HE, Bin WANG, Hao WU, Qi YOU

Computer Engineering. 2024, 50(2): 43-50. https://doi.org/10.19678/j.issn.1000-3428.0066399

Load prediction is an essential part of cloud computing resource management. Accurate prediction of cloud resource usage can improve cloud platform performance and prevent resource wastage. However, the dynamic and highly variable usage of cloud computing resources makes load prediction difficult and prevents managers from allocating resources reasonably. In addition, although Informer achieves good results in time-series prediction, it does not restrict the causal dependence of time, which causes future information leakage, and it does not account for the performance degradation caused by increasing network depth. A multi-step load prediction model based on an improved Informer, Informer-DCR, is proposed. The regular convolution between attention blocks in the encoder is replaced by dilated causal convolution, so that upper layers in the deep network receive a wider range of input information, improving prediction accuracy while ensuring causality in the time-series prediction process. Simultaneously, residual connections are added to the encoder so that the input information of lower layers is transmitted directly to subsequent higher layers, mitigating deep network degradation and improving model performance. Experimental results demonstrate that, compared with mainstream prediction models such as Informer and the Temporal Convolutional Network (TCN), Informer-DCR reduces the Mean Absolute Error (MAE) by 8.4%-40.0% under different prediction steps and exhibits better convergence than Informer during training.
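The two encoder changes can be illustrated on a single 1-D channel. This is a minimal sketch, not the Informer-DCR layer: `dilated_causal_conv1d` computes y[t] = Σₖ w[k] · x[t − k·d] with zero padding on the left, so output t never sees future inputs, and `residual_block` adds the identity shortcut. Real layers have many channels, learned weights, and normalisation, all omitted here.

```python
from typing import List, Sequence


def dilated_causal_conv1d(x: Sequence[float], weights: Sequence[float],
                          dilation: int) -> List[float]:
    """y[t] = sum_k weights[k] * x[t - k * dilation], zero-padded on the left.

    weights[0] multiplies the current step, so y[t] depends only on x[:t+1].
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # positions before the series start count as zero
                acc += w * x[idx]
        y.append(acc)
    return y


def residual_block(x: Sequence[float], weights: Sequence[float],
                   dilation: int) -> List[float]:
    """Dilated causal convolution plus an identity shortcut."""
    conv = dilated_causal_conv1d(x, weights, dilation)
    return [c + xi for c, xi in zip(conv, x)]


print(dilated_causal_conv1d([1, 0, 0, 0, 0], [1.0, 1.0], dilation=2))
```

Stacking layers with dilations 1, 2, 4, ... makes the receptive field grow exponentially with depth (kernel size 2 with dilations 1, 2, 4 already covers 8 steps), which is how upper layers see a wider input range; the shortcut lets lower-layer information bypass each block.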

Graphics and Image Processing

Improved YOLOv8-based Algorithm for Instance Segmentation in Traffic Scenes

ZHAO Nannan, GAO Feichen

Computer Engineering. 2025, 51(1): 198-207. https://doi.org/10.19678/j.issn.1000-3428.0068677

An instance segmentation algorithm, DE-YOLO, based on an improved YOLOv8 is proposed. To reduce the effect of complex image backgrounds, efficient multiscale attention is introduced, in which cross-dimensional interaction ensures an even spatial feature distribution within each feature group. In the backbone network, deformable convolution (DCNv2) is combined with the C2f convolutional layer to overcome the limitations of traditional convolutions and increase flexibility. This is performed to reduce harmful gradient effects and improve the overall accuracy of the detector. The dynamic nonmonotonic focusing mechanism of Wise-Intersection-over-Union (WIoU) is employed instead of the traditional Complete Intersection-over-Union (CIoU) loss function to evaluate box quality, optimize bounding box localization, and improve segmentation accuracy. Meanwhile, Mixup data augmentation is enabled to enrich the training features of the dataset and improve the learning ability of the model. Experimental results on the Cityscapes urban scene dataset demonstrate that DE-YOLO improves the mask mean Average Precision (mAP_mask) and mAP_mask@0.5 by 2.0 and 3.2 percentage points, respectively, compared with the benchmark model YOLOv8n-seg. Furthermore, DE-YOLO maintains high detection speed and a small parameter count while improving accuracy, requiring 2.2%-31.3% fewer parameters than similar models.
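The abstract does not give DE-YOLO's Mixup settings, so the sketch below assumes a common detection-style variant: a mixing ratio λ is drawn from Beta(α, α), two equally sized images are blended pixel-wise as λ·img1 + (1 − λ)·img2, and the annotations of both images are kept. (The classification form of Mixup instead blends one-hot labels with the same λ.) Images are plain nested lists of grayscale values, and `mixup_detection` and α = 32.0 in the usage line are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

Image = List[List[float]]


def mixup_detection(img1: Sequence[Sequence[float]], labels1: Sequence,
                    img2: Sequence[Sequence[float]], labels2: Sequence,
                    alpha: float, rng: random.Random) -> Tuple[Image, list, float]:
    """Blend two equally sized images pixel-wise and keep both label sets."""
    lam = rng.betavariate(alpha, alpha)  # mixing ratio in (0, 1)
    img = [[lam * a + (1 - lam) * b for a, b in zip(r1, r2)]
           for r1, r2 in zip(img1, img2)]
    return img, list(labels1) + list(labels2), lam


white = [[1.0, 1.0], [1.0, 1.0]]
black = [[0.0, 0.0], [0.0, 0.0]]
mixed, labels, lam = mixup_detection(white, ["car"], black, ["person", "bike"],
                                     alpha=32.0, rng=random.Random(0))
print(round(lam, 3), labels)
```

Keeping both label sets means every object in either source image remains a training target, while the blended pixels expose the model to partially occluded, lower-contrast instances.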

Artificial Intelligence and Pattern Recognition

Mobile Robot Path Planning by Improved A* Algorithm Fused with Improved Dynamic Window Approach

Zhite WANG, Liping LUO, Yikui LIAO

Computer Engineering. 2024, 50(8): 86-101. https://doi.org/10.19678/j.issn.1000-3428.0068483

To satisfy the performance requirements of robot path planning, an algorithm integrating an improved A* algorithm and an improved Dynamic Window Approach (DWA) is proposed, which shortens the path length and improves search efficiency and path smoothness. To overcome the challenges of the traditional A* algorithm in complex scenarios, a new heuristic function is designed based on the Manhattan distance and the diagonal distance, with dynamically assigned weights, so that the global shortest path is obtained with the least search time. Next, an improved search strategy based on the 8-neighborhood is proposed, which dynamically assigns the optimal search directions to the current node, improving search efficiency and reducing time consumption compared with the traditional 8-neighborhood, 8-direction search method. Subsequently, the Floyd algorithm is employed to remove redundant nodes, reduce the number of turns, and shorten the path distance. Additionally, the traditional DWA faces certain challenges: for instance, the path is not globally optimal, path planning may fail, or the path length may increase. To solve these problems, a keypoint densification strategy is proposed to correct deviating paths. Finally, the proposed improved A* algorithm and fusion algorithm are compared with existing methods. Simulation results show that the improved A* algorithm generates the shortest global path in complex environments, reducing the average number of turns by 16.3% and the average path search time by 55.66%. For the fusion algorithm, the average path length and average runtime are reduced by 6.1% and 14.7%, respectively, in the temporary obstacle environment, and by 1.6% and 39.8%, respectively, in the moving obstacle environment.
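The global-planning part can be sketched on an occupancy grid (0 = free, 1 = obstacle). This is not the paper's algorithm: the heuristic blends the diagonal (octile) and Manhattan distances with a fixed weight `w`, whereas the paper assigns weights dynamically by a rule not given in the abstract. With `w = 1` the heuristic is admissible and the path is optimal, while smaller `w` mixes in Manhattan distance, which overestimates when diagonal moves are allowed and trades optimality for fewer expansions. The abstract names Floyd's algorithm for removing redundant nodes; `prune` instead uses a simple greedy line-of-sight check with the same goal, and its sampled collision test is approximate. All names are illustrative.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]
Grid = List[List[int]]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def heuristic(a: Cell, b: Cell, w: float) -> float:
    """Weighted blend of octile (diagonal) and Manhattan distances."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    octile = dx + dy + (math.sqrt(2) - 2) * min(dx, dy)
    return w * octile + (1 - w) * (dx + dy)


def astar(grid: Grid, start: Cell, goal: Cell, w: float = 1.0) -> Optional[List[Cell]]:
    """A* over the 8-neighbourhood; returns the cell path or None."""
    rows, cols = len(grid), len(grid[0])
    g: Dict[Cell, float] = {start: 0.0}
    parent: Dict[Cell, Cell] = {}
    open_heap = [(heuristic(start, goal, w), start)]
    closed = set()
    while open_heap:
        _, cur = heapq.heappop(open_heap)
        if cur == goal:  # rebuild the path by following parents back
            path = [cur]
            while cur in parent:
                cur = parent[cur]
                path.append(cur)
            return path[::-1]
        if cur in closed:
            continue
        closed.add(cur)
        for dr, dc in MOVES:
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or grid[nxt[0]][nxt[1]]:
                continue
            # Do not cut diagonally past the corner of an obstacle.
            if dr and dc and (grid[cur[0] + dr][cur[1]] or grid[cur[0]][cur[1] + dc]):
                continue
            new_g = g[cur] + math.hypot(dr, dc)
            if new_g < g.get(nxt, math.inf):
                g[nxt] = new_g
                parent[nxt] = cur
                heapq.heappush(open_heap, (new_g + heuristic(nxt, goal, w), nxt))
    return None


def line_free(grid: Grid, a: Cell, b: Cell, step: float = 0.1) -> bool:
    """Approximate check: sample the segment a-b and test each nearest cell."""
    n = max(1, int(math.dist(a, b) / step))
    for i in range(n + 1):
        t = i / n
        r = round(a[0] + (b[0] - a[0]) * t)
        c = round(a[1] + (b[1] - a[1]) * t)
        if grid[r][c]:
            return False
    return True


def prune(grid: Grid, path: List[Cell]) -> List[Cell]:
    """Greedily skip intermediate nodes while the straight segment is free."""
    if len(path) < 3:
        return list(path)
    out, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not line_free(grid, path[i], path[j]):
            j -= 1
        out.append(path[j])
        i = j
    return out


wall = [[0] * 5 for _ in range(5)]
for c in range(4):
    wall[2][c] = 1  # wall across row 2 with a gap at column 4
path = astar(wall, (0, 0), (4, 0))
print(path, prune(wall, path))
```

Pruning reduces the number of waypoints and therefore the number of turns, which is the effect the abstract attributes to its redundant-node removal step.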

Smart Education

Educational Knowledge Graph: Research Progress and Future Development—Analysis of Articles Published in Core Chinese Journals from 2013 to 2023

Huiqian LI, Baichang ZHONG

Computer Engineering. 2024, 50(7): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0069539

The deep integration of knowledge graphs with education has promoted the development of smart education. However, research on educational knowledge graphs is currently lacking and needs improvement in research normativity and content perspective. Four conclusions are drawn from a systematic literature review of 55 important Chinese journal articles from the past decade. First, building educational knowledge graphs requires five key technologies: ontology construction, knowledge extraction, knowledge representation, knowledge fusion, and knowledge reasoning, and deep learning methods are becoming a popular research topic in this context. Second, in terms of applicability, educational knowledge graphs cover six application scenarios: personalized learning recommendation, intelligent Question-Answering (Q&A), teaching resource management, intelligent search, intelligent learning diagnosis, and classroom teaching analysis, and the range of applications continues to expand. Third, regarding application effects, educational knowledge graphs promote students' personalized learning and fragmented ubiquitous learning while improving their learning performance as well as teachers' professionalism. Fourth, educational knowledge graphs face several problems and challenges, such as single data modality, a lack of high-quality datasets, a low level of automation and of cutting-edge technology, high difficulty in knowledge modeling, insufficient attention to competencies, a lack of interoperability standards, and a low rate of educational adoption. Hence, future research should refine the theory and establish standards, optimize techniques, achieve accurate modeling, and strengthen applications and improve their effects.

Graphics and Image Processing

Small Object Detection Algorithm for Agricultural Pest Images in Field Environments

Xinlu JIANG, Tianen CHEN, Cong WANG, Chunjiang ZHAO

Computer Engineering. 2024, 50(1): 232-241. https://doi.org/10.19678/j.issn.1000-3428.0067030

Intelligent pest detection is an essential application of object detection technology in agriculture. It effectively improves the efficiency and reliability of pest detection and reporting and helps ensure crop yield and quality. With fixed trapping devices such as insect traps and sticky boards, the image background is simple, the lighting is stable, and pest features are distinct and easy to extract, so detection can achieve high accuracy. However, the application scenario is fixed, the detection range is limited to the area around the equipment, and such methods cannot adapt to complex field environments. To make pest detection and forecasting more flexible, and to address the detection difficulties and missed detections caused by complex image backgrounds and small pest sizes in the field, a small object pest detection model called Pest-YOLOv5 is proposed. Adding a Coordinate Attention (CA) mechanism, which combines spatial and channel information, to the feature extraction network enhances the extraction of small pest features. A Bidirectional Feature Pyramid Network (BiFPN) structure is used in the neck to combine multi-scale features, alleviating the loss of small object information caused by repeated convolutions. On this basis, the SIoU and VariFocal loss functions are used to compute losses, and the optimal classification loss weight coefficients are determined experimentally, making the model focus more on samples that are difficult to classify. Experimental results on a subset of the public AgriPest dataset show that Pest-YOLOv5 achieves an mAP_0.5 of 70.4% and a recall of 67.8%, outperforming classical object detection models such as the original YOLOv5s, SSD, and Faster R-CNN. Compared with YOLOv5s, Pest-YOLOv5 improves mAP_0.5, mAP_0.50:0.95, and recall by 8.1%, 7.9%, and 12.8%, respectively, enhancing its object detection ability.
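
The VariFocal loss named above treats positives and negatives asymmetrically: positives are weighted by their IoU-aware target score q, while negatives are down-weighted by a focal factor. The sketch below computes it for a single prediction, assuming the formulation from the original VarifocalNet paper and its commonly used defaults alpha = 0.75 and gamma = 2.0; the abstract does not state which coefficients Pest-YOLOv5 uses, and the function name is illustrative.

```python
import math


def varifocal_loss(p: float, q: float, alpha: float = 0.75, gamma: float = 2.0) -> float:
    """VariFocal loss for one prediction.

    p: predicted IoU-aware classification score in (0, 1).
    q: target score, i.e. the IoU with the ground-truth box for a positive
       sample and 0 for a negative sample.
    alpha, gamma: assumed defaults, not values reported for Pest-YOLOv5.
    """
    eps = 1e-12  # keep log() away from 0
    p = min(max(p, eps), 1.0 - eps)
    if q > 0:
        # Positive: binary cross-entropy against the soft target q, weighted by q,
        # so high-quality positives contribute more to training.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative: the p**gamma factor down-weights easy negatives (small p).
    return -alpha * (p ** gamma) * math.log(1.0 - p)


easy_negative = varifocal_loss(0.05, 0.0)
hard_negative = varifocal_loss(0.60, 0.0)
```

With these toy inputs, `hard_negative` is far larger than `easy_negative`, which is how the loss shifts attention toward samples that are difficult to classify.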

Artificial Intelligence and Pattern Recognition

Text Classification Method Based on Contrastive Learning and Attention Mechanism

Lai QIAN, Weiwei ZHAO

Computer Engineering. 2024, 50(7): 104-111. https://doi.org/10.19678/j.issn.1000-3428.0068132

Text classification is a basic task in natural language processing and plays an important role in information retrieval, machine translation, sentiment analysis, and other applications. However, most deep learning models do not fully exploit the rich information in training instances during inference, resulting in inadequate text feature learning. To fully leverage training instance information, this paper proposes a text classification method based on contrastive learning and an attention mechanism. First, a supervised contrastive learning training strategy is designed to optimize the text vector representations used for retrieval, thereby improving the quality of the training instances retrieved during inference. Second, an attention mechanism is constructed to learn an attention distribution over the retrieved training text features, focusing on more relevant neighboring instances and capturing more implicit similarity features. Finally, the attention mechanism is combined with the model network, fusing information from neighboring training instances to enhance the model's ability to extract diverse features and achieve both global and local feature extraction. The experimental results demonstrate that this method achieves significant improvements on various models, including Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), Graph Convolutional Network (GCN), Bidirectional Encoder Representations from Transformers (BERT), and RoBERTa. For the CNN model, the macro F1 value increases by 4.15, 6.2, and 1.92 percentage points on the THUCNews, Toutiao, and Sogou datasets, respectively. This method therefore provides an effective solution for text classification tasks.
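
One way to picture the attention step is scaled dot-product attention from a text's vector over the vectors of its retrieved training instances, followed by fusion with the text's own vector. This is a minimal sketch under stated assumptions: the vectors are plain lists, retrieval has already happened, and concatenation is a hypothetical choice of fusion; the paper's exact attention and fusion layers are not described in the abstract.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend_to_neighbors(query: Vector, neighbors: List[Vector]) -> Vector:
    """Scaled dot-product attention of one text vector over retrieved
    training-instance vectors; returns their attention-weighted sum."""
    d = len(query)
    scores = [dot(query, n) / math.sqrt(d) for n in neighbors]
    top = max(scores)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax: more similar neighbors weigh more
    return [sum(w * n[i] for w, n in zip(weights, neighbors)) for i in range(d)]


def fuse_with_neighbors(query: Vector, neighbors: List[Vector]) -> Vector:
    """Hypothetical fusion step: concatenate the instance's own vector (local view)
    with the neighbor summary (global view) before the classification layer."""
    return query + attend_to_neighbors(query, neighbors)


summary = attend_to_neighbors([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Here the neighbor that resembles the query receives the larger weight, so `summary[0]` exceeds `summary[1]`.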

Graphics and Image Processing

Lightweight Small Object Detection Algorithm for Aerial Photography Based on Improved YOLOv8n: PECS-YOLO

WANG Shumeng, XU Huiying, ZHU Xinzhong, HUANG Xiao, SONG Jie, LI Yi

Computer Engineering. 2025, 51(9): 280-293. https://doi.org/10.19678/j.issn.1000-3428.0069353

In Unmanned Aerial Vehicle (UAV) aerial photography, targets are usually small, densely distributed, and have inconspicuous features, and object scales vary greatly, so missed and false detections occur easily. To address these problems, PECS-YOLO, a lightweight small object detection algorithm for aerial photography based on an improved YOLOv8n, is proposed. A P2 small object detection layer is added to the Neck, combining shallow and deep feature maps to better capture the details of small targets. A lightweight convolution, PartialConv, is used to build a new Cross Stage Partial PartialConv (CSPPC) structure that replaces the Concatenation with Fusion (C2f) module in the Neck network, making the model lighter. A Spatial Pyramid Pooling with Efficient Layer Aggregation Network (SPPELAN) module is used to capture small object features effectively. A Squeeze-and-Excitation (SE) attention mechanism is added in front of each detection head in the Neck, so that the network focuses on useful channels and background noise interferes less with small object detection in complex environments. Finally, EfficiCIoU is used as the bounding box loss function; it also takes the shape difference between bounding boxes into account, enhancing the model's ability to detect small targets. Experimental results on the VisDrone2019-DET dataset show that, compared with YOLOv8n, PECS-YOLO increases the mean Average Precision at an Intersection over Union (IoU) threshold of 0.5 (mAP@0.5) and the mean Average Precision over IoU thresholds of 0.5:0.95 (mAP@0.5:0.95) by 3.5% and 3.7%, respectively, reduces the number of parameters by about 25.7%, and increases detection speed by about 65.2%. In summary, the PECS-YOLO model is suitable for small object detection in UAV aerial photography.
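
The Squeeze-and-Excitation step used before each detection head can be sketched in plain Python: average-pool each channel, pass the result through two small fully connected layers, and rescale the channels by the resulting sigmoid gates. The toy weights below are made up for illustration (a trained network learns them), and biases are omitted for brevity.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]
Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def squeeze_excite(x: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    """Squeeze-and-Excitation channel reweighting (biases omitted).

    w1 has shape (C/r, C) and w2 has shape (C, C/r) for reduction ratio r.
    In a trained network both are learned; here the caller supplies them.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a weight in (0, 1) per channel.
    hidden = [max(0.0, v) for v in matvec(w1, z)]
    gates = [1.0 / (1.0 + math.exp(-v)) for v in matvec(w2, hidden)]
    # Scale: multiply every value in channel c by gates[c].
    return [[[v * gates[c] for v in row] for row in ch] for c, ch in enumerate(x)]


x = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]  # C=2, H=W=2
out = squeeze_excite(x, w1=[[1.0, 1.0]], w2=[[1.0], [-1.0]])  # r=2, toy weights
```

With these toy weights the first channel is kept almost intact while the second is strongly suppressed, which is the "focus on useful channels" behavior described above.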

Research Hotspots and Reviews

Review of Deep Learning Applications in Spinal Image Segmentation

Baihao JIANG, Jing LIU, Dawei QIU, Liang JIANG

Computer Engineering. 2024, 50(3): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0067502

Deep learning algorithms offer strong learning ability, strong adaptability, and unique nonlinear mapping ability in spinal image segmentation. Compared with traditional segmentation methods, they better extract key information from spinal images and suppress irrelevant information, which can help doctors locate focal areas accurately and achieve accurate and efficient segmentation. This paper summarizes and analyzes the application status of deep learning in spinal image segmentation in terms of deep learning algorithms, types of spinal diseases, image types, experimental segmentation results, and performance evaluation indicators. First, the background of deep learning models and spinal image segmentation is described, and the application of deep learning in spinal image segmentation is introduced. Second, several common types of spinal diseases and the difficulties in segmenting their images are described, and common open datasets, the image segmentation workflow, and evaluation indicators for spinal image segmentation are introduced. Combined with specific experiments, the application progress of Convolutional Neural Network (CNN) models, U-Net models, and their improved variants in segmenting vertebrae, intervertebral discs, and spinal tumors is summarized and analyzed. Based on previous experimental results and the current progress of deep learning models, this paper summarizes the limitations of current clinical studies and the reasons for insufficient segmentation performance, and proposes corresponding solutions to the existing problems. Finally, prospects for future research and development are proposed.
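
The abstract does not list which evaluation indicators the review covers; Dice and IoU (Jaccard) are assumed here as two widely used examples. A minimal sketch on flattened binary masks:

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Dice = 2|A∩B| / (|A|+|B|) and IoU = |A∩B| / |A∪B| for flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    size_pred = sum(1 for p in pred if p)
    size_truth = sum(1 for t in truth if t)
    union = size_pred + size_truth - inter
    if union == 0:
        return 1.0, 1.0  # both masks empty: scored as a perfect match by convention
    return 2 * inter / (size_pred + size_truth), inter / union


dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

For the same prediction, Dice is never lower than IoU, so results reported with different indicators are not directly comparable.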

Research Hotspots and Reviews

Review of 2D Image Matching Algorithms Based on Deep Learning Features

HUANG Kaiji, YANG Hua

Computer Engineering. 2024, 50(10): 16-34. https://doi.org/10.19678/j.issn.1000-3428.0068580

The objective of image matching is to establish correspondences between similar structures across two or more images. This task is fundamental to computer vision, with applications in robotics, remote sensing, and autonomous driving. With the advances in deep learning in recent years, Two-Dimensional (2D) image matching algorithms based on deep learning have steadily improved in feature extraction, description, and matching, and their matching accuracy and robustness have surpassed those of traditional algorithms. First, this study summarizes 2D image matching algorithms based on deep learning features from the past ten years and categorizes them into three types: two-stage image matching based on local features, image matching with joint detection and description, and image matching without feature detection. Second, the study details the development, classification methods, and performance evaluation metrics of these three categories and summarizes their advantages and limitations. Typical application scenarios of 2D image matching algorithms are then introduced, and the effects of research progress in 2D image matching on its application domains are analyzed. Finally, the study summarizes the development trends of 2D image matching algorithms and discusses future prospects.
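
For the two-stage category (detect and describe local features, then match), a common matching step combines Lowe's ratio test with a mutual nearest-neighbor check. The sketch below is generic rather than a specific algorithm from the review; the 0.8 ratio threshold and the brute-force search are assumptions.

```python
import math
from typing import List, Tuple

Descriptor = List[float]


def l2(a: Descriptor, b: Descriptor) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(desc_a: List[Descriptor], desc_b: List[Descriptor],
                      ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Match descriptors with a ratio test plus a mutual nearest-neighbor check.

    Assumes desc_b holds at least two descriptors so a runner-up exists.
    """
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((l2(da, db), j) for j, db in enumerate(desc_b))
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 >= ratio * d2:
            continue  # ambiguous: the best match is not clearly better than the runner-up
        # Mutual check: i must also be the nearest neighbor of j1 among desc_a.
        back = min(range(len(desc_a)), key=lambda k: l2(desc_a[k], desc_b[j1]))
        if back == i:
            matches.append((i, j1))
    return matches


pairs = match_descriptors([[0.0, 0.0], [10.0, 10.0]],
                          [[0.1, 0.0], [10.0, 10.2], [5.0, 5.0]])
```

The ratio test discards ambiguous matches in repetitive textures, and the mutual check removes many-to-one assignments.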

Artificial Intelligence and Pattern Recognition

Steel Defect Detection Based on Improved YOLOv8 Algorithm

PENG Juhong, ZHANG Chi, GAO Qian, ZHANG Guangming, TAN Donghua, ZHAO Mingjun

Computer Engineering. 2025, 51(7): 152-160. https://doi.org/10.19678/j.issn.1000-3428.0069283

Steel surface defect detection in industrial scenarios is hindered by low detection accuracy and slow convergence. To address these issues, this study presents an improved YOLOv8 algorithm named YOLOv8n-MDC. First, a Multi-scale Cross-fusion Network (MCN) is added to the backbone network. Establishing closer connections between feature layers promotes uniform information transmission and reduces semantic information loss during cross-layer feature fusion, thereby enhancing the model's ability to perceive steel defects. Second, deformable convolution is introduced into the module to adaptively change the shape and position of the convolution kernel, enabling more flexible capture of the edge features of irregular defects, reducing information loss, and improving detection accuracy. Finally, a Coordinate Attention (CA) mechanism is added to embed position information into channel attention, mitigating the loss of position information and enabling the model to perceive the position and morphological features of defects, thereby enhancing detection precision and stability. Experimental results on the NEU-DET dataset show that YOLOv8n-MDC achieves an mAP@0.5 of 81.0%, 4.2 percentage points higher than that of the original baseline network. The algorithm converges faster and is more accurate, and can therefore meet the requirements of practical industrial production.
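
Deformable convolution shifts each kernel sampling point by a learned fractional offset and reads the input by interpolation. The 1-D sketch below illustrates the idea with linear interpolation (bilinear in 2-D); the offsets are passed in by hand, whereas in the network they are predicted by an extra convolution, and this is not the module used in YOLOv8n-MDC.

```python
import math
from typing import List


def sample_linear(x: List[float], pos: float) -> float:
    """Linearly interpolate x at a fractional position; zero outside the signal."""
    lo = math.floor(pos)
    frac = pos - lo

    def at(i: int) -> float:
        return x[i] if 0 <= i < len(x) else 0.0

    return (1 - frac) * at(lo) + frac * at(lo + 1)


def deformable_conv1d(x: List[float], weights: List[float],
                      offsets: List[List[float]]) -> List[float]:
    """1-D deformable convolution over valid positions.

    offsets[i][j] shifts the j-th sampling point of output i away from its
    regular position i + j; all-zero offsets reduce to an ordinary convolution.
    """
    k = len(weights)
    return [sum(w * sample_linear(x, i + j + offsets[i][j]) for j, w in enumerate(weights))
            for i in range(len(x) - k + 1)]


signal = [1.0, 2.0, 3.0, 4.0]
regular = deformable_conv1d(signal, [1.0, 1.0], [[0.0, 0.0]] * 3)
shifted = deformable_conv1d(signal, [1.0, 1.0], [[0.5, 0.0], [0.0, 0.0], [0.0, 0.0]])
```

Because the offsets can differ per output position, the effective kernel shape can follow an irregular defect edge instead of a fixed grid.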

40th Anniversary Celebration of Shanghai Computer Society

Review of Application of Artificial Intelligence in University Informatization

QI Fenglin, SHEN Jiajie, WANG Maoyi, ZHANG Kai, WANG Xin

Computer Engineering. 2025, 51(4): 1-14. https://doi.org/10.19678/j.issn.1000-3428.0070222

The rapid development of Artificial Intelligence (AI) has empowered numerous fields and significantly impacted society, establishing a solid technological foundation for university informatization services. This study explores the historical development of both AI and university informatization by analyzing their respective trajectories and interconnections. Although universities worldwide may focus on different aspects of AI in their digital transformation efforts, they all demonstrate the vast potential of AI to enhance education quality and streamline management processes. This study therefore focuses on five core areas: teaching, learning, administration, assessment, and examination. It comprehensively summarizes typical AI-empowered application cases to show how AI improves educational quality and management efficiency. In addition, this study highlights potential challenges of AI applications in university informatization, such as data privacy protection, algorithmic bias, and technology dependence, and elaborates on common strategies for addressing them, such as enhancing data security, improving algorithmic transparency and fairness, and fostering digital literacy among both teachers and students. Based on these analyses, the study explores future research directions for AI in university informatization, emphasizing the balance between technological innovation and ethical standards. It advocates establishing interdisciplinary collaboration mechanisms to promote the healthy and sustainable development of AI in university informatization.

Intelligent Transportation

Vehicle Violation Detection Based on Improved YOLOv8 in Highway Service Areas

Wei CHEN, Xiaolong WANG, Yanwei ZHANG, Guocheng AN, Bo JIANG

Computer Engineering. 2024, 50(4): 11-19. https://doi.org/10.19678/j.issn.1000-3428.0068901

In highway service areas, complex environmental conditions such as changes in lighting and weather can cause a sharp decline in vehicle detection accuracy. In addition, factors such as the camera's inclination angle and installation height can increase false-negative and false-positive rates. To this end, a vehicle violation detection algorithm based on an improved YOLOv8 is proposed for highway service areas. First, based on the feature pyramid pooling layer of the YOLOv8 network, a Dilated Space Pyramid Pooling (DSPP) module and a DSPP module based on branch Attention (DSPPA) are constructed to reduce the loss of semantic information in the backbone. The Branch Attention (BA) mechanism in DSPPA assigns different weights to branches according to their contribution, making the model focus more on features suited to the target size. Second, a parking space allocation strategy based on global matching is designed to effectively reduce the false-negative and false-positive rates of illegal parking detection when views are tilted and vehicles overlap. The experimental results show that the improved algorithm reduces the false-negative rate of parking violation detection from 15% to 8% and the false-positive rate from 7.5% to 6.1%, a considerable improvement in vehicle violation detection performance.
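
The abstract does not detail the global matching strategy. As an illustration only, the sketch below assigns detected vehicle boxes to parking-space boxes by maximizing the total IoU over all vehicles at once instead of greedily, one vehicle at a time. The IoU threshold of 0.25 and the brute-force search are assumptions; for more than a few boxes, an assignment solver such as the Hungarian algorithm would be the practical choice.

```python
from itertools import permutations
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_globally(vehicles: List[Box], spaces: List[Box],
                    min_iou: float = 0.25) -> Dict[int, Optional[int]]:
    """Exhaustively pick the vehicle -> space assignment with the largest total IoU.

    Each space holds at most one vehicle; pairs below min_iou count as unassigned.
    Brute force is only practical for a handful of boxes.
    """
    slots: List[Optional[int]] = list(range(len(spaces))) + [None] * len(vehicles)
    best_score, best = -1.0, {}
    for perm in permutations(slots, len(vehicles)):
        score, plan = 0.0, {}
        for v, s in enumerate(perm):
            overlap = iou(vehicles[v], spaces[s]) if s is not None else 0.0
            plan[v] = s if overlap >= min_iou else None
            score += overlap if overlap >= min_iou else 0.0
        if score > best_score:
            best_score, best = score, plan
    return best


spaces = [(0.0, 0.0, 10.0, 10.0), (10.0, 0.0, 20.0, 10.0)]
vehicles = [(4.5, 0.0, 14.5, 10.0), (1.0, 0.0, 11.0, 10.0)]
plan = assign_globally(vehicles, spaces)  # vehicle index -> space index
```

In this example a greedy rule would give vehicle 0 the left space (its best overlap) and leave vehicle 1 unassigned, whereas the global assignment places both vehicles.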

Development Research and Engineering Application

Pedestrian Detection Algorithm for Scenic Spots Based on Improved YOLOv8

Xiangquan GUI, Shiqing LIU, Li LI, Qingsong QIN, Tangyan LI

Computer Engineering. 2024, 50(7): 342-351. https://doi.org/10.19678/j.issn.1000-3428.0068125

Current pedestrian detection in scenic spots suffers from low detection accuracy, large numbers of algorithm parameters, and existing public datasets that are poorly suited to small target detection. To address these issues, this study uses the TAPDataset pedestrian detection dataset, which remedies the deficiencies of existing datasets regarding small targets. Based on the YOLOv8 algorithm, a new model with high detection accuracy and low hardware requirements, called YOLOv8-L, is proposed. First, the lightweight convolution module DepthSepConv is introduced to reduce the model's parameters and computation. Second, the BiFormer attention mechanism and the CARAFE upsampling operator are used to enhance the model's semantic understanding of images and its information fusion capability, significantly improving detection accuracy. Finally, a small target detection layer is added to extract more shallow features, improving the model's performance on small targets. The effectiveness of the algorithm is verified on the TAPDataset, VOC 2007, and TAP+VOC datasets. Compared with YOLOv8, on the TAPDataset the number of model parameters is reduced by 18.06% with unchanged FPS, mAP@0.5 improves by 5.51%, and mAP@0.5:0.95 improves by 6.03%. On VOC 2007, the number of parameters is reduced by 13.6%, with mAP@0.5 improving by 3.96% and mAP@0.5:0.95 by 6.39%. On TAP+VOC, the number of parameters is reduced by 14.02%, with mAP@0.5 improving by 4.49% and mAP@0.5:0.95 by 5.68%. The improved algorithm shows stronger generalization and is better suited to scenic pedestrian detection tasks.
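
Depthwise separable convolution, the idea behind lightweight modules such as DepthSepConv, replaces one k x k convolution with a per-channel k x k depthwise convolution followed by a 1 x 1 pointwise convolution. The parameter arithmetic below uses a hypothetical 3 x 3 layer with 64 input and 128 output channels; it illustrates the per-layer saving only, while the 18.06% reported above is a whole-model figure.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k conv (one filter per input channel) + pointwise 1 x 1 conv."""
    return k * k * c_in + c_in * c_out


std = conv_params(3, 64, 128)                 # 3*3*64*128 = 73 728
sep = depthwise_separable_params(3, 64, 128)  # 576 + 8 192 = 8 768
```

The separable layer needs about 11.9% of the standard layer's weights, in line with the general ratio 1/c_out + 1/k^2.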

Artificial Intelligence and Pattern Recognition

Knowledge Graph Completion Based on Contrastive Learning and Language Model-Enhanced Embedding

ZHANG Hongchen, LI Linyu, YANG Li, SAN Chenjun, YIN Chunlin, YAN Bing, YU Hong, ZHANG Xuan

Computer Engineering. 2024, 50(4): 168-176. https://doi.org/10.19678/j.issn.1000-3428.0067543

A knowledge graph is a structured knowledge base comprising various types of knowledge or data units obtained through extraction and other processes. It describes and represents information such as entities, concepts, facts, and relationships. The limitations of Natural Language Processing (NLP) technology and the noise in the texts of knowledge or information units affect the accuracy of information extraction. Existing Knowledge Graph Completion (KGC) methods typically consider only structural information or only textual semantic information, rather than combining the structural and textual semantic information of the entire knowledge graph. Hence, a KGC model based on contrastive learning and language model-enhanced embedding is proposed. The input entities and relationships are encoded with a pretrained language model to obtain their textual semantic information, and the distance scoring function of a translation model is used to capture the structural information in the knowledge graph. Contrastive learning with two negative sampling methods is incorporated into training to improve the model's ability to represent positive and negative samples. Experimental results show that, compared with the Bidirectional Encoder Representations from Transformers for Knowledge Graph completion (KG-BERT) model, this model improves Hits@10, the proportion of test triples whose correct entity is ranked within the top 10, by 31% and 23% on the WN18RR and FB15K-237 datasets, respectively, demonstrating its superiority over similar models.
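
The structural half of the model relies on a translation-model distance. Assuming a TransE-style score (the abstract does not name the translation model), the sketch scores triples by ||h + r - t|| and computes Hits@10 from the rank of the correct tail. Ranks here are raw rather than "filtered", and the embeddings are toy values.

```python
import math
from typing import Dict, List

Vec = List[float]


def transe_distance(h: Vec, r: Vec, t: Vec) -> float:
    """TransE-style score ||h + r - t||_2; a smaller distance means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def tail_rank(h: Vec, r: Vec, true_tail: str, entities: Dict[str, Vec]) -> int:
    """1-based rank of the correct tail among all candidate entities (raw setting)."""
    ordered = sorted(entities, key=lambda name: transe_distance(h, r, entities[name]))
    return ordered.index(true_tail) + 1


def hits_at_k(ranks: List[int], k: int = 10) -> float:
    """Fraction of test triples whose correct entity is ranked within the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


entities = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}  # toy embeddings
rank_b = tail_rank([0.0, 0.0], [1.0, 0.0], "b", entities)
```

In the full model, these structural distances are combined with language-model text embeddings, which this sketch does not attempt to reproduce.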

Cyberspace Security

Privacy Preserving Algorithm Using Federated Learning Against Attacks

WU Ruolan, CHEN Yuling, DOU Hui, ZHANG Yangwen, LONG Zhong

Computer Engineering. 2025, 51(2): 179-187. https://doi.org/10.19678/j.issn.1000-3428.0068705

Federated learning is an emerging distributed learning framework in which multiple clients jointly train a global model without sharing raw data, thereby effectively safeguarding data privacy. However, traditional federated learning still has latent security vulnerabilities and is susceptible to poisoning and inference attacks. Enhancing the security and model performance of federated learning has therefore become imperative: malicious client behavior must be identified precisely, and gradient noise can serve as a countermeasure that prevents attackers from obtaining client data through gradient monitoring. This study proposes a robust federated learning framework that combines malicious client detection with Local Differential Privacy (LDP) techniques. The algorithm first uses gradient similarity to identify and classify potentially malicious clients, minimizing their adverse impact on model training. Subsequently, a dynamic privacy budget based on LDP is designed to accommodate the sensitivity of different queries and individual privacy requirements, with the objective of balancing privacy preservation and data quality. Experimental results on the MNIST, CIFAR-10, and Movie Reviews (MR) text classification datasets demonstrate that, compared with three baseline algorithms, this algorithm increases accuracy for sP-type clients by an average of 3 percentage points, achieving a higher security level and significantly better model performance within the federated learning framework.
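
The two defenses can be sketched separately: flag clients whose gradients point away from a robust reference, and perturb uploaded gradients with Laplace noise. This is a simplified illustration, not the paper's algorithm: the coordinate-wise median reference, the cosine threshold of 0, the fixed epsilon (the paper's budget is dynamic), the assumption that gradients are already clipped so the sensitivity is bounded, and the function names are all assumptions.

```python
import math
import random
from typing import List

Vec = List[float]


def cosine(a: Vec, b: Vec) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def flag_suspicious(grads: List[Vec], threshold: float = 0.0) -> List[int]:
    """Indices of clients whose gradient has cosine similarity below threshold
    with the coordinate-wise (upper) median of all client gradients."""
    dim = len(grads[0])
    median = [sorted(g[i] for g in grads)[len(grads) // 2] for i in range(dim)]
    return [c for c, g in enumerate(grads) if cosine(g, median) < threshold]


def laplace_perturb(grad: Vec, epsilon: float, sensitivity: float,
                    rng: random.Random) -> Vec:
    """Laplace mechanism: add Laplace(0, sensitivity / epsilon) noise per coordinate.

    The difference of two i.i.d. exponential samples with mean b is Laplace(0, b).
    A smaller epsilon (stronger privacy) means larger noise.
    """
    scale = sensitivity / epsilon
    return [x + rng.expovariate(1 / scale) - rng.expovariate(1 / scale) for x in grad]


grads = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [-1.0, -1.0]]  # last client is adversarial
suspects = flag_suspicious(grads)
```

Using a median rather than a mean as the reference keeps a single adversarial client from dragging the reference direction toward its own gradient.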

Research Hotspots and Reviews

Fabric-based Up-chain Preprocessing Mechanism for Mass Transaction Data

Ying LIU, Yupeng MA, Fan ZHAO, Yi WANG, Tonghai JIANG

Computer Engineering. 2024, 50(1): 39-49. https://doi.org/10.19678/j.issn.1000-3428.0067004

Hyperledger Fabric is an alliance chain (consortium blockchain) framework widely adopted both domestically and internationally. Some businesses built on Fabric involve numerous participating organizations and frequent transactions, which increases transaction conflicts. The multi-version concurrency control used in Fabric can partially resolve transaction conflicts and enhance system concurrency. However, this mechanism is imperfect, and certain transaction data cannot be properly stored on the chain. To achieve complete, efficient, and trustworthy on-chain storage of massive transaction data, a data preprocessing mechanism based on a Fabric oracle is proposed. The Massive Conflict Preprocessing (MCPP) method ensures the integrity of transaction data with primary key conflicts through detection, monitoring, delayed submission, transaction locking, and reordering caching. Data transmission protection measures use asymmetric encryption during transmission, preventing malicious nodes from forging authentication information and ensuring that transaction data remain consistent before and after off-chain processing. Theoretical analysis and experimental results demonstrate that this mechanism can effectively address concurrent conflicts when massive transaction data are put on chain in alliance chain platforms. When the transaction data scale reaches 1 000 and 10 000, the MCPP method improves time efficiency by 38% and 21.4%, respectively, compared with the LMLS algorithm, with a success rate close to 100%. The proposed method is thus efficient and secure, and does not affect Fabric system performance when no concurrent conflicts occur.
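
In Fabric's multi-version concurrency control, when a later transaction in a block has read a key that an earlier valid transaction in the same block writes, the later one fails validation. The sketch below shows the general idea behind delayed submission and reordering: a transaction that shares a primary key with an earlier one is deferred to a later batch, so each batch touches each key at most once. It is a hypothetical illustration, not the MCPP method, and the transaction tuple shape is made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Tx = Tuple[str, str, str]  # hypothetical (tx_id, primary_key, value) record


def schedule_batches(txs: List[Tx]) -> List[List[Tx]]:
    """Split transactions into successive batches with at most one write per key.

    Each transaction goes into the earliest batch after the last one that wrote
    its key, so writes to the same key keep their submission order.
    """
    next_free: Dict[str, int] = defaultdict(int)  # earliest batch each key may use
    batches: List[List[Tx]] = []
    for tx in txs:
        key = tx[1]
        idx = next_free[key]
        while len(batches) <= idx:
            batches.append([])
        batches[idx].append(tx)
        next_free[key] = idx + 1  # the next write to this key waits one more batch
    return batches


txs = [("t1", "k1", "a"), ("t2", "k2", "b"), ("t3", "k1", "c"), ("t4", "k1", "d")]
plan = [[tx[0] for tx in batch] for batch in schedule_batches(txs)]
```

Submitting each batch only after the previous one has been committed would keep same-key transactions out of a single block, at the cost of extra latency for hot keys.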

Research Hotspots and Reviews

Application of Deep Learning in Fingerprint Recognition

LI Shuo, ZHAO Chaoyang, QU Yinxuan, LUO Yaping

Computer Engineering. 2024, 50(12): 33-47. https://doi.org/10.19678/j.issn.1000-3428.0068276

Fingerprint recognition is one of the earliest and most mature biometric recognition technologies. It is widely used in mobile payments, access control, and attendance in the civilian field, and in criminal investigation to find clues about suspects. Recently, deep learning has achieved excellent results in biometric recognition, giving fingerprint researchers new methods for automatic processing and for using fused features to represent fingerprints effectively, with strong results at every stage of the fingerprint recognition process. This paper outlines the development history and application background of fingerprint recognition; explains the main processing steps of its three stages, namely image preprocessing, feature extraction, and fingerprint matching; summarizes the application status of deep learning in specific steps of each stage; and compares the advantages and disadvantages of different deep neural networks in tasks such as image segmentation, image enhancement, orientation field estimation, minutiae extraction, and fingerprint matching. Finally, current problems and challenges in fingerprint recognition are analyzed, and future directions, such as building public fingerprint datasets, multi-scale fingerprint feature extraction, and training end-to-end fingerprint recognition models, are discussed.

Research Hotspots and Reviews

Research on Heterogeneous Computing Scheduling Strategy for Kubeflow

Yi SUN, Huimei WANG, Ming XIAN, Hang XIANG

Computer Engineering. 2024, 50(2): 25-32. https://doi.org/10.19678/j.issn.1000-3428.0067396

Kubeflow is a project that combines machine learning and cloud computing technology, integrating a large number of machine learning tools and providing a feasible solution for deploying production-grade machine learning platforms. Machine learning relies on specialized Graphics Processing Units (GPUs) to speed up training and inference. As cloud computing clusters are resized dynamically, nodes with different computing architectures can be added to or removed from the cluster, and traditional round-robin scheduling strategies cannot dynamically adjust heterogeneous computing resources. To solve the allocation and optimization problems of heterogeneous computing power in Kubeflow, improve platform resource utilization, and achieve load balancing, a cloud-based Central Processing Unit-GPU (CPU-GPU) heterogeneous computing power scheduling strategy is proposed. The strategy uses two judgment indicators, weighted load balancing degree and priority, and allocates GPU memory in a fine-grained manner so that computing resources can be divided at a finer granularity. The optimal Pod deployment scheme is designed according to the resource weight matrix of each node in the cluster, and an improved genetic algorithm is used to find the optimal deployment. The experimental results show that the strategy performs better for parallel tasks and can still achieve optimal loading when resource requests overflow. Compared with the platform's native strategy, resource allocation is one order of magnitude finer, and cluster load balancing is also significantly improved.
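
The abstract gives neither its formulas nor the details of the improved genetic algorithm. As a simplified illustration of the weighted load-balancing idea only, the sketch scores each node by a weighted sum of resource utilization and greedily places each Pod where the spread of node loads is smallest. The resource names, the weights, and the greedy rule are assumptions, and a greedy heuristic is swapped in for the genetic algorithm.

```python
import statistics
from dataclasses import dataclass, field
from typing import Dict, List, Optional

RESOURCES = ("cpu", "mem", "gpu_mem")
WEIGHTS = {"cpu": 0.3, "mem": 0.2, "gpu_mem": 0.5}  # assumed resource weights


@dataclass
class Node:
    name: str
    capacity: Dict[str, float]
    used: Dict[str, float] = field(default_factory=lambda: {r: 0.0 for r in RESOURCES})

    def fits(self, req: Dict[str, float]) -> bool:
        return all(self.used[r] + req.get(r, 0.0) <= self.capacity[r] for r in RESOURCES)

    def weighted_load(self, extra: Optional[Dict[str, float]] = None) -> float:
        """Weighted utilization; resources the node does not have are skipped."""
        extra = extra or {}
        return sum(WEIGHTS[r] * (self.used[r] + extra.get(r, 0.0)) / self.capacity[r]
                   for r in RESOURCES if self.capacity[r] > 0)


def imbalance_if_placed(nodes: List[Node], target: Node, req: Dict[str, float]) -> float:
    """Population standard deviation of node loads if req were placed on target."""
    return statistics.pstdev([n.weighted_load(req if n is target else None) for n in nodes])


def place(nodes: List[Node], req: Dict[str, float]) -> Optional[str]:
    """Greedy rule: choose the feasible node that leaves the cluster most balanced."""
    feasible = [n for n in nodes if n.fits(req)]
    if not feasible:
        return None  # the request overflows every node
    best = min(feasible, key=lambda n: imbalance_if_placed(nodes, n, req))
    for r in RESOURCES:
        best.used[r] += req.get(r, 0.0)
    return best.name


nodes = [Node("gpu-a", {"cpu": 8.0, "mem": 32.0, "gpu_mem": 24.0}),
         Node("cpu-b", {"cpu": 16.0, "mem": 64.0, "gpu_mem": 0.0})]
first = place(nodes, {"cpu": 2.0, "mem": 4.0, "gpu_mem": 8.0})  # needs GPU memory
second = place(nodes, {"cpu": 4.0, "mem": 8.0})                 # CPU-only Pod
```

Here the GPU Pod can only go to `gpu-a`, and the CPU-only Pod then lands on `cpu-b` because that keeps the two weighted loads closer together.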

Research Hotspots and Reviews

Survey of Uyghur Machine Translation Research

Halidanmu ABUDUKELIMU, Yutao HOU, Dengfeng YAO, Abudukelimu ABULIZI, Jishang CHEN

Computer Engineering. 2024, 50(1): 1-16. https://doi.org/10.19678/j.issn.1000-3428.0068124

As one of the important tasks in China's low-resource machine translation research, the development and application of Uyghur machine translation can better promote cultural exchange and trade between different regions and ethnic groups. However, Uyghur, as an agglutinative language, poses problems for machine translation such as complex morphology and a scarce corpus. In recent years, at different stages of the development of Uyghur machine translation, researchers have optimized and innovated algorithms and models to address its characteristics and have achieved various results; however, no systematic review has been conducted. This paper comprehensively reviews research on Uyghur machine translation and categorizes it into three types according to the methods used: rule- and example-based, statistics-based, and neural network-based Uyghur machine translation. Related academic activities and corpus resources are also summarized. To further explore the potential of Uyghur machine translation, the ChatGPT model is applied to the Uyghur-Chinese translation task as a preliminary attempt. The experimental results show that in the few-shot scenario, translation performance first increases and then decreases as the number of examples grows, peaking at 10-shot. In addition, the chain-of-thought approach does not yield better translation ability in the Uyghur machine translation task. Finally, future research directions for Uyghur machine translation are proposed.
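
The few-shot experiment can be pictured as a prompt that prepends k example translation pairs to the sentence to be translated. The template wording, the function name, and the placeholder strings are hypothetical; the abstract does not give the prompt used with ChatGPT.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], source: str, k: int) -> str:
    """Prepend the first k (Uyghur, Chinese) example pairs to the sentence to translate."""
    parts = ["Translate the following Uyghur sentences into Chinese."]
    for uyghur, chinese in examples[:k]:
        parts.append(f"Uyghur: {uyghur}\nChinese: {chinese}")
    parts.append(f"Uyghur: {source}\nChinese:")  # the model completes this line
    return "\n\n".join(parts)


pool = [("u1", "c1"), ("u2", "c2"), ("u3", "c3")]  # placeholder sentence pairs
prompt = build_few_shot_prompt(pool, "u_new", k=2)
```

Varying `k` while keeping the template fixed is what a k-shot comparison such as the 10-shot result above amounts to.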

Development Research and Engineering Application

Improved Multistage Edge-Enhanced Medical Image Segmentation Network of U-Net

HU Shuai, LI Hualing, HAO Dechen

Computer Engineering. 2024, 50(4): 286-293. https://doi.org/10.19678/j.issn.1000-3428.0067779

Medical image segmentation accuracy plays a key role in clinical diagnosis and treatment. However, because of the complexity of medical images and the diversity of target regions, existing medical image segmentation methods suffer from incomplete segmentation of edge regions and insufficient use of contextual feature information. An improved Multistage Edge-Enhanced (MEE) U-Net medical image segmentation network, called MDU-Net, is proposed to solve these problems. First, an MEE module is added to the encoder to extract two layers of low-stage feature information, and rich edge information in the feature layers is obtained with dilated convolution blocks at different dilation rates. Second, a Detailed Feature Association (DFA) module that integrates feature information from adjacent layers is embedded in the skip connections to obtain deep-stage and multiscale contextual feature information. Finally, the feature information extracted by the different modules is aggregated in the corresponding feature layer of the decoder, and the final segmentation result is obtained by upsampling. Experimental results on two public datasets show that, compared with other models such as TransUNet, MDU-Net uses the feature information of different feature layers in medical images more efficiently and segments edge regions better.
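
The dilation rate enlarges the span a kernel covers without adding weights: per axis, a k-tap kernel with dilation d spans k + (k - 1)(d - 1) positions. The sketch shows this arithmetic for hypothetical 3-tap branches at rates 1, 2, and 3, plus a 1-D dilated convolution (valid positions only, computed as cross-correlation as in deep learning frameworks); the rates MDU-Net actually uses are not given in the abstract.

```python
from typing import List


def effective_kernel(k: int, dilation: int) -> int:
    """Positions spanned (per axis) by a k-tap kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv1d(signal: List[float], kernel: List[float], dilation: int) -> List[float]:
    """1-D dilated convolution without padding (cross-correlation, as in DL frameworks)."""
    span = effective_kernel(len(kernel), dilation)
    return [sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
            for i in range(len(signal) - span + 1)]


# Hypothetical parallel branches: a 3-tap kernel at dilation rates 1, 2, and 3.
branch_spans = {d: effective_kernel(3, d) for d in (1, 2, 3)}
response = dilated_conv1d([float(v) for v in range(10)], [1.0, 1.0, 1.0], dilation=2)
```

Branches with larger rates see wider context with the same three weights, which is why mixing rates can capture edges at several scales.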

Graphics and Image Processing

PCB Defect Detection Algorithm Based on Improved YOLOv7

ZHANG Xu, CHEN Cifa, DONG Fangmin

Computer Engineering. 2024, 50(12): 318-328. https://doi.org/10.19678/j.issn.1000-3428.0068588

Abstract (497) Download PDF (502) HTML (26)

Improving detection accuracy is a challenging task in PCB defect detection. To address this problem, this study proposes a series of improvements for PCB defect detection. First, BiFormer, which is built on an attention mechanism that uses bi-level routing to achieve dynamic sparse attention, is introduced to reduce the amount of computation required. Second, the CARAFE upsampling operator is employed; it combines semantic and content information during upsampling, making the upsampling process more comprehensive and efficient. Finally, a loss function based on the MPDIoU metric, referred to as LMPDIoU, is adopted. This loss function effectively addresses class imbalance, small targets, and dense targets, further improving detection performance. Experimental results show that the model achieves a mean Average Precision (mAP) of 93.91%, 13.12 percentage points higher than that of the original model, and a recognition accuracy of 90.55%, an improvement of 8.74 percentage points. These results show that introducing the BiFormer attention mechanism, the CARAFE upsampling operator, and the LMPDIoU loss function effectively improves the accuracy and efficiency of PCB defect detection. The proposed methods therefore provide a valuable reference for research in industrial inspection and lay a foundation for future research and applications.
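
The LMPDIoU loss named above can be sketched as follows, assuming the published MPDIoU definition: the IoU minus the squared distances between the top-left corners and between the bottom-right corners of the two boxes, each normalized by the squared diagonal of the input image, with the loss taken as 1 - MPDIoU. Boxes are (x1, y1, x2, y2) tuples, and the small `eps` guard is an addition of this sketch, not something stated in the paper.

```python
def mpdiou_loss(pred, gt, img_w, img_h, eps=1e-9):
    """L_MPDIoU = 1 - MPDIoU for boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Plain IoU
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter + eps
    iou = inter / union
    # Squared distances between matching corners (top-left, bottom-right)
    d1 = (px1 - gx1) ** 2 + (py1 - gy1) ** 2
    d2 = (px2 - gx2) ** 2 + (py2 - gy2) ** 2
    # Both penalties are normalised by the squared diagonal of the input image
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1 / norm - d2 / norm
    return 1.0 - mpdiou
```

Because the corner terms stay non-zero even when boxes do not overlap, the loss still provides a signal for small or distant boxes where plain IoU is flat.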

Artificial Intelligence and Pattern Recognition

Chinese Scientific Literature Annotation Method Based on Large Language Model

YANG Dongju, HUANG Juntao

Computer Engineering. 2024, 50(9): 113-120. https://doi.org/10.19678/j.issn.1000-3428.0068400

Abstract (496) Download PDF (1103) HTML (36)

High-quality annotated data are crucial for Natural Language Processing (NLP) tasks on Chinese scientific literature. An annotation method based on a Large Language Model (LLM) was proposed to address the lack of high-quality annotated corpora and the inconsistency and inefficiency of manual annotation of Chinese scientific literature. First, a fine-grained annotation specification suitable for multi-domain Chinese scientific literature was established to clarify entity types and annotation granularity. Second, a structured text annotation prompt template and a generation parser were designed. Annotating Chinese scientific literature was cast as a single-stage, single-round question-and-answer process in which the annotation specification and the text to be annotated are filled into the corresponding slots of the prompt template to construct the task prompt. This prompt is then fed to the LLM to generate output text containing the annotation information, and the parser finally extracts structured annotation data from that output. Using this LLM-based prompt learning approach, the Annotated Chinese Scientific Literature (ACSL) entity dataset was constructed; it contains 10 000 annotated documents and 72 536 annotated entities across 48 disciplines. For ACSL, three baseline models were built on RoBERTa-wwm-ext, a Chinese variant of the Robustly optimized BERT approach (RoBERTa). Experimental results show that the BERT+Span model performs best on long-span entity recognition in Chinese scientific literature, achieving an F1 value of 0.335. These results serve as benchmarks for future research.
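
The template-and-parser pipeline can be sketched in Python. The template wording, the JSON output format, and the names `TEMPLATE`, `build_prompt`, and `parse_annotations` are all hypothetical, since the abstract does not specify the authors' format. The parser keeps only entities of allowed types whose span actually occurs in the source text, which is one simple way to turn free-form LLM output into structured annotations.

```python
import json

# Hypothetical prompt template with two slots; the paper's actual wording is not given.
TEMPLATE = (
    "Annotation specification:\n{spec}\n\n"
    "Text to annotate:\n{text}\n\n"
    'Answer with a JSON list of objects that have the keys "type" and "span".'
)


def build_prompt(spec, text):
    """Fill the specification and text slots to form a single-round task prompt."""
    return TEMPLATE.format(spec=spec, text=text)


def parse_annotations(llm_output, text, allowed_types):
    """Turn raw LLM output into (type, start, end, span) tuples, dropping invalid items."""
    try:
        items = json.loads(llm_output)
    except json.JSONDecodeError:
        return []  # unparseable generations yield no annotations
    if not isinstance(items, list):
        return []
    result = []
    for item in items:
        if not isinstance(item, dict):
            continue
        etype, span = item.get("type"), item.get("span")
        if not isinstance(etype, str) or etype not in allowed_types:
            continue
        if not isinstance(span, str) or not span:
            continue
        start = text.find(span)  # first occurrence only, a simplification
        if start < 0:
            continue  # reject spans that do not occur in the source text
        result.append((etype, start, start + len(span), span))
    return result
```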

Artificial Intelligence and Pattern Recognition

Multi-Agent Reinforcement Learning Value Function Factorization Approach Based on Graph Neural Network

SUN Wenjie, LI Zongmin, SUN Haomiao

Computer Engineering. 2024, 50(5): 62-70. https://doi.org/10.19678/j.issn.1000-3428.0067919

Abstract (494) Download PDF (1085) HTML (37)

Cooperation among agents under partial observability is an important problem in Multi-Agent Reinforcement Learning (MARL). Value function factorization addresses the credit assignment problem and effectively enables cooperation between agents. However, existing value function factorization approaches rely only on individual value functions computed from local information and do not allow explicit information exchange between agents, which makes them unsuitable for complex scenarios. To address this problem, this study introduces communication into value function factorization to give agents useful nonlocal information that helps them understand complex environments. Furthermore, unlike existing communication approaches, the proposed approach uses a multi-layer message passing architecture based on a Graph Neural Network (GNN) to extract the useful information that neighboring agents need to exchange. The model thus covers the range from no communication to full communication and achieves global cooperation with a limited communication range, which suits real-world applications in which the communication range is constrained. Experiments in the StarCraft II Multi-Agent Challenge (SMAC) and Predator-Prey (PP) environments show that, in four different SMAC scenarios, the average win rate of this approach is 2-40 percentage points higher than those of baseline algorithms such as QMIX and VBC. Furthermore, the proposed approach effectively solves the PP problem in non-monotonic environments.
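
A minimal NumPy sketch can show why multi-layer message passing yields global cooperation despite a limited communication range: each layer moves information one hop along the communication graph. Neighbor averaging stands in for the learned GNN layers and a VDN-style sum stands in for the mixing network; both are assumptions rather than the authors' architecture (QMIX, for instance, uses a monotonic mixing network instead of a sum).

```python
import numpy as np


def comm_graph(positions, comm_range):
    """Adjacency matrix: agents within comm_range can exchange messages (self-loops included)."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist <= comm_range).astype(float)


def message_passing(h, adj, num_layers):
    """Each layer averages neighbours' features (a stand-in for a learned GNN layer)."""
    deg = adj.sum(axis=1, keepdims=True)
    for _ in range(num_layers):
        h = adj @ h / deg
    return h


def vdn_total_q(individual_q):
    """VDN-style factorization: the joint value is the sum of per-agent utilities."""
    return float(np.sum(individual_q))
```

For four agents spaced one unit apart with a range of 1.0, information held by the first agent reaches the last one only after three layers, the number of hops between them.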

Research Hotspots and Reviews

Survey on GPGPU and CUDA Unified Memory Research Status

PANG Wenhao, WANG Jialun, WENG Chuliang

Computer Engineering. 2024, 50(12): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0068694

Abstract (491) Download PDF (628) HTML (47)

With the rise of big data and the rapid advancement of fields such as scientific computing and artificial intelligence, demand for high computational power is growing across many domains. The unique hardware architecture of the Graphics Processing Unit (GPU) makes it well suited to parallel computing. In recent years, the parallel development of GPUs and of fields such as artificial intelligence and scientific computing has enhanced GPU capabilities, leading to mature General-Purpose Graphics Processing Units (GPGPUs). GPGPUs are currently among the most important coprocessors for Central Processing Units (CPUs). However, the fixed hardware configuration of a GPU after delivery and its limited memory capacity can significantly hinder its performance, particularly on large datasets. To address this issue, Compute Unified Device Architecture (CUDA) 6.0 introduced unified memory, which allows the GPGPU and CPU to share a virtual address space, simplifying heterogeneous programming and expanding the memory space accessible to the GPGPU. Unified memory thus offers a way to process large datasets on GPGPUs and eases the constraint of limited GPGPU memory capacity. However, unified memory also introduces performance issues, and effective data management within unified memory is key to performance. This article surveys the development and application of CUDA unified memory, covering its features and evolution, its advantages and limitations, its applications in artificial intelligence and big data processing systems, and its prospects, and provides a reference for future work on applying and optimizing CUDA unified memory.
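
The performance issue mentioned above can be illustrated with a toy model of on-demand page migration. In real CUDA code, memory is allocated with `cudaMallocManaged` and can be moved ahead of use with `cudaMemPrefetchAsync`. The Python sketch below only counts page faults, assumes least-recently-used eviction when GPU memory is oversubscribed (the real driver policy is more complex), and `simulate_um` is a hypothetical helper.

```python
from collections import OrderedDict


def simulate_um(accesses, gpu_capacity_pages, prefetched=()):
    """Count GPU page faults for a sequence of page accesses (toy model).

    Pages not resident on the GPU fault and migrate in; when the GPU is full,
    the least recently used page is evicted back to host memory.
    """
    resident = OrderedDict((p, None) for p in list(prefetched)[:gpu_capacity_pages])
    faults = 0
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)  # refresh LRU position
            continue
        faults += 1
        if len(resident) >= gpu_capacity_pages:
            resident.popitem(last=False)  # evict the least recently used page
        resident[page] = None
    return faults
```

Sweeping four pages three times faults only 4 times when all four fit, 0 times when they are prefetched, but 12 times when only three fit: a cyclic sweep under LRU evicts each page just before it is needed again.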

Graphics and Image Processing

Research on Traffic Sign Detection Based on CGS-Ghost YOLO

Hong ZHAO, Yubo FENG

Computer Engineering. 2023, 49(12): 194-204. https://doi.org/10.19678/j.issn.1000-3428.0066520

Abstract (489) Download PDF (617) HTML (29)

In traffic sign detection tasks, the YOLOv5 detection algorithm suffers from missed detections, false detections, and high model complexity in complex environments and road conditions. To address these challenges, an improved detection model, CGS-Ghost YOLO, is proposed. YOLOv5 uses the Focus module for downsampling, which introduces additional parameters. In this study, the StemBlock module replaces the Focus module for downsampling after the input, reducing the number of parameters while maintaining accuracy. CGS-Ghost YOLO uses a Coordinate Attention (CA) mechanism, which enriches the semantic and location information in the features and enhances the feature extraction ability of the model. Additionally, a CGS convolution module that combines the SMU activation function with GroupNorm (GN) normalization is proposed; it is designed to remove the influence of the batch size on the model during training and to improve model performance. GhostConv is used to reduce the number of model parameters while effectively improving detection accuracy. The $\alpha$-CIoU Loss + VFocal Loss loss function is used to address the imbalance between positive and negative samples in traffic sign detection and to improve the overall performance of the model. The neck uses a bidirectional feature pyramid network (Bi-FPN) to ensure that the multiscale features of the detection targets are effectively fused. Experimental results on the TT100K traffic sign detection dataset show that the detection accuracy of the improved CGS-Ghost YOLO model reaches 93.1%, 11.3 percentage points higher than that of the original model, while the number of model parameters is reduced by 21.2% relative to the original model.
In summary, the proposed network model optimizes the convolution layers and the downsampling stage, considerably reducing the number of model parameters while improving detection accuracy.
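
The parameter saving of GhostConv can be checked with a short calculation. The sketch assumes the Ghost module design from GhostNet: a primary convolution produces c_out/ratio intrinsic feature maps, and cheap depthwise operations generate the remaining "ghost" maps. The ratio of 2 and the 3x3 depthwise kernel are assumed defaults (implementations vary), and bias and normalization parameters are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, ratio=2, dw_k=3):
    """Weights of a Ghost module: a primary conv makes c_out // ratio intrinsic maps,
    then cheap dw_k x dw_k depthwise filters generate the remaining ghost maps."""
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k
    cheap = intrinsic * (ratio - 1) * dw_k * dw_k  # one depthwise filter per ghost map
    return primary + cheap
```

For a 3x3 layer mapping 64 to 128 channels, this gives 37 440 weights versus 73 728 for a standard convolution, roughly half.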

Artificial Intelligence and Pattern Recognition

Deepfake Cross-Model Defense Method Based on Generative Adversarial Network

DAI Lei, CAO Lin, GUO Yanan, ZHANG Fan, DU Kangning

Computer Engineering. 2024, 50(10): 100-109. https://doi.org/10.19678/j.issn.1000-3428.0068106

Abstract (487) Download PDF (81) HTML (13)

To reduce the social risks caused by the abuse of deepfake technology, an active defense method against deepfakes based on a Generative Adversarial Network (GAN) is proposed. Adversarial samples are created by adding imperceptible perturbations to original images so that the outputs of multiple forgery models are significantly distorted. The proposed model comprises an adversarial sample generation module and an adversarial sample optimization module. The generation module includes a generator and a discriminator: the generator receives an original image and produces a perturbation, and adversarial training constrains the spatial distribution of that perturbation. Reducing the visibility of the perturbation improves the authenticity of the adversarial samples. The optimization module comprises basic adversarial watermarking, deepfake models, and discriminators; it simulates black-box scenarios by attacking multiple deepfake models, thereby improving the attack strength and transferability of the adversarial samples. Training and testing are conducted on the commonly used CelebFaces Attributes (CelebA) and Labeled Faces in the Wild (LFW) datasets. Experimental results show that, compared with existing active defense methods, the proposed method achieves a cross-model defense success rate exceeding 85% and generates adversarial samples 20-30 times more efficiently than conventional algorithms.
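
The two constraints described above, a bounded (imperceptible) perturbation and simultaneous distortion of several forgery models, can be sketched with iterative sign-gradient ascent against an ensemble. This is a PGD-style iterative method, assumed here to be representative of the conventional algorithms the abstract compares against; it is not the authors' GAN generator. The linear stand-in "forgery models", the L-infinity budget `eps`, and the step settings are illustrative assumptions.

```python
import numpy as np


def ensemble_perturbation(x, models, eps=0.05, step=0.01, iters=20, seed=0):
    """Find delta with max|delta| <= eps that maximises the summed output distortion
    sum_i ||W_i(x + delta) - W_i x||^2 over several linear stand-in forgery models W_i."""
    rng = np.random.default_rng(seed)
    delta = rng.uniform(-eps, eps, size=x.shape) * 0.1  # small random start
    for _ in range(iters):
        # Gradient of sum_i ||W_i delta||^2 with respect to delta
        grad = sum(2.0 * W.T @ (W @ delta) for W in models)
        # Ascend along the gradient sign, then project back into the eps-ball
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
        # Keep the protected image inside the valid pixel range [0, 1]
        delta = np.clip(x + delta, 0.0, 1.0) - x
    return delta


def distortion(x, delta, models):
    """Total squared change in the outputs of all models caused by delta."""
    return sum(float(np.sum((W @ (x + delta) - W @ x) ** 2)) for W in models)
```

Each iteration needs a gradient from every model, which is the per-image cost that a trained generator avoids by producing a perturbation in a single forward pass.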

Graphics and Image Processing

Multimodal Sentiment Analysis for Video Data

WU Xing, YIN Haoyu, YAO Junfeng, LI Weimin, QIAN Quan

Computer Engineering. 2024, 50(6): 218-227. https://doi.org/10.19678/j.issn.1000-3428.0067874

Abstract (482) Download PDF (572) HTML (20)

Multimodal sentiment analysis aims to extract and integrate semantic information from text, image, and audio data to identify the emotional states of speakers in online videos. Although multimodal fusion methods have achieved some success in this area, previous studies have not adequately addressed the distribution differences between modalities or the fusion of relational knowledge. To address these issues, this study proposes a multimodal sentiment analysis method built around a Multimodal Prompt Gate (MPG) module. The module converts nonverbal information into prompts that fuse context, uses textual information to filter noise from the nonverbal signals, and produces prompts rich in semantic information that strengthen information integration across modalities. In addition, an instance-to-label contrastive learning framework is proposed to separate the different labels at the semantic level in the latent space, further optimizing the model output. Experiments are conducted on three large-scale sentiment analysis datasets. The results show that the binary classification accuracy of the proposed method is approximately 0.7% higher than that of the second-best model, and the ternary classification accuracy is more than 2.5% higher, reaching 0.671. This method can provide a reference for applying multimodal sentiment analysis to user profiling, video understanding, and AI interviews.
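
The instance-to-label contrastive objective can be sketched as an InfoNCE-style loss between each instance embedding and a set of label embeddings. The cosine similarity, the temperature `tau`, and the function name are assumptions; the abstract does not give the exact formulation.

```python
import numpy as np


def instance_to_label_loss(z, label_emb, y, tau=0.1):
    """Pull each instance embedding toward its own label embedding and away from
    the other labels' embeddings (softmax over cosine similarities / tau)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    e = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    logits = z @ e.T / tau                       # (batch, num_labels)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(y)), y].mean())
```

Embeddings aligned with their own labels give a loss near zero, while embeddings aligned with the wrong labels give a loss of about 1/tau.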

Artificial Intelligence and Pattern Recognition

Robot Local Path Planning Based on Improved Artificial Potential Field Method

ZHANG Guosheng, LI Caihong, ZHANG Yaoyu, ZHOU Ruihong, LIANG Zhenying

Computer Engineering. 2025, 51(1): 88-97. https://doi.org/10.19678/j.issn.1000-3428.0068738

Abstract (481) Download PDF (1023) HTML (53)

This study proposes FC-V-APF, an improved Artificial Potential Field (APF) algorithm based on Fuzzy Control (FC) and a virtual target point method, to solve the local-minimum trap and path redundancy problems of the APF method in robot local path planning. First, a virtual target point obstacle avoidance strategy is designed, and the V-APF algorithm is constructed to help the robot escape local-minimum traps by adding an obstacle-crossing mechanism and a target point update threshold. Second, a control strategy based on the cumulative angle sum is proposed to help the robot exit complex areas containing multiple U-shaped obstacles. The V-APF and FC algorithms are then combined into the FC-V-APF algorithm: the surrounding environment is evaluated using real-time radar sensor data and a designed weight function, and a fuzzy controller outputs an auxiliary force so that obstacles are avoided in advance. Finally, a simulation environment is built on the Robot Operating System (ROS) platform to compare the path planning performance of FC-V-APF with that of other algorithms. In terms of path length, running time, and speed curves, FC-V-APF quickly escapes traps, reduces redundant paths, improves path smoothness, and shortens planning time.
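
The local-minimum trap that FC-V-APF targets can be shown with the classic APF forces. In this sketch an obstacle lies exactly on the line from the robot to the goal, so the attractive and repulsive forces act along the same line: somewhere between the two test points they cancel, and there is no lateral force to push the robot aside. `virtual_target` is a hypothetical escape rule that places a temporary goal beside the obstacle; the paper's crossing mechanism, update threshold, and fuzzy controller are not reproduced, and the gains are assumed values.

```python
import numpy as np


def apf_force(q, goal, obstacles, k_att=1.0, k_rep=1.0, d0=1.0):
    """Classic APF: attractive pull toward the goal plus repulsion from obstacles within d0."""
    force = k_att * (goal - q)
    for obs in obstacles:
        diff = q - obs
        d = np.linalg.norm(diff)
        if 0 < d < d0:
            # Negative gradient of U_rep = 0.5 * k_rep * (1/d - 1/d0)^2
            force += k_rep * (1.0 / d - 1.0 / d0) / d ** 2 * (diff / d)
    return force


def virtual_target(q, goal, offset=1.5):
    """Hypothetical escape rule: a temporary goal halfway to the real goal,
    shifted sideways (perpendicular to the robot-goal direction) by `offset`."""
    direction = (goal - q) / np.linalg.norm(goal - q)
    normal = np.array([-direction[1], direction[0]])
    return q + 0.5 * (goal - q) + offset * normal
```

Steering toward the virtual target introduces a lateral force component that breaks the symmetric balance; once the robot has moved around the obstacle, it can switch back to the real goal.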

Computer Architecture and Software Technology

Design of PCIe Verification Platform in SoC Environment Based on UVM

GAO Qiuchen, HU Yonghua

Computer Engineering. 2024, 50(9): 189-196. https://doi.org/10.19678/j.issn.1000-3428.0068240

Abstract (481) Download PDF (860) HTML (21)

A System on Chip (SoC) integrates multiple peripheral interfaces, and verifying them has become one of the most time-consuming steps in chip development. The PCIe protocol provides high-speed point-to-point serial interconnection within a system and supports hot swapping, and it has gradually become a universal bus protocol. When conventional Hardware Description Languages (HDLs) are used to verify PCIe interface designs, it is usually difficult to cover the many design scenarios and boundary conditions in a short time, which leads to insufficient verification. To address these issues, this study uses the Universal Verification Methodology (UVM) to build a PCIe interface verification platform. The platform adopts the UVM-defined framework and test classes to achieve top-level environment integration and test constraint design, offering strong reusability and comprehensive verification. The implementation includes SoC system-level environment integration, the design and connection of the modules under test, the implementation of the sequencer and monitor classes of the verification platform, and part of the interface design. To ensure that the test cases cover as many design states and paths as possible, the functional points are carefully partitioned and constraints are designed, and the effectiveness and coverage of the test cases are evaluated using various coverage metrics. Experimental results show that the verification platform shortens the verification cycle and increases overall coverage by more than 30%.
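
UVM itself is written in SystemVerilog, but the idea of constrained-random stimulus measured against functional coverage can be sketched in Python. The TLP types and payload-length bins below are hypothetical coverage points, and `coverage` mirrors what a covergroup cross of type and length bin would report.

```python
import random

TLP_TYPES = ("MRd", "MWr", "CplD")          # assumed subset of PCIe TLP types
LENGTH_BINS = ((1, 1), (2, 16), (17, 128))  # hypothetical payload-length bins (in DW)


def random_tlp(rng):
    """Constrained-random transaction: any listed type, length constrained to 1..128 DW."""
    return {"type": rng.choice(TLP_TYPES), "length": rng.randint(1, 128)}


def coverage(transactions):
    """Fraction of (type, length-bin) cross bins hit by the given transactions."""
    hit = set()
    for t in transactions:
        for i, (lo, hi) in enumerate(LENGTH_BINS):
            if lo <= t["length"] <= hi:
                hit.add((t["type"], i))
    return len(hit) / (len(TLP_TYPES) * len(LENGTH_BINS))
```

Purely random stimulus can easily miss corner bins such as single-DW payloads, which is why directed constraints on carefully partitioned functional points, as in the abstract, raise coverage.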

Artificial Intelligence and Pattern Recognition

Enhanced Domain Multi-modal Entity Recognition Based on Knowledge Graph

Huayu LI, Zhikang ZHANG, Yang YAN, Yang YUE

Computer Engineering. 2024, 50(8): 31-39. https://doi.org/10.19678/j.issn.1000-3428.0068225

Abstract (476) Download PDF (953) HTML (44)

To address the limitations of Chinese Named Entity Recognition (NER) in specific domains, this paper proposes a model that uses domain-specific Knowledge Graphs (KGs) and images to improve entity recognition accuracy in short texts related to computer science. The model uses a Bidirectional Encoder Representations from Transformers (BERT)-Bidirectional Long Short-Term Memory (BiLSTM)-Attention model to extract textual features, ResNet152 to extract image features, and a word segmentation tool to obtain the noun entities in each sentence. These noun entities and the KG nodes are then embedded with BERT, and cosine similarity is used to find the KG node most similar to each segmented word in the sentence. The neighbors of that node at a distance of 1 are retained to form an optimal matching subgraph that semantically enriches the sentence. A Multi-Layer Perceptron (MLP) maps the textual, image, and subgraph features into the same space, and a gating mechanism performs fine-grained cross-modal fusion of the textual and image features. Finally, the multimodal features are fused with the subgraph features through a cross-attention mechanism and fed into the decoder for entity labeling. Comparative experiments with relevant baseline models are conducted on Twitter2015, Twitter2017, and a self-constructed computer science dataset. On the domain dataset, the proposed approach achieves precision, recall, and F1 values of 88.56%, 87.47%, and 88.01%, respectively, and its F1 value is 1.36 percentage points higher than that of the best baseline model, demonstrating the effectiveness of incorporating domain KGs into entity recognition.
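
The subgraph-matching step can be sketched with NumPy: pick the KG node whose embedding has the highest cosine similarity to a segmented word, then keep that node's one-hop neighborhood. The toy 2-D embeddings stand in for BERT vectors, and the function names are hypothetical.

```python
import numpy as np


def best_match(word_vec, node_vecs):
    """Index of the KG node whose embedding is most cosine-similar to word_vec."""
    sims = node_vecs @ word_vec / (
        np.linalg.norm(node_vecs, axis=1) * np.linalg.norm(word_vec))
    return int(np.argmax(sims))


def one_hop_subgraph(center, edges):
    """Nodes and edges within distance 1 of `center` (edges are undirected pairs)."""
    kept_edges = [(u, v) for u, v in edges if center in (u, v)]
    nodes = {center} | {n for e in kept_edges for n in e}
    return nodes, kept_edges
```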

Development Research and Engineering Application

Similar Case Matching Model for Lending Cases

Faxin CAO, Yuanyuan SUN, Zhizheng WANG, Dinghao PAN, Hongfei LIN

Computer Engineering. 2024, 50(1): 306-312. https://doi.org/10.19678/j.issn.1000-3428.0066055

Abstract (475) Download PDF (153) HTML (14)

Similar Case Matching (SCM) aims to determine whether legal documents are similar; it is a specific application of text matching and is vital to the retrieval of similar cases. Compared with general texts, legal texts are typically longer, and because SCM matches documents of the same type of case, the differences between case texts are subtle, which makes it difficult for earlier text-matching methods to compute text similarity. To address the text-matching problems of lending cases, this study establishes an SCM model that integrates the key elements of lending cases. To obtain richer semantic features, regular expressions are constructed to extract specific elements of lending cases, such as the form of loan delivery and the basic attributes of the borrowers, and these elements are combined with the original case text to jointly learn the semantic features of the legal text and the key case elements. Additionally, pretrained models with shared weights encode the different documents separately, and the outputs of specific encoding layers of the pretrained models are fused to obtain richer semantic information. Finally, the model incorporates a supervised contrastive learning framework to use the text information more effectively and further improve SCM performance. Experiments on the CAIL2019-SCM dataset show that this model improves accuracy on the test set by 1.05 percentage points compared with the LFESM model.
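
Rule-based element extraction of this kind can be sketched with Python's `re` module. The patterns, the English stand-in text, and the element names are hypothetical; the actual rules target Chinese lending judgments and are not given in the abstract. `with_elements` shows one simple way to combine the extracted elements with the original text before encoding.

```python
import re

# Hypothetical patterns on English stand-in text; the paper's rules target Chinese judgments.
PATTERNS = {
    "delivery_form": re.compile(r"\b(bank transfer|cash|WeChat|Alipay)\b", re.IGNORECASE),
    "borrower_gender": re.compile(r"\bborrower\b[^.]*?\b(male|female)\b", re.IGNORECASE),
    "amount": re.compile(r"(\d[\d,]*)\s*yuan", re.IGNORECASE),
}


def extract_elements(text):
    """Return the first match of each key-element pattern (None if absent)."""
    out = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(text)
        out[name] = m.group(1).lower() if m else None
    return out


def with_elements(text):
    """Prepend the extracted elements to the case text so an encoder sees both."""
    elems = extract_elements(text)
    prefix = "; ".join(f"{k}={v}" for k, v in elems.items() if v is not None)
    return f"[{prefix}] {text}"
```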

Development Research and Engineering Application

Ship AIS Trajectory Prediction Algorithm Based on Federated Learning

Chenjun ZHENG, Yan ZENG, Junfeng YUAN, Jilin ZHANG, Xin WANG, Meng HAN

Computer Engineering. 2024, 50(2): 298-307. https://doi.org/10.19678/j.issn.1000-3428.0067829

Abstract (473) Download PDF (1196) HTML (17)

Federated learning, a distributed machine learning method, effectively addresses the data island problem in environments with weak communication. This study introduces E-FVTP, a ship trajectory prediction algorithm that combines the Fedves federated learning framework with a Convolutional Neural Network-Gated Recurrent Unit (CNN-GRU) model. The Fedves framework standardizes client dataset sizes and regularization terms, mitigating the influence of non-independent and identically distributed (non-IID) data on the global model. This approach preserves the original features of client data and thereby accelerates convergence. In maritime scenarios with limited communication resources, the CNN-GRU model uses Automatic Identification System (AIS) data and overcomes the computational limitations of vessel terminals. Experiments on the open-source MarineCadastre and Zhoushan ship navigation AIS datasets show that E-FVTP reduces prediction error by 40% compared with centralized training, converges 67% faster, and reduces communication costs by 76.32%. These improvements enable accurate vessel trajectory prediction in complex maritime settings and support maritime traffic safety.
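
The abstract does not detail the Fedves update rules, so the sketch below shows two standard federated learning building blocks that address the same issues, as an assumption: FedAvg aggregation weighted by client dataset size, and a FedProx-style proximal term that limits client drift under non-IID data. Neither function is the authors' method.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Aggregate client parameter vectors, weighting each by its local sample count."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()


def local_step(w, grad, w_global, lr=0.1, mu=0.01):
    """One local SGD step with a proximal term mu/2 * ||w - w_global||^2,
    which pulls each client back toward the current global model."""
    return w - lr * (grad + mu * (w - w_global))
```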

Cyberspace Security

Encrypted Malicious Traffic Identification Based on CNN CBAM-BiGRU Attention

Xin DENG, Zhaohui LIU, Yan OUYANG, Jianhua CHEN

Computer Engineering. 2023, 49(11): 178-186. https://doi.org/10.19678/j.issn.1000-3428.0066558

Abstract (469) Download PDF (195) HTML (15)

Encrypting network traffic helps protect data security and user privacy, but encryption also hides the characteristics of the data, making malicious traffic difficult to identify. Traditional machine learning methods rely on expert experience, and existing deep learning methods represent traffic insufficiently. To address these problems, this paper proposes a CNN CBAM-BiGRU Attention model that automatically extracts spatial and temporal features without decryption, thereby improving the characterization of encrypted traffic. The model has two parts: spatial feature extraction and temporal feature extraction. Spatial features are extracted by one-dimensional convolution kernels of different sizes. To prevent the loss of spatial features, the parameters of the convolutional layers are adjusted to take over the feature compression and redundancy removal normally performed by pooling layers, and the Convolutional Block Attention Module (CBAM) weights the extracted spatial features of different scales so that the model focuses on highly discriminative spatial features. For temporal features, a BiGRU captures the timing dependencies between data packets, and an attention mechanism strengthens the role of important packets. Finally, the two feature vectors are fused, and a Softmax classifier performs both binary and multi-class classification. In experiments on public datasets, the proposed model achieves an accuracy of 99.95% in identifying encrypted malicious traffic in binary classification and an overall accuracy of 99.39% in multi-class classification. The F1 scores for encrypted malicious traffic in the Dridex and Zbot categories are significantly higher than those of the 1D_CNN and BiGRU models.
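
The CBAM weighting step can be sketched for 1-D feature maps (channels by byte positions). The sketch follows the channel branch of CBAM: a shared two-layer MLP is applied to the average-pooled and max-pooled channel descriptors, and their sum passes through a sigmoid to give one weight per channel. The weight shapes (a reduction ratio of 4 in the test) are assumptions, and CBAM's spatial-attention branch is omitted.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def channel_attention_1d(feat, w1, w2):
    """CBAM channel attention for a (channels, length) feature map.

    w1 (C/r x C) and w2 (C x C/r) form a shared two-layer MLP with a ReLU in between."""
    def shared_mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    avg_desc = feat.mean(axis=1)  # average-pooled channel descriptor
    max_desc = feat.max(axis=1)   # max-pooled channel descriptor
    weights = sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))  # one weight per channel
    return feat * weights[:, None], weights
```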

Artificial Intelligence and Pattern Recognition

Graph Attention Mechanism-based Method for Trajectory Prediction in Map-Free Scenes

Jianmin LIU, Hui LIN, Xiaoding WANG

Computer Engineering. 2024, 50(7): 144-153. https://doi.org/10.19678/j.issn.1000-3428.0068163

Abstract (466) Download PDF (309) HTML (19)

Existing trajectory prediction methods rely heavily on high-definition maps, which are time-consuming, costly, and complex to acquire, making it difficult for these methods to keep pace with the widespread adoption of intelligent transportation. To address vehicle trajectory prediction in map-free scenes, this paper proposes a trajectory prediction method based on the spatio-temporal features of multimodal data. Multiple spatio-temporal interaction graphs are constructed from the trajectory history, and temporal and spatial attention are applied alternately and deeply fused to model the spatio-temporal correlations between vehicles on the road. Finally, a residual network generates multi-target, multimodal trajectories. The model is trained and tested on the real-world Argoverse 2 dataset. Experimental results show that, compared with CRAT-Pred, the model improves the minADE, minFDE, and Miss Rate (MR) metrics by 3.86%, 3.89%, and 0.48%, respectively, in single-modal prediction and by 0.78%, 0.96%, and 0.42%, respectively, in multimodal prediction. Hence, the proposed method efficiently captures the temporal and spatial characteristics of vehicle trajectories and can be applied in related fields such as autonomous driving.
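
The three evaluation metrics can be computed as follows. The sketch follows the Argoverse convention as understood here: the best of the K predicted modes is the one with the smallest endpoint error, minADE is that mode's average displacement error, and a miss is counted when its endpoint error exceeds 2.0 m. The dataset-level MR is the mean of the per-sample miss flags.

```python
import numpy as np


def min_ade_fde_mr(preds, gt, miss_threshold=2.0):
    """preds: (K, T, 2) candidate trajectories; gt: (T, 2) ground truth.

    Returns (minADE, minFDE, missed) for one sample."""
    errors = np.linalg.norm(preds - gt[None], axis=-1)  # (K, T) pointwise L2 errors
    fde = errors[:, -1]
    best = int(np.argmin(fde))  # best mode chosen by endpoint error
    min_ade = float(errors[best].mean())
    min_fde = float(fde[best])
    return min_ade, min_fde, min_fde > miss_threshold
```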

Graphics and Image Processing

Improved YOLOv7 Algorithm for Crowded Pedestrian Detection

Fangxin XU, Rong FAN, Xiaolu MA

Computer Engineering. 2024, 50(3): 250-258. https://doi.org/10.19678/j.issn.1000-3428.0067741

Abstract (455) Download PDF (727) HTML (53)

To address missed and false detections in crowded pedestrian scenes, this study proposes an improved YOLOv7 algorithm for crowded pedestrian detection. A BiFormer vision transformer and an improved RepConv and Channel Space Attention Module (CSAM)-based Efficient Layer Aggregation Network (RC-ELAN) module are introduced into the backbone; the self-attention mechanism and the attention module make the backbone focus more on the important features of occluded pedestrians, effectively mitigating the adverse effect of missing target features on detection. An improved neck based on the Bidirectional Feature Pyramid Network (BiFPN) is used, in which transposed convolution and an improved Rep-ELAN-W module let the model make efficient use of small-target feature information in the middle- and low-level feature maps, effectively improving small-target pedestrian detection. An Efficient Complete Intersection-over-Union (E-CIoU) loss function is introduced so that the model converges to a higher accuracy. Experimental results on the WiderPerson dataset, which contains many small and occluded pedestrians, show that the average precision of the improved YOLOv7 algorithm at IoU thresholds of 0.5 and 0.5-0.95 is 2.5 and 2.8, 9.9 and 7.1, and 12.3 and 10.7 percentage points higher than those of YOLOv7, YOLOv5, and YOLOX, respectively, making it better suited to crowded pedestrian detection scenarios.
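
E-CIoU builds on the Complete IoU (CIoU) loss. The abstract does not define the E-CIoU modification, so the sketch below implements only the standard CIoU loss for (x1, y1, x2, y2) boxes: 1 - IoU, plus the squared center distance normalized by the squared diagonal of the smallest enclosing box, plus an aspect-ratio penalty alpha * v. The `eps` guards are added here for numerical safety.

```python
import math


def ciou_loss(pred, gt, eps=1e-9):
    """Complete-IoU loss: 1 - IoU + rho^2 / c^2 + alpha * v."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    pw, ph, gw, gh = px2 - px1, py2 - py1, gx2 - gx1, gy2 - gy1
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # Squared distance between box centres
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4
    # Squared diagonal of the smallest box enclosing both boxes
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```

For two unit boxes two units apart horizontally, IoU is 0, the center-distance term is 4/10, and the aspect term vanishes, so the loss is 1.4; the distance term keeps the gradient informative even when the boxes do not overlap.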

Frontiers in Computer Systems

Research on Manycore On-chip Storage Hierarchy for Exascale Supercomputer Systems

Yanfei FANG, Qi LIU, Enming DONG, Yanbing LI, Feng GUO, Di WANG, Wangquan HE, Fengbin QI

Computer Engineering. 2023, 49(12): 10-24. https://doi.org/10.19678/j.issn.1000-3428.0066548

Abstract (452) Download PDF (909) HTML (64)

Manycore has become the mainstream processor architecture for building High Performance Computing (HPC) supercomputer systems, providing powerful computing capability for HPC exascale supercomputers. As more cores are integrated on manycore processor chips, contention among large numbers of cores for memory resources has intensified. The manycore on-chip memory hierarchy is an important structure for alleviating the "memory wall" problem: it helps HPC applications better exploit the computing advantages of manycore processors and improves the performance of practical applications. Its design has a significant impact on the performance, power consumption, and area of manycore systems; it is an important part of manycore system architecture design and an active research topic in the industry. Because of differences in the development history of manycore chips, in on-chip microarchitecture design technology, and in the requirements of application fields, the on-chip storage hierarchies of current mainstream HPC manycore processors differ. However, judging from a horizontal comparison of these processors and the vertical development trend of each, as well as from the changing application requirements brought about by the continued convergence of HPC, data science, and machine learning, the hybrid Scratchpad Memory (SPM)+Cache structure is likely to become the mainstream choice for the on-chip storage hierarchy of manycore processors in future HPC exascale supercomputer systems. For exascale computing software and algorithms, design and optimization based on the characteristics of the manycore memory hierarchy can help HPC applications benefit from the computing advantages of manycore processors and thus effectively improve the performance of practical applications.
Therefore, software and algorithm design and optimization techniques that target the characteristics of the manycore on-chip storage hierarchy are also an active research topic in the industry. This study first classifies on-chip memory hierarchies by organization into multilevel Cache, SPM, and hybrid SPM+Cache structures and then summarizes and analyzes the advantages and disadvantages of each. It then analyzes the current status and development trends of the memory hierarchy designs of the chips used in mainstream exascale supercomputer systems, including international mainstream GPUs, homogeneous manycore processors, and domestic manycore processors. Next, it summarizes the research status of the software and hardware techniques for designing and optimizing the memory hierarchy from three aspects: manycore Last-Level Cache (LLC) management and cache coherence protocols, SPM management and data movement optimization, and global optimization of the hybrid SPM+Cache architecture. Finally, it discusses future research directions for the on-chip memory hierarchy from the perspectives of hardware, software, and algorithm design.
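
Why software-managed SPM can complement hardware caches can be illustrated with a toy direct-mapped cache. Two arrays whose addresses map to the same cache set evict each other on every access (conflict misses), whereas an SPM lets software stage data in tiles with explicit transfers whose number is known in advance. The cache geometry and helper names are hypothetical and do not model any particular exascale processor.

```python
def direct_mapped_misses(addresses, num_lines, line_size=64):
    """Count misses in a direct-mapped cache (toy model, no prefetching)."""
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // line_size
        index, tag = block % num_lines, block // num_lines
        if tags[index] != tag:
            misses += 1
            tags[index] = tag  # fill the line, evicting whatever was there
    return misses


def spm_transfers(num_bytes, spm_bytes):
    """DMA transfers needed to stream num_bytes through an SPM in tiles of spm_bytes."""
    return -(-num_bytes // spm_bytes)  # ceiling division
```

With 4 lines of 64 bytes, alternating between addresses 0 and 256 misses on all 8 accesses because both map to line 0, while addresses 0 and 320 miss only twice.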

Artificial Intelligence and Pattern Recognition

Improved Honey Badger Algorithm Based on Multi-Strategy and Its Applications

Haiyun XIANG, Hongxin LI, Xiao FU, Xiaoping SU

Computer Engineering. 2023, 49(12): 78-87. https://doi.org/10.19678/j.issn.1000-3428.0066465

The Honey Badger Algorithm(HBA) is a new intelligent optimization algorithm that simulates the foraging behavior of honey badgers; it has a simple structure and fast convergence. To address the low convergence accuracy, slow convergence, and insufficient global optimization ability of the HBA when solving high-dimensional complex problems, a Multi-Strategy improved Honey Badger Algorithm(MSHBA) is proposed. MSHBA designs a restricted reverse learning mechanism that updates the population with restricted reverse solutions generated during iteration, improving population quality and accelerating convergence. It introduces adaptive weight factors that adjust the optimization step size of the different optimization paths as the number of iterations changes, thereby coordinating the different exploration stages of the algorithm, improving stability, and accelerating convergence. It also constructs a new hunger search strategy that changes the step size of the optimization path based on population energy and the global worst position, preventing premature convergence. Simulation experiments on nine standard test functions compare MSHBA with HBA, the Whale Optimization Algorithm, Harris Hawks Optimization, and the single-strategy variants in different dimensions. The results show that MSHBA has better stability and convergence accuracy. The algorithm is also applied to mechanical design optimization problems, and the results are compared. Compared with the original HBA, MSHBA achieves an 88% performance improvement, confirming its suitability for solving high-dimensional complex problems.
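For readers unfamiliar with HBA, the following is a minimal Python sketch, not the authors' MSHBA implementation. It follows the commonly published update rules of the original HBA (density factor alpha = C*exp(-t/t_max), smell intensity, digging and honey phases, with C = 2 and beta = 6) and adds a generic restricted opposition step. That step is an assumption: each individual is reflected through the population's current per-dimension minimum a_j and maximum b_j, x'_j = a_j + b_j - x_j. The abstract does not give formulas for MSHBA's restricted reverse learning, adaptive weight factors, or hunger search strategy, so the adaptive weights and hunger search are omitted. The names `sphere`, `clip`, `restricted_opposition`, and `hba` are hypothetical.

```python
import math
import random


def sphere(x):
    """Hypothetical test objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def clip(x, lb, ub):
    """Keep every coordinate inside the search bounds [lb, ub]."""
    return [min(max(v, lb), ub) for v in x]


def restricted_opposition(pop):
    """Assumed restricted reverse learning: reflect each individual through the
    population's current per-dimension bounds, x'_j = a_j + b_j - x_j."""
    dim = len(pop[0])
    a = [min(ind[j] for ind in pop) for j in range(dim)]
    b = [max(ind[j] for ind in pop) for j in range(dim)]
    return [[a[j] + b[j] - ind[j] for j in range(dim)] for ind in pop]


def hba(f, dim, lb, ub, n=30, t_max=200, beta=6.0, C=2.0, seed=0):
    """Minimize f with baseline HBA plus an opposition step; returns (best_x, best_f)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    fit = [f(x) for x in pop]
    k = min(range(n), key=fit.__getitem__)
    prey, prey_fit = pop[k][:], fit[k]  # best solution found so far

    def accept(i, cand):
        """Greedy selection: keep cand only if it improves individual i."""
        nonlocal prey, prey_fit
        fc = f(cand)
        if fc < fit[i]:
            pop[i], fit[i] = cand, fc
            if fc < prey_fit:
                prey, prey_fit = cand[:], fc

    for t in range(1, t_max + 1):
        alpha = C * math.exp(-t / t_max)  # density factor, shrinks with iterations
        for i in range(n):
            x, nxt = pop[i], pop[(i + 1) % n]
            s = sum((p - q) ** 2 for p, q in zip(x, nxt))  # source strength
            d = [p - q for p, q in zip(prey, x)]  # distance vector to prey
            dist2 = sum(v * v for v in d) + 1e-12  # avoid division by zero
            intensity = rng.random() * s / (4 * math.pi * dist2)
            F = 1 if rng.random() <= 0.5 else -1  # flag that flips search direction
            if rng.random() < 0.5:  # digging phase
                cand = [
                    prey[j] + F * beta * intensity * prey[j]
                    + F * rng.random() * alpha * d[j]
                    * abs(math.cos(2 * math.pi * rng.random())
                          * (1 - math.cos(2 * math.pi * rng.random())))
                    for j in range(dim)
                ]
            else:  # honey phase
                cand = [prey[j] + F * rng.random() * alpha * d[j] for j in range(dim)]
            accept(i, clip(cand, lb, ub))
        # Opposition step (assumed form), run once per iteration after the HBA moves
        for i, opp in enumerate(restricted_opposition(pop)):
            accept(i, clip(opp, lb, ub))
    return prey, prey_fit


if __name__ == "__main__":
    best, val = hba(sphere, dim=5, lb=-10.0, ub=10.0, seed=1)
    print(val)
```

Because selection is greedy, the best fitness can never get worse across iterations, so the opposition step either improves an individual or leaves it unchanged, at the cost of n extra objective evaluations per iteration.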
