Most Read

  • Graphics and Image Processing
    Yang LIU, Jun CHEN, Shijia HU, Jiahua LAI
    Computer Engineering. 2023, 49(10): 247-254. https://doi.org/10.19678/j.issn.1000-3428.0065825

    In mainstream feature-based Simultaneous Localization and Mapping (SLAM) methods, feature matching is a key step in estimating camera motion. However, the local nature of image features causes widespread mismatches, which have become a major bottleneck in visual SLAM. In addition, the sparse maps generated by feature-based methods can only be used for localization and do not satisfy higher-level requirements. To address the low efficiency of ORB feature point matching and the inability of ORB-SLAM3 to generate dense maps, an improved ORB Grid-based Motion Statistics (ORB-GMS) matching strategy is proposed, and a dense point cloud construction thread is added to ORB-SLAM3 to realize dense mapping. The motion smoothness constraint is used in the feature point motion statistics method, and the number of matches in the neighborhood of a feature point is compared with a threshold to efficiently determine whether the current match is correct. Gridded images are used for fast computation to perform camera pose estimation. Finally, the dense point cloud map is constructed from the key frames and their corresponding poses, and outlier removal and voxel-grid filters are used to reduce the size of the point cloud. Experimental results on the TUM RGB-D dataset show that, compared with ORB-SLAM3, the proposed algorithm reduces matching time by approximately 50% and average positioning error by 32%, while increasing the number of matches by an average of 60%. In addition, compared with sparse maps, the dense point cloud maps generated by this method are easy to process further, which expands the application scenarios of the algorithm.
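
The voxel-grid filtering step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe the filter's implementation, so the function name `voxel_downsample`, the centroid rule, and the `voxel_size` parameter are assumptions made for illustration, not the authors' code.

```python
from collections import defaultdict
from math import floor


def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same cubic voxel by their centroid.

    `points` is an iterable of (x, y, z) tuples; `voxel_size` is the edge
    length of each voxel (hypothetical parameter, same unit as the points).
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    # Group points by the integer index of the voxel that contains them.
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        buckets[key].append((x, y, z))

    # Emit one centroid per occupied voxel.
    centroids = []
    for pts in buckets.values():
        n = len(pts)
        centroids.append(tuple(sum(axis) / n for axis in zip(*pts)))
    return centroids
```

A larger `voxel_size` gives a smaller but coarser cloud; a production system would more likely rely on an existing point cloud library than on this pure-Python loop.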

  • Artificial Intelligence and Pattern Recognition
    Zhiqiang WU, Qing XIE, Lin LI, Yongjian LIU
    Computer Engineering. 2024, 50(1): 91-100. https://doi.org/10.19678/j.issn.1000-3428.0066929

    Many existing Graph Neural Network (GNN) recommendation algorithms use the node number information of the user-item interaction graph for training and learn the high-order connectivity among user and item nodes to enrich their representations. However, user preferences for different modal information are ignored: modal information such as item images and text is not utilized, and when different modal features are fused, they are simply summed without distinguishing user preferences for each type of modal information. A multimodal fusion GNN recommendation model is proposed to address this problem. First, for a single modality, a unimodal graph network is constructed by combining the user-item interaction bipartite graph, and the user preference for this modal information is learned in the unimodal graph. A Graph ATtention (GAT) network is used to aggregate neighbor information and enrich the local node representation, and a Gated Recurrent Unit (GRU) is used to decide whether to aggregate the neighbor information, which achieves a denoising effect. Finally, the user and item representations learned from each modal graph are fused by an attention mechanism to obtain the final representation, which is then sent to the prediction module. Experimental results on the MovieLens-20M and H&M datasets show that the multimodal information and attention fusion mechanism effectively improve recommendation accuracy and that the model significantly improves Precision@K, Recall@K, and NDCG@K compared with the best baseline algorithm. With K set to 10, Precision@10, Recall@10, and NDCG@10 increase by 4.67%, 2.42%, and 2.03% on the first dataset and by 2.49%, 5.24%, and 2.05% on the second.
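
The three evaluation metrics named in this abstract can be sketched as follows. This is a minimal illustration that assumes binary relevance (an item is either relevant or not); the function names are hypothetical, and the paper's exact evaluation protocol is not given in the abstract.

```python
from math import log2


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    # Position i (0-based) contributes 1 / log2(i + 2) when the item is relevant.
    dcg = sum(1.0 / log2(i + 2) for i, item in enumerate(ranked[:k]) if item in relevant)
    # The ideal ranking places every relevant item first.
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Precision@K and Recall@K ignore the order within the top K, whereas NDCG@K rewards placing relevant items earlier, which is why the three are usually reported together.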

  • Artificial Intelligence and Pattern Recognition
    ZHAO Jida, ZHEN Guoyong, CHU Chengqun
    Computer Engineering. 2024, 50(4): 113-120. https://doi.org/10.19678/j.issn.1000-3428.0068268

    In Unmanned Aerial Vehicle (UAV) target detection, missed and false detections are caused by the small size of the targets and the complex backgrounds of the images. To address the problem of small target detection, a UAV image target detection algorithm is proposed by improving YOLOv8s. First, because targets in drone imagery are generally small, the number of Backbone layers is reduced and the size of the feature map used for detection is increased so that the network model can focus more on small targets. Second, because low-quality examples in the dataset commonly degrade training, the Wise-IoU loss function is introduced to improve the training effect. Third, a context enhancement module is introduced to obtain the feature information of small targets in different receptive fields, which improves the localization and classification of small targets in complex environments. Finally, a spatial-channel filtering module is designed to enhance the feature information of the target during convolution and filter out useless interference, addressing the problem that the feature information of some small targets is submerged and lost during convolution. Experimental results on the VisDrone2019 dataset show that the average detection accuracy (mAP@0.5) of the proposed algorithm reaches 45.4%, which is 7.3 percentage points higher than that of the original YOLOv8s algorithm, and that the number of parameters is reduced by 26.13%. Under similar experimental conditions, detection accuracy and speed are improved to a certain extent compared with other common small target detection algorithms.

  • Graphics and Image Processing
    Jiaxin LI, Jin HOU, Boying SHENG, Yuhang ZHOU
    Computer Engineering. 2023, 49(9): 256-264. https://doi.org/10.19678/j.issn.1000-3428.0065935

    In remote sensing imagery, the detection of small objects poses significant challenges due to factors such as complex background, high resolution, and limited effective information. Based on YOLOv5, this study proposes an advanced approach, referred to as YOLOv5-RS, to enhance small object detection in remote sensing images. The presented approach employs a parallel mixed attention module to address issues arising from complex backgrounds and negative samples. This module optimizes the generation of a weighted feature map by substituting fully connected layers with convolutions and eliminating pooling layers. To capture the nuanced characteristics of small targets, the downsampling factor is tailored, and shallow features are incorporated during model training. At the same time, a unique feature extraction module combining convolution and Multi-Head Self-Attention (MHSA) is designed to overcome the limitations of ordinary convolution extraction by jointly representing local and global information, thereby extending the model's receptive field. The EIoU loss function is employed to optimize the regression process for both prediction and detection frames to enhance the localization capacity of small objects. The efficacy of the proposed algorithm is verified via experiments on datasets comprising small target remote sensing images. The results show that compared with YOLOv5s, the proposed algorithm has an average detection accuracy improvement of 1.5 percentage points, coupled with a 20% reduction in parameter count. Particularly, the proposed algorithm's average detection accuracy of small vehicle targets increased by 3.2 percentage points. Comparative evaluations against established methodologies such as EfficientDet, YOLOx, and YOLOv7 underscore the proposed algorithm's capacity to adeptly balance the dual objectives of detection accuracy and real-time performance.
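
The EIoU regression loss referred to in this abstract can be sketched for a single pair of boxes. The code below follows the commonly published EIoU form (IoU term plus center-distance, width, and height penalties normalized by the smallest enclosing box); whether YOLOv5-RS uses exactly this variant is an assumption, and the function name and the (x1, y1, x2, y2) box layout are illustrative choices.

```python
def eiou_loss(box_a, box_b):
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1
    if min(aw, ah, bw, bh) <= 0:
        raise ValueError("boxes must have positive width and height")

    # Intersection over union.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    iou = inter / union

    # Width, height, and squared diagonal of the smallest enclosing box.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    diag_sq = cw ** 2 + ch ** 2

    # Squared distance between the two box centers.
    center_sq = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    return (1.0 - iou
            + center_sq / diag_sq
            + (aw - bw) ** 2 / cw ** 2
            + (ah - bh) ** 2 / ch ** 2)
```

Unlike a plain IoU loss, the penalty terms stay informative when the boxes do not overlap, which is one reason such losses are favored for small objects.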

  • Research Hotspots and Reviews
    XIONG Shiqiang, HE Daojing, WANG Zhendong, DU Runmeng
    Computer Engineering. 2024, 50(5): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0067782

    Federated Learning (FL) is a new distributed machine learning technology that keeps data local and trains a common model through the cooperation of all parties, which mitigates the data collection and privacy security issues of conventional machine learning. However, as FL is applied and developed, it remains exposed to various attacks. To ensure the security of FL, the attack modes in FL and the corresponding privacy protection technologies must be investigated. Herein, first, the background knowledge and relevant definitions of FL are introduced, and the development process and classification of FL are summarized. Second, the three elements of security in FL are expounded, and the security issues and research progress of FL are summarized from two perspectives: security sources and the three elements of security. Subsequently, privacy protection technologies are classified, and four common privacy protection technologies used in FL are summarized: Secure Multiparty Computation (SMC), Homomorphic Encryption (HE), Differential Privacy (DP), and Trusted Execution Environment (TEE). Finally, future research directions for FL are discussed.

  • Artificial Intelligence and Pattern Recognition
    Qiru LI, Xia GENG
    Computer Engineering. 2023, 49(12): 111-120. https://doi.org/10.19678/j.issn.1000-3428.0066348

    The traditional Deep Q Network (DQN) algorithm solves the dimensionality problem of Q-learning in complex environments by integrating deep neural networks with reinforcement learning, and it is widely used in the path planning of mobile robots. However, the traditional DQN algorithm converges slowly and plans paths poorly; consequently, obtaining the optimal path within a small number of training rounds is challenging. To solve these problems, an improved ERDQN algorithm is proposed. The Q value is recalculated by recording the frequency of repeated states: the more times a state is repeated during network training, the lower the probability that the state occurs again. This mechanism improves the robot's ability to explore the environment, reduces to a certain extent the risk that the network converges to a local optimum, and reduces the number of training rounds required for convergence. The reward function is redesigned according to the moving direction of the robot and the distance between the robot and the target point. The robot obtains a positive reward when it is close to the target point and a negative reward when it is far from the target point. The absolute value of the reward is adjusted according to the current moving direction of the robot and its distance from the target point; thus, the robot can plan a better path while avoiding obstacles. The experimental results show that, compared with the DQN algorithm, the average score of the ERDQN algorithm is increased by 18.9%, whereas the path length and the number of planned rounds are reduced by approximately 20.1% and 500, respectively. These results show that the ERDQN algorithm can effectively improve network convergence speed and path planning performance.

  • Cyberspace Security
    Shuaiwei LIU, Zhi LI, Guomei WANG, Li ZHANG
    Computer Engineering. 2024, 50(2): 180-187. https://doi.org/10.19678/j.issn.1000-3428.0067077

    Adversarial attack and defense is a popular research area in computer security. Trans-GAN, an adversarial example generation algorithm that combines the Transformer with a Generative Adversarial Network (GAN), is proposed to address the poor visual quality of existing gradient-based adversarial example generation methods and the low generation efficiency of optimization-based methods. First, the algorithm uses the powerful visual representation capability of the Transformer as a reconstruction network that receives clean images and generates adversarial noise. Second, the Transformer reconstruction network serves as the generator and is combined with a discriminator based on a deep convolutional network to form a GAN architecture, which improves the authenticity of the generated images and ensures the stability of training. Meanwhile, an improved attention mechanism, Targeted Self-Attention, is proposed to introduce target labels as a priori knowledge when training the network, which guides the network model to learn to generate adversarial perturbations with specific attack targets. Finally, the adversarial noise is added to the clean examples using skip-connections to form adversarial examples. Experimental results demonstrate that the proposed algorithm achieves an attack success rate of more than 99.9% on both models used for the MNIST dataset and 96.36% and 98.47% on the two models used for the CIFAR10 dataset, outperforming current state-of-the-art generation-based adversarial attack methods. The qualitative results show that, compared with the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) algorithms, the adversarial noise generated by Trans-GAN involves smaller perturbations, and the resulting adversarial examples are more natural, meet the requirements of human vision, and are not easily distinguished from clean examples.

  • Graphics and Image Processing
    Bingyan ZHU, Zhihua CHEN, Bin SHENG
    Computer Engineering. 2024, 50(1): 216-223. https://doi.org/10.19678/j.issn.1000-3428.0066941

    Owing to the rapid development of remote sensing technology, remote sensing image detection is being used extensively in agriculture, military, national defense security, and other fields. Object detection is more difficult in remote sensing images than in conventional images; therefore, researchers have sought to perform it efficiently and accurately. To address the high computational complexity, large range of scale variation, and scale imbalance of remote sensing images, this study proposes a perceptually enhanced Swin Transformer network, which improves detection in remote sensing images. Exploiting the hierarchical design and shifted windows of the basic Swin Transformer, the network inserts spatial local perception blocks into each stage, thus enhancing local feature extraction with a negligible increase in computation. An area-distributed regression loss is introduced to assign larger weights to small objects and so resolve the scale imbalance; additionally, the network is combined with an improved IoU-aware classification loss to eliminate the discrepancy between different branches and reduce the classification and regression losses. Experimental results on the public dataset DOTA show that the proposed network yields a mean Average Precision (mAP) of 78.47% and a detection speed of 10.8 frame/s, thus demonstrating its superiority over classical object detection networks (i.e., Faster R-CNN and Mask R-CNN) and existing high-performing remote sensing image detection networks. Additionally, the network performs well on all types of objects at different scales.

  • Cyberspace Security
    Huawei SONG, Shengqi LI, Fangjie WAN, Yuping WEI
    Computer Engineering. 2024, 50(3): 166-172. https://doi.org/10.19678/j.issn.1000-3428.0067791

    Federated Learning (FL) enables collaborative training of global models without compromising data privacy. Nonetheless, in the real world this collaborative training approach faces the challenge of Non-IID data, which leads to slow model convergence and low accuracy. Many existing FL methods improve only one of the two perspectives, global model aggregation or local client update, and inevitably overlook the impact of the other, which reduces the quality of the global model. In this context, we introduce a hierarchical continuous learning optimization method for FL, denoted as FedMas, which is based on the idea of hierarchical fusion. First, clients with similar data distributions are grouped into layers using the DBSCAN algorithm, and only some of the clients of a certain layer are selected for training each time, which avoids the weight differences caused by different data distributions when the server aggregates the global model. Further, because each layer has a different data distribution, the client incorporates a continuous learning solution to catastrophic forgetting during the local update to effectively integrate the differences between the data of clients in different layers, thus ensuring the performance of the global model. Experiments on the MNIST and CIFAR-10 standard datasets demonstrate that the test accuracy of the global model is improved by 0.3-2.2 percentage points on average compared with the FedProx, Scaffold, and FedCurv FL algorithms.

  • Artificial Intelligence and Pattern Recognition
    Huan WANG, Lijuan SONG, Fang DU
    Computer Engineering. 2023, 49(12): 88-95. https://doi.org/10.19678/j.issn.1000-3428.0066938

    Interactive tasks involving multi-modal data place high demands on the comprehensive utilization of knowledge from different modalities, which has led to the emergence of multi-modal knowledge graphs. When constructing these graphs, accurately determining whether image and text entities refer to the same object is particularly important for the alignment of Chinese cross-modal entities. To address this problem, a Chinese cross-modal entity alignment method based on a multi-modal knowledge graph is proposed. Image information is introduced into the entity alignment task, and a single- and dual-stream interactive pre-trained language model, namely CCMEA, is designed for domain-specific, fine-grained images and Chinese text. Using a self-supervised learning method, text-visual features are extracted using a Text-Visual Encoder, and fine-grained modeling is performed using cross-coders. Finally, a contrastive learning method is employed to evaluate the degree of alignment between image and text entities. The experimental results show that the Mean Recall (MR) of the CCMEA model improved by 3.20 and 11.96 percentage points compared with that of the WukongViT-B baseline model on the MUGE and Flickr30k-CN datasets, respectively. Furthermore, the model achieved an MR of 94.3% on the self-built TEXTILE dataset. These results demonstrate that the proposed method can effectively align Chinese cross-modal entities with high accuracy in practical applications.

  • Research Hotspots and Reviews
    WEI Wei, DING Xiangxiang, GUO Mengxing, YANG Zhao, LIU Hui
    Computer Engineering. 2024, 50(9): 18-32. https://doi.org/10.19678/j.issn.1000-3428.0068086

    Text similarity calculation is a part of natural language processing and is used to calculate the similarity between two words, sentences, or texts in many application scenarios. Research on text similarity calculation plays an important role in the development of artificial intelligence. Text similarity calculation has conventionally been based on the surface form of character strings. With the introduction of word vectors, text similarity can be modeled and calculated based on statistics and deep learning, and can also be combined with pre-trained models. First, text similarity calculation methods are divided into five categories: character string-based, word vector-based, pre-trained model-based, deep learning-based, and other methods. Each category is briefly introduced. Subsequently, according to the principles of the different text similarity calculation methods, common methods such as the edit distance, Hamming distance, bag of words model, Vector Space Model (VSM), Deep Structured Semantic Model (DSSM), and Simple Contrastive learning of Sentence Embedding (SimCSE) are discussed. Finally, commonly used datasets and evaluation criteria for text similarity calculation are sorted and analyzed, and the future development of text similarity calculation is discussed.
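
Two of the character string-based measures named in this abstract, the edit distance and the Hamming distance, are small enough to sketch directly. The function names are illustrative; the edit distance shown is the standard Levenshtein variant with unit costs for insertion, deletion, and substitution.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit edit costs)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            curr.append(min(prev[j] + 1,       # delete ca
                            curr[j - 1] + 1,   # insert cb
                            substitution))
        prev = curr
    return prev[-1]


def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length inputs")
    return sum(x != y for x, y in zip(a, b))
```

Both measures operate purely on surface characters, so they cannot capture that two differently worded sentences mean the same thing, which is the gap the vector-based and pre-trained methods in the review aim to close.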

  • Development Research and Engineering Application
    Shui HU
    Computer Engineering. 2023, 49(9): 303-312. https://doi.org/10.19678/j.issn.1000-3428.0067067

    Wargame deduction is an important method for cultivating modern military commanders. Introducing artificial intelligence technology into wargame deduction can simplify organizational processes and improve deduction efficiency. Owing to complex situational information and incomplete inference information, intelligent wargaming based on machine learning often suffers from low sample efficiency in its autonomous decision-making models. This paper proposes an intelligent wargame deduction decision-making method based on deep reinforcement learning. To address the efficiency of intelligent wargame deduction and combat decision-making, a baseline is introduced into the policy network, which accelerates its training. Subsequently, a derivation and proof are presented, and a method for updating the parameters of the policy network after adding the baseline is proposed. The process of introducing the state-value function of the wargame deduction environment into the model is analyzed. A Low Advantage Policy-Value Network (LAPVN) model and its training framework for wargame deduction are constructed on the basis of traditional policy-value networks, and the model is built using battlefield situational awareness methods. In a wargame combat experimental environment that approximately conforms to military operational rules, the traditional policy-value network and LAPVN are trained and compared. Over 400 self-play training sessions, the loss value of the LAPVN model decreases from 5.3 to 2.3, and convergence is faster than that of the traditional policy-value network. The KL divergence of the LAPVN model remains very close to zero during training.
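
The role of a baseline in the policy gradient can be made concrete with the standard textbook identity. The notation below (policy $\pi_\theta$, return $G_t$, baseline $b(s_t)$) is an assumption made for illustration; the paper's own derivation and the exact LAPVN update rule are not given in the abstract.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\pi_\theta}\!\left[
      \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\bigl(G_t - b(s_t)\bigr)
    \right],
\qquad
\mathbb{E}_{a_t \sim \pi_\theta(\cdot \mid s_t)}\!\left[
      \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, b(s_t)
    \right]
  = b(s_t)\, \nabla_\theta \sum_{a} \pi_\theta(a \mid s_t)
  = b(s_t)\, \nabla_\theta 1
  = 0 .
```

Because the baseline term has zero expectation, subtracting it leaves the gradient estimate unbiased while it can reduce its variance. Choosing $b(s_t)$ as an estimate of the state value turns $G_t - b(s_t)$ into an advantage-style signal.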

  • Graphics and Image Processing
    Chunbo XU, Juan YAN, Huibin YANG, Bo WANG, Han WU
    Computer Engineering. 2023, 49(8): 199-206, 214. https://doi.org/10.19678/j.issn.1000-3428.0065522

    Currently, most Visual Simultaneous Localization And Mapping (VSLAM) algorithms are designed for static scenes and do not consider dynamic objects in a scene. However, dynamic objects in an actual scene cause mismatches among the feature points of the visual odometer, which affects the positioning and mapping accuracy of the SLAM system and reduces its robustness in practical applications. For indoor dynamic environments, a VSLAM algorithm based on the ORB-SLAM3 main framework, known as RDTS-SLAM, is proposed. An improved YOLOv5 target detection and semantic segmentation network is used to accurately and rapidly segment objects in the environment. Simultaneously, the target detection results are combined with the local optical flow method to accurately identify dynamic objects, and the feature points in the dynamic object areas are eliminated. Only static feature points are used for feature point matching and subsequent positioning and mapping. Experimental results on the TUM RGB-D dataset and actual environment data show that, compared with the ORB-SLAM3 and RDS-SLAM algorithms, RDTS-SLAM reduces the Root Mean Square Error (RMSE) of trajectory estimation on the walking_rpy sequence by 95.38% and 86.20%, respectively, which indicates that it can significantly improve the robustness and accuracy of a VSLAM system in a dynamic environment.

  • Development Research and Engineering Application
    Jianhao ZHAN, Lipeng GAN, Yonghui BI, Peng ZENG, Xiaochao LI
    Computer Engineering. 2023, 49(10): 280-288, 297. https://doi.org/10.19678/j.issn.1000-3428.0065152

    The multi-modality fusion method is a core technique for effectively exploring complementary features from multiple modalities to improve action recognition performance at data-, feature-, and decision-level fusion. This study mainly investigated the multimodality fusion method at the feature and decision levels through knowledge distillation, transferring feature learning from other modalities to the RGB model, including the effects of different loss functions and fusion strategies. A multi-modality distillation fusion method is proposed for action recognition, whereby knowledge distillation is performed using the MSE loss function at the feature level, KL divergence at the decision-prediction level, and a combination of the original skeleton and optical flow modalities as multi-teacher networks so that the RGB student network can simultaneously learn with better recognition accuracy. Extensive experiments show that the proposed method achieved state-of-the-art performance with 90.09%, 95.12%, 97.82%, and 81.26% accuracies on the NTU RGB+D 60, UTD-MHAD, N-UCLA, and HMDB51 datasets, respectively. The recognition accuracy on the UTD-MHAD dataset has increased by 3.49, 2.54, 3.21, and 7.34 percentage points compared to single mode RGB data, respectively.
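
The combination of a feature-level MSE term and a decision-level KL term described in this abstract can be sketched for plain Python lists. Averaging over teachers, the weights `alpha` and `beta`, and the softmax `temperature` are assumptions made for illustration; the abstract does not state how the authors weight or combine the terms.

```python
from math import exp, log


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q has no zero entries."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_feat, teacher_feats, student_logits,
                      teacher_logits_list, alpha=1.0, beta=1.0, temperature=1.0):
    """Feature-level MSE plus decision-level KL, averaged over the teachers."""
    feat_term = sum(mse(student_feat, t) for t in teacher_feats) / len(teacher_feats)
    p_student = softmax(student_logits, temperature)
    kl_term = sum(kl_divergence(softmax(t, temperature), p_student)
                  for t in teacher_logits_list) / len(teacher_logits_list)
    return alpha * feat_term + beta * kl_term
```

In this sketch the teachers would correspond to the skeleton and optical flow networks and the student to the RGB network; the loss is zero only when the student reproduces both the features and the predictions of every teacher.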

  • Research Hotspots and Reviews
    Chang WANG, Leixiao LI, Yanyan YANG
    Computer Engineering. 2023, 49(11): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0066661

    Fatigue driving detection based on computer vision has the advantage of being noninvasive and does not affect driving behavior, making it easy to apply in practical scenarios. With the development of computer technology, an increasing number of researchers are studying fatigue driving detection methods based on computer vision. Fatigue driving behavior is mainly reflected in the face and limbs. Furthermore, in the field of computer vision, facial behavior is easier to capture than physical behavior. Therefore, facial-feature-based methods have become an important research direction in fatigue driving detection. Various fatigue driving detection methods based on multiple facial features of drivers are analyzed comprehensively, and the latest research results worldwide are summarized. The specific behaviors that different facial features of drivers exhibit under fatigue are introduced, and the fatigue driving detection process based on multiple facial features is discussed. Results from research conducted worldwide are classified by facial feature, and the different feature extraction methods and state discrimination methods are classified. The parameters used to distinguish driver fatigue status are summarized based on the behaviors that different features exhibit in a state of fatigue. Furthermore, current research results on the use of comprehensive facial multi-feature discrimination for fatigue driving are described, and the similarities and differences among the methods are analyzed. On this basis, the shortcomings of current fatigue driving detection based on facial multi-feature fusion are discussed, and future research directions in this field are described.

  • Artificial Intelligence and Pattern Recognition
    Lu HAN, Weigang HUO, Yonghui ZHANG, Tao LIU
    Computer Engineering. 2023, 49(9): 99-108. https://doi.org/10.19678/j.issn.1000-3428.0065846

    Each subsequence of a Multivariate Time Series (MTS) contains multi-scale characteristics of different time spans, comprising information such as development process, direction, and trend. However, existing time series prediction models cannot effectively capture multi-scale features and evaluate their importance. In this study, an MTS prediction network, FFANet, is proposed based on multi-scale temporal feature fusion and a Dual-Attention Mechanism (DAM). FFANet effectively integrates multi-scale features and focuses on important parts. The parallel temporal dilated convolution layers in the multi-scale temporal feature fusion module give the model multiple receptive fields, which allows it to extract features of temporal data at different scales and adaptively fuse them based on their importance. Using the DAM to recalibrate the fused temporal features, FFANet focuses on features that contribute significantly to prediction by assigning temporal and channel attention weights and applying them to the corresponding temporal features. The experimental results show that, compared with the AR, VARMLP, RNN-GRU, LSTNet-skip, TPA-LSTM, MTGNN, and AttnAR time series prediction models, FFANet achieves average reductions of 0.1523, 0.1200, 0.0743, 0.0354, 0.0215, 0.0121, and 0.0200, respectively, in RRSE prediction error on the Traffic, Solar Energy, and Electricity datasets.
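
The RRSE figure reported in this abstract can be illustrated with a minimal sketch. It assumes the usual root relative squared error definition, in which the squared prediction error is normalized by the squared deviation of the true series from its mean; the function name is illustrative and the paper's exact evaluation code is not given in the abstract.

```python
from math import sqrt


def rrse(actual, predicted):
    """Root relative squared error of `predicted` against `actual`."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    squared_spread = sum((a - mean_actual) ** 2 for a in actual)
    if squared_spread == 0:
        raise ValueError("RRSE is undefined for a constant target series")
    return sqrt(squared_error / squared_spread)
```

A value of 1.0 means the model is no better than always predicting the mean of the series, so lower is better and the metric is comparable across datasets with different scales.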

  • Research Hotspots and Reviews
    REN Shuyu, WANG Xiaoding, LIN Hui
    Computer Engineering. 2024, 50(12): 16-32. https://doi.org/10.19678/j.issn.1000-3428.0068553

    The superior performance of Transformers in natural language processing has inspired researchers to explore their applications in computer vision tasks. The Transformer-based object detection model, Detection Transformer (DETR), treats object detection as a set prediction problem, introducing the Transformer model to address this task and eliminating the proposal generation and post-processing steps that are typical of traditional methods. However, the original DETR model converges slowly during training and detects small objects inefficiently. To address these challenges, researchers have implemented various improvements to enhance DETR performance. This study conducts an in-depth investigation of both the basic and enhanced modules of DETR, including modifications to the backbone architecture, query design strategies, and improvements to the attention mechanism. Furthermore, it provides a comparative analysis of various detectors and evaluates their performance and network architecture. The potential and application prospects of DETR in computer vision tasks are discussed, along with its current limitations and challenges. Finally, this study analyzes and summarizes related models, assesses the advantages and limitations of attention models in the context of object detection, and outlines future research directions in this field.

  • Research Hotspots and Reviews
    Zhe LIAN, Yanjun YIN, Fei YUN, Min ZHI
    Computer Engineering. 2024, 50(3): 16-27. https://doi.org/10.19678/j.issn.1000-3428.0067427

    Natural scene text detection technology based on deep learning has become a crucial research focus in the fields of computer vision and natural language processing. It not only has a wide range of potential applications but also serves as a new platform for researchers to explore neural network models and algorithms. First, this study introduces the relevant concepts, research background, and current developments in natural scene text detection technology. Subsequently, recent deep learning-based text detection methods are analyzed and categorized into four classes: detection box-based, segmentation-based, combined detection box- and segmentation-based, and other methods. The fundamental concepts and main algorithmic processes of classical and mainstream methods within these four categories are elaborated, and the usage mechanisms, applicable scenarios, advantages, disadvantages, simulation experimental results, and environment settings of the different methods are summarized, while their interrelationships are clarified. Thereafter, common public datasets and performance evaluation methods for natural scene text detection are introduced. Finally, the major challenges facing current deep learning-based natural scene text detection technology are outlined, and future development directions are discussed.

  • Graphics and Image Processing
    Wenzhuo FAN, Tao WU, Junping XU, Qingqing LI, Jianlin ZHANG, Meihui LI, Yuxing WEI
    Computer Engineering. 2023, 49(9): 217-225. https://doi.org/10.19678/j.issn.1000-3428.0065689

    Traditional deep learning image super-resolution reconstruction networks extract features only at a fixed resolution and cannot integrate advanced semantic information. The challenges include difficulties in integrating advanced semantic information, reconstructing images with specific scale factors, limited generalization capability, and managing an excessive number of network parameters. An arbitrary scale image super-resolution reconstruction algorithm based on multi-resolution feature fusion, termed MFSR, is proposed. In the multi-resolution feature fusion encoding phase, a multi-resolution feature extraction module is designed to extract features at different resolutions. A dual attention module is constructed to enhance the feature extraction ability of the network. An information-rich fused feature map is obtained through full interaction among the features of different resolutions. In the image reconstruction phase, the fused feature map is decoded by a multilayer perceptron to produce a super-resolution image at any scale. In tests on the Set5 dataset with scaling factors of 2, 3, 4, 6, and 8, the Peak Signal-to-Noise Ratio (PSNR) values of the proposed algorithm were 38.62, 34.70, 32.41, 28.96, and 26.62 dB, respectively. The model has 0.72×10^6 parameters, which significantly reduces the parameter count while maintaining reconstruction quality and realizing super-resolution image reconstruction at any scale. Furthermore, the model achieves better performance than mainstream algorithms, such as SRCNN, VDSR, and EDSR.
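The PSNR figures quoted above can be read against the standard definition, PSNR = 10·log10(MAX² / MSE). The following is a minimal sketch of that formula on flat lists of pixel values; the function name `psnr` and the 8-bit default `max_value` are assumptions for illustration, and this is not the authors' evaluation code.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists.

    `max_value` is the largest possible pixel value (255 for 8-bit images,
    assumed here for illustration).
    """
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, halving the mean squared error raises PSNR by about 3 dB.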

  • Artificial Intelligence and Pattern Recognition
    LI Jingcan, XIAO Cuilin, QIN Xiaoting, XIE Xia
    Computer Engineering. 2024, 50(4): 87-94. https://doi.org/10.19678/j.issn.1000-3428.0068501
    Relation extraction is a basic and important task that aims to extract the relations between entities from unstructured text. Recent developments show that Large Language Models (LLMs) and foundation models can improve the performance of several Natural Language Processing (NLP) tasks. These models utilize the language-representation ability of deep-learning and pre-training models and can automatically learn the semantic features of relations. However, how to effectively use a large model to solve the problems of entity overlap and unsatisfactory information exchange remains unclear. Hence, a relation-extraction model based on a large language model is proposed. First, the Large Language Model Meta AI (LLaMA) is adapted to the task in this study via fine-tuning. To extract relations, the self-attention mechanism is used to enhance the correlation between entity pairs and information sharing between entities. Subsequently, average pooling is performed to generalize an entire sentence. A filtering matrix is designed for entity pairs, part-of-speech information is introduced to enhance semantics, and invalid triples are filtered out based on the relevance of entity pairs in the filtering matrix. Experimental results show that the F1 values of the proposed model on the New York Times (NYT) and WebNLG open datasets are 93.1% and 90.4%, respectively. When the fine-tuned LLaMA model is used as the encoder, the proposed algorithm is superior to the baseline model in terms of accuracy and F1 value, thus verifying its effectiveness.
  • Research Hotspots and Reviews
    Haoyang LI, Xiaowei HE, Bin WANG, Hao WU, Qi YOU
    Computer Engineering. 2024, 50(2): 43-50. https://doi.org/10.19678/j.issn.1000-3428.0066399

    Load prediction is an essential part of cloud computing resource management. Accurate prediction of cloud resource usage can improve cloud platform performance and prevent resource wastage. However, the dynamic and mutative use of cloud computing resources makes load prediction difficult, and managers cannot allocate resources reasonably. In addition, although Informer has achieved better results in time-series prediction, it does not impose restrictions on the causal dependence of time, causing future information leakage. Moreover, it does not consider the increase in network depth leading to model performance degradation. A multi-step load prediction model based on an improved Informer, known as Informer-DCR, is proposed. The regular convolution between attention blocks in the encoder is replaced by dilated causal convolution, such that the upper layer in the deep network can receive a wider range of input information to improve the prediction accuracy of the model, and ensure the causality of the time-series prediction process. Simultaneously, the residual connection is added to the encoder, such that the input information of the lower layer of the network is directly transmitted to the subsequent higher layer, and the deep network degradation is solved to improve the model performance. The experimental results demonstrate that compared with the mainstream prediction models such as Informer and Temporal Convolutional Network(TCN), the Mean Absolute Error(MAE) of the Informer-DCR model is reduced by 8.4%-40.0% under different prediction steps, and Informer-DCR exhibits better convergence than Informer during the training process.
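The two encoder changes described above, a dilated causal convolution and a residual connection, can be illustrated on a one-dimensional series. This is a minimal pure-Python sketch under stated assumptions: the function names, the single-channel input, and the zero padding on the left are illustrative choices, not the Informer-DCR implementation.

```python
def dilated_causal_conv1d(x, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * x[t - k * dilation].

    Positions before the start of the series are treated as zero, so
    y[t] never depends on any x[s] with s > t (causality).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += weight * x[idx]
        out.append(acc)
    return out


def residual_block(x, kernel, dilation=1):
    """Add the layer input back to the convolution output."""
    y = dilated_causal_conv1d(x, kernel, dilation)
    return [xi + yi for xi, yi in zip(x, y)]
```

With a kernel of size 2 and dilation 2, each output mixes the current value with the value two steps earlier, so stacking layers with growing dilation widens the receptive field without looking at future time steps.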

  • Development Research and Engineering Application
    Lumeng CHEN, Yanyan CAO, Min HUANG, Xingang XIE
    Computer Engineering. 2023, 49(8): 291-301, 309. https://doi.org/10.19678/j.issn.1000-3428.0065025

    The existing image-based flame detection approach finds it challenging to balance real-time performance and precision, and it is incapable of accurately identifying small flame targets, making it ineffective for application situations such as small fire extinguishing. In terms of real-time detection, the YOLOv5 algorithm provides significant benefits over conventional techniques. A real-time flame detection method based on improved YOLOv5 is proposed to increase flame detection accuracy. First, to help the model locate the flame features more accurately, a coordinate attention mechanism module is embedded in the feature extraction portion of the YOLOv5 model. This module can reduce feature redundancy without sacrificing the feature information. Second, to help the model successfully obtain flame features with a receptive field smaller than 8×8 pixels, a detection layer specifically designed for small flame targets is added to the feature fusion portion of the algorithm along with the corresponding feature extraction and feature fusion modules. Finally, to increase the model's speed of convergence and robustness to small datasets, α-CIoU is employed as a new bounding box loss function in the computation phase of the loss function. Additionally, model pretraining and transfer learning techniques are used to initialize the weight parameters of each layer structure of the flame detection model to prevent the gradient from dissipating and enhance the training effect. According to the experimental findings, the proposed flame detection model shows an accuracy rate of 96.6%, which is 7.4 percentage points higher than that of the original YOLOv5 model. Additionally, the detection speed of this model is 68 frames/s, and its size is only 15.4 MB. On the basis of significantly improving accuracy, it can also meet the requirements of firefighting robots for real-time and lightweight flame detection.
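The α-CIoU loss mentioned above builds on the power form of the IoU loss, 1 − IoU^α. The sketch below covers only that base term for axis-aligned boxes; the CIoU center-distance and aspect-ratio penalties are deliberately omitted, and the function names and the default `alpha=3.0` are assumptions for illustration rather than the paper's configuration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def alpha_iou_loss(box_pred, box_true, alpha=3.0):
    """Base alpha-IoU term: 1 - IoU ** alpha (CIoU penalty terms omitted)."""
    return 1.0 - iou(box_pred, box_true) ** alpha
```

With `alpha` greater than 1, the gradient of the loss with respect to IoU grows as IoU approaches 1, so well-localized boxes receive relatively more weight than under the plain IoU loss.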

  • Artificial Intelligence and Pattern Recognition
    Zhite WANG, Liping LUO, Yikui LIAO
    Computer Engineering. 2024, 50(8): 86-101. https://doi.org/10.19678/j.issn.1000-3428.0068483

    To satisfy the performance requirements for robot path planning, an algorithm integrating improved A* algorithm and improved Dynamic Window Approach(DWA) is proposed, which shortens the path length and improves the searching efficiency and path smoothness. To combat the challenges of the traditional A* algorithm in complex scenarios, a new heuristic function is designed based on Manhattan distance and the diagonal distance. The weights are assigned dynamically, and the global shortest path and the least searching time are obtained. Next, an improved search strategy based on the 8-neighborhood is proposed, which involves dynamically assigning the optimal search direction to the current node, thus improving the searching efficiency and reducing the time consumption compared to the traditional 8-neighborhood 8-direction search method. Subsequently, the Floyd algorithm is employed to remove redundant nodes, reduce the steering times, and shorten the path distance. Additionally, the traditional DWA faces certain challenges; for instance, the path is not globally optimal, the path planning may fail, or the path length may increase. To solve these problems, a keypoint densification strategy is proposed to modify the deflective path. Finally, the proposed improved A* algorithm and fusion algorithm are compared with existing methods. The simulation results show that the improved A* algorithm can generate the shortest global path in complex environments, reducing the average steering time by 16.3% and shortening the average path searching time by 55.66%. For the fused algorithm, the average path length and average runtime shorten by 6.1% and 14.7% in the temporary obstacle environment, respectively, and shorten by 1.6% and 39.8%, respectively, in the moving obstacle environment.
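A heuristic built from the Manhattan and diagonal distances, as described above, can be sketched as a weighted blend of the two. The abstract does not give the dynamic weighting rule, so the `weight` parameter below is a hypothetical stand-in that the caller supplies; the function names are likewise illustrative.

```python
import math


def manhattan(a, b):
    """Manhattan distance between grid cells a=(x, y) and b=(x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def diagonal(a, b):
    """Diagonal (octile) distance: straight moves cost 1, diagonal moves sqrt(2)."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return (dx + dy) + (math.sqrt(2.0) - 2.0) * min(dx, dy)


def blended_heuristic(a, b, weight):
    """weight * Manhattan + (1 - weight) * diagonal, with weight in [0, 1].

    How the weight is chosen per node is an assumption left to the caller.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * manhattan(a, b) + (1.0 - weight) * diagonal(a, b)
```

On an 8-neighborhood grid where diagonal moves cost √2, the Manhattan distance can overestimate the true cost, so a larger `weight` trades optimality guarantees for a more aggressive, faster search.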

  • Graphics and Image Processing
    Xinlu JIANG, Tianen CHEN, Cong WANG, Chunjiang ZHAO
    Computer Engineering. 2024, 50(1): 232-241. https://doi.org/10.19678/j.issn.1000-3428.0067030

    Intelligent pest detection is an essential application of target detection technology in the agricultural field. This detection method effectively improves the efficiency and reliability of pest detection and reporting work and ensures crop yield and quality. Under fixed-trapping devices such as insect traps and sticky insect boards, the image background is simple, the lighting conditions are stable, and the pest features are significant and easy to extract. Pest detection can achieve high accuracy, but its application scenario is fixed, and the detection range is limited to the surrounding equipment and cannot adapt to complex field environments. A small object pest detection model called Pest-YOLOv5 is proposed to improve the flexibility of pest detection and prediction to address the difficulties and missed detections attributed to complex image backgrounds and small pest sizes in field environments. By adding a Coordinate Attention(CA) mechanism in the feature extraction network and combining spatial and channel information, the ability to extract small object pest features is enhanced. The Bidirectional Feature Pyramid Network(BiFPN) structure is used in the neck connection section, and multi-scale features are combined to alleviate the problem of small object information loss caused by multiple convolutions. Based on this, SIoU and VariFocal loss functions are used to calculate losses, and the optimal classification loss weight coefficients are obtained experimentally, making the model more focused on object samples that are difficult to classify. The experimental results on a subset of the publicly available dataset, AgriPest, show that the Pest-YOLOv5 model has mAP0.5 and recall of 70.4% and 67.8%, respectively, which are superior to those of classical object detection models, such as the original YOLOv5s model, SSD, and Faster R-CNN. Compared with the YOLOv5s model, the Pest-YOLOv5 model improves the mAP0.5, mAP0.50∶0.95, and recall by 8.1%, 7.9%, and 12.8%, respectively, enhancing the ability to detect targets.

  • Graphics and Image Processing
    ZHAO Nannan, GAO Feichen
    Computer Engineering. 2025, 51(1): 198-207. https://doi.org/10.19678/j.issn.1000-3428.0068677

    An instance segmentation algorithm (DE-YOLO) based on the improved YOLOv8 is proposed. To decrease the effect of complex backgrounds in the images, efficient multiscale attention is introduced, and cross-dimensional interaction ensures an even spatial feature distribution within each feature group. In the backbone network, a deformable convolution using DCNv2 is combined with a C2f convolutional layer to overcome the limitations of traditional convolutions and increase flexibility. This is performed to reduce harmful gradient effects and improve the overall accuracy of the detector. The dynamic nonmonotonic Wise-Intersection-over-Union (WIoU) focusing mechanism is employed instead of the traditional Complete Intersection-over-Union (CIoU) loss function to evaluate the quality, optimize detection frame positioning, and improve segmentation accuracy. Meanwhile, Mixup data enhancement processing is enabled to enrich the training features of the dataset and improve the learning ability of the model. The experimental results demonstrate that DE-YOLO improves the mean Average Precision of mask(mAPmask) and mAPmask@0.5 by 2.0 and 3.2 percentage points compared with the benchmark model YOLOv8n-seg in the Cityscapes dataset of urban landscapes, respectively. Furthermore, DE-YOLO maintains an excellent detection speed and small parameter quantity while exhibiting improved accuracy, with the model requiring 2.2-31.3 percentage points fewer parameters than similar models.
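The Mixup augmentation mentioned above blends two training samples and their labels with a single mixing coefficient. The sketch below shows the generic classification-style form on flat lists; how masks and boxes are combined in a YOLOv8 segmentation pipeline is not covered, and the function names and the `alpha` default are assumptions for illustration.

```python
import random


def mixup(x1, y1, x2, y2, lam):
    """Blend two samples: lam * first + (1 - lam) * second, for inputs and labels."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def sample_lambda(alpha=0.2, rng=random):
    """Draw the mixing coefficient from Beta(alpha, alpha); alpha is illustrative."""
    return rng.betavariate(alpha, alpha)
```

A typical call draws `lam = sample_lambda()` once per pair and passes one-hot label vectors as `y1` and `y2`, so the blended label keeps the same proportions as the blended input.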

  • Artificial Intelligence and Pattern Recognition
    Lai QIAN, Weiwei ZHAO
    Computer Engineering. 2024, 50(7): 104-111. https://doi.org/10.19678/j.issn.1000-3428.0068132

    Text classification is a basic task in the field of natural language processing and plays an important role in information retrieval, machine translation, sentiment analysis, and other applications. However, most deep learning models do not fully consider the rich information in training instances during inference, resulting in inadequate text feature learning. To leverage training instance information fully, this paper proposes a text classification method based on contrastive learning and attention mechanism. First, a supervised contrastive learning training strategy is designed to optimize the retrieval of text vector representations, thereby improving the quality of the retrieved training instances during the inference process. Second, an attention mechanism is constructed to learn the attention distribution of the obtained training text features, focusing on adjacent instance information with stronger relevance and capturing more implicit similarity features. Finally, the attention mechanism is combined with the model network, fusing information from adjacent training instances to enhance the ability of the model to extract diverse features and achieve global and local feature extraction. The experimental results demonstrate that this method achieves significant improvements on various models, including Convolutional Neural Network(CNN), Bidirectional Long Short-Term Memory(BiLSTM), Graph Convolutional Network(GCN), Bidirectional Encoder Representations from Transformers(BERT), and RoBERTa. For the CNN model, the macro F1 value is increased by 4.15, 6.2, and 1.92 percentage points for the THUCNews, Toutiao, and Sogou datasets, respectively. Therefore, this method provides an effective solution for text classification tasks.

  • Smart Education
    Huiqian LI, Baichang ZHONG
    Computer Engineering. 2024, 50(7): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0069539

    The deep integration of knowledge graphs with education has promoted the development of smart education. However, there is a lack of literature on educational knowledge graphs currently, necessitating its improvement with regard to research normativity and content perspective. Four conclusions are presented from a systematic literature review of 55 important Chinese journal articles from the previous decade. First, the development of educational knowledge graphs requires five key technologies: ontology construction, knowledge extraction, knowledge representation, knowledge fusion, and knowledge reasoning. Deep learning methods are becoming a popular research topic in this context. Second, in the context of applicability, the educational knowledge graphs cover six application scenarios: personalized learning recommendations, intelligent Question-Answering (Q&A), teaching resource management, intelligent search, intelligent learning diagnosis, and classroom teaching analysis, and the horizon of applications is continuously expanding. Third, regarding application effects, the educational knowledge graphs promote personalized learning and fragmented ubiquitous learning of students while improving their learning performance as well as professionalism of teachers. Fourth, the education knowledge graphs suffer from several problems and challenges, such as single data modality, lack of quality datasets, low level of automation and borderline technology, high level of difficulty in knowledge modeling, insufficient competence care, lack of interoperability standards, and low rate of educational adoption. Hence, for further insight into the study, future research should refine the theory and establish standards, optimize techniques, achieve accurate modeling, and strengthen applications and lifting effects.

  • Development Research and Engineering Application
    Xinyi ZHANG, Fei ZHANG, Bin HAO, Lu GAO, Xiaoying REN
    Computer Engineering. 2023, 49(8): 265-274. https://doi.org/10.19678/j.issn.1000-3428.0065701

    In dense crowd scenes in public places, face mask wearing detection algorithms have poor detection results because of missing information caused by target occlusion and the problems of small detection targets and low resolution. To improve the detection accuracy and speed of the model as well as to reduce the hardware footprint, an improved mask wearing detection algorithm based on YOLOv5s is proposed. The conventional convolution is replaced with Ghost-Shadowed wash Convolution(GSConv), combining Standard Convolution(SConv) and Depth-Wise separable Convolution(DWConv) with channel blending, thereby improving the network speed with guaranteed accuracy. The nearest neighbor upsampling method is replaced with a lightweight universal upsampling operator to make full use of the semantic feature information. Adaptive Spatial Feature Fusion(ASFF) is added at the end of the neck layer of the improved YOLOv5s model, which allows better fusion of features at different scales and improves the network detection accuracy. In addition, adaptive image sampling is used to alleviate the problem of data imbalance. Mosaic data enhancement is used to make full use of small targets. Experimental results show that the model achieves a mean Average Precision(mAP) value of 93% on the AIZOO dataset, a 2 percentage points improvement over the original YOLOv5 model. It achieves 97.7% detection accuracy for faces wearing masks and outperforms the detection results of the YOLO series, SSD, and RetinaFace in the same situation. It also runs on a GPU with a 16.7 percentage points inference speedup. The model weights file uses 23.5 MB memory for real-time mask wearing detection.

  • Research Hotspots and Reviews
    Baihao JIANG, Jing LIU, Dawei QIU, Liang JIANG
    Computer Engineering. 2024, 50(3): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0067502

    Deep learning algorithms have the advantages of strong learning, strong adaptive, and unique nonlinear mapping abilities in spinal image segmentation. Compared with traditional segmentation methods, they can better extract key information from spinal images and suppress irrelevant information, which can assist doctors in accurately locating focal areas and realizing accurate and efficient segmentation. The application status of deep learning in spinal image segmentation is summarized and analyzed as concerns deep learning algorithms, types of spinal diseases, types of images, experimental segmentation results, and performance evaluation indicators. First, the background of the deep learning model and spinal image segmentation is described, and thereafter, the application of deep learning in spinal image segmentation is introduced. Second, several common types of spinal diseases are introduced, the difficulties in image segmentation are described, and common open datasets, image segmentation method flow, and image segmentation evaluation indicators are introduced in spinal image segmentation. Combined with specific experiments, the application progress of the Convolutional Neural Network(CNN) model, the U-Net model, and their improved models in the image segmentation of vertebrae, intervertebral discs, and spinal tumors are summarized and analyzed. Combined with previous experimental results and the current research progress of deep learning models, this paper summarizes the limitations of current clinical studies and the reasons for the insufficient segmentation effect, and proposes corresponding solutions to the existing problems. Finally, prospects for future studies and development are proposed.

  • Research Hotspots and Reviews
    Jian CAO, Yimei CHEN, Haisheng LI, Qiang CAI
    Computer Engineering. 2023, 49(10): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0065984

    Small target detection in complex road scenes can improve the vehicle's perception of the surrounding environment. Thus, it is an important research direction in the field of computer vision and intelligent transportation. With the development of deep learning technology, a combination of deep learning and small target detection on roads can effectively improve detection accuracy, allowing the vehicle to quickly respond to the surrounding environment. Starting with the latest classic research results in small target detection, this research provides two definitions for small targets and analyzes the reasons for the difficulty encountered in small target detection on roads. Subsequently, five types of optimization methods based on deep learning are expounded upon to improve detection accuracy of small targets on roads. The optimization methods include enhanced data, multi-scale strategy, generated Super-Resolution(SR) detail information, strengthened contextual information connection and improved loss function. The core ideas of various methods and the latest research progress at home and abroad are summarized. Large and public datasets commonly used in road small target detection are introduced along with corresponding indicators to evaluate the performance of small target detection. In comparing and analyzing the performance detection results of various methods on different datasets, this research presents the current research on road small target and associated problems, looking forward to future research directions from multiple perspectives.

  • Research Hotspots and Reviews
    HUANG Kaiji, YANG Hua
    Computer Engineering. 2024, 50(10): 16-34. https://doi.org/10.19678/j.issn.1000-3428.0068580

    The objective of image matching is to establish correspondences between similar structures across two or more images. This task is fundamental to computer vision, with applications in robotics, remote sensing, and autonomous driving. With the advancements in deep learning in recent years, Two-Dimensional (2D) image matching algorithms based on deep learning have seen regular improvements in feature extraction, description, and matching. The performance of these algorithms in terms of matching accuracy and robustness has surpassed that of traditional algorithms, leading to significant advancements. First, this study summarizes 2D image matching algorithms based on deep learning features from the past ten years and categorizes them into three types: two-stage image matching based on local features, image matching of joint detection and description, and image matching without feature detection. Second, the study details the development processes, classification methods, and performance evaluation metrics of these three categories and summarizes their advantages and limitations. Typical application scenarios of 2D image matching algorithms are then introduced, and the effects of research progress in 2D image matching on its application domains are analyzed. Finally, the study summarizes the development trends of 2D image matching algorithms and discusses future prospects.

  • Artificial Intelligence and Pattern Recognition
    ZHANG Hongchen, LI Linyu, YANG Li, SAN Chenjun, YIN Chunlin, YAN Bing, YU Hong, ZHANG Xuan
    Computer Engineering. 2024, 50(4): 168-176. https://doi.org/10.19678/j.issn.1000-3428.0067543
    A knowledge graph is a structured knowledge base comprising various types of knowledge or data units obtained through extraction and other processes. It is used to describe and represent information, such as entities, concepts, facts, and relationships. The limitations of Natural Language Processing(NLP) technology and the presence of noise in the texts of various knowledge or information units affect the accuracy of information extraction. Existing Knowledge Graph Completion(KGC) methods typically account for only single structural information or text semantic information, whereas the structural and text semantic information in the entire knowledge graph is disregarded. Hence, a KGC model based on contrastive learning and language model-enhanced embedding is proposed. The input entities and relationships are encoded using a pretrained language model to obtain their textual semantic information. The distance scoring function of the translation model is used to capture the structured information in the knowledge graph. Two negative sampling methods are combined with contrastive learning during training to improve the model's ability to represent positive and negative samples. Experimental results show that compared with the Bidirectional Encoder Representations from Transformers for Knowledge Graph completion(KG-BERT) model, this model improves the average proportion of triples with ranking less than or equal to 10(Hits@10) indicator by 31% and 23% on the WN18RR and FB15K-237 datasets, respectively, thus demonstrating its superiority over other similar models.
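The two quantities named above, a translation-model distance score and the Hits@10 metric, can be sketched briefly. The abstract does not say which translation model is used, so the TransE-style L1 score below is an assumption, and both function names are illustrative.

```python
def translation_distance(head, relation, tail):
    """TransE-style score (assumed): L1 norm of head + relation - tail.

    A smaller distance means the triple is considered more plausible.
    """
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def hits_at_k(ranks, k=10):
    """Fraction of test triples whose correct entity is ranked at or above k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for rank in ranks if rank <= k) / len(ranks)
```

For example, ranks of 1, 5, 11, and 20 give a Hits@10 of 0.5, because two of the four correct entities fall within the top 10.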
  • Development Research and Engineering Application
    Xiangquan GUI, Shiqing LIU, Li LI, Qingsong QIN, Tangyan LI
    Computer Engineering. 2024, 50(7): 342-351. https://doi.org/10.19678/j.issn.1000-3428.0068125

    The TAPDataset pedestrian detection dataset is used in this study to address the issues of low detection accuracy, large number of algorithm parameters, and limitations of existing public datasets for small target detection in current scenic pedestrian detection. This dataset addresses the deficiencies of existing datasets regarding small target detection. Based on the YOLOv8 algorithm, a new model with high detection accuracy and low hardware requirements, called YOLOv8-L, is proposed. First, the lightweight convolution module DepthSepConv is introduced to reduce the number of parameters and computations of the model. Second, the BiFormer attention mechanism and CARAFE upsampling operator are used to enhance the model's semantic understanding of images and information fusion capability, significantly improving detection accuracy. Finally, a small target detection layer is added to extract more shallow features, effectively improving the model's performance for small target detection. The effectiveness of the algorithm is verified using the TAPDataset, VOC 2007, and TAP+VOC datasets. The experimental results show that compared with YOLOv8, the number of model parameters is reduced by 18.06% on the TAPDataset with unchanged FPS, mAP@0.5 improves by 5.51%, and mAP@0.5∶0.95 improves by 6.03%. On the VOC 2007 dataset, the number of parameters is reduced by 13.6%, with mAP@0.5 improving by 3.96% and mAP@0.5∶0.95 improving by 6.39%. On the TAP+VOC dataset, the number of parameters is reduced by 14.02%, with mAP@0.5 improving by 4.49% and mAP@0.5∶0.95 improving by 5.68%. The improved algorithm demonstrates stronger generalization performance and can be better applied to scenic pedestrian detection tasks.
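The parameter savings from a depthwise separable convolution, the idea behind the lightweight module mentioned above, follow from a simple count. This sketch compares weight counts for a standard convolution and a depthwise-plus-pointwise pair, ignoring biases and normalization layers; the function names are illustrative and the numbers are not taken from YOLOv8-L.

```python
def standard_conv_params(kernel, in_channels, out_channels):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * in_channels * out_channels


def depthwise_separable_params(kernel, in_channels, out_channels):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = kernel * kernel * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise
```

For a 3×3 layer mapping 32 channels to 64, the standard form needs 18,432 weights while the separable form needs 2,336, roughly an eightfold reduction for that layer.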

  • Research Hotspots and Reviews
    Ying LIU, Yupeng MA, Fan ZHAO, Yi WANG, Tonghai JIANG
    Computer Engineering. 2024, 50(1): 39-49. https://doi.org/10.19678/j.issn.1000-3428.0067004

    Hyperledger Fabric is an alliance chain framework widely adopted both domestically and internationally. It exhibits characteristics such as numerous participating organizations, frequent transaction operations, and increased transaction conflicts in certain businesses based on Fabric technology. The multi-version concurrency control technology used in Fabric can partially resolve transaction conflicts as well as enhance system concurrency. However, this mechanism is imperfect and certain transaction data cannot be properly stored on the chain. To achieve complete, efficient, and trustworthy up-chain storage of massive transaction data, a data preprocessing mechanism based on the Fabric oracle machine is proposed. The Massive Conflict Preprocessing(MCPP) method is designed to ensure the integrity of transaction data with primary key conflicts through techniques including detection, monitoring, delayed submission, transaction locking, and reordering caching. Data transmission protection measures are introduced to utilize asymmetric encryption technology during transmission, preventing malicious nodes from forging authentication information and ensuring consistency before and after off-chain processing of transaction data. Theoretical analysis and experimental results demonstrate that this mechanism can effectively address concurrent conflict issues regarding up-chain massive transaction data in alliance chain platforms. When the transaction data scales reach 1 000 and 10 000, the MCPP method achieves time efficiency improvements of 38% and 21.4%, respectively, compared with the LMLS algorithm, with a success rate close to 100%. Thus, the proposed method exhibits efficiency and security, and does not impact Fabric system performance when concurrent conflicts do not occur.

  • Intelligent Transportation
    Wei CHEN, Xiaolong WANG, Yanwei ZHANG, Guocheng AN, Bo JIANG
    Computer Engineering. 2024, 50(4): 11-19. https://doi.org/10.19678/j.issn.1000-3428.0068901

    In highway service areas, complex environments such as lighting and weather changes can cause a sharp decline in vehicle detection accuracy. In addition, factors such as the inclination angle of the camera and the height of installation can increase false-negative and false-positive rates. To this end, a vehicle violation detection algorithm based on the improved YOLOv8 is proposed for highway service areas. First, the feature pyramid pooling layer of the YOLOv8 network, a Dilated Space Pyramid Pooling(DSPP) module, and a DSPP based on branch Attention(DSPPA) module are constructed to reduce the loss of semantic information in the backbone. The Branch Attention(BA) mechanism in DSPPA assigns different weights to the branches with varying degrees of contribution, making the model focus more on features that are suitable for the target size. Second, a parking space allocation strategy based on global matching is designed to effectively reduce the false-negative and false-positive rates of illegal parking detection in situations involving tilted views and overlapping vehicles. The experimental results show that the improved algorithm reduces the false-negative rate of parking violation detection from 15% to 8% and the false-positive rate from 7.5% to 6.1%, demonstrating considerable performance improvement in vehicle violation detection.

  • Research Hotspots and Reviews
    LI Shuo, ZHAO Chaoyang, QU Yinxuan, LUO Yaping
    Computer Engineering. 2024, 50(12): 33-47. https://doi.org/10.19678/j.issn.1000-3428.0068276

    Fingerprint recognition is one of the earliest and most mature biometric recognition technologies. It is widely used in the civilian field for mobile payments, access control, and attendance, and in criminal investigation to retrieve clues about suspects. Recently, deep learning has achieved excellent results in biometric recognition and has given fingerprint researchers new methods for automatic processing and for applying fused features to represent fingerprints effectively at all stages of the fingerprint recognition process. This paper outlines the development history and application background of fingerprint recognition; describes the main processing steps of its three stages, namely image preprocessing, feature extraction, and fingerprint matching; summarizes the application status of deep learning in specific tasks at each stage; and compares the advantages and disadvantages of different deep neural networks in tasks such as image segmentation, image enhancement, direction field estimation, minutiae extraction, and fingerprint matching. Finally, current problems and challenges in fingerprint recognition are analyzed, and future development directions are discussed, such as building public fingerprint datasets, multi-scale fingerprint feature extraction, and training end-to-end fingerprint recognition models.

  • Graphics and Image Processing
    Xianguo LI, Bin LI
    Computer Engineering. 2023, 49(9): 226-233, 245. https://doi.org/10.19678/j.issn.1000-3428.0065513

    Convolutional Neural Networks(CNNs), when applied alone to image deblurring, are limited by their restricted receptive fields. The Transformer can effectively mitigate this limitation. However, its computational complexity increases quadratically with the spatial resolution of the input image. Therefore, this study proposes an image deblurring network based on the Transformer and a multi-scale CNN, called T-MIMO-UNet. The multi-scale CNN is used to extract spatial features, while the global features of the Transformer are employed to capture long-range pixel information. A locally enhanced Transformer module, a local Multi-Head Self-Attention(MHSA) computing network, and an Enhanced Feed-Forward Network(EFFN) are designed. MHSA is computed block by block using a windowing approach. The information interaction between different windows is enhanced by increasing the depth of the separable convolution layer. Experiments on the GoPro test dataset demonstrate that the Peak Signal-to-Noise Ratio(PSNR) of T-MIMO-UNet is 0.39 dB, 2.89 dB, 3.42 dB, and 1.86 dB higher than those of the MIMO-UNet, DeepDeblur, DeblurGAN, and SRN networks, respectively. Additionally, the number of parameters is reduced by half compared to MPRNet. These findings show that T-MIMO-UNet effectively addresses the challenge of image blurring in dynamic scenes.
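
    The quadratic growth mentioned above, and the saving from window-based attention, can be checked with simple arithmetic. The sketch below only counts query-key pairs for one attention head; the function name and the non-overlapping window layout are assumptions for illustration and are not taken from the paper.

```python
def attention_pairs(height, width, window=None):
    """Number of query-key pairs scored by self-attention on a feature map.

    window=None  -> global attention over all height*width tokens.
    window=w     -> attention restricted to non-overlapping w x w windows.
    """
    tokens = height * width
    if window is None:
        return tokens * tokens
    if height % window or width % window:
        raise ValueError("height and width must be multiples of window")
    per_window = (window * window) ** 2
    return (height // window) * (width // window) * per_window


if __name__ == "__main__":
    for size in (64, 128):
        g = attention_pairs(size, size)
        w = attention_pairs(size, size, window=8)
        print(size, g, w, g // w)
```

    Doubling the side length multiplies the global count by 16 but the windowed count by only 4, since the windowed cost grows linearly with the number of tokens.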

  • Mobile Internet and Communication Technology
    Linghui KONG, Zheheng RAO, Yanyan XU, Shaoming PAN
    Computer Engineering. 2023, 49(9): 199-207, 216. https://doi.org/10.19678/j.issn.1000-3428.0066301

    Intelligent routing based on Deep Reinforcement Learning(DRL) has become an important development direction for intelligent routing algorithms because it combines the perception ability of deep learning with the decision-making ability of reinforcement learning. However, existing DRL-based intelligent routing algorithms cannot adapt to the dynamically changing network topology in wireless networks, making it difficult to make appropriate routing decisions. To address this issue, this paper proposes an intelligent routing algorithm called MPNN-DQN, which combines the Message Passing Neural Network(MPNN) and DRL. MPNN-DQN uses MPNN to learn irregular network topology, enabling it to make effective decisions even when the network topology changes dynamically. Moreover, a hop-by-hop routing generation method based on k-order neighbor information aggregation is designed to improve the scalability of the algorithm while ensuring decision-making effectiveness; thus, the algorithm can be better applied to medium- to large-sized network topologies. Experimental results show that compared to routing algorithms such as GCN, DRSIR, and DQN, MPNN-DQN achieves superior average latency, packet loss rate, and network throughput. In three different network scenarios, Germany, GBN, and synth50, the throughput of the proposed algorithm is improved by 3.27%-23.03%, and the algorithm shows strong adaptability to dynamic network topologies.
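
    As a rough illustration of k-order neighbor information aggregation, the sketch below collects the nodes within k hops of a router by breadth-first search and averages a scalar feature over them. The mean is a stand-in for the learned message-passing functions of an MPNN, and the adjacency-dictionary format and function names are assumptions, not the paper's implementation.

```python
from collections import deque


def k_hop_neighbors(adj, source, k):
    """Nodes reachable from source in at most k hops (source excluded)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {n for n in dist if n != source}


def aggregate(adj, features, source, k):
    """Mean feature over the source and its k-hop neighborhood."""
    nodes = k_hop_neighbors(adj, source, k) | {source}
    return sum(features[n] for n in nodes) / len(nodes)


if __name__ == "__main__":
    adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    load = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 10.0}
    print(k_hop_neighbors(adj, "a", 2), aggregate(adj, load, "a", 2))
```

    Because each node only looks k hops out, the per-decision cost depends on the local neighborhood rather than on the size of the whole topology.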

  • Research Hotspots and Reviews
    Meiguang ZHENG, Yong YANG
    Computer Engineering. 2023, 49(8): 20-28. https://doi.org/10.19678/j.issn.1000-3428.0066689

    Federated learning is a distributed machine learning technique for collaboratively training machine learning models for multiple clients while protecting the privacy of client data. However, the heterogeneity inherent in client data limits the full application potential of federated learning, for which personalized federated learning is a viable solution. Traditional clustering-based personalized federated learning schemes group clients with the same data distribution into one cluster, exploiting the homogeneous nature of some client data and reducing the impact of data heterogeneity on federated learning; however, this approach fails to account for the possibility of clients belonging to multiple clusters. Based on the concept that client data approximately adhere to multiple data distributions, a personalized Federated learning algorithm based on Mutual information and Soft clustering(pFedMS) is proposed. A mutual information formula based on model features is introduced to address the shortcomings of current federated learning client clustering indices, which cannot accurately reflect the similarity of model features. This formula serves as a clustering index that effectively distinguishes similar clients. A clustering rationality measurement method based on intra-class and inter-class distances is proposed to dynamically adjust the clustering results. The similarity between clients and clusters is calculated using affiliation, which allows clients to belong to multiple clusters simultaneously and improves the performance of the clustering algorithm. Experimental results on the CIFAR-10 and Fashion-MNIST(FMNIST) datasets show that pFedMS improves the Best Mean Testing Accuracy(BMTA) of clients by 2.4 to 3.0 percentage points compared to comparison algorithms such as FedAvg and CFL.
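
    Soft clustering, in which one client may belong to several clusters, can be sketched with normalized affiliation scores. The abstract does not give the affiliation formula, so this example simply assumes non-negative client-to-cluster similarities, normalizes them to sum to one, and keeps every cluster whose share reaches a hypothetical threshold.

```python
def memberships(similarities):
    """Normalize non-negative client-to-cluster similarities into shares."""
    total = sum(similarities)
    if total <= 0:
        raise ValueError("at least one similarity must be positive")
    return [s / total for s in similarities]


def assigned_clusters(similarities, threshold=0.3):
    """Indices of all clusters whose membership share reaches the threshold.
    The threshold value is an illustrative assumption."""
    return [i for i, m in enumerate(memberships(similarities)) if m >= threshold]


if __name__ == "__main__":
    sims = [0.8, 0.7, 0.1]  # one client's similarity to three clusters
    print(memberships(sims))
    print(assigned_clusters(sims))
```

    With hard clustering this client would be placed only in cluster 0; the soft assignment also keeps cluster 1, whose similarity is nearly as high.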

  • Development Research and Engineering Application
    HU Shuai, LI Hualing, HAO Dechen
    Computer Engineering. 2024, 50(4): 286-293. https://doi.org/10.19678/j.issn.1000-3428.0067779

    Medical image segmentation accuracy plays a key role in clinical diagnosis and treatment. However, because of the complexity of medical images and the diversity of target regions, existing medical image segmentation methods suffer from incomplete segmentation of edge regions and insufficient use of image context feature information. To solve these problems, an improved U-Net with a Multistage Edge-Enhanced(MEE) module, known as the MDU-Net model, is proposed. First, an MEE module is added to the encoder structure to extract double-layer low-stage feature information, and the rich edge information in the feature layer is obtained by dilated convolution blocks with different dilation rates. Second, a Detailed Feature Association(DFA) module integrating the feature information of adjacent layers is embedded in the skip connection to obtain deep-stage and multiscale context feature information. Finally, the feature information extracted from the different modules is aggregated in the corresponding feature layer of the decoder structure, and the final segmentation result is obtained by an upsampling operation. The experimental results on two public datasets show that compared with other models, such as TransUNet, the MDU-Net model can efficiently use the feature information of different feature layers in medical images and achieve an improved segmentation effect in the edge region.

  • Graphics and Image Processing
    Jianwei LI, Xiaoqi LÜ, Yu GU
    Computer Engineering. 2023, 49(10): 239-246, 254. https://doi.org/10.19678/j.issn.1000-3428.0066050

    Skin cancer is one of the deadliest cancers, and it is particularly critical to accurately classify dermoscopy images. However, existing dermoscopy images have complex shapes and a small number of samples, which makes it difficult for existing automatic classification methods to extract image feature information; these methods also have a high error rate. To solve this problem, this paper proposes an improved ConvNeXt method and builds the SE-SimAM-ConvNeXt model. First, with ConvNeXt as the basic network, the SimAM nonparametric attention module is added to improve the network's feature extraction capability. Second, channel attention is added to the basic network to enhance the ability of ConvNeXt to mine potential key features. Finally, the Cosine Warmup mechanism is added at the beginning of training, and the cosine function value is used to attenuate the learning rate during the process, further accelerating the convergence of ConvNeXt and improving the classification ability of the ConvNeXt model. The experimental results on the HAM10000 skin dataset show that the classification accuracy, precision, recall, and specificity of the model reach 92.9%, 85.3%, 78.0%, and 97.5%, respectively, demonstrating effective classification capability for dermoscopy images. The model therefore has significant potential for assisting the diagnosis of skin cancer lesions and for helping dermatologists make accurate diagnoses.
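
    A warmup followed by cosine decay of the learning rate can be written in a few lines. The abstract does not specify the warmup shape or the step counts, so the sketch below assumes a linear warmup and uses placeholder values; `lr_at` is an illustrative name rather than a function from the paper.

```python
import math


def lr_at(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at a given step: linear warmup (an assumption),
    then cosine decay from base_lr down to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 9, 10, 55, 100):
        print(step, round(lr_at(step, total_steps=100, warmup_steps=10, base_lr=0.1), 5))
```

    The small initial rate avoids large early updates, and the smooth cosine tail lets the rate approach its minimum without abrupt drops.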

  • Graphics and Image Processing
    Fangyu FENG, Xiaoshu LUO, Zhiming MENG, Guangyu WANG
    Computer Engineering. 2023, 49(8): 190-198. https://doi.org/10.19678/j.issn.1000-3428.0065224

    Effective features are difficult to extract in facial expression recognition, and the high similarity between categories easily causes confusion, both of which lead to low recognition accuracy. To address these problems, a facial expression recognition method based on an anti-aliasing residual attention network is proposed. First, because the traditional subsampling method can easily cause the loss of discriminative expression features, an anti-aliasing residual network is constructed to improve the feature extraction ability for expression images and enhance the representation of expression features, enabling more effective global facial expression information to be extracted. At the same time, an improved channel attention mechanism and a label smoothing regularization strategy are used to enhance attention to the local key expression regions of the face. The improved channel attention focuses on highly discriminative expression features and suppresses the weight of non-expressive regions, so as to locate more detailed local expression regions in the global information extracted by the network. Label smoothing corrects the prediction probability by increasing the amount of information of the decision-making expression category, avoiding overly absolute prediction results and thereby reducing misjudgment between similar expressions. Experimental results show that the recognition accuracies of this method on the facial expression datasets RAF-DB and FERPlus reach 88.14% and 89.31%, respectively. Compared with advanced methods such as DACT and VTFF, this method has better performance. Compared with the original residual network, the accuracy and robustness of facial expression recognition are effectively improved.
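
    Label smoothing in its common form replaces the one-hot target with a mixture of the one-hot vector and a uniform distribution. The sketch below uses that standard formulation with a placeholder smoothing factor of 0.1 and four classes; the exact variant and hyperparameters used in the paper are not stated in the abstract.

```python
import math


def smooth_labels(true_class, num_classes, eps=0.1):
    """Standard label smoothing: (1 - eps) * one_hot + eps / num_classes."""
    off = eps / num_classes
    return [1.0 - eps + off if i == true_class else off for i in range(num_classes)]


def cross_entropy(target, probs):
    """Cross entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


if __name__ == "__main__":
    hard = [0.0, 0.0, 1.0, 0.0]
    soft = smooth_labels(2, 4)
    overconfident = [0.001, 0.001, 0.997, 0.001]
    print(soft)
    print(cross_entropy(hard, overconfident), cross_entropy(soft, overconfident))
```

    With smoothed targets, an extremely confident prediction is penalized more than under a one-hot target, which discourages overly absolute outputs for similar classes.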

  • Research Hotspots and Reviews
    Halidanmu ABUDUKELIMU, Yutao HOU, Dengfeng YAO, Abudukelimu ABULIZI, Jishang CHEN
    Computer Engineering. 2024, 50(1): 1-16. https://doi.org/10.19678/j.issn.1000-3428.0068124

    As one of the important tasks in China's low-resource machine translation research, the development and application of Uyghur machine translation can better promote cultural exchanges and trade between different regions and ethnic groups. However, Uyghur, as an agglutinative language, poses problems such as complex morphology and a scarce corpus in the field of machine translation. In recent years, at different stages of the development of Uyghur machine translation, researchers have optimized and innovated algorithms and models to address its characteristics and achieved various research results; however, no systematic review has been conducted. This paper comprehensively reviews the related research on Uyghur machine translation and categorizes it into three types according to the methods used: rule- and example-based Uyghur machine translation, statistics-based Uyghur machine translation, and neural network-based Uyghur machine translation. Related academic activities and corpus resources are also summarized. To further explore the potential of Uyghur machine translation, the ChatGPT model is adopted in a preliminary attempt at the Uyghur-Chinese machine translation task. The experimental results show that in the few-shot scenario, the translation performance first increases and then decreases as the number of examples increases, with the best performance at 10-shot. In addition, the chain-of-thought approach does not demonstrate better translation ability in the Uyghur machine translation task. Finally, future research directions for Uyghur machine translation are proposed.

  • Graphics and Image Processing
    Hong ZHAO, Yubo FENG
    Computer Engineering. 2023, 49(12): 194-204. https://doi.org/10.19678/j.issn.1000-3428.0066520

    In tasks involving traffic sign detection, the YOLOv5 detection algorithm encounters several issues in complex environments and road conditions, including missed detections, erroneous detections, and a complex model. To address these challenges, an improved CGS-Ghost YOLO detection model is proposed. YOLOv5 uses the Focus module for sampling, which introduces more parameters. In this study, the StemBlock module replaces the Focus module for sampling after input, which reduces the number of parameters while maintaining accuracy. CGS-Ghost YOLO uses a Coordinate Attention(CA) mechanism, which improves the semantic and location information within the features and enhances the feature extraction ability of the model. Additionally, a CGS convolution module, which combines the SMU activation function with GroupNorm(GN) normalization, is proposed. The CGS convolution module is designed to avoid the influence of the batch size on the model during training and improve model performance. GhostConv is used to reduce the number of model parameters and effectively improve the detection accuracy of the model. The loss function, $\alpha$-CIoU Loss+VFocal Loss, is used to solve the problem of unbalanced positive and negative samples in traffic sign detection tasks and improve the overall performance of the model. The neck uses a Bi-FPN bidirectional feature pyramid network, ensuring that the multi-scale features of the detection target are effectively fused. The results of an experiment on the TT100K traffic sign detection dataset show that the detection accuracy of the improved CGS-Ghost YOLO model reaches 93.1%, which is 11.3 percentage points higher than that of the original model. Additionally, the proposed network model reduces the number of model parameters by 21.2% compared to the original model. In summary, the network model proposed in this study optimizes the convolution layer and the downsampling part, thus considerably reducing the model parameters while enhancing the model detection accuracy.

  • Artificial Intelligence and Pattern Recognition
    YANG Dongju, HUANG Juntao
    Computer Engineering. 2024, 50(9): 113-120. https://doi.org/10.19678/j.issn.1000-3428.0068400

    High-quality annotated data are crucial for Natural Language Processing(NLP) tasks in the field of Chinese scientific literature. A method of annotation based on a Large Language Model(LLM) was proposed to address the lack of high-quality annotated corpora and the issues of inconsistent and inefficient manual annotation in Chinese scientific literature. First, a fine-grained annotation specification suitable for multi-domain Chinese scientific literature was established to clarify entity types and annotation granularity. Second, a structured text annotation prompt template and a generation parser were designed. The annotation task of Chinese scientific literature was set up as a single-stage, single-round question-and-answer process in which the annotation specifications and text to be annotated were filled into the corresponding slots of the prompt template to construct the task prompt. This prompt was then fed into the LLM to generate output text containing annotation information. Finally, the structured annotation data were obtained by the parser. Subsequently, using prompt learning based on the LLM, the Annotated Chinese Scientific Literature(ACSL) entity dataset was generated, which contains 10 000 annotated documents and 72 536 annotated entities distributed across 48 disciplines. For ACSL, three baseline models based on RoBERTa-wwm-ext, a configuration of the Robustly optimized Bidirectional Encoder Representations from Transformers(RoBERTa) approach, were proposed. The experimental results demonstrate that the BERT+Span model performs best on long-span entity recognition in Chinese scientific literature, achieving an F1 value of 0.335. These results serve as benchmarks for future research.
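
    The slot-filling prompt and the parser described above can be sketched without calling any model. Everything in this example is hypothetical: the template wording, the JSON output format, the entity types, and the function names are assumptions chosen for illustration, and the model's reply is represented by a hard-coded string.

```python
import json

# Hypothetical template with two slots: the specification and the text.
PROMPT_TEMPLATE = (
    "Annotation specification:\n{spec}\n\n"
    "Text to annotate:\n{text}\n\n"
    'Return a JSON list of objects with keys "entity" and "type".'
)


def build_prompt(spec, text):
    """Fill the specification and the text into the template slots."""
    return PROMPT_TEMPLATE.format(spec=spec, text=text)


def parse_output(raw, allowed_types):
    """Extract the JSON list from raw model output and keep only
    well-formed entries whose type is in the specification."""
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        items = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
    return [
        (item["entity"], item["type"])
        for item in items
        if isinstance(item, dict)
        and isinstance(item.get("entity"), str)
        and item.get("type") in allowed_types
    ]


if __name__ == "__main__":
    prompt = build_prompt("Types: Method, Dataset", "We evaluate on CIFAR-10.")
    reply = 'Result: [{"entity": "CIFAR-10", "type": "Dataset"}]'  # stand-in reply
    print(prompt)
    print(parse_output(reply, {"Method", "Dataset"}))
```

    Filtering on the allowed types in the parser is one simple way to discard entries that do not conform to the annotation specification.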

  • Development Research and Engineering Application
    Long SUN, Rongfen ZHANG, Yuhong LIU, Tingli RAO
    Computer Engineering. 2023, 49(9): 313-320. https://doi.org/10.19678/j.issn.1000-3428.0065697

    In dense crowd scenarios, dense targets under the monitoring perspective, mutual occlusion, small targets, and face perspective distortion cause problems in mask-wearing detection. Meanwhile, public datasets covering incorrectly worn masks are also lacking. Therefore, this paper proposes MDDC-YOLO, a mask-wearing detection algorithm for the monitoring perspective based on an improved YOLO-v5. In view of the large proportion of small- and medium-sized targets in dense populations, the conventional C3 module in YOLO-v5 is replaced with the MRF-C3 module, which has an atrous convolutional structure. The anti-occlusion ability of the model is also improved by using Repulsion Loss, which is based on the principle of repulsion and attraction between sample bounding boxes, and the masking positive samples are fully utilized during the training process. An Efficient Channel Attention(ECA) mechanism is further introduced for optimal selection of feature channels. Finally, to address the lack of mask-wearing data in crowds from a monitoring perspective, an offline data enhancement method based on perspective transformation is proposed. The proposed Mosaic-9 data enhancement generates additional small target samples to address this problem. The experimental results show that the MDDC-YOLO algorithm improves mAP by 6.5 percentage points compared with YOLO-v5 while reaching a detection speed of 32 frame/s, which satisfies the application requirements of mask-wearing detection in dense populations.

  • Research Hotspots and Reviews
    Yi SUN, Huimei WANG, Ming XIAN, Hang XIANG
    Computer Engineering. 2024, 50(2): 25-32. https://doi.org/10.19678/j.issn.1000-3428.0067396

    Kubeflow is a project that combines machine learning and cloud computing technology, integrating a large number of machine learning tools and providing a feasible solution for the deployment of production-grade machine learning platforms. Machine learning relies on specialized Graphics Processing Units(GPUs) to improve training and inference speed. As the size of cloud computing clusters is dynamically adjusted, computing nodes of different computing architectures can be added to or removed from the cluster, and traditional round-robin scheduling strategies cannot realize the dynamic adjustment of heterogeneous computing power resources. To solve the allocation and optimization problems of Kubeflow's heterogeneous computing power, improve the utilization rate of platform resources, and achieve load balancing, a cloud-based Central Processing Unit-GPU(CPU-GPU) heterogeneous computing power scheduling strategy is proposed. This scheduling strategy adopts two judgment indicators, weighted load balancing degree and priority, and allocates GPU memory at a fine granularity to achieve fine-grained partitioning of computing power resources. The optimal deployment scheme of Pods is designed according to the resource weight matrix of each node in the cluster, and an improved genetic algorithm is used for optimal deployment. The experimental results show that this scheduling strategy performs better for parallel tasks and can achieve optimal loads when resource requests overflow. Compared with the platform-native strategy, the granularity of resource allocation is finer by one order of magnitude, and the cluster load balancing performance is also significantly improved.

  • Artificial Intelligence and Pattern Recognition
    SUN Wenjie, LI Zongmin, SUN Haomiao
    Computer Engineering. 2024, 50(5): 62-70. https://doi.org/10.19678/j.issn.1000-3428.0067919

    Collaborative cooperation between agents in partially observable situations is an important problem in Multi-Agent Reinforcement Learning(MARL). The value function factorization approach solves the credit assignment problem and effectively achieves collaborative cooperation between agents. However, existing value function factorization approaches depend only on individual value functions with local information and do not allow explicit information exchange between agents, making them unsuitable for complex scenarios. To address this problem, this study introduces communication in the value function factorization approach to provide effective nonlocal information to agents, helping them understand complex environments. Furthermore, unlike existing communication approaches, the proposed approach uses a multi-layer message passing architecture based on Graph Neural Network(GNN), which extracts useful information that must be exchanged between neighboring agents. Simultaneously, the model realizes the transition from non-communication to full communication and achieves global cooperation with a limited communication range, which is suitable for real-world applications where the communication range is constrained. The results of experiments in the StarCraft II Multi-Agent Challenge(SMAC) and Predator-Prey(PP) environments demonstrate that the average winning rate of this approach improves by 2-40 percentage points compared with those of baseline algorithms, such as QMIX and VBC, in four different scenarios of SMAC. Furthermore, the proposed approach effectively solves the PP problem in non-monotonic environments.

  • Graphics and Image Processing
    Ben HONG, Xusheng QIAN, Minglei SHEN, Jisu HU, Chen GENG, Yakang DAI, Zhiyong ZHOU
    Computer Engineering. 2023, 49(9): 234-245. https://doi.org/10.19678/j.issn.1000-3428.0065678

    Medical image registration and segmentation are important tasks in medical image analysis. The accuracy of both tasks can be improved effectively by combining them. However, the existing joint registration and segmentation frameworks for single-modal images are difficult to apply to multi-modal images. To address this problem, a Computed Tomography-Magnetic Resonance(CT-MR) image-based joint registration and segmentation framework based on modality-consistent supervision and a multi-scale modality-independent neighborhood descriptor is proposed. It consists of a multi-modal image registration network and two segmentation networks. The deformation field generated by the multi-modal registration is used to establish the corresponding deformation relationship between the segmentation network results of the two modalities. A modality consistency supervision loss is constructed, which improves the accuracy of multi-modal segmentation because the two segmentation networks supervise each other. In the multi-modal image registration network, a multi-scale modality-independent neighborhood descriptor is constructed to enhance the representation ability of cross-modal information. The descriptor is added to the registration network as a structural loss term to constrain the local structure correspondence of multi-modal images more accurately. Experiments were performed on a dataset of 118 CT-MR multi-modal liver images. When 30% of the segmentation labels are provided, the Dice Similarity Coefficient(DSC) of liver registration of this method reaches 94.66(±0.84)%, and the Target Registration Error(TRE) reaches 5.191(±1.342) mm. The DSC of liver segmentation of this method reaches 94.68(±0.82)% and 94.12(±1.06)% in CT and MR images, respectively. These results are superior to those of comparable registration and segmentation methods.
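
    For reference, the Dice Similarity Coefficient reported above is defined as twice the overlap of two masks divided by the sum of their sizes. The sketch below computes it for small binary masks given as nested lists; treating two empty masks as a perfect match is a convention chosen here, not something stated in the abstract.

```python
def dice(mask_a, mask_b):
    """Dice Similarity Coefficient, 2|A ∩ B| / (|A| + |B|), for binary masks
    given as equally shaped nested lists of 0/1 values."""
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must have the same shape")
    inter = sum(x and y for x, y in zip(a, b))
    size = sum(a) + sum(b)
    return 1.0 if size == 0 else 2.0 * inter / size


if __name__ == "__main__":
    predicted = [[1, 1, 0], [0, 1, 0]]
    reference = [[1, 0, 0], [0, 1, 1]]
    print(dice(predicted, reference))
```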

  • Artificial Intelligence and Pattern Recognition
    Zhangjie RAN, Linfu SUN, Yisheng ZOU, Yulin MA
    Computer Engineering. 2023, 49(9): 52-59. https://doi.org/10.19678/j.issn.1000-3428.0065745

    A Knowledge Graph(KG) is composed of a large number of fact triples, which often contain a large number of few-shot relations that rarely appear in the real world. For these few-shot relations, it is challenging to complete the missing triples in the KG, and existing few-shot Knowledge Graph Completion(KGC) models cannot effectively extract the representation of few-shot relations. To address this problem, a few-shot KGC model based on a relation learning network is proposed. Considering the relevance of the relations, neighbor aggregation encoding is performed on the reference and query triples to obtain an enhanced entity embedding representation. A structure that integrates a Transformer encoder and a Long Short-Term Memory(LSTM) neural network allows the relation representation of triples to be encoded and output. The semantic similarity between query and dynamic reference relations is obtained using the attention mechanism and combined with the hypothesis of the translation model, whereby the possibility of establishing query triples is comprehensively scored. The experimental results show that the model can effectively extract the fine-grained semantics of few-shot relations by integrating path-finding and context semantics. Compared with the optimal value of the evaluation metrics in baseline models, the average improvement on few-shot link prediction tasks reaches 9.5 percentage points with the proposed model.
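
    The translation-model hypothesis mentioned above is commonly stated as head + relation ≈ tail in embedding space, as in TransE. The sketch below scores candidate tails by the negative Euclidean distance under that assumption; the two-dimensional embeddings are made up for illustration, and the paper's full scoring function, which also uses attention-based similarity, is not reproduced here.

```python
import math


def transe_score(head, relation, tail):
    """Negative Euclidean distance ||head + relation - tail||; higher is better."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, candidates):
    """Candidate tail names sorted from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )


if __name__ == "__main__":
    head, relation = [1.0, 0.0], [0.0, 1.0]
    candidates = {"plausible": [1.0, 1.0], "implausible": [3.0, 3.0]}
    print(rank_tails(head, relation, candidates))
```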