Most accessed

  • Graphics and Image Processing
    Yang LIU, Jun CHEN, Shijia HU, Jiahua LAI
    Computer Engineering. 2023, 49(10): 247-254. https://doi.org/10.19678/j.issn.1000-3428.0065825

    In the mainstream feature-based Simultaneous Localization and Mapping(SLAM) method, feature matching is a key step in estimating camera motion. However, the local characteristics of image features cause widespread mismatch and have become a major bottleneck in visual SLAM. In addition, the sparse maps generated by the feature-based method can only be used for localization, as they do not satisfy higher-level requirements. To address the problems of low efficiency in ORB feature point matching and failure to generate dense maps in ORB-SLAM3, an improved ORB Grid-based Motion Statistics(ORB-GMS) matching strategy is proposed, whereby a dense point cloud construction thread is added to ORB-SLAM3 to realize dense mapping. The motion smoothness constraint is used for the feature point motion statistics method, and the number of matches in the feature point neighborhood and threshold are compared to efficiently determine whether the current match is correct. The gridded images are used for fast computation to perform camera pose estimation. Finally, the dense point cloud map is constructed according to the key frame and the corresponding pose, using the outlier point removal and voxel-grid filters to reduce the size of the point cloud. The experimental results on the RGB-D dataset of TUM show that compared with ORB-SLAM3, the proposed algorithm can reduce matching time by approximately 50% and average positioning error by 32%, while increasing the number of matches by an average of 60%. In addition, compared to sparse maps, this method generates dense point cloud maps that are easy for secondary processing, thereby expanding the application scenarios of the algorithm.
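
    The voxel-grid filtering step mentioned above can be illustrated with a short sketch. This is not the authors' implementation: the function name `voxel_grid_filter`, the `voxel_size` parameter, and the choice of replacing each voxel by its centroid are assumptions made for illustration only.

```python
from collections import defaultdict
from math import floor


def voxel_grid_filter(points, voxel_size):
    """Replace all points that fall into the same cubic voxel by their centroid.

    `points` is an iterable of (x, y, z) tuples; `voxel_size` is the voxel
    edge length in the same unit. Both names are hypothetical.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    # Group points by the integer index of the voxel that contains them.
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        buckets[key].append((x, y, z))

    # Emit one centroid per occupied voxel.
    filtered = []
    for pts in buckets.values():
        n = len(pts)
        filtered.append(tuple(sum(axis) / n for axis in zip(*pts)))
    return filtered


if __name__ == "__main__":
    cloud = [(0.01, 0.02, 0.03), (0.03, 0.04, 0.01), (1.20, 0.00, 0.00)]
    # The first two points share a voxel, so two points remain.
    print(len(voxel_grid_filter(cloud, voxel_size=0.1)))
```

    A larger `voxel_size` gives a smaller but coarser point cloud, which is the trade-off behind using such a filter to reduce map size.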

  • Graphics and Image Processing
    Jiaxin LI, Jin HOU, Boying SHENG, Yuhang ZHOU
    Computer Engineering. 2023, 49(9): 256-264. https://doi.org/10.19678/j.issn.1000-3428.0065935

    In remote sensing imagery, the detection of small objects poses significant challenges due to factors such as complex background, high resolution, and limited effective information. Based on YOLOv5, this study proposes an advanced approach, referred to as YOLOv5-RS, to enhance small object detection in remote sensing images. The presented approach employs a parallel mixed attention module to address issues arising from complex backgrounds and negative samples. This module optimizes the generation of a weighted feature map by substituting fully connected layers with convolutions and eliminating pooling layers. To capture the nuanced characteristics of small targets, the downsampling factor is tailored, and shallow features are incorporated during model training. At the same time, a unique feature extraction module combining convolution and Multi-Head Self-Attention (MHSA) is designed to overcome the limitations of ordinary convolution extraction by jointly representing local and global information, thereby extending the model's receptive field. The EIoU loss function is employed to optimize the regression process for both prediction and detection frames to enhance the localization capacity of small objects. The efficacy of the proposed algorithm is verified via experiments on datasets comprising small target remote sensing images. The results show that compared with YOLOv5s, the proposed algorithm has an average detection accuracy improvement of 1.5 percentage points, coupled with a 20% reduction in parameter count. Particularly, the proposed algorithm's average detection accuracy of small vehicle targets increased by 3.2 percentage points. Comparative evaluations against established methodologies such as EfficientDet, YOLOx, and YOLOv7 underscore the proposed algorithm's capacity to adeptly balance the dual objectives of detection accuracy and real-time performance.
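
    The abstract does not spell out the EIoU loss. The sketch below follows the commonly cited definition (1 - IoU plus normalized center-distance, width, and height penalties) and assumes boxes given as (x1, y1, x2, y2); the function name and box format are illustrative choices, not taken from the paper.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of both boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centers, normalized by the enclosing diagonal.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Separate penalties for width and height mismatch.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term


if __name__ == "__main__":
    print(eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

    Because width and height are penalized directly, the loss still gives a useful signal when two boxes share a center but differ in size, which matters for small objects.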

  • Artificial Intelligence and Pattern Recognition
    Zhiqiang WU, Qing XIE, Lin LI, Yongjian LIU
    Computer Engineering. 2024, 50(1): 91-100. https://doi.org/10.19678/j.issn.1000-3428.0066929

    Many existing Graph Neural Network(GNN) recommendation algorithms use the node number information of the user-item interaction graph for training and learn the high-order connectivity among user and item nodes to enrich their representations. However, user preferences for different modal information are ignored, modal information such as images and text of items are not utilized, and the fusion of different modal features is summed without distinguishing the user preferences for different modal information types. A multimodal fusion GNN recommendation model is proposed to address this problem. First, for a single modality, a unimodal graph network is constructed by combining the user-item interaction bipartite graph, and the user preference for this modal information is learned in the unimodal graph. Graph ATtention(GAT) network is used to aggregate the neighbor information and enrich the local node representation, and the Gated Recurrent Unit(GRU) is used to decide whether to aggregate the neighbor information to achieve the denoising effect. Finally, the user and item representations learned from each modal graph are fused by the attention mechanism to obtain the final representation and then sent to the prediction module. Experimental results on the MovieLens-20M and H&M datasets show that the multimodal information and attention fusion mechanism can effectively improve the recommendation accuracy, and the algorithm model has significant improvements in Precision@K, Recall@K, and NDCG@K compared with the baseline optimal algorithm for the three indicators. When an evaluation index K value of 10 is selected, Precision@10, Recall@10, and NDCG@10 increase by 4.67%, 2.42%, 2.03%, and 2.49%, 5.24%, 2.05%, respectively, for the two datasets.
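
    As a rough illustration of attention-based fusion of per-modality representations, the sketch below scores each modality vector against a query vector, normalizes the scores with a softmax, and returns the weighted sum. The dot-product scoring and the names `attention_fuse`, `modal_reps`, and `query` are assumptions; the abstract does not specify the model's scoring function.

```python
from math import exp


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(modal_reps, query):
    """Fuse one representation per modality into a single vector.

    modal_reps: list of equal-length vectors, one per modality (hypothetical).
    query: vector of the same length used to score each modality (hypothetical).
    Returns (fused_vector, attention_weights).
    """
    # Dot-product score of each modality against the query.
    scores = [sum(q * r for q, r in zip(query, rep)) for rep in modal_reps]
    weights = softmax(scores)

    # Weighted sum of the modality vectors.
    dim = len(query)
    fused = [sum(w * rep[i] for w, rep in zip(weights, modal_reps)) for i in range(dim)]
    return fused, weights


if __name__ == "__main__":
    image_rep, text_rep = [1.0, 0.0], [0.0, 1.0]
    fused, weights = attention_fuse([image_rep, text_rep], query=[1.0, 0.0])
    print(fused, weights)
```

    Unlike a plain sum of modality features, the weights here differ per query, so a user whose query vector aligns with the image representation leans more on that modality.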

  • Artificial Intelligence and Pattern Recognition
    Qiru LI, Xia GENG
    Computer Engineering. 2023, 49(12): 111-120. https://doi.org/10.19678/j.issn.1000-3428.0066348

    The traditional Deep Q Network (DQN) algorithm solves the dimensionality problem of Q-learning algorithms in complex environments by integrating deep neural networks and reinforcement learning methods that are widely used in the path planning of mobile robots. However, the traditional DQN algorithm has a low network convergence speed and poor path planning effect, and consequently, obtaining the optimal path in a short training round is challenging. To solve these problems, an improved ERDQN algorithm is proposed. The Q value is recalculated by recording the frequency of the repeated states. The more times a state is repeated in the process of network training, the lower the probability of the next occurrence of the state. This mechanism improves the robot's ability to explore the environment, reduces the risk of network convergence to local optima to a certain extent, and reduces the number of training rounds required for network convergence. The reward function is redesigned according to the moving direction of the robot and the distance between the robot and the target point. The robot obtains a positive reward when it moves closer to the target point and a negative reward when it moves away from it. The absolute value of the reward is adjusted according to the current moving direction of the robot and the distance between the robot and the target point; thus, the robot can plan a better path while avoiding obstacles. The experimental results show that compared with the DQN algorithm, the average score of the ERDQN algorithm is increased by 18.9%, whereas the path length and number of planned rounds are reduced by approximately 20.1% and 500, respectively. These results show that the ERDQN algorithm can effectively improve network convergence speed and path planning performance.
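
    The two ideas in this abstract, penalizing frequently repeated states and shaping the reward by progress toward the target, can be sketched as follows. The linear penalty `beta * count`, the distance-difference reward, and all names are assumptions for illustration; the abstract does not give the actual ERDQN formulas.

```python
from collections import Counter
from math import dist


def shaped_reward(prev_pos, pos, goal, scale=1.0):
    """Positive when the step reduces the distance to the goal, negative otherwise.

    The magnitude depends on how directly the step points at the goal,
    because it equals the change in Euclidean distance (an assumed form).
    """
    return scale * (dist(prev_pos, goal) - dist(pos, goal))


class VisitPenalty:
    """Lower the Q value of states in proportion to how often they were seen."""

    def __init__(self, beta=0.1):
        self.beta = beta          # hypothetical penalty weight
        self.counts = Counter()   # visit count per state

    def record(self, state):
        self.counts[state] += 1

    def adjust(self, state, q_value):
        # Frequently repeated states look less attractive, which
        # encourages exploration of less-visited states.
        return q_value - self.beta * self.counts[state]


if __name__ == "__main__":
    print(shaped_reward((0, 0), (1, 0), goal=(3, 0)))   # step toward the goal
    print(shaped_reward((1, 0), (0, 0), goal=(3, 0)))   # step away from the goal
```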

  • Development Research and Engineering Application
    Jianhao ZHAN, Lipeng GAN, Yonghui BI, Peng ZENG, Xiaochao LI
    Computer Engineering. 2023, 49(10): 280-288, 297. https://doi.org/10.19678/j.issn.1000-3428.0065152

    The multi-modality fusion method is a core technique for effectively exploring complementary features from multiple modalities to improve action recognition performance at data-, feature-, and decision-level fusion. This study mainly investigated the multimodality fusion method at the feature and decision levels through knowledge distillation, transferring feature learning from other modalities to the RGB model, including the effects of different loss functions and fusion strategies. A multi-modality distillation fusion method is proposed for action recognition, whereby knowledge distillation is performed using the MSE loss function at the feature level, KL divergence at the decision-prediction level, and a combination of the original skeleton and optical flow modalities as multi-teacher networks so that the RGB student network can simultaneously learn with better recognition accuracy. Extensive experiments show that the proposed method achieved state-of-the-art performance with 90.09%, 95.12%, 97.82%, and 81.26% accuracies on the NTU RGB+D 60, UTD-MHAD, N-UCLA, and HMDB51 datasets, respectively. The recognition accuracy on the UTD-MHAD dataset has increased by 3.49, 2.54, 3.21, and 7.34 percentage points compared to single mode RGB data, respectively.
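
    A minimal sketch of the combined objective described above: MSE between student and teacher features plus KL divergence between their predicted distributions, averaged over several teachers. The weights `alpha` and `beta`, the temperature, and the averaging over teachers are assumptions; the abstract does not state how the terms are balanced.

```python
from math import exp, log


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_feat, teacher_feats, student_logits,
                      teacher_logits, alpha=1.0, beta=1.0, temperature=1.0):
    """Feature-level MSE plus decision-level KL, averaged over the teachers."""
    n = len(teacher_feats)
    feat_loss = sum(mse(student_feat, t) for t in teacher_feats) / n
    p_student = softmax(student_logits, temperature)
    pred_loss = sum(
        kl_divergence(softmax(t, temperature), p_student) for t in teacher_logits
    ) / n
    return alpha * feat_loss + beta * pred_loss


if __name__ == "__main__":
    # One hypothetical skeleton teacher and one optical-flow teacher.
    loss = distillation_loss(
        student_feat=[0.2, 0.4],
        teacher_feats=[[0.3, 0.5], [0.1, 0.4]],
        student_logits=[1.0, 0.0, -1.0],
        teacher_logits=[[2.0, 0.0, -1.0], [1.5, 0.5, -1.0]],
    )
    print(loss)
```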

  • Graphics and Image Processing
    Bingyan ZHU, Zhihua CHEN, Bin SHENG
    Computer Engineering. 2024, 50(1): 216-223. https://doi.org/10.19678/j.issn.1000-3428.0066941

    Owing to the rapid development of remote sensing technology, remote sensing image detection technology is being used extensively in agriculture, military, national defense security, and other fields. Compared with conventional images, remote sensing images are more difficult to detect; therefore, researchers have endeavored to detect remote sensing images efficiently and accurately. To address the high calculation complexity, large-scale range variation, and scale imbalance of remote sensing images, this study proposes a perceptually enhanced Swin Transformer network, which improves the detection of remote sensing images. Exploiting the hierarchical design and shifted windows of the basic Swin Transformer, the network inserts spatial local perception blocks into each stage, thus enhancing local feature extraction with a negligible increase in computation. An area-distributed regression loss is introduced to assign larger weights to small objects and thereby address scale imbalance; additionally, the network is combined with an improved IoU-aware classification loss to eliminate the discrepancy between different branches and reduce the loss of classification and regression. Experimental results on the public dataset DOTA show that the proposed network yields a mean Average Precision (mAP) of 78.47% and a detection speed of 10.8 frame/s, thus demonstrating its superiority over classical object detection networks (i.e., Faster R-CNN and Mask R-CNN) and existing excellent remote sensing image detection networks. Additionally, the network performs well on all types of objects at different scales.

  • Graphics and Image Processing
    Chunbo XU, Juan YAN, Huibin YANG, Bo WANG, Han WU
    Computer Engineering. 2023, 49(8): 199-206, 214. https://doi.org/10.19678/j.issn.1000-3428.0065522

    Currently, most Visual Simultaneous Localization And Mapping (VSLAM) algorithms are designed for static scenes and do not consider dynamic objects in a scene. However, dynamic objects in an actual scene cause mismatches among the feature points of the visual odometer, which affects the positioning and mapping accuracy of the SLAM system and reduces its robustness in practical applications. Aimed at an indoor dynamic environment, a VSLAM algorithm based on the ORB-SLAM3 main framework, known as RDTS-SLAM, is proposed. An improved YOLOv5 target detection and semantic segmentation network is used to accurately and rapidly segment objects in the environment. Simultaneously, the target detection results are combined with the local optical flow method to accurately identify dynamic objects, and the feature points in the dynamic object area are eliminated. Only static feature points are used for feature point matching and subsequent positioning and mapping. Experimental results on the TUM RGB dataset and actual environment data show that compared to the ORB-SLAM3 and RDS-SLAM algorithms, the Root Mean Square Error (RMSE) of trajectory estimation of the RDTS-SLAM algorithm for the walking_rpy sequence is reduced by 95.38% and 86.20%, respectively, which implies that it can significantly improve the robustness and accuracy of the VSLAM system in a dynamic environment.
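
    For readers unfamiliar with the metric, the sketch below computes a trajectory RMSE and the relative reduction between two RMSE values. It assumes the estimated and ground-truth trajectories are already time-associated and aligned, which real evaluation tools handle separately; the function names and sample values are illustrative and are not results from the paper.

```python
from math import dist, sqrt


def trajectory_rmse(estimated, ground_truth):
    """Root mean square of per-pose position errors.

    Both arguments are equal-length lists of (x, y, z) positions that are
    assumed to be already associated and aligned.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [dist(e, g) ** 2 for e, g in zip(estimated, ground_truth)]
    return sqrt(sum(squared) / len(squared))


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


if __name__ == "__main__":
    # Made-up positions, for illustration only.
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    est = [(0.0, 0.1, 0.0), (1.0, -0.1, 0.0), (2.1, 0.0, 0.0)]
    print(trajectory_rmse(est, gt))
```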

  • Cyberspace Security
    Shuaiwei LIU, Zhi LI, Guomei WANG, Li ZHANG
    Computer Engineering. 2024, 50(2): 180-187. https://doi.org/10.19678/j.issn.1000-3428.0067077

    Adversarial attack and defense is a popular research area in computer security. Trans-GAN, an adversarial example generation algorithm based on the combination of Transformer and Generative Adversarial Network (GAN), is proposed to address the poor visual quality of existing gradient-based adversarial example generation methods and the low generation efficiency of optimization-based methods. First, the algorithm utilizes the powerful visual representation capability of the Transformer as a reconstruction network for receiving clean images and generating adversarial noise. Second, the Transformer reconstruction network, acting as the generator, is combined with a deep convolutional network-based discriminator to form a GAN architecture, which improves the authenticity of the generated images and ensures the stability of training. Meanwhile, an improved attention mechanism, Targeted Self-Attention, is proposed to introduce target labels as a priori knowledge when training the network, which guides the network model to learn to generate adversarial perturbations with specific attack targets. Finally, adversarial noise is added to the clean examples using skip-connections to form adversarial examples. Experimental results demonstrate that the proposed algorithm achieves an attack success rate of more than 99.9% on both models used for the MNIST dataset and 96.36% and 98.47% on the two models used for the CIFAR10 dataset, outperforming the current state-of-the-art generative-based adversarial attack methods. The qualitative results show that compared to the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) algorithms, the adversarial noise generated by the Trans-GAN algorithm is less perceptible, and the resulting adversarial examples are more natural, meet the requirements of human vision, and are not easily distinguished from clean images.

  • Research Hotspots and Reviews
    Chang WANG, Leixiao LI, Yanyan YANG
    Computer Engineering. 2023, 49(11): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0066661

    The fatigue driving detection method based on computer vision has the advantage of being noninvasive and does not affect driving behavior, making it easy to apply in practical scenarios. With the development of computer technology, an increasing number of researchers are studying fatigue driving detection methods based on computer vision. Fatigue driving behavior is mainly reflected in the face and limbs. Furthermore, in the field of computer vision, facial behavior is easier to obtain than physical behavior. Therefore, facial-feature-based fatigue driving detection methods have become an important research direction in the field of fatigue driving detection. Various fatigue driving detection methods are analyzed comprehensively based on multiple facial features of drivers, and the latest research results worldwide are summarized. The specific behaviors of drivers with different facial features under fatigue conditions are introduced, and the fatigue driving detection process is discussed based on multiple facial features. Results from research conducted worldwide are classified based on different facial features, and different feature extraction methods and state discrimination methods are classified. The parameters used to distinguish driver fatigue status are summarized based on the various behaviors generated by different features in a state of fatigue. Furthermore, current research results on the use of facial multi-feature comprehensive discrimination for fatigue driving are described, and the similarities and differences of different methods are analyzed. On this basis, the shortcomings in the current field of fatigue driving detection based on facial multi-feature fusion are discussed, and future research directions in this field are described.

  • Graphics and Image Processing
    Wenzhuo FAN, Tao WU, Junping XU, Qingqing LI, Jianlin ZHANG, Meihui LI, Yuxing WEI
    Computer Engineering. 2023, 49(9): 217-225. https://doi.org/10.19678/j.issn.1000-3428.0065689

    Traditional deep learning image super-resolution reconstruction networks only extract features at a fixed resolution and cannot integrate advanced semantic information. The challenges include difficulties integrating advanced semantic information, reconstructing images with specific scale factors, limited generalization capability, and managing an excessive number of network parameters. An arbitrary-scale image super-resolution reconstruction algorithm based on multi-resolution feature fusion, termed MFSR, is proposed. In the multi-resolution feature fusion encoding phase, a multi-resolution feature extraction module is designed to extract features at different resolutions. A dual attention module is constructed to enhance the feature extraction ability of the network. An information-rich fused feature map is obtained through full interaction among the features at different resolutions. In the image reconstruction phase, the fused feature map is decoded by a multilayer perceptron to produce a super-resolution image at any scale. In tests on the Set5 dataset with scaling factors of 2, 3, 4, 6, and 8, the Peak Signal-to-Noise Ratios (PSNR) of the proposed algorithm are 38.62, 34.70, 32.41, 28.96, and 26.62 dB, respectively. The model has 0.72×10^6 parameters, which significantly reduces the parameter count while maintaining reconstruction quality and realizing super-resolution image reconstruction at any scale. Furthermore, the model achieves better performance than mainstream algorithms such as SRCNN, VDSR, and EDSR.
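
    The PSNR values quoted above follow the standard definition, 10·log10(MAX² / MSE). The sketch below computes it for two images given as flat lists of pixel intensities; the 8-bit peak value of 255 and the flat-list input format are assumptions, and the abstract does not say which color channel or border cropping was used.

```python
from math import log10


def psnr(reference, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-size images.

    Images are flat sequences of pixel intensities (an assumed format).
    Returns infinity when the images are identical.
    """
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - s) ** 2 for r, s in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return float("inf")
    return 10.0 * log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [100, 100, 100, 100]
    rec = [110, 90, 110, 90]   # every pixel is off by 10 intensity levels
    print(round(psnr(ref, rec), 2))
```

    Since the scale is logarithmic, a 3 dB gain corresponds to roughly halving the mean squared error.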

  • Research Hotspots and Reviews
    Zhe LIAN, Yanjun YIN, Fei YUN, Min ZHI
    Computer Engineering. 2024, 50(3): 16-27. https://doi.org/10.19678/j.issn.1000-3428.0067427

    Natural scene text detection technology based on deep learning has become a crucial research focal point in the fields of computer vision and natural language processing. Not only does it possess a wide range of potential applications but also serves as a new platform for researchers to explore neural network models and algorithms. First, this study introduces the relevant concepts, research background, and current developments in natural scene text detection technology. Subsequently, an analysis of recent deep learning-based text detection methods is performed, categorizing them into four classes: detection boxes-, segmentation-, detection-boxes and segmentation-based, and others. The fundamental concepts and main algorithmic processes of classical and mainstream methods within these four categories are elaborated, summarizing the usage mechanisms, applicable scenarios, advantages, disadvantages, simulation experimental results, and environment settings of different methods, while clarifying their interrelationships. Thereafter, common public datasets and performance evaluation methods for natural scene text detection are introduced. Finally, the major challenges facing current deep learning-based natural scene text detection technology are outlined, and future development directions are discussed.

  • Development Research and Engineering Application
    Xingya YAN, Yaxi KUANG, Guangrui BAI, Yue LI
    Computer Engineering. 2023, 49(7): 251-258. https://doi.org/10.19678/j.issn.1000-3428.0065369

    Student classroom behaviors can directly reflect the quality of the class, whereby the analysis and evaluation of classroom behaviors through artificial intelligence and big data can help improve the quality of teaching. Traditional student classroom behavior recognition methods rely on teachers' direct observation of students or an analysis of student surveillance videos after class. This approach is time-consuming and labor-intensive and has a low recognition rate, making it difficult to follow problems in the classroom and during exams in real time. This study proposes BetaPose, a posture recognition method based on deep learning. Data enhancement is used to improve the robustness of the subsequent detection model. The improved YOLOv5 target detection algorithm is used to obtain the human detection frame. Based on the MobileNetV3 model, a lightweight posture recognition model is designed to improve the accuracy of posture recognition in crowded scenes. The keypoints of the human body thus obtained are input into a linear classifier with improved modeling and expression ability to determine the final behavior results. The experimental results show that the proposed lightweight posture recognition model BetaPose has the highest average recognition accuracy, 82.6%, for various parts of the human body, and the recognition rates for various behaviors in simple and crowded scenes are above 91% and 85%, respectively. Therefore, the proposed model can effectively recognize multiple behaviors in the classroom.

  • Research Hotspots and Reviews
    Haoyang LI, Xiaowei HE, Bin WANG, Hao WU, Qi YOU
    Computer Engineering. 2024, 50(2): 43-50. https://doi.org/10.19678/j.issn.1000-3428.0066399

    Load prediction is an essential part of cloud computing resource management. Accurate prediction of cloud resource usage can improve cloud platform performance and prevent resource wastage. However, the dynamic and mutative use of cloud computing resources makes load prediction difficult, and managers cannot allocate resources reasonably. In addition, although Informer has achieved better results in time-series prediction, it does not impose restrictions on the causal dependence of time, causing future information leakage. Moreover, it does not consider the increase in network depth leading to model performance degradation. A multi-step load prediction model based on an improved Informer, known as Informer-DCR, is proposed. The regular convolution between attention blocks in the encoder is replaced by dilated causal convolution, such that the upper layer in the deep network can receive a wider range of input information to improve the prediction accuracy of the model, and ensure the causality of the time-series prediction process. Simultaneously, the residual connection is added to the encoder, such that the input information of the lower layer of the network is directly transmitted to the subsequent higher layer, and the deep network degradation is solved to improve the model performance. The experimental results demonstrate that compared with the mainstream prediction models such as Informer and Temporal Convolutional Network(TCN), the Mean Absolute Error(MAE) of the Informer-DCR model is reduced by 8.4%-40.0% under different prediction steps, and Informer-DCR exhibits better convergence than Informer during the training process.
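
    To make the two encoder changes concrete, the sketch below implements a single-channel dilated causal 1D convolution and a residual connection around it. It is a toy illustration of the general technique, not the Informer-DCR code; the function names, the zero padding on the left, and the single-channel setting are assumptions.

```python
def dilated_causal_conv1d(x, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with zeros before t = 0.

    Each output depends only on the current and earlier inputs, so no
    future information leaks into the prediction.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:            # positions before the sequence start count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def residual_block(x, kernel, dilation=1):
    """Add the block input back to the convolution output."""
    y = dilated_causal_conv1d(x, kernel, dilation)
    return [xi + yi for xi, yi in zip(x, y)]


if __name__ == "__main__":
    series = [1, 2, 3, 4, 5]
    print(dilated_causal_conv1d(series, kernel=[1, 1], dilation=2))
    # prints [1.0, 2.0, 4.0, 6.0, 8.0]
```

    Stacking such layers with growing dilation widens the receptive field quickly, while the residual path gives lower-layer inputs a direct route to higher layers.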

  • Graphics and Image Processing
    Xinlu JIANG, Tianen CHEN, Cong WANG, Chunjiang ZHAO
    Computer Engineering. 2024, 50(1): 232-241. https://doi.org/10.19678/j.issn.1000-3428.0067030

    Intelligent pest detection is an essential application of target detection technology in the agricultural field. This detection method effectively improves the efficiency and reliability of pest detection and reporting work and ensures crop yield and quality. Under fixed-trapping devices such as insect traps and sticky insect boards, the image background is simple, the lighting conditions are stable, and the pest features are significant and easy to extract. Pest detection can achieve high accuracy, but its application scenario is fixed, and the detection range is limited to the surrounding equipment and cannot adapt to complex field environments. A small object pest detection model called Pest-YOLOv5 is proposed to improve the flexibility of pest detection and prediction to address the difficulties and missed detections attributed to complex image backgrounds and small pest sizes in field environments. By adding a Coordinate Attention (CA) mechanism in the feature extraction network and combining spatial and channel information, the ability to extract small object pest features is enhanced. The Bidirectional Feature Pyramid Network (BiFPN) structure is used in the neck connection section, and multi-scale features are combined to alleviate the problem of small object information loss caused by multiple convolutions. Based on this, SIoU and VariFocal loss functions are used to calculate losses, and the optimal classification loss weight coefficients are obtained experimentally, making the model more focused on object samples that are difficult to classify. The experimental results on a subset of the publicly available dataset, AgriPest, show that the Pest-YOLOv5 model has mAP0.5 and recall of 70.4% and 67.8%, respectively, which are superior to those of classical object detection models, such as the original YOLOv5s model, SSD, and Faster R-CNN. Compared with the YOLOv5s model, the Pest-YOLOv5 model improves the mAP0.5, mAP0.5:0.95, and recall by 8.1%, 7.9%, and 12.8%, respectively, enhancing the ability to detect targets.

  • Research Hotspots and Reviews
    Jian CAO, Yimei CHEN, Haisheng LI, Qiang CAI
    Computer Engineering. 2023, 49(10): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0065984

    Small target detection in complex road scenes can improve the vehicle's perception of the surrounding environment. Thus, it is an important research direction in the field of computer vision and intelligent transportation. With the development of deep learning technology, a combination of deep learning and small target detection on roads can effectively improve detection accuracy, allowing the vehicle to quickly respond to the surrounding environment. Starting with the latest classic research results in small target detection, this research provides two definitions for small targets and analyzes the reasons for the difficulty encountered in small target detection on roads. Subsequently, five types of optimization methods based on deep learning are expounded upon to improve detection accuracy of small targets on roads. The optimization methods include enhanced data, multi-scale strategy, generated Super-Resolution(SR) detail information, strengthened contextual information connection and improved loss function. The core ideas of various methods and the latest research progress at home and abroad are summarized. Large and public datasets commonly used in road small target detection are introduced along with corresponding indicators to evaluate the performance of small target detection. In comparing and analyzing the performance detection results of various methods on different datasets, this research presents the current research on road small target and associated problems, looking forward to future research directions from multiple perspectives.

  • Cyberspace Security
    Huawei SONG, Shengqi LI, Fangjie WAN, Yuping WEI
    Computer Engineering. 2024, 50(3): 166-172. https://doi.org/10.19678/j.issn.1000-3428.0067791

    Federated Learning (FL) enables collaborative training of global models without compromising data privacy. Nonetheless, in the real world this collaborative training approach faces the challenge of non-independent and identically distributed (Non-IID) data, which leads to slow model convergence and low accuracy. Numerous existing FL methods improve only one of two perspectives, global model aggregation or local client update, and inevitably ignore the impact of the other perspective, which reduces the quality of the global model. In this context, we introduce a hierarchical continuous learning optimization method for FL, denoted as FedMas, which is based on the idea of hierarchical fusion. First, clients with similar data distribution are divided into different layers using the DBSCAN algorithm, and only some of the clients of a certain layer are selected for training each time to avoid weight differences caused by different data distributions when the server global model is aggregated. Further, owing to the different data distributions of each layer, each client incorporates a continuous-learning remedy for catastrophic forgetting during local update to effectively integrate the differences between the data of different layers of clients, thus ensuring the performance of the global model. Experiments on the MNIST and CIFAR-10 standard datasets demonstrate that the global model test accuracy is improved by 0.3-2.2 percentage points on average compared with the FedProx, Scaffold, and FedCurv FL algorithms.
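
    Two building blocks that commonly appear in this kind of method can be sketched briefly: sample-weighted averaging of client models on the server, and a quadratic penalty that discourages a client from drifting away from previously learned parameters. The abstract does not specify FedMas's aggregation rule or forgetting remedy, so both functions, the importance weights, and the coefficient `lam` are assumptions used only to illustrate the general idea.

```python
def weighted_average(client_params, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


def forgetting_penalty(params, old_params, importance, lam=1.0):
    """Quadratic penalty (lam / 2) * sum_i importance_i * (theta_i - theta_old_i)^2.

    Added to a client's local loss, it keeps parameters that mattered for
    earlier data close to their previous values (hypothetical form).
    """
    return 0.5 * lam * sum(
        w * (p - q) ** 2 for w, p, q in zip(importance, params, old_params)
    )


if __name__ == "__main__":
    # Two hypothetical clients from the same layer, with 1 and 3 samples.
    print(weighted_average([[1.0, 1.0], [3.0, 3.0]], [1, 3]))
    print(forgetting_penalty([2.0, 0.0], [0.0, 0.0], importance=[1.0, 1.0]))
```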

  • Development Research and Engineering Application
    Long SUN, Rongfen ZHANG, Yuhong LIU, Tingli RAO
    Computer Engineering. 2023, 49(9): 313-320. https://doi.org/10.19678/j.issn.1000-3428.0065697

    In dense crowd scenarios, dense targets under the monitoring perspective, mutual occlusion, small targets, and face perspective distortion cause problems in mask wearing detection. Meanwhile, public datasets covering incorrectly worn masks are also lacking. Therefore, this paper proposes MDDC-YOLO, a mask wearing detection algorithm for the monitoring perspective that improves on YOLO-v5. In view of the large proportion of small- and medium-sized targets in dense populations, the conventional C3 module in YOLO-v5 is replaced with the MRF-C3 module, which uses an atrous convolutional structure. The anti-occlusion ability of the model is also improved by using Repulsion Loss based on the principle of repulsion attraction of the sample bounding box, and the masking positive sample is fully utilized during the training process. An Efficient Channel Attention (ECA) mechanism is further introduced for optimal selection of feature channels. Finally, to address the lack of mask wearing data in the crowd from a monitoring perspective, an offline data enhancement method based on perspective transformation is proposed. The proposed Mosaic-9 data enhancement generates additional small target samples to address this problem. The experimental results show that the MDDC-YOLO algorithm improves mAP by 6.5 percentage points compared with YOLO-v5 and reaches a detection speed of 32 frame/s, which satisfies the application requirements of mask-wearing detection in dense populations.

  • Development Research and Engineering Application
    Xinyi ZHANG, Fei ZHANG, Bin HAO, Lu GAO, Xiaoying REN
    Computer Engineering. 2023, 49(8): 265-274. https://doi.org/10.19678/j.issn.1000-3428.0065701

    In dense crowd scenes in public places, face mask wearing detection algorithms have poor detection results because of missing information caused by target occlusion and the problems of small detection targets and low resolution. To improve the detection accuracy and speed of the model as well as to reduce the hardware footprint, an improved mask wearing detection algorithm based on YOLOv5s is proposed. The conventional convolution is replaced with Ghost-Shadowed wash Convolution (GSConv), combining Standard Convolution (SConv) and Depth-Wise separable Convolution (DWConv) with channel blending, thereby improving the network speed with guaranteed accuracy. The nearest neighbor upsampling method is replaced with a lightweight universal upsampling operator to make full use of the semantic feature information. Adaptive Spatial Feature Fusion (ASFF) is added at the end of the neck layer of the improved YOLOv5s model, which allows better fusion of features at different scales and improves the network detection accuracy. In addition, adaptive image sampling is used to alleviate the problem of data imbalance. Mosaic data enhancement is used to make full use of small targets. Experimental results show that the model achieves a mean Average Precision (mAP) value of 93% on the AIZOO dataset, a 2 percentage points improvement over the original YOLOv5 model. It achieves 97.7% detection accuracy for faces wearing masks and outperforms the detection results of the YOLO series, SSD, and RetinaFace in the same situation. It also runs on a GPU with a 16.7 percentage points inference speedup. The model weights file uses 23.5 MB memory for real-time mask wearing detection.

  • Artificial Intelligence and Pattern Recognition
    Huan WANG, Lijuan SONG, Fang DU
    Computer Engineering. 2023, 49(12): 88-95. https://doi.org/10.19678/j.issn.1000-3428.0066938

    Interactive tasks involving multi-modal data place higher demands on the comprehensive utilization of knowledge from different modalities, leading to the emergence of multi-modal knowledge graphs. When constructing these graphs, accurately determining whether image and text entities refer to the same object is particularly important for the alignment of Chinese cross-modal entities. To address this problem, a Chinese cross-modal entity alignment method based on a multi-modal knowledge graph is proposed. Image information is introduced into the entity alignment task, and a single- and dual-stream interactive pre-trained language model, namely CCMEA, is designed for domain-specific, fine-grained images and Chinese text. Utilizing a self-supervised learning method, text-visual features are extracted using a Text-Visual Encoder, and fine-grained modeling is performed using cross-encoders. Finally, a contrastive learning method is employed to evaluate the degree of alignment between image and text entities. The experimental results show that the Mean Recall(MR) of the CCMEA model improved by 3.20 and 11.96 percentage points compared to that of the WukongViT-B baseline model on the MUGE and Flickr30k-CN datasets, respectively. Furthermore, the model achieved an MR of 94.3% on the self-built TEXTILE dataset. These results demonstrate that the proposed method can effectively align Chinese cross-modal entities with high accuracy in practical applications.

  • Frontiers in Computer Systems
    Yanfei FANG, Qi LIU, Enming DONG, Yanbing LI, Feng GUO, Di WANG, Wangquan HE, Fengbin QI
    Computer Engineering. 2023, 49(12): 10-24. https://doi.org/10.19678/j.issn.1000-3428.0066548

    Manycore has become the mainstream processor architecture for building High Performance Computing(HPC) supercomputer systems, providing powerful computing power for HPC exascale supercomputers. With the increasing number of cores integrated on manycore processor chips, the competition among large numbers of cores for memory resources has become more intense. The manycore on-chip memory hierarchy is an important structure that alleviates the "memory wall" problem, helps HPC applications better exploit the computing advantages of manycore processors, and improves the performance of practical applications. The design of a manycore on-chip memory hierarchy has a significant impact on the performance, power consumption, and area of manycore systems. It is an important part of the structural design of manycore systems and is a research interest in the industry. Owing to differences in the development history of manycore chips, the design technology of on-chip microarchitecture, and the requirements of the application fields, the on-chip storage hierarchies of current mainstream HPC manycore processors differ. However, judging from horizontal comparison and the vertical development trend of each processor, as well as from the changes in application requirements brought by the continuous integration and development of HPC, data science, and machine learning, the SPM+Cache hybrid structure would most likely become the mainstream choice for the on-chip storage hierarchy designs of manycore processors in future HPC exascale supercomputer systems. For exascale computing software and algorithms, design and optimization based on the characteristics of the manycore memory hierarchy can help HPC applications benefit from the computing advantages of manycore processors, thus effectively improving the performance of practical applications. Therefore, software and algorithm design and optimization technology targeting the characteristics of the manycore on-chip storage hierarchy is also a research interest in the industry. This study first partitions the on-chip memory hierarchy into multilevel Cache, SPM, and SPM+Cache hybrid structures according to their organization, and then summarizes and analyzes the advantages and disadvantages of these structures. It analyzes the current status and development trend of the memory hierarchy designs of the chips of mainstream exascale supercomputer systems, such as the international mainstream GPU, homogeneous manycore, and domestic manycore. It then summarizes the research status of software and hardware technologies related to the design and optimization of the manycore memory hierarchy, covering manycore LLC management and cache coherence protocols, SPM management and data movement optimization, and global optimization of the SPM+Cache hybrid architecture. Finally, this study discusses future research directions for the on-chip memory hierarchy from different perspectives, such as hardware, software, and algorithm designs.

  • Graphics and Image Processing
    Fangyu FENG, Xiaoshu LUO, Zhiming MENG, Guangyu WANG
    Computer Engineering. 2023, 49(8): 190-198. https://doi.org/10.19678/j.issn.1000-3428.0065224

    Because effective features are difficult to extract in facial expression recognition, and high similarity between categories leads to confusion and low recognition accuracy, a facial expression recognition method based on an anti-aliasing residual attention network is proposed. First, because the traditional subsampling method can easily lose discriminative expression features, an anti-aliasing residual network is constructed to improve the feature extraction ability for expression images and enhance the representation of expression features, enabling more effective global facial expression information to be extracted. At the same time, an improved channel attention mechanism and a label smoothing regularization strategy are used to enhance the attention to the local key expression regions of the face. The improved channel attention focuses on the highly discriminative expression features and suppresses the weight of non-expressive regions, so as to locate more detailed local expression regions in the global information extracted by the network. The label smoothing technique corrects the prediction probability by increasing the amount of information of the decision-making expression category, avoiding overly absolute prediction results and thereby reducing misjudgment between similar expressions. Experimental results show that the recognition accuracies of this method on the facial expression datasets RAF-DB and FERPlus reach 88.14% and 89.31%, respectively. Compared with advanced methods such as DACT and VTFF, this method has better performance. Compared with the original residual network, the accuracy and robustness of facial expression recognition are effectively improved.
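As a rough illustration of the label smoothing idea mentioned in this abstract, the sketch below uses the common uniform formulation, in which a small probability mass `epsilon` is spread evenly over all classes. The function name `smooth_labels` and the value `epsilon=0.1` are illustrative assumptions; the abstract does not state the exact variant or hyperparameter the authors use.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Uniform label smoothing (assumed variant, not confirmed by the abstract).

    Each target becomes (1 - epsilon) * y + epsilon / K, where K is the
    number of classes, so the true class no longer has probability 1.
    """
    num_classes = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / num_classes for y in one_hot]


# Hypothetical 4-class example in which the second class is the true label.
smoothed = smooth_labels([0.0, 1.0, 0.0, 0.0], epsilon=0.1)
```

The smoothed target still sums to 1, but the true class receives slightly less than full confidence, which is what discourages overly absolute predictions between similar expressions.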

  • Graphics and Image Processing
    Jianwei LI, Xiaoqi LÜ, Yu GU
    Computer Engineering. 2023, 49(10): 239-246, 254. https://doi.org/10.19678/j.issn.1000-3428.0066050

    Skin cancer is one of the deadliest cancers, and it is particularly critical to accurately classify dermoscopy images. However, the existing dermoscopy images have complex shapes and a small number of samples, which makes it difficult for the existing automatic classification methods to extract image feature information; these methods also have a high error rate. To solve this problem, this paper proposes an improved ConvNeXt method and builds the SE-SimAM-ConvNeXt model. First, with ConvNeXt as the basic network, the SimAM nonparametric attention module is added to improve the network's feature extraction capability. Second, channel attention is added to the basic network to enhance the ability of ConvNeXt to mine potential key features. Finally, the Cosine Warmup mechanism is added at the beginning of training, and the cosine function value is used to attenuate the learning rate during training, further accelerating the convergence of ConvNeXt and improving its classification ability. The experimental results on the HAM10000 skin dataset show that the classification accuracy, precision, recall, and specificity of the model reach 92.9%, 85.3%, 78.0%, and 97.5%, respectively, demonstrating effective classification capability for dermoscopy images. The method has significant potential for the auxiliary diagnosis of skin cancer lesions and can assist dermatologists in making accurate diagnoses of skin cancer.
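The warmup-then-cosine learning rate schedule described in this abstract can be sketched as follows. This is a minimal version under stated assumptions: the warmup phase is taken to be linear, and the function name `lr_at` and its parameters (`warmup_steps`, `base_lr`, `min_lr`) are hypothetical, since the abstract does not give the schedule's exact form or values.

```python
import math


def lr_at(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay.

    The linear warmup shape is an assumption; the abstract only says that a
    warmup is applied at the start and that a cosine function attenuates
    the learning rate afterwards.
    """
    if step < warmup_steps:
        # Ramp up from base_lr / warmup_steps to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical settings: 100 steps in total, the first 10 used for warmup.
schedule = [lr_at(s, total_steps=100, warmup_steps=10, base_lr=1e-3) for s in range(101)]
```

The rate rises to `base_lr` at the end of warmup and then decays smoothly towards `min_lr`, which avoids large early updates while still allowing fine adjustments late in training.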

  • Artificial Intelligence and Pattern Recognition
    Zhangjie RAN, Linfu SUN, Yisheng ZOU, Yulin MA
    Computer Engineering. 2023, 49(9): 52-59. https://doi.org/10.19678/j.issn.1000-3428.0065745

    A Knowledge Graph(KG) is composed of a large number of fact triples, which often contain a large number of few-shot relations that rarely appear in the real world. For these few-shot relations, it is challenging to complete the missing triples in the KG, and existing few-shot Knowledge Graph Completion(KGC) models cannot effectively extract the representation of few-shot relations. To address this problem, a few-shot KGC model based on a relation learning network is proposed. Considering the relevance of the relations, neighbor aggregation encoding is performed on the reference and query triples to obtain an enhanced entity embedding representation. A structure that integrates a Transformer encoder and a Long Short-Term Memory(LSTM) neural network encodes and outputs the relation representation of triples. The semantic similarity between query and dynamic reference relations is obtained using the attention mechanism and combined with the hypothesis of the translation model, whereby the plausibility of query triples is comprehensively scored. The experimental results show that the model can effectively extract the fine-grained semantics of few-shot relations by integrating path-finding and context semantics. Compared with the best values of the evaluation metrics among the baseline models, the proposed model achieves an average improvement of 9.5 percentage points on few-shot link prediction tasks.

  • Development Research and Engineering Application
    Shui HU
    Computer Engineering. 2023, 49(9): 303-312. https://doi.org/10.19678/j.issn.1000-3428.0067067

    Wargame deduction is an important method for cultivating modern military commanders. Introducing artificial intelligence technology in wargame deduction can simplify organizational processes and improve deduction efficiency. Owing to the complex situational information and incomplete inference information, intelligent wargaming based on machine learning often suffers from low sample efficiency in autonomous decision-making models. This paper proposes an intelligent wargame deduction decision-making method based on deep reinforcement learning. In response to the efficiency issue of intelligent wargame deduction and combat decision-making, a baseline is introduced into the policy network to accelerate its training. Subsequently, a derivation and proof are presented, and a method for updating the parameters of the policy network after adding the baseline is proposed. The process of introducing the state-value function of the wargame deduction environment into the model is analyzed. A Low Advantage Policy-Value Network(LAPVN) model and its training framework for wargame deduction are constructed on the basis of traditional policy-value networks, and the model is built using battlefield situational awareness methods. In a wargame combat experimental environment that approximately conforms to military operational rules, the traditional policy-value network and LAPVN are trained and compared. In 400 self-game training sessions, the loss value of the LAPVN model decreases from 5.3 to 2.3, and it converges faster than the traditional policy-value network. The KL divergence of the LAPVN model is very close to zero during the training process.

  • Research Hotspots and Reviews
    Xingxing DONG, Jixun GAO, Xiaotong WANG, Song LI
    Computer Engineering. 2023, 49(9): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0064822

    As indispensable components of spatial relations, spatial directional relations are widely used in many fields such as urban intelligent traffic control, environmental resource detection, and disaster prevention and reduction. Spatial directional relations represent a significant and challenging issue in fields such as geographic information systems, spatial databases, artificial intelligence, and pattern recognition. This study conducts a comprehensive analysis and comparison of existing models for expressing and inferring spatial directional relations. First, the research progress on current models for directional relations between objects in two-dimensional space is introduced in detail in terms of single and group target objects. In addition, the characteristics, advantages, and disadvantages of the current models for directional relations in three-dimensional space are analyzed from point to block. The study then expounds the current research status of models for uncertain directional relations from two aspects: extended models based on directional relation models for precise objects, and models for uncertain objects based on uncertainty set theory. It then discusses the advantages, drawbacks, and applicable fields of each type of model. Finally, the shortcomings of current research are explained, and future research directions for spatial directional relations are outlined in terms of automatic reasoning technology, joint representation of spatial relations, and group target objects.

  • Development Research and Engineering Application
    Lumeng CHEN, Yanyan CAO, Min HUANG, Xingang XIE
    Computer Engineering. 2023, 49(8): 291-301, 309. https://doi.org/10.19678/j.issn.1000-3428.0065025

    Existing image-based flame detection approaches struggle to balance real-time performance and precision and cannot accurately identify small flame targets, making them ineffective in application scenarios such as extinguishing small fires. In terms of real-time detection, the YOLOv5 algorithm provides significant benefits over conventional techniques. A real-time flame detection method based on an improved YOLOv5 is proposed to increase flame detection accuracy. First, to help the model locate the flame features more accurately, a coordinate attention mechanism module is embedded in the feature extraction portion of the YOLOv5 model. This module can reduce feature redundancy without sacrificing the feature information. Second, to help the model successfully obtain flame features with a receptive field smaller than 8×8 pixels, a detection layer specifically designed for small flame targets is added to the feature fusion portion of the algorithm along with the corresponding feature extraction and feature fusion modules. Finally, to increase the model's speed of convergence and robustness to small datasets, α-CIoU is employed as a new bounding box loss function in the computation phase of the loss function. Additionally, model pretraining and transfer learning techniques are used to initialize the weight parameters of each layer of the flame detection model to prevent the gradient from vanishing and to enhance the training effect. According to the experimental findings, the proposed flame detection model shows an accuracy rate of 96.6%, which is 7.4 percentage points higher than that of the original YOLOv5 model. Additionally, the detection speed of this model is 68 frame/s, and its size is only 15.4 MB. On the basis of significantly improved accuracy, it can also meet the requirements of firefighting robots for real-time and lightweight flame detection.

  • Research Hotspots and Reviews
    Jinsheng CHEN, Wenzhen MA, Shaofeng FANG, Ziming ZOU
    Computer Engineering. 2023, 49(11): 13-23. https://doi.org/10.19678/j.issn.1000-3428.0066521

    With the construction of the Meridian Project all-sky airglow imager observation network, a large amount of raw airglow image data has been accumulated. Current atmospheric gravity wave research based on airglow observation depends heavily on manual identification, which is very time-consuming, and the quality of labeling is difficult to guarantee. Therefore, there is an urgent need for a fast and effective automatic identification method. To solve the problem of sparsely labeled samples of atmospheric gravity waves, this paper proposes an algorithm based on an improved Cycle GAN model to expand the atmospheric gravity wave airglow observation dataset, thereby greatly improving the recognition accuracy of atmospheric gravity waves while labeling only a small number of samples. A new intelligent recognition algorithm for atmospheric gravity waves is also proposed by improving the backbone network and bounding box prediction of the YOLOv5s model, considering the low Signal-to-Noise Ratio(SNR) between the recognition target and background in airglow images. The experimental results show that with the augmented dataset and the improved YOLOv5s target detection algorithm, the average precision reaches 75.8% under an Intersection-over-Union(IoU) threshold of 0.5, which is 9.7 percentage points higher than that of the original model. Meanwhile, the detection speed and average recognition accuracy are superior to those of the mainstream target detection algorithms compared.
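For readers unfamiliar with the IoU threshold of 0.5 used in the evaluation above, the following sketch computes the Intersection-over-Union of two axis-aligned boxes. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions made for illustration; the abstract does not specify the box representation.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical prediction and ground-truth boxes.
is_match = iou((0, 0, 2, 2), (1, 0, 3, 2)) >= 0.5
```

Under a 0.5 threshold, a predicted box counts as a correct detection only when its overlap with the ground-truth box is at least half of their combined area.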

  • Research Hotspots and Reviews
    Hongpeng LI, Bo MA, Yating YANG, Lei WANG, Zhen WANG, Xiao LI
    Computer Engineering. 2023, 49(9): 23-31. https://doi.org/10.19678/j.issn.1000-3428.0066170

    Event extraction aims to recognize and extract event information from unstructured natural language texts in a structured form. Traditional methods extract events at the sentence level and rely on massive labeled data for training; they are unsuited to document-level event extraction and perform poorly in low-resource scenarios. Existing research utilizes prompt learning methods to achieve document-level event extraction by filling in template slots. However, traditional prompt template slots have low accuracy in classifying argument roles, which can easily lead to errors in argument role extraction. To address the above issues, this paper proposes a document-level event extraction method based on slot semantic enhancement prompt learning. Based on the prompt learning method, the argument role semantic information in the traditional event extraction paradigm is integrated into the slots of the prompt template, providing argument type constraints for the slot prediction generation process of the model and improving the accuracy of document-level event extraction. By keeping the upstream and downstream tasks of the pretrained language model consistent, the generalization ability of the model is improved, and knowledge transfer is achieved at a lower cost to improve model performance in low-resource event extraction scenarios. Experimental results show that compared to the traditional baseline method with suboptimal performance, this method achieves F1 score improvements of 2.6, 2.9, and 4.0 percentage points on an English event extraction dataset containing 59 argument types, a Chinese dataset containing 92 argument types, and a low-resource data scale, respectively.

  • Artificial Intelligence and Pattern Recognition
    Lu HAN, Weigang HUO, Yonghui ZHANG, Tao LIU
    Computer Engineering. 2023, 49(9): 99-108. https://doi.org/10.19678/j.issn.1000-3428.0065846

    Each subsequence of the Multivariate Time Series(MTS) contains multi-scale characteristics of different time spans, comprising information such as development process, direction, and trend. However, existing time series prediction models cannot effectively capture multi-scale features and evaluate their importance. In this study, an MTS prediction network, FFANet, is proposed based on multi-scale temporal feature fusion and a Dual-Attention Mechanism(DAM). FFANet effectively integrates multi-scale features and focuses on important parts. The parallel temporal dilated convolution layers in the multi-scale temporal feature fusion module give the model multiple receptive fields, allowing it to extract features of temporal data at different scales and adaptively fuse them based on their importance. Using a DAM to recalibrate the fused temporal features, FFANet focuses on features that make significant contributions to prediction by assigning temporal and channel attention weights and applying them to the corresponding temporal features. The experimental results show that compared with the AR, VARMLP, RNN-GRU, LSTNet-skip, TPA-LSTM, MTGNN, and AttnAR time series prediction models, FFANet achieves average reductions of 0.1523, 0.1200, 0.0743, 0.0354, 0.0215, 0.0121, and 0.0200, respectively, in RRSE prediction error on the Traffic, Solar Energy, and Electricity datasets.
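The RRSE error reported above is not defined in the abstract. The sketch below assumes the usual Root Relative Squared Error, that is, the root of the summed squared prediction error divided by the root of the summed squared deviation of the targets from their mean. The function name `rrse` and the sample values are illustrative only.

```python
import math


def rrse(y_true, y_pred):
    """Root Relative Squared Error (assumed definition; lower is better).

    A value of 1.0 means the model is no better than always predicting
    the mean of the targets.
    """
    mean_true = sum(y_true) / len(y_true)
    squared_error = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    squared_spread = sum((t - mean_true) ** 2 for t in y_true)
    return math.sqrt(squared_error) / math.sqrt(squared_spread)


# Hypothetical series: a mean-only predictor serves as the reference point.
targets = [1.0, 2.0, 3.0, 4.0]
baseline_error = rrse(targets, [2.5, 2.5, 2.5, 2.5])
```

Because the metric is normalized by the spread of the targets, it allows errors to be compared across datasets with different scales.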

  • Graphics and Image Processing
    Hong ZHAO, Yubo FENG
    Computer Engineering. 2023, 49(12): 194-204. https://doi.org/10.19678/j.issn.1000-3428.0066520

    In traffic sign detection tasks, the YOLOv5 detection algorithm suffers from missed detections, erroneous detections, and high model complexity in complex environments and road conditions. To address these challenges, an improved CGS-Ghost YOLO detection model is proposed. YOLOv5 uses the Focus module for sampling, which introduces more parameters. In this study, the StemBlock module replaces the Focus module for sampling after input, which reduces the number of parameters while maintaining accuracy. CGS-Ghost YOLO uses a Coordinate Attention(CA) mechanism, which improves the semantic and location information within the features and enhances the feature extraction ability of the model. Additionally, a CGS convolution module, which combines the SMU activation function with GroupNorm(GN) normalization, is proposed. The CGS convolution module is designed to avoid the influence of the batch size on the model during training and to improve model performance. GhostConv is used to reduce the number of model parameters while effectively improving the detection accuracy of the model. The loss function, α-CIoU Loss+VFocal Loss, is used to solve the problem of unbalanced positive and negative samples in traffic sign detection tasks and improve the overall performance of the model. The neck part uses a Bi-FPN bidirectional feature pyramid network, ensuring that the multi-scale features of the detection target are effectively fused. The results of an experiment on the TT100K traffic sign detection dataset show that the detection accuracy of the improved CGS-Ghost YOLO model reaches 93.1%, which is 11.3 percentage points higher than the accuracy achieved by the original model. Additionally, the proposed network model reduces the model parameter quantity by 21.2 percentage points compared to the original model. In summary, the network model proposed in this study optimizes the convolution layer and the downsampling part, thus considerably reducing the model parameters while enhancing the model detection accuracy.

  • Research Hotspots and Reviews
    Jinshuo LIU, Daichen WANG, Juan DENG, Lina WANG
    Computer Engineering. 2023, 49(8): 13-19, 28. https://doi.org/10.19678/j.issn.1000-3428.0067003

    Currently, most existing methods for classifying harmful information on the Internet overlook imbalanced data and long-tailed distributions, biasing the model towards the more numerous data samples during classification. This makes them unable to effectively identify classes with few samples, which decreases overall recognition accuracy. To address this issue, LTIC, a classification method for long-tailed harmful information datasets, is proposed. By integrating few-shot learning with knowledge transfer strategies, the BERT model is used to learn the weights of the head classes. The prototypes of the head classes are obtained through a Prototyper network specifically designed for few-shot learning. This design allows the head and tail data to be processed separately, thereby avoiding the data imbalance caused by mutual training. The mapping relationship learned from the head-class prototypes is then used to convert the prototypes of the tail classes into weights. Subsequently, the head and tail class weights are combined to obtain the final classification result. In experiments, the LTIC method achieves classification accuracies of 82.7% and 83.5% on the Twitter and THUCNews datasets, respectively. This method also significantly improves the F1 value compared to the non-long-tailed model, thus effectively improving classification accuracy. Compared with the latest classification methods such as BNN and OLTR, this method exhibits superior classification performance on long-tailed datasets, with an average accuracy improvement of 3%. When new categories of harmful information emerge, the LTIC method can predict them with minimal computation, achieving an accuracy of 70% and demonstrating good scalability.

  • Evolutionary and Swarm Intelligence Algorithm and Application
    Rong FEI, Mengyang MA, Xiao ZHANG, Xinhong HEI, Qingzheng XU, Yuan QIU
    Computer Engineering. 2023, 49(7): 10-20. https://doi.org/10.19678/j.issn.1000-3428.0066975

    In autonomous driving, trajectory prediction and collision detection are key technologies that can improve the ability of the autonomous driving system to perceive the surrounding environment and ensure driving safety. The Conv-LSTM model displays good trajectory prediction ability, effectively processing trajectory data with spatio-temporal correlation. However, the predictive ability of this model is relatively weak in complex situations, such as traffic congestion and complex roads. Therefore, this study proposes a trajectory prediction model for driving intention identification based on the Long Short-Term Memory(LSTM) network. The trajectory prediction model is constructed based on Conv-LSTM and uses the identified driving intention information to predict future trajectories, improving the accuracy and interpretability of trajectory prediction. In addition, two attention mechanisms are introduced to analyze the importance of the historical trajectory information of the target object and surrounding vehicles, which enables the model to focus on the most representative neighboring vehicles and to better capture the relationships between different time steps. A collision detection algorithm based on a hybrid bounding box is also proposed. In this algorithm, collision is pre-judged based on the proposed minimum safe distance and maximum collision distance to avoid creating an oriented bounding box during collision detection in non-conflict situations, thus improving the efficiency of collision detection while ensuring detection accuracy. The NGSIM dataset is used for model performance verification, and the results show that the Root Mean Square Error(RMSE) of the proposed model is lower than that of Conv-LSTM, sys-Conv, and other models, indicating that the trajectory prediction accuracy of the proposed model is higher. Compared with the Oriented Bounding Box(OBB), Axis-Aligned Bounding Box(AABB), and AABB-OBB algorithms, the proposed hybrid bounding box algorithm reduces the average collision detection time by 64.47%, 53.88%, and 55.47%, respectively.
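The distance-based pre-judgment described in this abstract might be sketched as below. The abstract does not define the two thresholds, so this sketch makes its own assumptions: the minimum safe distance is taken as the sum of the vehicles' circumscribed-circle radii, the maximum collision distance as the sum of their inscribed-circle radii, and the precise fallback test is a standard separating-axis check on oriented boxes. The vehicle tuple `(cx, cy, half_length, half_width, heading)` and all function names are hypothetical.

```python
import math


def corners(box):
    """Corner points of an oriented box (cx, cy, half_length, half_width, heading)."""
    cx, cy, hl, hw, theta = box
    c, s = math.cos(theta), math.sin(theta)
    offsets = ((hl, hw), (hl, -hw), (-hl, -hw), (-hl, hw))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def obb_overlap(a, b):
    """Separating-axis test: the boxes overlap unless some box axis separates them."""
    corners_a, corners_b = corners(a), corners(b)
    for box in (a, b):
        c, s = math.cos(box[4]), math.sin(box[4])
        for ax, ay in ((c, s), (-s, c)):
            proj_a = [x * ax + y * ay for x, y in corners_a]
            proj_b = [x * ax + y * ay for x, y in corners_b]
            if max(proj_a) < min(proj_b) or max(proj_b) < min(proj_a):
                return False
    return True


def detect_collision(a, b):
    """Return (collides, stage); stage shows whether the cheap pre-check decided."""
    distance = math.hypot(a[0] - b[0], a[1] - b[1])
    # Assumed thresholds, not taken from the paper.
    min_safe = math.hypot(a[2], a[3]) + math.hypot(b[2], b[3])
    max_collision = min(a[2], a[3]) + min(b[2], b[3])
    if distance > min_safe:
        return False, "pre-check"   # too far apart to touch, no OBB needed
    if distance < max_collision:
        return True, "pre-check"    # inscribed circles already overlap
    return obb_overlap(a, b), "obb"


ego = (0.0, 0.0, 2.5, 1.0, 0.0)
far_result = detect_collision(ego, (20.0, 0.0, 2.5, 1.0, 0.0))
```

Only vehicle pairs whose center distance falls between the two thresholds reach the oriented-box test, which is where the time saving reported in the abstract would come from under these assumptions.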

  • Graphics and Image Processing
    Xianguo LI, Bin LI
    Computer Engineering. 2023, 49(9): 226-233, 245. https://doi.org/10.19678/j.issn.1000-3428.0065513

    A Convolutional Neural Network(CNN) applied on its own to image deblurring is limited by its restricted receptive field. The Transformer can effectively mitigate this limitation. However, its computational complexity increases quadratically with the spatial resolution of the input image. Therefore, this study proposes an image deblurring network based on the Transformer and a multi-scale CNN, called T-MIMO-UNet. The multi-scale CNN is used to extract spatial features, while the global modeling ability of the Transformer is employed to capture information from distant pixels. A local enhanced Transformer module, a local Multi-Head Self-Attention(MHSA) computing network, and an Enhanced Feed-Forward Network(EFFN) are designed. The block-by-block MHSA computation is performed using a windowing approach. The information interaction between different windows is enhanced by adding a depthwise separable convolution layer. The results of the experiment conducted using the GoPro test dataset demonstrate that the Peak Signal-to-Noise Ratio(PSNR) of T-MIMO-UNet increases by 0.39 dB, 2.89 dB, 3.42 dB, and 1.86 dB compared to the MIMO-UNet, DeepDeblur, DeblurGAN, and SRN networks, respectively. Additionally, the number of parameters is reduced by half compared to MPRNet. These findings show that T-MIMO-UNet effectively addresses the challenge of image blurring in dynamic scenes.

  • Graphics and Image Processing
    Xieliu YANG, Guowen MEN, Wenfeng LIANG, Dan WANG, Zhengyi XIE, Huijie FAN
    Computer Engineering. 2023, 49(11): 247-256. https://doi.org/10.19678/j.issn.1000-3428.0066610

    Due to the particularity of the underwater environment, underwater optical images often suffer from degradation issues such as color cast, blur, and low contrast. To restore these underwater images to their natural and clear colors, numerous methods for enhancing and restoring underwater images have been proposed. However, the existing underwater image enhancement and restoration techniques primarily focus on improving the visual quality of underwater images. Their impact on the accuracy of underwater object detection using deep learning methods remains uncertain. Therefore, this study conducts a detailed and comprehensive exploration of the influence of fourteen typical underwater image enhancement and restoration methods on the accuracy of three common deep learning-based object detection models. The analysis uses the URPC2018 and URPC2019 datasets and considers factors such as domain variations between training and testing sets, the number of domains in the training set, the quantity of images in the training set, and self-created datasets for cross-dataset testing. The experimental results show that when both the training and test sets belong to the same dataset, underwater image enhancement and restoration methods, whether used as image preprocessing methods or data enhancement methods, do not significantly improve the accuracy of deep learning-based object detection. However, when detecting across datasets, using underwater image enhancement and restoration methods can significantly improve detection accuracy, with mAP increasing by up to 13.6 percentage points.

  • Research Hotspots and Reviews
    Bin YANG, Yitong WANG
    Computer Engineering. 2023, 49(10): 13-21. https://doi.org/10.19678/j.issn.1000-3428.0065807

    A Heterogeneous Information Network(HIN) typically contains different types of nodes and interactions. Its rich semantic information and complex relationships pose significant challenges to current representation learning in HINs. Although most existing approaches use predefined meta-paths to capture heterogeneous semantic and structural information, they suffer from high cost and low coverage. In addition, most existing methods cannot precisely and effectively capture and learn influential high-order neighbor nodes. Accordingly, this study addresses the problems of meta-paths and influential high-order neighbor nodes with a newly proposed HIN-HG model. HIN-HG generates a hyperadjacency graph of the HIN, precisely and effectively capturing the influential neighbor nodes of the target nodes. Then, convolutional neural networks are adopted with a multichannel mechanism to aggregate different types of neighbor nodes under different relationships. HIN-HG can automatically learn the weights of different neighbor nodes and meta-paths without manually specifying them. Meanwhile, nodes similar to the target node can be captured in the entire graph as higher-order neighbor nodes, and the representation of the target node can be effectively updated through information propagation. The experimental results on three real datasets (DBLP, ACM, and IMDB) demonstrate the improved performance of HIN-HG compared with state-of-the-art methods in HIN representation learning, including HAN, GTN, and HGSL. HIN-HG improves node classification by 5.6 and 5.7 percentage points on average in the multi-class evaluation indices Macro-F1 and Micro-F1, respectively, thus improving the accuracy and effectiveness of node classification.

  • Graphics and Image Processing
    Wenshun SHENG, Xiongfeng YU, Jiayan LIN, Xin CHEN
    Computer Engineering. 2024, 50(1): 242-250. https://doi.org/10.19678/j.issn.1000-3428.0066724

    A modified Faster R-CNN algorithm that combines the CBAM mechanism and a feature pyramid structure is proposed to address the poor detection of small-scale objects and of occluded or truncated objects. To make efficient use of local information in feature images, the CBAM mechanism is integrated into the feature extraction network, reducing the interference of invalid targets and improving the detection of occluded or truncated objects. A Feature Pyramid Network(FPN) structure is introduced to connect high- and low-level feature data, yielding high-resolution and strongly semantic data and thereby enhancing the detection of small objects. To alleviate gradient vanishing and reduce the scale of hyperparameters, the commonly used VGG16 network is replaced with VS-ResNet, an inverted residual network with strong expressive ability. VS-ResNet modifies some hierarchical structures of the original ResNet 50, adds auxiliary classifiers, and adopts inverted residual and group convolution designs, such that the activation function information is fully preserved in high-dimensional environments and detection accuracy is improved. A method that resets candidate box scores is used to compensate for the tendency of the Non-Maximum Suppression(NMS) algorithm to mistakenly eliminate overlapping detection boxes. The experimental results demonstrate that, compared with VGG16, VS-ResNet improves accuracy by 2.97 percentage points on the CIFAR-10 dataset. The mAP of the proposed algorithm on the Pascal VOC 2012 dataset is 76.2%, which is 13.9 percentage points higher than that of the original Faster R-CNN algorithm.

  • Computer Engineering. 2023, 49(10): 0-0.
  • Research Hotspots and Reviews
    Baihao JIANG, Jing LIU, Dawei QIU, Liang JIANG
    Computer Engineering. 2024, 50(3): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0067502

    In spinal image segmentation, deep learning algorithms offer strong learning ability, strong adaptability, and unique nonlinear mapping abilities. Compared with traditional segmentation methods, they can better extract key information from spinal images and suppress irrelevant information, which can assist doctors in accurately locating focal areas and achieving accurate and efficient segmentation. The application status of deep learning in spinal image segmentation is summarized and analyzed with respect to deep learning algorithms, types of spinal diseases, types of images, experimental segmentation results, and performance evaluation indicators. First, the background of deep learning models and spinal image segmentation is described, and the application of deep learning in spinal image segmentation is introduced. Second, several common types of spinal diseases are introduced, the difficulties in image segmentation are described, and the common open datasets, segmentation workflow, and evaluation indicators used in spinal image segmentation are presented. Drawing on specific experiments, the application progress of the Convolutional Neural Network(CNN) model, the U-Net model, and their improved variants in the segmentation of vertebrae, intervertebral discs, and spinal tumors is summarized and analyzed. Based on previous experimental results and the current research progress of deep learning models, this paper summarizes the limitations of current clinical studies and the reasons for insufficient segmentation performance, and proposes corresponding solutions to the existing problems. Finally, prospects for future studies and development are presented.

  • Research Hotspots and Reviews
    Ying LIU, Yupeng MA, Fan ZHAO, Yi WANG, Tonghai JIANG
    Computer Engineering. 2024, 50(1): 39-49. https://doi.org/10.19678/j.issn.1000-3428.0067004

    Hyperledger Fabric is an alliance chain framework widely adopted both domestically and internationally. Certain businesses built on Fabric technology are characterized by numerous participating organizations, frequent transaction operations, and increased transaction conflicts. The multi-version concurrency control technology used in Fabric can partially resolve transaction conflicts and enhance system concurrency. However, this mechanism is imperfect, and certain transaction data cannot be properly stored on the chain. To achieve complete, efficient, and trustworthy up-chain storage of massive transaction data, a data preprocessing mechanism based on the Fabric oracle machine is proposed. The Massive Conflict Preprocessing(MCPP) method is designed to ensure the integrity of transaction data with primary key conflicts through techniques including detection, monitoring, delayed submission, transaction locking, and reordering caching. Data transmission protection measures are introduced that use asymmetric encryption during transmission, preventing malicious nodes from forging authentication information and ensuring the consistency of transaction data before and after off-chain processing. Theoretical analysis and experimental results demonstrate that this mechanism can effectively address concurrent conflicts in the up-chain storage of massive transaction data on alliance chain platforms. When the transaction data scales reach 1 000 and 10 000, the MCPP method achieves time efficiency improvements of 38% and 21.4%, respectively, compared with the LMLS algorithm, with a success rate close to 100%. Thus, the proposed method is efficient and secure, and does not affect Fabric system performance when no concurrent conflicts occur.

  • Graphics and Image Processing
    Fangxin XU, Rong FAN, Xiaolu MA
    Computer Engineering. 2024, 50(3): 250-258. https://doi.org/10.19678/j.issn.1000-3428.0067741

    To address the missed and false detections to which detection algorithms are prone in crowded pedestrian scenarios, this study proposes an improved YOLOv7 crowded pedestrian detection algorithm. A BiFormer visual transformer and an improved RepConv and Channel Space Attention Module (CSAM)-based Efficient Layer Aggregation Network (RC-ELAN) module are introduced in the backbone network. The self-attention mechanism and the attention module enable the backbone network to focus more on the important features of occluded pedestrians, effectively mitigating the adverse effects of missing target features on detection. An improved neck network based on the idea of a Bidirectional Feature Pyramid Network (BiFPN) is used, in which transposed convolution and an improved Rep-ELAN-W module enable the model to efficiently use the small-target feature information in the middle- and low-dimensional feature maps, effectively improving the detection of small-target pedestrians. The introduction of an Efficient Complete Intersection-over-Union (E-CIoU) loss function allows the model to converge to a higher accuracy. Experimental results on the WiderPerson dataset, which contains a large number of small and occluded pedestrians, demonstrate that the average accuracies of the improved YOLOv7 algorithm at IoU thresholds of 0.5 and 0.5-0.95 are improved by 2.5 and 2.8, 9.9 and 7.1, and 12.3 and 10.7 percentage points compared with the YOLOv7, YOLOv5, and YOLOX algorithms, respectively, making the algorithm better suited to crowded pedestrian detection scenarios.

  • Research Hotspots and Reviews
    Enxu WANG, Xiaohong WANG, Kun ZHANG, Dongwen ZHANG
    Computer Engineering. 2023, 49(11): 40-48, 69. https://doi.org/10.19678/j.issn.1000-3428.0066255

    To address the difficulty that current load forecasting models have in capturing both temporal and feature information, we propose a load forecasting model based on a dual attention mechanism. The model integrates feature attention and temporal attention mechanisms, allowing it to adaptively extract feature and temporal information from server load data and to emphasize the key information in both. To comprehensively and accurately evaluate the server load status at the next moment, we employ the CRITIC objective weighting method, which assigns weights to the various server characteristics used to calculate the load value. The resulting dual attention mechanism network builds on Long Short-Term Memory(LSTM) networks. It introduces both feature and temporal attention mechanisms and uses historical load data as input to predict future server load values. This approach improves the accuracy of the network for both single-step and multi-step load predictions. Experimental results on the Alibaba Cluster-trace-v2018 public dataset demonstrate that the dual attention mechanism network outperforms LSTM-based load prediction networks. Specifically, the Mean Absolute Error(MAE) and Mean Square Error(MSE) of the dual attention mechanism network are reduced by 9.2% and 16.8%, respectively, indicating the stability and accuracy of the network.
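
    As an illustration of the CRITIC objective weighting step mentioned in this abstract, the following is a minimal sketch of the standard CRITIC formula, not the authors' implementation. The function name `critic_weights` and the sample layout (rows are observations, columns are server characteristics) are assumptions made for this example, and all characteristics are assumed to be benefit-type so that plain min-max normalization applies.

```python
from statistics import mean, pstdev


def critic_weights(rows):
    """Return CRITIC weights for the columns of `rows` (hypothetical helper).

    Each weight is proportional to C_j = sigma_j * sum_k (1 - r_jk), where
    sigma_j is the standard deviation of the normalized column j and r_jk is
    the Pearson correlation between columns j and k.
    """
    columns = [list(col) for col in zip(*rows)]
    normalized = []
    for col in columns:
        lo, hi = min(col), max(col)
        if hi == lo:
            # A constant column has zero variance, so correlation is undefined.
            raise ValueError("constant column cannot be weighted by CRITIC")
        normalized.append([(x - lo) / (hi - lo) for x in col])

    n = len(rows)
    m = len(normalized)
    sigmas = [pstdev(col) for col in normalized]
    means = [mean(col) for col in normalized]

    def corr(j, k):
        # Population Pearson correlation between normalized columns j and k.
        cov = sum(
            (x - means[j]) * (y - means[k])
            for x, y in zip(normalized[j], normalized[k])
        ) / n
        return cov / (sigmas[j] * sigmas[k])

    info = [
        sigmas[j] * sum(1.0 - corr(j, k) for k in range(m))
        for j in range(m)
    ]
    total = sum(info)
    return [c / total for c in info]
```

    A weighted load value for one observation could then be formed as the sum of each normalized characteristic multiplied by its weight; how the paper combines the weights is not stated in the abstract.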

  • Research Hotspots and Reviews
    Qilin WU, Yagu DANG, Shanwei XIONG, Xu JI, Kexin BI
    Computer Engineering. 2023, 49(11): 24-29, 39. https://doi.org/10.19678/j.issn.1000-3428.0066181

    For the sentiment analysis of students' teaching evaluation text, and in view of the insufficient feature extraction ability of traditional basic deep learning models, the low training efficiency of recurrent neural networks, and the inaccurate semantic representation of word vectors, a sentiment classification algorithm for student evaluation text based on a hybrid feature network is proposed. The lightweight pre-training model ALBERT is used to extract a dynamic vector representation of each word that conforms to the current context, which solves the problem of polysemy in traditional word vector models and increases the accuracy of the semantic representation. The hybrid feature network comprehensively captures the global context sequence features of the teaching evaluation text and the local semantic information at different scales by combining a simple recurrent unit, a multi-scale local convolution learning module, and a self-attention layer, thereby improving the deep feature representation ability of the model. The self-attention mechanism identifies the key features that significantly affect the sentiment recognition results by calculating the importance of each classification feature to the classification results, which prevents irrelevant features from interfering with the results and degrading classification performance. The classification vectors are then spliced, and the sentiment classification results of the evaluation text are output from the linear layer. In an experiment on a real student teaching evaluation text dataset, the model achieves an F1 score of 97.8%, which is higher than those of the BERT-BiLSTM and BERT-GRU-ATT deep learning models. Additionally, an ablation experiment demonstrates the effectiveness of each module.

  • Graphics and Image Processing
    Shupeng WANG, Yindi HE
    Computer Engineering. 2023, 49(8): 232-239. https://doi.org/10.19678/j.issn.1000-3428.0065889

    Under uneven lighting conditions, images acquired by users often exhibit uneven brightness distribution and loss of details. Existing image enhancement methods suffer from local over- or under-enhancement when working with low-illumination images affected by uneven illumination. This study proposes ULIEN, an uneven illumination image enhancement algorithm fused with a feature attention mechanism. ULIEN learns a nonlinear Gamma function to effectively map unevenly illuminated images to enhanced images. The network integrates a luminance attention map and a channel attention mechanism to mitigate local over- or under-enhancement. These components assign varying learning weights to different luminance areas and feature channels within the image, enabling the network to focus on the enhancement process in different regions. The ULIEN enhancement network has a simple structure and is trained using a set of reference-free loss functions, eliminating the need for any reference image. Experimental results demonstrate, from a subjective perspective, the effectiveness of ULIEN in preserving details, avoiding artifacts, and mitigating local over- or under-enhancement. Furthermore, the images enhanced by ULIEN achieve scores of 3.727 0, 1.109 6, 0.903 0, and 0.755 7 in BTMQI, ENIQA, TMQI, and UNIQUE, respectively, showing clear advantages over other enhancement algorithms.

  • Artificial Intelligence and Pattern Recognition
    Tianchen QIU, Xiaoying ZHENG, Yongxin ZHU, Songlin FENG
    Computer Engineering. 2023, 49(7): 110-117. https://doi.org/10.19678/j.issn.1000-3428.0064016

    In federated learning scenarios involving ultra-large-scale edge devices, the local data of participants are non-Independent and Identically Distributed(non-IID), resulting in an imbalance in the overall training data and difficulty in defending against poisoning attacks. The prior knowledge required by most methods that enhance data balance in supervised learning conflicts with the privacy protection principle of federated learning. Furthermore, existing algorithms for defending against poisoning attacks in non-IID scenarios are overly complex or violate data privacy. This study introduces FedFog, a multi-server architecture capable of clustering participants with similar data distributions without disclosing the participants' local data distributions, thereby converting non-IID training data into multiple IID data subsets. Based on each cluster center, the global server calculates the weight of the features extracted from each category of data in the global model update to alleviate the negative impact of the overall training data imbalance. Simultaneously, FedFog distributes the poisoning attack defense task from the entire set of participants to each cluster, thereby solving the problem of poisoning attack defense. The experimental results show that FedFog improves global model precision by up to 4.2 percentage points compared with FedSGD when the overall training data are not balanced. When the overall data are balanced but 1/3 of the participants are poisoning attackers, the convergence of FedFog approaches that of FedSGD in the attack-free scenario.

  • Mobile Internet and Communication Technology
    Linghui KONG, Zheheng RAO, Yanyan XU, Shaoming PAN
    Computer Engineering. 2023, 49(9): 199-207, 216. https://doi.org/10.19678/j.issn.1000-3428.0066301

    Intelligent routing based on Deep Reinforcement Learning(DRL) has become an important development direction for routing algorithms because it combines the perception ability of deep learning with the decision-making ability of reinforcement learning. However, existing DRL-based intelligent routing algorithms cannot adapt to the dynamically changing network topology in wireless networks, making it difficult to make appropriate routing decisions. To address this issue, this paper proposes an intelligent routing algorithm called MPNN-DQN, which combines the Message Passing Neural Network (MPNN) and DRL. MPNN-DQN uses MPNN to learn irregular network topology, enabling it to make effective decisions even when the network topology changes dynamically. Moreover, a hop-by-hop routing generation method based on k-order neighbor information aggregation is designed to improve the scalability of the algorithm while ensuring decision-making effectiveness; thus, the algorithm can be better applied to medium- to large-sized network topologies. Experimental results show that, compared with routing algorithms such as GCN, DRSIR, and DQN, MPNN-DQN achieves better average latency, packet loss rate, and network throughput. In three different network scenarios, Germany, GBN, and synth50, the proposed algorithm improves throughput by 3.27%-23.03% and shows strong adaptability to dynamic network topologies.

  • Frontiers in Computer Systems
    Yi CHEN, Bosheng LIU, Yongqi XU, Jigang WU
    Computer Engineering. 2023, 49(12): 1-9. https://doi.org/10.19678/j.issn.1000-3428.0066701

    Deep Convolutional Neural Networks(CNNs) have large models and high computational complexity, making their deployment on Field Programmable Gate Arrays(FPGAs) with limited hardware resources difficult. Hybrid precision CNNs can provide an effective trade-off between model size and accuracy, thus providing an efficient solution for reducing the model's memory footprint. As a fast algorithm, the Fast Fourier Transform(FFT) can convert traditional spatial domain CNNs into the frequency domain, effectively reducing the computational complexity of the model. This study presents an FPGA-based accelerator design for 8-bit and 16-bit hybrid precision frequency domain CNNs that supports the dynamic configuration of 8-bit and 16-bit frequency domain convolutions. A DSP-based Frequency-domain Processing Element(FPE) is designed to support 8-bit and 16-bit frequency domain convolution operations; it can pack a pair of 8-bit frequency domain multiplications to reuse DSPs and boost throughput. In addition, a mapping dataflow is proposed that supports both 8-bit and 16-bit computation patterns and minimizes redundant data processing and data movement through data reuse. The proposed accelerator is evaluated on the ResNet-18 and VGG16 models using the ImageNet dataset. The experimental results reveal that the proposed accelerator achieves energy efficiency ratios (ratio of GOP to energy consumption) of 29.74 and 56.73 on the ResNet-18 and VGG16 models, respectively, which is 1.2-6.0 times better than those of other frequency domain FPGA accelerators.

  • Frontiers in Computer Systems
    Junchao YE, Cong XU, Yao HUANG, Zhilei CHAI
    Computer Engineering. 2023, 49(12): 35-45. https://doi.org/10.19678/j.issn.1000-3428.0066260

    As a third-generation neural network, the Spiking Neural Network(SNN) uses neurons and synapses as the basic computing units, and its working mechanism is similar to that of the biological brain. Its complex topology of intra-layer connections and reverse connections has the potential to solve complex problems. Compared with the Leaky-Integrate-and-Fire(LIF) model, the Izhikevich neuron model can support a wider range of neuromorphic computing by simulating more biological impulse phenomena; however, the Izhikevich neuron model has higher computational complexity, which can lead to suboptimal performance and increased power consumption within the network. To address this problem, a customized FPGA-based calculation method for Izhikevich neurons is proposed. First, by studying the value range of the parameters of Izhikevich neurons in the SNN and balancing the relative error of the membrane potential against resource consumption, a mixed-precision fixed-point solution is designed. Second, for a single neuron, the data path of the neuron update equation is balanced to achieve the minimum pipeline length. Furthermore, at the network level, a scalable computing architecture is devised to accommodate FPGAs of varying scales, ensuring adaptability across different configurations. Finally, the customized computing method is used to accelerate the classical NEST simulator. The experimental results reveal that, compared with the i7-10700 CPU, the performance of the classic lateral geniculate nucleus network model and the liquid state machine model on the ZCU102 is 2.26 and 3.02 times better on average, and the energy efficiency ratio is improved by 8.06 and 10.8 times on average, respectively.
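
    For readers unfamiliar with the Izhikevich neuron named in this abstract, the sketch below steps the standard model equations with forward Euler in floating point. It is only an illustration: the paper describes a mixed-precision fixed-point FPGA design, which this code does not reproduce. The function names, the 0.5 ms step, and the "regular spiking" parameter defaults (a=0.02, b=0.2, c=-65, d=8) are assumptions chosen for the example.

```python
def izhikevich_step(v, u, current, a=0.02, b=0.2, c=-65.0, d=8.0, dt=0.5):
    """Advance one neuron by dt milliseconds (hypothetical helper).

    Model: dv/dt = 0.04 v^2 + 5 v + 140 - u + I,  du/dt = a (b v - u),
    with the reset v <- c, u <- u + d once v reaches 30 mV.
    Returns the new (v, u) and whether a spike was registered.
    """
    spiked = v >= 30.0
    if spiked:
        # Apply the after-spike reset before integrating further.
        v, u = c, u + d
    dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
    du = a * (b * v - u)
    return v + dt * dv, u + dt * du, spiked


def count_spikes(current, steps=2000, dt=0.5):
    """Simulate steps * dt milliseconds of constant input and count spikes."""
    v = -65.0
    u = 0.2 * v  # start the recovery variable at b * v
    spikes = 0
    for _ in range(steps):
        v, u, spiked = izhikevich_step(v, u, current, dt=dt)
        spikes += int(spiked)
    return spikes
```

    Each step needs a squaring, several multiplications, and a comparison, which gives a sense of why the model costs more per neuron than LIF.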

  • Computer Engineering. 2023, 49(11): 0-0.
  • Evolutionary and Swarm Intelligence Algorithm and Application
    Xingjuan CAI, Yanheng GUO, Tianhao ZHAO, Wensheng ZHANG
    Computer Engineering. 2023, 49(7): 1-9. https://doi.org/10.19678/j.issn.1000-3428.0066105

    With the rapid development of edge computing, service deployment and task offloading are two significant challenges to be addressed. However, existing work typically solves task offloading in edge environments as a single problem and rarely considers service deployment at the same time. Because service deployment and task offloading are highly coupled, considering only one has limitations and can cause wasted resources and significant latency, thus affecting user experience. Meanwhile, traditional evolutionary algorithms cannot manage multiple single-objective or multi-objective optimization tasks simultaneously. Therefore, to solve both challenges simultaneously, this study constructs a multi-task multi-objective model in which each optimization problem is treated as a task. An improved multifactor optimization-based evolutionary multitasking algorithm is proposed, and a location update strategy is introduced to increase the diversity of the search population. The proposed design also improves the selective mating method and increases the quality of offspring individuals. Experimental simulation results demonstrate that, compared with other multi-objective algorithms, the proposed algorithm performs well on SP, Span, PD, and other indicators, has better convergence performance, and significantly accelerates the solution speed, improving the overall system performance by approximately 11.4%.

  • Graphics and Image Processing
    Ben HONG, Xusheng QIAN, Minglei SHEN, Jisu HU, Chen GENG, Yakang DAI, Zhiyong ZHOU
    Computer Engineering. 2023, 49(9): 234-245. https://doi.org/10.19678/j.issn.1000-3428.0065678

    Medical image registration and segmentation are important tasks in medical image analysis. The accuracy of both tasks can be improved effectively by combining them. However, existing joint registration and segmentation frameworks for single-modal images are difficult to apply to multi-modal images. To address this problem, a joint registration and segmentation framework for Computed Tomography-Magnetic Resonance(CT-MR) images is proposed, based on modality-consistent supervision and a multi-scale modality-independent neighborhood descriptor. It consists of a multi-modal image registration network and two segmentation networks. The deformation field generated by the multi-modal registration is used to establish the corresponding deformation relationship between the segmentation network results of the two modalities. A modality consistency supervision loss is constructed, which improves the accuracy of multi-modal segmentation because the two segmentation networks supervise each other. In the multi-modal image registration network, a multi-scale modality-independent neighborhood descriptor is constructed to enhance the representation of cross-modal information. The descriptor is added to the registration network as a structural loss term to constrain the local structure correspondence of multi-modal images more accurately. Experiments were performed on a dataset of 118 CT-MR multi-modal liver images. When 30% of the segmentation labels are provided, the Dice Similarity Coefficient(DSC) of liver registration reaches 94.66(±0.84)%, and the Target Registration Error(TRE) reaches 5.191(±1.342) mm. The DSC of liver segmentation reaches 94.68(±0.82)% and 94.12(±1.06)% in CT and MR images, respectively. These results are superior to those of comparable registration and segmentation methods.
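
    The Dice Similarity Coefficient reported above is defined as DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks A and B. The following is a minimal sketch of that definition on flattened masks; the function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions for this example, not details taken from the paper.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice Similarity Coefficient of two equal-length binary masks.

    Masks are flat sequences of truthy/falsy values, for example a
    flattened segmentation volume. Returns a value in [0, 1].
    """
    a = [bool(x) for x in mask_a]
    b = [bool(x) for x in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total
```

    For example, masks `[1, 1, 0, 0]` and `[1, 0, 0, 0]` share one foreground element out of three in total, so the coefficient is 2/3.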