Most accessed

  • Graphics and Image Processing
    Yang LIU, Jun CHEN, Shijia HU, Jiahua LAI
    Computer Engineering. 2023, 49(10): 247-254. https://doi.org/10.19678/j.issn.1000-3428.0065825

    In the mainstream feature-based Simultaneous Localization and Mapping(SLAM) method, feature matching is a key step in estimating camera motion. However, the local characteristics of image features cause widespread mismatch and have become a major bottleneck in visual SLAM. In addition, the sparse maps generated by the feature-based method can only be used for localization, as they do not satisfy higher-level requirements. To address the problems of low efficiency in ORB feature point matching and failure to generate dense maps in ORB-SLAM3, an improved ORB Grid-based Motion Statistics(ORB-GMS) matching strategy is proposed, whereby a dense point cloud construction thread is added to ORB-SLAM3 to realize dense mapping. The motion smoothness constraint is used for the feature point motion statistics method, and the number of matches in the feature point neighborhood and threshold are compared to efficiently determine whether the current match is correct. The gridded images are used for fast computation to perform camera pose estimation. Finally, the dense point cloud map is constructed according to the key frame and the corresponding pose, using the outlier point removal and voxel-grid filters to reduce the size of the point cloud. The experimental results on the RGB-D dataset of TUM show that compared with ORB-SLAM3, the proposed algorithm can reduce matching time by approximately 50% and average positioning error by 32%, while increasing the number of matches by an average of 60%. In addition, compared to sparse maps, this method generates dense point cloud maps that are easy for secondary processing, thereby expanding the application scenarios of the algorithm.
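
    The voxel-grid filtering step mentioned above can be illustrated with a minimal sketch. The abstract does not give the authors' implementation, so the function name `voxel_grid_filter`, the centroid rule, and the voxel size below are assumptions chosen only to show how such a filter shrinks a point cloud.

```python
from collections import defaultdict


def voxel_grid_filter(points, voxel_size):
    """Replace all points that fall in the same cubic voxel by their centroid.

    `points` is an iterable of (x, y, z) tuples; `voxel_size` is the edge
    length of each voxel (a hypothetical parameter for this sketch).
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    buckets = defaultdict(list)
    for x, y, z in points:
        # Integer voxel index along each axis; floor division also handles
        # negative coordinates correctly.
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        buckets[key].append((x, y, z))
    filtered = []
    for pts in buckets.values():
        n = len(pts)
        # Centroid of the points in this voxel.
        filtered.append(tuple(sum(axis) / n for axis in zip(*pts)))
    return filtered


# Two nearby points share a voxel and are merged; the third stays separate.
cloud = [(0.01, 0.02, 0.03), (0.04, 0.01, 0.02), (1.20, 0.50, 0.30)]
reduced = voxel_grid_filter(cloud, voxel_size=0.1)
```

    A larger voxel size merges more points and gives a smaller but coarser map.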

  • Graphics and Image Processing
    Jiaxin LI, Jin HOU, Boying SHENG, Yuhang ZHOU
    Computer Engineering. 2023, 49(9): 256-264. https://doi.org/10.19678/j.issn.1000-3428.0065935

    In remote sensing imagery, the detection of small objects poses significant challenges due to factors such as complex background, high resolution, and limited effective information. Based on YOLOv5, this study proposes an advanced approach, referred to as YOLOv5-RS, to enhance small object detection in remote sensing images. The presented approach employs a parallel mixed attention module to address issues arising from complex backgrounds and negative samples. This module optimizes the generation of a weighted feature map by substituting fully connected layers with convolutions and eliminating pooling layers. To capture the nuanced characteristics of small targets, the downsampling factor is tailored, and shallow features are incorporated during model training. At the same time, a unique feature extraction module combining convolution and Multi-Head Self-Attention (MHSA) is designed to overcome the limitations of ordinary convolution extraction by jointly representing local and global information, thereby extending the model's receptive field. The EIoU loss function is employed to optimize the regression process for both prediction and detection frames to enhance the localization capacity of small objects. The efficacy of the proposed algorithm is verified via experiments on datasets comprising small target remote sensing images. The results show that compared with YOLOv5s, the proposed algorithm has an average detection accuracy improvement of 1.5 percentage points, coupled with a 20% reduction in parameter count. In particular, the proposed algorithm's average detection accuracy of small vehicle targets increased by 3.2 percentage points. Comparative evaluations against established methodologies such as EfficientDet, YOLOX, and YOLOv7 underscore the proposed algorithm's capacity to adeptly balance the dual objectives of detection accuracy and real-time performance.
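
    For readers unfamiliar with the EIoU loss named above, the following is a minimal sketch of its commonly published form: one minus the IoU, plus penalties on the center distance, the width difference, and the height difference, each normalized by the smallest enclosing box. The abstract does not state the formula, so the box format `(x1, y1, x2, y2)` and the function name are assumptions, and both boxes are assumed to have positive area.

```python
def eiou_loss(pred, target):
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / union

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between the two box centers.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2

    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch)
    width_term = (pw - tw) ** 2 / (cw * cw)
    height_term = (ph - th) ** 2 / (ch * ch)
    return 1.0 - iou + center_term + width_term + height_term


identical = eiou_loss((0, 0, 2, 2), (0, 0, 2, 2))
shifted = eiou_loss((0, 0, 2, 2), (1, 1, 3, 3))
```

    Because width and height are penalized separately, the loss still provides a gradient when two boxes share a center but differ in shape.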

  • Research Hotspots and Reviews
    LI Kequan, CHEN Yan, LIU Jiachen, MU Xiangwei
    Computer Engineering. 2022, 48(7): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0062725
    CSCD(4)
    Most conventional object detection algorithms are based on sliding windows and hand-crafted feature extraction, and suffer from disadvantages such as high computational complexity and unsatisfactory robustness under complex conditions. Recently, deep learning has been applied to object detection, bringing significant improvements to algorithm performance. Compared with conventional object detection algorithms, deep-learning-based algorithms offer high speed, accuracy, and robustness under complex conditions. In this paper, we first describe object detection tasks in terms of their evaluation indicators, public datasets, and traditional algorithm frameworks. The existing deep-learning-based object detection algorithms are then categorized according to two criteria: whether there is an explicit region proposal and whether a priori anchor boxes are defined. We introduce the evolution of these algorithms, summarizing their mechanisms, advantages, limitations, and application scenarios. On this basis, the performance of representative algorithms on public datasets is analyzed and compared. Finally, we discuss future research directions in deep-learning-based object detection.

  • Artificial Intelligence and Pattern Recognition
    Zhiqiang WU, Qing XIE, Lin LI, Yongjian LIU
    Computer Engineering. 2024, 50(1): 91-100. https://doi.org/10.19678/j.issn.1000-3428.0066929

    Many existing Graph Neural Network(GNN) recommendation algorithms use the node number information of the user-item interaction graph for training and learn the high-order connectivity among user and item nodes to enrich their representations. However, user preferences for different modal information are ignored, modal information such as images and text of items is not utilized, and the features of different modalities are simply summed without distinguishing the user preferences for different modal information types. A multimodal fusion GNN recommendation model is proposed to address this problem. First, for a single modality, a unimodal graph network is constructed by combining the user-item interaction bipartite graph, and the user preference for this modal information is learned in the unimodal graph. A Graph Attention(GAT) network is used to aggregate the neighbor information and enrich the local node representation, and a Gated Recurrent Unit(GRU) is used to decide whether to aggregate the neighbor information to achieve a denoising effect. Finally, the user and item representations learned from each modal graph are fused by the attention mechanism to obtain the final representation and then sent to the prediction module. Experimental results on the MovieLens-20M and H&M datasets show that the multimodal information and attention fusion mechanism can effectively improve the recommendation accuracy, and that the model improves significantly on the best baseline algorithm in Precision@K, Recall@K, and NDCG@K. When K is set to 10, Precision@10, Recall@10, and NDCG@10 increase by 4.67%, 2.42%, and 2.03% on the first dataset and by 2.49%, 5.24%, and 2.05% on the second.
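
    The three indicators reported above can be computed as in the sketch below. It uses the standard binary-relevance definitions; the item identifiers and function names are illustrative only and do not come from the paper.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the list divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical ranking for one user: items "a" and "c" are hits, "e" is missed.
ranked_items = ["a", "b", "c", "d"]
relevant_items = {"a", "c", "e"}
p3 = precision_at_k(ranked_items, relevant_items, 3)
r3 = recall_at_k(ranked_items, relevant_items, 3)
n3 = ndcg_at_k(ranked_items, relevant_items, 3)
```

    In practice each metric is averaged over all test users; NDCG additionally rewards placing the hits near the top of the list.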

  • Research Hotspots and Reviews
    YANG Wenzhong, DING Tiantian, KANG Peng, BU Wenxiu
    Computer Engineering. 2023, 49(3): 1-17. https://doi.org/10.19678/j.issn.1000-3428.0064374

    Keyword extraction for public opinion events is a basic technique for public opinion monitoring. To help readers quickly understand news content, it aims to extract the core words that reflect public concerns in different events. With the development of deep learning, traditional unsupervised keyword extraction techniques and the classification models used in supervised algorithms have gradually been replaced by sequence labeling models. This paper examines the limitations of unsupervised keyword extraction, the advantages and disadvantages of classification models for keyword extraction, and the ways in which deep learning has advanced keyword extraction technology. Within the overall development of keyword extraction technology, the analysis focuses on deep learning methods such as convolutional neural networks and recurrent neural networks. Furthermore, the advantages, disadvantages, and development trends of existing methods are summarized. Although deep learning plays an important role in keyword extraction, its reliance on large-scale labeled samples, long training time, and high complexity need to be addressed in future work. To validate the analysis, replication experiments were conducted on six public opinion news datasets and two small datasets. The experimental results were consistent with the theoretical analysis. On this basis, the various keyword extraction techniques and their associated difficulties and challenges are reviewed and analyzed. Additionally, the prospects for the development of this field are discussed in view of the existing problems.

  • Artificial Intelligence and Pattern Recognition
    Qiru LI, Xia GENG
    Computer Engineering. 2023, 49(12): 111-120. https://doi.org/10.19678/j.issn.1000-3428.0066348

    The traditional Deep Q Network(DQN) algorithm solves the dimensionality problem of Q-learning in complex environments by integrating deep neural networks with reinforcement learning, and is widely used in the path planning of mobile robots. However, the traditional DQN algorithm has a low network convergence speed and poor path planning performance, and consequently, obtaining the optimal path within a small number of training rounds is challenging. To solve these problems, an improved ERDQN algorithm is proposed. The Q value is recalculated by recording the frequency of repeated states. The more times a state is repeated during network training, the lower the probability that the state occurs again. This mechanism can improve the robot's ability to explore the environment, reduce the risk of the network converging to a local optimum to a certain extent, and reduce the number of training rounds required for network convergence. The reward function is redesigned according to the moving direction of the robot and the distance between the robot and the target point. The robot obtains a positive reward when it is close to the target point and a negative reward when it is far from the target point. The absolute value of the reward is adjusted according to the current moving direction of the robot and the distance between the robot and the target point; thus, the robot can plan a better path while avoiding obstacles. The experimental results show that compared with the DQN algorithm, the average score of the ERDQN algorithm is increased by 18.9%, whereas the path length and the number of planned rounds are reduced by approximately 20.1% and 500, respectively. These results show that the ERDQN algorithm can effectively improve network convergence speed and path planning performance.

  • Development Research and Engineering Application
    Jianhao ZHAN, Lipeng GAN, Yonghui BI, Peng ZENG, Xiaochao LI
    Computer Engineering. 2023, 49(10): 280-288, 297. https://doi.org/10.19678/j.issn.1000-3428.0065152

    The multi-modality fusion method is a core technique for effectively exploring complementary features from multiple modalities to improve action recognition performance at data-, feature-, and decision-level fusion. This study mainly investigated the multimodality fusion method at the feature and decision levels through knowledge distillation, transferring feature learning from other modalities to the RGB model, including the effects of different loss functions and fusion strategies. A multi-modality distillation fusion method is proposed for action recognition, whereby knowledge distillation is performed using the MSE loss function at the feature level, KL divergence at the decision-prediction level, and a combination of the original skeleton and optical flow modalities as multi-teacher networks so that the RGB student network can simultaneously learn with better recognition accuracy. Extensive experiments show that the proposed method achieved state-of-the-art performance with 90.09%, 95.12%, 97.82%, and 81.26% accuracies on the NTU RGB+D 60, UTD-MHAD, N-UCLA, and HMDB51 datasets, respectively. The recognition accuracy on the UTD-MHAD dataset has increased by 3.49, 2.54, 3.21, and 7.34 percentage points compared to single mode RGB data, respectively.
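
    The two distillation terms described above, MSE at the feature level and KL divergence at the prediction level with several teachers, can be sketched as follows. The abstract does not specify how the terms are weighted or combined across teachers, so the averaging, the weights `alpha` and `beta`, and the softmax `temperature` are assumptions made only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_feat, student_logits, teachers,
                      alpha=1.0, beta=1.0, temperature=4.0):
    """Average feature-level MSE and prediction-level KL over all teachers.

    `teachers` is a list of (features, logits) pairs, e.g. one pair for a
    skeleton teacher and one for an optical-flow teacher (hypothetical).
    """
    student_prob = softmax(student_logits, temperature)
    feat_loss = 0.0
    pred_loss = 0.0
    for teacher_feat, teacher_logits in teachers:
        feat_loss += mse(student_feat, teacher_feat)
        teacher_prob = softmax(teacher_logits, temperature)
        pred_loss += kl_divergence(teacher_prob, student_prob)
    n = len(teachers)
    return alpha * feat_loss / n + beta * pred_loss / n


feat = [0.2, -0.1, 0.4]
logits = [2.0, 0.5, -1.0]
loss_same = distillation_loss(feat, logits, [(feat, logits), (feat, logits)])
loss_diff = distillation_loss(feat, logits, [([0.0, 0.0, 0.0], [0.0, 0.0, 3.0])])
```

    In a real training loop this term would be added to the student's ordinary classification loss, with the teacher outputs treated as constants.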

  • Graphics and Image Processing
    Bingyan ZHU, Zhihua CHEN, Bin SHENG
    Computer Engineering. 2024, 50(1): 216-223. https://doi.org/10.19678/j.issn.1000-3428.0066941

    Owing to the rapid development of remote sensing technology, remote sensing image detection technology is being used extensively in agriculture, military, national defense security, and other fields. Compared with conventional images, remote sensing images are more difficult to detect; therefore, researchers have endeavored to detect remote sensing images efficiently and accurately. To address the high calculation complexity, large-scale range variation, and scale imbalance of remote sensing images, this study proposes a perceptually enhanced Swin Transformer network, which improves the detection of remote sensing images. Exploiting the hierarchical design and shifted windows of the basic Swin Transformer, the network inserts spatial local perception blocks into each stage, thus enhancing local feature extraction with a negligible increase in computation. An area-distributed regression loss is introduced to assign larger weights to small objects for solving scale imbalance; additionally, the network is combined with an improved IoU-aware classification loss to eliminate the discrepancy between different branches and reduce the loss of classification and regression. Experimental results on the public dataset DOTA show that the proposed network yields a mean Average Precision(mAP) of 78.47% and a detection speed of 10.8 frames/s, thus demonstrating its superiority over classical object detection networks(i.e., Faster R-CNN and Mask R-CNN) and existing excellent remote sensing image detection networks. Additionally, the network performs well on all types of objects at different scales.

  • Graphics and Image Processing
    Chunbo XU, Juan YAN, Huibin YANG, Bo WANG, Han WU
    Computer Engineering. 2023, 49(8): 199-206, 214. https://doi.org/10.19678/j.issn.1000-3428.0065522

    Currently, most Visual Simultaneous Localization And Mapping(VSLAM) algorithms are designed for static scenes and do not consider dynamic objects in a scene. However, dynamic objects in an actual scene cause mismatches among the feature points of the visual odometry, which affects the positioning and mapping accuracy of the SLAM system and reduces its robustness in practical applications. For indoor dynamic environments, a VSLAM algorithm based on the ORB-SLAM3 main framework, known as RDTS-SLAM, is proposed. An improved YOLOv5 target detection and semantic segmentation network is used to accurately and rapidly segment objects in the environment. Simultaneously, the target detection results are combined with the local optical flow method to accurately identify dynamic objects, and the feature points in the dynamic object area are eliminated. Only static feature points are used for feature point matching and subsequent positioning and mapping. Experimental results on the TUM RGB-D dataset and actual environment data show that compared to the ORB-SLAM3 and RDS-SLAM algorithms, the Root Mean Square Error(RMSE) of trajectory estimation of the RDTS-SLAM algorithm on the walking_rpy sequence is reduced by 95.38% and 86.20%, respectively, which implies that it can significantly improve the robustness and accuracy of the VSLAM system in a dynamic environment.

  • Cyberspace Security
    Shuaiwei LIU, Zhi LI, Guomei WANG, Li ZHANG
    Computer Engineering. 2024, 50(2): 180-187. https://doi.org/10.19678/j.issn.1000-3428.0067077

    Adversarial attack and defense is a popular research area in computer security. Trans-GAN, an adversarial example generation algorithm that combines the Transformer with a Generative Adversarial Network(GAN), is proposed to address the poor visual quality of existing gradient-based adversarial example generation methods and the low generation efficiency of optimization-based methods. First, the algorithm utilizes the powerful visual representation capability of the Transformer as a reconstruction network for receiving clean images and generating adversarial noise. Second, the Transformer reconstruction network, acting as the generator, is combined with a deep convolutional network-based discriminator to form a GAN architecture, which improves the authenticity of the generated images and ensures the stability of training. Meanwhile, an improved attention mechanism, Targeted Self-Attention, is proposed to introduce target labels as a priori knowledge when training the network, which guides the network model to learn to generate adversarial perturbations with specific attack targets. Finally, adversarial noise is added to the clean examples using skip-connections to form adversarial examples. Experimental results demonstrate that the proposed algorithm achieves an attack success rate of more than 99.9% on both models used for the MNIST dataset and 96.36% and 98.47% on the two models used for the CIFAR10 dataset, outperforming the current state-of-the-art generative-based adversarial attack methods. The qualitative results show that compared to the Fast Gradient Sign Method(FGSM) and Projected Gradient Descent(PGD) algorithms, the adversarial noise generated by the Trans-GAN algorithm is a smaller perturbation, and the resulting adversarial examples are more natural, meet the requirements of human vision, and are not easily distinguished by the human eye.

  • Research Hotspots and Reviews
    DU Qinghua, ZHANG Kai
    Computer Engineering. 2022, 48(7): 13-21,28. https://doi.org/10.19678/j.issn.1000-3428.0064163

    To manage complex data analysis tasks, cross-platform data processing systems combining multiple platforms are being developed. The platform selection of operators in the cross-platform workflow of the system is critical to the system performance, because the implementation of operators on different platforms results in significantly different performance. Currently, cost-based optimization methods are primarily applied in cross-platform workflow optimization to achieve platform selection; however, the existing cost models cannot mine the latent information of cross-platform workflows, resulting in inaccurate cost estimation. Hence, a more efficient cross-platform workflow optimization method is proposed herein. This method uses the GAT-BiGRU-FC Network(GGFN) model as the cost model, which takes both operator and workflow features as inputs. The model uses a graph attention mechanism to capture the structure information of the Directed Acyclic Graph(DAG)-type cross-platform workflow and the information of the neighbor nodes of the operator. The gated recurrent unit is used to memorize the operation timing information of operators to achieve accurate cost estimations. Subsequently, the enumeration algorithm of the operator implementation platform is designed and implemented based on the characteristics of the cross-platform workflow. The algorithm utilizes the GGFN-based cost model and delay-greedy pruning method to perform enumeration and selects the appropriate implementation platform for each operator. Experiments show that this method can improve the execution performance of cross-platform workflows by 3x and reduce the runtime by more than 60%.

  • Research Hotspots and Reviews
    Chang WANG, Leixiao LI, Yanyan YANG
    Computer Engineering. 2023, 49(11): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0066661

    The fatigue driving detection method based on computer vision has the advantage of being noninvasive and does not affect driving behavior, making it easy to apply in practical scenarios. With the development of computer technology, an increasing number of researchers are studying fatigue driving detection methods based on computer vision. Fatigue driving behavior is mainly reflected in the face and limbs. Furthermore, in the field of computer vision, facial behavior is easier to obtain than physical behavior. Therefore, facial-feature-based fatigue driving detection methods have become an important research direction in the field of fatigue driving detection. Various fatigue driving detection methods are analyzed comprehensively based on multiple facial features of drivers, and the latest research results worldwide are summarized. The specific behaviors of drivers with different facial features under fatigue conditions are introduced, and the fatigue driving detection process is discussed based on multiple facial features. Results from research conducted worldwide are classified based on different facial features, and different feature extraction methods and state discrimination methods are classified. The parameters used to distinguish driver fatigue status are summarized based on the various behaviors generated by different features in a state of fatigue. Furthermore, current research results on the use of facial multi-feature comprehensive discrimination for fatigue driving are described, and the similarities and differences of different methods are analyzed. On this basis, the shortcomings in the current field of fatigue driving detection based on facial multi-feature fusion are discussed, and future research directions in this field are described.

  • Graphics and Image Processing
    Wenzhuo FAN, Tao WU, Junping XU, Qingqing LI, Jianlin ZHANG, Meihui LI, Yuxing WEI
    Computer Engineering. 2023, 49(9): 217-225. https://doi.org/10.19678/j.issn.1000-3428.0065689

    Traditional deep learning image super-resolution reconstruction networks extract features only at a fixed resolution and cannot integrate advanced semantic information. They are also limited to reconstructing images at specific scale factors, generalize poorly, and have an excessive number of network parameters. An arbitrary-scale image super-resolution reconstruction algorithm based on multi-resolution feature fusion, termed MFSR, is proposed. In the multi-resolution feature fusion encoding phase, a multi-resolution feature extraction module is designed to extract features at different resolutions. A dual attention module is constructed to enhance the feature extraction ability of the network. An information-rich fused feature map is obtained by fully interacting the features at different resolutions. In the image reconstruction phase, the fused feature map is decoded by a multilayer perceptron to produce a super-resolution image at any scale. In tests on the Set5 dataset with scaling factors of 2, 3, 4, 6, and 8, the Peak Signal-to-Noise Ratios (PSNR) of the proposed algorithm were 38.62, 34.70, 32.41, 28.96, and 26.62 dB, respectively. The model has 0.72×10^6 parameters, which significantly reduces the parameter count while maintaining reconstruction quality and realizing super-resolution image reconstruction at any scale. Furthermore, the model achieves better performance than mainstream algorithms such as SRCNN, VDSR, and EDSR.
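
    As a reminder of how the PSNR figures above are defined, the sketch below computes PSNR = 10·log10(MAX² / MSE) for two images given as flat lists of pixel values. The 8-bit peak value of 255 and the function name are assumptions; the abstract does not state the evaluation protocol (for example, which color channel is used).

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel off by one gray level gives an MSE of 1.
off_by_one = psnr([0, 0, 0, 0], [1, 1, 1, 1])
perfect = psnr([10, 20, 30], [10, 20, 30])
```

    Because the scale is logarithmic, each 10 dB drop corresponds to a tenfold increase in mean squared error.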

  • Cyberspace Security
    SUN Jia, ZHANG Jianhui, BU Youjun, CHEN Bo, HU Nan, WANG Fangyu
    Computer Engineering. 2022, 48(7): 151-158,167. https://doi.org/10.19678/j.issn.1000-3428.0061750

    At present, log anomaly detection faces difficulties such as large data volumes, highly concealed faults and attack threats, and the complex feature engineering required by traditional methods. The rapid development of deep learning provides new ideas for solving these problems. Here we propose a CNN-BiLSTM deep learning model that combines a Convolutional Neural Network(CNN) with Bi-LSTM. The model considers not only the significant time-series characteristics of the log keys but also the spatial location characteristics of the log parameters, and it uses a splicing mapping method for feature fusion so that the features drown each other out as little as possible. After the feasibility of the model is analyzed in terms of complexity and performance, the model is compared with CNN and Bi-LSTM on the Hadoop HDFS log dataset to verify its classification effect, reaching about 91% log anomaly detection accuracy. It also reaches 94% detection accuracy on the WC98_day Web log dataset, which verifies the good generalization ability of the CNN-BiLSTM model. Finally, ablation experiments analyze the importance of the word embedding and the fully connected layer structure in the CNN-BiLSTM model.

  • Artificial Intelligence and Pattern Recognition
    SI Yichen, GUAN Youqing
    Computer Engineering. 2022, 48(7): 66-72. https://doi.org/10.19678/j.issn.1000-3428.0061432
    CSCD(2)
    Named Entity Recognition(NER) is an important task in Natural Language Processing(NLP), and Chinese NER is often more difficult than English NER. Conventional Chinese NER models are usually based on deep neural networks that label every character in the text. Although they identify named entities according to the label sequence, such character-based labeling methods have difficulty capturing word information. To address this problem, this paper proposes a Chinese NER model based on the Transformer encoder. In the word embedding layer of the model, the word vector coding method is used in combination with a dictionary, such that the character vector contains word information. At the same time, to solve the problem that the Transformer encoder loses the relative position information of characters during attention calculation, this paper modifies the attention calculation method of the Transformer encoder and introduces a relative position coding method. Finally, a Conditional Random Field(CRF) model is introduced to obtain the optimal tag sequence. The experimental results show that the F1 value of this model reaches 94.7% on the Resume dataset and 58.2% on the Weibo dataset, which are improvements over traditional NER models based on a Bidirectional Long Short-Term Memory(BiLSTM) network and an Iterated Dilated Convolution Neural Network(ID-CNN). In addition, it achieves better recognition performance and faster convergence.

  • Research Hotspots and Reviews
    Zhe LIAN, Yanjun YIN, Fei YUN, Min ZHI
    Computer Engineering. 2024, 50(3): 16-27. https://doi.org/10.19678/j.issn.1000-3428.0067427

    Natural scene text detection technology based on deep learning has become a crucial research focus in the fields of computer vision and natural language processing. It not only has a wide range of potential applications but also serves as a new platform for researchers to explore neural network models and algorithms. First, this study introduces the relevant concepts, research background, and current developments in natural scene text detection technology. Subsequently, recent deep learning-based text detection methods are analyzed and categorized into four classes: detection-box-based methods, segmentation-based methods, methods based on both detection boxes and segmentation, and other methods. The fundamental concepts and main algorithmic processes of classical and mainstream methods within these four categories are elaborated, and the mechanisms, applicable scenarios, advantages, disadvantages, simulation experimental results, and environment settings of the different methods are summarized, while their interrelationships are clarified. Thereafter, common public datasets and performance evaluation methods for natural scene text detection are introduced. Finally, the major challenges facing current deep learning-based natural scene text detection technology are outlined, and future development directions are discussed.

  • Development Research and Engineering Application
    Xingya YAN, Yaxi KUANG, Guangrui BAI, Yue LI
    Computer Engineering. 2023, 49(7): 251-258. https://doi.org/10.19678/j.issn.1000-3428.0065369

    Student classroom behaviors can directly reflect the quality of a class, and the analysis and evaluation of classroom behaviors through artificial intelligence and big data can help improve the quality of teaching. Traditional student classroom behavior recognition methods rely on teachers' direct observation of students or an analysis of student surveillance videos after class. These methods are time-consuming and labor-intensive and have a low recognition rate, making it difficult to follow problems in the classroom and during exams in real time. This study proposes BetaPose, a posture recognition method based on deep learning. Data augmentation is used to improve the robustness of the subsequent detection model. The improved YOLOv5 target detection algorithm is used to obtain the human detection frame. Based on the MobileNetV3 model, a lightweight posture recognition model is designed to improve the accuracy of posture recognition in crowded scenes. The keypoints of the human body thus obtained are input into a linear classifier with improved modeling and expression ability to determine the final behavior results. The experimental results show that the proposed lightweight posture recognition model BetaPose achieves the highest average recognition accuracy, 82.6%, for various parts of the human body, and the recognition rates for various behaviors in simple and crowded scenes are above 91% and 85%, respectively. Therefore, the proposed model can effectively recognize multiple behaviors in the classroom.

  • Research Hotspots and Reviews
    Haoyang LI, Xiaowei HE, Bin WANG, Hao WU, Qi YOU
    Computer Engineering. 2024, 50(2): 43-50. https://doi.org/10.19678/j.issn.1000-3428.0066399

    Load prediction is an essential part of cloud computing resource management. Accurate prediction of cloud resource usage can improve cloud platform performance and prevent resource wastage. However, the dynamic and mutative use of cloud computing resources makes load prediction difficult, and managers cannot allocate resources reasonably. In addition, although Informer has achieved better results in time-series prediction, it does not impose restrictions on the causal dependence of time, causing future information leakage. Moreover, it does not consider the increase in network depth leading to model performance degradation. A multi-step load prediction model based on an improved Informer, known as Informer-DCR, is proposed. The regular convolution between attention blocks in the encoder is replaced by dilated causal convolution, such that the upper layer in the deep network can receive a wider range of input information to improve the prediction accuracy of the model, and ensure the causality of the time-series prediction process. Simultaneously, the residual connection is added to the encoder, such that the input information of the lower layer of the network is directly transmitted to the subsequent higher layer, and the deep network degradation is solved to improve the model performance. The experimental results demonstrate that compared with the mainstream prediction models such as Informer and Temporal Convolutional Network(TCN), the Mean Absolute Error(MAE) of the Informer-DCR model is reduced by 8.4%-40.0% under different prediction steps, and Informer-DCR exhibits better convergence than Informer during the training process.
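
    The dilated causal convolution mentioned above can be illustrated in one dimension. In this minimal sketch the output at time t depends only on inputs at t, t - d, t - 2d, and so on, so no future value leaks into the prediction; the weights and dilation are arbitrary illustrative values, not those of Informer-DCR.

```python
def dilated_causal_conv1d(x, weights, dilation):
    """Compute y[t] = sum_i weights[i] * x[t - i * dilation].

    Positions before the start of the sequence are treated as zeros, which
    is equivalent to left-padding the input.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:  # never look at future or out-of-range samples
                acc += w * x[idx]
        out.append(acc)
    return out


series = [1, 2, 3, 4, 5]
y = dilated_causal_conv1d(series, weights=[1, 1], dilation=2)

# Changing the last input must not change any earlier output (causality).
y_changed = dilated_causal_conv1d([1, 2, 3, 4, 99], weights=[1, 1], dilation=2)
```

    With kernel size k and dilation d, one layer covers 1 + (k - 1)·d time steps, so stacking layers with growing dilation widens the receptive field quickly without adding parameters.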

  • Graphics and Image Processing
    Xinlu JIANG, Tianen CHEN, Cong WANG, Chunjiang ZHAO
    Computer Engineering. 2024, 50(1): 232-241. https://doi.org/10.19678/j.issn.1000-3428.0067030

    Intelligent pest detection is an essential application of target detection technology in the agricultural field. It effectively improves the efficiency and reliability of pest detection and reporting and helps ensure crop yield and quality. With fixed trapping devices such as insect traps and sticky insect boards, the image background is simple, the lighting conditions are stable, and the pest features are prominent and easy to extract. Pest detection can therefore achieve high accuracy, but the application scenario is fixed, the detection range is limited to the area surrounding the equipment, and the approach cannot adapt to complex field environments. To improve the flexibility of pest detection and prediction and to address the detection difficulties and missed detections caused by complex image backgrounds and small pest sizes in field environments, a small object pest detection model called Pest-YOLOv5 is proposed. By adding a Coordinate Attention(CA) mechanism to the feature extraction network and combining spatial and channel information, the ability to extract small object pest features is enhanced. The Bidirectional Feature Pyramid Network(BiFPN) structure is used in the neck connection section, and multi-scale features are combined to alleviate the loss of small object information caused by multiple convolutions. On this basis, the SIoU and VariFocal loss functions are used to calculate losses, and the optimal classification loss weight coefficients are obtained experimentally, making the model focus more on object samples that are difficult to classify. The experimental results on a subset of the publicly available dataset, AgriPest, show that the Pest-YOLOv5 model has mAP0.5 and recall of 70.4% and 67.8%, respectively, which are superior to those of classical object detection models, such as the original YOLOv5s model, SSD, and Faster R-CNN. Compared with the YOLOv5s model, the Pest-YOLOv5 model improves the mAP0.5, mAP0.50:0.95, and recall by 8.1%, 7.9%, and 12.8%, respectively, enhancing the ability to detect targets.

  • Research Hotspots and Reviews
    Jian CAO, Yimei CHEN, Haisheng LI, Qiang CAI
    Computer Engineering. 2023, 49(10): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0065984

    Small target detection in complex road scenes can improve a vehicle's perception of the surrounding environment. Thus, it is an important research direction in computer vision and intelligent transportation. With the development of deep learning technology, combining deep learning with small target detection on roads can effectively improve detection accuracy, allowing the vehicle to respond quickly to the surrounding environment. Starting with the latest and classic research results in small target detection, this study provides two definitions of small targets and analyzes why small target detection on roads is difficult. Subsequently, five types of deep learning-based optimization methods for improving the detection accuracy of small targets on roads are expounded upon: data augmentation, multi-scale strategies, generation of Super-Resolution(SR) detail information, strengthened contextual information connection, and improved loss functions. The core ideas of the various methods and the latest research progress at home and abroad are summarized. Large public datasets commonly used in road small target detection are introduced, along with the corresponding indicators for evaluating small target detection performance. By comparing and analyzing the detection performance of the various methods on different datasets, this study presents the current state of research on road small target detection and the associated problems, and outlines future research directions from multiple perspectives.

  • Development Research and Engineering Application
    HU Xinrong, GONG Chuang, ZHANG Zili, ZHU Qiang, PENG Tao, HE Ruhan
    Computer Engineering. 2022, 48(7): 284-291. https://doi.org/10.19678/j.issn.1000-3428.0062392

    To solve the problems of rough clothing edge segmentation, unsatisfactory segmentation accuracy, and insufficient deep semantic feature extraction in clothing image segmentation, the Coordinate Attention(CA) mechanism and Semantic Feature Enhancement Module(SFEM) are embedded into the Deeplab v3+ network, which features good semantic segmentation performance, and a CA_SFEM_Deeplab v3+ network is proposed for clothing image segmentation in this study. To strengthen the learning of effective features in clothing images, the CA mechanism module is embedded into resnet101, which is the backbone network of the Deeplab v3+ network, and the feature map produced by atrous spatial pyramid pooling is input into the SFEM for feature enhancement. Consequently, the segmentation accuracy is improved. Experimental results show that the mean Intersection over Union(mIoU) and Mean Pixel Accuracy(MPA) of the CA_SFEM_Deeplab v3+ network are 0.557 and 0.671, respectively, on the DeepFashion2 dataset, which are 2.1% and 2.3% higher than those of the Deeplab v3+ network, respectively. Compared with the Deeplab v3+ network, the proposed CA_SFEM_Deeplab v3+ offers a finer segmentation of the clothing contour and better segmentation performance.

  • Development Research and Engineering Application
    Long SUN, Rongfen ZHANG, Yuhong LIU, Tingli RAO
    Computer Engineering. 2023, 49(9): 313-320. https://doi.org/10.19678/j.issn.1000-3428.0065697

    In dense crowd scenarios, dense targets under the monitoring perspective, mutual occlusion, small targets, and face perspective distortion cause problems in mask wearing detection. Meanwhile, public datasets covering incorrectly worn masks are also lacking. Therefore, this paper proposes MDDC-YOLO, a mask wearing detection algorithm for the monitoring perspective that is based on an improved YOLO-v5. In view of the large proportion of small- and medium-sized targets in dense populations, the conventional C3 module in YOLO-v5 is replaced with the MRF-C3 module, which has an atrous convolutional structure. The anti-occlusion ability of the model is also improved by using Repulsion Loss, which is based on the principle of repulsion and attraction between sample bounding boxes, and the masking positive sample is fully utilized during the training process. An Efficient Channel Attention(ECA) mechanism is further introduced for optimal selection of feature channels. Finally, to address the lack of mask wearing data for crowds from a monitoring perspective, an offline data enhancement method based on perspective transformation is proposed, and the proposed Mosaic-9 data enhancement generates additional small target samples. The experimental results show that the MDDC-YOLO algorithm improves mAP by 6.5 percentage points compared with YOLO-v5 while reaching a detection speed of 32 frame/s, which satisfies the application requirements of mask wearing detection in dense populations.

  • Graphics and Image Processing
    CUI Yunxuan, LIU Guihua, YU Dongying, GUO Zhongyuan, ZHANG Wenkai
    Computer Engineering. 2022, 48(7): 254-263. https://doi.org/10.19678/j.issn.1000-3428.0062245

    The multisensor fusion Simultaneous Localization and Mapping(SLAM) system has higher localization accuracy than the single-sensor SLAM system. However, its localization accuracy in low-texture or degraded scenes needs improvement. The Point-Line with lidar-Visual-mono-Inertial tightly coupling SLAM system(PL2VI-SLAM) is proposed. This system comprises two subsystems: the Point-Line with Visual-Inertial System(PLVIS) and the Lidar Inertial System(LIS). The PLVIS subsystem first extracts and matches the point-line features. Next, this subsystem tightly couples the inertial measurement unit with a camera to refine the position by selectively introducing keyframes through sliding windows. LIS integrates multiple constraints into the factor graph joint optimization, and its initial state can be used as the initial guess of PLVIS. The lidar odometry is achieved by scan-matching, and its point cloud depth is associated with the points and feature lines of PLVIS to provide a precise depth value for visual features. These procedures further improve positioning accuracy. Finally, the two subsystems jointly conduct loop-closure to correct the position. The experimental results for the jackal, handheld, and self-made long-corridor datasets show that compared with the VINS-MONO, LIO-SAM, and LVI-SAM systems, this system exhibits improved positioning accuracy, can satisfactorily handle low-texture and degraded scenes, and can meet the real-time requirements.

  • Development Research and Engineering Application
    Xinyi ZHANG, Fei ZHANG, Bin HAO, Lu GAO, Xiaoying REN
    Computer Engineering. 2023, 49(8): 265-274. https://doi.org/10.19678/j.issn.1000-3428.0065701

    In dense crowd scenes in public places, face mask wearing detection algorithms perform poorly because of missing information caused by target occlusion, small detection targets, and low resolution. To improve the detection accuracy and speed of the model and to reduce the hardware footprint, an improved mask wearing detection algorithm based on YOLOv5s is proposed. The conventional convolution is replaced with Ghost-Shuffle Convolution(GSConv), which combines Standard Convolution(SConv) and Depth-Wise separable Convolution(DWConv) with channel shuffling, thereby improving the network speed while maintaining accuracy. The nearest neighbor upsampling method is replaced with a lightweight universal upsampling operator to make full use of the semantic feature information. Adaptive Spatial Feature Fusion(ASFF) is added at the end of the neck layer of the improved YOLOv5s model, which allows better fusion of features at different scales and improves the network detection accuracy. In addition, adaptive image sampling is used to alleviate the problem of data imbalance, and Mosaic data enhancement is used to make full use of small targets. Experimental results show that the model achieves a mean Average Precision(mAP) of 93% on the AIZOO dataset, an improvement of 2 percentage points over the original YOLOv5 model. It achieves 97.7% detection accuracy for faces wearing masks and outperforms the YOLO series, SSD, and RetinaFace under the same conditions. It also achieves a 16.7% inference speedup on a GPU. The model weight file occupies 23.5 MB, which is suitable for real-time mask wearing detection.

  • Cyberspace Security
    Huawei SONG, Shengqi LI, Fangjie WAN, Yuping WEI
    Computer Engineering. 2024, 50(3): 166-172. https://doi.org/10.19678/j.issn.1000-3428.0067791

    Federated Learning(FL) enables collaborative training of global models without compromising data privacy. Nonetheless, in the real world this collaborative training approach faces the challenge of Non-IID data, which leads to slow model convergence and low accuracy. Many existing FL methods improve only one of the two perspectives, global model aggregation or local client update, and thus inevitably neglect the influence of the other, which reduces the quality of the global model. In this context, we introduce a hierarchical continual learning optimization method for FL, denoted as FedMas, which is based on the idea of hierarchical fusion. First, clients with similar data distributions are grouped into the same layer using the DBSCAN algorithm, and only some of the clients of a certain layer are selected for training each time, which avoids the weight differences caused by different data distributions when the server aggregates the global model. Further, because the data distribution differs from layer to layer, each client applies a continual learning solution to catastrophic forgetting during its local update, which effectively integrates the differences between the data of clients in different layers and thus ensures the performance of the global model. Experiments on the MNIST and CIFAR-10 standard datasets demonstrate that the global model test accuracy is improved by 0.3-2.2 percentage points on average compared with the FedProx, Scaffold, and FedCurv FL algorithms.

  • Artificial Intelligence and Pattern Recognition
    Huan WANG, Lijuan SONG, Fang DU
    Computer Engineering. 2023, 49(12): 88-95. https://doi.org/10.19678/j.issn.1000-3428.0066938

    Interactive tasks involving multi-modal data place high demands on the comprehensive utilization of knowledge from different modalities, leading to the emergence of multi-modal knowledge graphs. When constructing these graphs, accurately determining whether image and text entities refer to the same object is particularly important for the alignment of Chinese cross-modal entities. To address this problem, a Chinese cross-modal entity alignment method based on a multi-modal knowledge graph is proposed. Image information is introduced into the entity alignment task, and a single- and dual-stream interactive pre-trained language model, namely CCMEA, is designed for domain-specific, fine-grained images and Chinese text. Utilizing a self-supervised learning method, Text-Visual features are extracted using a Text-Visual Encoder, and fine-grained modeling is performed using cross-encoders. Finally, a contrastive learning method is employed to evaluate the degree of alignment between image and text entities. The experimental results show that the Mean Recall(MR) of the CCMEA model improved by 3.20 and 11.96 percentage points compared to that of the WukongViT-B baseline model on the MUGE and Flickr30k-CN datasets, respectively. Furthermore, the model achieved an MR of 94.3% on the self-built TEXTILE dataset. These results demonstrate that the proposed method can effectively align Chinese cross-modal entities with high accuracy in practical applications.

  • Frontiers in Computer Systems
    Yanfei FANG, Qi LIU, Enming DONG, Yanbing LI, Feng GUO, Di WANG, Wangquan HE, Fengbin QI
    Computer Engineering. 2023, 49(12): 10-24. https://doi.org/10.19678/j.issn.1000-3428.0066548

    Manycore has become the mainstream processor architecture for building High Performance Computing(HPC) supercomputer systems, providing powerful computing power for exascale supercomputers. With the increasing number of cores integrated on manycore processor chips, the competition among large numbers of cores for memory resources has become more intense. The manycore on-chip memory hierarchy is an important structure that alleviates the "memory wall" problem, helps HPC applications better exploit the computing advantages of manycore processors, and improves the performance of practical applications. Its design has a significant impact on the performance, power consumption, and area of manycore systems. It is an important part of the structural design of manycore systems and is a research interest in the industry. Owing to the differences in the development history of manycore chips, the design technology of on-chip microarchitecture, and the requirements of the application fields, the on-chip storage hierarchies of current mainstream HPC manycore processors differ. However, from the perspective of horizontal comparison and the vertical development trend of each processor, as well as from the changes in application requirements brought by the continuous integration and development of HPC, data science, and machine learning, the SPM+Cache hybrid structure would most likely become the mainstream choice for the on-chip storage hierarchy designs of manycore processors in future HPC exascale supercomputer systems. For exascale computing software and algorithms, design and optimization based on the characteristics of the manycore memory hierarchy can help HPC applications benefit from the computing advantages of manycore processors, thus effectively improving the performance of practical applications. Therefore, software and algorithm design and optimization technology for the characteristics of the manycore on-chip storage hierarchy is also a research interest in the industry. This study first partitions the on-chip memory hierarchy into multilevel Cache, SPM, and SPM+Cache hybrid structures according to their organization, and then summarizes and analyzes the advantages and disadvantages of these structures. It analyzes the current status and development trend of the memory hierarchy designs of the chips of mainstream exascale supercomputer systems, such as the international mainstream GPU, homogeneous manycore, and domestic manycore. It then summarizes the research status of software and hardware technologies related to the design and optimization of the memory hierarchy, covering manycore LLC management and cache coherence protocols, SPM management and data movement optimization, and global optimization of the SPM+Cache hybrid architecture. Finally, this study outlines future research directions for the on-chip memory hierarchy from different perspectives, such as hardware, software, and algorithm designs.

  • Graphics and Image Processing
    Fangyu FENG, Xiaoshu LUO, Zhiming MENG, Guangyu WANG
    Computer Engineering. 2023, 49(8): 190-198. https://doi.org/10.19678/j.issn.1000-3428.0065224

    Effective features are difficult to extract in facial expression recognition, and the high similarity between categories makes them easy to confuse, which leads to low recognition accuracy. To address these problems, a facial expression recognition method based on an anti-aliasing residual attention network is proposed. First, because the traditional subsampling method can easily cause the loss of discriminative expression features, an anti-aliasing residual network is constructed to improve the feature extraction ability for expression images and enhance the representation of expression features, enabling more effective global facial expression information to be extracted. At the same time, an improved channel attention mechanism and a label smoothing regularization strategy are used to enhance the attention to the local key expression regions of the face. The improved channel attention focuses on the highly discriminative expression features and suppresses the weight of non-expressive regions, so as to locate more detailed local expression regions in the global information extracted by the network. The label smoothing technique corrects the prediction probability by increasing the amount of information of the decision-making expression category, which avoids overly absolute prediction results and reduces misjudgment between similar expressions. Experimental results show that the recognition accuracies of this method on the facial expression datasets RAF-DB and FERPlus reach 88.14% and 89.31%, respectively. Compared with advanced methods such as DACT and VTFF, this method has better performance. Compared with the original residual network, the accuracy and robustness of facial expression recognition are effectively improved.
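
    Label smoothing, as referred to in this abstract, can be sketched in its standard form, in which a small share of the probability mass is moved from the true class to all classes uniformly. This is a generic illustration, not the authors' code: the smoothing factor 0.1, the class count of 7, and the example probabilities are assumptions.

```python
import math


def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Return a soft target: (1 - epsilon) on the true class plus
    epsilon spread uniformly over all classes."""
    uniform_share = epsilon / num_classes
    target = [uniform_share] * num_classes
    target[true_class] += 1.0 - epsilon
    return target


def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# Hypothetical example: 7 expression classes, true class index 2.
soft_target = smooth_labels(true_class=2, num_classes=7, epsilon=0.1)
hard_target = smooth_labels(true_class=2, num_classes=7, epsilon=0.0)

# A very confident prediction for the true class (illustrative numbers).
confident = [0.001, 0.001, 0.994, 0.001, 0.001, 0.001, 0.001]
loss_hard = cross_entropy(hard_target, confident)
loss_soft = cross_entropy(soft_target, confident)
```

    With the soft target, an extremely confident prediction is penalized more than with the one-hot target, which is the sense in which smoothing discourages overly absolute predictions.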

  • Graphics and Image Processing
    Jianwei LI, Xiaoqi LÜ, Yu GU
    Computer Engineering. 2023, 49(10): 239-246, 254. https://doi.org/10.19678/j.issn.1000-3428.0066050

    Skin cancer is one of the deadliest cancers, and it is particularly critical to accurately classify dermoscopy images. However, the lesions in existing dermoscopy images have complex shapes and the number of samples is small, which makes it difficult for existing automatic classification methods to extract image feature information; these methods also have a high error rate. To solve this problem, this paper proposes an improved ConvNeXt method and builds the SE-SimAM-ConvNeXt model. First, with ConvNeXt as the basic network, the SimAM nonparametric attention module is added to improve the network's feature extraction capability. Second, channel attention is added to the basic network to enhance the ability of ConvNeXt to mine potential key features. Finally, the Cosine Warmup mechanism is added at the beginning of training, and the cosine function value is used to attenuate the learning rate during the process, further accelerating the convergence of ConvNeXt and improving the classification ability of the model. The experimental results on the HAM10000 skin dataset show that the classification accuracy, precision, recall, and specificity of the model reach 92.9%, 85.3%, 78.0%, and 97.5%, respectively, demonstrating effective classification capability for dermoscopy images. The model therefore has significant potential for the auxiliary diagnosis of skin cancer lesions and can provide valuable assistance to dermatologists in making accurate diagnoses.
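
    A warmup phase followed by cosine decay of the learning rate, as mentioned in this abstract, is commonly written as below. The abstract does not give the schedule's exact form, so this is a sketch under assumptions: the linear shape of the warmup, the function name, and all step counts and learning rates are hypothetical.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate with linear warmup (an assumption) and cosine decay."""
    if step < warmup_steps:
        # Ramp up linearly so the first updates are small.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # The cosine term falls smoothly from 1 to -1 as progress goes 0 to 1.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical settings: 100 steps in total, 5 of them warmup.
schedule = [lr_at_step(s, total_steps=100, warmup_steps=5, base_lr=1e-3)
            for s in range(100)]
```

    The rate rises to `base_lr` by the end of warmup and then decreases monotonically toward `min_lr` as training approaches `total_steps`.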

  • Artificial Intelligence and Pattern Recognition
    Zhangjie RAN, Linfu SUN, Yisheng ZOU, Yulin MA
    Computer Engineering. 2023, 49(9): 52-59. https://doi.org/10.19678/j.issn.1000-3428.0065745

    A Knowledge Graph(KG) is composed of a large number of fact triples, which often contain many few-shot relations that rarely appear in the real world. For these few-shot relations, it is challenging to complete the missing triples in the KG, and existing few-shot Knowledge Graph Completion(KGC) models cannot effectively extract the representation of few-shot relations. To address this problem, a few-shot KGC model based on a relation learning network is proposed. Considering the relevance of the relations, neighbor aggregation encoding is performed on the reference and query triples to obtain an enhanced entity embedding representation. A structure that integrates a Transformer encoder and a Long Short-Term Memory(LSTM) neural network encodes and outputs the relation representation of triples. The semantic similarity between query and dynamic reference relations is obtained using the attention mechanism and combined with the hypothesis of the translation model, whereby the possibility that a query triple holds is comprehensively scored. The experimental results show that the model can effectively extract the fine-grained semantics of few-shot relations by integrating path-finding and context semantics. Compared with the optimal value of the evaluation metrics in baseline models, the average improvement on few-shot link prediction tasks reaches 9.5 percentage points with the proposed model.

  • Research Hotspots and Reviews
    Xingxing DONG, Jixun GAO, Xiaotong WANG, Song LI
    Computer Engineering. 2023, 49(9): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0064822

    As indispensable components of spatial relations, spatial directional relations are widely used in many fields such as urban intelligent traffic control, environmental resource detection, and disaster prevention and reduction. Spatial directional relations represent a significant and challenging issue in fields such as geographic information systems, spatial databases, artificial intelligence, and pattern recognition. This study conducts a comprehensive analysis and comparison of existing spatial directional relation expression and inference models. First, the research progress on current models for directional relations between objects in two-dimensional space is introduced in detail in terms of single and group target objects. In addition, the characteristics, advantages, and disadvantages of the current models for directional relations in three-dimensional space are analyzed from point to block. The study expounds the current research status of models for uncertain directional relations from two aspects: extended models based on those that use precise objects for directional relations, and models based on uncertainty set theory for uncertain objects. The study then discusses the advantages, drawbacks, and applicable fields of each type of model. Finally, the shortcomings of current research are explained, and future research directions for spatial directional relations are outlined in terms of automatic reasoning technology, joint representation of spatial relations, and group target objects.

  • Development Research and Engineering Application
    Shui HU
    Computer Engineering. 2023, 49(9): 303-312. https://doi.org/10.19678/j.issn.1000-3428.0067067

    Wargame deduction is an important method for cultivating modern military commanders. Introducing artificial intelligence technology into wargame deduction can simplify organizational processes and improve deduction efficiency. Owing to the complex situational information and incomplete inference information, machine learning-based intelligent wargaming often suffers from low sample efficiency in autonomous decision-making models. This paper proposes an intelligent wargame deduction decision-making method based on deep reinforcement learning. To address the efficiency of intelligent wargame deduction and combat decision-making, a baseline is introduced into the policy network to accelerate its training. Subsequently, a derivation and proof are presented, and a method for updating the parameters of the policy network after adding the baseline is proposed. The process of introducing the state-value function of the wargame deduction environment into the model is analyzed. Building on the traditional policy-value network, a Low Advantage Policy-Value Network(LAPVN) model and its training framework for wargame deduction are constructed, and the model is built using battlefield situational awareness methods. In a wargame combat experimental environment that approximately conforms to military operational rules, the traditional policy-value network and LAPVN are trained and compared. Over 400 self-game training sessions, the loss value of the LAPVN model decreases from 5.3 to 2.3, and the model converges faster than the traditional policy-value network. The KL divergence of the LAPVN model is very close to zero during the training process.
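
    The general idea of adding a baseline to a policy-gradient update can be sketched as follows: each action's log-probability gradient is weighted by the advantage, that is, the discounted return minus a state-value estimate, rather than by the raw return. This shows the textbook form only and makes no claim about the LAPVN update rule; the rewards, value estimates, and discount factor are made-up numbers.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, state_values, gamma):
    """Advantage A_t = G_t - V(s_t); V(s_t) plays the role of the baseline."""
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, state_values)]


# Hypothetical three-step episode with a single terminal reward.
episode_rewards = [0.0, 0.0, 1.0]
value_estimates = [0.25, 0.25, 0.5]  # assumed outputs of a value head
episode_returns = discounted_returns(episode_rewards, gamma=0.5)
episode_advantages = advantages(episode_rewards, value_estimates, gamma=0.5)
```

    Subtracting a baseline that depends only on the state leaves the expected policy gradient unchanged while reducing its variance, which is the usual motivation for faster training.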

  • Research Hotspots and Reviews
    Jinsheng CHEN, Wenzhen MA, Shaofeng FANG, Ziming ZOU
    Computer Engineering. 2023, 49(11): 13-23. https://doi.org/10.19678/j.issn.1000-3428.0066521

    With the construction of the Meridian Project all-sky airglow imager observation network, a large amount of raw airglow image data has been accumulated. Current atmospheric gravity wave research based on airglow observation depends heavily on manual identification, which is very time-consuming, and the quality of labeling is difficult to guarantee. Therefore, there is an urgent need for a fast and effective automatic identification method. To solve the problem of sparsely labeled samples of atmospheric gravity waves, this paper proposes an algorithm based on an improved Cycle GAN model to expand the atmospheric gravity wave airglow observation dataset, thereby greatly improving the recognition accuracy of atmospheric gravity waves while requiring only a small number of labeled samples. A new intelligent recognition algorithm for atmospheric gravity waves is also proposed by improving the backbone network and bounding box prediction of the YOLOv5s model, considering the low Signal-to-Noise Ratio(SNR) between the recognition target and background in airglow images. The experimental results show that with the augmented dataset and the improved YOLOv5s target detection algorithm, the average precision reaches 75.8% under an Intersection-over-Union(IoU) threshold of 0.5, which is 9.7 percentage points higher than that of the original model. Meanwhile, the detection speed and average recognition accuracy are superior to those of the mainstream target detection algorithms used for comparison.
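
    The IoU threshold of 0.5 quoted above determines when a predicted box counts as a correct detection. A minimal sketch of that check follows, assuming boxes in `(x1, y1, x2, y2)` corner format; the function names and example coordinates are illustrative and not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(predicted, ground_truth, threshold=0.5):
    """A prediction matches a ground-truth box if IoU reaches the threshold."""
    return iou(predicted, ground_truth) >= threshold


# Illustrative boxes only.
loose = iou((0, 0, 2, 2), (1, 1, 3, 3))   # small overlap
tight = iou((0, 0, 4, 4), (0, 0, 4, 3))   # large overlap
```

    Average precision at IoU 0.5 is then computed from the precision-recall curve obtained by ranking predictions by confidence and labeling each one with a check of this kind.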

  • Development Research and Engineering Application
    Lumeng CHEN, Yanyan CAO, Min HUANG, Xingang XIE
    Computer Engineering. 2023, 49(8): 291-301, 309. https://doi.org/10.19678/j.issn.1000-3428.0065025

    Existing image-based flame detection approaches struggle to balance real-time performance and precision and cannot accurately identify small flame targets, which makes them ineffective in applications such as extinguishing small fires. In terms of real-time detection, the YOLOv5 algorithm provides significant benefits over conventional techniques. A real-time flame detection method based on an improved YOLOv5 is proposed to increase flame detection accuracy. First, to help the model locate the flame features more accurately, a coordinate attention mechanism module is embedded in the feature extraction portion of the YOLOv5 model. This module can reduce feature redundancy without sacrificing the feature information. Second, to help the model successfully obtain flame features with a receptive field smaller than 8×8 pixels, a detection layer specifically designed for small flame targets is added to the feature fusion portion of the algorithm along with the corresponding feature extraction and feature fusion modules. Finally, to increase the model's speed of convergence and robustness to small datasets, α-CIoU is employed as a new bounding box loss function in the loss computation phase. Additionally, model pretraining and transfer learning techniques are used to initialize the weight parameters of each layer of the flame detection model to prevent vanishing gradients and enhance the training effect. According to the experimental findings, the proposed flame detection model shows an accuracy rate of 96.6%, which is 7.4 percentage points higher than that of the original YOLOv5 model. Additionally, the detection speed of this model is 68 frame/s, and its size is only 15.4 MB. While significantly improving accuracy, it also meets the requirements of firefighting robots for real-time and lightweight flame detection.

  • Research Hotspots and Reviews
    Hongpeng LI, Bo MA, Yating YANG, Lei WANG, Zhen WANG, Xiao LI
    Computer Engineering. 2023, 49(9): 23-31. https://doi.org/10.19678/j.issn.1000-3428.0066170

    Event extraction aims to recognize and extract event information from unstructured natural language texts in a structured form. Traditional methods extract events at the sentence level and rely on massive labeled data for training, which makes them unsuitable for document-level event extraction and limits their performance in low-resource scenarios. Existing research utilizes prompt learning methods to achieve document-level event extraction by filling in template slots. However, traditional prompt template slots have low accuracy in classifying argument roles, which can easily lead to errors in argument role extraction. To address the above issues, this paper proposes a document-level event extraction method based on slot semantic enhancement prompt learning. Based on the prompt learning method, the argument role semantic information in the traditional event extraction paradigm is integrated into the slots of the prompt template, providing argument type constraints for the slot prediction generation process of the model and improving the accuracy of document-level event extraction. By keeping the upstream and downstream tasks of the pretrained language model consistent, the generalization ability of the model is improved, and knowledge transfer is achieved at a lower cost to improve model performance in low-resource event extraction scenarios. Experimental results show that compared to the second-best-performing baseline method, this method achieves F1 score improvements of 2.6, 2.9, and 4.0 percentage points on an English event extraction dataset containing 59 argument types, a Chinese dataset containing 92 argument types, and a low-resource data scale, respectively.

  • Graphics and Image Processing
    ZHANG Aihan, LIU Xiang, SHI Yunyu, LIU Siqi
    Computer Engineering. 2022, 48(7): 277-283. https://doi.org/10.19678/j.issn.1000-3428.0061913
    As smartphones and 5G networks have become increasingly popular, short videos have become a medium through which people acquire knowledge in a short time. Motivated by the shortage of short video datasets in real-life scenarios and the low accuracy of short video classification, this study proposes a dual-process short video classification method integrating deep learning technology. In the main process, an A-VGG-3D network model is constructed. Then, a VGG network with an attention mechanism is used to extract features, while the optimized 3D Convolutional Neural Network(3DCNN) is used for short video classification, which can improve the continuity, balance, and robustness of short videos in the temporal dimension. In the auxiliary process, the frame difference method is used to detect shot switching and extract several frames from the short videos. Then, multi-scale face detection is performed on the extracted frames by integrating the sliding window mechanism and cascade classifier, which can further improve the short video classification accuracy. The experimental results demonstrate that the precision and recall of this method for non-plot and non-interview short videos on the UCF101 dataset and a self-built short video dataset of life scenes are 98.9% and 98.6%, respectively. Compared with the short video classification method based on a C3D network, the classification accuracy of the proposed method on the UCF101 dataset is 9.7 percentage points higher, which indicates that the proposed method is more universally accurate.
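
    The frame difference step in the auxiliary process can be sketched as follows. This is a minimal illustration only: frames are represented as flat lists of grey-level values, and the function name and the mean-absolute-difference criterion with a fixed threshold are assumptions, since the abstract does not state how the difference is scored.

```python
def shot_boundaries(frames, threshold):
    """Return indices i where frame i differs strongly from frame i - 1.

    frames: list of equally sized flat lists of grey-level pixel values.
    threshold: hypothetical cut-off on the mean absolute pixel difference.
    """
    boundaries = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        mean_abs_diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if mean_abs_diff > threshold:
            boundaries.append(i)  # a new shot is assumed to start here
    return boundaries
```

    Frames at the returned indices would then be the candidates passed on to face detection.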
  • Artificial Intelligence and Pattern Recognition
    Lu HAN, Weigang HUO, Yonghui ZHANG, Tao LIU
    Computer Engineering. 2023, 49(9): 99-108. https://doi.org/10.19678/j.issn.1000-3428.0065846

    Each subsequence of the Multivariate Time Series(MTS) contains multi-scale characteristics of different time spans, comprising information such as development process, direction, and trend. However, existing time series prediction models cannot effectively capture multi-scale features and evaluate their importance. In this study, an MTS prediction network, FFANet, is proposed based on multi-scale temporal feature fusion and a Dual-Attention Mechanism(DAM). FFANet effectively integrates multi-scale features and focuses on important parts. The parallel temporal dilation convolution layer in the multi-scale temporal feature fusion module endows the model with multiple receptive fields to extract features of temporal data at different scales and adaptively fuse them based on their importance. Using a DAM to recalibrate the fused temporal features, FFANet focuses on features that make significant contributions to prediction by assigning temporal and channel attention weights and applying them to the corresponding temporal features. The experimental results show that, compared with the AR, VARMLP, RNN-GRU, LSTNet-skip, TPA-LSTM, MTGNN, and AttnAR time series prediction models, FFANet achieves average reductions of 0.1523, 0.1200, 0.0743, 0.0354, 0.0215, 0.0121, and 0.0200, respectively, in RRSE prediction error on the Traffic, Solar Energy, and Electricity datasets.
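
    For readers unfamiliar with the RRSE metric reported above, the sketch below computes the root relative squared error in its usual form: the prediction error normalized by the error of always predicting the mean of the true values. The function name and the flat-list inputs are assumptions for illustration; the evaluation code of the paper is not shown in the abstract.

```python
import math

def rrse(actual, predicted):
    """Root relative squared error; lower is better, 1.0 matches a mean predictor."""
    mean_actual = sum(actual) / len(actual)
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    squared_spread = sum((a - mean_actual) ** 2 for a in actual)
    return math.sqrt(squared_error / squared_spread)
```

    Because the metric is scale-free, reductions such as those listed above are comparable across datasets with different value ranges.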

  • Research Hotspots and Reviews
    SHEN Jiquan, LIN Shuai, LI Zhiying
    Computer Engineering. 2022, 48(7): 22-28. https://doi.org/10.19678/j.issn.1000-3428.0062160
    Measures of user influence are at the core of the influence maximization problem. As these measures relate to network topology, they may be classified as global indicators or local indicators. Global indicators rely on the complete network topology to calculate the influence of nodes, and incur a high time complexity. In contrast, local indicators generally ignore or weaken the self-loop and multilateral phenomena in the network, resulting in an incomplete measurement of the influence of nodes, which affects the final dissemination range of information. Using the principle of three degrees of separation, an influence maximization algorithm based on local domain is proposed in this study. This algorithm first constructs a generated graph by making full use of the edges, starting at and ending with the same interconnected nodes. Then, it generates a local domain for each node and approximates each node's global influence, calculated by the topology of its local domain. Finally, the algorithm conducts seed selection by considering the influence of each node, as well as the influence overlap ratio factor of a seed set. Experimental results on real datasets show that, compared with algorithms such as MaxDegree and PageRank, the proposed algorithm can effectively identify high-influence node groups, improve the scope of information dissemination, and incur lower time complexity.
  • Graphics and Image Processing
    Hong ZHAO, Yubo FENG
    Computer Engineering. 2023, 49(12): 194-204. https://doi.org/10.19678/j.issn.1000-3428.0066520

    In tasks involving traffic sign detection, the YOLOv5 detection algorithm encounters several issues including missed detections, erroneous detections, and a complex model in complex environments and road conditions. To address these challenges, an improved CGS-Ghost YOLO detection model is proposed. YOLOv5 uses the focus module for sampling, which introduces more parameters. In this study, the StemBlock module is used to replace the focus module for sampling after input, which can reduce the number of parameters while maintaining the accuracy. CGS-Ghost YOLO uses a Coordinate Attention(CA) mechanism, which improves the semantic and location information within the features and enhances the feature extraction ability of the model. Additionally, a CGS convolution module, which combines the SMU activation function with GroupNorm(GN) normalization, is proposed. The CGS convolution module is designed to avoid the influence of the batch size on the model during training and improve model performance. This study aims to use GhostConv to reduce the number of model parameters and effectively improve the detection accuracy of the model. The loss function, α-CIoU Loss+VFocal Loss, is used to solve the problem of unbalanced positive and negative samples in traffic sign detection tasks and improve the overall performance of the model. The neck part uses a Bi-FPN bidirectional feature pyramid network, ensuring that the multi-scale features of the detection target are effectively fused. The results of an experiment on the TT100K traffic sign detection dataset show that the detection accuracy of the improved CGS-Ghost YOLO model reaches 93.1%, which is 11.3 percentage points higher than the accuracy achieved by the original model. Additionally, the proposed network model reduces the model parameter quantity by 21.2 percentage points compared to the original model. In summary, the network model proposed in this study optimizes the convolution layer and the downsampling part, thus considerably reducing the model parameters while enhancing the model detection accuracy.

  • Development Research and Engineering Application
    SUN Wei, CHANG Pengshuai, DAI Liang, ZHANG Xiaorui, CHEN Xuan, DAI Guangzhao
    Computer Engineering. 2022, 48(7): 300-306. https://doi.org/10.19678/j.issn.1000-3428.0062096
    Vehicle type recognition plays an important role in intelligent transportation systems. Owing to the lack of vehicle data and small differences between vehicle classes, traditional vehicle type recognition methods do not make full use of the features of the vehicle discriminant area, resulting in a reduction in recognition accuracy. This study proposes a vehicle type recognition method based on attention-guided data augmentation. In this method, ResNet-50 is used as the backbone network to extract vehicle features. Simultaneously, a Coordinate Attention(CA) module is embedded behind each residual block of the network to encode a pair of direction-aware and position-sensitive attention maps to enhance the feature representation of the vehicle discriminant area. On this basis, the bilinear attention-gathering operation is used to effectively obtain the enhanced feature map. Through attention cropping and erasure of the enhanced feature map, enhanced data with strong discrimination are obtained. The results on the Stanford Cars vehicle dataset verify the effectiveness of this method, whose vehicle type recognition accuracy reaches 94.86%. Compared with RA-CNN, MA-CNN, WS-DAN+Inception-v3, and other methods, it can effectively improve the accuracy of vehicle type recognition and the efficiency of data augmentation.
  • Cyberspace Security
    JIN Haibo, ZHAO Xinyue
    Computer Engineering. 2022, 48(7): 130-140. https://doi.org/10.19678/j.issn.1000-3428.0063144
    Intrusion detection algorithms are widely used in the field of network security. However, existing intrusion detection algorithms based on machine learning only output prediction result labels for data and lack evaluation mechanisms for the confidence value of prediction results, making it difficult to ensure the reliability of results. This study proposes a high-reliability intrusion detection algorithm based on Conformal Prediction(CP). CP is integrated into a traditional machine learning algorithm to obtain data classification labels and corresponding confidence values to improve the reliability of network data classification. After the network data are digitized, standardized, and reduced in dimensionality according to the characteristics of traditional machine learning algorithms, a nonconformity score calculation formula under the CP framework is designed and a smoothing factor is introduced to improve the calculation of the p-value. The improved p-value calculation formula can calculate the p-value of prediction results smoothly and improve the overall stability of the algorithm. Experimental results demonstrate that, compared to the SVM, DT, and DT-SVM algorithms alone, the classification accuracy of the proposed algorithm on the KDD CUP99 dataset is improved by 11.1, 4.6, and 3.7 percentage points, respectively, and that on the AWID dataset is improved by 4.0, 2.5, and 1.3 percentage points, respectively, which ensures the high reliability of intrusion detection results.
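
    To make the role of a smoothing factor in the p-value concrete, the sketch below shows the standard smoothed conformal p-value, in which ties between the test score and the calibration scores are weighted by a factor τ in [0, 1]. This is the textbook form and is only assumed to resemble the formula designed in the paper; the function and parameter names are hypothetical.

```python
def smoothed_p_value(calibration_scores, test_score, tau):
    """Smoothed conformal p-value for one candidate label.

    calibration_scores: nonconformity scores of previously seen examples.
    test_score: nonconformity score of the new example under the candidate label.
    tau: smoothing factor in [0, 1]; usually drawn uniformly at random.
    """
    greater = sum(1 for s in calibration_scores if s > test_score)
    # The test example always ties with itself, hence the + 1.
    equal = sum(1 for s in calibration_scores if s == test_score) + 1
    return (greater + tau * equal) / (len(calibration_scores) + 1)
```

    Setting `tau=1` recovers the unsmoothed p-value, so the smoothing only changes how ties are counted.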
  • Research Hotspots and Reviews
    Jinshuo LIU, Daichen WANG, Juan DENG, Lina WANG
    Computer Engineering. 2023, 49(8): 13-19, 28. https://doi.org/10.19678/j.issn.1000-3428.0067003

    Currently, most existing methods for classifying harmful information on the Internet overlook imbalanced data and long-tailed distributions, biasing the model towards more numerous data samples during classification. This makes them unable to effectively identify classes with few samples, which results in a decrease in overall recognition accuracy. To address this issue, a classification method, LTIC, for long-tailed harmful information datasets is proposed. By integrating few-shot learning with knowledge transfer strategies, the BERT model is used to learn the weights of the head class. The prototype of the head class is obtained through a Prototyper network specifically designed for few-shot learning. This design allows for the processing of head and tail data separately, thereby avoiding the data imbalance caused by mutual training. The mapping relationship learned from the prototype is then used to convert the prototype of the tail class into weights. Subsequently, the head and tail class weights are combined to obtain the final classification result. In experiments, the LTIC method achieves classification accuracies of 82.7% and 83.5% on the Twitter and THUCNews datasets, respectively. This method also significantly improves the F1 value compared to the non-long-tailed model, thus effectively improving classification accuracy. When compared with the latest classification methods such as BNN and OLTR, this method exhibits superior classification performance on long-tailed datasets, with an average accuracy improvement of 3%. When new categories of harmful information emerge, the LTIC method demonstrates the capability to predict them with minimal computation, achieving an accuracy of 70% and showcasing impressive scalability.

  • Graphics and Image Processing
    Xieliu YANG, Guowen MEN, Wenfeng LIANG, Dan WANG, Zhengyi XIE, Huijie FAN
    Computer Engineering. 2023, 49(11): 247-256. https://doi.org/10.19678/j.issn.1000-3428.0066610

    Due to the particularity of the underwater environment, underwater optical images often suffer from degradation issues such as color cast, blur, and low contrast. To restore these underwater images to their natural and clear colors, numerous methods for enhancing and restoring underwater images have been proposed. However, the existing underwater image enhancement and restoration techniques primarily focus on improving the visual quality of underwater images. Their impact on the accuracy of underwater object detection using deep learning methods remains uncertain. Therefore, this study conducts a detailed and comprehensive exploration of the influence of fourteen typical underwater image enhancement and restoration methods on the accuracy of three common deep learning-based object detection models. The analysis includes the URPC2018 and URPC2019 datasets and considers factors such as domain variations between training and testing sets, the number of domains in the training set, the quantity of images in the training set, and self-created datasets for cross-dataset testing. The experimental results show that when both the training and test sets belong to the same dataset, underwater image enhancement and restoration methods, whether used as image preprocessing methods or data augmentation methods, do not significantly improve the accuracy of deep learning-based object detection. However, when detecting across datasets, using underwater image enhancement and restoration methods can significantly enhance the accuracy of deep learning-based object detection, with mAP increasing by up to 13.6 percentage points.

  • Research Hotspots and Reviews
    YE Mao, MA Jie, WANG Qian, WU Lin
    Computer Engineering. 2022, 48(7): 42-50. https://doi.org/10.19678/j.issn.1000-3428.0062231
    Standardized usage of face masks is effective as a non-pharmaceutical intervention to prevent the spread of infectious respiratory diseases, such as COVID-19 and influenza. In the current epidemic situation, wearing face masks correctly is especially important. Most existing mask-wearing detection algorithms involve problems such as complex structures, high training difficulty, and insufficient feature extraction. Therefore, this study proposes a lightweight mask-wearing detection algorithm based on multi-scale feature fusion and the YOLOv4-Tiny network, called L-MFFN-YOLO. L-MFFN-YOLO improves on the original residual structure and uses a lightweight residual module to promote rapid convergence. Moreover, it reduces the computational load while ensuring detection accuracy. Based on the original network's 13×13 and 26×26 feature maps, 52×52 feature branches are added to enhance the ability of the lower feature layer to express information and reduce the false negative rate for small targets. On this basis, a Multi-level Cross Fusion(MCF) structure is used to maximally extract useful information so as to improve feature utilization. In addition to detecting mask-wearing, a category of masks worn incorrectly is added to the dataset and manually labeled. The experimental results show that the size of the proposed L-MFFN-YOLO model is only 5.8 MB, which is 76% smaller than that of the original YOLOv4-Tiny. Moreover, the mean Average Precision(mAP) of the proposed approach is 5.25 percentage points higher, and its processing time is 14 ms faster on an equivalent CPU. These results demonstrate that the proposed approach can meet the requirements of accuracy and real-time operation in resource-constrained devices to detect faces wearing masks.
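
    The relationship between the 13×13, 26×26, and 52×52 feature maps can be illustrated with a short calculation. The sketch assumes a 416×416 network input and downsampling strides of 32, 16, and 8, which are common YOLO settings but are not stated in the abstract; the function name is hypothetical.

```python
def feature_map_sizes(input_size, strides):
    """Side length of each detection grid for a square input image."""
    return [input_size // stride for stride in strides]

# Assumed input resolution and strides (not given in the abstract).
sizes = feature_map_sizes(416, [32, 16, 8])
```

    Under these assumptions each cell of the added 52×52 grid covers an 8×8 pixel patch, compared with 32×32 pixels for the 13×13 grid, which is why the extra branch helps with small targets.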
  • Research Hotspots and Reviews
    Bin YANG, Yitong WANG
    Computer Engineering. 2023, 49(10): 13-21. https://doi.org/10.19678/j.issn.1000-3428.0065807

    A Heterogeneous Information Network(HIN) typically contains different types of nodes and interactions. Richer semantic information and complex relationships have posed significant challenges to current representation learning in HINs. Although most existing approaches use predefined meta-paths to capture heterogeneous semantic and structural information, they suffer from high cost and low coverage. In addition, most existing methods cannot precisely and effectively capture and learn influential high-order neighbor nodes. Accordingly, this study attempts to address the problems of meta-paths and influential high-order neighbor nodes with a proposed original HIN-HG model. HIN-HG generates a hyperadjacency graph of the HIN, precisely and effectively capturing the influential neighbor nodes of the target nodes. Then, convolutional neural networks are adopted with a multichannel mechanism to aggregate different types of neighbor nodes under different relationships. HIN-HG can automatically learn the weights of different neighbor nodes and meta-paths without manually specifying them. Meanwhile, nodes similar to the target node can be captured in the entire graph as higher-order neighbor nodes and the representation of the target node can be effectively updated through information propagation. The experimental results on three real datasets (DBLP, ACM, and IMDB) demonstrate the improved performance of HIN-HG compared with state-of-the-art methods in HIN representation learning, including HAN, GTN, and HGSL. HIN-HG improves the accuracy of node classification by 5.6 and 5.7 percentage points on average in the multi-class evaluation indices Macro-F1 and Micro-F1, respectively, thus improving the accuracy and effectiveness of node classification.

  • Graphics and Image Processing
    Xianguo LI, Bin LI
    Computer Engineering. 2023, 49(9): 226-233, 245. https://doi.org/10.19678/j.issn.1000-3428.0065513

    A Convolutional Neural Network(CNN) applied alone to image deblurring is limited by its restricted receptive field. The Transformer can effectively mitigate this limitation. However, its computational complexity increases quadratically as the spatial resolution of the input image increases. Therefore, this study proposes an image deblurring network based on Transformer and multi-scale CNN called T-MIMO-UNet. The multi-scale CNN is used to extract spatial features while the global modeling ability of the Transformer is employed to capture remote pixel information. The local enhanced Transformer module, local Multi-Head Self-Attention(MHSA) computing network, and Enhanced Feed-Forward Network(EFFN) are designed. The block-by-block MHSA computation is performed using a windowing approach. The information interaction between different windows is enhanced by increasing the depth of the separable convolution layer. The results of the experiment conducted using the GoPro test dataset demonstrate that the Peak Signal-to-Noise Ratio(PSNR) of T-MIMO-UNet increases by 0.39 dB, 2.89 dB, 3.42 dB, and 1.86 dB compared to the MIMO-UNet, DeepDeblur, DeblurGAN, and SRN networks, respectively. Additionally, the number of parameters is reduced by half compared to MPRNet. These findings indicate that T-MIMO-UNet effectively addresses the challenge of image blurring in dynamic scenes.

  • Evolutionary and Swarm Intelligence Algorithm and Application
    Rong FEI, Mengyang MA, Xiao ZHANG, Xinhong HEI, Qingzheng XU, Yuan QIU
    Computer Engineering. 2023, 49(7): 10-20. https://doi.org/10.19678/j.issn.1000-3428.0066975

    In autonomous driving, trajectory prediction and collision detection are the key technologies that can improve the perception ability of the autonomous driving system for the surrounding environment and ensure driving safety. The Conv-LSTM model displays good trajectory prediction ability, effectively processing trajectory data with spatio-temporal correlation. However, the predictive ability of this model is relatively weak in complex situations, such as traffic congestion and complex roads. Therefore, this study proposes a trajectory prediction model for driving intention identification based on Long Short-Term Memory(LSTM) network. The trajectory prediction model is constructed based on Conv-LSTM and uses the identified driving intention information to predict future trajectories, improving the accuracy and interpretability of trajectory prediction. In addition, two attention mechanisms are introduced to analyze the importance of the historical trajectory information of the target object and surrounding vehicles, which enables the model to focus on the most representative neighboring vehicles to better capture the relationships between different time steps. Furthermore, a collision detection algorithm based on hybrid bounding box is proposed. In this algorithm, collision is pre-judged based on the proposed minimum safe distance and maximum collision distance to avoid creation of an oriented bounding box during collision detection in non-conflict situations, thus improving the efficiency of collision detection while ensuring detection accuracy. The NGSIM dataset is used for model performance verification and the results show that the Root Mean Square Error(RMSE) of the proposed model is lower than that of Conv-LSTM, sys-Conv, and other models, indicating that the trajectory prediction accuracy of the proposed model is higher. Compared with the Oriented Bounding Box(OBB), Axis-Aligned Bounding Box(AABB), and AABB-OBB algorithms, the average collision detection time is reduced by 64.47%, 53.88%, and 55.47%, respectively, using the proposed algorithm based on hybrid bounding box.
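
    The idea of pre-judging a collision from distances before building an oriented bounding box can be sketched as below. The two thresholds are assumptions chosen for this example (sum of circumscribed-circle radii as the far bound, sum of inscribed-circle radii as the near bound) and may differ from the minimum safe distance and maximum collision distance defined in the paper; all names are hypothetical.

```python
import math

def corners(box):
    """box = (cx, cy, half_length, half_width, heading_radians)."""
    cx, cy, hl, hw, heading = box
    c, s = math.cos(heading), math.sin(heading)
    offsets = ((hl, hw), (-hl, hw), (-hl, -hw), (hl, -hw))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]

def obb_overlap(a, b):
    """Separating axis test for two oriented rectangles."""
    ca, cb = corners(a), corners(b)
    for heading in (a[4], b[4]):
        c, s = math.cos(heading), math.sin(heading)
        for ax, ay in ((c, s), (-s, c)):
            pa = [x * ax + y * ay for x, y in ca]
            pb = [x * ax + y * ay for x, y in cb]
            if max(pa) < min(pb) or max(pb) < min(pa):
                return False  # a separating axis exists
    return True

def hybrid_collision(a, b):
    """Cheap distance pre-check first; the exact OBB test only when undecided."""
    dist = math.hypot(a[0] - b[0], a[1] - b[1])
    far = math.hypot(a[2], a[3]) + math.hypot(b[2], b[3])  # circumscribed radii
    near = min(a[2], a[3]) + min(b[2], b[3])                # inscribed radii
    if dist > far:
        return False  # too far apart to touch
    if dist < near:
        return True   # inscribed circles already overlap
    return obb_overlap(a, b)
```

    In free-flowing traffic most vehicle pairs fall into the first branch, so the corner computation and axis projections are skipped for them.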

  • Computer Engineering. 2022, 48(5): 0-0.
  • Graphics and Image Processing
    LI Haomin, LI Guangping
    Computer Engineering. 2022, 48(7): 247-253. https://doi.org/10.19678/j.issn.1000-3428.0062364
    Many deep learning-based image super-resolution reconstruction algorithms improve the overall feature expression ability of a network by extending the depth of the network. However, excessively extending the depth of the network causes the model to be over-parameterized and complicated. Furthermore, redundant parameters increase the instability of feature expression. To address this issue, this paper builds on the LTH pruning algorithm, changing the weight parameters and adopting a balanced learning strategy, and proposes an unstructured neural network pruning algorithm suited to image super-resolution reconstruction tasks, called the RLTH pruning algorithm. Without changing the network structure or increasing the computational complexity, the overall feature expression ability of the network is improved by searching for an optimal yet sparse sub-network of the original network, which excludes the influence of redundant parameters and maximizes the ability to capture fine-grained and richer features with limited parameters. The experimental results on the Set5, Set14, and BSD100 test sets show that, compared with the original network model and the LTH pruning algorithm, the PSNR and SSIM of the reconstructed images obtained by the RLTH algorithm are improved, and the images have richer detail features and clearer overall and local contours.
  • Graphics and Image Processing
    Wenshun SHENG, Xiongfeng YU, Jiayan LIN, Xin CHEN
    Computer Engineering. 2024, 50(1): 242-250. https://doi.org/10.19678/j.issn.1000-3428.0066724

    A modified Faster R-CNN algorithm that combines the CBAM mechanism and a feature pyramid structure is proposed to address the problem of poor detection ability for small-scale objects and occluded or truncated objects. To focus on the efficient use of local information in feature images, the CBAM mechanism is integrated into the feature extraction network to reduce the interference of invalid targets and improve the detection ability for occluded or truncated objects. A Feature Pyramid Network(FPN) structure is introduced to connect high- and low-level feature data, obtaining high-resolution and strong semantic data, thereby enhancing the detection effect of small objects. To alleviate the phenomenon of gradient vanishing and reduce the scale of hyperparameters, the commonly used VGG16 network is replaced with VS-ResNet, an inverse residual network with strong expressive ability. VS-ResNet modifies some hierarchical structures based on the original ResNet 50, adds auxiliary classifiers, and adopts inverse residual and group convolution methods, such that the activation function information is fully preserved in high-dimensional environments and detection accuracy is improved. A method that resets the candidate box scores is used to compensate for the defect of the Non-Maximum Suppression(NMS) algorithm in mistakenly eliminating overlapping detection boxes. The experimental results demonstrate that, compared to VGG16, VS-ResNet achieves a 2.97 percentage point improvement in accuracy on the CIFAR-10 dataset. The target detection mAP value of the proposed algorithm on the Pascal VOC 2012 dataset is 76.2%, which is 13.9 percentage points higher than that of the original Faster R-CNN algorithm.
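
    The idea of resetting candidate box scores instead of discarding overlapping boxes can be illustrated with the linear Soft-NMS rule, used here as a stand-in because the abstract does not give the exact scoring formula. Box layout (x1, y1, x2, y2), thresholds, and function names are assumptions for this sketch.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, iou_threshold=0.5, score_threshold=0.001):
    """Return (index, score) pairs in the order the boxes are selected."""
    scores = list(scores)  # work on a copy
    remaining = list(range(len(boxes)))
    kept = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_threshold:
            break  # every box still left scores even lower
        kept.append((best, scores[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            if overlap > iou_threshold:
                # Decay the score rather than deleting the box outright.
                scores[i] *= 1.0 - overlap
    return kept
```

    A heavily overlapped box therefore survives with a reduced score, which keeps genuinely distinct but adjacent objects from being suppressed.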