
Most Downloaded

  • Research Hotspots and Reviews
    Chang WANG, Leixiao LI, Yanyan YANG
    Computer Engineering. 2023, 49(11): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0066661

    Fatigue driving detection based on computer vision is noninvasive and does not interfere with driving behavior, making it easy to apply in practical scenarios. With the development of computer technology, an increasing number of researchers are studying computer vision-based fatigue driving detection. Fatigue is mainly reflected in the face and limbs, and in computer vision facial behavior is easier to capture than body behavior; facial-feature-based methods have therefore become an important research direction in fatigue driving detection. This survey comprehensively analyzes fatigue driving detection methods based on multiple facial features of drivers and summarizes the latest research results worldwide. It introduces the specific behaviors of different facial features under fatigue and discusses the detection process based on multiple facial features. Research results worldwide are classified by facial feature, along with the corresponding feature extraction and state discrimination methods. The parameters used to distinguish driver fatigue status are summarized according to the behaviors that each feature exhibits in a fatigued state. Current results on fatigue detection using comprehensive discrimination over multiple facial features are then described, and the similarities and differences among methods are analyzed. On this basis, the shortcomings of current facial multi-feature fusion-based fatigue driving detection are discussed, and future research directions in this field are outlined.

  • Evolutionary and Swarm Intelligence Algorithm and Application
    Jing MEI, Longbao DAI, Zhao TONG, Xin DENG, Jiake WANG
    Computer Engineering. 2023, 49(7): 34-46. https://doi.org/10.19678/j.issn.1000-3428.0067064

    In Mobile Edge Computing (MEC), computing, storage, network resources, and services are pushed to the edge of the network, allowing users to process tasks on edge servers. In practical applications, however, the amount of task offloading requested by users and the state of the wireless channels change constantly over time, which complicates task offloading and resource allocation under limited computing and communication resources. To address this issue, a system utility optimization model is constructed for dynamic environments under resource constraints. In this model, each terminal device is equipped with an energy harvesting unit, making full use of external renewable energy to support task processing. The dependence of the optimization model on uncertain information is eliminated using Lyapunov optimization theory, whereby the original objective that relies on unknown future information is transformed into an optimization problem that depends only on the information of the current time slot. This problem is proved to be NP-hard by reduction. Subsequently, a Deep Reinforcement Learning (DRL) method is used to design an adaptive offloading algorithm under resource constraints, LyUO. When the system dynamics are unknown, the algorithm produces near-optimal task offloading decisions and MEC server communication/computing resource allocation strategies in real time. Simulation results show that the LyUO algorithm keeps the task queues of all devices in the system stable even when the task arrival rate and wireless channel state change, and improves system utility by about 15% over the benchmark algorithm while satisfying the long-term queue constraints.
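
    The per-slot transformation described above can be illustrated with a minimal drift-plus-penalty sketch: at each slot a decision is chosen using only current-slot queues, arrivals, and costs, with no knowledge of future channel states. This is an illustration of the general Lyapunov technique, not the paper's LyUO algorithm; the function names and the candidate-enumeration interface are assumptions.

```python
import numpy as np

def per_slot_decision(queue, arrivals, candidates, V):
    """Pick the decision minimizing the drift-plus-penalty bound for the
    current slot only (no future channel/arrival knowledge is needed).

    queue      : current task-queue backlogs, shape (n_devices,)
    arrivals   : tasks arriving in this slot, shape (n_devices,)
    candidates : list of (served, cost) pairs, where `served` is the work
                 each device would drain under that decision and `cost`
                 is the negative system utility of that decision
    V          : trade-off weight between utility and queue stability
    """
    best, best_val = None, np.inf
    for served, cost in candidates:
        drift = np.sum(queue * (arrivals - served))   # queue-drift term
        val = V * cost + drift                        # drift-plus-penalty
        if val < best_val:
            best, best_val = (served, cost), val
    served, _ = best
    new_queue = np.maximum(queue - served, 0.0) + arrivals  # queue update
    return best, new_queue
```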

  • Artificial Intelligence and Pattern Recognition
    Zhiqiang WU, Qing XIE, Lin LI, Yongjian LIU
    Computer Engineering. 2024, 50(1): 91-100. https://doi.org/10.19678/j.issn.1000-3428.0066929

    Many existing Graph Neural Network (GNN) recommendation algorithms train on the node numbering information of the user-item interaction graph and learn the high-order connectivity among user and item nodes to enrich their representations. However, user preferences for different modalities are ignored, modal information such as item images and text is not utilized, and different modal features are fused by simple summation without distinguishing users' preferences for each modality. A multimodal fusion GNN recommendation model is proposed to address this problem. First, for each single modality, a unimodal graph network is constructed by combining the modality with the user-item interaction bipartite graph, and the user preference for that modality is learned within the unimodal graph. A Graph ATtention (GAT) network is used to aggregate neighbor information and enrich the local node representations, and a Gated Recurrent Unit (GRU) decides whether to aggregate the neighbor information, which provides a denoising effect. Finally, the user and item representations learned from each modal graph are fused by an attention mechanism to obtain the final representations, which are fed to the prediction module. Experimental results on the MovieLens-20M and H&M datasets show that the multimodal information and the attention fusion mechanism effectively improve recommendation accuracy, and the model significantly outperforms the best baseline algorithm on Precision@K, Recall@K, and NDCG@K. With K set to 10, Precision@10, Recall@10, and NDCG@10 increase by 4.67%, 2.42%, and 2.03% on one dataset and by 2.49%, 5.24%, and 2.05% on the other.
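
    The attention-based fusion step can be sketched as follows: per-modality embeddings are scored, softmax-normalized across modalities, and summed, so a user's preference for a modality scales its contribution. This is a hedged illustration; the module name, dimensions, and scoring network are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ModalityAttentionFusion(nn.Module):
    """Fuse per-modality node embeddings with learned attention weights."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                   nn.Linear(dim, 1, bias=False))

    def forward(self, modal_embs):            # modal_embs: (M, N, dim)
        weights = torch.softmax(self.score(modal_embs), dim=0)  # (M, N, 1)
        return (weights * modal_embs).sum(dim=0)                # (N, dim)

# toy usage: 3 modalities (interaction, image, text), 5 nodes, dim 8
fused = ModalityAttentionFusion(8)(torch.randn(3, 5, 8))
```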

  • Graphics and Image Processing
    Jiaxin LI, Jin HOU, Boying SHENG, Yuhang ZHOU
    Computer Engineering. 2023, 49(9): 256-264. https://doi.org/10.19678/j.issn.1000-3428.0065935

    In remote sensing imagery, detecting small objects is challenging owing to factors such as complex backgrounds, high resolution, and limited effective information. Based on YOLOv5, this study proposes an improved approach, YOLOv5-RS, to enhance small object detection in remote sensing images. The approach employs a parallel mixed attention module to address the issues arising from complex backgrounds and negative samples; this module optimizes the generation of the weighted feature map by substituting fully connected layers with convolutions and eliminating pooling layers. To capture the nuanced characteristics of small targets, the downsampling factor is tailored, and shallow features are incorporated during model training. At the same time, a feature extraction module combining convolution and Multi-Head Self-Attention (MHSA) is designed to overcome the limitations of ordinary convolution by jointly representing local and global information, thereby extending the model's receptive field. The EIoU loss function is employed to optimize the regression between predicted and ground-truth boxes and enhance the localization of small objects. The efficacy of the proposed algorithm is verified via experiments on small-target remote sensing image datasets. The results show that, compared with YOLOv5s, the proposed algorithm improves average detection accuracy by 1.5 percentage points while reducing the parameter count by 20%. In particular, its average detection accuracy on small vehicle targets increases by 3.2 percentage points. Comparative evaluations against established methods such as EfficientDet, YOLOx, and YOLOv7 show that the proposed algorithm achieves a good balance between detection accuracy and real-time performance.
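
    The EIoU loss mentioned above augments the IoU term with a center-distance penalty and separate width and height penalties. The sketch below follows the commonly cited EIoU formulation for (x1, y1, x2, y2) boxes; it is not code from the paper.

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """EIoU loss for axis-aligned boxes given as (x1, y1, x2, y2) rows."""
    ix1 = torch.max(pred[:, 0], target[:, 0]); iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2]); iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # smallest enclosing box, center distance, width/height differences
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    rho2 = ((pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) ** 2 +
            (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) ** 2) / 4
    dw = (pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])
    dh = (pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])

    return (1 - iou
            + rho2 / (cw ** 2 + ch ** 2 + eps)
            + dw ** 2 / (cw ** 2 + eps)
            + dh ** 2 / (ch ** 2 + eps)).mean()
```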

  • Graphics and Image Processing
    Yang LIU, Jun CHEN, Shijia HU, Jiahua LAI
    Computer Engineering. 2023, 49(10): 247-254. https://doi.org/10.19678/j.issn.1000-3428.0065825

    In mainstream feature-based Simultaneous Localization and Mapping (SLAM) methods, feature matching is a key step in estimating camera motion. However, the local nature of image features leads to widespread mismatches, which have become a major bottleneck in visual SLAM. In addition, the sparse maps generated by feature-based methods can only be used for localization and do not satisfy higher-level requirements. To address the low efficiency of ORB feature matching and the lack of dense mapping in ORB-SLAM3, an improved ORB Grid-based Motion Statistics (ORB-GMS) matching strategy is proposed, and a dense point cloud construction thread is added to ORB-SLAM3 to realize dense mapping. The motion smoothness constraint is exploited through motion statistics on feature points: the number of matches in a feature point's neighborhood is compared with a threshold to efficiently determine whether the current match is correct, and gridded images are used to speed up this computation before camera pose estimation. Finally, the dense point cloud map is constructed from the keyframes and their corresponding poses, with an outlier removal filter and a voxel-grid filter applied to reduce the size of the point cloud. Experimental results on the TUM RGB-D dataset show that, compared with ORB-SLAM3, the proposed algorithm reduces matching time by approximately 50% and average positioning error by 32%, while increasing the number of matches by an average of 60%. In addition, compared with sparse maps, the generated dense point cloud maps are easier to process further, expanding the application scenarios of the algorithm.
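
    The neighborhood-support test can be illustrated with a much simplified grid-based motion statistics filter: a match is kept when its grid-cell pair is supported by enough other matches, with a threshold proportional to the square root of the cell's match count (the alpha*sqrt(n) rule of the original GMS paper). This is a simplified sketch, not the paper's ORB-GMS implementation; the 3x3 neighborhood accumulation is omitted.

```python
import numpy as np
from collections import Counter

def gms_filter(cells_a, cells_b, alpha=6.0):
    """cells_a[i], cells_b[i] are the grid-cell indices of the two keypoints
    of match i in the first / second image; returns a boolean keep mask."""
    pair_support = Counter(zip(cells_a, cells_b))   # matches per cell pair
    per_cell = Counter(cells_a)                     # matches per first-image cell
    keep = []
    for ca, cb in zip(cells_a, cells_b):
        threshold = alpha * np.sqrt(per_cell[ca])
        keep.append(pair_support[(ca, cb)] - 1 > threshold)  # exclude itself
    return np.array(keep)
```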

  • Graphics and Image Processing
    Wenzhuo FAN, Tao WU, Junping XU, Qingqing LI, Jianlin ZHANG, Meihui LI, Yuxing WEI
    Computer Engineering. 2023, 49(9): 217-225. https://doi.org/10.19678/j.issn.1000-3428.0065689

    Traditional deep learning super-resolution networks extract features at a fixed resolution and cannot integrate high-level semantic information; they also struggle to reconstruct images at arbitrary scale factors, generalize poorly, and often carry an excessive number of parameters. An arbitrary-scale image super-resolution reconstruction algorithm based on multi-resolution feature fusion, termed MFSR, is proposed. In the multi-resolution feature fusion encoding stage, a multi-resolution feature extraction module is designed to extract features at different resolutions, and a dual attention module is constructed to strengthen the network's feature extraction ability; an information-rich fused feature map is obtained by fully interacting features across resolutions. In the image reconstruction stage, the fused feature map is decoded by a Multi-Layer Perceptron (MLP) to produce a super-resolved image at any scale. Experiments on the Set5 dataset with scaling factors of 2, 3, 4, 6, and 8 yield Peak Signal-to-Noise Ratios (PSNR) of 38.62, 34.70, 32.41, 28.96, and 26.62 dB, respectively, with only 0.72×10⁶ model parameters, significantly reducing the parameter count while maintaining reconstruction quality and enabling super-resolution at arbitrary scales. The model also performs better than mainstream algorithms such as SRCNN, VDSR, and EDSR.
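
    The arbitrary-scale decoding step can be sketched as follows: the fused feature map is sampled at any set of query coordinates and an MLP maps each sampled feature (plus its coordinate) to an RGB value, so the output resolution is not tied to a fixed scale factor. This is an assumed illustration of the idea; the module name, grid_sample-based sampling, and layer sizes are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnyScaleDecoder(nn.Module):
    """Decode a fused feature map into RGB at arbitrary query coordinates."""
    def __init__(self, feat_dim, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 3))

    def forward(self, feat, coords):
        # feat: (B, C, h, w) fused features; coords: (B, N, 2) in [-1, 1]
        grid = coords.unsqueeze(1)                        # (B, 1, N, 2)
        sampled = F.grid_sample(feat, grid, align_corners=False)
        sampled = sampled.squeeze(2).permute(0, 2, 1)     # (B, N, C)
        return self.mlp(torch.cat([sampled, coords], dim=-1))  # (B, N, 3)

# querying a denser coordinate grid yields a correspondingly larger image
```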

  • Development Research and Engineering Application
    Jianhao ZHAN, Lipeng GAN, Yonghui BI, Peng ZENG, Xiaochao LI
    Computer Engineering. 2023, 49(10): 280-288, 297. https://doi.org/10.19678/j.issn.1000-3428.0065152

    Multi-modality fusion is a core technique for exploiting complementary features from multiple modalities to improve action recognition, and it can be performed at the data, feature, and decision levels. This study investigates multi-modality fusion at the feature and decision levels through knowledge distillation, transferring feature learning from other modalities to the RGB model, and examines the effects of different loss functions and fusion strategies. A multi-modality distillation fusion method for action recognition is proposed, in which knowledge distillation uses the MSE loss at the feature level and KL divergence at the decision (prediction) level, with the skeleton and optical flow modalities combined as multi-teacher networks so that the RGB student network can learn from both and achieve better recognition accuracy. Extensive experiments show that the proposed method achieves state-of-the-art performance, with accuracies of 90.09%, 95.12%, 97.82%, and 81.26% on the NTU RGB+D 60, UTD-MHAD, N-UCLA, and HMDB51 datasets, respectively. Compared with the single-modality RGB model, recognition accuracy on these datasets increases by 3.49, 2.54, 3.21, and 7.34 percentage points, respectively.
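
    A minimal sketch of the loss combination described above: an MSE term aligns student and teacher features, a temperature-scaled KL term aligns their predictions, and both are added to the usual cross-entropy on labels. The weights and temperature are illustrative assumptions, not the paper's exact training code.

```python
import torch.nn.functional as F

def distillation_loss(stu_feat, tea_feats, stu_logits, tea_logits_list,
                      labels, T=4.0, w_feat=1.0, w_kl=1.0):
    """Feature-level MSE + decision-level KL distillation from multiple
    teacher modalities to the RGB student, plus cross-entropy on labels."""
    ce = F.cross_entropy(stu_logits, labels)
    feat_mse = sum(F.mse_loss(stu_feat, t) for t in tea_feats) / len(tea_feats)
    kl = sum(F.kl_div(F.log_softmax(stu_logits / T, dim=1),
                      F.softmax(t / T, dim=1),
                      reduction="batchmean") * (T * T)
             for t in tea_logits_list) / len(tea_logits_list)
    return ce + w_feat * feat_mse + w_kl * kl
```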

  • Graphics and Image Processing
    Bingyan ZHU, Zhihua CHEN, Bin SHENG
    Computer Engineering. 2024, 50(1): 216-223. https://doi.org/10.19678/j.issn.1000-3428.0066941

    Owing to the rapid development of remote sensing technology, remote sensing image detection is used extensively in agriculture, the military, national defense security, and other fields. Compared with conventional images, remote sensing images are more difficult to detect; therefore, researchers have endeavored to detect objects in them efficiently and accurately. To address the high computational complexity, large scale variation, and scale imbalance of remote sensing images, this study proposes a perception-enhanced Swin Transformer network, which improves remote sensing image detection. Exploiting the hierarchical design and shifted windows of the basic Swin Transformer, the network inserts spatial local perception blocks into each stage, enhancing local feature extraction with a negligible increase in computation. An area-distributed regression loss is introduced to assign larger weights to small objects to counter scale imbalance, and it is combined with an improved IoU-aware classification loss that eliminates the discrepancy between branches and reduces the classification and regression loss. Experimental results on the public DOTA dataset show that the proposed network yields a mean Average Precision (mAP) of 78.47% and a detection speed of 10.8 frame/s, demonstrating its superiority over classical object detection networks such as Faster R-CNN and Mask R-CNN and over existing remote sensing image detection networks. Additionally, the network performs well on all types of objects at different scales.

  • Artificial Intelligence and Pattern Recognition
    Qiru LI, Xia GENG
    Computer Engineering. 2023, 49(12): 111-120. https://doi.org/10.19678/j.issn.1000-3428.0066348

    The traditional Deep Q Network (DQN) algorithm solves the dimensionality problem of Q-learning in complex environments by integrating deep neural networks with reinforcement learning, and it is widely used in mobile robot path planning. However, the traditional DQN algorithm converges slowly and plans poor paths, so obtaining the optimal path within a small number of training rounds is challenging. To solve these problems, an improved ERDQN algorithm is proposed. The Q value is recalculated by recording the frequency of repeated states: the more often a state is revisited during training, the lower the probability of it occurring again. This improves the robot's ability to explore the environment, reduces the risk of the network converging to a local optimum to a certain extent, and reduces the number of training rounds required for convergence. The reward function is redesigned according to the robot's moving direction and its distance to the target point. The robot receives a positive reward when it moves closer to the target and a negative reward when it moves away, and the magnitude of the reward is adjusted according to the current moving direction and the distance to the target; thus, the robot can plan a better path while avoiding obstacles. The experimental results show that, compared with DQN, the average score of ERDQN increases by 18.9%, while the path length and the number of planned rounds are reduced by approximately 20.1% and 500, respectively. These results show that ERDQN effectively improves network convergence speed and path planning performance.
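
    The reward redesign can be sketched as a simple shaping function: the reward is positive when the robot moves closer to the target, negative when it moves away, and its magnitude is scaled by how well the current heading points at the target. The constants and exact scaling are assumptions, not the ERDQN paper's formula.

```python
import numpy as np

def shaped_reward(pos, prev_pos, goal, heading, k=1.0):
    """pos, prev_pos, goal: 2-D positions; heading: unit direction vector."""
    d_prev = np.linalg.norm(goal - prev_pos)
    d_now = np.linalg.norm(goal - pos)
    to_goal = (goal - pos) / (d_now + 1e-8)
    align = np.clip(np.dot(heading, to_goal), 0.1, 1.0)  # direction factor
    return k * align * (d_prev - d_now)   # > 0 when approaching the goal
```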

  • Graphics and Image Processing
    Chunbo XU, Juan YAN, Huibin YANG, Bo WANG, Han WU
    Computer Engineering. 2023, 49(8): 199-206, 214. https://doi.org/10.19678/j.issn.1000-3428.0065522

    Most current Visual Simultaneous Localization And Mapping (VSLAM) algorithms are designed for static scenes and do not consider dynamic objects. However, dynamic objects in a real scene cause mismatches among the feature points of the visual odometer, which degrades the positioning and mapping accuracy of the SLAM system and reduces its robustness in practical applications. For indoor dynamic environments, a VSLAM algorithm based on the ORB-SLAM3 framework, RDTS-SLAM, is proposed. An improved YOLOv5 target detection and semantic segmentation network is used to segment objects in the environment accurately and rapidly. Simultaneously, the target detection results are combined with a local optical flow method to identify dynamic objects accurately, and the feature points in dynamic object regions are eliminated; only static feature points are used for feature matching and subsequent positioning and mapping. Experimental results on the TUM RGB-D dataset and data from real environments show that, compared with the ORB-SLAM3 and RDS-SLAM algorithms, the Root Mean Square Error (RMSE) of trajectory estimation of RDTS-SLAM on the walking_rpy sequence is reduced by 95.38% and 86.20%, respectively, implying that it can significantly improve the robustness and accuracy of VSLAM in dynamic environments.
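
    A minimal sketch of the dynamic-point removal step: keypoints that fall inside detected dynamic-object boxes and whose optical flow deviates from the dominant background motion are discarded, and only the remaining static points are used for matching. The thresholds and the background-motion estimate are assumptions for illustration.

```python
import numpy as np

def filter_dynamic_keypoints(kpts, flow, dynamic_boxes, flow_thresh=2.0):
    """kpts: (N, 2) pixel coordinates; flow: (N, 2) optical-flow vectors;
    dynamic_boxes: list of (x1, y1, x2, y2) boxes from the detector."""
    bg_flow = np.median(flow, axis=0)            # rough background motion
    keep = np.ones(len(kpts), dtype=bool)
    for x1, y1, x2, y2 in dynamic_boxes:
        inside = ((kpts[:, 0] >= x1) & (kpts[:, 0] <= x2) &
                  (kpts[:, 1] >= y1) & (kpts[:, 1] <= y2))
        moving = np.linalg.norm(flow - bg_flow, axis=1) > flow_thresh
        keep &= ~(inside & moving)               # drop moving points in boxes
    return keep
```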

  • Research Hotspots and Reviews
    Jian CAO, Yimei CHEN, Haisheng LI, Qiang CAI
    Computer Engineering. 2023, 49(10): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0065984

    Small target detection in complex road scenes can improve a vehicle's perception of its surroundings and is therefore an important research direction in computer vision and intelligent transportation. With the development of deep learning, combining deep learning with road small target detection can effectively improve detection accuracy, allowing a vehicle to respond quickly to its environment. Starting from the latest classical research results in small target detection, this survey gives two definitions of small targets and analyzes why small target detection on roads is difficult. Five types of deep learning-based optimization methods for improving road small target detection accuracy are then described: data enhancement, multi-scale strategies, Super-Resolution (SR) detail generation, strengthened contextual information, and improved loss functions. The core ideas of each category and the latest research progress in China and abroad are summarized. Large public datasets commonly used in road small target detection are introduced, along with the corresponding indicators used to evaluate detection performance. By comparing and analyzing the detection results of various methods on different datasets, the survey summarizes the current state of road small target research and its open problems, and looks ahead to future research directions from multiple perspectives.

  • Research Hotspots and Reviews
    Jinsheng CHEN, Wenzhen MA, Shaofeng FANG, Ziming ZOU
    Computer Engineering. 2023, 49(11): 13-23. https://doi.org/10.19678/j.issn.1000-3428.0066521

    With the construction of the Meridian Project all-sky airglow imager observation network, a large amount of raw airglow image data has been accumulated. Current atmospheric gravity wave research based on airglow observations relies heavily on manual identification, which is time-consuming and whose labeling quality is difficult to guarantee; a fast and effective automatic identification method is therefore urgently needed. To address the scarcity of labeled atmospheric gravity wave samples, this paper proposes an algorithm based on an improved CycleGAN model to expand the atmospheric gravity wave airglow observation dataset, greatly improving recognition accuracy while labeling only a small number of samples. A new intelligent recognition algorithm for atmospheric gravity waves is also proposed by improving the YOLOv5s backbone network and bounding box prediction, considering the low Signal-to-Noise Ratio (SNR) between the recognition target and the background in airglow images. The experimental results show that, using the augmented dataset and the improved YOLOv5s detector, the average precision reaches 75.8% at an Intersection-over-Union (IoU) threshold of 0.5, which is 9.7 percentage points higher than that of the original model, while the detection speed and average recognition accuracy are superior to those of the mainstream target detection algorithms compared.

  • Graphics and Image Processing
    Xinlu JIANG, Tianen CHEN, Cong WANG, Chunjiang ZHAO
    Computer Engineering. 2024, 50(1): 232-241. https://doi.org/10.19678/j.issn.1000-3428.0067030

    Intelligent pest detection is an essential application of target detection technology in agriculture; it effectively improves the efficiency and reliability of pest detection and reporting and helps ensure crop yield and quality. With fixed trapping devices such as insect traps and sticky boards, the image background is simple, the lighting is stable, and pest features are significant and easy to extract, so detection accuracy can be high; however, the application scenario is fixed, the detection range is limited to the equipment's surroundings, and the approach cannot adapt to complex field environments. A small object pest detection model, Pest-YOLOv5, is proposed to improve the flexibility of pest detection and forecasting and to address the missed detections caused by complex image backgrounds and small pest sizes in field environments. A Coordinate Attention (CA) mechanism is added to the feature extraction network to combine spatial and channel information and strengthen the extraction of small object pest features. The Bidirectional Feature Pyramid Network (BiFPN) structure is used in the neck to combine multi-scale features and alleviate the loss of small object information caused by repeated convolutions. On this basis, the SIoU and VariFocal loss functions are used to compute the losses, and the optimal classification loss weight coefficient is obtained experimentally, making the model focus on object samples that are difficult to classify. Experimental results on a subset of the publicly available AgriPest dataset show that Pest-YOLOv5 achieves mAP0.5 and recall of 70.4% and 67.8%, respectively, which are superior to those of classical object detection models such as the original YOLOv5s, SSD, and Faster R-CNN. Compared with YOLOv5s, Pest-YOLOv5 improves mAP0.5, mAP0.50:0.95, and recall by 8.1%, 7.9%, and 12.8%, respectively, enhancing its ability to detect targets.
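
    The BiFPN connection used in the neck fuses same-resolution feature maps with learnable non-negative weights normalized by their sum (EfficientDet's fast normalized fusion). The sketch below shows that fusion node in isolation; it is illustrative and not the Pest-YOLOv5 code.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """BiFPN-style fast normalized fusion of equally shaped feature maps:
    out = sum_i(w_i * x_i) / (sum_i w_i + eps), with w_i >= 0."""
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, xs):                     # xs: list of (B, C, H, W)
        w = torch.relu(self.w)
        w = w / (w.sum() + self.eps)
        return sum(wi * x for wi, x in zip(w, xs))
```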

  • WANG Yiling, WU Qi, AN Junshe
    Accepted: 2023-07-05
    The distributed architecture of the spatial data system defined by the CCSDS Advanced Orbiting Systems standard is an effective way to improve the overall reliability of on-board systems. In China, spacecraft control terminals, as typical application nodes of this architecture, usually use Loongson series processors based on the MIPS architecture. However, the lack of an autonomous and controllable lightweight operating system has limited the deployment and application of this architecture in China's aerospace field. The lightweight OpenHarmony operating system offers independent controllability, high real-time performance, and low power consumption, but it does not yet support the MIPS architecture of domestic Loongson processors. To build an autonomous and controllable aerospace information system architecture and port the lightweight OpenHarmony operating system to Loongson control terminals, this work analyzes the LiteOS-M lightweight real-time kernel of OpenHarmony and the MIPS architecture, focusing on the Hardware Abstraction Layer (HAL) and the hardware-dependent parts of the kernel. The porting scheme covers boot loading, HAL architecture adaptation, the UART driver, kernel tailoring, and toolchain construction. To verify the basic functions and real-time performance of the ported system, test cases were designed on the MIPS-based Loongson spaceborne control terminal LS1J and LS1C hardware platforms. The experimental results show that the lightweight real-time OpenHarmony system is successfully adapted to the MIPS architecture and runs stably and reliably on the Loongson control terminals. The task context switching delay is 0.229 μs and the interrupt response delay is 4.73 μs, meeting real-time system requirements. This work provides a solution for deploying the distributed spaceborne computer architecture on lightweight aerospace terminals in China and serves as a reference for building a reliable, secure, and autonomously controllable information technology system for space-based computers.
  • Frontiers in Computer Systems
    Yanfei FANG, Qi LIU, Enming DONG, Yanbing LI, Feng GUO, Di WANG, Wangquan HE, Fengbin QI
    Computer Engineering. 2023, 49(12): 10-24. https://doi.org/10.19678/j.issn.1000-3428.0066548

    Manycore has become the mainstream processor architecture for building supercomputer systems, providing powerful computing capability for exascale High Performance Computing (HPC) systems. As the number of cores integrated on a manycore processor chip increases, competition among the cores for memory resources becomes more intense. The manycore on-chip memory hierarchy is an important structure for alleviating the "memory wall" problem: it helps HPC applications exploit the computing advantages of manycore processors and improves the performance of practical applications. The design of the manycore on-chip memory hierarchy has a significant impact on the performance, power consumption, and area of the on-chip system; it is an important part of manycore system design and a research focus in the industry. Owing to differences in the development history of manycore chips, in on-chip microarchitecture design technology, and in the requirements of application fields, current mainstream HPC manycore on-chip memory hierarchies differ. However, judging from horizontal comparisons, from the vertical development trends of each processor, and from the changing application requirements brought by the continuing convergence of HPC, data science, and machine learning, a hybrid SPM+Cache structure is likely to become the mainstream choice for the on-chip memory hierarchy of manycore processors in future HPC exascale supercomputer systems. For exascale software and algorithms, design and optimization based on the characteristics of the manycore memory hierarchy can help HPC applications benefit from the computing advantages of manycore processors and effectively improve the performance of practical applications; software and algorithm design and optimization targeted at the manycore on-chip memory hierarchy is therefore also a research focus in the industry. This survey first classifies on-chip memory hierarchies into multilevel Cache, SPM, and hybrid SPM+Cache structures according to their organization and summarizes the advantages and disadvantages of each. It then analyzes the current status and development trends of the memory hierarchy designs of the chips used in mainstream exascale supercomputer systems, including international mainstream GPUs, homogeneous manycore processors, and domestic manycore processors. The research status of related software and hardware technologies is reviewed, covering manycore LLC management and cache coherence protocols, SPM management and data movement optimization, and global optimization of the hybrid SPM+Cache architecture. Finally, future research directions for on-chip memory hierarchies are discussed from the perspectives of hardware, software, and algorithm design.

  • Graphics and Image Processing
    Fangyu FENG, Xiaoshu LUO, Zhiming MENG, Guangyu WANG
    Computer Engineering. 2023, 49(8): 190-198. https://doi.org/10.19678/j.issn.1000-3428.0065224

    Because effective features are difficult to extract in facial expression recognition and the high similarity between categories easily causes confusion, recognition accuracy is low; a facial expression recognition method based on an anti-aliasing residual attention network is therefore proposed. First, because traditional subsampling can easily lose expression-discriminative features, an anti-aliasing residual network is constructed to improve feature extraction from expression images and enhance the representation of expression features, so that more effective global facial expression information can be extracted. At the same time, an improved channel attention mechanism and a label smoothing regularization strategy are used to strengthen attention to the key local expression regions of the face: the improved channel attention focuses on highly discriminative expression features and suppresses the weights of non-expressive regions, so as to locate finer local expression regions within the global information extracted by the network, while label smoothing corrects the prediction probabilities by increasing the information content of the decisive expression category, avoiding overly absolute predictions and reducing misjudgment between similar expressions. Experimental results show that the recognition accuracies of this method on the RAF-DB and FERPlus facial expression datasets reach 88.14% and 89.31%, respectively. It outperforms advanced methods such as DACT and VTFF and, compared with the original residual network, effectively improves the accuracy and robustness of facial expression recognition.
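
    The label smoothing regularization mentioned above replaces the one-hot target with a slightly softened distribution, so the network is never pushed toward absolute certainty and confusable expressions are penalized less harshly. A small, standard sketch (the smoothing factor is illustrative):

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, target, eps=0.1):
    """Cross-entropy with smoothed targets: the true class keeps 1 - eps
    (plus eps/K), and eps/K is spread over all K classes."""
    k = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    smooth = torch.full_like(log_probs, eps / k)
    smooth.scatter_(1, target.unsqueeze(1), 1.0 - eps + eps / k)
    return -(smooth * log_probs).sum(dim=1).mean()
```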

  • Research Hotspots and Reviews
    Zhe LIAN, Yanjun YIN, Fei YUN, Min ZHI
    Computer Engineering. 2024, 50(3): 16-27. https://doi.org/10.19678/j.issn.1000-3428.0067427

    Natural scene text detection based on deep learning has become a crucial research focus in computer vision and natural language processing. It not only has a wide range of potential applications but also serves as a platform for researchers to explore neural network models and algorithms. First, this study introduces the relevant concepts, research background, and current development of natural scene text detection technology. Recent deep learning-based text detection methods are then analyzed and categorized into four classes: detection box-based, segmentation-based, hybrid (detection box and segmentation)-based, and other methods. The fundamental concepts and main algorithmic processes of the classical and mainstream methods in these four categories are elaborated, and the usage mechanisms, applicable scenarios, advantages, disadvantages, reported experimental results, and experiment settings of the different methods are summarized, clarifying the relationships among them. Common public datasets and performance evaluation methods for natural scene text detection are then introduced. Finally, the major challenges facing current deep learning-based natural scene text detection are outlined, and future development directions are discussed.

  • Research Hotspots and Reviews
    Xingxing DONG, Jixun GAO, Xiaotong WANG, Song LI
    Computer Engineering. 2023, 49(9): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0064822

    As indispensable components of spatial relations, spatial directional relations are widely used in many fields such as urban intelligent traffic control, environmental resource detection, and disaster prevention and mitigation. They are also a significant and challenging issue in geographic information systems, spatial databases, artificial intelligence, and pattern recognition. This study comprehensively analyzes and compares existing models for expressing and reasoning about spatial directional relations. First, the research progress on models of directional relations between objects in two-dimensional space is introduced in detail, for both single and group target objects. The characteristics, advantages, and disadvantages of current models of directional relations in three-dimensional space are then analyzed, from point-based to block-based models. The research status of models for uncertain directional relations is expounded from two aspects: extensions of models built for precise objects, and models for uncertain objects based on uncertainty set theory. The advantages, drawbacks, and applicable fields of each type of model are discussed. Finally, the shortcomings of current research are explained, and future research directions for spatial directional relations are outlined in terms of automatic reasoning technology, joint representation of spatial relations, and group target objects.

  • Artificial Intelligence and Pattern Recognition
    Jun LUO, Qingwei GAO, Yi TAN, Dawei ZHAO, Yixiang LU, Dong SUN
    Computer Engineering. 2023, 49(11): 49-60. https://doi.org/10.19678/j.issn.1000-3428.0065787

    Label-specific features are a research hotspot in multi-label learning, where label-specific feature extraction is used to handle instances associated with multiple class labels. Existing multi-label classification research usually considers only the correlation between labels and ignores the local manifold structure of the original data, which lowers classification accuracy. In addition, in modeling label correlation, the structural relationship between features and labels and the inherent causal relationships between labels are often overlooked. To address these issues, a multi-label learning algorithm based on double Laplacian regularization and causal inference is proposed. A linear regression model is used as the basic multi-label classification framework and is combined with causal learning to explore the inherent causal relationships between labels, so as to mine the essential connections among them. To fully utilize the structural relationship between features and labels, double Laplacian regularization is added to mine local label association information and effectively preserve the local manifold structure of the original data. The effectiveness of the proposed algorithm is verified on a public multi-label dataset. The experimental results show that, compared with algorithms such as LLSF, ML-KNN, and LIFT, the proposed algorithm achieves average performance improvements of 8.82%, 4.98%, 9.43%, 16.27%, 12.19%, and 3.35% in terms of Hamming Loss (HL), Average Precision (AP), One Error (OE), Ranking Loss (RL), Coverage, and AUC, respectively.
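
    As a hedged illustration of the kind of objective described above (least-squares label prediction with two graph-Laplacian regularizers that preserve local manifold structure on the instance side and the label side, plus a sparsity term), one generic form is shown below; the paper's exact terms, including how the causal component enters, may differ.

```latex
\min_{\boldsymbol{W}}\;
\|\boldsymbol{X}\boldsymbol{W}-\boldsymbol{Y}\|_F^2
+\alpha\,\mathrm{tr}\big((\boldsymbol{X}\boldsymbol{W})^{\mathrm{T}}\boldsymbol{L}_X\boldsymbol{X}\boldsymbol{W}\big)
+\beta\,\mathrm{tr}\big(\boldsymbol{W}\boldsymbol{L}_Y\boldsymbol{W}^{\mathrm{T}}\big)
+\lambda\|\boldsymbol{W}\|_1
```

    where $\boldsymbol{L}_X$ and $\boldsymbol{L}_Y$ are graph Laplacians built on the instances and the labels, respectively.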

  • Computer Architecture and Software Technology
    Sichi YANG, Rongcai ZHAO, Lin HAN, Hongsheng WANG
    Computer Engineering. 2024, 50(2): 206-213. https://doi.org/10.19678/j.issn.1000-3428.0067210

    In the domestic general-purpose accelerator Deep Computing Unit (DCU), Local Data Share (LDS) is a key storage component with lower latency and higher bandwidth than global memory. As heterogeneous programs use the LDS more frequently, its low memory access efficiency has become an important factor limiting their performance. Moreover, because of bank conflicts during LDS accesses, the LDS must be accessed according to certain principles to be used efficiently, and when data accesses between threads overlap, vectorized access instructions introduce delays. To address this problem, an LDS memory access vectorization optimization method for the DCU is proposed. The method reduces the number of LDS accesses and the time spent on memory accesses by vectorizing accesses to contiguous data, thereby improving the memory access efficiency of the program. On this basis, by analyzing memory access characteristics, an LDS access vectorization method that can handle overlapping data is proposed, realizing an efficient LDS access technique for domestic general-purpose accelerators and ensuring that vectorization actually improves memory access efficiency. The experimental results show that, for heterogeneous programs using the LDS, performance improves by an average of 22.6% after LDS access vectorization, verifying the effectiveness of this work; for programs whose threads access overlapping data, the vectorization method improves performance by an average of 30%.

  • Development Research and Engineering Application
    Xinyi ZHANG, Fei ZHANG, Bin HAO, Lu GAO, Xiaoying REN
    Computer Engineering. 2023, 49(8): 265-274. https://doi.org/10.19678/j.issn.1000-3428.0065701

    In dense crowd scenes in public places, face mask wearing detection algorithms perform poorly because of information lost to target occlusion, small detection targets, and low resolution. To improve the detection accuracy and speed of the model while reducing its hardware footprint, an improved mask wearing detection algorithm based on YOLOv5s is proposed. Conventional convolution is replaced with GSConv, which combines Standard Convolution (SConv) and Depth-Wise separable Convolution (DWConv) with channel blending, improving network speed while maintaining accuracy. Nearest-neighbor upsampling is replaced with a lightweight universal upsampling operator to make full use of semantic feature information. Adaptive Spatial Feature Fusion (ASFF) is added at the end of the neck of the improved YOLOv5s model, allowing better fusion of features at different scales and improving detection accuracy. In addition, adaptive image sampling is used to alleviate data imbalance, and Mosaic data enhancement is used to make full use of small targets. Experimental results show that the model achieves a mean Average Precision (mAP) of 93% on the AIZOO dataset, a 2-percentage-point improvement over the original YOLOv5 model, and 97.7% detection accuracy for faces wearing masks, outperforming the YOLO series, SSD, and RetinaFace under the same conditions. Inference on a GPU is also accelerated, with a 16.7 percentage point speedup, and the model weight file occupies only 23.5 MB, enabling real-time mask wearing detection.

  • Cyberspace Security
    Shuaiwei LIU, Zhi LI, Guomei WANG, Li ZHANG
    Computer Engineering. 2024, 50(2): 180-187. https://doi.org/10.19678/j.issn.1000-3428.0067077

    Adversarial attack and defense is a popular research area in computer security. Trans-GAN, an adversarial example generation algorithm combining the Transformer and the Generative Adversarial Network (GAN), is proposed to address the poor visual quality of existing gradient-based adversarial example generation methods and the low generation efficiency of optimization-based methods. First, the algorithm uses the strong visual representation capability of the Transformer as a reconstruction network that receives clean images and generates adversarial noise. Second, the Transformer reconstruction network is used as the generator and combined with a discriminator based on a deep convolutional network to form a GAN architecture, which improves the authenticity of the generated images and ensures training stability. Meanwhile, an improved attention mechanism, Targeted Self-Attention, is proposed to introduce target labels as prior knowledge during training, guiding the model to generate adversarial perturbations with specific attack targets. Finally, adversarial noise is added to the clean examples through skip connections to form adversarial examples. Experimental results demonstrate that the proposed algorithm achieves an attack success rate of more than 99.9% on both models used for the MNIST dataset and 96.36% and 98.47% on the two models used for the CIFAR10 dataset, outperforming current state-of-the-art generative adversarial attack methods. Qualitative results show that, compared with the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) algorithms, the adversarial noise generated by Trans-GAN is smaller, and the resulting adversarial examples look more natural and are harder for human vision to distinguish.
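
    The final step described above (adding the generated noise to the clean image via a skip connection) can be sketched as follows; the perturbation bound, the tanh squashing, and the generator interface are assumptions for illustration, not the Trans-GAN implementation.

```python
import torch

def make_adversarial(x_clean, generator, eps=8.0 / 255.0):
    """Add bounded generator noise to clean images and clip to [0, 1]."""
    noise = generator(x_clean)                      # same shape as x_clean
    noise = eps * torch.tanh(noise)                 # keep perturbation small
    return torch.clamp(x_clean + noise, 0.0, 1.0)   # skip connection + clip
```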

  • Frontiers in Computer Systems
    Junchao YE, Cong XU, Yao HUANG, Zhilei CHAI
    Computer Engineering. 2023, 49(12): 35-45. https://doi.org/10.19678/j.issn.1000-3428.0066260

    As a third-generation neural network, the Spiking Neural Network (SNN) uses neurons and synapses as its basic computing units, and its working mechanism resembles that of the biological brain. Its complex topology of intra-layer connections and feedback connections gives it the potential to solve complex problems. Compared with the Leaky-Integrate-and-Fire (LIF) model, the Izhikevich neuron model supports a wider range of neuromorphic computing by reproducing more biological spiking phenomena; however, it has higher computational complexity, which can lead to lower performance and higher power consumption in the network. To address these problems, a customized FPGA-based computation method for Izhikevich neurons is proposed. First, by studying the value ranges of the Izhikevich neuron parameters in SNNs and balancing the relative error of the membrane potential against resource consumption, a mixed-precision fixed-point scheme is designed. Second, for a single neuron, the data path of the update equations is restructured to balance the pipeline and achieve the minimum pipeline length. At the network level, a scalable computing architecture is devised to accommodate FPGAs of different scales. Finally, the customized computation method is used to accelerate the classical NEST simulator. The experimental results show that, compared with an i7-10700 CPU, the classical lateral geniculate nucleus network model and the liquid state machine model run 2.26 and 3.02 times faster on average on the ZCU102, and the energy efficiency ratio improves by 8.06 and 10.8 times on average.
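
    For reference, the standard Izhikevich update that such a pipeline evaluates is shown below in floating point; the mixed-precision fixed-point scaling used on the FPGA is not modeled here.

```python
def izhikevich_step(v, u, I, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
    """One Euler step of the Izhikevich neuron:
        v' = 0.04 v^2 + 5 v + 140 - u + I
        u' = a (b v - u)
    with reset v <- c, u <- u + d when v reaches the 30 mV spike peak."""
    v = v + dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
    u = u + dt * a * (b * v - u)
    spiked = v >= 30.0
    if spiked:
        v, u = c, u + d
    return v, u, spiked
```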

  • Development Research and Engineering Application
    Long SUN, Rongfen ZHANG, Yuhong LIU, Tingli RAO
    Computer Engineering. 2023, 49(9): 313-320. https://doi.org/10.19678/j.issn.1000-3428.0065697

    In dense crowd scenes, mask wearing detection from a surveillance viewpoint is hampered by densely packed targets, mutual occlusion, small targets, and perspective distortion of faces, and public datasets covering incorrectly worn masks are lacking. This paper therefore proposes MDDC-YOLO, a mask wearing detection algorithm for the surveillance viewpoint based on improvements to YOLO-v5. Because small and medium targets dominate in dense crowds, the conventional C3 module in YOLO-v5 is replaced with an MRF-C3 module built on atrous convolutions. The anti-occlusion ability of the model is improved by using Repulsion Loss, which attracts each predicted box to its matched ground truth while repelling it from other boxes, so that occluded positive samples are fully exploited during training. An Efficient Channel Attention (ECA) mechanism is further introduced for optimal selection of feature channels. Finally, to address the lack of mask wearing data from a surveillance viewpoint, an offline data enhancement method based on perspective transformation is proposed, and Mosaic-9 data enhancement is used to generate additional small target samples. The experimental results show that MDDC-YOLO improves mAP by 6.5 percentage points over YOLO-v5 while reaching a detection speed of 32 frame/s, which satisfies the application requirements of mask wearing detection in dense crowds.
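
    The offline perspective-transform augmentation can be sketched with OpenCV: a homography estimated from four jittered corner points warps both the image and its box corners, roughly mimicking a surveillance viewpoint. The corner jitter and box handling here are illustrative assumptions, not the paper's procedure.

```python
import cv2
import numpy as np

def perspective_augment(img, boxes, jitter=0.15):
    """Warp an image and its (x1, y1, x2, y2) boxes with a random homography."""
    h, w = img.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jit = (np.random.uniform(-jitter, jitter, (4, 2)) * [w, h]).astype(np.float32)
    H = cv2.getPerspectiveTransform(src, src + jit)
    warped = cv2.warpPerspective(img, H, (w, h))

    new_boxes = []
    for x1, y1, x2, y2 in boxes:
        corners = np.float32([[[x1, y1]], [[x2, y1]], [[x2, y2]], [[x1, y2]]])
        wc = cv2.perspectiveTransform(corners, H).reshape(-1, 2)
        new_boxes.append([wc[:, 0].min(), wc[:, 1].min(),
                          wc[:, 0].max(), wc[:, 1].max()])
    return warped, np.array(new_boxes)
```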

  • Research Hotspots and Reviews
    Li YU, Lin HAN, Youcai LUO, Jiandong SHANG
    Computer Engineering. 2024, 50(2): 51-58. https://doi.org/10.19678/j.issn.1000-3428.0067536

    There is currently no implementation of symmetric matrix eigenvalue solving on China's autonomous and controllable FT-M6678 platform, and the existing mathematical libraries on this platform cannot meet the requirements of such problems. This study targets the domestic FT-M6678 processor, implements and optimizes the symmetric matrix eigenvalue solver SYEV, and extends the linear algebra library of the FT-M6678 platform. First, by analyzing the implementation process and runtime hotspots of the SYEV algorithm, compilation, memory access, and vector parallel optimizations are performed for the FT-M6678 platform. Compilation optimization guides the compiler with different compilation options to obtain acceleration; memory access optimization includes cache optimization and the allocation of data and program segments, accelerating matrix data access; and vector parallel optimization includes loop unrolling and Single Instruction Multiple Data (SIMD) instruction-level parallelism adapted to the FT-M6678 platform, improving computational efficiency. The implemented and optimized algorithm is verified and benchmarked on the FT-M6678 platform: its accuracy passes the official Linear Algebra PACKage (LAPACK) test set, the optimization achieves a speedup of up to 58.346 times on the FT-M6678 platform, and the optimized solver is 2.053 times faster than on the TMS320C6678 platform.

  • Graphics and Image Processing
    Hong ZHAO, Yubo FENG
    Computer Engineering. 2023, 49(12): 194-204. https://doi.org/10.19678/j.issn.1000-3428.0066520

    In traffic sign detection tasks, the YOLOv5 detection algorithm suffers from missed and erroneous detections and an overly complex model under complex environments and road conditions. To address these challenges, an improved CGS-Ghost YOLO detection model is proposed. YOLOv5 uses the Focus module for downsampling, which introduces extra parameters; in this study, the StemBlock module replaces the Focus module for sampling after the input, reducing the number of parameters while maintaining accuracy. CGS-Ghost YOLO uses a Coordinate Attention (CA) mechanism, which enriches the semantic and location information in the features and enhances the model's feature extraction ability. A CGS convolution module combining the SMU activation function with GroupNorm (GN) normalization is also proposed; it avoids the influence of the batch size on the model during training and improves model performance. GhostConv is used to reduce the number of model parameters while effectively improving detection accuracy. The $ \alpha $-CIoU Loss+VFocal Loss function is used to alleviate the imbalance between positive and negative samples in traffic sign detection and improve the overall performance of the model. The neck uses a BiFPN bidirectional feature pyramid network, ensuring that multi-scale features of the detection targets are effectively fused. The results on the TT100K traffic sign detection dataset show that the detection accuracy of the improved CGS-Ghost YOLO model reaches 93.1%, 11.3 percentage points higher than that of the original model, while the number of model parameters is reduced by 21.2% compared with the original model. In summary, the proposed network optimizes the convolution layers and the downsampling part, considerably reducing the model parameters while enhancing detection accuracy.

  • Research Hotspots and Reviews
    Qilin WU, Yagu DANG, Shanwei XIONG, Xu JI, Kexin BI
    Computer Engineering. 2023, 49(11): 24-29, 39. https://doi.org/10.19678/j.issn.1000-3428.0066181

    Starting from the sentiment analysis of students' teaching evaluation texts, and in view of the insufficient feature extraction ability of traditional basic deep learning models, the low training efficiency of recurrent neural networks, and inaccurate word vector semantics, a sentiment classification algorithm for student evaluation texts based on a hybrid feature network is proposed. The lightweight pre-trained model ALBERT is used to extract a dynamic vector representation of each word that matches its current context, solving the polysemy problem of traditional word vector models and improving the accuracy of the semantic representation. The hybrid feature network captures both the global context sequence features of the evaluation text and local semantic information at different scales by combining a simple recurrent unit, a multi-scale local convolution learning module, and a self-attention layer, improving the model's deep feature representation ability. The self-attention mechanism identifies the key features that strongly affect the sentiment recognition result by computing the importance of each feature to the classification outcome, preventing irrelevant features from interfering with the result and degrading classification performance. The classification vectors are concatenated, and the sentiment classification result of the evaluation text is output by a linear layer. In an experiment on a real student teaching evaluation text dataset, the model achieves an F1 score of 97.8%, higher than those of the BERT-BiLSTM and BERT-GRU-ATT deep learning models, and an ablation experiment verifies the effectiveness of each module.

  • Frontiers in Computer Systems
    Yi CHEN, Bosheng LIU, Yongqi XU, Jigang WU
    Computer Engineering. 2023, 49(12): 1-9. https://doi.org/10.19678/j.issn.1000-3428.0066701

    Deep Convolutional Neural Networks (CNNs) have large models and high computational complexity, making them difficult to deploy on Field Programmable Gate Arrays (FPGAs) with limited hardware resources. Mixed-precision CNNs provide an effective trade-off between model size and accuracy and thus an efficient way to reduce the memory footprint of the model. The Fast Fourier Transform (FFT) converts traditional spatial-domain CNNs into the frequency domain, effectively reducing the computational complexity of the model. This study presents an FPGA-based accelerator for 8 bit/16 bit mixed-precision frequency-domain CNNs that supports the dynamic configuration of 8 bit and 16 bit frequency-domain convolutions. A DSP-based Frequency-domain Processing Element (FPE) is designed to support both 8 bit and 16 bit frequency-domain convolution operations; it can pack pairs of 8 bit frequency-domain multiplications so that DSPs are reused and throughput is improved. In addition, a mapping dataflow is proposed that supports both the 8 bit and 16 bit computation patterns and maximizes the reduction of redundant data processing and data movement through data reuse. The proposed accelerator is evaluated on the ResNet-18 and VGG16 models using the ImageNet dataset. The experimental results show that it achieves energy efficiency ratios (ratio of GOP to energy consumption) of 29.74 and 56.73 on ResNet-18 and VGG16, respectively, which is 1.2 to 6.0 times better than existing frequency-domain FPGA accelerators.
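
    The frequency-domain computation rests on the convolution theorem: a spatial convolution equals a pointwise multiplication of zero-padded spectra. The small numpy check below illustrates that equivalence only; the accelerator's packed 8 bit/16 bit arithmetic and dataflow are not modeled.

```python
import numpy as np
from scipy.signal import convolve2d

def fft_conv2d(x, k):
    """Full linear 2-D convolution via FFT: pad both inputs to the output
    size, multiply spectra pointwise, and transform back."""
    out_shape = (x.shape[0] + k.shape[0] - 1, x.shape[1] + k.shape[1] - 1)
    X = np.fft.rfft2(x, out_shape)
    K = np.fft.rfft2(k, out_shape)
    return np.fft.irfft2(X * K, out_shape)

x, k = np.random.rand(8, 8), np.random.rand(3, 3)
assert np.allclose(fft_conv2d(x, k), convolve2d(x, k, mode="full"))
```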

  • Graphics and Image Processing
    Zhengjia WANG, Feifei HU, Chengjuan ZHANG, Zhuo LEI, Tao HE
    Computer Engineering. 2024, 50(2): 256-265. https://doi.org/10.19678/j.issn.1000-3428.0067177

    Existing end-to-end stereo matching algorithms preset a fixed disparity range to reduce memory consumption and computation, making it difficult to balance matching accuracy and running efficiency. To solve this problem, this paper proposes an adaptive window stereo matching algorithm based on a lightweight Transformer. A coordinate attention layer with linear complexity encodes the positions of the low-resolution feature map, reducing computation and enhancing the discrimination of similar features. A lightweight Transformer feature description module is designed to transform context-related features, and a separable Multi-Head Self-Attention (MHSA) layer is introduced to reduce the Transformer's latency. A differentiable matching layer matches the features, and an adaptive window matching and refinement module performs sub-pixel matching and refinement, improving matching accuracy and reducing video memory consumption; after disparity regression, a disparity map can be generated regardless of the disparity range. Comparative experiments on the KITTI2015, KITTI2012, and SceneFlow datasets show that the proposed algorithm is approximately 4.7 times faster than the standard Transformer-based STTR in matching efficiency and has a friendlier memory footprint. Compared with PSMNet, which is based on 3D convolutions, the mismatching rate is reduced by 18% and the running time is five times faster, achieving a better balance between speed and accuracy.

  • Graphics and Image Processing
    Xianguo LI, Bin LI
    Computer Engineering. 2023, 49(9): 226-233, 245. https://doi.org/10.19678/j.issn.1000-3428.0065513

    A Convolutional Neural Network (CNN) applied alone to image deblurring is limited by its restricted receptive field. The Transformer can effectively mitigate this limitation, but its computational complexity grows quadratically with the spatial resolution of the input image. Therefore, this study proposes T-MIMO-UNet, an image deblurring network based on the Transformer and a multi-scale CNN. The multi-scale CNN extracts spatial features, while the global modeling capability of the Transformer captures long-range pixel information. A locally enhanced Transformer module, a local Multi-Head Self-Attention (MHSA) computation network, and an Enhanced Feed-Forward Network (EFFN) are designed. Block-by-block MHSA computation is performed using a windowing approach, and the information interaction between different windows is enhanced by increasing the depth of the separable convolution layers. Experiments on the GoPro test dataset show that the Peak Signal-to-Noise Ratio (PSNR) of T-MIMO-UNet is 0.39 dB, 2.89 dB, 3.42 dB, and 1.86 dB higher than those of the MIMO-UNet, DeepDeblur, DeblurGAN, and SRN networks, respectively, while the number of parameters is half that of MPRNet. These findings demonstrate that T-MIMO-UNet effectively addresses image blurring in dynamic scenes.

  • Cyberspace Security
    Huawei SONG, Shengqi LI, Fangjie WAN, Yuping WEI
    Computer Engineering. 2024, 50(3): 166-172. https://doi.org/10.19678/j.issn.1000-3428.0067791

    Federated Learning(FL) allows global models to be trained collaboratively without compromising data privacy. Nonetheless, this collaborative training approach faces the challenge of Non-Independent and Identically Distributed(Non-IID) data in the real world, which leads to slow model convergence and low accuracy. Most existing FL methods improve only one of the two perspectives, global model aggregation or local client update, inevitably overlooking the impact of the other and reducing the quality of the global model. In this context, we introduce a hierarchical continual learning optimization method for FL, denoted as FedMas, which is based on the idea of hierarchical fusion. First, clients with similar data distributions are grouped into different layers using the DBSCAN algorithm, and only some clients of a given layer are selected for training in each round, avoiding the weight divergence caused by different data distributions when the server aggregates the global model. Further, because the data distributions of different layers differ, each client applies a continual-learning solution to catastrophic forgetting during its local update, effectively integrating the differences between the data of clients in different layers and thus ensuring the performance of the global model. Experiments on the MNIST and CIFAR-10 standard datasets demonstrate that the test accuracy of the global model is improved by 0.3-2.2 percentage points on average compared with the FedProx, Scaffold, and FedCurv FL algorithms.
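
    The layering step can be pictured with a small sketch: clients are clustered by the similarity of their label distributions using DBSCAN. The label-histogram features, client count, and DBSCAN parameters below are assumptions for illustration, not values from the paper.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical layering step: group clients whose (normalized) label
# distributions are similar; each resulting cluster forms one layer.
client_label_hist = np.random.dirichlet(np.ones(10), size=20)   # 20 clients, 10 classes
layers = DBSCAN(eps=0.2, min_samples=2).fit_predict(client_label_hist)
for layer_id in sorted(set(layers)):
    members = np.where(layers == layer_id)[0]
    print(f"layer {layer_id}: clients {members.tolist()}")      # -1 marks unclustered clients
```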

  • Research Hotspots and Reviews
    Bin YANG, Yitong WANG
    Computer Engineering. 2023, 49(10): 13-21. https://doi.org/10.19678/j.issn.1000-3428.0065807

    Heterogeneous Information Networks(HINs) typically contain different types of nodes and interactions. Their rich semantic information and complex relationships pose significant challenges to current representation learning in HINs. Although most existing approaches use predefined meta-paths to capture heterogeneous semantic and structural information, they suffer from high cost and low coverage. In addition, most existing methods cannot precisely and effectively capture and learn influential high-order neighbor nodes. Accordingly, this study addresses the problems of meta-paths and influential high-order neighbor nodes with a proposed model named HIN-HG. HIN-HG generates a hyperadjacency graph of the HIN to precisely and effectively capture the influential neighbor nodes of the target nodes and then adopts convolutional neural networks with a multichannel mechanism to aggregate different types of neighbor nodes under different relationships. HIN-HG can automatically learn the weights of different neighbor nodes and meta-paths without manually specifying them. Meanwhile, nodes similar to the target node can be captured in the entire graph as high-order neighbor nodes, and the representation of the target node can be effectively updated through information propagation. The experimental results of HIN-HG on three real datasets, DBLP, ACM, and IMDB, demonstrate its improved performance compared with state-of-the-art methods in HIN representation learning, including HAN, GTN, and HGSL. HIN-HG improves the accuracy of node classification by 5.6 and 5.7 percentage points on average in terms of the evaluation indices Macro-F1 and Micro-F1, respectively, thus improving the accuracy and effectiveness of node classification.

  • Research Hotspots and Reviews
    Enxu WANG, Xiaohong WANG, Kun ZHANG, Dongwen ZHANG
    Computer Engineering. 2023, 49(11): 40-48, 69. https://doi.org/10.19678/j.issn.1000-3428.0066255

    In response to the challenge of capturing both temporal and feature information in current load forecasting models, we propose a load forecasting model based on a dual attention mechanism. This model integrates feature attention and temporal attention mechanisms, allowing it to adaptively extract feature and temporal information from server load data and effectively emphasize the key information within both. To comprehensively and accurately evaluate the server load status at the next moment, we employ the CRITIC objective weighting method, which assigns weights to the various server characteristics and thus facilitates precise load value calculation. The resulting dual attention network builds on Long Short-Term Memory(LSTM) networks, introduces both feature and temporal attention mechanisms, and uses historical load data as input to predict future server load values, significantly enhancing the accuracy of both single-step and multi-step load prediction. Experimental results on the Alibaba Cluster-trace-v2018 public dataset demonstrate the superiority of the dual attention network over LSTM-based load prediction networks: its Mean Absolute Error(MAE) and Mean Square Error(MSE) are reduced by 9.2% and 16.8%, respectively, underscoring the network's stability and accuracy.
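
    The CRITIC objective weighting method referred to above derives each feature's weight from its contrast (standard deviation) and its conflict with the other features (one minus the pairwise correlation). A minimal NumPy sketch with an illustrative random sample matrix is given below.

```python
import numpy as np

# CRITIC weighting sketch: information carried by feature j is
# C_j = std_j * sum_k (1 - r_jk); the weight is C_j normalized over all features.
def critic_weights(X):
    # X: (n_samples, n_features); min-max normalize each feature first
    X = (X - X.min(0)) / (X.max(0) - X.min(0) + 1e-12)
    std = X.std(0, ddof=1)
    corr = np.corrcoef(X, rowvar=False)
    info = std * (1.0 - corr).sum(0)
    return info / info.sum()

weights = critic_weights(np.random.rand(100, 5))   # illustrative 100 samples, 5 features
```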

  • Research Hotspots and Reviews
    Haoyang LI, Xiaowei HE, Bin WANG, Hao WU, Qi YOU
    Computer Engineering. 2024, 50(2): 43-50. https://doi.org/10.19678/j.issn.1000-3428.0066399

    Load prediction is an essential part of cloud computing resource management. Accurate prediction of cloud resource usage can improve cloud platform performance and prevent resource wastage. However, the dynamic and fluctuating use of cloud computing resources makes load prediction difficult and prevents managers from allocating resources reasonably. In addition, although the Informer model has achieved good results in time-series prediction, it does not impose restrictions on causal dependence in time, allowing future information to leak, and it does not consider the performance degradation caused by increasing network depth. A multi-step load prediction model based on an improved Informer, known as Informer-DCR, is proposed. The regular convolution between attention blocks in the encoder is replaced by dilated causal convolution, so that the upper layers of the deep network receive a wider range of input information, improving the prediction accuracy of the model and ensuring the causality of the time-series prediction process. Simultaneously, residual connections are added to the encoder so that the input information of the lower layers of the network is transmitted directly to the subsequent higher layers, resolving deep network degradation and improving model performance. The experimental results demonstrate that compared with mainstream prediction models such as Informer and the Temporal Convolutional Network(TCN), the Mean Absolute Error(MAE) of the Informer-DCR model is reduced by 8.4%-40.0% under different prediction steps, and Informer-DCR exhibits better convergence than Informer during training.
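
    The dilated causal convolution with a residual connection described above can be sketched as follows; this is an illustrative PyTorch block with assumed channel counts and dilation, not the paper's exact encoder.

```python
import torch
import torch.nn as nn

# Dilated causal convolution block with a residual connection: padding only on
# the left keeps the convolution causal, and the input is added back to the output.
class DilatedCausalBlock(nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=2):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                                     # x: (B, C, T)
        y = self.conv(nn.functional.pad(x, (self.pad, 0)))    # left padding -> causal
        return torch.relu(y) + x                              # residual connection

out = DilatedCausalBlock(16)(torch.randn(4, 16, 96))
```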

  • Research Hotspots and Reviews
    Baihao JIANG, Jing LIU, Dawei QIU, Liang JIANG
    Computer Engineering. 2024, 50(3): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0067502

    Deep learning algorithms have the advantages of strong learning ability, strong adaptability, and unique nonlinear mapping ability in spinal image segmentation. Compared with traditional segmentation methods, they can better extract key information from spinal images and suppress irrelevant information, helping doctors accurately locate focal areas and achieve accurate and efficient segmentation. The application status of deep learning in spinal image segmentation is summarized and analyzed in terms of deep learning algorithms, types of spinal diseases, types of images, experimental segmentation results, and performance evaluation indicators. First, the background of deep learning models and spinal image segmentation is described, and the application of deep learning in spinal image segmentation is introduced. Second, several common types of spinal diseases are introduced, the difficulties in image segmentation are described, and common open datasets, the image segmentation workflow, and image segmentation evaluation indicators are introduced. Combined with specific experiments, the application progress of the Convolutional Neural Network(CNN) model, the U-Net model, and their improved variants in the segmentation of vertebrae, intervertebral discs, and spinal tumors is summarized and analyzed. Combining previous experimental results with the current research progress of deep learning models, this paper summarizes the limitations of current clinical studies and the reasons for insufficient segmentation performance, and proposes corresponding solutions to the existing problems. Finally, prospects for future research and development are presented.

  • Development Research and Engineering Application
    Wei LIU, Lei MA, Kai LI, Rong LI
    Computer Engineering. 2024, 50(2): 337-344. https://doi.org/10.19678/j.issn.1000-3428.0067285

    Chinese Medical Named Entity Recognition(CMNER) focuses on extracting entities from unstructured Chinese medical texts. Current character-based CMNER models inadequately address the distinct features of Chinese characters from various angles, thereby limiting their efficacy in CMNER applications. To address this, a model leveraging multigranular glyph information enhancement for Chinese medical named entity recognition is introduced. This model integrates the glyph spatial structure and radical representation of Chinese characters, aligning them with domain-specific lexicon-based word information. This approach enriches the semantic and boundary potential of characters. Through a gating mechanism, the model effectively combines domain-specific terms with the multifaceted glyph features of Chinese characters, ensuring comprehensive consideration of both domain relevance and intrinsic character details, thereby enhancing its capacity for medical entity recognition. The model employs multigranular glyph-enhanced character representations in the Bidirectional Long Short-Term Memory(BiLSTM) and Conditional Random Field(CRF) layers for contextual encoding and label decoding, respectively. Experimental results demonstrate that the proposed model surpasses the best baseline model, achieving an increase in F1 scores of 1.04% and 0.62% on the IMCS21 and CMeEE datasets, respectively. Ablation studies further confirm the efficacy of each component, highlighting the model's superiority in recognizing Chinese medical named entities.
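
    The gating mechanism described above can be sketched as a learned sigmoid gate that mixes the glyph-enhanced character representation with the lexicon word representation. The PyTorch fragment below is a hypothetical illustration; the dimensions and names are assumptions.

```python
import torch
import torch.nn as nn

# Gated fusion sketch: a sigmoid gate decides, per dimension, how much of the
# glyph-enhanced character feature versus the lexicon word feature is kept.
class GatedFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, char_feat, word_feat):
        g = torch.sigmoid(self.gate(torch.cat([char_feat, word_feat], dim=-1)))
        return g * char_feat + (1 - g) * word_feat

fused = GatedFusion(128)(torch.randn(8, 20, 128), torch.randn(8, 20, 128))
```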

  • Research Hotspots and Reviews
    Yuyao GAO, Mingquan SHI, Yu QIN, Jianping CHEN, Xi ZHOU, Peng ZHANG
    Computer Engineering. 2023, 49(9): 43-51. https://doi.org/10.19678/j.issn.1000-3428.0066234

    Station ridership data are among the most important basic data in the network planning of routine bus systems. The type, number, and distance of the Points of Interest(POI) around a station lead to different ridership trends. However, this important feature is not reflected in the structure of the traditional fully connected neural networks commonly used for ridership prediction, because the influences of individual POIs on ridership are mutually independent, which tends to make prediction results unsatisfactory. This study improves the basic structure of the fully connected neural network by considering the specific relationship between POIs and ridership and constructs a dedicated, non-fully connected neural network. The simulation and prediction of ridership at each time period of a station are achieved using historical ridership data from all bus stations together with weights for the various POI types. The model creates a connection matrix to realize the non-fully connected network and constructs a composite error transfer function that associates meaning with some of the hidden layers, enhancing the interpretability of the neural network based on the nature of ridership. The proposed network addresses some of the problems of traditional neural networks, such as slow convergence, poor fitting, and entrapment in local optima. Experiments demonstrate that the proposed model converges to the global optimal solution more rapidly and that, when predicting hourly ridership at a granularity of 50 passengers, the probability of an accurate prediction exceeds 88%. The model performs well compared with other common prediction models and can accurately simulate the daily ridership trend.
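
    A non-fully connected layer of this kind is typically realized by multiplying the weight matrix with a fixed 0/1 connection matrix so that only the intended POI-to-unit links remain. The PyTorch sketch below illustrates this idea with an assumed mask; it is not the paper's network.

```python
import torch
import torch.nn as nn

# Non-fully connected layer via a connection matrix: a fixed 0/1 mask zeroes the
# weights of absent connections so each hidden unit only sees selected inputs.
class MaskedLinear(nn.Module):
    def __init__(self, mask):                     # mask: (out_features, in_features)
        super().__init__()
        self.register_buffer("mask", mask)
        self.weight = nn.Parameter(torch.randn(mask.shape) * 0.01)
        self.bias = nn.Parameter(torch.zeros(mask.shape[0]))

    def forward(self, x):
        return x @ (self.weight * self.mask).t() + self.bias

mask = torch.tensor([[1., 1., 0., 0.], [0., 0., 1., 1.]])   # illustrative connections
y = MaskedLinear(mask)(torch.randn(5, 4))
```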

  • Research Hotspots and Reviews
    Yimeng QIAO, Yinan JING, Hanbing ZHANG
    Computer Engineering. 2024, 50(1): 30-38. https://doi.org/10.19678/j.issn.1000-3428.0066743

    Owing to the significant latency of exact queries on large-scale datasets, Approximate Query Processing(AQP) techniques are typically applied in online analytical processing to return query results within interactive timescales with minimal error. Existing learning-based AQP methods decouple models from the underlying data and convert I/O-intensive calculations into CPU-intensive calculations. However, because of limited computing resources, model training is typically performed on random data samples. Such training data omit rare populations, resulting in unsatisfactory prediction accuracy. Hence, this paper proposes a Stratified Sampling-based Sum-Product Network(SSSPN) model and designs an AQP framework based on it. Stratified samples effectively avoid the elimination of rare populations and significantly improve model accuracy. Additionally, for dynamic data updates, this paper proposes an adaptive model-update strategy that allows the model to detect data shifts in a timely manner and update itself adaptively. Experimental results show that compared with AQP methods based on sampling and machine learning, the average relative errors of this model on real and synthetic datasets are approximately 18.3% and 2.2% lower, respectively; in scenarios where data are dynamically updated, both the accuracy and query latency of the model are favorable.
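
    Stratified sampling itself is straightforward to sketch: drawing a fixed fraction within each group keeps rare populations represented in the training data. The pandas example below uses illustrative column names and a 1% sampling rate as assumptions.

```python
import pandas as pd

# Stratified sampling sketch: sampling a fixed fraction inside each group keeps
# rare populations (here, region "C") represented in the sample.
df = pd.DataFrame({"region": ["A"] * 9000 + ["B"] * 900 + ["C"] * 100,
                   "sales": range(10000)})
stratified = df.groupby("region").sample(frac=0.01, random_state=0)
print(stratified["region"].value_counts())   # every region keeps ~1% of its rows
```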

  • Graphics and Image Processing
    Fangxin XU, Rong FAN, Xiaolu MA
    Computer Engineering. 2024, 50(3): 250-258. https://doi.org/10.19678/j.issn.1000-3428.0067741

    To address the problem that detection algorithms are prone to missed and false detections in crowded pedestrian scenes, this study proposes an improved YOLOv7 crowded pedestrian detection algorithm. A BiFormer vision Transformer and an improved Efficient Layer Aggregation Network module based on RepConv and the Channel Space Attention Module(CSAM), named RC-ELAN, are introduced into the backbone network; the self-attention mechanism and the attention module enable the backbone network to focus more on the important features of occluded pedestrians, effectively mitigating the adverse effects of missing target features on detection. An improved neck network based on the idea of the Bidirectional Feature Pyramid Network(BiFPN) is used, and transposed convolution and an improved Rep-ELAN-W module enable the model to efficiently utilize small-target feature information in the middle- and low-dimensional feature maps, effectively improving the small-target pedestrian detection performance of the model. The introduction of an Efficient Complete Intersection-over-Union(E-CIoU) loss function allows the model to converge further to a higher accuracy. Experimental results on the WiderPerson dataset, which contains a large number of small and occluded pedestrian targets, demonstrate that the average accuracies of the improved YOLOv7 algorithm at IoU thresholds of 0.5 and 0.5-0.95 are improved by 2.5 and 2.8, 9.9 and 7.1, and 12.3 and 10.7 percentage points compared with the YOLOv7, YOLOv5, and YOLOX algorithms, respectively, showing that the algorithm is better suited to crowded pedestrian detection scenarios.

  • Research Hotspots and Reviews
    Guanrong WU, Yuanxiang LI, Yilin WANG, Yuhan LU, Xiuhua CHEN
    Computer Engineering. 2024, 50(3): 36-43. https://doi.org/10.19678/j.issn.1000-3428.0067599

    Existing few-shot classification methods only induce intra-class commonalities from the support set in each round, ignoring inter-class correlations and the category information carried by the samples themselves during iteration. Because the texture of metal damage is fine and varied, the resulting feature distribution has small inter-class distances and large intra-class distances, and this poor aggregation of the feature distribution degrades few-shot classification performance and generalization to new classes. Therefore, a few-shot metal surface damage classification method based on an inner-outer two-layer training architecture is proposed. The inner model uses metric-based methods to complete the metal classification task, while the outer model incorporates bimodal features as signals in the feature space. In the new mapping space, category label information is used to supervise the comparison of image features from different categories and to optimize the feature distribution, improving inter-class discrimination and intra-class aggregation. During the training phase, the outer model enhances the representation ability of the original space by backpropagating a contrastive loss, thereby raising the measurement quality of the inner model and improving classification accuracy. Additionally, using category embeddings as dynamic category centers effectively reduces noise interference in few-shot problems and enhances the generalization performance of the model. Experimental results on three commonly used metal damage datasets, GC10, NEU, and APSD, demonstrate that the proposed method achieves higher classification accuracy than mainstream methods such as ProtoNet, MatchingNet, and RelationNet, and the generalization ability to new categories is significantly improved. Under the 5-way 5-shot setting, the classification accuracy is improved by at least 5.24, 1.39, and 6.37 percentage points, with classification error reduction rates of 36.00%, 17.94%, and 66.15%, respectively; in particular, the accuracy of new-class classification increases from 36.53%, 82.43%, and 31.89% to 69.12%, 91.57%, and 48.23%, respectively. Under the 5-way 1-shot setting, the classification accuracy is improved by at least 8.34, 3.01, and 4.61 percentage points, with classification error reduction rates of 28.32%, 23.37%, and 46.57%, respectively.
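
    The metric-based inner model can be pictured with a prototypical-network-style sketch: class prototypes are the means of the support embeddings, and queries are scored by their distance to each prototype. The PyTorch fragment below is an illustration under assumed embedding dimensions, not the paper's exact model.

```python
import torch

# Metric-based few-shot classification sketch: class prototypes are support-set
# means, and a query is assigned to the class whose prototype is closest.
def prototype_logits(support, support_labels, query, n_way=5):
    # support: (n_way * k_shot, d); query: (n_query, d)
    protos = torch.stack([support[support_labels == c].mean(0) for c in range(n_way)])
    return -torch.cdist(query, protos)        # higher logit = closer prototype

support = torch.randn(25, 64)                 # illustrative 5-way 5-shot episode
labels = torch.arange(5).repeat_interleave(5)
logits = prototype_logits(support, labels, torch.randn(10, 64))
```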

  • Artificial Intelligence and Pattern Recognition
    Zhangjie RAN, Linfu SUN, Yisheng ZOU, Yulin MA
    Computer Engineering. 2023, 49(9): 52-59. https://doi.org/10.19678/j.issn.1000-3428.0065745

    A Knowledge Graph(KG) is composed of a large number of fact triples, which often involve few-shot relations that rarely appear in the real world. For these few-shot relations, completing the missing triples in the KG is challenging, and existing few-shot Knowledge Graph Completion(KGC) models cannot effectively extract representations of few-shot relations. To address this problem, a few-shot KGC model based on a relation learning network is proposed. Considering the relevance of the relations, neighbor aggregation encoding is performed on the reference and query triples to obtain enhanced entity embedding representations. A structure that integrates a Transformer encoder and a Long Short-Term Memory(LSTM) neural network allows the relation representations of triples to be encoded and output. The semantic similarity between the query and the dynamic reference relations is obtained using an attention mechanism and, combined with the hypothesis of the translation model, the plausibility of the query triples is comprehensively scored. The experimental results show that the model can effectively extract the fine-grained semantics of few-shot relations by integrating path finding and context semantics. Compared with the optimal values of the evaluation metrics in baseline models, the average improvement on few-shot link prediction tasks reaches 9.5 percentage points with the proposed model.
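
    The translation-model hypothesis used in the final scoring can be written compactly: a candidate triple (h, r, t) is considered more plausible when h + r lies close to t. The sketch below is an illustrative PyTorch fragment with assumed embedding sizes.

```python
import torch

# Translation-hypothesis scoring sketch: score(h, r, t) = -||h + r - t||,
# so triples whose head plus relation lands near the tail score higher.
def triple_score(h, r, t):
    return -torch.norm(h + r - t, p=2, dim=-1)

h = torch.randn(4, 100)        # head entity embeddings
r = torch.randn(4, 100)        # few-shot relation representations
t = torch.randn(4, 100)        # candidate tail entity embeddings
scores = triple_score(h, r, t)
```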

  • Research Hotspots and Reviews
    Douwei LEI, Debiao HE, Min LUO, Cong PENG
    Computer Engineering. 2024, 50(2): 15-24. https://doi.org/10.19678/j.issn.1000-3428.0067167

    The rapid development of quantum computing seriously threatens the security of widely used public-key cryptography. Lattice-based cryptography occupies an essential position in Post-Quantum Cryptography(PQC) owing to its excellent quantum resistance and computational efficiency. In 2022, the National Institute of Standards and Technology(NIST) announced four PQC standards, three of which, including Kyber, are lattice-based algorithms. With post-quantum standards identified, the importance of and need for their efficient implementation are increasing. This study presents an optimized, high-speed parallel implementation of the Kyber algorithm based on the Advanced Vector eXtensions 512(AVX512) instruction set. It uses techniques such as lazy reduction, optimized Montgomery modular reduction, and an optimized Number-Theoretic Transform(NTT) to reduce unnecessary modular reduction operations and improve the efficiency and parallelism of polynomial computations by fully utilizing computer storage space. It also employs a redundant-bit technique to improve the parallel processing of bits during polynomial sampling. The 512 bit width of AVX512 is used to perform 8-way parallel hash operations, and the resulting pseudo-random bit strings are properly scheduled to fully exploit parallel performance. Finally, this study implements the polynomial computations and sampling of Kyber in high-speed parallel form using the AVX512 instruction set and further implements the entire Kyber public-key encryption scheme. Performance tests indicate that the key generation and encryption algorithms achieve 10 to 16 times acceleration compared with the C reference implementation provided in the standard documentation, while the decryption algorithm achieves approximately 56 times acceleration.
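
    The Montgomery modular reduction mentioned above replaces division by the Kyber modulus q = 3329 with shifts and multiplications modulo 2^16. The plain-Python sketch below illustrates the principle only; the paper's AVX512 version vectorizes this over many SIMD lanes.

```python
Q, QINV, R = 3329, 62209, 1 << 16        # Kyber modulus; QINV = Q^-1 mod 2^16

def montgomery_reduce(a):
    """Return a * R^-1 mod Q for 0 <= a < Q*R, without dividing by Q."""
    m = (a * QINV) & (R - 1)             # low 16 bits of a * Q^-1
    t = (a - m * Q) >> 16                # exact shift: a - m*Q is divisible by 2^16
    return t % Q                         # bring the result into [0, Q)

a = 123456789                            # any value below Q * 2^16
r = montgomery_reduce(a)
assert (r * R - a) % Q == 0              # r is congruent to a * R^-1 (mod Q)
```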

  • Research Hotspots and Reviews
    Jiaxin WU, Yifei SUN, Yalan WU, Jigang WU
    Computer Engineering. 2024, 50(2): 59-67. https://doi.org/10.19678/j.issn.1000-3428.0066761

    Unmanned Aerial Vehicles(UAVs) are widely used to collect large-scale discrete node data because of their flexible maneuverability and high data transmission rates, and their limited onboard energy makes UAV energy consumption optimization a research hotspot. However, when eavesdropping nodes are present in the environment, optimizing the energy consumption of the UAV while ensuring secure data transmission from multiple discrete data nodes poses a significant challenge. Therefore, relay nodes and the secure capacity are introduced to ensure secure data transmission at the physical layer, and a low-energy UAV trajectory optimization algorithm for secure transmission is proposed. The channel model between the UAV and ground nodes, the secure capacity between the UAV and data nodes, and the energy consumption of UAV flight and communication are established. The problem of minimizing UAV energy consumption under the main constraint of secure data transmission between the data nodes and the UAV is formulated and shown to be Non-deterministic Polynomial(NP)-hard. The problem is decomposed into subproblems, and a self-organizing map method and a customized particle swarm optimization algorithm are used to solve for the optimal order in which the UAV visits the data nodes and the optimal hovering positions around them. Based on previous studies, three benchmark schemes are proposed for performance comparison. The simulation results show that, when the maximum output power of the energy harvesting circuit of the relay node varies, the proposed optimization algorithm reduces the total energy consumption of the UAV by 7.25%, 8.59%, and 11.57% on average compared with the BASE_D, BASE_M, and BASE_R benchmark schemes, respectively. In addition, the performance of the proposed algorithm is superior to existing solutions in terms of the secure capacity achievement rate; for example, when the secure capacity threshold increases from 0.001 to 0.500, the proposed algorithm outperforms the benchmark scheme BASE_M by 23.45%.
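
    The secure capacity constraint referenced above (often called the secrecy capacity) is commonly written as the positive part of the difference between the legitimate link's capacity and the eavesdropper link's capacity. A minimal Python sketch, with purely illustrative SNR values, is shown below.

```python
import math

# Secure (secrecy) capacity sketch: the positive part of the legitimate link
# capacity minus the eavesdropper link capacity, in bit/(s*Hz).
def secure_capacity(snr_legitimate, snr_eavesdropper):
    c_legit = math.log2(1 + snr_legitimate)
    c_eave = math.log2(1 + snr_eavesdropper)
    return max(c_legit - c_eave, 0.0)

print(secure_capacity(snr_legitimate=15.0, snr_eavesdropper=2.0))   # illustrative SNRs
```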

  • Development Research and Engineering Application
    Ying HOU, Lin YANG, Xin HU, Shun HE, Wanying SONG, Qian ZHAO
    Computer Engineering. 2024, 50(3): 277-289. https://doi.org/10.19678/j.issn.1000-3428.0067416

    Escalators are widely used in public places, and if passenger fall accidents cannot be detected and handled in a timely manner, they may cause serious personal injury; therefore, intelligent monitoring and management of escalators is imperative. Owing to the complex operating environment, large numbers of pedestrians, and local occlusion on escalators, traditional fall detection models based on human posture features perform poorly and detect slowly. A pedestrian fall detection algorithm for escalators based on the SwinT-YOLOX network model is proposed, combining the strengths of the Swin Transformer and the YOLOX object detection algorithm. The Swin Transformer model is adopted as the backbone network, and the neck network uses the YOLOX model with an added attention mechanism to further enhance the diversity and expressiveness of feature maps. In addition, the Funnel Rectified Linear Unit(FReLU) visual activation function is used to construct a CBF module, improving the structure of the neck and head networks and thereby achieving better feature detection performance. The experimental results demonstrate that, compared with algorithms such as AlphaPose, OpenPose, and YOLOv5, the detection performance of the proposed algorithm is significantly improved on a self-built escalator pedestrian fall database and on actual escalator fall incidents collected from the Internet. The average detection accuracy for pedestrian falls reaches 95.92% at a detection frame rate of 24.08 frames/s, so the occurrence of passenger fall accidents can be detected quickly and accurately, and the monitoring and management platform can immediately take emergency stop measures to ensure passenger safety.
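
    The FReLU activation used in the CBF module follows the published FReLU form, max(x, T(x)), where T is a depthwise spatial convolution; the PyTorch sketch below illustrates that form with an assumed channel count and kernel size, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# FReLU sketch: the "funnel condition" T(x) is a per-channel (depthwise) spatial
# convolution followed by batch norm; the activation is the element-wise max.
class FReLU(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, kernel_size,
                                 padding=kernel_size // 2, groups=channels)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        return torch.max(x, self.bn(self.spatial(x)))

y = FReLU(16)(torch.randn(2, 16, 32, 32))
```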

  • Research Hotspots and Reviews
    Ying LIU, Yupeng MA, Fan ZHAO, Yi WANG, Tonghai JIANG
    Computer Engineering. 2024, 50(1): 39-49. https://doi.org/10.19678/j.issn.1000-3428.0067004

    Hyperledger Fabric is an alliance chain framework widely adopted both domestically and internationally. Businesses built on Fabric often involve numerous participating organizations and frequent transaction operations, which increases transaction conflicts. The multi-version concurrency control technology used in Fabric can partially resolve transaction conflicts and enhance system concurrency; however, this mechanism is imperfect, and certain transaction data cannot be properly stored on the chain. To achieve complete, efficient, and trustworthy on-chain storage of massive transaction data, a data preprocessing mechanism based on a Fabric oracle is proposed. The Massive Conflict Preprocessing(MCPP) method is designed to ensure the integrity of transaction data with primary-key conflicts through detection, monitoring, delayed submission, transaction locking, and reordering caching. Data transmission protection measures are introduced, using asymmetric encryption during transmission to prevent malicious nodes from forging authentication information and to ensure consistency before and after off-chain processing of transaction data. Theoretical analysis and experimental results demonstrate that this mechanism can effectively address concurrent conflicts when massive transaction data are put on the chain in alliance chain platforms. When the transaction data scale reaches 1 000 and 10 000, the MCPP method improves time efficiency by 38% and 21.4%, respectively, compared with the LMLS algorithm, with a success rate close to 100%. Thus, the proposed method is efficient and secure and does not affect Fabric system performance when concurrent conflicts do not occur.

  • Development Research and Engineering Application
    Xingya YAN, Yaxi KUANG, Guangrui BAI, Yue LI
    Computer Engineering. 2023, 49(7): 251-258. https://doi.org/10.19678/j.issn.1000-3428.0065369

    Student classroom behaviors directly reflect the quality of a class, and analyzing and evaluating classroom behaviors through artificial intelligence and big data can help improve the quality of teaching. Traditional student classroom behavior recognition methods rely on teachers directly observing students or analyzing student surveillance videos after class; these methods are time-consuming and labor-intensive, have low recognition rates, and make it difficult to follow problems in the classroom and during exams in real time. This study proposes BetaPose, a deep learning-based posture recognition method. Data augmentation is used to improve the robustness of the subsequent detection model, and an improved YOLOv5 target detection algorithm is used to obtain the human detection boxes. A lightweight posture recognition model based on MobileNetV3 is designed to improve the accuracy of posture recognition in crowded scenes. The obtained human keypoints are input into a linear classifier with improved modeling and expression ability to determine the final behavior results. The experimental results show that the proposed lightweight posture recognition model BetaPose achieves the highest average recognition accuracy of 82.6% for various parts of the human body, and the recognition rates for various behaviors in simple and crowded scenes are above 91% and 85%, respectively. Therefore, the proposed model can effectively recognize multiple behaviors in the classroom.