
Highlights

  • Artificial Intelligence and Pattern Recognition
    Hongchen ZHANG, Linyu LI, Li YANG, Chenjun SAN, Chunlin YIN, Bing YAN, Hong YU, Xuan ZHANG
    Computer Engineering. 2024, 50(4): 168-176. https://doi.org/10.19678/j.issn.1000-3428.0067543

    A knowledge graph is a structured knowledge base comprising various types of knowledge or data units obtained through extraction and other processes. It is used to describe and represent information such as entities, concepts, facts, and relationships. The limitations of Natural Language Processing(NLP) technology and the presence of noise in the texts of the various knowledge or information units affect the accuracy of information extraction. Existing Knowledge Graph Completion(KGC) methods typically account for only structural information or only textual semantic information, disregarding the combination of structural and textual semantic information across the entire knowledge graph. Hence, a KGC model based on contrastive learning and language model-enhanced embedding is proposed. The input entities and relationships are encoded with a pretrained language model to obtain their textual semantic information, and the distance scoring function of a translation model captures the structural information of the knowledge graph. Two negative sampling methods are incorporated into contrastive learning during training to improve the model's ability to represent positive and negative samples. Experimental results show that compared with the Bidirectional Encoder Representations from Transformers for Knowledge Graph completion(KG-BERT) model, the proposed model improves the average proportion of triples ranked within the top 10(Hits@10) by 31% and 23% on the WN18RR and FB15K-237 datasets, respectively, demonstrating its superiority over other similar models.
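
    As a hedged illustration of the pipeline described above, the following sketch encodes entity and relation text with a pretrained language model, scores triples with a TransE-style distance, and trains with an InfoNCE-style contrastive loss; the model name, pooling choice, and loss form are assumptions rather than the paper's exact design.

    ```python
    # Hypothetical sketch: score a triple by encoding its text with a pretrained
    # language model and applying a TransE-style distance.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # assumed PLM
    plm = AutoModel.from_pretrained("bert-base-uncased")

    def encode(text: str) -> torch.Tensor:
        """Mean-pool the last hidden states into a single embedding."""
        batch = tok(text, return_tensors="pt", truncation=True)
        out = plm(**batch).last_hidden_state          # (1, seq_len, hidden)
        return out.mean(dim=1).squeeze(0)             # (hidden,)

    def triple_score(head: str, relation: str, tail: str) -> torch.Tensor:
        """TransE-style distance: smaller means more plausible."""
        h, r, t = encode(head), encode(relation), encode(tail)
        return torch.norm(h + r - t, p=2)

    def contrastive_loss(head, rel, pos_tail, neg_tails, temperature=0.05):
        """InfoNCE-style loss over one positive tail and k negative tails."""
        logits = torch.stack([-triple_score(head, rel, t)
                              for t in [pos_tail, *neg_tails]]) / temperature
        return torch.nn.functional.cross_entropy(
            logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))
    ```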

  • Artificial Intelligence and Pattern Recognition
    Jida ZHAO, Guoyong ZHEN, Chengqun CHU
    Computer Engineering. 2024, 50(4): 113-120. https://doi.org/10.19678/j.issn.1000-3428.0068268

    In Unmanned Aerial Vehicle(UAV) target detection tasks, missed and false detections are caused by the small size of detection targets and the complex backgrounds of detection images. To address the problem of small target detection, a UAV image target detection algorithm based on an improved YOLOv8s is proposed. First, because drone-captured targets are generally small, the number of Backbone layers of the algorithm is reduced and the size of the feature map to be detected is increased, such that the network model can focus more on small targets. Second, because a certain number of low-quality examples in the dataset commonly degrade training, the Wise-IoU loss function is introduced to enhance the training effect. Third, a context enhancement module is introduced to obtain the feature information of small targets in different receptive fields, improving the localization and classification of small targets in complex environments. Finally, a spatial-channel filtering module is designed to enhance the feature information of the target during convolution and filter out useless interference information, addressing the problem of some small target feature information being submerged and lost during convolution. Experimental results on the VisDrone2019 dataset demonstrate that the average detection accuracy(mAP@0.5) of the proposed algorithm reaches 45.4%, which is 7.3 percentage points higher than that of the original YOLOv8s algorithm, while the number of parameters is reduced by 26.13%. Under similar experimental conditions, compared with other common small target detection algorithms, the detection accuracy and speed are improved to a certain extent.
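
    The following is a minimal sketch in the spirit of the Wise-IoU v1 loss mentioned above: the plain IoU loss is scaled by a center-distance factor computed over the smallest enclosing box, with the scaling denominator detached so that low-quality examples do not dominate the gradient; the paper's exact formulation may differ, and the (x1, y1, x2, y2) tensor layout is an assumption.

    ```python
    # Sketch of a Wise-IoU-v1-style loss: IoU loss scaled by a detached
    # center-distance focusing factor. Boxes: (N, 4) as (x1, y1, x2, y2).
    import torch

    def wiou_v1_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Intersection and union
        lt = torch.max(pred[:, :2], target[:, :2])
        rb = torch.min(pred[:, 2:], target[:, 2:])
        wh = (rb - lt).clamp(min=0)
        inter = wh[:, 0] * wh[:, 1]
        area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
        area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
        iou = inter / (area_p + area_t - inter + 1e-7)

        # Smallest enclosing box and squared center distance
        elt = torch.min(pred[:, :2], target[:, :2])
        erb = torch.max(pred[:, 2:], target[:, 2:])
        cw, ch = (erb - elt)[:, 0], (erb - elt)[:, 1]
        cp = (pred[:, :2] + pred[:, 2:]) / 2
        ct = (target[:, :2] + target[:, 2:]) / 2
        dist2 = ((cp - ct) ** 2).sum(dim=1)

        # Focusing factor; detach() keeps the denominator out of the gradient.
        r_wiou = torch.exp(dist2 / (cw ** 2 + ch ** 2 + 1e-7).detach())
        return (r_wiou * (1 - iou)).mean()
    ```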

  • Development Research and Engineering Application
    Shuai HU, Hualing LI, Dechen HAO
    Computer Engineering. 2024, 50(4): 286-293. https://doi.org/10.19678/j.issn.1000-3428.0067779

    Medical image segmentation accuracy plays a key role in clinical diagnosis and treatment. However, because of the complexity of medical images and the diversity of target regions, existing medical image segmentation methods suffer from incomplete edge region segmentation and insufficient use of image context feature information. An improved U-Net with Multistage Edge Enhancement(MEE), known as the MDU-Net model, is proposed to solve these problems. First, an MEE module is added to the encoder structure to extract double-layer low-stage feature information, and the rich edge information in the feature layer is obtained by dilated convolution blocks with different dilation rates. Second, a Detailed Feature Association(DFA) module that integrates the feature information of adjacent layers is embedded in the skip connection to obtain deep-stage, multiscale context feature information. Finally, the feature information extracted by the different modules is aggregated in the corresponding feature layer of the decoder structure, and the final segmentation result is obtained by an upsampling operation. Experimental results on two public datasets show that compared with other models, such as Transformers make strong encoders for medical image segmentation(TransUNet), the MDU-Net model can efficiently use the feature information of different feature layers in medical images and achieves improved segmentation in edge regions.

  • Graphics and Image Processing
    Liqun CUI, Huawei CAO
    Computer Engineering. 2024, 50(4): 228-236. https://doi.org/10.19678/j.issn.1000-3428.0067790

    Although target detection technology has advanced, many challenges remain in the detection of remote-sensing images. An improved YOLOv5-based remote-sensing image target detection algorithm is proposed to address the low target detection accuracy caused by complex backgrounds, large target scale differences, and arbitrary target orientations in remote-sensing images. First, a joint multiscale feature enhancement network with attention is constructed to fully fuse high-level and low-level features such that the feature layers contain both semantic and rich detailed information. During fusion, the designed feature focusing module helps the model select key features and suppress irrelevant information. Second, a Receptive Field Block(RFB) is used to update the fused feature map and expand its receptive field to reduce feature information loss. Finally, by adding rotation angles to the targets and using circular smooth labels to transform the angle regression problem into a classification problem, the accuracy of remote-sensing target localization is improved. Experimental results on the large-scale Dataset for Object deTection in Aerial images(DOTA) show that compared with the YOLOv5 algorithm, the mean Average Precision(mAP) of the proposed algorithm at Intersection over Union(IoU) thresholds of 0.5 and 0.5-0.95(mAP@0.5 and mAP@0.5∶0.95) increases by 7.3 and 3.3 percentage points, respectively. The proposed algorithm significantly improves the detection accuracy of remote-sensing image targets in complex backgrounds and reduces missed and false detections of remote-sensing targets.
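
    A sketch of the circular smooth label idea used above for angle prediction: the angle is discretized into bins and softened with a Gaussian window that wraps around, so nearly-correct angles are penalized less than distant ones. The bin count and window width below are illustrative assumptions.

    ```python
    # Circular Smooth Label (CSL) encoding: angle regression becomes
    # classification over angle bins with a circular Gaussian window.
    import numpy as np

    def csl_encode(angle_deg: float, num_bins: int = 180, sigma: float = 6.0):
        """Return a soft label vector whose peak sits at the angle's bin."""
        bins = np.arange(num_bins)
        center = int(round(angle_deg)) % num_bins
        # Circular distance between each bin and the target bin
        d = np.minimum(np.abs(bins - center), num_bins - np.abs(bins - center))
        return np.exp(-(d ** 2) / (2 * sigma ** 2))   # window wraps around
    ```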

  • Intelligent Transportation
    Wei CHEN, Xiaolong WANG, Yanwei ZHANG, Guocheng AN, Bo JIANG
    Computer Engineering. 2024, 50(4): 11-19. https://doi.org/10.19678/j.issn.1000-3428.0068901

    In highway service areas, complex environmental conditions such as lighting and weather changes can cause a sharp decline in vehicle detection accuracy. In addition, factors such as camera inclination angle and installation height can increase false-negative and false-positive rates. To this end, a vehicle violation detection algorithm based on an improved YOLOv8 is proposed for highway service areas. First, building on the feature pyramid pooling layer of the YOLOv8 network, a Dilated Space Pyramid Pooling(DSPP) module and a DSPP based on branch Attention(DSPPA) module are constructed to reduce the loss of semantic information in the backbone. The Branch Attention(BA) mechanism in DSPPA assigns different weights to branches with varying degrees of contribution, making the model focus more on features suited to the target size. Second, a parking space allocation strategy based on global matching is designed to effectively reduce the false-negative and false-positive rates of illegal parking detection in situations involving tilted views and overlapping vehicles. Experimental results show that the improved algorithm reduces the false-negative rate of parking violation detection from 15% to 8% and the false-positive rate from 7.5% to 6.1%, demonstrating a considerable performance improvement in vehicle violation detection.

  • Research Hotspots and Reviews
    Zhe LIAN, Yanjun YIN, Fei YUN, Min ZHI
    Computer Engineering. 2024, 50(3): 16-27. https://doi.org/10.19678/j.issn.1000-3428.0067427

    Natural scene text detection technology based on deep learning has become a crucial research focus in the fields of computer vision and natural language processing. It not only possesses a wide range of potential applications but also serves as a platform for researchers to explore neural network models and algorithms. First, this study introduces the relevant concepts, research background, and current developments of natural scene text detection technology. Subsequently, recent deep learning-based text detection methods are analyzed and categorized into four classes: detection-box-based, segmentation-based, combined detection-box and segmentation-based, and other methods. The fundamental concepts and main algorithmic processes of classical and mainstream methods within these four categories are elaborated, and the usage mechanisms, applicable scenarios, advantages, disadvantages, simulation experimental results, and environment settings of the different methods are summarized, while their interrelationships are clarified. Thereafter, common public datasets and performance evaluation methods for natural scene text detection are introduced. Finally, the major challenges facing current deep learning-based natural scene text detection technology are outlined, and future development directions are discussed.

  • Cyberspace Security
    Huawei SONG, Shengqi LI, Fangjie WAN, Yuping WEI
    Computer Engineering. 2024, 50(3): 166-172. https://doi.org/10.19678/j.issn.1000-3428.0067791

    Federated Learning(FL) enables collaborative training of global models without compromising data privacy. Nonetheless, this collaborative training approach faces the challenge of Non-Independent and Identically Distributed(Non-IID) data in the real world, which leads to slow model convergence and low accuracy. Most existing FL methods improve from only one perspective, either global model aggregation or local client update, and inevitably neglect the impact of the other perspective, reducing the quality of the global model. In this context, we introduce FedMas, a hierarchical continual learning optimization method for FL based on the idea of hierarchical fusion. First, clients with similar data distributions are grouped into layers using the DBSCAN algorithm, and only some clients of a given layer are selected for each round of training to avoid the weight divergence caused by different data distributions when the server aggregates the global model. Second, because each layer has a different data distribution, the client applies a continual learning solution to catastrophic forgetting during local updates, effectively integrating the differences between the data of clients in different layers and thus ensuring the performance of the global model. Experiments on the MNIST and CIFAR-10 standard datasets demonstrate that the global model test accuracy is improved by 0.3-2.2 percentage points on average compared with the FedProx, Scaffold, and FedCurv FL algorithms.
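
    A hypothetical sketch of the layering step: if each client is summarized by its label-distribution histogram, DBSCAN groups clients with similar distributions into layers. The feature choice and the eps/min_samples values are illustrative, not the paper's settings.

    ```python
    # Group clients into layers by data-distribution similarity with DBSCAN.
    import numpy as np
    from sklearn.cluster import DBSCAN

    def layer_clients(label_histograms: np.ndarray, eps=0.15, min_samples=2):
        """label_histograms: (n_clients, n_classes), each row sums to 1."""
        labels = DBSCAN(eps=eps, min_samples=min_samples,
                        metric="euclidean").fit_predict(label_histograms)
        layers = {}
        for client_id, layer in enumerate(labels):
            layers.setdefault(int(layer), []).append(client_id)
        return layers   # layer -1 holds outlier clients
    ```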

  • Graphics and Image Processing
    Fangxin XU, Rong FAN, Xiaolu MA
    Computer Engineering. 2024, 50(3): 250-258. https://doi.org/10.19678/j.issn.1000-3428.0067741

    To address the proneness of detection algorithms to missed and false detections in crowded pedestrian scenarios, this study proposes an improved YOLOv7 crowded pedestrian detection algorithm. A BiFormer visual transformer and an improved Efficient Layer Aggregation Network based on RepConv and a Channel Space Attention Module(CSAM), known as RC-ELAN, are introduced into the backbone network; the self-attention mechanism and the attention module enable the backbone to focus more on the important features of occluded pedestrians, effectively mitigating the adverse effects of missing target features on detection. An improved neck network based on the idea of a Bidirectional Feature Pyramid Network(BiFPN) is used, and the transposed convolution and improved Rep-ELAN-W module enable the model to efficiently utilize small-target feature information in the middle- and low-dimensional feature maps, effectively improving the model's small-target pedestrian detection performance. The introduction of an Efficient Complete Intersection-over-Union(E-CIoU) loss function allows the model to further converge to a higher accuracy. Experimental results on the WiderPerson dataset, which contains a large number of small and occluded pedestrians, demonstrate that the average accuracies of the improved YOLOv7 algorithm at IoU thresholds of 0.5 and 0.5-0.95 are improved by 2.5 and 2.8, 9.9 and 7.1, and 12.3 and 10.7 percentage points compared with those of the YOLOv7, YOLOv5, and YOLOX algorithms, respectively; thus, the algorithm can be better applied to crowded pedestrian detection scenarios.
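
    For reference, a sketch of the fast normalized fusion used in BiFPN-style necks (EfficientDet): each input feature map receives a learnable non-negative weight normalized to sum to one. This illustrates the fusion idea, not the paper's exact module.

    ```python
    # BiFPN-style fast normalized fusion of same-shaped feature maps.
    import torch
    import torch.nn as nn

    class FastNormalizedFusion(nn.Module):
        def __init__(self, num_inputs: int, eps: float = 1e-4):
            super().__init__()
            self.weights = nn.Parameter(torch.ones(num_inputs))
            self.eps = eps

        def forward(self, inputs):
            w = torch.relu(self.weights)          # keep weights non-negative
            w = w / (w.sum() + self.eps)          # normalize to sum ~1
            return sum(wi * x for wi, x in zip(w, inputs))
    ```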

  • Graphics and Image Processing
    Yanqiong SHI, Zhao ZHA, Wenliang ZHANG, Eryu DAI, Zhong CHEN
    Computer Engineering. 2024, 50(3): 233-241. https://doi.org/10.19678/j.issn.1000-3428.0067143

    Shape From Focus(SFF) is an important technique in the field of non-contact 3D reconstruction. Owing to the influence of the environment and the limitations of the camera, the image acquisition process inevitably introduces noise, which affects reconstruction accuracy. To address this problem, a high-precision, noise-resistant SFF method is proposed. First, the defocused sequence images are evaluated using a focus measure function to obtain focus measure sequence images, and the initial depth map is obtained by locating each pixel's focused position with a Gaussian-fitting peak search method. Subsequently, a confidence map for the initial depth map is generated by measuring the confidence of the depth estimation based on the similarity between the focus measure curve and the grayscale curve of each pixel. Finally, the confidence map is used as the guide map to filter the initial depth map and obtain the optimized depth map. In the experiments, multiple sets of simulated defocused sequence images and real micro-defocused sequence images are used to verify the performance of the proposed method. The results demonstrate that the proposed method achieves excellent 3D reconstruction results on both simulated and real defocus sequences. Compared with traditional methods, the root mean square error is reduced by at least 64.8% and 47.3% on the simulated and real sequences, respectively, and the correlation coefficient is improved by at least 2.18% and 6.35%, respectively. The proposed method has higher accuracy and stronger noise immunity and can effectively improve the accuracy of SFF.
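
    A sketch of the Gaussian-fitting peak search for a single pixel: because a Gaussian is a parabola in the log domain, fitting the three focus-measure values around the maximum yields a closed-form sub-frame focused position.

    ```python
    # Gaussian peak interpolation on one pixel's focus measure curve.
    import numpy as np

    def gaussian_peak(focus_curve: np.ndarray) -> float:
        """focus_curve: focus measure of one pixel across the image stack."""
        m = int(np.argmax(focus_curve))
        if m == 0 or m == len(focus_curve) - 1:
            return float(m)             # peak at the border: no interpolation
        la, lb, lc = np.log(focus_curve[m - 1:m + 2] + 1e-12)
        denom = 2.0 * (la - 2.0 * lb + lc)
        if denom == 0:
            return float(m)
        return m + (la - lc) / denom    # sub-frame peak position
    ```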

  • Development Research and Engineering Application
    Jiayuan ZHAO, Yuru ZHANG, Xiaodong SU, Hongyan XU, Shizhou LI, Yurong ZHANG
    Computer Engineering. 2024, 50(3): 317-325. https://doi.org/10.19678/j.issn.1000-3428.0067134

    Human pose estimation necessitates the use of visual cues and anatomical joint relationships to pinpoint key points. Existing Convolutional Neural Network(CNN) methods falter in addressing long-range contextual cues and modeling dependencies among distant joints. This paper introduces an attention-based implicit modeling method that iteratively computes feature correlations between joints, thus implicitly modeling the constraint relationships among key points. This method diverges from the localized operations characteristic of CNN by expanding the network's receptive field and modeling dependencies between distantly positioned joints. To counteract the diminished visibility of crucial keypoints during network training, a focal loss function is implemented, prompting the network to concentrate on complex keypoints. Comparative experiments were performed under identical conditions using the state-of-the-art High-Resolution Network(HRNet) and the classic Residual Network(ResNet) as backbone networks. Results reveal that the implicit modeling network enhances human pose estimation performance. For instance, utilizing HRNet as the backbone, the algorithm's accuracy on the MPII and MSCOCO human pose estimation benchmark datasets improved by 1.7% and 2.6%, respectively, surpassing the original network's performance.

  • Research Hotspots and Reviews
    Haoyang LI, Xiaowei HE, Bin WANG, Hao WU, Qi YOU
    Computer Engineering. 2024, 50(2): 43-50. https://doi.org/10.19678/j.issn.1000-3428.0066399

    Load prediction is an essential part of cloud computing resource management. Accurate prediction of cloud resource usage can improve cloud platform performance and prevent resource wastage. However, the dynamic and volatile use of cloud computing resources makes load prediction difficult, and managers cannot allocate resources reasonably. In addition, although Informer has achieved good results in time-series prediction, it does not enforce the causal dependence of time, allowing future information leakage, and it does not consider the model performance degradation caused by increasing network depth. A multi-step load prediction model based on an improved Informer, known as Informer-DCR, is proposed. The regular convolution between attention blocks in the encoder is replaced by dilated causal convolution, such that the upper layers of the deep network receive a wider range of input information, improving the prediction accuracy of the model and ensuring the causality of the time-series prediction process. Simultaneously, residual connections are added to the encoder, such that the input information of the lower layers of the network is transmitted directly to subsequent higher layers, resolving deep network degradation and improving model performance. The experimental results demonstrate that compared with mainstream prediction models such as Informer and the Temporal Convolutional Network(TCN), the Mean Absolute Error(MAE) of the Informer-DCR model is reduced by 8.4%-40.0% under different prediction steps, and Informer-DCR exhibits better convergence than Informer during training.
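
    A minimal sketch of the two encoder modifications named above, assuming a (batch, channels, time) layout: left-only padding makes the dilated convolution causal, and the additive skip path is the residual connection.

    ```python
    # Dilated causal convolution block with a residual connection.
    import torch
    import torch.nn as nn

    class DilatedCausalBlock(nn.Module):
        def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 2):
            super().__init__()
            self.pad = (kernel_size - 1) * dilation    # pad on the left only
            self.conv = nn.Conv1d(channels, channels, kernel_size,
                                  dilation=dilation)

        def forward(self, x):                          # x: (batch, C, T)
            y = nn.functional.pad(x, (self.pad, 0))    # causal padding
            y = torch.relu(self.conv(y))               # output length stays T
            return x + y                               # residual connection
    ```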

  • Research Hotspots and Reviews
    Douwei LEI, Debiao HE, Min LUO, Cong PENG
    Computer Engineering. 2024, 50(2): 15-24. https://doi.org/10.19678/j.issn.1000-3428.0067167

    The rapid development of quantum computing seriously threatens the security of widely used public-key cryptography. Lattice-based cryptography occupies an essential position in Post-Quantum Cryptography(PQC) owing to its excellent quantum resistance and computational efficiency. In 2022, the National Institute of Standards and Technology(NIST) announced four PQC algorithms selected for standardization, three of which, including Kyber, are lattice-based. With the identification of post-quantum standards, the importance of and need for their efficient implementation are increasing. This study presents an optimized, high-speed parallel implementation of the Kyber algorithm based on Advanced Vector eXtensions 512(AVX512). It utilizes techniques such as lazy reduction, optimized Montgomery modular reduction, and an optimized Number-Theoretic Transform(NTT) to reduce unnecessary modular reduction operations and to improve the efficiency and parallelism of polynomial computations by fully utilizing storage space. It also employs redundant-bit technology to improve the parallel processing of bits during polynomial sampling. The 512 bit width of AVX512 is utilized to perform 8-way parallel Hash operations, and the resulting pseudo-random bit strings are properly scheduled to fully exploit the parallel performance. Finally, this study implements the polynomial computations and sampling of Kyber in high-speed parallel form using the AVX512 instruction set and further implements the entire Kyber public-key encryption scheme. Performance test results indicate that the key generation and encryption algorithms achieve a 10- to 16-fold acceleration compared with the C language implementation provided in the standard documentation, while the decryption algorithm achieves an approximately 56-fold acceleration.
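
    For concreteness, a plain-Python model of the signed 16 bit Montgomery reduction commonly used in Kyber reference implementations(q=3329, R=2^16); an AVX512 version vectorizes exactly this arithmetic across lanes.

    ```python
    # Montgomery reduction for Kyber: returns a * 2^-16 mod q without
    # dividing by q, using only multiplies, masks, and shifts.
    Q = 3329
    QINV = pow(Q, -1, 1 << 16)            # 62209 = q^-1 mod 2^16

    def montgomery_reduce(a: int) -> int:
        """a roughly in [-q*2^15, q*2^15); result congruent to a * 2^-16 mod q."""
        t = (a * QINV) & 0xFFFF           # low 16 bits of a * q^-1
        if t >= 1 << 15:                  # reinterpret as signed int16
            t -= 1 << 16
        return (a - t * Q) >> 16          # exact division by 2^16

    # Sanity check: reducing a * 2^16 recovers a mod q.
    assert montgomery_reduce(1234 * (1 << 16)) % Q == 1234 % Q
    ```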

  • Research Hotspots and Reviews
    Yi SUN, Huimei WANG, Ming XIAN, Hang XIANG
    Computer Engineering. 2024, 50(2): 25-32. https://doi.org/10.19678/j.issn.1000-3428.0067396

    Kubeflow is a project that integrates machine learning and cloud computing technology, incorporating a large number of machine learning tools and providing a feasible solution for deploying production-grade machine learning platforms. Machine learning relies on specialized Graphics Processing Units(GPUs) to improve training and inference speed. As the size of a cloud computing cluster is adjusted dynamically, computing nodes of different computing architectures can be added to or removed from the cluster, and traditional round-robin scheduling strategies cannot dynamically adjust heterogeneous computing resources. To solve the allocation and optimization problems of Kubeflow's heterogeneous computing power, improve platform resource utilization, and achieve load balancing, a cloud-based Central Processing Unit-GPU(CPU-GPU) heterogeneous computing power scheduling strategy is proposed. This scheduling strategy adopts two judgment indicators, weighted load balancing degree and priority, and allocates GPU memory at a fine granularity to achieve fine-grained computing resource allocation. The optimal Pod deployment scheme is designed according to the resource weight matrix of each node in the cluster, and an improved genetic algorithm is used for optimal deployment. The experimental results show that this scheduling strategy performs better for parallel tasks and can still achieve optimal load distribution when resource requests overflow. Compared with the platform's native strategy, the granularity of resource allocation is one order of magnitude finer, and cluster load balancing performance is also significantly improved.

  • Research Hotspots and Reviews
    Jiaxin WU, Yifei SUN, Yalan WU, Jigang WU
    Computer Engineering. 2024, 50(2): 59-67. https://doi.org/10.19678/j.issn.1000-3428.0066761

    Unmanned Aerial Vehicles(UAVs) are widely used to collect large-scale discrete node data because of their flexible maneuverability and high data transmission rates. Their limited onboard energy also makes UAV energy consumption optimization a research hotspot. However, when eavesdropping nodes are present in the environment, optimizing the energy consumption of drones while ensuring secure data transmission from multiple discrete data nodes poses a significant challenge. Accordingly, relay nodes and secure capacity are introduced to ensure the secure transmission of data at the physical level, and a low-energy UAV trajectory optimization algorithm for secure transmission is proposed. The channel model between drones and ground nodes, the secure capacity between drones and data nodes, and the energy consumption of drone flight and communication are established. The optimization problem, minimizing drone energy consumption under the main constraint of secure data transmission between the data nodes and drones, is formulated and shown to be Non-deterministic Polynomial(NP)-hard. The problem is decomposed into subproblems, and a self-organizing mapping method and a customized particle swarm optimization algorithm are used to solve for the optimal order in which the drone visits the data nodes and the optimal hovering positions around the data nodes. Based on previous studies, three benchmark schemes are used for performance comparison. The simulation results show that when the maximum output power of the relay node's energy collection circuit varies, the proposed optimization algorithm reduces the total energy consumption of the drone by 7.25%, 8.59%, and 11.57% on average compared with the BASE_D, BASE_M, and BASE_R benchmark schemes, respectively. In addition, the performance of the proposed algorithm is superior to existing solutions in terms of the secure capacity achievement rate; for example, when the secure capacity threshold increases from 0.001 to 0.500, the proposed algorithm outperforms the benchmark scheme BASE_M by 23.45%.
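
    For reference, the standard physical-layer notion behind the secure(secrecy) capacity used above: the achievable secure rate is the surplus of the legitimate link over the eavesdropping link, clipped at zero, where γ_d and γ_e are the received SNRs at the legitimate receiver and the eavesdropper.

    ```latex
    % Secrecy capacity of a link: positive only when the legitimate
    % channel is stronger than the eavesdropper's channel.
    C_s = \left[\log_2\left(1+\gamma_d\right) - \log_2\left(1+\gamma_e\right)\right]^{+}
    ```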

  • Cyberspace Security
    Shuaiwei LIU, Zhi LI, Guomei WANG, Li ZHANG
    Computer Engineering. 2024, 50(2): 180-187. https://doi.org/10.19678/j.issn.1000-3428.0067077

    Adversarial attack and defense is a popular research area in computer security. Trans-GAN, an adversarial example generation algorithm combining the Transformer and a Generative Adversarial Network(GAN), is proposed to address the poor visual quality of existing gradient-based adversarial example generation methods and the low generation efficiency of optimization-based methods. First, the algorithm utilizes the powerful visual representation capability of the Transformer as a reconstruction network that receives clean images and generates adversarial noise. Second, the Transformer reconstruction network serves as the generator and is combined with a discriminator based on a deep convolutional network to form a GAN architecture, which improves the authenticity of the generated images and ensures training stability. Meanwhile, an improved attention mechanism, Targeted Self-Attention, is proposed to introduce target labels as prior knowledge when training the network, guiding the network model to learn to generate adversarial perturbations for specific attack targets. Finally, adversarial noise is added to the clean examples using skip connections to form adversarial examples. Experimental results demonstrate that the proposed algorithm achieves an attack success rate of more than 99.9% on both models used for the MNIST dataset and 96.36% and 98.47% on the two models used for the CIFAR10 dataset, outperforming current state-of-the-art generative adversarial attack methods. Qualitative results show that compared with the Fast Gradient Sign Method(FGSM) and Projected Gradient Descent(PGD) algorithms, the adversarial noise generated by the Trans-GAN algorithm is smaller in perturbation, and the resulting adversarial examples are more natural, better meet the requirements of human vision, and are less easily distinguished.
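
    For comparison, the FGSM baseline mentioned above is a single signed-gradient step on the input; a minimal sketch, assuming inputs normalized to [0, 1]:

    ```python
    # Fast Gradient Sign Method: one signed gradient step, then clamp.
    import torch

    def fgsm(model, x, y, eps: float = 8 / 255):
        x = x.clone().detach().requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        x_adv = x + eps * x.grad.sign()    # perturb along the gradient sign
        return x_adv.clamp(0, 1).detach()
    ```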

  • Graphics and Image Processing
    Bingyan ZHU, Zhihua CHEN, Bin SHENG
    Computer Engineering. 2024, 50(1): 216-223. https://doi.org/10.19678/j.issn.1000-3428.0066941

    Owing to the rapid development of remote sensing technology, remote sensing image detection is being used extensively in agriculture, military, national defense security, and other fields. Compared with conventional images, remote sensing images are more difficult to detect; therefore, researchers have endeavored to detect remote sensing images efficiently and accurately. To address the high computational complexity, large-scale range variation, and scale imbalance of remote sensing images, this study proposes a perceptually enhanced Swin Transformer network, which improves the detection of remote sensing images. Exploiting the hierarchical design and shifted windows of the basic Swin Transformer, the network inserts spatial local perception blocks into each stage, enhancing local feature extraction while negligibly increasing the computational cost. An area-distributed regression loss is introduced to assign larger weights to small objects to address scale imbalance; additionally, the network is combined with an improved IoU-aware classification loss to eliminate the discrepancy between different branches and reduce the classification and regression losses. Experimental results on the public DOTA dataset show that the proposed network yields a mean Average Precision(mAP) of 78.47% and a detection speed of 10.8 frame/s, demonstrating its superiority over classical object detection networks(such as Faster R-CNN and Mask R-CNN) and existing excellent remote sensing image detection networks. Additionally, the network performs well on all types of objects at different scales.

  • Artificial Intelligence and Pattern Recognition
    Zhiqiang WU, Qing XIE, Lin LI, Yongjian LIU
    Computer Engineering. 2024, 50(1): 91-100. https://doi.org/10.19678/j.issn.1000-3428.0066929

    Many existing Graph Neural Network(GNN) recommendation algorithms use only the node ID information of the user-item interaction graph for training and learn the high-order connectivity among user and item nodes to enrich their representations. However, user preferences for different modal information are ignored, modal information such as the images and text of items is not utilized, and different modal features are fused by simple summation without distinguishing users' preferences for different modal information types. A multimodal fusion GNN recommendation model is proposed to address these problems. First, for each single modality, a unimodal graph network is constructed by combining the user-item interaction bipartite graph, and the user preference for this modal information is learned in the unimodal graph. A Graph ATtention(GAT) network is used to aggregate the neighbor information and enrich the local node representation, and a Gated Recurrent Unit(GRU) decides whether to aggregate the neighbor information, achieving a denoising effect. Finally, the user and item representations learned from each modal graph are fused by an attention mechanism to obtain the final representations, which are then sent to the prediction module. Experimental results on the MovieLens-20M and H&M datasets show that the multimodal information and attention fusion mechanism can effectively improve recommendation accuracy, and the model achieves significant improvements over the best baseline algorithm in the three indicators Precision@K, Recall@K, and NDCG@K. When the evaluation index K is set to 10, Precision@10, Recall@10, and NDCG@10 increase by 4.67%, 2.42%, and 2.03% and by 2.49%, 5.24%, and 2.05% on the two datasets, respectively.
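
    A hypothetical sketch of the final fusion step: an attention score per modality embedding produces a per-user weighted sum, so preferences for visual, textual, and interaction signals are learned rather than fixed. Layer shapes and names are assumptions.

    ```python
    # Attention fusion of modality-specific user/item embeddings.
    import torch
    import torch.nn as nn

    class ModalityAttentionFusion(nn.Module):
        def __init__(self, dim: int):
            super().__init__()
            self.score = nn.Linear(dim, 1)     # scores each modality embedding

        def forward(self, modal_embs):         # list of (batch, dim) tensors
            stacked = torch.stack(modal_embs, dim=1)            # (batch, M, dim)
            attn = torch.softmax(self.score(stacked), dim=1)    # (batch, M, 1)
            return (attn * stacked).sum(dim=1)                  # (batch, dim)
    ```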

  • Research Hotspots and Reviews
    Ying LIU, Yupeng MA, Fan ZHAO, Yi WANG, Tonghai JIANG
    Computer Engineering. 2024, 50(1): 39-49. https://doi.org/10.19678/j.issn.1000-3428.0067004

    Hyperledger Fabric is a consortium blockchain framework widely adopted both domestically and internationally. Businesses built on Fabric involve numerous participating organizations and frequent transaction operations, which increase transaction conflicts. The multi-version concurrency control technology used in Fabric can partially resolve transaction conflicts and enhance system concurrency. However, this mechanism is imperfect, and certain transaction data cannot be properly stored on the chain. To achieve complete, efficient, and trustworthy on-chain storage of massive transaction data, a data preprocessing mechanism based on a Fabric oracle is proposed. The Massive Conflict Preprocessing(MCPP) method is designed to ensure the integrity of transaction data with primary-key conflicts through techniques including detection, monitoring, delayed submission, transaction locking, and reordering caching. Data transmission protection measures are introduced that use asymmetric encryption during transmission, preventing malicious nodes from forging authentication information and ensuring consistency before and after off-chain processing of transaction data. Theoretical analysis and experimental results demonstrate that this mechanism can effectively address concurrent conflicts when massive transaction data are stored on consortium blockchain platforms. When the transaction data scale reaches 1 000 and 10 000, the MCPP method achieves time efficiency improvements of 38% and 21.4%, respectively, compared with the LMLS algorithm, with a success rate close to 100%. Thus, the proposed method is efficient and secure and does not affect Fabric system performance when no concurrent conflicts occur.

  • Graphics and Image Processing
    Xinlu JIANG, Tianen CHEN, Cong WANG, Chunjiang ZHAO
    Computer Engineering. 2024, 50(1): 232-241. https://doi.org/10.19678/j.issn.1000-3428.0067030

    Intelligent pest detection is an essential application of target detection technology in the agricultural field. It effectively improves the efficiency and reliability of pest detection and reporting work and helps ensure crop yield and quality. With fixed trapping devices such as insect traps and sticky boards, the image background is simple, the lighting conditions are stable, and pest features are significant and easy to extract, so pest detection can achieve high accuracy; however, the application scenario is fixed, the detection range is limited to the area around the equipment, and the approach cannot adapt to complex field environments. A small object pest detection model called Pest-YOLOv5 is proposed to improve the flexibility of pest detection and prediction and to address the missed detections caused by complex image backgrounds and small pest sizes in field environments. By adding a Coordinate Attention(CA) mechanism to the feature extraction network and combining spatial and channel information, the ability to extract small object pest features is enhanced. The Bidirectional Feature Pyramid Network(BiFPN) structure is used in the neck connection section, and multi-scale features are combined to alleviate the loss of small object information caused by repeated convolutions. On this basis, the SIoU and VariFocal loss functions are used to calculate losses, and the optimal classification loss weight coefficients are obtained experimentally, making the model focus more on object samples that are difficult to classify. Experimental results on a subset of the publicly available AgriPest dataset show that the Pest-YOLOv5 model attains an mAP@0.5 of 70.4% and a recall of 67.8%, which are superior to those of classical object detection models such as the original YOLOv5s, SSD, and Faster R-CNN. Compared with the YOLOv5s model, the Pest-YOLOv5 model improves mAP@0.5, mAP@0.5∶0.95, and recall by 8.1%, 7.9%, and 12.8%, respectively, enhancing the ability to detect targets.
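
    A compact sketch of a Coordinate Attention block in the spirit of Hou et al.(2021), simplified here (no batch normalization or h-swish): pooling along each spatial axis preserves positional information, and the two resulting attention maps reweight the input.

    ```python
    # Simplified Coordinate Attention: axis-wise pooling, shared 1x1 conv,
    # then two sigmoid-gated attention maps applied to the input.
    import torch
    import torch.nn as nn

    class CoordinateAttention(nn.Module):
        def __init__(self, channels: int, reduction: int = 32):
            super().__init__()
            mid = max(8, channels // reduction)
            self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
            self.act = nn.ReLU(inplace=True)
            self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
            self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

        def forward(self, x):                      # x: (B, C, H, W)
            b, c, h, w = x.shape
            ph = x.mean(dim=3, keepdim=True)       # pool along W -> (B, C, H, 1)
            pw = x.mean(dim=2, keepdim=True)       # pool along H -> (B, C, 1, W)
            pw = pw.permute(0, 1, 3, 2)            # -> (B, C, W, 1)
            y = self.act(self.conv1(torch.cat([ph, pw], dim=2)))
            yh, yw = torch.split(y, [h, w], dim=2)
            a_h = torch.sigmoid(self.conv_h(yh))                      # (B, C, H, 1)
            a_w = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # (B, C, 1, W)
            return x * a_h * a_w
    ```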

  • Research Hotspots and Reviews
    Yimeng QIAO, Yinan JING, Hanbing ZHANG
    Computer Engineering. 2024, 50(1): 30-38. https://doi.org/10.19678/j.issn.1000-3428.0066743

    Owing to the significant latency of exact queries on large-scale datasets, Approximate Query Processing(AQP) techniques are typically applied in online analytical processing to return query results within interactive timescales with minimal error. Existing learning-based AQP methods decouple models from the underlying data and convert I/O-intensive computation into CPU-intensive computation. However, because of limited computing resources, model training is typically performed on random data samples. Such training data omit rare populations, resulting in unsatisfactory prediction accuracy. Hence, this paper proposes a Stratified Sampling-based Sum-Product Network(SSSPN) model and designs an AQP framework based on it. Stratified samples effectively avoid the omission of rare populations and significantly improve model accuracy. Additionally, for dynamic data updates, this paper proposes an adaptive model-update strategy that allows the model to detect data shifts in a timely manner and update itself adaptively. Experimental results show that compared with AQP methods based on sampling and machine learning, the average relative errors of this model on real and synthetic datasets are approximately 18.3% and 2.2% lower, respectively; in scenarios where data are dynamically updated, both the accuracy and query latency of the model are favorable.
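
    A sketch of the stratified-sampling idea that motivates SSSPN: sampling within each stratum keeps rare populations represented instead of letting a uniform random sample wash them out. Column names and rates below are illustrative.

    ```python
    # Per-stratum sampling with a floor so tiny groups survive.
    import pandas as pd

    def stratified_sample(df: pd.DataFrame, by: str, frac: float = 0.01,
                          min_rows: int = 10) -> pd.DataFrame:
        # Take at least `min_rows` rows per stratum, else frac of the stratum.
        return df.groupby(by, group_keys=False).apply(
            lambda g: g.sample(n=min(len(g), max(min_rows, int(len(g) * frac))),
                               random_state=0))

    # sample = stratified_sample(orders, by="region")   # hypothetical usage
    ```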

  • Frontiers in Computer Systems
    Yanfei FANG, Qi LIU, Enming DONG, Yanbing LI, Feng GUO, Di WANG, Wangquan HE, Fengbin QI
    Computer Engineering. 2023, 49(12): 10-24. https://doi.org/10.19678/j.issn.1000-3428.0066548

    Manycore processors have become the mainstream architecture for building High Performance Computing(HPC) supercomputer systems, providing powerful computing capability for exascale supercomputers. With the increasing number of cores integrated on manycore processor chips, competition among large numbers of cores for memory resources has become more intense. The manycore on-chip memory hierarchy is an important structure for alleviating the "memory wall" problem, helping HPC applications exploit the computing advantages of manycore processors and improving the performance of practical applications. The design of the manycore on-chip memory hierarchy has a significant impact on the performance, power consumption, and area of manycore systems; it is an important part of manycore system structural design and a research focus in the field. Owing to differences in the development history of manycore chips, in on-chip microarchitecture design technology, and in the requirements of application fields, current mainstream HPC manycore on-chip memory hierarchies differ. However, judging from a horizontal comparison and the vertical development trends of each processor, as well as from the changes in application requirements brought by the continuing convergence of HPC, data science, and machine learning, a hybrid SPM+Cache structure will most likely become the mainstream choice for the on-chip memory hierarchy of manycore processors in future HPC exascale supercomputer systems. For exascale software and algorithms, design and optimization based on the characteristics of the manycore memory hierarchy can help HPC applications benefit from the computing advantages of manycore processors, effectively improving the performance of practical applications; therefore, software and algorithm design and optimization targeting the manycore on-chip memory hierarchy is also a research focus in the field. This study first partitions on-chip memory hierarchies into multilevel Cache, SPM, and SPM+Cache hybrid structures according to their organization and summarizes and analyzes the advantages and disadvantages of these structures. It then analyzes the current status and development trends of the memory hierarchy designs of chips in mainstream exascale supercomputer systems, including international mainstream GPUs, homogeneous manycore processors, and domestic manycore processors. It further surveys the research status of related software and hardware technologies from the perspectives of manycore LLC management and cache coherence protocols, SPM management and data movement optimization, and global optimization of the SPM+Cache hybrid architecture. Finally, this study discusses future research directions for on-chip memory hierarchies from the perspectives of hardware, software, and algorithm design.

  • Artificial Intelligence and Pattern Recognition
    Qiru LI, Xia GENG
    Computer Engineering. 2023, 49(12): 111-120. https://doi.org/10.19678/j.issn.1000-3428.0066348

    The traditional Deep Q Network(DQN) algorithm solves the dimensionality problem of Q-learning in complex environments by integrating deep neural networks with reinforcement learning and is widely used in mobile robot path planning. However, the traditional DQN algorithm has a low network convergence speed and poor path planning performance, so obtaining the optimal path within a short number of training rounds is challenging. To solve these problems, an improved algorithm, ERDQN, is proposed. The Q value is recalculated by recording the frequency of repeated states: the more times a state is repeated during network training, the lower the probability of that state occurring next. This improves the robot's ability to explore the environment, reduces the risk of the network converging to a local optimum to a certain extent, and reduces the number of training rounds required for convergence. The reward function is redesigned according to the robot's moving direction and its distance from the target point. The robot obtains a positive reward when it approaches the target point and a negative reward when it moves away from it. The absolute value of the reward is adjusted according to the robot's current moving direction and its distance from the target point; thus, the robot can plan a better path while avoiding obstacles. The experimental results show that compared with the DQN algorithm, the average score of the ERDQN algorithm is increased by 18.9%, whereas the path length and the number of planning rounds are reduced by approximately 20.1% and 500, respectively. These results prove that the ERDQN algorithm can effectively improve network convergence speed and path planning performance.
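
    A hypothetical sketch of the reshaped reward described above: approaching the target earns a positive reward and receding a negative one, with the magnitude modulated by how directly the robot is heading toward the target. The scaling constants are illustrative, not the paper's values.

    ```python
    # Distance- and direction-aware reward shaping for path planning.
    import numpy as np

    def shaped_reward(pos, prev_pos, goal, heading_vec):
        d_now = np.linalg.norm(goal - pos)
        d_prev = np.linalg.norm(goal - prev_pos)
        base = 1.0 if d_now < d_prev else -1.0    # closer: +, farther: -
        to_goal = (goal - pos) / (d_now + 1e-9)
        align = np.dot(heading_vec, to_goal)      # in [-1, 1]
        return base * (1.0 + 0.5 * align)         # direction scales magnitude
    ```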

  • Frontiers in Computer Systems
    Junchao YE, Cong XU, Yao HUANG, Zhilei CHAI
    Computer Engineering. 2023, 49(12): 35-45. https://doi.org/10.19678/j.issn.1000-3428.0066260

    As a third-generation neural network, the Spiking Neural Network(SNN) uses neurons and synapses as its basic computing units, and its working mechanism is similar to that of the biological brain. Its complex topology of intra-layer and reverse connections has the potential to solve complex problems. Compared with the Leaky-Integrate-and-Fire(LIF) model, the Izhikevich neuron model supports a wider range of neuromorphic computing by simulating more biological spiking phenomena; however, it has higher computational complexity, leading to potential issues of suboptimal performance and increased power consumption within the network. To address these problems, a customized FPGA-based computing method for Izhikevich neurons is proposed. First, by studying the value ranges of the Izhikevich neuron parameters in SNNs and balancing the relative error of the membrane potential against resource consumption, a mixed-precision fixed-point scheme is designed. Second, for a single neuron, the data path of the update equations is restructured to balance the neuron pipeline and achieve the minimum pipeline length. Furthermore, at the network level, a scalable computing architecture is devised to accommodate varying FPGA scales, ensuring adaptability across different configurations. Finally, the customized computing method is used to accelerate the classical NEST simulator. The experimental results reveal that, compared with an i7-10700 CPU, the performance of the classic lateral geniculate nucleus network model and the liquid state machine model on the ZCU102 is 2.26 and 3.02 times better on average, respectively, and the energy efficiency ratio is improved by 8.06 and 10.8 times on average, respectively.
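
    For reference, the Izhikevich model that the accelerator computes, written as a plain Euler-stepped update: two coupled equations plus a reset rule reproduce many biological spiking patterns, with parameters a, b, c, d selecting the firing pattern.

    ```python
    # One Euler step of the Izhikevich neuron model (regular-spiking defaults).
    def izhikevich_step(v, u, i_in, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
        """One update of membrane potential v (mV) and recovery variable u."""
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + i_in)
        u += dt * a * (b * v - u)
        fired = v >= 30.0                 # spike threshold
        if fired:
            v, u = c, u + d               # reset after the spike
        return v, u, fired
    ```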

  • Frontiers in Computer Systems
    Yi CHEN, Bosheng LIU, Yongqi XU, Jigang WU
    Computer Engineering. 2023, 49(12): 1-9. https://doi.org/10.19678/j.issn.1000-3428.0066701

    Deep Convolutional Neural Networks(CNNs) have large models and high computational complexity, making their deployment on Field Programmable Gate Arrays(FPGAs) with limited hardware resources difficult. Mixed-precision CNNs can provide an effective trade-off between model size and accuracy, offering an efficient way to reduce a model's memory footprint. As a fast algorithm, the Fast Fourier Transform(FFT) can convert traditional spatial-domain CNNs into the frequency domain, effectively reducing the computational complexity of the model. This study presents an FPGA-based accelerator for 8 bit and 16 bit mixed-precision frequency-domain CNNs. A DSP-based Frequency-domain Processing Element(FPE) is designed to support dynamically configurable 8 bit and 16 bit frequency-domain convolutions; it can pack pairs of 8 bit frequency-domain multiplications to reuse DSPs and boost throughput. In addition, a mapping dataflow is proposed that supports both 8 bit and 16 bit computation patterns and maximizes the reduction of redundant data processing and data movement through data reuse. The proposed accelerator is evaluated on the ResNet-18 and VGG16 models using the ImageNet dataset. The experimental results reveal that it achieves energy efficiency ratios(ratio of GOP to energy consumption) of 29.74 and 56.73 on the ResNet-18 and VGG16 models, respectively, which is 1.2-6.0 times better than those of existing frequency-domain FPGA accelerators.
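
    The frequency-domain idea in one line: after an FFT, spatial convolution becomes an element-wise product. A NumPy reference for a single channel, zero-padded to avoid circular wrap-around (SciPy is used only for the sanity check):

    ```python
    # Linear 2D convolution computed via the FFT.
    import numpy as np

    def fft_conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
        h = image.shape[0] + kernel.shape[0] - 1
        w = image.shape[1] + kernel.shape[1] - 1
        prod = np.fft.rfft2(image, s=(h, w)) * np.fft.rfft2(kernel, s=(h, w))
        return np.fft.irfft2(prod, s=(h, w))       # full linear convolution

    img, ker = np.random.rand(32, 32), np.random.rand(3, 3)
    from scipy.signal import convolve2d
    assert np.allclose(fft_conv2d(img, ker), convolve2d(img, ker, mode="full"))
    ```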

  • Graphics and Image Processing
    Hong ZHAO, Yubo FENG
    Computer Engineering. 2023, 49(12): 194-204. https://doi.org/10.19678/j.issn.1000-3428.0066520

    In traffic sign detection tasks, the YOLOv5 detection algorithm suffers from missed detections, false detections, and a complex model in complex environments and road conditions. To address these challenges, an improved CGS-Ghost YOLO detection model is proposed. YOLOv5 uses the focus module for downsampling, which introduces many parameters; in this study, a StemBlock module replaces the focus module for sampling after the input, reducing the number of parameters while maintaining accuracy. CGS-Ghost YOLO uses a Coordinate Attention(CA) mechanism, which improves the semantic and location information within the features and enhances the feature extraction ability of the model. Additionally, a CGS convolution module, which combines the SMU activation function with GroupNorm(GN) normalization, is proposed; it is designed to avoid the influence of the batch size on the model during training and to improve model performance. GhostConv is used to reduce the number of model parameters while effectively improving detection accuracy. The loss function α-CIoU Loss+VFocal Loss is used to solve the problem of unbalanced positive and negative samples in traffic sign detection tasks and to improve the overall performance of the model. The neck part uses a Bi-FPN bidirectional feature pyramid network, ensuring that the multi-scale features of the detection targets are effectively fused. Results on the TT100K traffic sign detection dataset show that the detection accuracy of the improved CGS-Ghost YOLO model reaches 93.1%, which is 11.3 percentage points higher than that of the original model, while the number of model parameters is reduced by 21.2% compared with the original model. In summary, the proposed network model optimizes the convolution layers and the downsampling part, considerably reducing the model parameters while enhancing detection accuracy.

  • Research Hotspots and Reviews
    Chang WANG, Leixiao LI, Yanyan YANG
    Computer Engineering. 2023, 49(11): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0066661

    Fatigue driving detection methods based on computer vision have the advantage of being noninvasive and do not affect driving behavior, making them easy to apply in practical scenarios. With the development of computer technology, an increasing number of researchers are studying fatigue driving detection methods based on computer vision. Fatigue driving behavior is mainly reflected in the face and limbs, and in the field of computer vision, facial behavior is easier to capture than physical behavior. Therefore, facial-feature-based fatigue driving detection methods have become an important research direction in the field of fatigue driving detection. This paper comprehensively analyzes fatigue driving detection methods based on multiple facial features of drivers and summarizes the latest research results worldwide. The specific behaviors of different facial features under fatigue are introduced, and the fatigue driving detection process based on multiple facial features is discussed. Research results from around the world are classified by facial feature, and the corresponding feature extraction methods and state discrimination methods are categorized. The parameters used to distinguish driver fatigue status are summarized based on the various behaviors exhibited by different features in a fatigued state. Furthermore, current research on facial multi-feature comprehensive discrimination of fatigue driving is described, and the similarities and differences of the different methods are analyzed. On this basis, the shortcomings of the current field of fatigue driving detection based on facial multi-feature fusion are discussed, and future research directions in this field are described.

  • Research Hotspots and Reviews
    Jinsheng CHEN, Wenzhen MA, Shaofeng FANG, Ziming ZOU
    Computer Engineering. 2023, 49(11): 13-23. https://doi.org/10.19678/j.issn.1000-3428.0066521

    With the construction of the Meridian Project all-sky airglow imager observation network, a large amount of raw airglow image data has been accumulated. Current atmospheric gravity wave research based on airglow observation depends heavily on manual identification, which is very time-consuming, and the quality of labeling is difficult to guarantee; therefore, a fast and effective automatic identification method is urgently needed. To solve the problem of sparsely labeled atmospheric gravity wave samples, this paper proposes an algorithm based on an improved CycleGAN model to expand the atmospheric gravity wave airglow observation dataset, thereby greatly improving recognition accuracy while labeling only a small number of samples. A new intelligent recognition algorithm for atmospheric gravity waves is also proposed by improving the YOLOv5s backbone network and bounding box prediction, considering the low Signal-to-Noise Ratio(SNR) between the recognition target and the background in airglow images. The experimental results show that with the augmented dataset and the improved YOLOv5s target detection algorithm, the average precision reaches 75.8% at an Intersection-over-Union(IoU) threshold of 0.5, which is 9.7 percentage points higher than that of the original model, while the detection speed and average recognition accuracy are superior to those of mainstream target detection algorithms.

  • Artificial Intelligence and Pattern Recognition
    Jun LUO, Qingwei GAO, Yi TAN, Dawei ZHAO, Yixiang LU, Dong SUN
    Computer Engineering. 2023, 49(11): 49-60. https://doi.org/10.19678/j.issn.1000-3428.0065787

    Label-specific features are a research hotspot in multi-label learning, in which label-specific feature extraction is used to address the problem of a single instance carrying multiple class labels. Existing research on multi-label classification usually considers only the correlation between labels and ignores the local manifold structure of the original data, which reduces classification accuracy. In addition, in label correlation, the structural relationship between features and labels, as well as the inherent causal relationships between labels, are often overlooked. To address these issues, a multi-label learning algorithm based on double Laplace regularization and causal inference is proposed. Linear regression models are used to establish a basic multi-label classification framework, which is combined with causal learning to explore the inherent causal relationships between labels and thereby mine the essential connections between them. To fully utilize the structural relationship between features and labels, double Laplace regularization is added to mine local label association information and effectively maintain the local manifold structure of the original data. The effectiveness of the proposed algorithm is verified on public multi-label datasets. The experimental results show that compared with algorithms such as LLSF, ML-KNN, and LIFT, the proposed algorithm achieves average performance improvements of 8.82%, 4.98%, 9.43%, 16.27%, 12.19%, and 3.35% in terms of Hamming Loss(HL), Average Precision(AP), One Error(OE), Ranking Loss(RL), coverage, and AUC, respectively.

  • Research Hotspots and Reviews
    Qilin WU, Yagu DANG, Shanwei XIONG, Xu JI, Kexin BI
    Computer Engineering. 2023, 49(11): 24-29, 39. https://doi.org/10.19678/j.issn.1000-3428.0066181

    Taking the sentiment analysis of students' teaching evaluation texts as its starting point, and considering the insufficient feature-extraction ability of traditional basic deep learning models, the low training efficiency of recurrent neural networks, and the inaccurate semantic representation of traditional word vectors, a sentiment classification algorithm for student evaluation texts based on a hybrid feature network is proposed. The lightweight pretrained model ALBERT is used to extract dynamic vector representations of each word that conform to the current context, solving the polysemy problem of traditional word vector models and increasing the accuracy of semantic representation. The hybrid feature network comprehensively captures the global context sequence features of the evaluation text and local semantic information at different scales by combining a simple recurrent unit, a multi-scale local convolution learning module, and a self-attention layer, improving the deep feature representation ability of the model. The self-attention mechanism identifies the key features that significantly affect the sentiment recognition results by calculating the importance of each classification feature to the classification result. To prevent irrelevant features from interfering with the results and degrading classification performance, the classification vectors are concatenated, and the sentiment classification results of the evaluation texts are output by a linear layer. In experiments on a real student teaching evaluation text dataset, the model achieves an F1 score of 97.8%, which is higher than those of the BERT-BiLSTM and BERT-GRU-ATT deep learning models, and an ablation experiment verifies the effectiveness of each module.

  • Research Hotspots and Reviews
    Enxu WANG, Xiaohong WANG, Kun ZHANG, Dongwen ZHANG
    Computer Engineering. 2023, 49(11): 40-48, 69. https://doi.org/10.19678/j.issn.1000-3428.0066255

    To address the challenge of capturing both temporal and feature information in current load forecasting models, we propose a load forecasting model based on a dual attention mechanism. The model integrates feature attention and temporal attention mechanisms, allowing it to adaptively extract feature and temporal information from server load data and effectively emphasize key information within the network. To evaluate the server load status at the next moment comprehensively and accurately, we employ the CRITIC objective weighting method, which assigns weights to the various server characteristics and facilitates precise load value calculation. The resulting dual attention network builds on a Long Short-Term Memory(LSTM) foundation, introducing both feature and temporal attention mechanisms and using historical load data as input to predict future server load values, which significantly improves the accuracy of both single-step and multi-step load prediction. Experimental results on the Alibaba Cluster-trace-v2018 public dataset demonstrate the superiority of the dual attention network over LSTM-based load prediction networks: its Mean Absolute Error(MAE) and Mean Square Error(MSE) are reduced by 9.2% and 16.8%, respectively, underscoring the network's stability and accuracy.

  • Development Research and Engineering Application
    Jianhao ZHAN, Lipeng GAN, Yonghui BI, Peng ZENG, Xiaochao LI
    Computer Engineering. 2023, 49(10): 280-288, 297. https://doi.org/10.19678/j.issn.1000-3428.0065152

    Multi-modality fusion is a core technique for exploiting complementary features from multiple modalities to improve action recognition performance at the data, feature, and decision levels. This study investigates multi-modality fusion at the feature and decision levels through knowledge distillation, transferring feature learning from other modalities to the RGB model, and examines the effects of different loss functions and fusion strategies. A multi-modality distillation fusion method for action recognition is proposed: knowledge distillation is performed with the MSE loss function at the feature level and KL divergence at the decision (prediction) level, and the skeleton and optical flow modalities are combined as multi-teacher networks so that the RGB student network learns from both and achieves better recognition accuracy. Extensive experiments show that the proposed method achieves state-of-the-art performance, with accuracies of 90.09%, 95.12%, 97.82%, and 81.26% on the NTU RGB+D 60, UTD-MHAD, N-UCLA, and HMDB51 datasets, respectively. Compared with the single-modality RGB model, recognition accuracy on these four datasets increases by 3.49, 2.54, 3.21, and 7.34 percentage points, respectively.
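
    A minimal PyTorch sketch of the distillation objective described above: feature-level MSE against each teacher's features, KL divergence between temperature-softened prediction distributions, plus the usual cross-entropy on the student. The temperature and weighting coefficients are assumed values.

```python
import torch
import torch.nn.functional as F

def multi_teacher_distill_loss(s_feat, s_logits, t_feats, t_logits, labels,
                               T=4.0, alpha=0.5, beta=0.5):
    ce = F.cross_entropy(s_logits, labels)                 # supervised term on the student
    # Feature-level distillation: MSE against each teacher's features.
    feat = sum(F.mse_loss(s_feat, t) for t in t_feats) / len(t_feats)
    # Decision-level distillation: KL between softened prediction distributions.
    kd = sum(F.kl_div(F.log_softmax(s_logits / T, dim=1),
                      F.softmax(t / T, dim=1), reduction="batchmean") * T * T
             for t in t_logits) / len(t_logits)
    return ce + alpha * feat + beta * kd

# Toy usage: the two teachers stand in for the skeleton and optical-flow networks.
s_feat, s_logits = torch.randn(8, 256), torch.randn(8, 60)
loss = multi_teacher_distill_loss(s_feat, s_logits,
                                  [torch.randn(8, 256), torch.randn(8, 256)],
                                  [torch.randn(8, 60), torch.randn(8, 60)],
                                  torch.randint(0, 60, (8,)))
```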

  • Graphics and Image Processing
    Yang LIU, Jun CHEN, Shijia HU, Jiahua LAI
    Computer Engineering. 2023, 49(10): 247-254. https://doi.org/10.19678/j.issn.1000-3428.0065825

    In mainstream feature-based Simultaneous Localization and Mapping(SLAM) methods, feature matching is a key step in estimating camera motion. However, the local nature of image features causes widespread mismatches, which has become a major bottleneck in visual SLAM. In addition, the sparse maps generated by feature-based methods can only be used for localization and do not satisfy higher-level requirements. To address the low efficiency of ORB feature matching and the inability to generate dense maps in ORB-SLAM3, an improved ORB Grid-based Motion Statistics(ORB-GMS) matching strategy is proposed, and a dense point cloud construction thread is added to ORB-SLAM3 to realize dense mapping. Motion smoothness is exploited in a feature point motion statistics scheme: the number of matches in a feature point's neighborhood is compared against a threshold to efficiently determine whether the current match is correct, and gridded images are used to accelerate this computation before camera pose estimation. Finally, the dense point cloud map is constructed from the key frames and their corresponding poses, using outlier removal and voxel-grid filters to reduce the size of the point cloud. Experimental results on the TUM RGB-D dataset show that, compared with ORB-SLAM3, the proposed algorithm reduces matching time by approximately 50% and average positioning error by 32%, while increasing the number of matches by an average of 60%. Moreover, unlike sparse maps, the generated dense point cloud maps are easy to process further, expanding the application scenarios of the algorithm.
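
    The core of the GMS idea can be sketched as follows: under motion smoothness, correct matches cluster into the same pair of grid cells, so a match is kept when its cell pair collects enough support. The single-cell vote and the adaptive threshold below are simplifications of the full method (OpenCV's contrib module ships a complete implementation as cv2.xfeatures2d.matchGMS).

```python
import numpy as np

def gms_filter(cells1, cells2, matches, alpha=6.0):
    # Vote: how many matches land in each (cell in image 1, cell in image 2) pair.
    votes = {}
    for i, j in matches:
        key = (cells1[i], cells2[j])
        votes[key] = votes.get(key, 0) + 1
    # Adaptive threshold: support must clearly exceed the random-match level.
    thresh = alpha * np.sqrt(np.mean(list(votes.values())))
    return [(i, j) for i, j in matches if votes[(cells1[i], cells2[j])] >= thresh]

# Toy usage: keypoints 0-3 agree on one cell pair, keypoint 4 is a stray match.
cells1 = {0: 5, 1: 5, 2: 5, 3: 5, 4: 9}
cells2 = {0: 7, 1: 7, 2: 7, 3: 7, 4: 2}
print(gms_filter(cells1, cells2, [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)], alpha=2.0))
```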

  • Research Hotspots and Reviews
    Bin YANG, Yitong WANG
    Computer Engineering. 2023, 49(10): 13-21. https://doi.org/10.19678/j.issn.1000-3428.0065807

    A Heterogeneous Information Network(HIN) typically contains multiple types of nodes and interactions. Such rich semantic information and complex relationships pose significant challenges to representation learning in HINs. Most existing approaches use predefined meta-paths to capture heterogeneous semantic and structural information, but these suffer from high cost and low coverage; in addition, most existing methods cannot precisely and effectively capture influential high-order neighbor nodes. Accordingly, this study addresses the problems of meta-paths and influential high-order neighbors with a proposed model called HIN-HG. HIN-HG generates a hyperadjacency graph of the HIN to precisely and effectively capture the influential neighbor nodes of target nodes, and then adopts convolutional neural networks with a multichannel mechanism to aggregate different types of neighbor nodes under different relationships. HIN-HG automatically learns the weights of different neighbor nodes and meta-paths without manual specification; meanwhile, nodes similar to the target node can be captured across the whole graph as high-order neighbors, and the representation of the target node is effectively updated through information propagation. Experimental results on three real datasets (DBLP, ACM, and IMDB) demonstrate that HIN-HG outperforms state-of-the-art HIN representation learning methods, including HAN, GTN, and HGSL, improving node classification accuracy by 5.6 and 5.7 percentage points on average in terms of the Macro-F1 and Micro-F1 evaluation indices, respectively.
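
    A speculative sketch of the multichannel aggregation step: one graph convolution channel per relation type of the hyperadjacency graph, with learnable channel weights replacing manually specified meta-paths. The mean-degree normalization and single-layer design are assumptions, not the published HIN-HG architecture.

```python
import torch
import torch.nn as nn

class MultiChannelAggregator(nn.Module):
    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        # One transformation channel per relation type.
        self.channels = nn.ModuleList(
            nn.Linear(in_dim, out_dim) for _ in range(num_relations)
        )
        # Learnable relation weights replace hand-picked meta-path choices.
        self.channel_logits = nn.Parameter(torch.zeros(num_relations))

    def forward(self, x, adjs):  # adjs: list of (N, N) relation adjacency matrices
        w = torch.softmax(self.channel_logits, dim=0)
        out = 0.0
        for k, (lin, a) in enumerate(zip(self.channels, adjs)):
            deg = a.sum(dim=1, keepdim=True).clamp(min=1.0)       # row-normalise
            out = out + w[k] * torch.relu(lin((a / deg) @ x))     # propagate, transform
        return out

x = torch.randn(10, 16)
adjs = [torch.rand(10, 10).round() for _ in range(3)]  # 3 relation types
h = MultiChannelAggregator(16, 32, 3)(x, adjs)
```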

  • Research Hotspots and Reviews
    Jian CAO, Yimei CHEN, Haisheng LI, Qiang CAI
    Computer Engineering. 2023, 49(10): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0065984

    Small target detection in complex road scenes can improve a vehicle's perception of its surroundings and is thus an important research direction in computer vision and intelligent transportation. With the development of deep learning technology, combining deep learning with small target detection on roads can effectively improve detection accuracy, allowing the vehicle to respond quickly to its environment. Starting from the latest representative research on small target detection, this study provides two definitions of small targets and analyzes why small target detection on roads is difficult. Five types of deep learning based optimization methods for improving detection accuracy are then expounded: data augmentation, multi-scale strategies, Super-Resolution(SR) detail generation, strengthened contextual information connections, and improved loss functions. The core ideas of each category and the latest research progress at home and abroad are summarized. Large public datasets commonly used in road small target detection are introduced, along with the corresponding indicators for evaluating detection performance. By comparing and analyzing the detection results of various methods on different datasets, this study summarizes the current state of research on road small targets and its open problems, and looks forward to future research directions from multiple perspectives.

  • Research Hotspots and Reviews
    Jingyi WANG, Baixiang LIU, Ning FANG, Lingqi PENG
    Computer Engineering. 2023, 49(10): 41-52. https://doi.org/10.19678/j.issn.1000-3428.0065202

    Information security and privacy protection are critical requirements in the era of big data. Identity-based cryptography is a type of public-key cryptography that avoids the certificate management problem of the traditional Public Key Infrastructure(PKI); however, it leaks the identity information of the signer. Traditional attribute-based access control schemes achieve dynamic expansion of subjects and fine-grained access to objects, but rely on a centralized authority. This study proposes an anonymous data sharing and access control scheme based on blockchain and Attribute-Based Cryptography(ABC) to solve the above problems. Using the anonymity of Attribute-Based Signature(ABS), the reliability of data sources can be verified without knowing the user's identity before the data are stored, and fine-grained access control is achieved through Attribute-Based Encryption(ABE). A distributed ABC system enables users to cooperate in building an attribute authority: authority creation and key distribution can be performed only when the number of participating users exceeds a specified threshold. Experimental results show that the scheme resists collusion and replay attacks. With 1 000-5 000 concurrent requests and 10-30 attributes, the total response time of the system does not exceed 120 ms, and the maximum throughput reaches 62 transactions per second, satisfying the requirements of practical environments.

  • Research Hotspots and Reviews
    Xingxing DONG, Jixun GAO, Xiaotong WANG, Song LI
    Computer Engineering. 2023, 49(9): 1-15. https://doi.org/10.19678/j.issn.1000-3428.0064822

    As indispensable components of spatial relations, spatial directional relations are widely used in many fields, such as urban intelligent traffic control, environmental resource monitoring, and disaster prevention and mitigation, and they represent a significant and challenging issue in geographic information systems, spatial databases, artificial intelligence, and pattern recognition. This study comprehensively analyzes and compares existing models for expressing and reasoning about spatial directional relations. First, the research progress of current models for directional relations between objects in two-dimensional space is introduced in detail, for both single and group target objects. In addition, the characteristics, advantages, and disadvantages of current models for directional relations in three-dimensional space are analyzed, from point-based to block-based models. The study then reviews models for uncertain directional relations from two aspects: extensions of models built for precise objects, and models for uncertain objects based on uncertainty set theory, discussing the advantages, drawbacks, and applicable fields of each type. Finally, the shortcomings of current research are explained, and future research directions for spatial directional relations are outlined in terms of automatic reasoning technology, joint representation of spatial relations, and group target objects.

  • Graphics and Image Processing
    Wenzhuo FAN, Tao WU, Junping XU, Qingqing LI, Jianlin ZHANG, Meihui LI, Yuxing WEI
    Computer Engineering. 2023, 49(9): 217-225. https://doi.org/10.19678/j.issn.1000-3428.0065689

    Traditional deep learning super-resolution networks extract features only at a fixed resolution and cannot integrate high-level semantic information; they also struggle to reconstruct images at arbitrary scale factors, generalize poorly, and carry an excessive number of parameters. An arbitrary-scale image super-resolution reconstruction algorithm based on multi-resolution feature fusion, termed MFSR, is proposed. In the multi-resolution feature fusion encoding phase, a multi-resolution feature extraction module is designed to extract features at different resolutions, and a dual attention module is constructed to enhance the network's feature extraction ability; an information-rich fused feature map is obtained by fully interacting features across resolutions. In the image reconstruction phase, the fused feature map is decoded by a Multi-Layer Perceptron(MLP) to realize super-resolution at any scale. Experimental results on the Set5 dataset with scaling factors of 2, 3, 4, 6, and 8 show that the proposed algorithm achieves Peak Signal-to-Noise Ratio(PSNR) values of 38.62, 34.70, 32.41, 28.96, and 26.62 dB, respectively, with only 0.72×10⁶ parameters, significantly reducing the parameter count while maintaining reconstruction quality and enabling super-resolution reconstruction at any scale. The model also outperforms mainstream algorithms such as SRCNN, VDSR, and EDSR.
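
    The arbitrary-scale decoding phase can be sketched as follows: the fused feature map is sampled at continuous target-grid coordinates and an MLP predicts each RGB value, so any scale factor reduces to choosing the query grid. The layer sizes and the bilinear feature sampling are assumptions, not the exact MFSR design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnyScaleDecoder(nn.Module):
    def __init__(self, feat_dim=64, hidden=256):
        super().__init__()
        # MLP maps (sampled feature, query coordinate) -> RGB.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, feat, out_h, out_w):  # feat: (B, C, h, w) fused feature map
        b = feat.shape[0]
        gy, gx = torch.meshgrid(torch.linspace(-1, 1, out_h),
                                torch.linspace(-1, 1, out_w), indexing="ij")
        grid = torch.stack([gx, gy], dim=-1).expand(b, -1, -1, -1)  # (B, H, W, 2)
        sampled = F.grid_sample(feat, grid, align_corners=False)    # bilinear sampling
        sampled = sampled.permute(0, 2, 3, 1)                       # (B, H, W, C)
        rgb = self.mlp(torch.cat([sampled, grid], dim=-1))          # query each pixel
        return rgb.permute(0, 3, 1, 2)                              # (B, 3, H, W)

sr = AnyScaleDecoder()(torch.randn(1, 64, 48, 48), 48 * 3, 48 * 3)  # x3 upscaling
```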

  • Graphics and Image Processing
    Xianguo LI, Bin LI
    Computer Engineering. 2023, 49(9): 226-233, 245. https://doi.org/10.19678/j.issn.1000-3428.0065513

    Convolutional Neural Networks(CNN) have limited receptive fields when applied alone to image deblurring tasks. Transformer can effectively mitigate this limitation, but its computational complexity grows quadratically with the spatial resolution of the input image. Therefore, this study proposes an image deblurring network based on Transformer and multi-scale CNN, called T-MIMO-UNet. The multi-scale CNN extracts spatial features, while the global modeling of the Transformer captures remote pixel information. A locally enhanced Transformer module, a local Multi-Head Self-Attention(MHSA) computing network, and an Enhanced Feed-Forward Network(EFFN) are designed. Block-by-block MHSA computation is performed using a windowing approach, and information interaction between different windows is enhanced by increasing the depth of the separable convolution layer. Experiments on the GoPro test dataset demonstrate that the Peak Signal-to-Noise Ratio(PSNR) of T-MIMO-UNet is 0.39 dB, 2.89 dB, 3.42 dB, and 1.86 dB higher than that of the MIMO-UNet, DeepDeblur, DeblurGAN, and SRN networks, respectively. Additionally, the number of parameters is halved compared with MPRNet. These findings show that T-MIMO-UNet effectively addresses image blurring in dynamic scenes.
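
    The windowed MHSA computation can be sketched in PyTorch: the feature map is partitioned into non-overlapping windows and attention is computed within each window, so the cost grows linearly with image size rather than quadratically. The window size and head count are assumptions, and the depthwise-separable convolutions for inter-window interaction are omitted.

```python
import torch
import torch.nn as nn

class WindowedMHSA(nn.Module):
    def __init__(self, dim=64, win=8, heads=4):
        super().__init__()
        self.win = win
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (B, C, H, W), H and W divisible by win
        b, c, h, w = x.shape
        s = self.win
        # Partition into (B * num_windows, s*s, C) token sequences.
        t = x.view(b, c, h // s, s, w // s, s).permute(0, 2, 4, 3, 5, 1)
        t = t.reshape(-1, s * s, c)
        t, _ = self.attn(t, t, t)            # attention only within each window
        # Undo the partition back to a (B, C, H, W) feature map.
        t = t.view(b, h // s, w // s, s, s, c).permute(0, 5, 1, 3, 2, 4)
        return t.reshape(b, c, h, w)

y = WindowedMHSA()(torch.randn(1, 64, 32, 32))  # output keeps the input shape
```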

  • Graphics and Image Processing
    Jiaxin LI, Jin HOU, Boying SHENG, Yuhang ZHOU
    Computer Engineering. 2023, 49(9): 256-264. https://doi.org/10.19678/j.issn.1000-3428.0065935

    In remote sensing imagery, detecting small objects is challenging because of complex backgrounds, high resolution, and limited effective information. Based on YOLOv5, this study proposes an improved approach, YOLOv5-RS, for small object detection in remote sensing images. The approach employs a parallel mixed attention module to address the problems caused by complex backgrounds and negative samples; this module optimizes the generation of the weighted feature map by substituting convolutions for fully connected layers and eliminating pooling layers. To capture the nuanced characteristics of small targets, the downsampling factor is tailored and shallow features are incorporated during training. At the same time, a feature extraction module combining convolution and Multi-Head Self-Attention(MHSA) is designed to overcome the limitations of ordinary convolution by jointly representing local and global information, thereby extending the model's receptive field. The EIoU loss function is employed to optimize the regression between predicted and target boxes, enhancing small-object localization. The efficacy of the proposed algorithm is verified through experiments on small-target remote sensing datasets. The results show that, compared with YOLOv5s, the proposed algorithm improves average detection accuracy by 1.5 percentage points with a 20% reduction in parameter count; in particular, the average detection accuracy for small vehicle targets increases by 3.2 percentage points. Comparative evaluations against established methods such as EfficientDet, YOLOX, and YOLOv7 show that the proposed algorithm adeptly balances detection accuracy and real-time performance.
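
    For reference, the EIoU loss augments the IoU term with normalized penalties on center distance and on width and height gaps, which helps small-object box regression converge. A compact sketch for (x1, y1, x2, y2) boxes follows; the epsilon and reduction are assumed details.

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    # Intersection and IoU.
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Smallest enclosing box dimensions.
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    # Normalised centre-distance, width, and height penalties.
    pcx = (pred[:, 0] + pred[:, 2]) / 2; pcy = (pred[:, 1] + pred[:, 3]) / 2
    tcx = (target[:, 0] + target[:, 2]) / 2; tcy = (target[:, 1] + target[:, 3]) / 2
    dist = ((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / (cw ** 2 + ch ** 2 + eps)
    dw = (pred[:, 2] - pred[:, 0] - (target[:, 2] - target[:, 0])) ** 2 / (cw ** 2 + eps)
    dh = (pred[:, 3] - pred[:, 1] - (target[:, 3] - target[:, 1])) ** 2 / (ch ** 2 + eps)
    return (1 - iou + dist + dw + dh).mean()

print(eiou_loss(torch.tensor([[10., 10., 50., 60.]]), torch.tensor([[12., 8., 48., 62.]])))
```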

  • Research Hotspots and Reviews
    Yuyao GAO, Mingquan SHI, Yu QIN, Jianping CHEN, Xi ZHOU, Peng ZHANG
    Computer Engineering. 2023, 49(9): 43-51. https://doi.org/10.19678/j.issn.1000-3428.0066234

    Station ridership data are among the most important basic data in the network planning of conventional bus systems. The type, number, and distance of the Points of Interest(POI) around a station lead to different ridership trends. However, this important feature is not reflected in the structure of the traditional fully connected neural networks commonly used for ridership prediction, because the influences of different POI on ridership are mutually independent, which tends to make prediction results unsatisfactory. This study improves the basic structure of a fully connected neural network by considering the specific relationship between POI and ridership and constructs a purpose-built, non-fully connected neural network. The simulation and prediction of station ridership in each time period are achieved using historical ridership data at all bus stations as well as the weights of various POI types. The model uses a connection matrix to realize the non-fully connected network and constructs a composite error transfer function that attaches meaning to some of the hidden layers, enhancing the interpretability of the neural network with respect to the nature of ridership. The proposed network addresses several problems of traditional neural networks, such as slow convergence, poor fitting, and entrapment in local optima. Experiments demonstrate that the proposed model converges to the global optimal solution more rapidly and that the probability of predicting hourly ridership within an error of 50 passengers exceeds 88%. The model outperforms other common prediction models and can accurately simulate daily ridership trends.
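
    The connection-matrix idea can be sketched as a masked linear layer: a fixed 0/1 matrix zeroes out the weights between a hidden unit and the POI inputs it should not see, turning a fully connected layer into a non-fully connected one. The example mask and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    def __init__(self, in_dim, out_dim, mask):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.register_buffer("mask", mask)  # (out_dim, in_dim) 0/1 connection matrix

    def forward(self, x):
        # Masked weights keep only the wired POI-to-unit connections.
        return nn.functional.linear(x, self.linear.weight * self.mask, self.linear.bias)

# e.g. 6 POI-type inputs feeding 3 hidden units, each wired to 2 POI types only.
mask = torch.tensor([[1, 1, 0, 0, 0, 0],
                     [0, 0, 1, 1, 0, 0],
                     [0, 0, 0, 0, 1, 1]], dtype=torch.float32)
h = MaskedLinear(6, 3, mask)(torch.rand(4, 6))
```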

  • Graphics and Image Processing
    Fangyu FENG, Xiaoshu LUO, Zhiming MENG, Guangyu WANG
    Computer Engineering. 2023, 49(8): 190-198. https://doi.org/10.19678/j.issn.1000-3428.0065224

    Because effective features are difficult to extract in facial expression recognition, and high similarity between categories causes confusion and low recognition accuracy, a facial expression recognition method based on an anti-aliasing residual attention network is proposed. First, in view of the problem that traditional downsampling easily loses expression-discriminative features, an anti-aliasing residual network is constructed to improve the feature extraction ability for expression images and enhance the representation of expression features, enabling more effective global facial expression information to be extracted. At the same time, an improved channel attention mechanism and a label smoothing regularization strategy are used to enhance attention to the key local expression regions of the face: the improved channel attention focuses on highly discriminative expression features and suppresses the weights of non-expressive regions, so as to locate more detailed local expression regions within the global information extracted by the network, while label smoothing corrects the prediction probabilities by increasing the information content of the decided expression category, avoiding overly absolute predictions and reducing misjudgment between similar expressions. Experimental results show that the recognition accuracies of this method on the facial expression datasets RAF-DB and FERPlus reach 88.14% and 89.31%, respectively. The method outperforms advanced methods such as DACT and VTFF, and compared with the original residual network, it effectively improves the accuracy and robustness of facial expression recognition.
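
    The label smoothing regularization step has a standard form: the one-hot target is mixed with a uniform distribution so the model never predicts with absolute certainty, which reduces confusion between similar expressions. A sketch with an assumed smoothing factor of 0.1:

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, target, eps=0.1):
    n = logits.size(-1)
    log_p = F.log_softmax(logits, dim=-1)
    # Soft target: (1 - eps) on the true class, eps spread over the others.
    smooth = torch.full_like(log_p, eps / (n - 1))
    smooth.scatter_(1, target.unsqueeze(1), 1.0 - eps)
    return -(smooth * log_p).sum(dim=-1).mean()

loss = label_smoothing_ce(torch.randn(4, 7), torch.randint(0, 7, (4,)))  # 7 expressions
```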

  • Graphics and Image Processing
    Chunbo XU, Juan YAN, Huibin YANG, Bo WANG, Han WU
    Computer Engineering. 2023, 49(8): 199-206, 214. https://doi.org/10.19678/j.issn.1000-3428.0065522

    Currently, most Visual Simultaneous Localization And Mapping(VSLAM) algorithms are designed for static scenes and do not consider dynamic objects. However, dynamic objects in real scenes cause mismatches among the feature points of the visual odometer, which degrades the positioning and mapping accuracy of the SLAM system and reduces its robustness in practical applications. For indoor dynamic environments, a VSLAM algorithm based on the ORB-SLAM3 framework, called RDTS-SLAM, is proposed. An improved YOLOv5 target detection and semantic segmentation network is used to segment objects in the environment accurately and rapidly. Simultaneously, the target detection results are combined with a local optical flow method to accurately identify dynamic objects, and the feature points in dynamic object regions are eliminated; only static feature points are used for feature matching and subsequent positioning and mapping. Experimental results on the TUM RGB-D dataset and real environment data show that, compared with the ORB-SLAM3 and RDS-SLAM algorithms, the Root Mean Square Error(RMSE) of trajectory estimation on the walking_rpy sequence is reduced by 95.38% and 86.20%, respectively, implying that RDTS-SLAM can significantly improve the robustness and accuracy of VSLAM systems in dynamic environments.
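
    A simplified sketch of the dynamic-point rejection step: a detected object box is treated as dynamic when the mean optical-flow magnitude inside it exceeds a threshold, and keypoints inside dynamic boxes are discarded before matching. The threshold, array layouts, and the omission of semantic masks and camera-motion compensation are simplifying assumptions.

```python
import numpy as np

def filter_dynamic_keypoints(kps, flow, boxes, flow_thresh=2.0):
    """kps: (N, 2) pixel coords; flow: (H, W, 2) dense flow; boxes: (x1, y1, x2, y2)."""
    mag = np.linalg.norm(flow, axis=2)
    # A box is dynamic if its mean flow magnitude exceeds the threshold.
    dynamic = [b for b in boxes
               if mag[int(b[1]):int(b[3]), int(b[0]):int(b[2])].mean() > flow_thresh]
    keep = np.ones(len(kps), dtype=bool)
    for x1, y1, x2, y2 in dynamic:
        inside = (kps[:, 0] >= x1) & (kps[:, 0] <= x2) & \
                 (kps[:, 1] >= y1) & (kps[:, 1] <= y2)
        keep &= ~inside                      # drop keypoints on dynamic objects
    return kps[keep]

kps = np.array([[15.0, 15.0], [100.0, 100.0]])
flow = np.zeros((120, 120, 2)); flow[10:40, 10:40] = 5.0   # a moving region
print(filter_dynamic_keypoints(kps, flow, [(10, 10, 40, 40)]))  # keeps only (100, 100)
```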

  • Graphics and Image Processing
    Zhihao LIU, Fanyun MENG, Jinhe WANG, Nan ZHANG
    Computer Engineering. 2023, 49(8): 223-231. https://doi.org/10.19678/j.issn.1000-3428.0065628

    Most stereo matching algorithms based on convolutional neural networks require a large receptive field, but enlarging the receptive field usually increases the number of parameters, which in turn raises the demand for training data. A stereo matching algorithm based on atrous convolution and attention modules is proposed. An atrous convolution module combines a residual structure with atrous convolution to enlarge the receptive field of the network with fewer parameters. An attention module integrates multiple levels of information via convolutions at different levels to increase the completeness of the extracted information. A weighted spatial pyramid pooling module further enlarges the receptive field, assigning different importance to information at different levels. Experimental results show that the proposed algorithm converges faster than DispNetC and other algorithms under the same dataset and training schedule; moreover, it has a simple structure and few parameters and is suitable for small-sample data.
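
    Combining a residual structure with atrous convolution is straightforward to sketch: dilation enlarges the receptive field without adding parameters, while the skip connection keeps training stable. The channel count and dilation rate below are assumed values.

```python
import torch
import torch.nn as nn

class AtrousResidualBlock(nn.Module):
    def __init__(self, channels=32, dilation=2):
        super().__init__()
        pad = dilation  # keeps the spatial size for 3x3 kernels
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=pad, dilation=dilation),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=pad, dilation=dilation),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))  # residual connection

y = AtrousResidualBlock()(torch.randn(1, 32, 64, 64))  # same 64x64 output size
```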

  • Research Hotspots and Reviews
    Qun WANG, Fujuan LI, Xueli NI, Lingling XIA, Guangjun LIANG
    Computer Engineering. 2023, 49(8): 1-12. https://doi.org/10.19678/j.issn.1000-3428.0065926

    Blockchain technology combines cryptography, consensus algorithms, incentive mechanisms, Peer-to-Peer(P2P) networking, distributed ledgers, smart contracts, and other key technologies. This enables its application in network environments without a third-party authority and with mutual distrust, realizing distributed consistency, irreversibility, traceability, and other functional characteristics for transaction records, and constituting a new trusted, safe, and programmable network ecology. Simultaneously, the technologies and mechanisms underpinning blockchain raise privacy concerns. Privacy is studied here as a subset of personal data; thus, blockchain data are divided into transaction data and block data. The blockchain data are deconstructed, and the obtained information is associated to analyze the privacy information hidden in the data. To elaborate, the blockchain data structure is introduced from the perspectives of the data transmission mode and the data structure. Based on an analysis of the characteristics of blockchain data and by integrating factors such as understanding, measurement, and the privacy disclosure approach, a definition of blockchain privacy is provided, and blockchain privacy is analyzed in terms of identity, data, and network privacy. The challenges and research directions of blockchain privacy threats are highlighted, focusing on the privacy leakage risks brought about by network security, cryptographic security, cross-chain operations, and consensus algorithms.

  • Research Hotspots and Reviews
    Jinshuo LIU, Daichen WANG, Juan DENG, Lina WANG
    Computer Engineering. 2023, 49(8): 13-19, 28. https://doi.org/10.19678/j.issn.1000-3428.0067003

    Currently, most existing methods for classifying harmful information on the Internet overlook imbalanced data and long-tailed distributions, biasing the model toward classes with many samples. Such methods therefore cannot effectively identify classes with few samples, reducing overall recognition accuracy. To address this issue, a classification method for long-tailed harmful information datasets, LTIC, is proposed. Integrating few-shot learning with a knowledge transfer strategy, a BERT model is used to learn the weights of the head classes, and the prototypes of the head classes are obtained through a Prototyper network designed specifically for few-shot learning. This design processes head and tail data separately, avoiding the imbalance caused by training them together. The mapping relationship learned from the head-class prototypes and weights is then used to convert tail-class prototypes into weights, and the head- and tail-class weights are combined to obtain the final classifier. In experiments, LTIC achieves classification accuracies of 82.7% and 83.5% on the Twitter and THUCNews datasets, respectively, and significantly improves the F1 score compared with non-long-tailed models, effectively improving classification accuracy. Compared with recent classification methods such as BNN and OLTR, LTIC exhibits superior performance on long-tailed datasets, with an average accuracy improvement of 3%. When new categories of harmful information emerge, LTIC can predict them with minimal computation, achieving an accuracy of 70% and demonstrating good scalability.
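
    The knowledge-transfer step can be sketched as a small mapping network: it is fitted on head classes, where both prototypes and trained classifier weights exist, and then converts tail-class prototypes into classifier weights. The dimensions and the single hidden layer are assumptions, not the published LTIC architecture.

```python
import torch
import torch.nn as nn

class PrototypeToWeight(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        # Maps a class prototype vector to a classifier weight vector.
        self.map = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, prototype):
        return self.map(prototype)

dim = 768
mapper = PrototypeToWeight(dim)
head_proto, head_w = torch.randn(20, dim), torch.randn(20, dim)
fit_loss = nn.functional.mse_loss(mapper(head_proto), head_w)  # learn mapping on head classes
tail_w = mapper(torch.randn(5, dim))                           # infer tail-class weights
classifier_w = torch.cat([head_w, tail_w], dim=0)              # combined final classifier
```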

  • Research Hotspots and Reviews
    YANG Wenzhong, DING Tiantian, KANG Peng, BU Wenxiu
    Computer Engineering. 2023, 49(3): 1-17. https://doi.org/10.19678/j.issn.1000-3428.0064374

    The keyword extraction algorithm for public opinion events is a basic technique for public opinion monitoring. To convey news content quickly, the algorithm aims to extract the core words associated with the public's concerns across different events. With the development of deep learning, traditional unsupervised keyword extraction techniques and the classification models used in supervised algorithms have gradually been replaced by sequence annotation models. This review addresses the limitations of unsupervised keyword extraction, the advantages and disadvantages of classification models for keyword extraction, and how existing deep learning methods have advanced keyword extraction technology. The analysis focuses on the development of deep learning keyword extraction methods, such as convolutional neural networks and recurrent neural networks, and summarizes the advantages, disadvantages, and development trends of existing methods. Although deep learning plays an important role in keyword extraction, its reliance on large-scale labeled samples, long training time, and high complexity still need to be addressed in future work. To ensure the authenticity of the analysis, replication experiments were conducted on six public opinion news datasets and two small datasets; the experimental results were consistent with the theoretical analysis. On this basis, the various keyword extraction techniques and their associated difficulties and challenges are reviewed, and the prospects of this field are discussed in view of existing problems.

  • Research Hotspots and Reviews
    WANG Qun, LI Fujuan, NI Xueli, XIA Lingling, LIANG Guangjun
    Computer Engineering. 2023, 49(4): 1-13. https://doi.org/10.19678/j.issn.1000-3428.0065927

    As a type of distributed ledger maintained by multiple nodes, blockchain integrates P2P networking, consensus algorithms, smart contracts, and cryptographic technology to build an efficient, low-cost, decentralized trust mechanism in an open network environment and realizes security-related functions such as tamper resistance and forgery prevention. However, it also faces serious data privacy disclosure risks. Based on an understanding of blockchain privacy and an analysis of how privacy-related data are formed, the specific technologies, working principles, and implementation protocols of blockchain privacy protection are analyzed in detail, focusing on identity, data, and network privacy. For identity privacy protection, building on a discussion of coin mixing technology, the implementation principles and application characteristics of centralized and decentralized coin mixing are compared and analyzed. For data privacy protection, the applications of zero-knowledge proof and ring signature technology in blockchain privacy protection are introduced. For network privacy protection, network data hiding and channel isolation technologies are discussed. This study compares identity, data, and network privacy from three aspects: privacy content, privacy threats, and privacy protection mechanisms. It also anticipates future development trends of blockchain privacy protection mechanisms based on a systematic review of recent progress combined with current blockchain applications.

  • Research Hotspots and Reviews
    GAO Jianbo, ZHANG Jiashuo, LI Qingshan, CHEN Zhong
    Computer Engineering. 2023, 49(5): 12-21,28. https://doi.org/10.19678/j.issn.1000-3428.0065584

    RegLang is a regulation-oriented smart contract programming language for the digitization and contractualization of regulatory rules, with preliminary applications in finance and other fields. In practice, however, rule conflicts such as "scope conflict" and "multi-track regulation" in the financial field may seriously affect blockchain applications; besides increasing compliance costs, such conflicts also diminish the effectiveness of regulatory contracts. To address these problems, variable type dependency and propagation analysis methods based on the dependency graph are proposed to infer the potential types of variables in regulatory contracts and to symbolize variables, statements, and rules according to the symbol types supported by Satisfiability Modulo Theories(SMT) solvers. A rule conflict detection method based on symbolic analysis is then proposed, which transforms regulatory rule conflict problems into satisfiability problems and detects self-conflict, complete conflict, and partial conflict. Moreover, a subset partitioning method is proposed to mitigate the state space explosion when detecting complete conflicts among multiple regulatory rules. The experimental results show that the proposed methods can effectively detect various regulatory rule conflicts. For regulatory rules with 300 lines of code, the average times for detecting self-conflict, complete conflict, and partial conflict are 1 234.9 ms, 1 977.8 ms, and 2 364.5 ms, respectively. This time consumption is acceptable in practical applications and provides a validity guarantee for the digitalization of regulatory rules.

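    The reduction from rule conflict to satisfiability can be illustrated with the z3-solver Python bindings: two symbolized rule conditions conflict partially if some input satisfies one rule's condition while violating the other's. The two rules and the variable below are illustrative assumptions, not RegLang syntax.

```python
from z3 import Real, Solver, And, Not, sat

# Two hypothetical regulatory rules over a transaction amount x:
# rule A permits 50 <= x <= 100, rule B permits only x < 80.
x = Real("x")
rule_a = And(x >= 50, x <= 100)   # symbolised condition of rule A
rule_b = x < 80                    # symbolised condition of rule B

s = Solver()
s.add(And(rule_a, Not(rule_b)))    # is there a case A allows but B rejects?
if s.check() == sat:
    print("partial conflict, witness:", s.model())  # e.g. x = 80
```
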
  • Research Hotspots and Reviews
    LI Xuesong, ZHANG Qieshi, SONG Chengqun, KANG Yuhang, CHENG Jun
    Computer Engineering. 2023, 49(5): 1-11. https://doi.org/10.19678/j.issn.1000-3428.0065627

    Trajectory prediction is a key technology in the fields of autonomous driving and intelligent transportation. Accurately predicting the trajectories of vehicles and moving pedestrians can improve an autonomous driving system's perception of environmental changes, thereby ensuring overall safety. Data-driven trajectory prediction methods accurately capture the interaction characteristics between agents, analyze the historical motion and static environment information of all agents within a scene, and predict the agents' future trajectories. The mathematical models of trajectory prediction are introduced and categorized into traditional and data-driven methods. The four main challenges faced by mainstream data-driven trajectory prediction methods are agent interaction modeling, motion behavior intention prediction, trajectory diversity prediction, and static environmental information fusion within a scene. Starting from the trajectory prediction datasets in use, the performance evaluation indicators, model characteristics, and other aspects of typical data-driven trajectory prediction methods are analyzed and compared. On this basis, the solutions and application scenarios of these methods for the abovementioned challenges are summarized, and future development directions of trajectory prediction technology in autonomous driving are proposed.

  • Computer Architecture and Software Technology
    GAO Xiuwu, JIANG Jun, BAI Shujing, HUANG Liangming
    Computer Engineering. 2023, 49(1): 173-180. https://doi.org/10.19678/j.issn.1000-3428.0062878
    In domestic Sunway high-performance multi-core server systems, when the basic compilation system generates the code for access operation in the application program, the instructional characteristics of the domestic processor are not considered.As a result, the access address calculation code generated by the compiler is inefficient, which hinders the performance of the high-performance processor.To fully realize the high-performance computing capabilities of domestic processors, a compiler optimization method is proposed to accelerate the computation of access addresses.The compiler optimization of accelerated access address computation is based on the processor's support of operational instructions using an extension factor.During the compiler back-end memory address expression validity check, an address calculation expression validity check algorithm for multiply-add mode is added.The algorithm automatically recognizes the multiplicative operation in the address expression and verifies its validity.It then generates operational instructions using the extension factor to calculate the access address quickly in the code generation stage, which accelerates the launch and execution of the access instructions.The optimization method can significantly speed up the generation of the access address in the application and improve access efficiency.The method is automatically implemented by the compiler and is transparent to the program developer.Experimental results show that the average performances of the two subsets of SPECspeed Integer and SPECspeed Float Point are improved by 2.53% and 1.50%, respectively, as compared with that prior to optimization.