
15 April 2024, Volume 50 Issue 4
    

  • Intelligent Transportation
  • Fei GE, Shan MIN, Han QIU, Zhenyang DAI, Zhimin YANG
    Computer Engineering. 2024, 50(4): 1-10. https://doi.org/10.19678/j.issn.1000-3428.0068790

    The Ant Colony Optimization(ACO) algorithm is an optimization algorithm that simulates the behavior of ants identifying food paths. It can solve Non-deterministic Polynomial(NP)-hard combinatorial problems with geometric distributions in a dynamically changing environment without any external guidance or control. To prevent the ACO algorithm from easily falling into local optima and to mitigate the difficulty of balancing the depth and breadth of search when solving NP-hard problems, a Green Intelligent Evolutionary Ant Colony Optimization(G-IEACO) algorithm is proposed. By introducing four types of domain operators, the state transition rules and pheromone update methods of the ACO algorithm are improved, thereby enhancing optimization performance and preventing premature convergence. Additionally, a congestion avoidance strategy is adopted to balance time and environmental costs. Numerical results show that the G-IEACO algorithm outperforms the Genetic Algorithm(GA) in terms of the Total driving Time(TT) and vehicle carbon emissions(TCO2) of the fleet, reducing TT and TCO2 by 13.32% and 13.64% on average, respectively, in the R2 and RC2 test cases involving 100 clients, which implies that it can effectively promote the realization of green and low-carbon goals.
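
    As a point of reference (a sketch of the classical ACO rule that G-IEACO improves upon, not the paper's variant; all names and parameters are illustrative), the standard state transition chooses the next node with probability proportional to pheromone raised to alpha times heuristic desirability raised to beta:

        import numpy as np

        def aco_transition(current, unvisited, pheromone, heuristic,
                           alpha=1.0, beta=2.0, rng=None):
            # Classical ACO rule: P(j) ~ tau[current, j]^alpha * eta[current, j]^beta.
            rng = rng or np.random.default_rng()
            unvisited = np.asarray(unvisited)
            weights = (pheromone[current, unvisited] ** alpha
                       * heuristic[current, unvisited] ** beta)
            return rng.choice(unvisited, p=weights / weights.sum())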

  • Wei CHEN, Xiaolong WANG, Yanwei ZHANG, Guocheng AN, Bo JIANG
    Computer Engineering. 2024, 50(4): 11-19. https://doi.org/10.19678/j.issn.1000-3428.0068901

    In highway service areas, complex environmental conditions such as lighting and weather changes can cause a sharp decline in vehicle detection accuracy, and factors such as camera inclination angle and installation height increase false-negative and false-positive rates. To this end, a vehicle violation detection algorithm based on an improved YOLOv8 is proposed for highway service areas. First, based on the feature pyramid pooling layer of the YOLOv8 network, a Dilated Space Pyramid Pooling(DSPP) module and a DSPP based on Branch Attention(DSPPA) module are constructed to reduce the loss of semantic information in the backbone. The Branch Attention(BA) mechanism in DSPPA assigns different weights to branches with varying degrees of contribution, making the model focus more on features suited to the target size. Second, a parking space allocation strategy based on global matching is designed to effectively reduce the false-negative and false-positive rates of illegal parking detection in situations involving tilted views and overlapping vehicles. The experimental results show that the improved algorithm reduces the false-negative rate of parking violation detection from 15% to 8% and the false-positive rate from 7.5% to 6.1%, demonstrating a considerable performance improvement in vehicle violation detection.

  • Junze HUANG, Wenyuan WU, Yi LI, Mingquan SHI, Zhengjiang WANG
    Computer Engineering. 2024, 50(4): 20-30. https://doi.org/10.19678/j.issn.1000-3428.0068931

    With the development of smart cities and transport and the continuous improvement of mobile Internet and intelligent transport infrastructure and data, a new operation mode in which users order transport services on their mobile phones, known as dynamic transport, has become an important direction for public transport development in many cities. However, studies on models and algorithms for the dynamic transport problem are limited. Therefore, a dynamic transport problem model and a discrete hierarchical memory Particle Swarm Optimization(PSO) algorithm for dynamic public transport are proposed. This study provides the objective function and constraint conditions of the dynamic transport problem; defines the form of its solution and the edit distance between solutions; proposes an algorithm for generating high-quality initial solutions for the PSO algorithm using data-driven precomputed path sets; provides the calculation of particle mutation probability and an adaptive convergence coefficient based on the edit distance between solutions; and proposes a hierarchical PSO solution method in which lower-level particles can be reused and inherited, reducing the performance loss caused by copying and re-initialization within and between time slices. A simulation environment is established based on a real scene and historical data from Caijiagang Street in Beibei District, Chongqing. The experiments demonstrate that, compared with non-hierarchical PSO algorithms, the hierarchical PSO algorithm reduces computation time by more than 80% through reuse and inheritance, and that the adaptive parameters and mutation mechanism help the algorithm converge to better solutions more stably. Compared with traditional public transport, dynamic public transport increases the passenger order acceptance rate by 22% and saves 39.1% of passenger travel time under the same capacity constraints. Moreover, the proposed algorithm can meet the needs of transport operators for dynamic public transport scheduling within the area: compared with non-hierarchical PSO algorithms, it reduces computation time by an average of 85.3% and improves the order acceptance rate by at least 12% while consuming only 80% of the mileage.

  • Lei ZHANG, Guochen SHEN, Dongxiu OU
    Computer Engineering. 2024, 50(4): 31-40. https://doi.org/10.19678/j.issn.1000-3428.0069176

    Commonly used visible light image data may fail under harsh weather or poor lighting conditions; infrared thermal imaging data can effectively complement them. Existing studies often rely on Domain Adaptation(DA) to apply a Convolutional Neural Network(CNN) trained on visible light images to thermal imaging data, overcoming the lack of a large annotated training set for infrared data. However, DA methods cannot completely avoid the training process. Researchers have found that the domain-invariant and domain-variant components of an image can be separated in the frequency domain. Inspired by this phenomenon, a filtering method for CNN feature maps based on the Discrete Cosine Transform(DCT) and a chi-square independence index is proposed, separating the domain-invariant and domain-variant components in the frequency domain. By imitating the chi-square independence test, an independence index based on frequency components is proposed to measure the degree of difference between feature maps. According to this index, clustering is used to classify the feature maps and identify the classes to be retained or dropped. Thereafter, a neural network suitable for thermal imaging data is constructed. The experimental results indicate that this method can uncover the latent capability of a CNN pre-trained on visible light images to extract features from thermal imaging data without retraining the network. Although the pre-trained network fails to predict the thermal imaging data, the network constructed using the proposed method achieves up to 90% matching between the object and the top five prediction results.
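
    To illustrate the frequency-domain separation idea (an illustrative sketch, not the paper's exact index), one can profile a feature map's DCT energy by radial frequency band and compare two profiles with a chi-square-style statistic:

        import numpy as np
        from scipy.fft import dctn

        def frequency_profile(feature_map, n_bands=8):
            # Normalized DCT energy per radial frequency band of a 2D feature map.
            spec = np.abs(dctn(feature_map, norm="ortho"))
            h, w = spec.shape
            yy, xx = np.mgrid[0:h, 0:w]
            radius = np.sqrt((yy / h) ** 2 + (xx / w) ** 2) / np.sqrt(2.0)
            bands = np.minimum((radius * n_bands).astype(int), n_bands - 1)
            energy = np.array([spec[bands == b].sum() for b in range(n_bands)])
            return energy / energy.sum()

        def chi_square_index(p, q, eps=1e-12):
            # Chi-square-style divergence between two frequency profiles.
            return float(((p - q) ** 2 / (p + q + eps)).sum())

    Feature maps whose profiles diverge strongly between visible-light and thermal inputs would be candidates for the domain-variant class to drop.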

  • Mingyue SI, Bin QI, Wensheng ZHANG, Lei ZHANG
    Computer Engineering. 2024, 50(4): 41-49. https://doi.org/10.19678/j.issn.1000-3428.0069223

    A comprehensive model combining tensor computation and few-shot learning is proposed to address the problem of limited and difficult-to-obtain samples in intelligent transportation scenarios, where insufficient samples in the target domain lead to unsatisfactory training. A multi-dimensional computing model is constructed based on tensor computation; multi-dimensional heterogeneous data from intelligent transportation scenarios are processed; fused data tensors are obtained based on the spatio-temporal correlation of the data; the fused data are used as input for training few-shot learning models; and the performances of tensor few-shot learning models based on different tensor computation schemes, together with ablation results, are compared and analyzed. Simulation results show that, compared with two metric-based few-shot learning models, i.e., the prototype network and the matching network, the combination of a meta-learning-based few-shot learning model and a tensor computation model offers higher credibility. Moreover, under different tensor-fusion schemes, the accuracy and F1 values of the meta-learning model improve to varying degrees. The model based on the inverse-decomposition tensor-fusion scheme achieves a maximum accuracy of 0.95, outperforming the CANDECOMP/PARAFAC Decomposition(CPD) fusion scheme.

  • Liang HUANG, Peng ZOU, Jingjing CAO, Jian HU, Zexin YAN, Xiaodie HUANG
    Computer Engineering. 2024, 50(4): 50-59. https://doi.org/10.19678/j.issn.1000-3428.0069225

    To address the challenges posed by the diversity of sources, structures, and protocols in transportation infrastructure monitoring point data, this study analyzes the requirements for data access and proposes a virtual gateway, based on the Netty architecture, for accessing monitoring point data within a transportation infrastructure. The study describes in detail the configuration methods and cluster allocation strategies for integrating monitoring points through the gateway, defines the data transmission formats under protocols such as the HyperText Transfer Protocol(HTTP), Transmission Control Protocol(TCP), and User Datagram Protocol(UDP), and designs an encoding and verification mechanism for data transmission messages. Using augmented real monitoring data samples, the performance of the virtual gateway is evaluated with distributed message simulation tools. The findings reveal that the virtual gateway enables unified access to multipoint, multiprotocol transportation infrastructure monitoring data, achieving data access and storage times of 8.14 s and 9.75 s per hundred million records, respectively. The average time for data traceability is 2.96 s, showing the gateway's capacity to handle monitoring point data at the billion-scale level. This capability significantly supports the research and application of digital monitoring and transportation infrastructure analysis.
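
    As a hypothetical illustration of the message encoding and verification idea (the frame layout below is invented for the example and is not taken from the paper), a monitoring-point upload can be framed with a fixed header plus a CRC32 check:

        import struct
        import zlib

        def encode_message(point_id: int, payload: bytes) -> bytes:
            # Hypothetical frame: magic (2 B), point id (4 B), length (4 B), payload, CRC32.
            header = struct.pack(">HII", 0xA55A, point_id, len(payload))
            body = header + payload
            return body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)

        def verify_message(frame: bytes) -> bool:
            # Receiver side: recompute the checksum over everything but the last 4 bytes.
            body, crc = frame[:-4], struct.unpack(">I", frame[-4:])[0]
            return zlib.crc32(body) & 0xFFFFFFFF == crc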

  • Tiantian DU, Xiaolong WANG, Jing HE
    Computer Engineering. 2024, 50(4): 60-67. https://doi.org/10.19678/j.issn.1000-3428.0069244

    Real-time and accurate River Surface Velocity(RSV) data serve as a crucial foundation for modern waterway dispatching and flood prevention. However, most traditional velocity measurement methods require manual field participation, which poses high risks and cannot satisfy the demands of large-scale system deployment. By contrast, image-based velocity measurement methods, which do not require direct contact with the river, can provide near-real-time velocity information from continuous frames captured by cameras. Nevertheless, optical flow estimation, the mainstream image-based velocity measurement method, is designed for rigid object motion and lacks robustness in scenes with high self-similarity, such as river surfaces. To enhance the estimation accuracy of a water flow velocity algorithm based on the Recurrent All-Pairs Field Transformer(RAFT) optical flow model, a Convolutional Block Attention Module(CBAM) is introduced in the feature extraction section. This module effectively improves the ability of the RAFT model to recognize river surface ripples and the movement of tracer particles. The loss functions in the optical flow iteration section are optimized by incorporating an angular error loss and a divergence gradient smoothness loss, which reflect fluid motion characteristics. In addition, a weight factor that increases exponentially with the number of iterations is introduced to match the loss functions, emphasizing the significant effect of higher-order iterations on the overall result. Performance evaluations are conducted using river datasets from different scenarios to validate the effectiveness of the improved method. The results show that the proposed method yields an average relative error of 11.37% in complex optical noise scenarios, demonstrating good robustness and enabling the generation of more accurate spatial distribution maps of surface velocity.
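
    The angular error term mentioned above has a standard form in optical flow evaluation: each flow vector (u, v) is lifted to the 3D direction (u, v, 1), and the angle between prediction and ground truth is penalized. A sketch of that loss (the paper's exact iteration weighting is not reproduced here):

        import torch

        def angular_error_loss(flow_pred, flow_gt, eps=1e-7):
            # flow_*: (batch, 2, H, W); angle between (u, v, 1) direction vectors.
            u1, v1 = flow_pred[:, 0], flow_pred[:, 1]
            u2, v2 = flow_gt[:, 0], flow_gt[:, 1]
            num = u1 * u2 + v1 * v2 + 1.0
            den = torch.sqrt((u1 ** 2 + v1 ** 2 + 1.0) * (u2 ** 2 + v2 ** 2 + 1.0))
            return torch.acos(torch.clamp(num / den, -1.0 + eps, 1.0 - eps)).mean()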

  • Artificial Intelligence and Pattern Recognition
  • Chi ZHANG, Zhong WANG, Tianhao JIANG, Kangmin XIE
    Computer Engineering. 2024, 50(4): 68-77. https://doi.org/10.19678/j.issn.1000-3428.0068019

    To address the frequency-domain enhancement of speech affected by interference, a speech enhancement network based on a parallel multi-attention mechanism and an encoder-decoder structure, known as PMAN, is proposed. The network uses speech frequency-domain features obtained through the Short-Time Fourier Transform(STFT), including amplitude and complex spectra. The encoder integrates the input data using dense convolutional modules. The parallel multi-attention module of the intermediate layer learns both local and global information in the frequency domain and incorporates a Local Patch Attention(LPA) mechanism to capture the Two-Dimensional(2D) structure of the speech spectrum, achieving separation between clean speech and interference factors in the 2D space. The decoder integrates the learned information and generates amplitude masks and complex spectra separately. The final speech complex spectrum is obtained via weighted summation, and a joint time- and frequency-domain loss function is used to fuse the phase information. Experimental results on the VoiceBank+DEMAND speech dataset demonstrate that PMAN achieves better speech enhancement performance than the time-domain speech enhancement neural network based on a Two-Stage Transformer(TSTNN), with improvements of 10.8% in Perceptual Evaluation of Speech Quality(PESQ), 1.1% in Short-Time Objective Intelligibility(STOI), and 11.8% in Segmental Signal-to-Noise Ratio(SSNR).
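
    The amplitude and complex spectra that such a network consumes can be obtained with a routine STFT front end; a minimal sketch (window sizes are illustrative):

        import numpy as np
        from scipy.signal import stft, istft

        def spectra(wave, fs=16000, nperseg=512, noverlap=256):
            # Magnitude spectrum plus stacked real/imaginary parts, shape (2, F, T).
            _, _, zxx = stft(wave, fs=fs, nperseg=nperseg, noverlap=noverlap)
            return np.abs(zxx), np.stack([zxx.real, zxx.imag])

        def reconstruct(enhanced_zxx, fs=16000, nperseg=512, noverlap=256):
            # Inverse STFT back to a waveform once the network outputs a complex spectrum.
            _, wave = istft(enhanced_zxx, fs=fs, nperseg=nperseg, noverlap=noverlap)
            return wave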

  • Zhifen HAO, Kaipei DING, Ruich CAI, Wei CHEN
    Computer Engineering. 2024, 50(4): 78-86. https://doi.org/10.19678/j.issn.1000-3428.0066901

    Causal discovery aims to mine the causal relationships between variables from observed data. Most existing methods assume that the data-generation process is stationary; however, this assumption is often not satisfied in real application environments, leading to unreliable results. This study observes that non-stationary disturbances in some scenes are highly correlated with time-series information. Therefore, based on the additive noise model, this study models non-stationary disturbances as a mapping of time-series information and proposes a non-stationary additive noise model together with its identification conditions. Based on these conditions, a two-stage causal discovery algorithm is proposed. In the first stage, residuals are obtained through regression analysis and are used to evaluate independence when selecting a leaf node; the causal order of the observed variable set is then obtained iteratively until all variables have been included. In the second stage, regression analysis and independence tests are performed again to eliminate redundant causal relationships identified in the first stage, yielding the final causal structure of the observed variable set. Experimental results demonstrate that the proposed algorithm outperforms algorithms such as Constraint-based causal Discovery from heterogeneous/NOnstationary Data(CD-NOD), LPCMCI, and TiMINo. On synthetic datasets, the proposed algorithm achieves an average F1 value of 0.85; on real-world structural datasets, its F1 value increases by an average of 41.12%, signifying that the algorithm can learn more causal structure information from datasets with non-stationary variables.
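
    A toy version of the first-stage leaf selection, using plain linear regression and a crude Pearson proxy in place of the nonlinear regression and proper independence test the setting calls for, looks like this:

        import numpy as np
        from scipy.stats import pearsonr
        from sklearn.linear_model import LinearRegression

        def pick_leaf(X):
            # Regress each variable on the rest; the variable whose residual is
            # least correlated with the predictors is chosen as the next leaf.
            n, d = X.shape
            best_j, best_score = None, np.inf
            for j in range(d):
                others = np.delete(X, j, axis=1)
                resid = X[:, j] - LinearRegression().fit(others, X[:, j]).predict(others)
                score = max(abs(pearsonr(others[:, k], resid)[0]) for k in range(d - 1))
                if score < best_score:
                    best_j, best_score = j, score
            return best_j

    Iterating this procedure, removing the chosen leaf each time, yields the causal order; the second stage then prunes redundant edges.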

  • Jingcan LI, Cuilin XIAO, Xiaoting QIN, Xia XIE
    Computer Engineering. 2024, 50(4): 87-94. https://doi.org/10.19678/j.issn.1000-3428.0068501

    Relation extraction is a basic and important task that aims to extract the relations between entities from unstructured text. Recent developments show that Large Language Models(LLMs) and foundation models can improve the performance of several Natural Language Processing(NLP) tasks. These models exploit the language-representation ability of deep learning and pre-training and can automatically learn the semantic features of relations. However, an effective way to use a large model to solve the problems of entity overlap and unsatisfactory information exchange has yet to be established. Hence, a relation extraction model based on a large language model is proposed. First, the Large Language Model Meta AI(LLaMA) is adapted to the task via fine-tuning. To extract relations, the self-attention mechanism is used to enhance the correlation between entity pairs and the information sharing between entities, after which average pooling is performed to generalize over the entire sentence. A filtering matrix is designed for entity pairs, part-of-speech information is introduced to enhance semantics, and invalid triples are filtered out based on the relevance of entity pairs in the filtering matrix. Experimental results show that the F1 values of the proposed model on the New York Times(NYT) and WebNLG open datasets are 93.1% and 90.4%, respectively. With the fine-tuned LLaMA model as the encoder, the proposed model outperforms the baseline models in terms of accuracy and F1 value, verifying its effectiveness.

  • Zhengyang WU, Guangtao ZHANG, Li HUANG, Yong TANG
    Computer Engineering. 2024, 50(4): 95-103. https://doi.org/10.19678/j.issn.1000-3428.0067554

    The network formed by online education platforms is characterized by large amounts of data, rich entity types, and complex relationships. While online education is being popularized, online courses face the problems of low utilization, low completion, and high dropout rates. Personalized course recommendation helps improve students' enthusiasm for learning, and whether a course can be successfully completed is an important factor that students consider when selecting courses. Considering this, this study proposes a personalized course recommendation model based on learning completion prediction. The approach models each student's course learning session graph and generates a learning status representation according to the course learning sequence and course completion. Simultaneously, considering the influence of online learning environment factors on courses, a heterogeneous graph of online course learning is constructed, and a graph neural network is used to generate the embeddings of course nodes in the graph. Thereafter, the course embeddings are fused with the student's learning status representation through an interactive mechanism to predict the degree of completion of the next course the student will take. Finally, candidate courses are ranked according to the predicted degree of completion. Experimental results on three large-scale online course learning datasets, namely CNPC, HMXPC, and Scholat, demonstrate that the model effectively improves recommendation accuracy, with significant improvements in both the NDCG and MRR metrics over the best baseline results. With an evaluation cutoff K of 5, NDCG@5 improves by 21.08%, 17.73%, and 5.41%, and MRR@5 improves by 25.66%, 31.59%, and 26.96% on the three datasets, respectively.

  • Zhilei XU, Rui HUANG
    Computer Engineering. 2024, 50(4): 104-112. https://doi.org/10.19678/j.issn.1000-3428.0067602

    In multilabel learning, the classification performance can be improved through the effective use of label correlations. However, owing to the subjectivity of manual tagging and the similarity of label semantics in practical applications, an incomplete label space is typically observed, which results in an inaccurate estimation of label correlations and thus degraded algorithm performance. Hence, a Multilabel Learning with incomplete labels using Dual-Manifold Mapping(ML-DMM) algorithm is proposed. The algorithm constructs two types of manifold mappings: feature manifold mapping, which preserves local structural information in the instance data space, and label manifold mapping, which is based on label correlations obtained through iterative learning. The algorithm first constructs a low-dimensional manifold of data through Laplace mapping and then maps the original feature space and original label space onto the low-dimensional manifold via a regression coefficient matrix and label correlation matrix, respectively. Thus, a dual-manifold mapping structure is formed to improve the algorithm performance. Finally, the regression coefficient matrix obtained via iterative learning is used for multilabel classification. Experimental results on eight multilabel datasets with three missing rates of class labels show that ML-DMM performs better than other multilabel classification methods for missing labels.

  • Jida ZHAO, Guoyong ZHEN, Chengqun CHU
    Computer Engineering. 2024, 50(4): 113-120. https://doi.org/10.19678/j.issn.1000-3428.0068268

    In Unmanned Aerial Vehicle(UAV) target detection tasks, missed and false detections are caused by the small size of the detection targets and the complex backgrounds of the images. To address the problem of small target detection, a UAV image target detection algorithm based on an improved YOLOv8s is proposed. First, because drone-captured targets are generally small, the number of Backbone layers is reduced and the size of the feature map to be detected is increased, allowing the network to focus more on small targets. Second, because low-quality examples in a dataset commonly degrade training, the Wise-IoU loss function is introduced to enhance the training effect. Third, a context enhancement module is introduced to capture the characteristic information of small targets across different receptive fields, improving the localization and classification of small targets in complex environments. Finally, a spatial-channel filtering module is designed to enhance target characteristics during convolution and filter out useless interference information, addressing the problem of small-target features being submerged and lost during convolution. Experimental results on the VisDrone2019 dataset demonstrate that the average detection accuracy(mAP@0.5) of the proposed algorithm reaches 45.4%, which is 7.3 percentage points higher than that of the original YOLOv8s algorithm, while the number of parameters is reduced by 26.13%. Under similar experimental conditions, compared with other common small target detection algorithms, the detection accuracy and speed are improved to a certain extent.

  • Minghu WANG, Zhikui SHI, Jia SU, Xinsheng ZHANG
    Computer Engineering. 2024, 50(4): 121-131. https://doi.org/10.19678/j.issn.1000-3428.0068307

    Since the emergence of recommendation systems, the development of recommendation algorithms has been constrained by limited data. To reduce the impact of data sparsity and enhance the utilization of non-rating data, text-based recommendation models built on neural networks have been successively proposed. However, mainstream convolutional and recurrent neural networks have clear disadvantages in text semantic understanding and in capturing long-distance dependencies. To better explore the deep latent features between users and items and further improve recommendation quality, a sequence recommendation method based on RoBERTa and a Graph-enhanced Transformer(RGT) is proposed. The model incorporates textual review data, first utilizing a pre-trained RoBERTa model to capture the semantic features of words in review text and thereby model the personalized interests of the user. Subsequently, based on the historical interaction information between users and items, a graph attention network with the temporal characteristics of item associations is constructed. Using the graph-enhanced Transformer method, the item feature representations learned by the graph model are sequentially input to the Transformer encoding layer. Finally, the resulting output vectors, along with the previously captured semantic features and the computed global representation of the item association graph, are fed into a fully connected layer to capture the user's global interest preferences and predict item ratings. Experimental results on three real Amazon public datasets demonstrate that the proposed model significantly improves the Root Mean Square Error(RMSE) and Mean Absolute Error(MAE) compared with several classical text recommendation models, such as DeepFM and ConvMF, with maximum improvements of 4.7% and 5.3%, respectively, over the best comparison model.
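
    The first step, turning review text into semantic features with a pre-trained RoBERTa, can be sketched as follows (mean pooling is one common choice; the paper's exact pooling is not specified here):

        import torch
        from transformers import AutoModel, AutoTokenizer

        tok = AutoTokenizer.from_pretrained("roberta-base")
        enc = AutoModel.from_pretrained("roberta-base")

        def review_embedding(text: str) -> torch.Tensor:
            # Mean-pooled last-layer token states as the review's semantic vector.
            inputs = tok(text, return_tensors="pt", truncation=True, max_length=256)
            with torch.no_grad():
                hidden = enc(**inputs).last_hidden_state  # (1, seq_len, 768)
            return hidden.mean(dim=1).squeeze(0)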

  • Huazhen WANG, Ze XU, Yue SUN, Bin QIU, Jian CHEN, Qiangbin QIU
    Computer Engineering. 2024, 50(4): 132-140. https://doi.org/10.19678/j.issn.1000-3428.0067498

    Multi-label event prediction refers to predicting whether multiple associated events will occur in the future; in contrast with conventional single-label event prediction, it requires the simultaneous prediction of multiple target events. Because multi-label event contexts arise in various fields and studies on multi-label event prediction are few, this paper proposes a Multi-Label Event Prediction(MLEP) model based on an Event Evolution Graph(EEG). First, an EEG is constructed based on event chains. Subsequently, the multi-label event prediction problem is transformed into a single-label problem, and vector representations of all events are obtained using event representation learning methods to encode multi-label events. Finally, a multi-label event prediction model is constructed within the Gated Graph Neural Network(GGNN) framework, and the optimal subsequent event is matched by similarity to predict multi-label events. Experimental results on real datasets show that the proposed MLEP model can effectively predict multi-label events with a prediction accuracy of 65.58%, outperforming most existing benchmark models by more than 4.94%. Ablation experiments show that better event representation learning methods yield better event representations and multi-label event predictions.

  • Chunxia YANG, Yalei WU, Han YAN, Yukun HUANG
    Computer Engineering. 2024, 50(4): 141-149. https://doi.org/10.19678/j.issn.1000-3428.0067557

    Aspect-level sentiment analysis aims to determine the sentiment polarity of a given aspect in a sentence. Existing graph neural network-based aspect-level sentiment analysis has two shortcomings: it ignores the different types of syntactic dependencies and the word co-occurrence information in the corpus, and it cannot accurately control the flow of sentiment information toward the given aspect. To address these problems, this study proposes an aspect-level sentiment analysis model that combines dual graph convolution with a Gated Linear Unit(GLU). The model first uses a global vocabulary graph to encode word co-occurrence information in the corpus, and then uses a classification summary structure to distinguish co-occurrence frequencies and different types of syntactic dependencies on the vocabulary and syntax graphs. Double-layer convolution is performed on the two graphs, and a BiAffine transform module serves as a bridge to exchange relevant features between the two Graph Convolutional Network(GCN) modules, effectively integrating syntactic and lexical information. Finally, the GLU controls the flow of sentiment information toward the given aspect, allowing the model to focus on the sentiment information relevant to that aspect and preventing irrelevant sentiment information from affecting the results, thereby improving accuracy. The experimental results demonstrate that on the Twitter, Laptop14, Restaurant15, and Restaurant16 datasets, the model achieves accuracies of 74.82%, 77.61%, 82.29%, and 89.81%, and F1 values of 72.97%, 73.52%, 67.72%, and 73.37%, respectively. Its aspect-level sentiment classification performance is significantly better than those of the other baseline models.
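
    The gating step can be read as a standard GLU: a linear value path multiplied by a sigmoid gate path conditioned on the aspect. A minimal PyTorch sketch of that idea (module and tensor names are illustrative, not the paper's):

        import torch
        import torch.nn as nn

        class AspectGate(nn.Module):
            # GLU-style gate: the aspect embedding decides how much of each
            # hidden feature flows into the aspect's sentiment representation.
            def __init__(self, dim):
                super().__init__()
                self.value = nn.Linear(dim * 2, dim)
                self.gate = nn.Linear(dim * 2, dim)

            def forward(self, hidden, aspect):
                # hidden: (batch, seq, dim); aspect: (batch, dim), broadcast over seq.
                a = aspect.unsqueeze(1).expand_as(hidden)
                x = torch.cat([hidden, a], dim=-1)
                return self.value(x) * torch.sigmoid(self.gate(x))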

  • Shaocong MO, Qingfeng CHEN, Ze XIE, Chunyu LIU, Junlai QIU
    Computer Engineering. 2024, 50(4): 150-159. https://doi.org/10.19678/j.issn.1000-3428.0067814

    Entity alignment is an effective approach for multi-source database fusion that aims to identify co-referring entities in multi-source knowledge graphs. Recently, Graph Convolutional Networks(GCNs) have emerged as a new paradigm for entity alignment representation learning. However, different organizations build knowledge graphs with significantly different objectives and rules, which requires entity alignment models to accurately capture the long-tail entity features across knowledge graphs. Moreover, existing GCN entity alignment models focus excessively on the structural representation of relationship triplets and neglect the rich semantic information of attribute triplets. Accordingly, an entity alignment model is proposed that introduces a dynamic graph attention network to aggregate attribute triplet representations and reduce the impact of irrelevant attribute structures on entity representations. Simultaneously, to alleviate the problem of heterogeneous relationships in knowledge graphs, multi-dimensional label propagation is introduced to compress the different dimensions of the entity adjacency matrix; entity features are propagated along the compressed adjacency relationships to obtain a relationship structure representation. Finally, a linear programming algorithm iterates over the entity representation similarity matrix to obtain the final alignment result. Experiments on the publicly available datasets EN-FR-15K and EN-ZH-15K and the Chinese medical dataset MED-BBK-9K demonstrate that the model achieves Hits@1 of 0.942, 0.926, and 0.427, Hits@10 of 0.963, 0.952, and 0.604, and Mean Reciprocal Rank(MRR) values of 0.949, 0.939, and 0.551, respectively. Ablation results verify the effectiveness of each module in the model.

  • Haipeng WU, Yurong QIAN, Hongyong LENG
    Computer Engineering. 2024, 50(4): 160-167. https://doi.org/10.19678/j.issn.1000-3428.0067700

    Conventional relation extraction methods identify the relationships between pairs of entities from plain text, whereas multimodal relation extraction methods enhance relation extraction by leveraging information from multiple modalities. To address the issue of existing multimodal relation extraction models being easily disturbed by redundant information when processing image data, this study proposes a multimodal relation extraction model based on a bidirectional attention mechanism. First, Bidirectional Encoder Representations from Transformers(BERT) and a scene graph generation model are used to extract textual and visual semantic features, respectively. Subsequently, a bidirectional attention mechanism establishes alignment from images to text and from text to images, facilitating bidirectional information exchange. This mechanism assigns lower weights to redundant information in images, reducing interference with the semantic representation of the text and mitigating the adverse effect of redundant information on relation extraction. Finally, the aligned textual and visual feature representations are concatenated to form integrated text and image features, and a Multi-Layer Perceptron(MLP) computes the probability scores for all relation classes and outputs the predicted relations. Experimental results on the Multimodal dataset for Neural Relation Extraction(MNRE) show that the model achieves precision, recall, and F1 scores of 65.53%, 69.21%, and 67.32%, respectively, significantly higher than those of the baseline models, demonstrating an effective improvement in relation extraction.
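
    The bidirectional alignment can be realized with two cross-attention passes, one in each direction; a compact sketch under that reading (class and variable names are illustrative):

        import torch
        import torch.nn as nn

        class BidirectionalAlign(nn.Module):
            # Text attends to image regions and image regions attend to text tokens;
            # redundant visual regions receive low attention weights.
            def __init__(self, dim, heads=8):
                super().__init__()
                self.t2v = nn.MultiheadAttention(dim, heads, batch_first=True)
                self.v2t = nn.MultiheadAttention(dim, heads, batch_first=True)

            def forward(self, text, visual):
                text_aligned, _ = self.t2v(text, visual, visual)   # text queries image
                visual_aligned, _ = self.v2t(visual, text, text)   # image queries text
                return text_aligned, visual_aligned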

  • Hongchen ZHANG, Linyu LI, Li YANG, Chenjun SAN, Chunlin YIN, Bing YAN, Hong YU, Xuan ZHANG
    Computer Engineering. 2024, 50(4): 168-176. https://doi.org/10.19678/j.issn.1000-3428.0067543

    A knowledge graph is a structured knowledge base comprising various types of knowledge or data units obtained through extraction and other processes; it describes and represents information such as entities, concepts, facts, and relationships. The limitations of Natural Language Processing(NLP) technology and the noise present in source texts affect the accuracy of information extraction. Existing Knowledge Graph Completion(KGC) methods typically account for only structural information or only textual semantic information, disregarding the combination of structural and textual semantics across the knowledge graph. Hence, a KGC model based on contrastive learning and language model-enhanced embedding is proposed. The input entities and relationships are encoded with a pre-trained language model to obtain their textual semantic information, and the distance scoring function of a translation model captures the structured information in the knowledge graph. Two negative sampling methods for contrastive learning are fused into training to improve the model's ability to represent positive and negative samples. Experimental results show that, compared with the Bidirectional Encoder Representations from Transformers for Knowledge Graph completion(KG-BERT) model, the proposed model improves the Hits@10 indicator (the average proportion of triples ranked at 10 or better) by 31% and 23% on the WN18RR and FB15K-237 datasets, respectively, demonstrating its superiority over similar models.

  • Graphics and Image Processing
  • Mingxu MA, Hong MA, Huawei SONG
    Computer Engineering. 2024, 50(4): 177-186. https://doi.org/10.19678/j.issn.1000-3428.0067733

    To address the poor performance of existing pose estimation algorithms in detecting small target pedestrians in urban streetscapes, this study proposes a pose estimation algorithm for small target pedestrians, YOLO-Pose-CBAM, based on YOLO-Pose. First, the CBAM attention module is introduced to strengthen the network's focus on small target pedestrian regions and improve the algorithm's sensitivity to small target pedestrians without excessively increasing computation. Simultaneously, four detection heads of different sizes are used in the backbone network to enrich detection for pedestrians of different sizes. Second, two cross-layer cascade channels are constructed between the Backbone and Neck, improving feature fusion between the shallow and deep networks, further enhancing information exchange and reducing the missed detection rate for small target pedestrians. Furthermore, the SIoU is introduced to redefine the localization loss of bounding box regression, accelerating training convergence and improving detection accuracy. Finally, the k-means++ algorithm is used instead of k-means to cluster the labeled anchor boxes in the dataset, avoiding the local optima caused by cluster center initialization and selecting anchor boxes better suited to small target pedestrian detection. Experimental results show that the Average Precision(AP) of the proposed algorithm on the small target pedestrian WiderKeypoints dataset is improved by 4.6 percentage points compared with YOLO-Pose and by 6.5 percentage points compared with YOLOv7-Pose.
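
    The anchor clustering step is standard enough to sketch: k-means++ seeding spreads the initial centers, avoiding the poor local optima that random initialization can produce. Note this sketch clusters in Euclidean (w, h) space, whereas anchor clustering is often done with an IoU-based distance:

        import numpy as np
        from sklearn.cluster import KMeans

        def anchor_boxes(wh, k=9, seed=0):
            # wh: (N, 2) array of labeled box widths and heights.
            km = KMeans(n_clusters=k, init="k-means++", n_init=10,
                        random_state=seed).fit(wh)
            centers = km.cluster_centers_
            return centers[np.argsort(centers.prod(axis=1))]  # sorted by area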

  • Yu AN, Haibo GE, Wenhao HE, Sai MA, Mengyang CHENG
    Computer Engineering. 2024, 50(4): 187-196. https://doi.org/10.19678/j.issn.1000-3428.0067601

    To tackle prevalent challenges in visual object tracking, including variations in target size, motion blur, occlusion, and interference from similar objects, CDAM-Siam, a Siamese network tracking algorithm with a Compensatory Dual Attention Mechanism(CDAM), is introduced. First, the ResNet-50 network is used as the backbone of the Siamese network to extract features at different levels, deepening the network while fully utilizing the features extracted from different layers. The algorithm integrates a compensatory dual attention network that enhances key features and suppresses redundant edge details, improving robustness in complex environments. Finally, a feature fusion network is added to the backbone to effectively fuse feature maps from different levels into high-resolution, informative feature maps, ultimately achieving accurate target tracking. After training on the GOT-10K and YouTube-BB datasets, CDAM-Siam was evaluated on the OTB100 dataset: its tracking success rate and accuracy were 68.3% and 89.5%, respectively, while tracking at up to 56 frames per second, meeting real-time requirements. On the VOT2018 dataset, it achieves 53.8% accuracy, 39.4% robustness, and a 26.5% Expected Average Overlap(EAO).

  • Ruikang LIU, Weiming LIU, Mengfei DUAN, Wei XIE, Yuan DAI
    Computer Engineering. 2024, 50(4): 197-207. https://doi.org/10.19678/j.issn.1000-3428.0067217

    Recently, Transformers have achieved more competitive results than Convolutional Neural Networks(CNNs) in foreign object detection owing to their global self-attention. However, they still face problems such as high computational cost, a fixed scale of input image patches, and limited interaction between local and global information. To address these challenges, a DualFormer model is proposed that incorporates a dual-channel Transformer backbone, pyramid lightweight Transformer blocks, and a channel cross-attention mechanism, aiming to detect foreign objects in the gap between metro platform screen doors and train doors. A dual-channel strategy addresses the fixed patch size issue by designing two feature extraction channels for input image patches of different scales, improving the network's ability to extract both coarse-grained and fine-grained features and enhancing the recognition accuracy of multiscale targets. To reduce computational cost, a pyramid lightweight Transformer block introduces cascaded convolution into the Multi-Head Self-Attention(MHSA) module, leveraging the dimensionality compression capability of convolution. To improve the interaction between local and global information, a channel cross-attention mechanism lets coarse-grained and fine-grained features interact at the channel level, optimizing the weight allocation of local and global information in the network. The results demonstrate that DualFormer achieves a mean average precision of 89.7% on the standardized metro anomaly detection dataset, with a detection speed of 24 frames/s and 1.98×10^7 model parameters, outperforming existing Transformer detection algorithms.

  • Jie BAI, Yan ZHAO
    Computer Engineering. 2024, 50(4): 208-218. https://doi.org/10.19678/j.issn.1000-3428.0066413

    Many existing image hashing algorithms can only handle grayscale images. This study proposes a hashing algorithm based on quaternion Laguerre moments and a three-dimensional energy structure to broaden the applicability and improve the performance of image hashing, particularly its robustness against rotation attacks. First, the input color image is preprocessed and multiscale fusion is performed, and the Laguerre moment coefficients extracted from the fused image serve as its global features. The energy information of the fused image is used to establish a model in the YCbCr color space, and the angles between the horizontal plane and the lines connecting energy peak and valley points at different viewpoints in the Three-Dimensional(3D) model are selected as local structural features. Rotation-invariant features are extracted from the positions of near and far points on specific points and on each contour of the 3D model. Finally, the global and 3D structural features are combined, quantized, and encrypted to generate the hash sequence. The results show that the Receiver Operating Characteristic(ROC) curve exhibits a correct acceptance rate of 0.999 2 when the false acceptance rate is 0. A hash sequence length of 120 bits offers good compactness, and the average computation time is 0.097 9 s. In copy detection experiments, the average recall and precision rates of the algorithm over multiple extraction experiments exceed 95.83%.
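
    Once hash sequences are generated, copy detection reduces to comparing them; a normalized Hamming similarity is the usual choice (the threshold value below is illustrative):

        import numpy as np

        def hamming_similarity(h1, h2):
            # Fraction of matching bits between two binary hash sequences.
            h1 = np.asarray(h1, dtype=np.uint8)
            h2 = np.asarray(h2, dtype=np.uint8)
            return 1.0 - np.count_nonzero(h1 != h2) / h1.size

        def is_copy(h1, h2, threshold=0.9):
            # Pairs above the similarity threshold are treated as copies.
            return hamming_similarity(h1, h2) >= threshold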

  • Zhenlu LI, Wei HUANG, Kai SUN
    Computer Engineering. 2024, 50(4): 219-227. https://doi.org/10.19678/j.issn.1000-3428.0067576

    Road target recognition is a core technology in intelligent transportation systems for alleviating urban congestion. However, existing algorithms exhibit unsatisfactory recognition performance in complex traffic environments, with numerous missed and false detections, and their large parameter counts render them unsuitable for deployment on resource-limited mobile devices. Hence, a lightweight road target recognition algorithm for complex environments is proposed. A reconfigurable feature extraction framework is designed based on the structure of the Single Shot MultiBox Detector(SSD) algorithm: three lightweight modules construct the shallow feature extraction network, and a custom Additional Block constructs the deep feature extraction network. A channel attention mechanism and a Lightweight Receptive Field expansion(RFB-L) module improve the detection performance of the model on targets of various sizes, and custom pixel- and channel-information fusion modules combine shallow and deep features to enrich the information in the detection feature maps. Meanwhile, a multi-feature fusion learning rate adjustment algorithm is proposed to ensure stable convergence during training. A custom-built dataset reflecting the complex and congested roads of Hohhot is used to train and test the proposed algorithm. Comparative experiments with mainstream algorithms show that the proposed algorithm performs significantly better than the YOLOv4-tiny and YOLOv5s algorithms at the same parameter count, and its detection accuracy is similar to that of the YOLOv5m algorithm with less than 40% of the parameters. Its inference time and mean Average Precision(mAP) are 12.8 ms and 99.1%, respectively.

  • Liqun CUI, Huawei CAO
    Computer Engineering. 2024, 50(4): 228-236. https://doi.org/10.19678/j.issn.1000-3428.0067790

    Although target detection technology has advanced, many challenges remain in remote sensing image detection. An improved YOLOv5-based remote sensing image target detection algorithm is proposed to address the low detection accuracy caused by complex backgrounds, large differences in target scale, and arbitrary target orientation in remote sensing images. First, a joint multiscale feature enhancement network with attention is constructed to fully fuse high-level and low-level features so that the feature layers contain both semantic and rich detailed information; during fusion, a feature focusing module helps the model select key features and suppress irrelevant information. Second, a Receptive Field Block(RFB) updates the fused feature map, expanding its receptive field and reducing feature information loss. Finally, rotation angles are added to the targets, and circular smooth labels transform the angle regression problem into a classification problem, improving the accuracy of remote sensing target localization. Experimental results on the large-scale Dataset for Object deTection in Aerial images(DOTA) show that, compared with the YOLOv5 algorithm, the mean Average Precision(mAP) of the proposed algorithm at Intersection over Union(IoU) thresholds of 0.5 and 0.5-0.95 (mAP@0.5 and mAP@0.5:0.95) increases by 7.3 and 3.3 percentage points, respectively. The algorithm thus significantly improves detection accuracy for remote sensing image targets against complex backgrounds and reduces missed and false detections.
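
    Circular smooth labels turn angle regression into classification while keeping neighboring angles such as 179 deg and 0 deg close; a sketch of the encoding with a Gaussian window (the window width is illustrative):

        import numpy as np

        def circular_smooth_label(angle_deg, n_bins=180, sigma=4.0):
            # Gaussian window centered on the angle bin, wrapped circularly,
            # so the loss has no jump at the angular boundary.
            bins = np.arange(n_bins)
            center = int(round(angle_deg)) % n_bins
            d = np.minimum(np.abs(bins - center), n_bins - np.abs(bins - center))
            return np.exp(-(d ** 2) / (2 * sigma ** 2))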

  • Yudan YANG, Junhua ZHANG, Yunfeng LIU
    Computer Engineering. 2024, 50(4): 237-246. https://doi.org/10.19678/j.issn.1000-3428.0067751

    The automatic segmentation of spine Computed Tomography(CT) images can assist doctors in diagnosing related diseases. Compared with Three-Dimensional(3D) reconstruction after Two-Dimensional(2D) segmentation, direct 3D segmentation is more convenient and retains the spatial information of the image. To address the low accuracy of 3D spine segmentation, a U-Net based on 3D recurrent residual convolution is proposed for segmenting spine CT images. A coordinate attention mechanism is introduced at the front of the network to focus it on the region of interest. A 3D recurrent residual module replaces the standard convolution module to accumulate features effectively and mitigate vanishing gradients, and an efficient connected hybrid convolution module is added to preserve fine features. A dual-feature residual attention module replaces the skip connections for multiscale fusion, fusing high- and low-level semantics and modeling the global context by aggregating features at different levels to improve segmentation performance. On the public CSI2014 dataset, the model achieves a Dice Similarity Coefficient(DSC) of 93.85%, which is 1.77-7.65 percentage points higher than those of six other 3D segmentation networks and 1.67-10.85 percentage points higher than those of other spine segmentation methods. On a local lumbar dataset, its DSC is 1.51-19.86 percentage points higher than those of the six other segmentation models, verifying the effectiveness of the proposed method and the feasibility of applying it to computer-aided diagnosis and treatment.

  • Yanhong LIU, Qiuxiang YANG, Shuai HU
    Computer Engineering. 2024, 50(4): 247-257. https://doi.org/10.19678/j.issn.1000-3428.0068583

    Haze, formed by the accumulation and concentration of atmospheric pollutants under meteorological conditions such as temperature inversion, severely limits visibility. Image dehazing techniques aim to eliminate haze-induced problems such as image blur and low contrast, thereby enhancing image clarity and visibility; however, the loss of image details remains a challenge. To address this issue, a feature difference-based multiscale feature fusion dehazing network, FD-CA dehaze, is proposed. In this network, the basic block structure of FFA-Net is enhanced by extracting intermediate feature information along the feature difference, coordinate, and channel dimensions. An Effective Coordinate Attention(ECA) module combining global pooling, max pooling, and coordinate positional information is introduced to mitigate the loss of positional information during feature fusion. By integrating channel attention with the ECA module, a Dual Attention(D-CA) module is constructed that makes better use of spatial and channel information, giving the model enhanced performance in image dehazing tasks. Furthermore, the loss function is improved by combining the L1 loss with a perceptual loss. Experimental results on the Synthetic Objective Test Set(SOTS) and the Hybrid Subjective Test Set(HSTS) show that FD-CA dehaze achieves a Peak Signal-to-Noise Ratio(PSNR) of 37.93 dB and a Structural Similarity Index(SSIM) of 0.990 5, a significant improvement and better dehazing performance compared with classic dehazing networks such as FFA-Net and GridDehazeNet.

  • Development Research and Engineering Application
  • Zhengxue LI, Zhiming LI, Dezhong PENG, Jie CHEN
    Computer Engineering. 2024, 50(4): 258-266. https://doi.org/10.19678/j.issn.1000-3428.0067327

    User classification in social networks aims to infer users' interests and hobbies from their personal attributes and social relations, and can be regarded as a node classification problem on graph data. Most node classification algorithms based on Graph Convolutional Neural Networks(GCNs) handle datasets with high homogeneity well; however, social network datasets often exhibit low homogeneity. This study proposes a feature Contrastive Learning-based GCN(CLGCN) model to alleviate this problem. A similarity matrix is constructed from the combined labels during the pretraining stage and used to perform the graph convolution operation. Using feature contrastive learning, node pairs are defined as positive or negative sample pairs according to whether they belong to the same or different categories. By minimizing the feature contrastive learning loss, the representations of same-category node pairs become more similar and those of different-category node pairs become more distinguishable. Experimental results on three low-homogeneity social network datasets demonstrate that the proposed model achieves node classification accuracies of 93.5%, 81.4%, and 67.9%, respectively, all better than those of the comparison models.
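
    The feature contrastive objective described above is, in spirit, a pairwise loss that pulls same-class node representations together and pushes different-class pairs apart; one standard formulation (not necessarily the paper's exact loss):

        import torch
        import torch.nn.functional as F

        def pair_contrastive_loss(z, labels, margin=1.0):
            # z: (n, d) node representations; labels: (n,) class ids.
            dist = torch.cdist(z, z)
            same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
            pos = same * dist.pow(2)                         # pull same-class pairs
            neg = (1 - same) * F.relu(margin - dist).pow(2)  # push different-class pairs
            off_diag = 1 - torch.eye(len(z), device=z.device)
            return ((pos + neg) * off_diag).sum() / off_diag.sum()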

  • Yiheng ZHANG, Yian LIU, Hailing SONG
    Computer Engineering. 2024, 50(4): 267-276. https://doi.org/10.19678/j.issn.1000-3428.0067092

    Frequency-hopping technology offers excellent anti-jamming and multiple-access performance. However, Frequency-Hopping Sequence(FHS) design faces the problems of poor performance indicators and the difficulty of balancing multiple indicators. Therefore, an FHS design method based on an Enhanced Runge-Kutta optimizer(ERUN) is proposed. First, an objective function is constructed from the Hamming correlation, complexity, uniformity, and average frequency-hopping interval of the FHS, establishing an FHS design model suitable for heuristic optimization algorithms. Thereafter, to remedy the slow convergence and poor optimization accuracy of the Runge-Kutta optimizer(RUN) on complex optimization problems, the ERUN is proposed: it uses chaos opposition-based learning to improve the quality of the initial population, obtains better individual update directions via quadratic interpolation, and helps the population escape local optima through adaptive t-distribution perturbation. Test results on six benchmarks and the objective function demonstrate that ERUN converges faster and with higher solution accuracy than three recent RUN variants. The obtained FHS is applied to a frequency-hopping system; experimental results demonstrate that the Bit Error Rate(BER) of this method is approximately 4% in a fixed jamming environment and does not increase significantly in a changing jamming environment, demonstrating strong anti-jamming ability and adaptability to complex environments.
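
    The chaos opposition-based initialization admits a compact sketch: iterate a logistic map for diversity, add the opposite points, and keep the fitter half (a generic rendering of the technique; ERUN's exact scheme may differ):

        import numpy as np

        def chaotic_opposition_init(pop_size, dim, lb, ub, fitness, iters=20):
            # Seed the logistic chaotic map, then iterate it to decorrelate values.
            x = np.random.default_rng(0).uniform(0.01, 0.99, (pop_size, dim))
            for _ in range(iters):
                x = 4.0 * x * (1.0 - x)
            pop = lb + (ub - lb) * x
            opp = lb + ub - pop                # opposition-based candidates
            both = np.vstack([pop, opp])
            scores = np.apply_along_axis(fitness, 1, both)
            return both[np.argsort(scores)[:pop_size]]  # keep the fitter half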

  • Wenbin WANG, Zhenjiang QIAN, Yong JIN, Gaofei SUN, Xiaoshuang XING, Chao SU, Tianqi SUN
    Computer Engineering. 2024, 50(4): 277-285. https://doi.org/10.19678/j.issn.1000-3428.0067091

    The correctness of file system design and implementation is essential to constructing a trusted operating system; even widely used file systems can contain bugs. Using formal methods to verify the correctness of file system design and implementation is feasible. However, most current formal verification research on file systems targets macro-kernel operating systems, whereas the verification of file systems under micro-kernel architectures is lacking. Accordingly, this study proposes a formal design and verification method for a file system with an inline data mechanism under a micro-kernel architecture. Based on Higher-Order Logic(HOL) and automaton models, the working state of the file system is constructed by abstracting its working objects and system resources into system objects, and the functional semantics of the relevant file system calls are formally described. The process of invoking and providing a service is abstracted as a transition of the system's working state, and correctness assertions for the file system's functions and security attributes are given. Taking the Verified Secure File System(VSFS), the file system of the implemented microkernel Verified Secure Operating System(VSOS), as an example, the finite state machine model of VSFS is built in the design stage, the Portable Operating System Interface of UNIX(POSIX) system calls of VSFS are abstractly described in Isabelle/HOL, the correctness assertions of VSFS are analyzed and summarized, and theorem proving is used to verify its correctness. The experimental results show that the proposed method can complete a fine-grained formal verification of the VSFS finite state machine model in Isabelle/HOL and meet the expected security requirements.

  • Shuai HU, Hualing LI, Dechen HAO
    Computer Engineering. 2024, 50(4): 286-293. https://doi.org/10.19678/j.issn.1000-3428.0067779

    Medical image segmentation accuracy plays a key role in clinical diagnosis and treatment. However, because of the complexity of medical images and the diversity of target regions, existing medical image segmentation methods suffer from incomplete edge-region segmentation and insufficient use of image context features. An improved U-Net-based medical image segmentation network with Multistage Edge Enhancement(MEE), known as MDU-Net, is proposed to solve these problems. First, an MEE module is added to the encoder structure to extract double-layer low-stage feature information, and the rich edge information in the feature layer is obtained by dilated convolution blocks with different dilation rates. Second, a Detailed Feature Association(DFA) module that integrates the feature information of adjacent layers is embedded in the skip connection to obtain deep-stage, multiscale context information. Finally, the feature information extracted by the different modules is aggregated in the corresponding feature layers of the decoder structure, and the final segmentation result is obtained by an upsampling operation. Experimental results on two public datasets show that, compared with other models such as TransUNet(Transformers make strong encoders for medical image segmentation), the MDU-Net model efficiently uses the feature information of different feature layers and achieves improved segmentation in edge regions.
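
    The exact MEE layout is not specified here; the following is a minimal PyTorch sketch of the dilated-convolution idea the abstract describes, with the channel count and dilation rates (1, 2, 4) assumed for illustration.

        import torch
        import torch.nn as nn

        class EdgeEnhanceBlock(nn.Module):
            # Parallel 3x3 convolutions with growing dilation rates enlarge the
            # receptive field without downsampling; a 1x1 conv fuses the branches.
            def __init__(self, ch, rates=(1, 2, 4)):
                super().__init__()
                self.branches = nn.ModuleList(
                    nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates
                )
                self.fuse = nn.Conv2d(ch * len(rates), ch, 1)

            def forward(self, x):
                feats = [torch.relu(b(x)) for b in self.branches]
                return self.fuse(torch.cat(feats, dim=1))

        y = EdgeEnhanceBlock(32)(torch.randn(1, 32, 64, 64))   # -> (1, 32, 64, 64)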

  • Yuanfei DENG, Jiawei LI, Yuncheng JIANG
    Computer Engineering. 2024, 50(4): 294-302. https://doi.org/10.19678/j.issn.1000-3428.0067595

    A patent is a legal right conferred on inventors to protect their inventions for a limited time, and it plays a crucial role in present-day social activities. Existing research has not adapted models to patent similarity data, which degrades the matching of patent phrase similarity. Previous research has shown that in low-resource scenarios, prompt learning uses text fragments (i.e., templates) as input, transforming the classification problem into a masked language modeling problem; a key step is constructing a projection between the label space and the label word space. This study presents a knowledge-based prompt learning method and applies it to the similarity matching of patent phrases. To address the scarcity of information carried by patent phrases, the similarity label information and external knowledge are used to enhance the phrases and labels. The study first establishes the relationship between patent phrases and external knowledge using entity-linking technology. It then designs a neighborhood information filtering mechanism based on the degree of entity influence to alleviate the insufficiency of patent phrase information. Finally, based on the effects of different types of external knowledge on patent phrase similarity calculation, the study generates a variety of knowledge-enhanced prompt texts for patent phrases. Experimental results show that the Pearson Correlation Coefficient(PCC) and Spearman Rank Correlation(SRC) of the proposed method increase by 6.8% and 5.7%, respectively, compared with the second-best method.
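
    Templates and label words are not given in the abstract; the sketch below shows one way the knowledge-enhanced masked template could be assembled, with the template wording, label-word projection, and knowledge snippets all hypothetical.

        # Hypothetical projection from the similarity label space to label words
        # that the masked language model predicts at the [MASK] position.
        LABEL_WORDS = {0.0: "different", 0.5: "related", 1.0: "identical"}

        def build_prompt(anchor: str, target: str, knowledge: dict) -> str:
            # Entity-linked knowledge snippets pad out the very short phrases.
            ka = knowledge.get(anchor, "no description")
            kt = knowledge.get(target, "no description")
            return (f"Patent phrase one: {anchor} ({ka}). "
                    f"Patent phrase two: {target} ({kt}). "
                    f"The two phrases are [MASK].")

        kb = {"heat exchanger": "a device transferring heat between fluids"}
        print(build_prompt("heat exchanger", "radiator", kb))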

  • Anzheng WANG, Jianwu DANG, Biao YUE, Jingyu YANG
    Computer Engineering. 2024, 50(4): 303-312. https://doi.org/10.19678/j.issn.1000-3428.0067758

    Road cracks are a main cause of highway safety problems. Traditional crack detection typically relies on manual inspection, which suffers from low efficiency and safety risks. In addition, existing deep learning detection models produce incomplete crack detections under interference factors such as shadow occlusion and complex backgrounds. To address these problems, a road crack detection model based on location information and an attention mechanism, known as PA-TransUNet, is proposed. First, the hybrid encoder receives the input image and extracts crack features; position information for the query, key, and value is introduced to improve the ability of the Transformer self-attention mechanism in the encoder to capture crack shapes and to compensate for the loss of feature information. Subsequently, the crack features are fed into the decoder for upsampling, and an Attention Gating-based Decoding Module(AGDM) is designed to strengthen the learning of crack regions by suppressing non-crack regions, improving the accuracy and integrity of crack detection. The experimental results demonstrate that the F1 values of the PA-TransUNet model on the public CrackForest Dataset(CFD) and Cracktree200 datasets reach 87.44% and 82.58%, respectively. In addition, to further test the crack detection ability of PA-TransUNet in practical engineering, an F1 value of 88.68% is achieved on the self-made Unmanned Aerial Vehicle Cracks(UAV Cracks) dataset, showing that the model can better meet the needs of crack detection in practical engineering.
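
    The AGDM internals are described only as gating; the sketch below uses the classic additive attention gate (in the style of Attention U-Net) as a plausible stand-in, with channel sizes assumed.

        import torch
        import torch.nn as nn

        class AttentionGate(nn.Module):
            # Decoder features g gate encoder skip features x: responses in
            # non-crack regions are scaled toward zero before concatenation.
            def __init__(self, ch_x, ch_g, ch_mid):
                super().__init__()
                self.wx = nn.Conv2d(ch_x, ch_mid, 1)
                self.wg = nn.Conv2d(ch_g, ch_mid, 1)
                self.psi = nn.Conv2d(ch_mid, 1, 1)

            def forward(self, x, g):
                a = torch.sigmoid(self.psi(torch.relu(self.wx(x) + self.wg(g))))
                return x * a                       # attention map lies in (0, 1)

        x, g = torch.randn(1, 64, 32, 32), torch.randn(1, 128, 32, 32)
        print(AttentionGate(64, 128, 32)(x, g).shape)   # torch.Size([1, 64, 32, 32])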

  • Lin WANG, Hao HUANG
    Computer Engineering. 2024, 50(4): 313-320. https://doi.org/10.19678/j.issn.1000-3428.0067272

    Pre-trained models have achieved significant breakthroughs in nonparallel-corpus Voice Conversion(VC) via Self-Supervised Pre-trained Representation(SSPR). Features extracted by pre-trained models contain a large amount of content information, which motivates the widespread use of SSPR. This study proposes a VC model that combines SSPR with Vector Quantization(VQ) and Connectionist Temporal Classification(CTC), using the SSPR extracted from a pre-trained model as input to improve conversion quality. Effectively decoupling content and speaker representations is a key issue in VC. Using SSPR as the initial content information, VQ is performed to decouple the content and speaker representations of speech. However, VQ alone merely discretizes the content information and cannot separate a pure content representation from speech. To further remove residual speaker information from the content representation, a CTC loss-guided content encoder is proposed. CTC not only serves as an auxiliary network to accelerate model convergence, but its additional text supervision can also be jointly optimized with VQ to achieve complementary performance and learn pure content representations. Speaker representations adopt style-embedding learning, and the two representations serve as inputs for VC in the system. The proposed method is evaluated on the open-source CMU dataset and the VCTK corpus. Experimental results show that the proposed method achieves an objective Mel-Cepstrum Distortion(MCD) of 8.896 dB and subjective Mean Opinion Score(MOS) values for speech naturalness and speaker similarity of 3.29 and 3.22, respectively, all better than those of the baseline model. The method achieves the best performance in terms of VC quality and speaker similarity.
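
    The paper's VQ configuration is not given here; the sketch shows the generic nearest-codebook quantization step that discretizes content frames, with the codebook size and the straight-through gradient trick as standard assumptions.

        import torch

        def vector_quantize(z, codebook):
            # z: (T, D) content frames; codebook: (K, D) learned code vectors.
            # Snapping each frame to its nearest code discards fine-grained,
            # largely speaker-dependent variation.
            d = torch.cdist(z, codebook)           # (T, K) pairwise distances
            idx = d.argmin(dim=1)
            q = codebook[idx]
            # Straight-through estimator: gradients bypass the discrete choice.
            return z + (q - z).detach(), idx

        q, idx = vector_quantize(torch.randn(100, 256), torch.randn(64, 256))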

  • Qiang SONG, Junlong TANG, Zhaoyun CHEN, Yang SHI, Qixuan TAN, Ziyang XIAO, Wanghui ZOU
    Computer Engineering. 2024, 50(4): 321-331. https://doi.org/10.19678/j.issn.1000-3428.0067000

    The National University of Defense Technology independently developed a high-performance accelerator that adopts an on-chip heterogeneous fusion architecture combining a Central Processing Unit(CPU) and a General Purpose Digital Signal Processor(GPDSP). The GPDSP, with its Very Long Instruction Word(VLIW) plus Single Instruction Multiple Datastream(SIMD) vectorization structure, is the main acceleration core supporting peak performance. However, mainstream compilers cannot adequately support such accelerators in laying out data-intensive computation instructions, statically allocating hardware execution units to instructions, or handling GPDSP-specific vector instructions. In this study, based on the Low Level Virtual Machine(LLVM) compilation framework, the PERP method, the Ant Colony Optimization(ACO) algorithm, and the structural characteristics of the GPDSP are combined to optimize the cost model in the pre-RA-sched stage, and an instruction scheduling module with register pressure awareness is designed. The study further proposes an instruction scheduling strategy supporting static functional unit allocation in the post-RA-sched stage, which guarantees correct functional unit allocation through a conflict detection mechanism and provides a software basis for the parallel execution of instructions. Furthermore, a rich and regular set of vector instruction interfaces is encapsulated in the backend to support the GPDSP vector instructions. The experimental results demonstrate that the proposed LLVM optimization method supports the GPDSP well in terms of both functionality and performance: the average speedup on the GCC testsuite is 4.539, the average speedups on the SPEC CPU 2017 floating-point and integer tests are 4.49 and 3.24, respectively, and vector programs using the vector interfaces achieve an average performance improvement of 97.1%.
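
    The actual changes live in LLVM's C++ scheduling passes; as a language-agnostic illustration of register-pressure awareness, the toy Python list scheduler below prefers ready instructions that end the most live values. The dependency sets and pressure estimates are made up.

        # Toy list scheduler over a dependency DAG.
        def schedule(instrs, deps, ends_lives):
            # instrs: instruction ids; deps: id -> set of prerequisite ids;
            # ends_lives: id -> number of values whose last use this is.
            done, order = set(), []
            while len(order) < len(instrs):
                ready = [i for i in instrs if i not in done and deps[i] <= done]
                nxt = max(ready, key=lambda i: ends_lives[i])  # relieve pressure
                order.append(nxt)
                done.add(nxt)
            return order

        deps = {"load_a": set(), "load_b": set(),
                "add": {"load_a", "load_b"}, "store": {"add"}}
        print(schedule(list(deps), deps,
                       {"load_a": 0, "load_b": 0, "add": 2, "store": 1}))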

  • Yutao HOU, Abulizi Abudukelimu, Yaqing SHI, Musideke Mayilamu, Abudukelimu Halidanmu
    Computer Engineering. 2024, 50(4): 332-341. https://doi.org/10.19678/j.issn.1000-3428.0068700

    With the development of the ″Belt and Road″ initiative, the demand for cross-language communication between the countries and regions along the ″Belt and Road″ has grown, and Machine Translation(MT) technology has gradually become an important means of in-depth exchange. However, owing to the abundance of low-resource languages and the scarcity of language materials in these countries, progress in MT research has been relatively slow. This paper proposes a low-resource language MT training method based on the NLLB model. An improved training strategy built on a multilingual pre-trained model optimizes the loss function under data augmentation, effectively improving the translation performance of low-resource languages. The experimental results show that, compared with the NLLB-600M baseline model, the proposed model achieves average improvements of 1.33 in BiLingual Evaluation Understudy(BLEU) score and 0.82 in chrF++ score on Chinese translation tasks for four low-resource languages, fully demonstrating the effectiveness of the proposed method for low-resource MT. In a further experiment, the ChatGPT and ChatGLM models are used for preliminary studies of Laotian-Chinese and Vietnamese-Chinese translation, respectively. Large Language Models(LLM) are already capable of translating low-resource languages: in Vietnamese-Chinese translation tasks, the ChatGPT model significantly outperforms traditional Neural Machine Translation(NMT) models, with improvements of 9.28 in BLEU score and 3.12 in chrF++ score, whereas the translation of Laotian requires further improvement.
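
    The fine-tuning recipe itself is not reproduced here; the snippet below is a minimal inference sketch with the off-the-shelf NLLB-600M checkpoint from Hugging Face, using FLORES-200 language codes (lao_Laoo for Laotian, zho_Hans for Simplified Chinese); it assumes the transformers library and downloads the model on first use.

        from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

        name = "facebook/nllb-200-distilled-600M"
        tok = AutoTokenizer.from_pretrained(name, src_lang="lao_Laoo")
        model = AutoModelForSeq2SeqLM.from_pretrained(name)

        inputs = tok("ສະບາຍດີ", return_tensors="pt")   # "hello" in Lao
        out = model.generate(
            **inputs,
            forced_bos_token_id=tok.convert_tokens_to_ids("zho_Hans"),
            max_new_tokens=32,
        )
        print(tok.decode(out[0], skip_special_tokens=True))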

  • Chenzhi LONG, Ping CHEN, Chuankun LI
    Computer Engineering. 2024, 50(4): 342-349. https://doi.org/10.19678/j.issn.1000-3428.0067715

    Despite recent advancements, existing multi-person 2D pose estimation methods cannot effectively identify the poses of small objects. A multi-person pose estimation method that integrates global and local contextual information is proposed to address this problem. The method uses the different-scale features output by a High-Resolution Network(HRNet) to coarsely locate multiple anatomical centers of the human body, providing more supervisory information for small objects through multiple center points and thus improving their localization. The coordinates of the human center points serve as cues for extracting local contextual information at different scales near each center point through deformable sampling, and a contrastive loss between the local contextual information of different objects is computed to improve discrimination between objects. Using the low-resolution features of HRNet as global contextual information and the local contextual information as cross-attention queries, a multilayer Transformer model combines global and local contextual information to enhance the contextual information of small objects. The enhanced information is then used as clustering centers, and multi-scale fusion features are decoupled to obtain keypoint heatmaps for different objects, achieving multi-person pose estimation for small objects. The experimental results show that the proposed method effectively improves the recognition of small-object poses, achieving an Average Precision(AP) of 69.0% on the COCO test-dev2017 dataset and an APM improvement of 1.4 percentage points over Dual Anatomical Centers(DAC).
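
    Decoding of the anatomical-center heatmaps is not detailed in the abstract; the sketch below shows the standard CenterNet-style peak extraction (max-pool non-maximum suppression plus top-k) that such center-based methods commonly use; sizes are arbitrary.

        import torch
        import torch.nn.functional as F

        def decode_centers(heatmap, k=20):
            # heatmap: (1, 1, H, W) predicted center heatmap. A 3x3 max-pool
            # keeps only local maxima, then the k strongest peaks are returned.
            pooled = F.max_pool2d(heatmap, 3, stride=1, padding=1)
            peaks = heatmap * (pooled == heatmap)
            scores, idx = peaks.flatten().topk(k)
            w = heatmap.shape[-1]
            return [(s.item(), (i // w).item(), (i % w).item())
                    for s, i in zip(scores, idx)]

        print(decode_centers(torch.rand(1, 1, 128, 96), k=3))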

  • Kuanguang XU, Dongyu HE, Bing HAN, Yujia LIU, Jiadong LI
    Computer Engineering. 2024, 50(4): 350-356. https://doi.org/10.19678/j.issn.1000-3428.0068471

    Correct identification and inspection of steel plate numbers are major preconditions for automated production lines. In recent years, many production lines have been equipped with inkjet printers at the material preparation positions to automatically mark material numbers. Spray-printed characters are clear and heat-resistant, and steel plate number recognition equipment can achieve a recognition rate of nearly 100% on them. However, owing to equipment failures or limited funding and space, installing printing equipment is sometimes impossible, and numbers must be marked on the steel plate surfaces by hand. Compared with spray-printed numbers, handwritten numbers exhibit complex characteristics, such as arbitrary writing styles, connected strokes, and distorted handwriting, which limit the accuracy of the recognition system. Because of this poor recognition performance, manual visual inspection is often needed to assist recognition, which hinders the automation of material tracking. To improve the recognition of handwritten steel plate numbers, this study improves the traditional machine learning Optical Character Recognition(OCR) text-region detection algorithm and proposes an image enhancement and distortion correction algorithm based on the characteristics of handwritten steel plate numbers. These algorithms improve the image quality and shapes of handwritten steel plate numbers, thereby increasing recognition accuracy. Through image enhancement and correction, the recognition system can process handwritten steel plate numbers more effectively, further promoting the automation of material tracking.
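
    The enhancement and correction algorithms are only named in the abstract; the sketch below assembles standard OpenCV building blocks, CLAHE for local contrast and a four-point perspective warp for distortion correction, as one plausible reading; the corner coordinates and target size are assumptions.

        import cv2
        import numpy as np

        def enhance(gray):
            # CLAHE lifts faint handwritten strokes against the uneven steel surface.
            return cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8)).apply(gray)

        def correct(img, corners, size=(256, 64)):
            # Map the four detected corners of the number region onto an upright
            # rectangle, undoing perspective skew before character recognition.
            w, h = size
            dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
            m = cv2.getPerspectiveTransform(np.float32(corners), dst)
            return cv2.warpPerspective(img, m, (w, h))

        frame = np.full((120, 360), 90, dtype=np.uint8)   # stand-in camera frame
        plate = correct(enhance(frame), [(10, 10), (350, 20), (340, 110), (5, 100)])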