
Just accepted

Accepted, unedited articles published online and citable. The final edited and typeset version of record will appear in the future.
  • YANG Xingrui, MA Bin, LI Senyao, ZHONG Xian
    Accepted: 2024-04-19
    In the field of natural language processing, large language models are developing rapidly. However, their application to educational digitization still faces a series of important challenges. To address the scarcity of domain-specific data and unstable summarization that leads to information loss or redundancy, a lightweight idempotent model framework, IGLM, is introduced for educational text summarization. The model first employs multi-source training for adaptive augmentation to enhance data diversity. Various fine-tuning procedures are then applied to the downstream text summarization task. Concurrently, an idempotent summary generation strategy is designed to mitigate the impact of text length: initial summaries are drawn closer to their idempotent summaries to constrain the model and reduce biases arising from uneven corpora, and quantization techniques are combined to generate more precise and fluent summaries under low-resource conditions. Experiments use ROUGE F1 scores as the evaluation metric on the publicly available Chinese text summarization datasets LCSTS, EDUCATION, and NLPCC. The results reveal significant gains in precision and coherence. Specifically, compared with the baseline model, ROUGE-1/2/L scores increased by 7.9, 7.4, and 8.7 on LCSTS; by 12.9, 15.4, and 15.7 on EDUCATION; and by 12.2, 11.7, and 12.7 on NLPCC, respectively. These results confirm the model's efficacy, offering a robust solution for educational digitalization tasks.
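The ROUGE-N F1 metric used for evaluation above is computed from n-gram overlap; the following is a generic sketch (not the authors' evaluation code), with tokenization left to the caller:

```python
from collections import Counter

def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1: F1 score over overlapping n-grams of candidate vs. reference."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For Chinese summaries the tokens are often individual characters, so `rouge_n_f1(list(cand), list(ref))` is a common usage.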
  • ZHENG Yazhou, LIU Wanping, and HUANG Dong
    Accepted: 2024-04-19
    To address the poor detection performance of existing methods on short domain names, a detection approach combining BERT-CNN-GRU with an attention mechanism is proposed. First, BERT is employed to extract effective features and the inter-character composition logic of the domain name. Then, a parallel network fusing a simplified-attention Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU) network, built on the multi-head attention mechanism, extracts deep features of the domain name. The CNN, organized in an n-gram arrangement, extracts domain name information at different levels, and Batch Normalization (BN) is applied to optimize the convolution results. The GRU better captures ordering differences within domain names, while the multi-head attention mechanism excels at capturing their internal composition relationships. The outputs of the parallel detection networks are concatenated to exploit the advantages of both, and a local loss function focuses on the domain name classification problem, ultimately improving classification performance. Experimental results demonstrate that the model achieves optimal performance in binary classification. On the short-domain multi-classification dataset, the weighted F1-score over 15 categories reaches 86.21%, surpassing the BiLSTM-Seq-Attention model by 0.88%. On the UMUDGA dataset, the weighted F1-score over 50 categories reaches 85.51%, an improvement of 0.45%. Moreover, the model performs well in detecting variant domain names and word DGAs, showing the ability to handle imbalanced domain name distributions and a broader range of detection capabilities.
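The n-gram arrangement the CNN branch convolves over can be illustrated on raw characters; this is a simplified sketch of the windowing idea only (the actual model operates on learned BERT embeddings, not substrings):

```python
def char_ngrams(domain, n=2):
    """Sliding character n-grams of a domain's first label -- the windows
    that an n-gram-arranged CNN's filters would span."""
    label = domain.split(".")[0]
    return [label[i:i + n] for i in range(len(label) - n + 1)]
```

Bigrams, trigrams, and so on over the same label give the "different levels" of domain name information the abstract mentions.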
  • Wang Qian, Zhang Junhua, Wang Zetong and Li Bo
    Accepted: 2024-04-19
    The three-dimensional model of the spine plays an important role in treating spinal disorders such as scoliosis. However, traditional methods for spinal three-dimensional reconstruction suffer from long processing times, subjectivity, and high radiation exposure. We propose a spinal three-dimensional reconstruction network based on biplanar X-ray images, termed X2S-Net. The network takes the patient's anteroposterior and lateral X-ray images as input and reconstructs the corresponding voxel model of the spine using a parallel encoder, a three-dimensional reconstruction module, and a segmentation supervision module, achieving end-to-end generation from X-ray images to visualized three-dimensional models. In the feature extraction stage, X2S-Net employs a parallel feature encoder designed for the characteristics of biplanar X-ray images to extract spatial information of the spine and incorporates a multi-scale channel attention mechanism. In the three-dimensional modeling stage, X2S-Net combines traditional image segmentation tasks with a segmentation supervision module to improve the reconstruction results. The experimental results demonstrate that this method effectively utilizes the input information from biplanar X-ray images for three-dimensional reconstruction of the spine, achieving an average Hausdorff distance of 6.95 mm and a Dice coefficient of 92.01% across the datasets.
  • WU Ruolan, CHEN Yulin, DOU Hui, ZHANG Yangwen, LONG Zhong
    Accepted: 2024-04-19
    Federated Learning, as an emerging distributed learning framework, enables multiple clients to jointly train a global model without sharing raw data, thus effectively safeguarding data privacy. Nevertheless, traditional Federated Learning still harbors latent security vulnerabilities and is susceptible to poisoning attacks and inference attacks. To enhance the security and model performance of Federated Learning, it is therefore imperative to precisely identify and mitigate malicious client behavior, while adding gradient noise to prevent attackers from recovering client data by monitoring gradients. This paper proposes a robust Federated Learning framework that combines a malicious client detection mechanism with local differential privacy techniques. Specifically, the algorithm first uses gradient similarity to identify and classify potential malicious clients, minimizing their adverse impact on model training. A dynamic privacy budget is then designed based on local differential privacy, accommodating the sensitivity of different queries and individual privacy requirements, with the aim of balancing privacy preservation and data quality. Experimental results on the MNIST, CIFAR-10, and MR datasets demonstrate that, compared with three baseline algorithms, this approach yields an average 3% accuracy increase for sP-type clients and a 1% increase for other attack methods, achieving a higher security level and significantly better model performance within the Federated Learning framework.
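The two defensive ingredients can be sketched minimally; the cosine threshold, the reference gradient, and the Laplace scale below are illustrative assumptions, not the paper's exact mechanism:

```python
import math
import random

def cosine_sim(u, v):
    """Cosine similarity between two gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def flag_malicious(client_grads, reference, threshold=0.0):
    """Flag clients whose gradient is dissimilar to a reference direction
    (e.g. the coordinate-wise mean of all client gradients)."""
    return [i for i, g in enumerate(client_grads) if cosine_sim(g, reference) < threshold]

def laplace_perturb(grad, epsilon, sensitivity=1.0, rng=random):
    """Local-DP sketch: add Laplace(0, sensitivity/epsilon) noise per coordinate
    via inverse-CDF sampling."""
    b = sensitivity / epsilon
    out = []
    for x in grad:
        u = rng.random() - 0.5
        out.append(x - b * math.copysign(1.0, u) * math.log(max(1e-12, 1.0 - 2.0 * abs(u))))
    return out
```

A dynamic privacy budget, as in the paper, would vary `epsilon` per query sensitivity rather than keep it fixed.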
  • Zhang Cai , Ma Ziqiang , Yan Bo
    Accepted: 2024-04-19
    This article addresses the issues of chaotic comments and difficult moderation in government Weibo discussions and proposes a machine learning-based model for sentiment analysis. This model quantitatively analyzes emotions in government Weibo posts, providing an effective basis for automated moderation. Using the examples of the Winter Olympics and the Chinese Football Association Weibo accounts, the research first expands the relevant vocabulary associated with these topics and performs data cleaning and text feature representation. Subsequently, machine learning models are employed to determine sentiment tendencies, and the Chinese sentiment lexicon from Dalian University of Technology is used to calculate sentiment intensity. This paper employs decision tree, Naïve Bayes, and Support Vector Machine models based on both the bag-of-words model and Word2Vec model and evaluates their performance comparatively. Experimental results demonstrate that, under the Word2Vec-based Support Vector Machine model, the accuracy of sentiment classification reaches 84.3%. This suggests the effectiveness and comprehensiveness of the proposed model in predicting sentiment in government Weibo posts and its potential application in automated moderation.
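The representation and intensity steps can be sketched as follows; the vocabulary and lexicon entries are hypothetical stand-ins (the DUT lexicon itself has richer fields, such as per-word polarity and strength):

```python
def bag_of_words(tokens, vocab):
    """Count vector over a fixed vocabulary (bag-of-words features)."""
    index = {w: i for i, w in enumerate(vocab)}
    vec = [0] * len(vocab)
    for t in tokens:
        if t in index:
            vec[index[t]] += 1
    return vec

def sentiment_intensity(tokens, lexicon):
    """Sum signed word scores from a sentiment lexicon."""
    return sum(lexicon.get(t, 0.0) for t in tokens)
```

In the paper's pipeline these features (or Word2Vec vectors) would feed a classifier such as an SVM.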
  • Liu Yuxin , Li Fengyong
    Accepted: 2024-04-18
    Image encryption is an essential method of protecting image security. Many existing image encryption schemes offer limited security and low encryption and decryption efficiency, and cannot resist various types of attacks. To address these problems, an image encryption algorithm based on fully scrambled hyperchaotic sequences and multi-ary DNA encoding is proposed, which improves encryption efficiency while ensuring the security of ciphertext images. First, based on the content of the grayscale image, an image hash algorithm and an external key are used to generate the initial values of a five-dimensional hyperchaotic system and a logistic map. Second, the original image is converted into a four-valued image, and the random sequence generated by the hyperchaotic system and the logistic map is used to perform DNA encryption on the image in four stages: DNA encoding, DNA scrambling, DNA diffusion, and DNA decoding. Finally, the image is decomposed into bit planes, and the random matrix generated by the hyperchaotic system and the logistic map is XORed with the high four-bit plane and the low four-bit plane respectively to obtain the final ciphertext image. Experimental results show that the algorithm has a large key space, strong key sensitivity, good encryption effect, and high encryption efficiency. It can resist a variety of conventional attacks, including statistical analysis, differential attacks, cropping attacks, and noise attacks.
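The DNA encoding and decoding stages map 2-bit pairs to bases; a sketch with a single fixed coding rule (an actual scheme would select among the valid rules chaotically):

```python
RULE = {"00": "A", "01": "C", "10": "G", "11": "T"}  # one of the valid coding rules
DECODE = {v: k for k, v in RULE.items()}

def dna_encode(byte):
    """Encode one byte (pixel value) as four DNA bases, 2 bits per base."""
    bits = f"{byte:08b}"
    return "".join(RULE[bits[i:i + 2]] for i in range(0, 8, 2))

def dna_decode(seq):
    """Invert dna_encode."""
    return int("".join(DECODE[b] for b in seq), 2)
```

Scrambling and diffusion would then permute and combine these base sequences using the chaotic sequences before decoding back to pixel values.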
  • Zeng Jian-zhou, LI Ze-ping, Zhang Su-qing
    Accepted: 2024-04-18
    To reduce content acquisition delay and transmission overhead in mobile edge networks, a multi-agent cooperative caching algorithm (MACC) based on Twin Delayed Deep Deterministic policy gradient (TD3) is proposed. First, a multi-agent edge cache model is constructed, and the multi-node cache replacement problem is modeled as a partially observable Markov decision process (POMDP). The cache state and content request information of adjacent nodes are integrated into each node's observation space to improve the agent's perception of the environment, and the popularity characteristics of each node's content requests are extracted by triple exponential smoothing, allowing the algorithm to adapt to changes in content popularity and improve the cache hit rate. Then, a guiding reward function combining the transmission delay and overhead of local and adjacent nodes is designed to steer agents toward cooperative caching, reducing the system's cache redundancy and content transmission overhead. Finally, the Wolpertinger architecture is combined with TD3 to extend it to multiple agents, so that each edge node can learn its cache strategy adaptively and improve system performance. The experimental results show that edge nodes under MACC sacrifice part of their cache space to help neighboring nodes cache requested content, improving the cache hit rate. Compared with the MAAC, DDPG, and independent TD3 algorithms on the same dataset, the cache hit rate of MACC improves by 8.50%, 13.91%, and 29.21%, respectively. It adapts to a dynamic edge environment to achieve low content acquisition delay and transmission overhead.
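The popularity-extraction step uses triple exponential smoothing; a sketch of Brown's cascaded form (the forecast coefficients derived from the three statistics are omitted):

```python
def triple_exponential_smooth(series, alpha=0.5):
    """Brown's triple exponential smoothing: three cascaded EMAs over a
    content-request count series; the triple (s1, s2, s3) summarizes the
    popularity trend."""
    s1 = s2 = s3 = float(series[0])
    for x in series[1:]:
        s1 = alpha * x + (1 - alpha) * s1
        s2 = alpha * s1 + (1 - alpha) * s2
        s3 = alpha * s2 + (1 - alpha) * s3
    return s1, s2, s3
```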
  • Song Hang , Zhou Feng , Xiong Wei
    Accepted: 2024-04-16
    Traditional time series anomaly detection models struggle to accurately extract the temporal relationships among multivariate sensor and actuator data in Cyber-Physical Systems (CPS), limiting anomaly detection performance. To address this issue, this paper proposes a novel time series anomaly detection method named Auto-Correlation-Variational Autoencoder-Generative Adversarial Network (AM-VAE-GAN, abbreviated as AMVG). Built upon a GAN, the method uses NOISE data augmentation to expand the training dataset. By introducing auto-correlation matrices to strengthen data dependencies and combining them with the data reconstruction capability of variational autoencoders, the model's robustness is improved, further boosting anomaly detection performance. The two decoders of AMVG form mutually antagonistic generator (G) and discriminator (D) networks, whose continuous adversarial training optimizes the model's detection capability. Experimental validation on three real-world CPS datasets demonstrates that AMVG achieves significant improvements in accuracy, recall, and F1 score compared with state-of-the-art methods. Specifically, the F1 scores on the three datasets are 0.953, 0.758, and 0.891, increases of 6.2%, 3.4%, and 7.5%, respectively. These results underscore the accuracy and effectiveness of the proposed method for CPS anomaly detection.
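The auto-correlation statistic behind those matrices is standard; a sketch at a single lag (the model would stack such values across lags and variables into a matrix):

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 1.0 if lag == 0 else 0.0
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var
```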
  • WANG Xiang, WEI Yuxin, MAO Guojun
    Accepted: 2024-04-16
    In graph neural networks, graph pooling is a critical operation for downsampling graph data to extract a graph representation. Owing to the complex topology and high-dimensional features of graph data, existing graph pooling methods have two design problems: 1. they fail to fully utilize and simultaneously fuse the topological structure information and long-range dependency information of graph data; 2. during pooling, the features of discarded nodes are not considered, inevitably losing important information. To address these issues, this paper proposes a graph pooling method based on multi-feature fusion that simultaneously captures the local topology, global topology, and long-range dependencies of graph data; an aggregation module then combines these features to obtain a new pooled graph. Additionally, to avoid losing node features during pooling, a new feature fusion method is proposed that aggregates the features of dropped nodes onto the retained nodes in a certain proportion. Based on this pooling method, we construct a graph classification model with hierarchical pooling and conduct experiments on multiple public datasets. The results show that the proposed model outperforms the best baseline on the graph classification task, with classification accuracy gains of 2.97%, 3.59%, 0.48%, and 0.24% on the D&D, PROTEINS, NCI1, and NCI109 datasets, respectively. This suggests that it effectively exploits the feature, topological, and long-range dependency information of graph data to improve graph classification performance.
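The "aggregate dropped features onto retained nodes in a certain proportion" idea can be sketched on a toy graph; the proportion and the equal split among retained neighbors are illustrative assumptions:

```python
def fuse_dropped_features(features, adjacency, keep, ratio=0.5):
    """Distribute ratio * (dropped node's feature vector) equally among its
    retained neighbors, so pooling does not discard the information outright."""
    kept = set(keep)
    fused = {i: list(features[i]) for i in keep}
    for i, feat in enumerate(features):
        if i in kept:
            continue
        neighbors = [j for j in adjacency[i] if j in kept]
        if not neighbors:
            continue
        share = ratio / len(neighbors)
        for j in neighbors:
            fused[j] = [a + share * b for a, b in zip(fused[j], feat)]
    return fused
```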
  • ZHAO Nannan, GAO Feichen
    Accepted: 2024-04-16
    High-precision real-time detection and segmentation of traffic scenes are crucial for assisted driving and vehicle-road coordination. However, instance segmentation in traffic scenarios faces challenges including complex environments, object stacking, and low object resolution, which can cause false detections, missed detections, and missing masks. Moreover, the two-stage models widely used in high-precision instance segmentation often carry a large number of parameters, making real-time requirements hard to meet. An instance segmentation algorithm (DE-YOLO) based on an improved YOLOv8 is proposed. To decrease the effect of complex backgrounds, efficient multi-scale attention is introduced, with cross-dimensional interaction ensuring an even spatial feature distribution within each feature group. In the backbone network, DCNv2 deformable convolution is combined with the C2f convolutional layer to surpass the limitations of traditional convolutions and increase flexibility, reducing harmful gradient effects and improving the overall accuracy of the detector. The dynamic non-monotonic Wise-IoU (WIoU) focusing mechanism replaces the traditional CIoU loss function to evaluate detection-box quality, optimize box positioning, and improve segmentation accuracy. Meanwhile, Mixup data augmentation is enabled to enrich the training features of the dataset and improve the model's learning ability. The experimental results demonstrate that DE-YOLO improves mask mean average precision (mAPmask) by 2.0 percentage points and APmask@0.5 by 3.2 percentage points over the baseline YOLOv8n-seg on the Cityscapes urban-scene dataset. Furthermore, DE-YOLO maintains excellent detection speed and a small parameter count while improving accuracy, requiring 2.2-31.3 percentage points fewer parameters than comparable models.
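Mixup augmentation, mentioned above, is a convex combination of two samples and their labels; a generic sketch:

```python
def mixup(x1, y1, x2, y2, lam):
    """Blend two (sample, one-hot label) pairs with mixing weight lam."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y
```

In practice `lam` is drawn from a Beta distribution and the samples are image tensors rather than flat lists.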
  • Huang Hongqiong, Hu Yongtao
    Accepted: 2024-04-16
    Clothes-changing person re-identification (CC Re-ID) is an emerging research topic in person re-identification that aims to retrieve pedestrians who have changed their clothes. The task has not yet been fully studied. Current methods mainly use multi-modal data to assist decoupled representation learning, for example decoupling a pedestrian's intrinsic attributes from clothing through auxiliary data such as face, gait, and body contour; however, their generalization ability is poor and substantial additional work is needed to obtain the auxiliary information. Meanwhile, methods using only the original data cannot extract enough relevant information, and model performance is weak. To address these problems, a new multi-branch CC Re-ID method combining feature fusion and channel attention (MBFC) is proposed. The method integrates a channel attention mechanism into the backbone network to learn key information at the feature-channel level, and designs local and global feature fusion to improve the network's ability to extract fine-grained pedestrian features. In addition, the model adopts a multi-branch structure and uses multiple loss functions, such as a clothing adversarial loss and smooth-label cross-entropy loss, to guide the model toward clothing-irrelevant information, reducing the influence of clothing and extracting more effective pedestrian information. The method is extensively tested on the PRCC and VC-Clothes datasets. The experimental results show that the proposed model outperforms most advanced CC Re-ID methods in Rank-1 and mAP.
  • Yang Lisha, Li Maojun, Hu Jianwen, Wang Dingxiang
    Accepted: 2024-04-15
    To address the problems of low small-target detection efficiency, inaccurate defect localization, large parameter counts in detection algorithms, and the difficulty of deploying models on terminal equipment in strip steel surface defect detection, an improved YOLOv7-tiny detection algorithm is proposed. First, GSConv replaces the standard convolution in the Neck network, and an improved efficient aggregation network (ELAN-G) is designed based on GSConv, reducing the model's parameter count while ensuring that strip steel surface defect features are adequately fused. Second, an SPDConv module for low-resolution images and small defects is added between the Head and the Neck network; the module first generates an intermediate feature map and then obtains the final feature map by filtering and learning the small-defect feature information within it, improving the Head's detection accuracy for small defects. Finally, the MPDIoU loss function is adopted, exploiting the geometric properties of the bounding regression box to simplify the loss computation and improve defect localization accuracy. The experimental results show that the improved algorithm outperforms six other advanced object detection algorithms on the NEU-DET dataset with more balanced performance: its mean average precision (mAP) reaches 74.1%, and its parameter count and computation are lower than those of all comparison algorithms, making it deployable in steel surface defect detection systems in industrial environments.
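One commonly cited formulation of MPDIoU penalizes IoU by the normalized squared distances between corresponding box corners; a sketch under that assumption, with boxes as (x1, y1, x2, y2):

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """IoU minus normalized squared top-left and bottom-right corner distances."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    norm = img_w ** 2 + img_h ** 2
    d1 = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2  # top-left corner distance
    d2 = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2  # bottom-right corner distance
    return iou - d1 / norm - d2 / norm
```

The corresponding loss is then typically 1 - MPDIoU.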
  • JIA Shuo, LIN Shih-yang, YANG Miao-hui, SUN Teng
    Accepted: 2024-04-15
    As an unavoidable bottleneck in traffic scenes, short-term traffic flow prediction for narrow roads is very important for optimizing path planning and improving traffic conditions. Considering the timeliness of narrow roads and the accuracy of the applicable model, a short-term narrow-road traffic prediction model is proposed based on a whale optimization algorithm improved with good-node-set population initialization, nonlinear parameter control, and Cauchy mutation perturbation. An empirical study was carried out with SUMO simulation data. The experimental results show that the improved whale algorithm has better global search performance, convergence speed, and stability. Compared with WOA-GRU, PSO-GRU, and LSTM, RMSE decreased by 10.96%, 28.71%, and 42.23%, and MAPE decreased by 13.92%, 46.18%, and 52.83%, respectively, showing significant accuracy and stability.
  • Ren ShuYu, Wang XiaoDing, Lin Hui
    Accepted: 2024-04-15
    Transformers have shown remarkable performance in natural language processing, inspiring researchers to explore their application in computer vision tasks. DETR views object detection as a set prediction problem and introduces the Transformer model to solve the task, thereby avoiding the proposal generation and post-processing steps in traditional methods. The original DETR had some issues with training convergence and small object detection. To address these issues, researchers made various improvements, resulting in substantial improvements to DETR and demonstrating its state-of-the-art performance. We conducted an in-depth study of the basic modules and recent enhancements of DETR, including modifications to the backbone structure, query design strategies, and attention mechanisms. We also compared and analyzed various detectors, evaluated their performance and network architectures, delved into the limitations and challenges of DETR, and looked forward to future developments in the field. Through this paper’s research, we demonstrate the potential and application prospects of DETR in computer vision tasks.
  • Zhang Ming, Guo Wenkang, Wang Haifeng†
    Accepted: 2024-04-15
    GPUs are not fully utilized when processing large-scale dynamic graphs, and the limitations of GPU-oriented graph partitioning methods lead to performance bottlenecks. To improve graph computing performance, a CPU/GPU heterogeneous graph computing engine is proposed that exploits both kinds of processors. First, a new heterogeneous graph partitioning algorithm is proposed. It uses a streaming graph partitioning algorithm at its core to achieve dynamic load balancing between computing nodes and between CPU and GPU: a greedy strategy assigns vertices based on the maximum number of neighboring vertices during initial partitioning and dynamically adjusts vertex placement based on the minimum number of connecting edges during iteration. Second, the system introduces a GPU heterogeneous computing model to improve graph computing efficiency through functional parallelism. Experiments take PageRank, Connected Components, SSSP, and K-core as examples and compare against other graph computing systems. Compared with other graph engines, the heterogeneous engine better balances the computing load across nodes and between heterogeneous processors, shortening delay and accelerating overall computation. The results show that the CPU/GPU synergy of this system tends to 1 and the graph computing speedup ratio reaches 5 times that of the others. The Distributed Heterogeneous Engine (DH-Engine) thus provides a better heterogeneous graph computing scheme.
  • Zhang Yujie, Gao Han
    Accepted: 2024-04-15
    In industrial quality inspection, image segmentation of stamping defects is an important part of defect detection and directly affects its effectiveness. However, the traditional FCM clustering algorithm does not consider spatial neighborhood information and is sensitive to noise, resulting in poor segmentation accuracy; it is also susceptible to the choice of initial values, leading to slow convergence. To address these issues, an improved FCM algorithm is proposed. It replaces Euclidean distance with a simple two-term kernel-induced distance, mapping the original image pixels into a high-dimensional feature space to increase linear separability and computation speed. By exploiting the spatial correlation between image pixels and introducing an improved Markov random field to modify the FCM objective function, the algorithm's noise resistance and segmentation accuracy are improved. Using the Bald Eagle Search algorithm to determine the initial cluster centers of FCM improves detection accuracy and convergence speed while avoiding convergence to local extrema. To verify the performance of the improved FCM algorithm, partition entropy, partition coefficient, the Xie-Beni index, and iteration count were used as evaluation indicators, and experiments compared it with image segmentation algorithms proposed by different scholars in recent years, verifying its effectiveness. The experimental results show that the proposed algorithm has good noise resistance and achieves good defect segmentation results, giving it practical value for defect detection of stamping parts in industry.
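The kernel-induced distance that replaces Euclidean distance in kernel FCM is typically d²(x, v) = 2(1 − K(x, v)) for a Gaussian kernel K; a sketch (the bandwidth sigma is an illustrative choice):

```python
import math

def kernel_distance_sq(x, v, sigma=1.0):
    """Kernel-induced squared distance 2 * (1 - K(x, v)) with Gaussian K."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return 2.0 * (1.0 - math.exp(-sq / (2.0 * sigma ** 2)))
```

It is zero for identical points and saturates at 2 for distant ones, which tempers the influence of outlier (noisy) pixels on the cluster centers.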
  • WANG Lei, MA Chi-cheng, QI Jun-yan, YUAN Rui-fu
    Accepted: 2024-04-15
    Safety concerns in coal mining, particularly surface subsidence in goaf areas, pose threats to personnel and engineering safety, so investigating appropriate prediction methods for surface subsidence in mining areas is of great significance. The influencing factors of surface subsidence are complex: single deep learning models fit subsidence data poorly, and existing subsidence prediction studies often focus either on probabilistic predictions or on point predictions that consider temporal characteristics; it is challenging to quantitatively describe randomness while simultaneously considering the temporal features of the data. To address this issue, after observing and analyzing the nature of the data, this study selected the Autoregressive Integrated Moving Average (ARIMA) model for probabilistic prediction of temporal features and combined it with the Long Short-Term Memory (LSTM) model to learn complex, long-term-dependent nonlinear sequences. We propose an ARIMA-LSTM surface subsidence prediction model: ARIMA predicts the linear temporal part of the data, and the residuals of the ARIMA predictions are used to train the LSTM model, allowing both temporal features and the randomness of the data to be considered. The results indicate that, compared with using ARIMA or LSTM alone, this method achieves higher prediction accuracy (MSE of 0.26287, MAE of 0.40815, RMSE of 0.51271). Further comparative validation shows that the predictions align with trends observed in radar satellite imagery data (processed with SBAS-InSAR), confirming the effectiveness of the proposed method.
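The hybrid's division of labor, in which a linear model fits the trend and a nonlinear model learns the residuals, can be sketched with a least-squares line standing in for ARIMA:

```python
def linear_trend_and_residuals(series):
    """Fit y = a + b*t by least squares (a stand-in for ARIMA's linear part)
    and return (fitted, residuals); an LSTM would be trained on the residuals."""
    n = len(series)
    mx = (n - 1) / 2.0
    my = sum(series) / n
    sxx = sum((t - mx) ** 2 for t in range(n))
    sxy = sum((t - mx) * (y - my) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * t for t in range(n)]
    residuals = [y - f for y, f in zip(series, fitted)]
    return fitted, residuals
```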
  • LI Haifeng, LIU Sensen, WANG Huaichao, LI Nansha, ZHANG Yifan
    Accepted: 2024-04-15
    To deeply integrate domain knowledge of airport pavement underground diseases with object detection, alleviate the feature distortion caused by the complexity and similarity of features across different disease samples, and improve automatic disease detection, an airport pavement underground disease detection algorithm integrating association reasoning is proposed. First, a feature extraction method combining residual networks and multi-scale feature pyramid modules is used to extract disease feature information. Second, by mining the correlation matrix of airport pavement underground diseases, an underground disease association reasoning module is designed based on graph inference: the feature vectors generated by the region proposal network serve as input features, and self-learned weight matrices serve as disease correlation weights, integrating the association reasoning module into the detector. The experimental results demonstrate that the algorithm effectively exploits the correlations between underground diseases, eliminates mutual interference between defect samples, and achieves optimal detection performance with an average accuracy of 87.38%.
  • Liu ZhaoWei, Fang YanHong, Zhen MingYu
    Accepted: 2024-04-15
    Lung diseases are of many types and lesion areas are small, and existing datasets also suffer from small data volumes, so model results are unsatisfactory. To improve diagnostic performance, a lung diagnosis network (ASNet) based on multi-task joint attention is proposed. A multi-task diagnostic network is built on U-Net, adding a pathological classification task to the original lesion segmentation task to strengthen the connection between tasks, with the segmentation task supplementing and improving classification accuracy. A multi-scale squeeze-excitation module is proposed to enhance spatial and inter-channel information fusion; an axial attention mechanism is introduced, emphasizing global context and location information to alleviate the under-fitting caused by scarce medical data; and an adaptive multi-task hybrid loss function is designed to achieve a weighted balance between the segmentation and classification losses. Detailed experiments were conducted on a self-built dataset. The average Dice coefficient, SP, SE, HD, and accuracy on the lesion segmentation task were 81.1%, 99.0%, 84.1%, 24.6 mm, and 97.5%, better than other advanced segmentation networks such as SAUNet++ and SwinUnet. In the pathological classification task, compared with the better baseline network (MobileNetV2), precision, recall, and accuracy improved by 2%, 1.8%, and 1.7%, respectively. Experiments show that the proposed network improves classification and segmentation accuracy, segments small target lesions better, and has a reasonable parameter count, making it suitable for assisting the diagnosis of lung diseases.
  • ZHANG Huan, WANG Chen, SHAN Jingdong, QIU Runhe
    Accepted: 2024-04-15
    As elevators are a type of special equipment, predicting their operational safety risk is crucial. At present, most elevator research is based on elevator component data, and prediction methods suffer from low accuracy and poor generalization when the application scenario changes. Therefore, an elevator safety risk prediction method based on domain adaptation and an attention mechanism is proposed. The method builds on an adversarial domain-adaptation network and uses the attention mechanism to optimize the network's feature extraction. It comprises three parts: a feature extractor, a label classifier, and a domain classifier. The input data are elevator safety risk factors containing both source-domain and target-domain data. The attention-optimized feature extractor adaptively extracts and retains key features common to the source and target domains; these key features are fed to the label classifier and the domain classifier simultaneously; transfer learning from the source domain to the target domain is realized through domain adaptation; and the elevator operation status is output by the label classifier. The experimental results show that the prediction accuracy of the proposed method reaches 86.9% when transferred to the target-domain scenario, 2.6 percentage points higher than before optimization and 9.5, 8.3, 3.7, and 1.2 percentage points higher than LSTM-AE, CNN-LSTM, TrAdaBoost.R2, and DSAN, respectively, effectively predicting elevator safety risks.
  • WANG Lei, Li Wenjie, WANG Hai
    Accepted: 2024-04-15
    To address personnel-position matching in an information environment characterized by multi-attribute probabilistic linguistic sets, a two-sided matching model is developed based on an improved ORESTE (organisation, rangement et synthèse de données relationnelles) ranking method and matching aspiration. The proposed model introduces a probabilistic linguistic generalized Lance distance formula, employs the probabilistic linguistic power mean operator to determine objective attribute weights, and optimizes the combination of subjective and objective weights using game-theoretic principles. This overcomes the impact of extreme values on decision outcomes and ensures that attribute weights reflect both the subjective analysis of decision-makers' experiential judgments and the objective analysis of the information structure, enhancing scientific validity. The ORESTE ranking method is then enhanced by incorporating the probabilistic linguistic generalized Lance distance formula and the Borda function, considering both weak and strong rankings; together with the optimized combination of subjective and objective weights, the ranking results become more realistic and better aligned with practical scenarios. To maximize the satisfaction of the subjects' preferences, a new matching aspiration coefficient embodying stability, based on the psychological "anchoring effect," is proposed, contributing to the construction of a rational and effective multi-objective two-sided matching model. Results from a case study of personnel-job matching on a smart elderly-care service platform demonstrate that the proposed two-sided matching model is effective, and decision-makers can adjust parameters according to their own risk preferences to maximize the satisfaction of the subjects' aspirations. Compared with decision methods such as ORESTE and TOPSIS, the improved ORESTE matching model derives ranking values more reasonably and effectively, yielding the optimal matching pairs.
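The abstract does not state the probabilistic linguistic generalized Lance distance in full. As a purely numeric illustration of why a Lance (Canberra-type) distance damps extreme values, the following sketch assumes a generalized form with a power parameter `p`; each term is bounded by 1 regardless of the magnitudes involved:

```python
def generalized_lance_distance(x, y, p=1, eps=1e-12):
    """Generalized Lance (Canberra-type) distance between two score vectors:
    d = ((1/n) * sum((|x_i - y_i| / (|x_i| + |y_i|))**p))**(1/p).
    Each ratio term lies in [0, 1], so a single extreme coordinate cannot
    dominate the overall distance the way it does in Euclidean distance."""
    n = len(x)
    s = sum((abs(a - b) / (abs(a) + abs(b) + eps)) ** p
            for a, b in zip(x, y))
    return (s / n) ** (1.0 / p)
```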
  • Tang Jingwen, Lai Huicheng, Wang Tongguan
    Accepted: 2024-04-11
    Pedestrian detection in intelligent community scenarios must accurately recognize pedestrians under varied conditions. However, for occluded and long-distance pedestrians, existing detectors suffer from missed detections, false detections and large model size. To address these problems, this paper proposes a pedestrian detection algorithm, ME-YOLO (Multiscale Efficient YOLO), based on YOLOv8. An efficient feature extraction module is designed to help the network learn and capture pedestrian features better, reducing the number of network parameters while improving detection accuracy. A reconstructed detection head module reintegrates the detection layers to enhance the network's recognition of small targets and effectively detect small-target pedestrians. A bidirectional feature pyramid network is introduced to design a new neck network, in which an expanded residual module and a weighted attention mechanism enlarge the receptive field and focus learning on pedestrian features, alleviating the network's insensitivity to occluded pedestrians. After training and validation on the CityPersons dataset, ME-YOLO increases AP50 by 5.6 percentage points over the original YOLOv8 while reducing the number of model parameters by 36% and compressing the model size by 40%; on the TinyPerson dataset it increases AP50 by 4.1 percentage points and AP50:95 by 1.7 percentage points. The algorithm significantly reduces model parameters and size while effectively improving detection accuracy, and holds considerable application value in intelligent community scenarios.
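The weighted fusion step of a bidirectional feature pyramid is conventionally the BiFPN "fast normalized fusion"; the concrete fusion ME-YOLO uses is not detailed in the abstract, so the sketch below shows only that standard baseline form, over flattened feature vectors for simplicity:

```python
def weighted_feature_fusion(features, weights, eps=1e-4):
    """BiFPN-style fast normalized fusion: ReLU the learnable scalar weights,
    then combine the input feature maps with the normalized weights."""
    w = [max(0.0, wi) for wi in weights]   # ReLU keeps weights non-negative
    total = sum(w) + eps                   # eps avoids division by zero
    fused = [0.0] * len(features[0])
    for wi, feat in zip(w, features):
        for i, v in enumerate(feat):
            fused[i] += wi * v / total
    return fused
```

Unlike softmax-based fusion, this form needs no exponentials, which is why it is favored in efficiency-oriented detectors.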
  • JIANG Hui-Zhen, SUN Yan-Chun, HUANG Gang
    Accepted: 2024-04-11
    As the world's largest and most popular online code hosting platform, GitHub provides rich learning resources for software development learners. However, faced with such rich and complex content, software development beginners, whose requirements are often unclear and who lack relevant knowledge and experience, find it difficult to formulate suitable search text when using GitHub's search function to locate the learning resources they need. To solve this problem, this paper designs a GitHub software development knowledge graph that combines the latent hierarchical structure of GitHub topics with software development domain knowledge from Wikipedia, and proposes a knowledge-graph-based GitHub hierarchical learning and retrieval service. The feasibility and effectiveness of the proposed service are then verified through comparative experiments and questionnaires.
  • BIAN Yuxing, HUANG Rong, ZHOU Shubo, LIU Hao
    Accepted: 2024-04-11
    Image steganography is the technique of hiding a secret image within a cover image to create a container image that is transmitted over a public channel. Existing multi-cover image steganography methods involve two processes, embedding and extraction, and often decompose embedding into encoding and overlaying steps: the secret image is encoded into a secret perturbation, which is overlaid onto multiple cover images using spatial operations, enabling one secret image to be embedded across multiple covers. These methods employ two separate, non-parameter-sharing networks for embedding and extraction, resulting in high computational resource consumption and a large number of training parameters. To solve this problem, a multi-cover image steganography model based on an invertible neural network is proposed. It associates the embedding and extraction processes with the forward and inverse mappings of an invertible neural network, enabling parameter sharing and effectively reducing the network parameter count. Furthermore, existing models lack a way to measure the importance of content-level regions in the secret image; to tackle this, a spatial attention module is introduced at the input of the invertible neural network to enhance encoding quality, focusing on key regions of the secret image and improving steganographic performance. Additionally, an identity verification mechanism is established by allocating a key-based identity information matrix to multiple users, preventing unauthorized access to the secret image. Experimental results demonstrate that the proposed method achieves superior steganographic performance compared with baseline models: the peak signal-to-noise ratios of the container image and the extracted secret images surpass the baseline by 8.5 dB and 9.4 dB, respectively; the structural similarity index outperforms the baseline by 0.012 to 0.019; the learned perceptual image patch similarity outperforms the baseline by 0.0029 to 0.0047; and the proposed model requires only 17.6% of the baseline's parameters.
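The parameter sharing between embedding and extraction follows directly from the coupling structure of invertible networks: the inverse mapping reuses the same sub-network weights, so nothing is duplicated. A minimal additive-coupling sketch (the toy transform `_t` stands in for the real learned sub-network; invertibility holds for any choice of `_t`):

```python
def _t(h):
    """Toy transform sub-network; any function works, since invertibility
    of the coupling layer does not depend on _t being invertible."""
    return [v * v + 1.0 for v in h]

def coupling_forward(x1, x2):
    """Additive coupling, forward direction (embedding):
    y1 = x1, y2 = x2 + t(x1)."""
    return x1, [b + c for b, c in zip(x2, _t(x1))]

def coupling_inverse(y1, y2):
    """Exact inverse with the SAME parameters (extraction):
    x2 = y2 - t(y1). The same _t is reused, so no second network is needed."""
    return y1, [b - c for b, c in zip(y2, _t(y1))]
```

Stacking such layers (with the roles of the two halves alternating) yields the full invertible network whose forward and inverse passes realize embedding and extraction.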
  • HUANG Kaiji, YANG Hua
    Accepted: 2024-04-11
    The objective of image matching is to establish correspondences between similar structures across two or more images. This task is a fundamental problem in computer vision, with applications spanning robotics, remote sensing, and autonomous driving. With the advancement of deep learning in recent years, 2D image matching algorithms based on deep learning have seen continuous improvement in feature extraction, description and matching, and their matching accuracy and robustness significantly surpass traditional methods, yielding substantial achievements. Although numerous reviews of 2D image matching algorithms exist, the latest developments have yet to be summarized. Therefore, a comprehensive review of deep-learning-feature-based 2D image matching methods over the past eight years is carried out. This study introduces the development, classification, and performance evaluation metrics of 2D image matching algorithms in detail, focusing on three categories: two-stage methods with separate local feature detection and description, joint feature detection and description methods, and detector-free matching methods. The advantages and limitations of each are summarized. Subsequently, the application scenarios of 2D image matching algorithms are introduced, elucidating the impact of advances in 2D image matching on its application domains. Finally, the development trends of 2D image matching algorithms are summarized and future prospects presented.
  • Wei Xing, Sun Hao, Cao Jian, Zhu Xiaobin
    Accepted: 2024-04-11
    As a key tool for helping users find content matching their interests and requirements in massive data, session-based recommendation systems aim to predict a user's next action from anonymous sessions. Existing methods represent users' overall interests insufficiently and rarely consider the positional relationships among items. Therefore, an enhanced memory-network-based session recommendation model, SR-MAN, is proposed to address global user-interest representation and item-sequence position. First, position encoding is introduced when generating the item embedding vectors to highlight the influence of different positions in the sequence. Then, a neural Turing machine stores recent session information, and an attention network is designed to learn long-term preferences, with the user's last click taken as the current interest preference. Finally, long-term and current preferences are combined for prediction and items of interest are recommended; Bayesian personalized ranking is employed to estimate the model parameters. Experiments on three datasets confirm the effectiveness of the proposed method.
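The abstract specifies the purpose of the position encoding but not its form; assuming the common sinusoidal Transformer-style encoding, the vector added to each item embedding can be sketched as:

```python
import math

def positional_encoding(position, d_model):
    """Sinusoidal position encoding: even dimensions use sin, odd dimensions
    use cos, with geometrically increasing wavelengths, so each position in
    the session gets a distinct, smoothly varying vector."""
    pe = []
    for i in range(d_model):
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe
```

A learned position embedding table is an equally plausible reading; the sinusoidal form is shown only because it needs no trained parameters.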
  • SHI Xin, CAO Fengteng, JI Yi, MA Junyan
    Accepted: 2024-04-11
    Traffic flow prediction has considerable worth in designing transportation systems, optimizing road resources, and mitigating traffic congestion, among other aspects. Aiming at the limited prediction accuracy caused by insufficient extraction of temporal periodic features in traffic flow forecasting, this paper proposes a traffic flow forecasting method based on Multi-scale Spatial and Temporal Features and a Soft Attention mechanism (MSTFSA). Firstly, the Graph Talking Head Attention Network (GTHAT) is used to extract the non-Euclidean structural features of spatial data, computing dynamic weights that represent the influence of traffic flow on adjacent roads at different times. Secondly, the Bidirectional Enhanced Attention Gated Recurrent Unit (Bi-EAGRU) is utilized to capture the continuity correlation features of temporal data, enhancing the temporal features of each moment and the continuity between adjacent moments. Then, the similar traffic flow trends at three scales, weekly periodicity, daily periodicity and nearest-neighbor time, are fused via soft attention to achieve comprehensive extraction of temporal periodic features. Finally, the prediction accuracy of MSTFSA is verified on the PeMS04 and PeMS08 highway datasets. Experimental results demonstrate that MSTFSA exhibits a distinct advantage in traffic flow prediction accuracy. Compared with the baseline methods STSGCN (Spatial-temporal Synchronous Graph Convolutional Networks) and ASTGCN (Attention-based Spatial-temporal Graph Convolutional Networks), MSTFSA reduces the Root Mean Square Error (RMSE) by 7.15% and 3.8%, and decreases the Mean Absolute Error (MAE) by 7.79% and 3.99%, respectively. In summary, MSTFSA can efficiently extract and merge the multi-scale temporal and spatial attributes of traffic data and shows a considerable advantage in improving traffic flow prediction accuracy.
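The soft-attention fusion of the weekly, daily and nearest-neighbor components can be sketched as a softmax over scalar relevance scores followed by a weighted blend. The scoring network itself is omitted here, and the scalar-score formulation is an assumption for illustration:

```python
import math

def soft_attention_fusion(components, scores):
    """Softmax the relevance score of each periodic component (weekly /
    daily / recent), then blend the component forecasts with those weights."""
    m = max(scores)                              # subtract max for stability
    exp_s = [math.exp(s - m) for s in scores]
    z = sum(exp_s)
    w = [e / z for e in exp_s]
    return [sum(wi * comp[i] for wi, comp in zip(w, components))
            for i in range(len(components[0]))]
```

Equal scores reduce the fusion to a plain average; a strongly dominant score makes the output follow that single component.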
  • YI Peng, YANG Ye, YAN Shijia
    Accepted: 2024-04-11
    To address inter-individual variability and improve the universality of gesture recognition technology, this paper proposes a transfer learning strategy based on a Multi-Parallel Convolutional Neural Network (MPCNN), which aims to achieve efficient gesture recognition from surface electromyography (sEMG) signals through a parallel architecture and an optimized transfer learning mechanism. Compared with previous CNN transfer frameworks, MPCNN handles physiological differences between individuals more efficiently, improving the model's adaptability to new users and its recognition accuracy. In addition, MPCNN significantly enhances the practicality of the system by reducing model training time and improving generalization. Through multiple sets of experiments, including cross-validation, ablation experiments, and robustness tests, this study confirms the effectiveness of the proposed strategy in several respects. The experimental results show that MPCNN significantly improves gesture recognition accuracy compared with traditional CNN models: the proposed MPCNN transfer learning strategy achieves a recognition rate of 94.95% on Ninapro DB7, an improvement of 4.38% over previous CNN transfer learning frameworks, while training time is reduced by more than 50%. These experiments validate the advantages of the MPCNN transfer model in reducing the training burden, enhancing generalization, and improving interference resistance. Human-computer interaction capability is validated with the experimental model, verifying its promise for myoelectric control applications.
  • ZHANG Xinbo, ZHANG Xueying, HUANG Lixia, CHEN Gunjun
    Accepted: 2024-04-11
    In industrial classification prediction, labeled data are scarce and labeling costs are high, leading to inaccurate model predictions; at the same time, the features in most unlabeled data are not used rationally, so model generalization is insufficient. To solve this problem, this study combines labeled and unlabeled data through supervised and unsupervised learning to improve prediction accuracy. First, Gaussian noise and sparsity constraints are added to the deep autoencoder channel to extract more representative, classification-relevant feature representations. Second, lateral connections are introduced between the encoder and the decoder to filter out information irrelevant to the classification task, so the network better learns the feature representations of key variables, and a supervised learning path is added at the top of the network for classification and recognition. Then, the encoder outputs are trained together with the outputs of the corresponding hidden layers of the decoder, realizing the unsupervised learning path and making effective use of the information in unlabeled data. Finally, a total loss function is constructed from the supervised and unsupervised loss functions to realize classification prediction of key variables in industrial production. Experimental results show that, compared with commonly used supervised learning models and traditional semi-supervised learning models, the proposed algorithm effectively improves classification prediction accuracy, with gains in precision, recall and F1 score.
  • WANG Xiaolu , WEN Jianrong
    Accepted: 2024-04-11
    To tackle the problems of redundant information in action videos and the sparse distribution of action information across feature channels, a 3D residual network based on action-time perception is proposed. The action-perception module (AM) calculates feature-level temporal differences; motion features are obtained by using these differences to excite the action-sensitive channels. The temporal attention module (TM) computes an attention weight matrix along the time dimension, from which local temporal features are determined. Fused action features are acquired by combining the outputs of the AM and TM modules, and the fused features are then integrated into the 3D convolutional network, constructing an action-time-perception module (ATM) based 3D-CNN action recognition network. Experimental results show that on the public datasets UCF101 and HMDB51, the action recognition accuracy of a 3DResNeXt-101 network equipped with the ATM module is improved by 1.6% and 2.8%, respectively, indicating that the proposed method is feasible and effective.
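The action-perception idea, using feature-level temporal differences to excite motion-sensitive channels, can be sketched as follows. The gating form (a sigmoid of the mean absolute frame-to-frame difference per channel) is an assumption for illustration, not the paper's exact formulation:

```python
import math

def action_excitation(frames):
    """frames: list of per-frame channel vectors. Channels whose activations
    change most over time get gates near 1; static channels stay near 0.5,
    so motion-sensitive channels are relatively excited."""
    n_ch = len(frames[0])
    gates = []
    for c in range(n_ch):
        diffs = [abs(frames[t + 1][c] - frames[t][c])
                 for t in range(len(frames) - 1)]
        mean_diff = sum(diffs) / len(diffs)
        gates.append(1.0 / (1.0 + math.exp(-mean_diff)))  # sigmoid gate
    # re-weight every frame's channels by the motion-sensitive gate
    return [[v * g for v, g in zip(frame, gates)] for frame in frames]
```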
  • ZHANG Wenbo, HUANG Hao, WU Di, TANG Minjie
    Accepted: 2024-04-10
    Punctuation restoration, also known as punctuation prediction, is the task of adding appropriate punctuation marks to unpunctuated text to enhance its readability; it is a classic natural language processing task. In recent years, with the development of pre-trained models and deepening research on punctuation restoration, task performance has been continuously improving. However, Transformer-based pre-trained models are limited in extracting local information from long input sequences, which hinders the prediction of the final punctuation marks. Additionally, previous studies treat punctuation labels merely as symbols to be predicted, overlooking the contextual attributes of different punctuation marks and the relationships between them. To address these issues, this paper introduces the Moving average Equipped Gated Attention (MEGA) network as an auxiliary module to enhance the model's ability to extract local information, and constructs a hierarchical prediction module that fully utilizes the contextual attributes of different punctuation marks and their relationships for the final classification. Experiments are conducted with various Transformer-based pre-trained models on datasets in different languages. Results on the English punctuation dataset IWSLT demonstrate that applying the MEGA module and the hierarchical prediction module to most pre-trained models yields performance gains. Notably, DeBERTaV3-xlarge achieves an F1 score of 85.5% on the IWSLT REF test set, a 1.2% improvement over the baseline and the best result on the REF test set to date. The proposed model also achieves the highest accuracy on the Chinese punctuation dataset.
  • WANG Bin, ZHANG Jiao, LI Wei, WANG Xiaofan, JIN Haiyan
    Accepted: 2024-04-10
    The coevolution framework is an effective method for solving large-scale global optimization problems, and designing a reasonable decision-variable grouping method is the key to improving coevolutionary algorithms; dynamically constructing elite subcomponents from elite decision variables can improve evolutionary efficiency. This paper focuses on the difficulty of partitioning inseparable variables in large-scale optimization problems, where existing strategies may assign unrelated variables to the same subcomponent. A two-stage dynamic grouping algorithm based on elite contribution (EC-TSDG) is proposed. First, in the pre-grouping stage, variables are grouped randomly, their contributions are evaluated, and the elite-contribution variables are identified among them. Second, in the post-grouping stage, variable correlations are used to find the remaining variables that interact with the elite decision variables and merge them into the elite subcomponent, so that the variables inside the elite subcomponent are pairwise correlated; this improves the accuracy of variable grouping and the convergence speed of the algorithm, and avoids correlation interference between subcomponents. Finally, an adaptive differential evolution algorithm with an external archive is used as the optimizer for each subcomponent. Compared with other advanced algorithms on the CEC2013 test function set, the proposed algorithm converges faster; its average Friedman-test ranking is 1.43, 36.78% better than the comparison algorithms.
  • LIU Yanhong, YANG Qiuxiang, HU Shuai
    Accepted: 2024-04-10
    Haze, formed by the accumulation and concentration of atmospheric pollutants under meteorological conditions such as temperature inversion, severely limits visibility. Image dehazing techniques aim to eliminate problems caused by haze, such as image blur and low contrast, thus enhancing image clarity and visibility; however, loss of image detail remains a challenge. To address this, we propose a feature-difference-based multi-scale feature fusion dehazing network termed FD-CA dehaze. In this network, the basic block structure of FFA-Net is enhanced by extracting intermediate feature information along the feature-difference, coordinate and channel dimensions. We introduce an effective Coordinate Attention (ECA) module that combines global pooling, max pooling, and coordinate position information, mitigating the loss of positional information during feature fusion. By integrating channel attention with the ECA module, we construct a Dual Attention Model (D-CA) that makes better use of spatial and channel information, so the model exhibits enhanced performance on image dehazing tasks. Furthermore, we improve the loss function by combining the L1 loss with a perceptual loss. Experimental results on the SOTS and HSTS datasets show that FD-CA dehaze achieves a peak signal-to-noise ratio (PSNR) of 37.93 dB and a structural similarity index (SSIM) of 0.9905; compared with classic dehazing networks such as FFA-Net and GridDehazeNet, FD-CA dehaze achieves significant improvement and better dehazing performance.
  • Yongtao Yu, Ao Sun, Ang Li, Linlin Zhu
    Accepted: 2024-04-10
    In industrial surface quality-control (QC) scenarios, deep neural networks classify product images for pass/fail judgment or quality grading, and surface QC equipment equipped with deep classification networks must pass Attribute Reproducibility and Repeatability (AR&R) gauge assessment. However, due to assembly tolerances, equipment vibration, and other factors, captured product images exhibit perturbations in position, angle, brightness, and blur, and the classification network cannot output consistent classification results and probabilities for such perturbed images, causing the equipment to fail the AR&R assessment. We summarize this as a network output reproducibility problem and propose a Siamese-network-based training method for classification networks. The Siamese primary network is trained with supervised learning on the original samples to output the correct classification categories; the Siamese secondary network copies the primary network's weights through exponential smoothing and outputs feature embeddings of the perturbed samples corresponding to the originals, which are used for contrastive training of the primary network, so that the primary network outputs consistent classification probabilities for both original and perturbed inputs. At inference time, only the primary network is retained for product defect classification. To fully verify the performance of the algorithm, benchmark experiments, network-architecture ablation experiments and comparisons with similar methods are designed and verified on inductive product images; classification accuracy reaches 99.3462% with a classification-probability variance of 0.001016. The method effectively addresses the output reproducibility problem of deep classification networks for industrial product image classification, reducing classification-probability variance and improving accuracy.
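The step where the secondary network "copies the primary network weights through exponential smoothing" is a standard exponential-moving-average (EMA) teacher update, sketched here over flat weight vectors:

```python
def ema_update(teacher_weights, student_weights, momentum=0.99):
    """Exponential-moving-average copy of the primary (student) network into
    the secondary (teacher) network: the teacher trails the student smoothly,
    giving stable targets for the contrastive loss."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_weights, student_weights)]
```

Repeated updates make the teacher converge toward the student while filtering out step-to-step noise, which is why the teacher's embeddings are stable enough to serve as contrastive targets.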
  • LIU Zhuang, GU Kangzheng, TAN Xin, ZHANG Yuan
    Accepted: 2024-04-09
    Developing exploits for vulnerabilities is the main way to evaluate the exploitability of kernel vulnerabilities. Spray objects are widely used in the exploitation process to carry out malicious behaviors such as malicious content injection and memory-layout manipulation. Current research on spray objects has limitations in two respects: (1) spray objects with basic types are ignored; (2) no existing work can generate code to edit the contents of spray objects. Therefore, this paper proposes techniques for automatically generating code that manipulates spray objects for kernel vulnerability exploitation. The techniques consist of spray-object identification based on use-define chain analysis and spray-object control-code generation based on directed fuzzing. Experimental results show that the techniques identify and generate control code for 28 spray objects in Linux kernel version 5.15, covering all spray objects identified by existing work. A total of 23 generated code samples control their spray object to achieve the expected target, a success rate of 82.1%. Case analysis shows that the generated control code can be applied to the exploitation of real-world kernel vulnerabilities.
  • Fei GE, Shan MIN, Han QIU, Zhenyang DAI, Zhimin YANG
    Accepted: 2024-04-09
    The ant colony optimization algorithm (ACO) simulates the path-finding behavior of foraging ants in nature and can solve geographically distributed NP-hard combinatorial problems without external guidance or control in a dynamically changing environment. To address ACO's tendency to fall into local optima on NP-hard problems and the difficulty of balancing search depth and breadth, a Green Intelligent Evolutionary Ant Colony Algorithm (G-IEACO) is proposed. Four neighborhood operators are introduced, and the state-transition rules and pheromone update methods of the ant colony algorithm are improved to enhance optimization performance and prevent premature convergence; a congestion-avoidance strategy is also adopted to balance time cost against environmental cost. Numerical results show that G-IEACO outperforms the Genetic Algorithm (GA) in total fleet driving time (TT) and vehicle carbon emissions (TCO2), reducing driving time and carbon emissions by 13.64% on average on the R2-100 and RC2-100 test cases, effectively promoting green and low-carbon goals.
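The state-transition rule that G-IEACO improves starts from the classic ACO form, in which the probability of an ant moving to candidate j is proportional to pheromone^alpha times heuristic^beta. The sketch below shows that baseline rule with roulette-wheel selection (the improved G-IEACO rule itself is not given in the abstract):

```python
import random

def transition_probabilities(pheromone, heuristic, alpha=1.0, beta=2.0):
    """Classic ACO transition rule over unvisited candidates:
    p_j proportional to tau_j**alpha * eta_j**beta."""
    scores = [(t ** alpha) * (h ** beta) for t, h in zip(pheromone, heuristic)]
    z = sum(scores)
    return [s / z for s in scores]

def choose_next(pheromone, heuristic, alpha=1.0, beta=2.0, rng=random.random):
    """Roulette-wheel selection according to the transition probabilities."""
    probs = transition_probabilities(pheromone, heuristic, alpha, beta)
    r, acc = rng(), 0.0
    for j, p in enumerate(probs):
        acc += p
        if r <= acc:
            return j
    return len(probs) - 1
```

The alpha/beta exponents are exactly the depth-versus-breadth knob the abstract mentions: raising beta greedily follows the heuristic, while raising alpha amplifies accumulated pheromone.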
  • RAO Dongning, XU Zhenghui, Liang Ruishi
    Accepted: 2024-04-09
    Knowledge base question answering aims to use pre-constructed knowledge bases to answer questions raised by users. Existing knowledge base question answering research mainly sorts candidate entities and relationship paths, and finally returns the tail entity of the triple as the answer. After the questions given by the user pass through the entity recognition model and the entity disambiguation model, they can be linked to candidate entities related to the answers in the knowledge base. Using the generation ability of the language model, the answer can be expanded into a sentence and returned, which is more user-friendly. In order to improve the generalization ability of the model and make up for the difference between question text and structured knowledge, candidate entities and their one-hop relationship subgraphs are organized and input into the generation model through prompt template, and a popular and fluent text is generated under the guidance of the answer template. Experimental results on the NLPCC 2016 CKBQA and KgCLUE Chinese datasets indicate that, on average, the proposed method outperformed the BART-large model by 2.8%, 2.3%, and 1.5% on the BLEU, METEOR, and ROUGE series metrics, respectively. On the Perplexity metric, the method performed comparably to ChatGPT's responses.
  • Han Wence, Kang Xiao, Li Hongyu, He Wei, Zhou Guohui, Bo Xiangfeng
    Accepted: 2024-04-09
    In medical practice, collecting patients' physiological indicators is often key to diagnosing disease; in reality, however, these data are often uncertain and ambiguous. The belief rule base (BRB) is an expert-system approach that efficiently handles various kinds of uncertain and ambiguous information by combining expert knowledge to transform data into belief distributions. However, current BRB-based disease diagnosis models still rely on offline training, which cannot meet the dynamic real-time requirements of disease diagnosis environments; in addition, existing online models in other fields suffer from explosive growth in the number of training samples and from sample imbalance. Therefore, this paper proposes an online belief-rule-base disease diagnosis method based on a human-in-the-loop strategy. First, the traditional offline-trained BRB disease diagnosis model is converted into an online training model, so the model can grow dynamically with different patients' physiological indicators. Second, a human-in-the-loop algorithm is proposed for the online-learning BRB model to strengthen experts' decision-making and to effectively solve the problems of explosive sample growth, output overfitting and sample imbalance found in traditional online models. Finally, the effectiveness and superiority of the method are verified through experiments on chronic kidney disease grading, hepatitis C prediction, breast cancer diagnosis and diabetes diagnosis.
  • YANG Wangda, WAN Yaping, ZOU Gang, MIN Xiaoshan, WANG Yi, LU Yucheng
    Accepted: 2024-04-09
    Statistics show that traffic accidents caused by drivers' unsafe behaviours account for the majority of traffic accidents. To study the characteristics of driving cognitive quality, a virtual driving scene is constructed to assess driver quality: it comes maximally close to the real environment and operations and awakens the driver's latent driving and coping abilities, which is of positive significance for reducing road hazards. Eye movements strongly reflect the driver's cognitive state, but most current eye-movement state recognition studies focus on basic gaze direction or eyelid closure in the natural state, and the categories they recognise are of limited use for cognitive-state assessment in driving scenarios. This study collects binocular data for ten categories of static eye-movement directions and proposes a multi-scale eye-state image recognition model incorporating an attention mechanism. Firstly, a two-branch feature fusion module is designed using partial convolution to enhance the model's feature extraction capability while reducing computational redundancy; then, an improved coordinate attention mechanism is embedded in the residual module of the two-branch feature fusion to enhance the model's ability to characterise features at different scales; finally, the structure and number of channels in the model are adjusted to balance parameter count and recognition accuracy. Experimental results show that the proposed method achieves a recognition accuracy of 95.1% on the proposed 10-class eye-movement state dataset, 3.4 percentage points higher than the pre-improvement network; recognition accuracy on the Eye Chimera and MRL eye datasets is 95.1% and 98.95%, respectively, which satisfies the requirements of eye-movement state recognition in a virtual driving test environment and lays a foundation for further multi-parameter analysis of driving-quality deficiency tasks.
  • SU Hui, ZHANG Jianhui, ZENG Junjie, CHU Xiaoxi
    Accepted: 2024-04-09
    The temporary-file issue in Dockerfiles causes a Docker image to pack unnecessary file resources beyond its functional requirements, inflating the image size and hurting the efficiency of image transmission and deployment. Existing dynamic analysis methods generate a large number of logs at runtime, incurring significant system overhead, while existing static analysis methods cannot detect the various forms temporary files take, which limits their use in routine detection. This paper proposes a static detection method for Dockerfile temporary files: 21 temporary-file forms are collected and checked through rule validation; a node-association algorithm improves the AST structure (NA-AST) to raise detection efficiency; and, based on the NA-AST, a colouring mechanism processes nodes to ensure detection completeness. Experimental results show that, compared with existing schemes, the proposed method can detect the various temporary-file forms with less time overhead. In addition, the method provides a basis for classifying temporary-file forms, which can be used to analyse and detect new forms of temporary files that appear later, and thus has high generality.
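The abstract does not enumerate the 21 temporary-file forms, but one plausible rule of this kind is a RUN layer that installs apt packages without deleting the package-list cache in the same layer. The sketch below shows how such a single rule could be checked statically over Dockerfile text; the function name and the rule itself are assumptions for illustration, not the paper's rule set.

```python
def leaves_apt_cache(dockerfile_text):
    """Flag RUN layers that install apt packages but never delete
    /var/lib/apt/lists in the same layer (a hypothetical temporary-file
    rule; cleanup in a *later* layer would not shrink the image because
    each layer is stored separately)."""
    # merge backslash-continued lines so each instruction is one line
    joined = dockerfile_text.replace('\\\n', ' ')
    findings = []
    for line in joined.splitlines():
        cmd = line.strip()
        if cmd.startswith('RUN') and 'apt-get install' in cmd \
                and 'rm -rf /var/lib/apt/lists' not in cmd:
            findings.append(cmd)
    return findings
```

A real detector would work on the parsed AST rather than raw lines, which is exactly what the NA-AST structure above is for; this line-based version only illustrates the rule-validation idea.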
  • Lu Ming, Chen Cifa, Dong Fangmin
    Accepted: 2024-04-09
    The consensus mechanism is the core of blockchain. Proof of Stake (PoS) significantly reduces resource consumption compared with Proof of Work (PoW), but it still faces security issues: active low-stake honest nodes struggle to obtain accounting rights, active nodes lack incentives for block verification, coin-age accumulation attacks are possible, and block rewards are allocated unreasonably. To address these issues, this paper improves on PoS. First, a points mechanism is introduced to raise the total stake of active low-stake honest nodes and increase their probability of obtaining accounting rights. Second, a non-linear function is used for coin-age calculation to prevent malicious nodes from accumulating coin age and launching attacks. Third, block rewards are distributed in proportion to each node's comprehensive points, so that nodes which actively participate in verification or voting within the specified time are rewarded, weakening the "rich get richer" effect and narrowing the wealth gap between nodes. Finally, the improved algorithm is evaluated experimentally. Compared with other improved PoS consensus mechanisms, the unbounded growth of coin age is effectively controlled, and the number of times active low-stake honest nodes receive rewards and accounting rights increases by 3.6 and 2.6 times, respectively. This reduces the system's centralisation trend, gives active low-stake honest nodes more opportunities to compete for accounting rights, and lowers the possibility of coin-age attacks, further validating the feasibility and superiority of the scheme and promoting the healthy development of blockchain networks.
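A non-linear coin-age curve of the kind described above can be sketched with a saturating function: coin age grows roughly linearly at first, then flattens toward a cap, so hoarding coins for a long time yields diminishing stake. The exponential form and the constants below are illustrative assumptions; the paper's actual function is not given in the abstract.

```python
import math

def effective_coin_age(days, cap=90.0, k=0.05):
    """Saturating (non-linear) coin-age curve: approximately linear for
    small `days`, asymptotically bounded by `cap`, so long accumulation
    yields diminishing returns -- blunting coin-age accumulation attacks.
    Curve shape and constants are illustrative, not the paper's."""
    return cap * (1.0 - math.exp(-k * days))
```

Any monotone, concave, bounded function would serve the same purpose: the marginal stake gained per extra day of hoarding shrinks, so an attacker cannot amass unbounded weight by simply waiting.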
  • BAI Yu, WANG Jun, RAN Honglei, AN Shengbiao
    Accepted: 2024-04-09
    Internal void defects that arise during the packaging of semiconductor devices directly affect the performance of electronic devices. To address the varying sizes of void defects in X-ray images of semiconductor device interiors, and the difficulty of labelling and locating them under noise interference, a semi-automatic annotation method and a U-Net-based internal void defect detection method are proposed. The semi-automatic annotation method uses threshold segmentation to locate defect regions initially and generate each defect's bounding rectangle; the rectangles are then manually refined and fed into SAM (Segment Anything Model) as prompts to obtain high-precision segmentation results. Semi-automatic labelling saves labelling time and improves label quality, overcoming the annotation problem. To address the poor generalisation of the classic U-Net, an improved U-Net (EFU-Net) is proposed. First, an Edge and Position Enhancement (EPE) module is introduced into the encoder: by combining a Sobel filter with a coordinate attention mechanism, it strengthens the perception of image edge information and effectively integrates position information, improving the accuracy of feature extraction. A Feature Fusion Control (FFC) module is then introduced to replace the traditional skip connection; it fuses high-level features, low-level features, and the prediction mask, and uses multi-layer parallel atrous convolution with an attention gating mechanism to achieve more targeted, higher-quality feature fusion. Experiments on the semiconductor device dataset show that the Dice coefficient and MIoU of EFU-Net reach 70.71% and 77.23%, improvements of 14 and 7.71 percentage points over the U-Net method. The experimental results show that EFU-Net has better segmentation performance.
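The edge-extraction half of the EPE module rests on the classical 3x3 Sobel operator; a minimal sketch of that gradient-magnitude computation follows (the coordinate-attention half and all learned parts are omitted). The function name is an assumption for illustration.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude via 3x3 Sobel kernels -- the hand-crafted
    edge-extraction part of an EPE-style module. Plain loops for
    clarity; a real encoder would use a framework convolution."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T                       # vertical-gradient kernel
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))  # valid convolution, no padding
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            out[i, j] = np.hypot((patch * kx).sum(), (patch * ky).sum())
    return out
```

On a vertical step edge the response is large at the step and zero in the flat regions, which is exactly the edge emphasis the EPE module feeds into the encoder features.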
  • Zimin Chen, Zhitao Guan
    Accepted: 2024-04-09
    Deep learning models have achieved impressive results in fields such as image classification, but they are vulnerable to adversarial examples. Attackers use diverse attack algorithms to craft small perturbations, constructing adversarial examples that are hard to distinguish with the naked eye yet cause deep neural networks to misclassify, posing serious potential security risks to image classification tasks. To improve the robustness of image classification models, we propose an adversarial-example defense method that combines adversarial detection with adversarial purification by a conditional diffusion model, leaving the target model's structure and parameters unchanged during detection and purification. The method comprises two modules. For adversarial detection, inconsistency enhancement is adopted: we train an image restoration model that integrates high-dimensional features of the target model with basic image features, and compare the inconsistency between the initial input and its restoration to detect adversarial examples. For adversarial purification, an end-to-end purification method is adopted, with image artifacts added during execution of the denoising model. The detection and purification modules are placed in front of the target model, preserving its accuracy; purification strategies are selected according to the detection results to remove adversarial perturbations and improve the robustness of the target model. We compare against recent adversarial detection and purification methods on the CIFAR10 and CIFAR100 datasets, using five attack algorithms to craft adversarial examples: FGSM, PGD, DeepFool, CW, and BPDA.
Compared with Argos, our method improves detection accuracy by 5-9 percentage points on CIFAR10 and CIFAR100 in the low-purification setting. Compared with ADP, the scheme defends more stably against different types of adversarial examples, and it exceeds ADP by 1.3% under BPDA attacks.
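The inconsistency-enhancement idea above reduces, in miniature, to restoring the input and flagging it when it deviates from its own restoration by more than a calibrated threshold. In this sketch `restore` and `tau` stand in for the paper's trained restoration model and its calibrated threshold; both are assumptions for illustration.

```python
import numpy as np

def is_adversarial(x, restore, tau):
    """Inconsistency-based detection in miniature: a clean input is
    nearly unchanged by the restoration model, while an adversarial
    input is pulled back toward the clean manifold, so the input-vs-
    restoration distance separates the two. `restore` stands in for
    the paper's trained restoration network."""
    return float(np.mean((x - restore(x)) ** 2)) > tau
```

In the toy test below, `np.round` plays the role of the restorer (it maps perturbed pixels back to the nearest "clean" value), so a clean binary image passes while a perturbed one is flagged.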
  • ZHANG Lei, ZHAO Guangyue, XIAO Chaoen, WANG Jianxin
    Accepted: 2024-04-09
    In recent years, post-quantum cryptographic algorithms have become a hot research topic in the security field owing to their resistance to quantum attacks. The lattice-based Falcon digital signature algorithm is one of the first four post-quantum cryptographic standard algorithms published by NIST. Key tree generation is a core component of Falcon and consumes a large share of time and resources in practice. This paper therefore proposes a GPU-based parallel key tree generation scheme for Falcon, which uses a SIMT parallel mode with joint control of parity threads and a direct computation mode without intermediate variables to achieve speedup and reduce resource consumption. Experiments on a Python-based CUDA platform verify the correctness of the results. On an RTX 3060 laptop GPU, Falcon key tree generation has a latency of 6 ms and a throughput of 167 key trees per second, a 1.17x speedup over the CPU when computing a single key tree generation, rising to approximately 56x when 1024 key trees are generated simultaneously; on the embedded Jetson Xavier NX platform the throughput is 32 key trees per second.
  • XU Shoukun, ZHANG Lujun, SHI Lin, LIU Yi
    Accepted: 2024-04-09
    An investigation into point cloud object detection found that current detection methods often require demanding supervised datasets, posing challenges in manpower and resources. An innovative solution is therefore proposed: a meta-learning framework is used to overcome the dependence on abundant labeled data, and few-shot learning techniques are applied to 3D point cloud object detection. This approach can predict the classes of unlabeled samples from a limited number of labeled samples of new classes, achieving good results under limited data conditions. To this end, Prototypical VoteNet is introduced to learn geometric prototypes of categories and support-set prototypes, and an intention-attention mechanism is introduced to learn point cloud context information for more precise information fusion. To avoid over-reliance on max-pooling, which loses a significant amount of information during prototype generation, mean-pooling is used instead. Compared with baseline models on benchmark datasets, the method consistently demonstrates significant improvements, providing strong support for research and applications in the field of point cloud object detection.
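The prototype step above can be sketched in a few lines: mean-pooling the support features of each class gives a prototype (keeping contributions that max-pooling would discard), and an unlabeled query is assigned to its nearest prototype. This is a generic prototypical-network sketch over pre-extracted feature vectors, not the paper's full VoteNet pipeline.

```python
import numpy as np

def prototypes(support_feats, support_labels):
    """Class prototypes via mean-pooling of support features -- the
    abstract's alternative to max-pooling, which keeps only the
    maximal activation and drops the rest."""
    return {c: support_feats[support_labels == c].mean(axis=0)
            for c in np.unique(support_labels)}

def classify(query_feat, protos):
    """Assign an unlabeled query feature to the nearest prototype."""
    return min(protos, key=lambda c: np.linalg.norm(query_feat - protos[c]))
```

With real detections, `support_feats` would come from the backbone's per-proposal features; here any fixed-length feature vectors work.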
  • SONG Yinghua, XU Yaan, ZHANG Yuanjin
    Accepted: 2024-04-09
    Air pollution is one of the main problems in urban environmental governance, and PM2.5 is an important factor affecting air quality. To address the problems that traditional time-series models for PM2.5 concentration prediction lack seasonal-factor analysis and that their prediction accuracy is not high enough, this paper proposes a machine-learning fusion model combining the Seasonal Autoregressive Integrated Moving Average (SARIMA) model with a Support Vector Machine (SVM). The fusion model is a tandem model that splits the data into a linear part and a nonlinear part. Building on the Autoregressive Integrated Moving Average (ARIMA) model, SARIMA adds seasonal-factor extraction parameters, so it can effectively analyse the seasonal trend of PM2.5 concentration data and predict the data's future linear trend. The SVM model is then combined to optimise the residual sequence of the predictions: a sliding-step prediction method determines the optimal prediction step for the residual series, and grid search determines the optimal model parameters, enabling long-term prediction of PM2.5 concentration data and improving overall prediction accuracy. Analysis of PM2.5 monitoring data from Wuhan over the past five years shows that the fusion model's prediction accuracy is greatly improved compared with single models. In the same experimental environment, the accuracy of the fusion model improves by 99%, 99%, and 98% over the ARIMA, Auto-ARIMA, and SARIMA models respectively, and the model is also more stable, providing a new idea for PM2.5 concentration prediction.
  • ZHANG Guo-fu, GUAN Yan-ni, SU Zhao-pin, YUE Feng
    Accepted: 2024-04-09
    The distribution of emergency supplies after large-scale natural disasters is the basis of emergency relief at disaster-stricken sites. It mainly studies how to allocate emergency supplies reasonably among the stricken sites so that supplies reach them from the various reserve stations as quickly as possible, ensuring the smooth conduct of disaster relief. Most existing research, however, is limited to single-stage distribution of emergency materials, over-emphasising the timeliness of the emergency response while neglecting the continuity of material consumption. To this end, a multi-objective allocation model is first constructed for multiple reserve stations, multiple emergency materials, multiple affected points, and multi-stage continuous allocation of emergency materials, and the constraints that satisfy continuous material consumption within a stage are analysed and derived. Then a multi-objective allocation algorithm for large-scale natural disaster emergency materials is proposed based on the non-dominated sorting genetic algorithm and a heuristic strategy. Finally, the effectiveness of the proposed algorithm is verified by simulation experiments. The experimental results show that the proposed algorithm takes into account both the continuity and the timeliness requirements of large-scale natural disaster emergency response, and can provide more, and better, emergency material allocation schemes for large-scale natural disaster emergency relief.
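The sorting step that the non-dominated sorting genetic algorithm (NSGA-II family) relies on can be sketched in plain Python: solutions are partitioned into successive Pareto fronts, where a solution dominates another if it is no worse in every objective and strictly better in at least one. This is a generic textbook sketch (objectives minimized), not the paper's full algorithm.

```python
def non_dominated_sort(points):
    """Partition objective vectors (minimization) into Pareto fronts,
    returning lists of indices: front 0 is non-dominated, front 1 is
    non-dominated once front 0 is removed, and so on."""
    def dominates(a, b):
        # a dominates b: no worse in all objectives, better in at least one
        return all(x <= y for x, y in zip(a, b)) and a != b

    fronts, remaining = [], list(range(len(points)))
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts
```

In a two-objective allocation setting, the objectives might be total delivery delay and unmet continuous demand; the first front then contains the trade-off schemes offered to decision makers.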
  • HE Jie, MA Qiang
    Accepted: 2024-04-09
    The rapid development of driverless and assisted driving places high demands on vehicle computing performance, for which task offloading combined with mobile edge computing provides a solution. Making fast and efficient offloading decisions remains a major challenge, however, and existing research gives insufficient consideration to the overall system benefit of task offloading. To address these problems, a distributed task offloading system model for Cellular Vehicle-to-Everything (C-V2X) based on software-defined networking is designed using a vehicle-road-air architecture, and a task offloading control algorithm based on deep reinforcement learning is proposed. Cost models are constructed for the three modes of local computing, edge computing, and satellite computing, and the objective function jointly optimises vehicle energy consumption and resource-leasing cost on the user side with task processing delay and server load balance on the server side. Considering constraints such as the maximum expected task delay and the maximum server load ratio, the task offloading problem is formulated as a mixed-integer nonlinear programming problem (MINLP) and then modelled as a Markov decision process over a mixed discrete-continuous action space. Finally, offloading decisions covering task scheduling, resource leasing, and power control are obtained with deep reinforcement learning algorithms. Compared with traditional schemes based on particle swarm optimisation and genetic algorithms, the proposed scheme reduces single-decision latency by more than 45% while achieving similar decision benefits.
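The simplest slice of such a cost model is the delay comparison for a single task: local execution costs compute time only, while offloading costs transmission time plus remote compute time. The sketch below captures just that trade-off; energy, leasing cost, load balancing, and the satellite tier are omitted, and the function name and parameters are assumptions for illustration.

```python
def offload_delay(cycles, data_bits, f_local, f_edge, uplink_bps):
    """Delay-only offloading decision for one task: local execution
    versus uploading the task data and computing at the edge.
    Returns ('local' | 'edge', delay_in_seconds)."""
    t_local = cycles / f_local                      # on-board compute
    t_edge = data_bits / uplink_bps + cycles / f_edge  # upload + remote
    return ('local', t_local) if t_local <= t_edge else ('edge', t_edge)
```

The crossover behaviour is what makes the full problem hard: a fast edge server only helps when the uplink is fast enough, and adding energy, cost, and load terms per task turns this pairwise comparison into the MINLP described above.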
  • CHENG Wen, GUO Liuxiao
    Computer Engineering. https://doi.org/68654
    Accepted: 2024-04-09
    This paper studies how nonlinear second-order multi-agent systems (MASs) under a directed communication topology can achieve a desired formation within an arbitrary preset time. First, the preset-time mechanism is combined with an integral sliding mode control strategy to design a new formation control protocol. In theory, the protocol guarantees convergence of both the reaching and sliding phases within any preset time; the introduced integral sliding mode term effectively reduces the system's steady-state error and improves its robust stability under preset-time control. Second, stability is analysed using the Lyapunov method together with algebraic graph theory. Finally, numerical simulation results show that the algorithm is effective and feasible for given initial values of the second-order nonlinear multi-agent system.
  • YAN Changyu, ZHANG Lei
    Accepted: 2024-04-09
    In heterogeneous computing systems, efficient task scheduling algorithms are important for achieving high performance. List scheduling is a class of classical static heuristics for the task scheduling problem. Because the computation and communication costs of a task differ across processors in a heterogeneous system, the scheduling problem is more complex than in a homogeneous system, and research in this field mainly pursues algorithms that shorten the schedule length at low time complexity. To this end, a hybrid list scheduling algorithm, DPLS, based on task duplication and pre-scheduling is proposed. The algorithm selectively duplicates key predecessor tasks of the current task and schedules them on the same processor, reducing the time the current task waits for dependent data communicated from those predecessors and thereby advancing its completion time. The algorithm has two stages: pre-scheduling, which generates a basic schedule, and secondary scheduling, which tries to generate a better schedule on that basis. DPLS has the same time complexity as the classical algorithms for the given numbers of tasks and processors. Simulation results show that DPLS generates schedules with shorter lengths and better performance than other list scheduling algorithms: compared with HEFT and PEFT, the performance of DPLS improves by 12.563% and 7.786%, respectively.
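List scheduling algorithms such as HEFT, against which DPLS is compared, order tasks by an upward rank computed over the task DAG: a task's rank is its average computation cost plus the largest (communication + rank) over its successors. The sketch below shows that standard ranking step only; DPLS's duplication and two-stage scheduling are not reproduced here, and the data-structure layout is an assumption for illustration.

```python
def upward_rank(succ, comp, comm):
    """HEFT-style upward rank over a task DAG.
    succ: task -> list of successor tasks
    comp: task -> average computation cost across processors
    comm: (u, v) -> average communication cost on edge u->v
    Tasks are later scheduled in decreasing rank order."""
    memo = {}

    def rank(t):
        if t not in memo:
            memo[t] = comp[t] + max(
                (comm[(t, s)] + rank(s) for s in succ.get(t, [])),
                default=0.0)  # exit tasks: rank = own computation cost
        return memo[t]

    return {t: rank(t) for t in comp}
```

Sorting tasks by decreasing rank yields a topological order that prioritises tasks on the longest remaining path, which is the priority list that insertion-based processor selection then consumes.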