
15 May 2026, Volume 52 Issue 5
    

  • Frontier Perspectives and Reviews
  • WANG Tian, LI Guo, MEI Yaxin, ZHONG Wentao
    Computer Engineering. 2026, 52(5): 3-42. https://doi.org/10.19678/j.issn.1000-3428.0260004

    To address the critical challenges of high latency, bandwidth constraints, and privacy vulnerabilities faced by traditional Sensor-Cloud (SC) systems in processing massive amounts of real-time sensing data, Edge Computing (EC) has emerged as a promising solution by extending computational and storage capabilities to the network periphery. This study provides a systematic survey of integration technologies and evolutionary paradigms of SC and EC. First, the evolutionary logic of the "Cloud-Edge-End" collaborative architecture is analyzed, and edge-based data preprocessing, redundancy elimination, and collaborative storage mechanisms are discussed. Second, intelligent resource optimization techniques are investigated by comparing the performance of traditional heuristic algorithms and Deep Reinforcement Learning (DRL) in dynamic task offloading and cross-layer resource scheduling. Furthermore, the application paradigms of Federated Learning (FL) and Edge Intelligence (EI) in privacy preservation and autonomous decision-making are analyzed, focusing on hierarchical model aggregation, lightweight model compression, and collaborative inference based on knowledge distillation. Additionally, by incorporating systems engineering practices, this study elucidates the implementation path for building efficient and scalable edge collaborative systems using Kubernetes container orchestration and Kafka messaging middleware. Finally, common challenges such as heterogeneous resource management and network dynamics are summarized, and future trends toward green computing, semantic communication, and integrated sensing-communication-computation are envisioned, providing theoretical references and engineering guidance for constructing next-generation efficient Internet-of-Things (IoT) sensing systems.

  • XU Shengxuan, XU Lei, FEI Yifan
    Computer Engineering. 2026, 52(5): 43-59. https://doi.org/10.19678/j.issn.1000-3428.0252172

    Machine vision technology can be leveraged to identify images of collected concrete products, enabling the rapid, accurate, and nondestructive assessment of their performance, which is significant for engineering applications. Traditional manual inspection methods are inefficient and highly subjective. Moreover, the performance of existing image recognition technologies is hindered by challenges such as uneven lighting, background noise interference, diverse crack shapes, and blurry boundaries in dynamic images. Therefore, intelligent solutions that can adapt to complex engineering scenarios are in demand. Through a systematic review of relevant literature, this paper evaluates two concrete types, focusing on the identification of cracks and appearance defects in static hardened concrete and the evaluation of the flowability of dynamic fresh concrete. First, from the perspectives of traditional digital image technology and neural networks, it reviews the research progress on crack identification, appearance quality discrimination, and flowability assessment under different scenarios and shooting subjects. Then, it summarizes and compares the advantages and disadvantages of different algorithms in the preprocessing, image segmentation, and feature extraction steps in existing processing procedures, as well as their application scenarios. Finally, through a comparative analysis, a set of recommended image recognition processing procedures and solutions for judging the appearance, quality, and flowability of concrete products is proposed. This paper provides algorithmic ideas for the intelligent recognition and assessment of structural concrete performance, thereby promoting the application of visual technology in the civil engineering field.

  • XU Minchen, QU Dan, SI Nianwen, PENG Sisi, CHEN Yaqi
    Computer Engineering. 2026, 52(5): 60-80. https://doi.org/10.19678/j.issn.1000-3428.0070287

    Timely and effective disinformation detection is crucial for curbing the spread of disinformation and minimizing social harm. Numerous deep learning methods have been employed for disinformation detection. Summarizing the detection principles and paradigms of existing research is essential for identifying directions for technical optimization. Therefore, this paper comprehensively reviews existing research based on the principles and implementation paths of disinformation detection, and for the first time, summarizes and compares the applications of large language models in this field. First, the relevant concepts of disinformation detection tasks are introduced and the data structures of commonly used disinformation detection datasets are summarized. Then, based on detection principles and implementation methods, the paper presents ways to detect textual and multimodal disinformation through semantic feature representation, auxiliary task design, internal knowledge inference, and fact verification, refining them into ten subcategories and summarizing the potential characteristics of detection methods for each subcategory. Finally, the paper summarizes disinformation detection paradigms based on deep neural networks and large language models, compares the detection performance of representative methods from these paradigms across seven disinformation detection datasets, and highlights the advantages and limitations of large language models in detecting disinformation. It also presents the anticipated opportunities and challenges brought about by large language models in the field of disinformation detection, providing a reference for future research.

  • LI Hui, LIU Jiayu, XU Yaping
    Computer Engineering. 2026, 52(5): 81-94. https://doi.org/10.19678/j.issn.1000-3428.0253035

    Medical image segmentation enables pixel-level localization of lesions or anatomical structures in multimodal imaging data and serves as a key foundation for computer-aided diagnosis and clinical decision-making. This study addresses the rapid evolution of medical image segmentation network architectures and the inherent limitations (semantic ambiguity and statistical instability) of existing evaluation metrics. This study aims to systematically examine and delineate the alignment among network structure, task characteristics, and evaluation metrics; reveal the method development path and performance boundaries; and establish a structure-metric matching mechanism tailored to practical clinical needs. Based on representative literature from the Web of Science Core Collection between 2020 and 2025, this study first reviews the design mechanisms and evolutionary pathways of core architectures, such as Transformers, Graph Neural Networks (GNNs), and Diffusion Models (DMs), and then summarizes the essential characteristics of lightweight, hybrid, and prompt-guided paradigms. Subsequently, by integrating empirical studies on public datasets, a quantitative comparison is conducted across different architectures in typical segmentation tasks involving organs, tumors, and brain tissues, covering common metrics such as the Dice Similarity Coefficient (DSC), 95% Hausdorff Distance (HD95), and Intersection over Union (IoU). The results indicate that HD95 exhibits high variability in boundary-complex tasks, DSC shows limited sensitivity to small targets, and IoU presents insufficient structural discrimination capability. 
    Furthermore, this study reveals the statistical causes underlying metric misapplication and task-metric mismatch; constructs a task-structure-to-metric recommendation mapping; proposes a task-granularity-based metric selection strategy; and explores how dynamic networks, self-supervised learning, and cross-modal modeling contribute to the enhancement of model generalization.
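
    For reference, two of the overlap metrics named in this abstract have standard definitions: DSC = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B|. The sketch below computes both on flat binary masks. The function name `dice_and_iou` and the empty-mask convention are assumptions made for illustration and are not taken from the paper.

```python
def dice_and_iou(pred, target):
    """Return (DSC, IoU) for two equal-length binary masks given as flat lists of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    pred_size = sum(pred)
    target_size = sum(target)
    union = pred_size + target_size - intersection
    if union == 0:
        # Both masks are empty: treated as perfect agreement (an assumed convention).
        return 1.0, 1.0
    dsc = 2.0 * intersection / (pred_size + target_size)
    iou = intersection / union
    return dsc, iou


# A 4-pixel target where the prediction misses one pixel and adds one false positive.
target = [1, 1, 1, 1, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 0, 0, 0]
dsc, iou = dice_and_iou(pred, target)  # intersection = 3, union = 5
```

    On the same pair of masks the two metrics are tied by DSC = 2·IoU / (1 + IoU), so they rank predictions identically and differ only in scale.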

  • Computational Intelligence and Pattern Recognition
  • HU Jingdan, LI Bo, YANG Jing
    Computer Engineering. 2026, 52(5): 95-102. https://doi.org/10.19678/j.issn.1000-3428.0070131

    The automatic solving of Math Word Problems (MWPs) is an active research topic. Despite significant progress, most existing studies treat the numerical values in MWPs as placeholders and simply process them as ordinary text, overlooking the importance of numerical semantics in solving MWPs. To address this issue, this study proposes an MWP solving model with enhanced numerical representation, built on the generic "encoder-decoder" architecture. The model uses Graph Convolutional Neural Networks (GCNN) to explicitly model the semantic relationships among numerical values and between numerical values and the context text, and it introduces auxiliary learning tasks that guide the model to capture task-related numerical semantics. This significantly enhances the numerical modeling capability of the encoder. Experiments on the commonly used MWP datasets Math23K and MAWPS show that the proposed model takes numerical semantics into account and outperforms mainstream classical models in solving large-scale Chinese math word problem sets.

  • WANG Lijuan, LI Xueyan, YIN Ming, HAO Zhifeng, CAI Ruichu, CHEN Wei, LIU Rui
    Computer Engineering. 2026, 52(5): 103-116. https://doi.org/10.19678/j.issn.1000-3428.0070309

    Multi-view clustering mines consistency information across different views to improve clustering performance. Most existing multi-view clustering algorithms address single-task multi-view clustering and ignore the similarity among related tasks, which results in poor performance on multiple tasks. Multi-task clustering can effectively handle the correlation between multiple tasks, and the most common clustering problem in practice is multi-task multi-view data clustering. To better explore the correlation between related tasks and obtain more effective consistency information from the multi-view data of each task, this paper proposes a multi-task multi-view clustering algorithm based on consensus graph learning. The algorithm establishes a view-specific shared feature library that stores and transfers the latent information shared by all tasks and all views, that is, the feature-embedding information shared by each task in the common view. When handling a new task, each view of the new task simultaneously optimizes its similarity graph structure and the corresponding sample embeddings to obtain more accurate sample-embedding representations. Meanwhile, collaborative clustering is introduced to achieve knowledge transfer between the shared feature libraries and the sample embeddings of the new task. This approach uses the diversity information of the feature embeddings to promote consistent expression across the views of the new task, while updating the shared feature library with the sample information of the new task. After the optimal sample-embedding representation is obtained, all views are fused into a consensus graph for the new task. Subsequently, an alternating direction strategy is adopted to optimize the model, and rank constraints on the Laplacian matrix of the consensus graph are introduced to obtain the clustering results directly.
    The results of experiments show that, compared with six existing advanced algorithms, the proposed algorithm exhibits higher clustering performance and efficiency on five multi-task multi-view datasets.

  • ZHAO Shuxu, ZHOU Hongze, WANG Xiaolong
    Computer Engineering. 2026, 52(5): 117-128. https://doi.org/10.19678/j.issn.1000-3428.0070326

    When resources are limited, edge servers often need to form alliances to execute tasks collaboratively. Because server resource utilization changes dynamically during task execution, completing tasks as quickly as possible while reducing the cost of restructuring alliances is a major challenge. To address these issues, a coalition structure optimization strategy based on the Double Deep Q-Network (DDQN) is proposed. First, with the objectives of maximizing task completion efficiency and minimizing alliance-building costs, the problem is modeled as a Cost Introduced Markov Decision Process (CT-MDP) by defining the state space, action space, and reward function. Second, to counter the overestimation of Q-values in the high-dimensional state space of the CT-MDP, a lightweight optimal alliance structure search algorithm based on DDQN is proposed, in which two independent Q-networks are used to reduce the forward cumulative error during the update process. To satisfy the strict resource constraints of edge devices during training, the activation function is optimized to reduce the storage requirements of the training model. Finally, the proposed algorithm is compared with Q-learning, DQN, Dueling DQN, and other algorithms in simulation experiments. The results show that the proposed method converges well and stably, and that it reduces alliance construction costs and resource utilization by 20.36% and 12.12%, respectively, demonstrating its effectiveness.
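
    The abstract does not give the update rule, so the following is only a sketch under the assumption that DDQN here follows the standard Double DQN target: the online network selects the next action and the second (target) network evaluates it. The function names and the toy Q-values are hypothetical.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: select the action with the online net, evaluate it with the target net."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Plain DQN target: the same network both selects and evaluates (max over its own values)."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


# Toy Q-values for three candidate actions in the next state (assumed numbers).
next_q_online = [0.2, 0.9, 0.5]
next_q_target = [0.4, 0.3, 1.5]
double_target = double_dqn_target(1.0, next_q_online, next_q_target, gamma=0.5)
vanilla_target = dqn_target(1.0, next_q_target, gamma=0.5)
```

    In this toy case the plain target bootstraps from the largest target-network value (1.5), whereas the double target bootstraps from the value of the action the online network prefers (0.3). Decoupling selection from evaluation in this way is what limits Q-value overestimation.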

  • LIU Haijun, FU Xiaodong
    Computer Engineering. 2026, 52(5): 129-138. https://doi.org/10.19678/j.issn.1000-3428.0070288

    Real-world data often follow a long-tail distribution. Federated learning methods that assume a balanced global data distribution struggle to classify tail-class data within long-tail data accurately. To mitigate the impact of long-tail data, researchers typically focus on retraining a balanced classifier for the global model. However, this approach does not consider the feature extractor of the balanced model or how the model's feature extractor can be enabled to learn high-quality image features, leading to the poor performance of the global model. To enable the model to learn high-quality image features without bias during the feature learning stage, this study proposes a federated learning method combining rotational self-supervision and Contrastive Language-Image Pre-training (CLIP) guidance. This method uses rotational self-supervision to guide the training of local client models, thereby reducing the impact of long-tail data on the client models and enabling the model to learn high-quality image features. Simultaneously, CLIP is used to guide both the normal training of the model and the training on rotated images, transferring rich knowledge from CLIP to the client model and further enhancing the performance of the feature extractor. In experiments on the CIFAR-10 and CIFAR-100 datasets under different long-tail distributions, the proposed approach improves the global model's classification accuracy by 2.35 to 4.72 percentage points compared with other federated learning methods.

  • WANG Shuo, LI Ke, LI Zelin
    Computer Engineering. 2026, 52(5): 139-149. https://doi.org/10.19678/j.issn.1000-3428.0070058

    Entity Alignment (EA) is a key step in the fusion of Knowledge Graphs (KGs). Existing EA methods only consider EA between two KGs, whereas many scenarios require EA across multiple KGs. Existing methods transform the EA task over multiple KGs into several pairwise EA tasks, ignoring the inherent connections and constraints among equivalent entities across all KGs. To address this problem, based on an analysis of existing EA optimization methods, Entity Alignment Optimization for Multiple knowledge Graphs Fusion (MGEAO) is proposed by leveraging the transitivity constraints of equivalent entities across multiple KGs. Combined with existing pairwise EA methods, it forms a general framework for EA optimization across multiple KGs. First, the pre-alignment matrix for each KG pair is computed on the basis of entity embeddings. The matrix is then corrected to obtain the final alignment through multi-KG alignment optimization, which integrates Bidirectional Normalization (BN), Deferred Acceptance Algorithm (DAA), Relation-Entity Aware adjustment (REA), and Transitivity Constrained Optimization (TCO). Experiments on the DBP15K, FB15K, and YAGO15K datasets indicate that performance improves significantly over that of baseline EA models: Hits@1 and Hits@10 improve by up to 18.8 and 18.05 percentage points, respectively.

  • LI Mingming, PAN Zihao
    Computer Engineering. 2026, 52(5): 150-159. https://doi.org/10.19678/j.issn.1000-3428.0070256

    Traditional path-planning algorithms for mobile robots usually require a map to plan a path effectively. By contrast, path planning based on Deep Reinforcement Learning (DRL) does not require a map for navigation and has therefore received considerable attention. However, traditional DRL path-planning algorithms often face challenges such as low sample utilization, slow training speed, and insufficient generalization ability. To solve these issues, the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm is improved to enhance its path-planning performance for mobile robots. First, to overcome the limited ability of TD3 to explore continuous spaces, the exploration strategy is improved by using temporally correlated pink noise. Second, the n-step method is combined with the Loss-Adjusted Approximate Actor Prioritized (LA3P) experience replay method. The n-step method captures the long-term reward signal more accurately by expanding the immediate reward in the experience replay buffer to the n-step cumulative discounted reward, whereas the LA3P method improves the sample utilization and performance of the algorithm by using the n-step experience efficiently. Finally, three different environments are built in Gazebo, and the improved algorithm is compared with various algorithms in them. The experimental results show that the improved algorithm performs better in terms of training time, average success rate, and average distance, which demonstrates its effectiveness.
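
    The n-step cumulative discounted reward mentioned above has a standard form: G = r_0 + γ r_1 + ... + γ^(n-1) r_(n-1) + γ^n V(s_n). The sketch below is a generic illustration of that formula, not the authors' implementation. The function name and the `bootstrap_value` argument, which stands in for the critic's estimate at step n, are assumptions.

```python
def n_step_return(rewards, gamma, bootstrap_value=0.0):
    """Discounted sum of n rewards plus gamma**n times a bootstrap value estimate."""
    g = bootstrap_value
    # Fold from the last reward backward so each step applies one discount factor.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Three rewards of 1.0, gamma = 0.5, and an assumed critic estimate of 4.0 at step 3:
# 1 + 0.5 + 0.25 + 0.125 * 4.0 = 2.25
g3 = n_step_return([1.0, 1.0, 1.0], gamma=0.5, bootstrap_value=4.0)
```

    Storing `g3` in place of the single immediate reward lets each replayed transition carry information about several future steps.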

  • WU Yongqing, JIANG Zhengyu
    Computer Engineering. 2026, 52(5): 160-171. https://doi.org/10.19678/j.issn.1000-3428.0070319

    This paper introduces a traffic flow prediction model based on a Decoupled Dynamic Spatio-Temporal Convolutional Recurrent Network (DDSTCRN) to better capture the complex spatio-temporal correlations that existing traffic flow prediction models explore insufficiently. First, the data-decoupling module uses gating and residual decomposition mechanisms to separate the two hidden time-series signals in the traffic data: diffusion and independent signals. Second, separate models are applied to these two signals to improve prediction accuracy. Local diffusion convolution captures the diffusion process between traffic data points, whereas dynamic recurrent graph convolution captures the global spatio-temporal correlations in the independent signals, addressing the accuracy issues of single-model approaches. Third, a dynamic graph constructor, using a prior-free dynamic graph construction method, captures the dynamically changing spatial dependencies in traffic data. Finally, an external component module predicts the impact of factors such as weather conditions on traffic data, thereby enhancing the robustness of the model. Experiments on five public traffic flow datasets (METR-LA, PEMS-BAY, PEMS04, PEMS08, and NE-BJ) show that, across different prediction lengths, the proposed model reduces the Mean Absolute Error (MAE) by 1.2% to 4.6% compared with the best-performing baseline, D2STGNN, and by 3.7% to 10.5% compared with the second-best baseline, DGCRN. The proposed model also exhibits lower prediction errors than the other representative models. The experimental results suggest that the model effectively captures complex spatio-temporal correlations in traffic data and delivers superior performance in traffic flow prediction tasks.

  • WANG Shuyun, MA Tengfei, XIA Jie, YANG Zhiyong
    Computer Engineering. 2026, 52(5): 172-183. https://doi.org/10.19678/j.issn.1000-3428.0070265

    Primary and secondary school students face numerous distractions such as games and short videos after class. They have limited self-control and tend to become distracted during autonomous learning. Constant supervision by parents is also difficult, which leads to poor learning outcomes. A highly reliable and low-intrusion learning state monitoring system is required to enhance students' learning efficiency and alleviate parents' anxiety. Among existing learning state monitoring methods, those based on computer vision and wearable devices have drawbacks such as relying on equipment and environment, affecting user comfort, and infringing personal privacy. To address these issues, this paper identifies micromotions under different learning states as the recognition target, categorizing learning behaviors into four states: gaming, reading, writing, and resting. The paper proposes a noncontact learning state monitoring method called Wi-LSM, based on Wi-Fi Channel State Information (CSI). The proposed method involves the following steps: first, the Wi-Fi network card collects raw CSI data, which are then preprocessed using phase calibration and linear interpolation algorithms to eliminate original phase shifts and fill missing data packets; second, the time-frequency domain information of filtered and denoised amplitudes is extracted and combined with phase differences to form recognition features; finally, perceptual features are input into a multilayer convolutional neural network model, BN_SE_CNN, to classify different learning states. Experimental results show that the method achieves an optimal recognition accuracy of 96% in different indoor environments, verifying the effectiveness of the system for learning state monitoring.
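
    The preprocessing step that fills missing data packets by linear interpolation can be pictured with the sketch below. It is a generic illustration, not the Wi-LSM code. It assumes missing samples are marked with `None` and that gaps at either end are filled with the nearest valid value, and the function name is hypothetical.

```python
def fill_missing_linear(samples):
    """Fill None entries by linear interpolation between the nearest valid neighbors."""
    known = [i for i, v in enumerate(samples) if v is not None]
    if not known:
        raise ValueError("need at least one valid sample")
    filled = list(samples)
    for i, v in enumerate(samples):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            # Leading gap: hold the first valid value (assumed edge handling).
            filled[i] = samples[right]
        elif right is None:
            # Trailing gap: hold the last valid value (assumed edge handling).
            filled[i] = samples[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = samples[left] + w * (samples[right] - samples[left])
    return filled


# One subcarrier's amplitude series with two lost packets in the middle and one at the end.
amplitudes = fill_missing_linear([1.0, None, None, 4.0, None])
```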

  • Computer Vision and Image Processing
  • YANG Jiahao, WANG Lei
    Computer Engineering. 2026, 52(5): 184-191. https://doi.org/10.19678/j.issn.1000-3428.0070179

    Existing individual attention target detection methods mainly rely on facial information. They face challenges in scenarios where fine facial information is missing because of partial occlusion, facial blurring, or privacy protection. Ignoring time information can also affect the effectiveness of these methods in video tasks. This paper proposes a spatiotemporal inference network based on multi-feature fusion for detecting individual attention targets. Convolutional neural networks are utilized to extract the key features from an individual's head appearance and facial information, individual posture information, and related scene information. By leveraging the attention mechanism of the spatial reasoning encoder and through custom model training strategies, the significance of different features is learned, reducing the overreliance on any single feature and achieving weighted integration of spatial features. Convolutional Long Short-Term Memory (Conv-LSTM) networks are employed to integrate spatiotemporal information across video frames, effectively detecting individual attention targets. In experiments on the GazeFollow and VideoAttentionTarget datasets, the proposed method achieves AUC values of 0.936 and 0.902, respectively. Compared with state-of-the-art methods, the overall performance of the proposed method improves by 1.7 and 3.2 percentage points on these two datasets. It has better accuracy and robustness in individual attention target detection tasks, making it suitable for complex real-world scenarios.

  • CAI Jianghe, CHEN Fei, JIANG Fan, CHEN Hang, WANG Meiqing
    Computer Engineering. 2026, 52(5): 192-202. https://doi.org/10.19678/j.issn.1000-3428.0070478

    Color image Guided Depth Super-Resolution (GDSR) aims to reconstruct a High-Resolution (HR) depth map from its Low-Resolution (LR) version with guidance from HR color images of the same scene. Although learning-based methods used in the spatial domain can effectively enhance the overall reconstruction quality of depth maps, they exhibit edge structure blurring during reconstruction from LR depth maps. To address this issue, this paper proposes a Gradient-Frequency guided multi-stage integration Network (GFNet). This network utilizes gradient priors and frequency information from color images to enhance the reconstruction of edge structure details in depth maps. First, a gradient feature extraction module is designed to incorporate the gradient prior knowledge of RGB images and thus optimize the gradient structures of LR depth maps. Subsequently, a spatial-frequency dual-path guidance module is developed to map precise high-frequency components from RGB images onto LR depth maps, thereby guiding the reconstruction of high-frequency information lost in the depth maps. Finally, a novel implicit neural function is employed to improve the resolution of depth maps. Experimental results show that, under an 8× scaling factor, GFNet achieves RMSE values of 2.48, 1.62, and 2.57 cm on the NYUv2, Middlebury, and real-world RGB-D-D datasets, respectively, outperforming the more complex GeoDSR model by 0.14, 0.06, and 0.12 cm, respectively. Additionally, it surpasses comparison methods in terms of edge structure detail reconstruction, demonstrating its effectiveness.

  • TIAN Hui, DUAN Xinlong, HAO Qiya, SUI Wenhao, MA Yuying, YU Zuhua, XU Yang, CAO Yangjie
    Computer Engineering. 2026, 52(5): 203-215. https://doi.org/10.19678/j.issn.1000-3428.0070281

    Cell counting is common in clinical medical research and plays a crucial role in biology and clinical medicine. In situations where cells overlap, multiple cells may be counted as a single one, causing counting accuracy to decrease. To address this issue, this paper introduces an improved U-Net medical image segmentation model. The paper proposes a cell counting method combining an improved Vision Transformer (ViT) module and multi-scale feature fusion. This counting method comprises four parts: an encoder for extracting deep features, a multi-scale feature fusion module for concatenating encoder and decoder features, an improved ViT module for capturing global context information, and a decoder for restoring feature dimensions and outputting the segmentation results. The improved ViT module utilizes novel spatial and channel attention modules to address the insufficiency of traditional ViT in extracting specific spatial and channel dimensional information. The multi-scale feature fusion module integrates feature maps of different scales, enhancing the ability of the model to segment the boundaries of cells of different sizes and reducing the impact of cell overlap on counting accuracy. To further improve the ability of the model to segment overlapping cells, the paper proposes a data augmentation strategy. By converting the original cell annotations into circular annotations with a specific radius and adjusting the distance between the cell annotations, this strategy guides the model to separate overlapping cells more effectively. Experiments on the LiveCell, MBM cells, and DCC datasets demonstrate that the proposed counting method achieves good results, effectively addressing the issue of decreased counting accuracy caused by cell overlap.

  • ZHANG Xiang, PENG Li
    Computer Engineering. 2026, 52(5): 216-225. https://doi.org/10.19678/j.issn.1000-3428.0070312

    Autonomous driving scene understanding is one of the critical components of autonomous driving technology. Map perception and topological relationship inference are essential parts of scene understanding. Map perception tasks mainly include road element perception and traffic element perception. Topological relationship reasoning builds topological relationships between perception results based on map perception. However, traditional methods face challenges in map perception performance when sensors are occluded, or the perception range exceeds the sensor's limit. Additionally, since topological relationship inference in driving scenes relies on map perception results, errors in map perception can further impact the accuracy of topological relationship inference. To address this, a map perception uncertainty modeling method incorporating standard map priors is proposed, along with robust topological relationship inference based on map perception uncertainty. First, by introducing high-precision map prior information, the method effectively enhances map perception performance in occluded scenarios. Next, the map perception results are modeled using the Laplace distribution to achieve uncertainty modeling of map perception. Finally, a probabilistic topological relationship inference method is proposed based on the map perception results and their uncertainties, which effectively improves the accuracy of topological relationship construction. Extensive experiments conducted on public datasets demonstrate that the proposed method outperforms comparative methods in both map perception and topological relationship inference tasks.
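
    Modeling a perception output with a Laplace distribution is commonly trained by minimizing the negative log-likelihood, -log p(x | μ, b) = log(2b) + |x - μ| / b. The sketch below shows that loss for a single coordinate. How the paper's network produces the location μ and scale b is not stated in the abstract, so the function name and the per-coordinate parameterization are assumptions.

```python
import math


def laplace_nll(x, mu, b):
    """Negative log-likelihood of observation x under Laplace(location=mu, scale=b)."""
    if b <= 0:
        raise ValueError("scale b must be positive")
    return math.log(2.0 * b) + abs(x - mu) / b


# Same 1.0 m error on a lane-point coordinate, scored under two predicted scales.
confident = laplace_nll(x=1.0, mu=0.0, b=0.1)   # small predicted uncertainty
uncertain = laplace_nll(x=1.0, mu=0.0, b=1.0)   # large predicted uncertainty
```

    A large error paired with a small predicted scale is penalized heavily. This pushes the predicted scale to track the actual error, so it can serve as an uncertainty estimate for downstream topological inference.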

  • WEI Wenquan, MO Hongwei
    Computer Engineering. 2026, 52(5): 226-238. https://doi.org/10.19678/j.issn.1000-3428.0070376

    The quality of Printed Circuit Boards (PCBs) plays a decisive role in the performance of industrial electronic products. Therefore, strict quality control of PCBs leaving the factory is of considerable importance. The six types of defects to which PCBs are prone are the primary basis for evaluating PCB quality. To address the low accuracy and large model size of existing object detection models when applied to PCB defects, an improved YOLOv5s-CMS model is proposed. According to the characteristics of PCB defects, the YOLOv5s-CMS model adopts a feature extraction network, C2f-C3-Ghost (CCG), that focuses on small-target information and replaces the original feature extraction network, such that the model pays more attention to the spatial and gradient flow information of small targets in the feature extraction stage. In the feature fusion phase, a Multi-scale Cross-layer Small Target Feature Fusion Network (MCSTF-Net) is proposed to replace the Path Aggregation Network (PANet), which improves the accuracy of PCB defect detection while considerably reducing the number of parameters in the model. To further improve the model's understanding of small-target characteristics, the CCG network and MCSTF-Net are combined with the Squeeze-and-Excitation (SE) attention mechanism to highlight the channels rich in target characteristics while suppressing irrelevant channels. The ablation experiment results show that the accuracy, recall, mAP@0.5, and mAP@0.5:0.95 of PCB defect detection using the YOLOv5s-CMS model reach 98.1%, 97.8%, 98.4%, and 61.2%, respectively. Compared with the original YOLOv5s model, these four metrics increase by 2.2, 1.3, 0.8, and 5.0 percentage points, respectively, and the number of model parameters decreases by approximately 46.1%.

  • GAO Yufei, JIA Xin, HUANG Zhangchi, XU Zhinan, HUO Pengfei, LU Zhiyin
    Computer Engineering. 2026, 52(5): 239-249. https://doi.org/10.19678/j.issn.1000-3428.0070753
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    Multi-view 3D reconstruction aims to recover the 3D shape of a given object from multiple 2D images. However, existing methods neglect to learn the rotational invariance and regional consistency of objects, making it difficult to accurately aggregate multi-view features, resulting in the loss of reconstruction details. To address this issue, this study first proposes Dual-view Point cloud reconstruction based on Rotation-invariant Regional consistency (DPR2). DPR2 takes two RGB images as input, explores the rotational invariance of object regions, learns the regional consistency of objects across views, promotes the aggregation of multi-view features, and reconstructs the fine point cloud of the given object. In the encoding stage, a point-cloud initialization network is first introduced to initialize a coarse point cloud for each view. The study also proposes a region-level rotational invariant feature extraction network that captures the rotational invariant features of different regions of the coarse point cloud by calculating the Euclidean distance between two points. In the decoding stage, a two-stage cross-attention mechanism is designed to construct high-quality regional consistency of cross-view point clouds, thereby accurately achieving multi-view feature aggregation. Additionally, a point-cloud refinement network is designed that utilizes aggregated features to refine the coarse point cloud into one with fine-grained details and smooth surfaces. Extensive experimental results on the ShapeNet and Pix3D datasets show that DPR2 outperforms existing state-of-the-art methods. Compared with the latest methods, P2M++ and MVP2M++, DPR2 improves the Chamfer Distance (CD) by 23.62% and 9.06%, respectively.
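
    For readers unfamiliar with the Chamfer Distance (CD) metric, here is a minimal pure-Python sketch. Definitions of CD vary between papers (squared or unsquared distances, sums or means), and the abstract does not say which variant is used; this version assumes the mean of squared nearest-neighbor distances in both directions.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two point sets.

    Each set is a list of (x, y, z) tuples. Assumed variant: mean squared
    nearest-neighbor distance, summed over both directions.
    """
    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    def one_way(src, dst):
        # For every source point, find its closest point in the other set.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)
```

    This brute-force version is quadratic in the number of points, which is fine for a demonstration but would normally be replaced by a spatial index or a batched GPU implementation.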

  • SONG Tianze, CAO Congjun, HE Jiaqi, WANG Xusheng, LIU Chenyu
    Computer Engineering. 2026, 52(5): 250-258. https://doi.org/10.19678/j.issn.1000-3428.0070106
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    Dense pedestrian detection is a research hotspot in the field of pedestrian detection. This study proposes an improved DETR target detection algorithm, Pe-DETR, to address the problem that occluded and small pedestrians are prone to missed detection in dense pedestrian scenes. The algorithm uses Dino-DETR, which is based on the multi-head self-attention mechanism, as the baseline model. However, the self-attention mechanism lacks the ability to capture local features, resulting in poor detection of dense pedestrians. To address this issue, this study enhances the Feedforward Neural Network (FNN) and proposes a channel attention convolutional feedforward neural network, DWSEFNN, to extract more local detailed features. In response to the low efficiency of the ResNet50 backbone network in extracting important features, Swin Transformer-L is adopted as the feature extraction network. Simultaneously, Pe-DETR is completely built based on the attention mechanism, and the architecture does not contain a deep convolution structure. To handle the contradiction between the large number of targets in dense pedestrian scenes and sparse matching in the DETR detector, densely different queries are applied to handle pedestrian-dense scenes without introducing invalid similar queries. Experimental results on the CrowdHuman dense pedestrian detection dataset show that, compared with the Dino-DETR algorithm, the proposed pedestrian detection algorithm Pe-DETR achieves an improvement of 3.7 percentage points in Average Precision (AP)@0.5 and an increase of 4.5 percentage points in AP. In dense pedestrian detection tasks, the improved Pe-DETR algorithm demonstrates significantly higher accuracy than other end-to-end models.

  • ZHAO Ang, XIANG Jie, NIU Yan, WU Xubin, SONG Zize, WEN Xin
    Computer Engineering. 2026, 52(5): 259-269. https://doi.org/10.19678/j.issn.1000-3428.0070168
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    Magnetic Resonance Imaging (MRI) is one of the most widely used methods of clinical diagnosis. However, obtaining high-resolution MRI images is challenging because of the high cost of the scanning equipment and time limitations. In recent years, Diffusion Models (DMs) have been applied to super-resolution techniques to improve image quality. Nevertheless, existing methods suffer from inefficient model inference and insufficient extraction of high-frequency features, resulting in suboptimal reconstruction outcomes. To address these issues, an efficient MRI single-image super-resolution diffusion model called Residual Diffusion Model (ResDM) is developed. This model leverages a pretrained super-resolution model to provide a conditional image for a given low-resolution input. Noise is then guided to the residual space between the high-resolution and conditional images. To accelerate the model inference, a Denoising Diffusion Implicit Model (DDIM) is employed in conjunction with a U-Net structure to achieve rapid generation and high-quality results. Furthermore, a loss function and attention mechanism based on the frequency domain are introduced to enhance the recovery of high-frequency detailed information. Experiments are conducted on three public datasets: HCP, BraTS2019, and FastMRI. The results are evaluated using two objective image evaluation metrics: the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM). The findings indicate that, compared to seven existing image super-resolution reconstruction methods, the proposed method achieves an average increase of 2.24 dB in PSNR and 0.06 in SSIM across the three datasets with an upsampling factor of 4. This approach yields MRI images with higher resolution and richer detailed information, as demonstrated by the corresponding visualization results.
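
    To make the reported PSNR gain concrete, the following sketch computes PSNR from its textbook definition for two flattened images. It assumes intensities normalized to [0, 1] (`max_val=1.0`); the normalization used in the paper is not stated in the abstract.

```python
import math


def psnr(reference, test, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

    Because PSNR is logarithmic in the mean squared error, a gain of about 2.24 dB corresponds to a reduction of the error by a factor of roughly 10^0.224, or about 1.7.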

  • Cyberspace Security
  • WU Peiying, LI Xiaohui, WANG Junfeng
    Computer Engineering. 2026, 52(5): 270-280. https://doi.org/10.19678/j.issn.1000-3428.0070472
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    Command and Control (C2) communication plays an essential role in modern Advanced Persistent Threats (APTs) and is the key communication link for achieving long-term lurking and continuous control. C2 traffic detection is crucial for defending against APT attacks and protecting network security. However, existing C2 traffic detection methods are mainly based on conventional machine learning and deep learning. In these methods, feature engineering relies on expert experience, is highly subjective and prone to omissions, and has poor adaptability to rapidly evolving attack forms and traffic patterns. Conversely, traditional deep learning models show poor performance in capturing deep and complex features and show a strong dependence on labeled data and training resources. To address these issues, this paper proposes a C2 traffic detection method (C2BT) based on Transformer bidirectional encoding representation. Unlike conventional detection methods based on feature engineering, this method uses the Bidirectional Encoder Representations from Transformers (BERT) large model to automatically learn and capture the depth features of the remote control traffic context. It further introduces a separately trained Transformer decoder for reconstruction and error calculation to evaluate the quality of the encoder and incorporates the reconstruction error into the subsequent optimization training process of the encoder to further improve the detection effect and robustness of the model. Extensive experiments are conducted on multiple different C2 traffic datasets. The proposed method demonstrates excellent performance and strong generalization capabilities, with its accuracy, precision, and F1 score reaching 98.47%, 95.82%, and 95.91%, respectively. It maintains stable results on new datasets, demonstrating its effectiveness in C2 traffic detection. The introduction of a decoder reconstruction error evaluation mechanism to verify the robustness of the encoder improves detection efficiency. The proposed method provides a new technical pathway for building a more efficient network security detection and defense system.

  • LIU Minghui, ZHANG En, WANG Mengtao, HUANG Yuchen
    Computer Engineering. 2026, 52(5): 281-292. https://doi.org/10.19678/j.issn.1000-3428.0070356
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    In the contact tracing application scenario, the historical trajectories of the querier and the "sensitive" population are not completely identical, so the Private Set Intersection CArdinality (PSI-CA) protocol cannot be applied directly. To address this issue, an unbalanced fuzzy PSI-CA (uFPSI-CA) protocol is proposed. The uFPSI-CA protocol is a variant of the PSI-CA protocol that allows the sender and receiver to jointly calculate the intersection size of their private sets via interactions, without revealing any other information. The protocol assigns much of the computation to the sender, which behaves as a server, while the receiver performs only simple element encryption and decryption. Additionally, the overall communication overhead of the protocol is positively correlated with the size of the receiver dataset. To extend PSI-CA to uFPSI-CA, a Classified Shuffle Diffie-Hellman Oblivious Pseudo-Random Function (CS-DH-OPRF) algorithm is proposed. When computing the OPRF value of the receiver, the algorithm encrypts the same value for the same group of data as a label, and the receiver classifies the data according to the label in the subsequent calculation. Finally, the uFPSI-CA protocol is implemented. When the sender set size is 2^18 and the receiver set size is 2^6, only 4 s of online time and 15 MB of communication overhead are required, which shows that the protocol is efficient.
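
    The sketch below shows the classic Diffie-Hellman style PSI-CA idea that protocols of this family build on; it is not the uFPSI-CA or CS-DH-OPRF construction from the paper, whose details the abstract does not give. Both parties are simulated in one function, the modulus `P` is a toy choice, and all names are hypothetical. It must not be used as a secure implementation.

```python
import hashlib
import random

P = 2**127 - 1  # Mersenne prime used as a toy modulus; NOT a secure group choice


def hash_to_group(item: str) -> int:
    """Map an item to a nonzero residue modulo P (illustrative hashing only)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P or 1


def blind(values, key):
    """Raise every value to a secret exponent modulo P."""
    return [pow(v, key, P) for v in values]


def psi_cardinality(receiver_set, sender_set, recv_key, send_key, rng=None):
    """Return |receiver_set ∩ sender_set| using commutative blinding."""
    rng = rng or random.Random()
    # Receiver -> sender: H(x)^a for each receiver item x.
    recv_blinded = blind([hash_to_group(x) for x in receiver_set], recv_key)
    # Sender -> receiver: H(x)^(ab), shuffled so matches cannot be linked to items.
    double_blinded = blind(recv_blinded, send_key)
    rng.shuffle(double_blinded)
    # Sender -> receiver: H(y)^b for each sender item y.
    send_blinded = blind([hash_to_group(y) for y in sender_set], send_key)
    # Receiver raises the sender's values to a and counts the matches.
    sender_double = set(blind(send_blinded, recv_key))
    return sum(1 for v in double_blinded if v in sender_double)
```

    Because exponentiation commutes, H(x)^(ab) equals H(y)^(ba) exactly when x and y are the same item, and the shuffle is what limits the receiver to learning only the count.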

  • LIU Chenxu, CAO Suzhen, LIU Jingjie, PANG Xinjie, FENG Zhen
    Computer Engineering. 2026, 52(5): 293-302. https://doi.org/10.19678/j.issn.1000-3428.0069806
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    The importance of data privacy protection and ciphertext searchability in cloud computing environments is increasing. Plaintext access policies in traditional CP-ABE schemes may leak sensitive information, and revoking malicious users is cumbersome. To address these issues, this study proposes an attribute-based searchable encryption scheme with forward and backward security, revocability, and partial policy hiding. This scheme achieves partial policy hiding by exposing user attribute names and hiding user attribute values to avoid sensitive information leakage. A user's identity information is associated with the leaves of a binary tree, and the user revocation list is bound to the ciphertext. Thus, malicious users cannot access the ciphertext before and after revocation once they are added to the revocation list by the trusted center, thereby achieving direct user revocation while meeting forward and backward security. After a malicious user is revoked, the cloud service provider only needs to update the ciphertext related to the revocation list, and no additional key update operation is required, which improves the computational efficiency of the ciphertext update. The binary tree nodes occupied by the revoked user are reused by updating the random value of the binary tree node, which increases the number of users in the system. Based on the q-Bilinear Diffie-Hellman Exponent (q-BDHE) assumption, the proposed scheme is proven to be secure in the sense of Indistinguishability under Chosen Plaintext Attack (IND-CPA) in the random oracle model. In performance analyses, the computational burden is reduced by at least 15.3% during the scheme's encryption stage, and the computational overhead is low in the search verification and ciphertext update phases.

  • LI Liang, XIAO Mingzhi, CHEN Xi
    Computer Engineering. 2026, 52(5): 303-325. https://doi.org/10.19678/j.issn.1000-3428.0253167
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    To address issues such as a single point of failure, risk of tampering, opaque verification, and the proliferation of false information in centralized news architectures, a blockchain-based decentralized news-retrieval and aggregation architecture is proposed for trusted storage, verifiable retrieval, and transparent governance of multi-source news data. This architecture integrates blockchain, smart contracts, and distributed storage to form an integrated trust system of "chain-contract-storage." The consensus mechanism ensures the credibility of the data sources, whereas smart contracts automatically enforce governance rules for traceability and credibility. A Multi-Valued modulo Function (MVF) robust task allocation algorithm and a Key/Transaction Merkle-Patricia Tree (KMPT/TMPT) dual-layer verifiable indexing mechanism are proposed to optimize task scheduling and index verification, thereby enhancing retrieval and verification efficiency. The system incorporates Merkle Hash Tree (MHT) integrity verification and a multi-source reputation weighting mechanism to achieve adaptive adjustment of the source reputation, thereby enhancing retrieval accuracy and system robustness. The system is deployed in a private cloud environment on OpenStack and experimentally validated using 106 532 news items collected in 2024. Experimental results show that compared to traditional solutions, the credibility verification accuracy of this architecture is improved by 15.6% (P < 0.01), the tamper resistance detection success rate reaches 99.6%, and the false news suppression rate is 92.4%. Deep integration of the retrieval and verification processes enhances the comprehensive effectiveness of trusted retrieval by 22.3% (P < 0.05). This study verifies the feasibility and engineering effectiveness of blockchain in trusted retrieval and data governance, providing theoretical support and an engineering reference for building a highly transparent and traceable news ecosystem.
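
    The Merkle Hash Tree integrity check mentioned above can be illustrated with a generic sketch: build a root over news records, produce an inclusion proof for one record, and verify it. This is a plain binary MHT using SHA-256 with last-node duplication on odd levels, which is an assumption; it is not the KMPT/TMPT index described in the abstract.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level):
    # Pair up nodes; an odd node is paired with itself (sketch convention).
    if len(level) % 2:
        level = level + [level[-1]]
    return [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Root hash over a non-empty list of byte strings."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Sibling hashes from leaf to root as (hash, sibling_is_left) pairs."""
    level = [_h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from a leaf and its proof."""
    node = _h(leaf)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root
```

    A verifier needs only the root (for example, one stored on chain) and a proof whose length is logarithmic in the number of records, which is what makes tamper detection cheap at retrieval time.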

  • Large Language Models and Generative Artificial Intelligence
  • QU Jinghong, WANG Zhongqing, ZHOU Guodong
    Computer Engineering. 2026, 52(5): 326-335. https://doi.org/10.19678/j.issn.1000-3428.0070161
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    Generative models show satisfactory performance in many question and answer reasoning tasks. However, significant manual effort is required for matching each data point with the corresponding relevant knowledge text to ensure the reliability of the model's output. If a language model can be sufficiently trained to internalize a knowledge base and reliably output question and answer knowledge, it can eliminate the cost of providing relevant knowledge explicitly in question and answer reasoning tasks. In addition, generating knowledge texts related to the answers can help explore which knowledge the model relies on for reasoning, which is crucial for investigating the interpretability of the model. For this purpose, this paper proposes a new natural language generation task. This task takes a question-answer pair as the input and requires the model to directly generate the relevant knowledge text. The generated text should support the reasoning behind the given answer, thereby helping the model consolidate its internal knowledge base during the training process. Benchmark models have been established for this new task. The results demonstrate the remarkable text generation quality of the model, confirming the feasibility of the task. When the statement forms of the question-answer pairs are also included in the input, the generation effect of the model can be significantly improved. A comparison of three generative models reveals that models with more parameters achieve superior generation performance, likely owing to their more comprehensive internal knowledge bases. Furthermore, experiments are conducted with different input fusion methods while varying the number of knowledge statements generated to identify the optimal task configuration. The results indicate that this task is feasible and valuable for future research.

  • YU Tao, DONG Jun
    Computer Engineering. 2026, 52(5): 336-348. https://doi.org/10.19678/j.issn.1000-3428.0070301
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    In multi-agent game simulations, the performance of Large Language Models (LLMs) has been widely studied; however, their decision-making ability to guide multi-agent cooperation in fuzzy task objectives or uncertain environments is often unreliable. To address this issue, a multi-level collaborative decision-making framework based on distributed Bayesian inference is proposed. This framework integrates three major functional modules: decision making, peer evaluation, and supervision. It utilizes multiple LLMs for collaborative decision making and has been experimentally validated in a spatial prisoner's dilemma game. The experimental results show that the framework effectively overcomes the decision-making bottleneck of LLMs in fuzzy task environments and successfully promotes the emergence of multi-agent cooperative behavior. Additionally, a quantitative evaluation of the model's decision-making ability in different experimental scenarios reveals that the decision error of the model is not linearly related to the model size. Under fuzzy task instructions, the decision error of the LLaMA3 (70×10^9) model is 16.6% higher than that of the LLaMA3 (8×10^9) model and 7.2% higher than that of the LLaMA2 (7×10^9) model. This indicates that in more complex environments, relying solely on the expansion of the model size does not significantly improve the decision-making performance. By contrast, LLM collaborative decision making has shown significant advantages in improving decision consistency and effectiveness. These results reveal the crucial role of multi-model collaboration in complex decision-making environments and provide important references for the future design of intelligent agent systems for uncertain tasks.

  • LI Jiakun, LIU Yanqing, DU Fang, YU Zhenhua, FENG Yu, WANG Hui, HUO Xianhao
    Computer Engineering. 2026, 52(5): 349-359. https://doi.org/10.19678/j.issn.1000-3428.0252472
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    To address the challenges faced by general-purpose medical Large Language Models (LLMs) in the field of brain tumor care, namely the scarcity of domain-specific data, limited clinical adaptability, and insufficient accuracy of generated content, this paper proposes BrainTumorLLM, a specialized LLM tailored for brain tumor diagnosis and treatment. Built upon the Meta-LLaMA-3-8B-Instruct foundation model, BrainTumorLLM is optimized via Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) and trained using a self-constructed, high-quality dataset named BrainTumorQA. This dataset comprises 11 000 question-answer pairs, encompassing both macro-level medical knowledge (symptoms, diagnostic methods, and treatment strategies) and micro-level clinical cases, with privacy safeguarded via anonymization and information constraint strategies. From a technical perspective, Low-Rank Adaptation (LoRA) is employed to enhance the training efficiency. A two-tier prompting framework is designed to guide the model in generating domain-specific responses at both the macro and micro levels. Furthermore, RLHF is integrated using an expert preference-driven optimization mechanism and a Proximal Policy Optimization (PPO) algorithm, reinforcing the clinical consistency of the generated content. The experimental results demonstrate that BrainTumorLLM significantly outperforms both general-purpose and medical-domain models in brain tumor-related question-answering tasks. In automatic evaluations, it achieves BLEU-1 and BLEU-2 scores of 0.338 3 and 0.268 4, respectively, and ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.323 7, 0.146 6, and 0.261 1, respectively. Moreover, the perplexity of the model is substantially reduced from 20.362 (base model) to 7.674, highlighting its domain-specific precision, professional accuracy, and potential for clinical applications. BrainTumorLLM is a robust AI-powered tool that supports brain tumor diagnosis, treatment planning, and medical research.
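
    As background on why LoRA improves training efficiency, the sketch below applies a frozen weight matrix plus a scaled low-rank update to one input vector and counts the trainable parameters. It follows the general LoRA formulation (h = W x + (alpha / r) B A x); the rank, scaling, and target layers used for BrainTumorLLM are not given in the abstract, so every value here is illustrative.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(w, a, b, x, alpha):
    """Compute W x + (alpha / r) * B (A x).

    w: frozen weights, shape d_out x d_in
    a: trainable down-projection, shape r x d_in
    b: trainable up-projection, shape d_out x r
    """
    rank = len(a)
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [h + scale * d for h, d in zip(base, delta)]


def trainable_params(d_out, d_in, rank):
    """Number of parameters in the low-rank pair (A, B)."""
    return rank * (d_in + d_out)
```

    For a hypothetical 4096 x 4096 projection with rank 8, `trainable_params(4096, 4096, 8)` gives 65 536 trainable values against about 16.8 million in the frozen matrix, which is where the efficiency gain comes from.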

  • LI Jiangtao, MA Li, LI Yang
    Computer Engineering. 2026, 52(5): 360-370. https://doi.org/10.19678/j.issn.1000-3428.0070408
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    Privacy protection for medical data is difficult owing to its wide coverage, large quantity, and diverse types. To classify medical data reasonably and apply privacy protection measures that match the classification results, this article proposes a fusion classification method for large and small models based on different levels of medical information sensitivity, achieving classified encryption of medical data. A Large Language Model (LLM) deep neural network combined with Medical Data Classification Standards (MDCS) is used to annotate and output features from the medical dataset. Then, the output features of the LLM are used as inputs for the small-text classification model. The Long Short-Term Memory (LSTM) network of the small-text classification model is used to learn feature representations in the text. Finally, the erroneous prediction results of the small-text classification model are returned to the LLM for reclassification, and the classification results of the large and small models are fused to achieve an accurate classification of medical data according to different levels of sensitivity. The experimental results show that the fusion classification method for large and small models improves model convergence, classification accuracy, and data classification balance compared with other classification models and standards. This verifies that the iterative mechanism of large and small model fusion is highly compatible with the medical data scenario and can significantly improve the classification accuracy, achieve more efficient classification, and ensure the privacy protection of medical data.

  • Next-Generation Networks and Edge Computing
  • WANG Yi, QIN Tuanfa, WEI Rui, HUANG Jinbao
    Computer Engineering. 2026, 52(5): 371-382. https://doi.org/10.19678/j.issn.1000-3428.0070030
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    Remote areas face problems such as insufficient cellular network coverage as well as low energy and computing power of Internet of Things (IoT) devices. Hence, the requirements for delay-sensitive task offloading and computing for a large number of tasks cannot be met. Considering the combination of the Space-Air-Ground Integrated Network (SAGIN) and Mobile Edge Computing (MEC), this paper proposes a strategy for dynamic task offloading and resource allocation for Unmanned Aerial Vehicle (UAV)-assisted IoT devices that support Wireless Power Transmission (WPT) technology, in which UAVs are responsible for collecting compute-intensive tasks generated by IoT devices. According to the current state, these tasks are computed locally or dynamically offloaded to the base station and a Low Earth Orbit (LEO) satellite for further processing using a partial offloading mode. Given the dynamic heterogeneous network environment, as well as the tight coupling between long-term queuing delays and short-term decision-making, this paper proposes a Twin Delayed Deep Deterministic Policy Gradient (TD3PG) algorithm based on Lyapunov optimization under queuing delay constraints. The algorithm coordinates UAVs to learn the optimal offloading strategy and resource allocation by optimizing UAV dynamic association, task allocation, computing resource allocation, and bandwidth allocation. Simulation results show that, compared with other schemes, the proposed dynamic scheme can effectively reduce the energy consumption, network backlog sum, and average queue delay in the UAV network. Under different learning rate combinations, the reward of the TD3PG algorithm increases by 13.6% and 24.0% compared with that of the Deep Deterministic Policy Gradient (DDPG) algorithm, and by 20.4% and 17.9% compared with that of the Double Deep Q-Network (DDQN) algorithm.
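
    The coupling between long-term queuing delay and per-slot decisions is usually handled in Lyapunov optimization through a queue update and a drift-plus-penalty score. The sketch below shows that generic mechanism with a small enumerated action set; the paper instead learns the policy with TD3PG, and the function names, the `(served, energy)` action format, and the weight `v` are assumptions made here.

```python
def update_queue(backlog, arrivals, served):
    """Standard queue dynamics: Q(t+1) = max(Q(t) - b(t), 0) + a(t)."""
    return max(backlog - served, 0.0) + arrivals


def drift_plus_penalty(backlog, arrivals, served, energy, v):
    """Per-slot score: V * penalty + Q(t) * (a(t) - b(t)). Lower is better."""
    return v * energy + backlog * (arrivals - served)


def pick_action(backlog, arrivals, actions, v):
    """Choose the (served, energy) pair with the lowest drift-plus-penalty score."""
    return min(
        actions,
        key=lambda act: drift_plus_penalty(backlog, arrivals, act[0], act[1], v),
    )
```

    With an empty queue the cheapest action wins, while a large backlog makes serving data worth the energy; the weight `v` tunes that trade-off between energy and delay.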

  • SHEN Danyang, MAI Wen
    Computer Engineering. 2026, 52(5): 383-395. https://doi.org/10.19678/j.issn.1000-3428.0069677
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    Automatic Modulation Recognition (AMR) is a crucial component in communication identification, situational awareness, and electronic reconnaissance. Deep neural networks, known for their powerful feature extraction and classification capabilities, offer higher recognition accuracy than traditional methods. However, current neural networks exhibit limitations in effectively extracting temporal information from signals, leading to high complexity and poor recognition accuracy under low Signal-to-Noise Ratio (SNR) conditions. To address these issues, this paper proposes a decision fusion recognition scheme based on a Residual Neural Network (ResNet) and Transformer network (ResNet-Transformer). This scheme aims to handle more complex SNR scenarios and improve the overall recognition accuracy. By leveraging the temporal memory characteristics of ResNet to deeply extract time-domain features from communication signals, and combining the outstanding long-distance dependency extraction capabilities of the Transformer network to enhance noise resistance, the proposed scheme employs a decision fusion strategy to obtain the final decision based on the outputs of each branch. Experimental results on the open dataset RML2018.01A show that the proposed scheme achieves an average recognition accuracy of over 93% for SNRs above 10 dB and maintains a recognition accuracy of 56% even at an SNR of 0 dB. Compared to traditional network models, the proposed scheme achieves higher modulation recognition accuracy and exhibits high noise resistance.

  • YIN Chao, SHI Xuhua
    Computer Engineering. 2026, 52(5): 396-403. https://doi.org/10.19678/j.issn.1000-3428.0069472
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    Workflow is a commonly adopted execution paradigm in cloud computing environments. Reliability is a crucial Quality of Service (QoS) metric in the process of executing cloud workflow tasks. Currently, methods that can simultaneously satisfy the reliability requirements of workflow computation while optimizing both time and cost are scarce. Neural network-based algorithms require substantial time to search for optimized parameter models when handling large-scale workflows, and the decomposition strategies of existing reliability-based algorithms require further improvement. To address these issues, this paper proposes a reliability decomposition-based fault-tolerant scheduling method. This heuristic method consists of the following steps: calculating task-scheduling priorities, determining reliability allocation weights, performing an initial decomposition of the overall reliability requirement, and selecting Virtual Machines (VMs) for task replicas. The core of this method lies in the optimization of two strategies, namely reliability decomposition and VM selection. The reliability decomposition strategy is designed based on the computational size of workflow tasks and their predecessor-successor dependencies, while the VM selection strategy operates based on a weighted function that balances relative task completion time and execution cost. Experiments are conducted using various workflow types, scales, and reliability requirements. The results indicate that the proposed method satisfies the specified reliability requirements. Moreover, it demonstrates superior comprehensive performance in balancing completion time and cost, outperforming three baseline algorithms: QFEC, QEEC+, and C_GM. This paper provides new solutions and insights for research on reliability decomposition and fault-tolerant scheduling in cloud workflow execution.
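
    One common way to decompose an overall reliability requirement across tasks, shown here only as an assumed illustration, is to give task i the target R_req^(w_i / Σw) so that the product over all tasks equals R_req, and then add replicas until each task meets its target. The abstract does not give the paper's formulas; the weights and function names below are hypothetical.

```python
def decompose_reliability(r_req, weights):
    """Split an overall requirement into per-task targets whose product is r_req.

    A larger weight gives a task a larger share of the failure budget,
    i.e. a lower individual reliability target.
    """
    total = sum(weights)
    return [r_req ** (w / total) for w in weights]


def replicas_needed(r_single, r_target):
    """Smallest k such that 1 - (1 - r_single)^k >= r_target."""
    if not (0.0 < r_single <= 1.0) or not (0.0 < r_target < 1.0):
        raise ValueError("need 0 < r_single <= 1 and 0 < r_target < 1")
    k = 1
    while 1.0 - (1.0 - r_single) ** k < r_target:
        k += 1
    return k
```

    In a scheduler, the replica count would then be weighed against completion time and cost when choosing virtual machines, which is the trade-off the abstract's VM selection strategy addresses.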

  • LIN Hai, WANG Heyu, CAO Yue, WANG Liyuan, WANG Shijie
    Computer Engineering. 2026, 52(5): 404-417. https://doi.org/10.19678/j.issn.1000-3428.0070165
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    Edge intelligence faces challenges such as real-time computation, limited resources, and device variations. Typically, models are compressed to create lightweight networks for fast inference in edge environments. However, excessive compression reduces accuracy and does not always shorten the inference time, thereby affecting the performance of edge intelligence. To address these issues, this paper proposes a hardware-aware edge intelligence framework called LuffyNet. The framework uses a lookup table to estimate the inference performance. It applies constraints on the computational latency and device memory to make the model hardware-aware. LuffyNet aims to create high-accuracy networks that fit edge devices while satisfying latency limits. The framework optimizes the model accuracy, inference latency, and network size through gradient descent. To reduce the search time, LuffyNet uses best optimization and worst optimization strategies. This approach reduces unnecessary computation and saves time and resources. Comparison experiments evaluate LuffyNet networks against four advanced models. LuffyNet-A achieves 66.50% Top-1 accuracy with a 1.69 ms delay, approximately five times faster than ResNet50, and is only 6.58 MB in size. LuffyNet-B and LuffyNet-C exceed 73% Top-1 accuracy with a 2.65 ms delay. They outperform ResNet18, ResNet50, DenseNet121, and DenseNet169 in terms of accuracy and speed. Ablation experiments confirm that networks formed using the LuffyNet framework are not only suitable for edge devices but also reduce the search time by approximately 25%.

  • Interdisciplinary Integration and Engineering Applications
  • LI Zhongwei, WANG Penghao, LUO Cai
    Computer Engineering. 2026, 52(5): 418-429. https://doi.org/10.19678/j.issn.1000-3428.0070300
    Abstract ( ) Download PDF ( ) HTML ( )   Knowledge map   Save

    To address the problem that unmanned vehicles cannot efficiently reach target points in muddy, rugged terrain such as tidal flats, an improved A*-based algorithm, Tidal-A* (TA*), is proposed to plan optimal paths. Considering the characteristics of tidal flat environments, path quality is jointly evaluated using the Soil Moisture Content (SMC) along the path, path height fluctuation, and path length. Because environmental information is difficult to obtain directly, a drone equipped with a hyperspectral sensor and LiDAR is used to scan the target area, and a dimensionality reduction method combining spectral preprocessing with the Pearson correlation coefficient is proposed for training the SMC inversion model. The traditional A* algorithm searches for paths based on path length alone; to overcome this limitation, cost functions are designed for the three individual constraints and then integrated into a single multi-constraint cost function. The traditional A* algorithm also cannot adapt the path to different requirements, so a set of coefficients is designed to control the proportion of each constraint in the cost function and to reconcile the different orders of magnitude of the constraints. Finally, because the traditional A* algorithm may overlook better solutions, the calculation range of the heuristic function is improved, allowing the algorithm to accept some path redundancy in exchange for optimizing the other constraints. The simulation results show that, when the proposed method is used to train the SMC inversion model, the coefficient of determination R² is 0.784 and the Ratio of Performance to Deviation (RPD) is 2.151, which are 38% and 33.8% higher, respectively, than those of the direct inversion methods. Compared with paths generated by traditional algorithms, the paths generated by the TA* algorithm are 3.4% shorter, with SMC value and height fluctuation reduced by 5.1% and 18.7%, respectively.
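
    The idea of folding several constraints into one A* cost can be illustrated with a minimal sketch. It is not the TA* formulation from the paper, whose cost and heuristic functions are not given in the abstract. The step cost, the min-max normalization, and the names `weighted_astar`, `w_len`, `w_smc`, and `w_height` are all assumptions made for illustration; the normalization is one simple way to bring layers with different orders of magnitude onto a common scale.

```python
import heapq
import math

def normalize(grid):
    """Min-max scale a 2D grid to [0, 1] so layers with different units are comparable."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant layer
    return [[(v - lo) / span for v in row] for row in grid]

def weighted_astar(smc, height, start, goal, w_len=1.0, w_smc=1.0, w_height=1.0):
    """A* on a 4-connected grid with a weighted multi-constraint step cost.

    Hypothetical step cost (not the paper's):
        w_len + w_smc * smc_norm[next] + w_height * |h_norm[next] - h_norm[cur]|
    """
    smc_n, h_n = normalize(smc), normalize(height)
    rows, cols = len(smc), len(smc[0])

    def heuristic(cell):
        # Manhattan distance scaled by w_len; admissible because every step costs at least w_len.
        return w_len * (abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))

    best = {start: 0.0}      # cheapest known cost to reach each cell
    parent = {start: None}   # back-pointers for path reconstruction
    frontier = [(heuristic(start), start)]
    while frontier:
        _, cur = heapq.heappop(frontier)
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1], best[goal]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            step = (w_len
                    + w_smc * smc_n[nxt[0]][nxt[1]]
                    + w_height * abs(h_n[nxt[0]][nxt[1]] - h_n[cur[0]][cur[1]]))
            cost = best[cur] + step
            if cost < best.get(nxt, math.inf):
                best[nxt] = cost
                parent[nxt] = cur
                heapq.heappush(frontier, (cost + heuristic(nxt), nxt))
    return None, math.inf

# Toy 2x3 map: the cell between start and goal is very wet, the terrain is flat.
smc = [[0.0, 9.0, 0.0],
       [0.0, 0.0, 0.0]]
height = [[0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
short_path, short_cost = weighted_astar(smc, height, (0, 0), (0, 2), w_smc=0.0)
dry_path, dry_cost = weighted_astar(smc, height, (0, 0), (0, 2), w_smc=5.0)
```

    With `w_smc=0.0` the search reduces to a shortest-path query and crosses the wet cell; with `w_smc=5.0` it accepts a longer route through the dry row, which mirrors the trade between path redundancy and the other constraints described in the abstract.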

  • HUO Jiuyuan, LI Xin, CHANG Chen, ZHANG Yaonan
    Computer Engineering. 2026, 52(5): 430-444. https://doi.org/10.19678/j.issn.1000-3428.0070297

    Rolling bearings are common components of mechanical equipment. Traditional fault diagnosis methods struggle to classify signals with numerous complex features in multi-noise environments: they often apply classical deep learning models to one-dimensional data only and therefore fail to fully extract complex features. To address this issue, this paper proposes a dual-channel fault diagnosis method based on the ACNN-LFSwin Transformer, which performs fault diagnosis on both one-dimensional data and two-dimensional images. First, the original signal is processed using Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and the Short-Time Fourier Transform (STFT) to obtain Intrinsic Mode Functions (IMFs) and two-dimensional images, respectively. Subsequently, in channel 1, the CEEMDAN-decomposed IMFs are fed into an Attention-based Convolutional Neural Network (ACNN) for feature extraction. In channel 2, the two-dimensional images composed of bearing data are input into a Swin Transformer network (LFSwin Transformer) for local feature extraction. Finally, the features from both channels are concatenated and fused for fault diagnosis. The ACNN employs an attention mechanism to automatically allocate weights to signal features, thereby emphasizing key features. The LFSwin Transformer extends the traditional Swin Transformer with a vector conversion step: it converts the input vector into an image and applies convolution operations, which makes the model better at extracting local fault features. In experiments on the CWRU and Paderborn datasets, the proposed method achieves a fault diagnosis accuracy of over 97%. This result shows that it can accurately diagnose various faults and effectively resist interference from complex noise.
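
    As an illustration of how an STFT turns a one-dimensional vibration signal into a two-dimensional time-frequency image, the following sketch computes a magnitude spectrogram with the standard library only. The frame length, hop size, Hann window, and the function name `stft_magnitude` are assumptions for illustration; the abstract does not state the STFT settings used in the paper.

```python
import cmath
import math

def stft_magnitude(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum (bins 0..frame_len//2) per overlapping frame."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame; O(N^2) but dependency-free.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

# Synthetic test tone: 8 cycles per 64-sample frame, so the energy sits in bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
image = stft_magnitude(tone)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in image]
```

    Each row of `image` is one time step and each column one frequency bin, so the nested list can be treated as a grayscale image. In practice a library routine such as an FFT-based STFT would replace the direct DFT used here.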

  • ZHANG Penghe, YANG Yining, WANG Bicheng, YI Yunqi, TANG Zhongrui, LIU Min
    Computer Engineering. 2026, 52(5): 445-455. https://doi.org/10.19678/j.issn.1000-3428.0070261

    Currently, the maintenance and anomaly detection of user-side smart meters rely primarily on site visits by professionals, leading to low inspection efficiency, significant periodic testing burdens, and dependence on manual experience. A dataset of abnormal electricity meter images is created from inspection images obtained from a power grid company. This paper introduces a novel anomaly detection method for electricity meters that uses the Diversity-Driven Differentiable Automatic Data Augmentation (D-DADA) algorithm and the Dual-Branch Feature Enhancement YOLO (DBE-YOLO) network to address issues such as complex backgrounds, varying target sizes, and obscured wiring in meter images. First, the DBE-YOLO model is designed to enhance the extraction of global contextual information and multiscale features by introducing cascaded dilated convolutions. It also employs a dual-branch aggregation network to overcome the limitations of the original model, including a restricted receptive field and fixed convolutional feature capture patterns. Second, the D-DADA algorithm is introduced, featuring a search strategy with diversity constraints that helps it automatically discover a wider range of data augmentation strategies. This enables the model to learn the features and patterns of detection targets under various scenarios, angles, and lighting conditions, addressing the insufficient recognition performance caused by large intraclass variations. The experimental results indicate that the improved YOLOv8 model achieves an average detection accuracy of 79.6% across eight types of electricity meter anomalies, a 3.4 percentage point increase over the original model.
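
    The effect of cascading dilated convolutions on the receptive field can be checked with a short calculation. For stride-1 convolutions applied in sequence, each layer with kernel size k and dilation d widens the receptive field by (k - 1) * d. The dilation rates below (1, 2, 4) and the function name `receptive_field` are illustrative assumptions; the abstract does not give the rates used in DBE-YOLO.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, per axis) of stacked stride-1 convolutions.

    layers: iterable of (kernel_size, dilation) pairs, applied in order.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf

plain = receptive_field([(3, 1), (3, 1), (3, 1)])    # three ordinary 3x3 convolutions
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])  # same depth, growing dilation
print(plain, dilated)  # prints "7 15"
```

    With the same number of layers and parameters, the dilated stack covers a 15-pixel span instead of 7, which is the kind of enlargement that helps capture global context for targets of varying size.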

  • ZHANG Hong, ZHU Siyu, ZHANG Xijun, WEI Jiaoyun
    Computer Engineering. 2026, 52(5): 456-466. https://doi.org/10.19678/j.issn.1000-3428.0070124

    Owing to the non-stationary nature of traffic flow, extracting its dynamic spatial-temporal features is challenging. This non-stationarity causes dynamic changes across different traffic modes and within different neighborhoods. To address this issue, this study proposes a Meta-graph learning traffic flow prediction model based on Adaptive Graph Convolution (Meta-AGC). Specifically, a method is designed to adaptively capture the spatial correlation between nodes in different traffic modes and the dynamic changes in traffic flow within different neighborhoods. The method pattern-matches the spatial-temporal features captured by AGC against a meta-node library in meta-graph learning, so that the spatial-temporal meta-graph generated from the meta-node library adaptively represents the spatial correlations among nodes in different traffic modes. AGC consists of a set of graph wavelets with different learnable scales and a context attention mechanism that dynamically adjusts the convolutional receptive field according to the input traffic flow information at any time. Consequently, the limitation of the fixed receptive field in traditional convolution is overcome, and traffic flow variations within different neighborhoods triggered by random events are captured efficiently. Experimental results demonstrate that Meta-AGC improves prediction accuracy by 5.2% and 4.2% over the best-performing baseline model at the 6-step and 12-step prediction horizons, respectively. The results also confirm that Meta-AGC models the non-stationarity of traffic flow more effectively.