
Just accepted

  • HUO Ziyue, WU Youxi, GENG Meng, LIU Jingyu, LI Yan
    Accepted: 2025-07-16
    Sequence pattern mining aims to extract frequent ordered subsequences from data. However, sequence patterns do not directly represent causal relationships, because earlier events are not necessarily the triggers of subsequent ones. Causal inference can be used to discover causal patterns in sequential databases, but existing causal discovery methods for sequential data predominantly rely on expert-defined priors, which limits their applicability to knowledge-scarce scenarios. To address this issue, this paper proposes an algorithm, CP (Causal Pattern mining), based on mining both positive and negative sequential patterns. CP adopts a pattern join strategy to reduce the number of candidate positive sequential patterns. CP uses an intersection-based method to efficiently calculate the occurrence lists of candidate negative sequential patterns. Additionally, CP introduces a matching sequence pair algorithm to improve the credibility of the results. Experimental results show that CP improves running time by 11.8%, 60.336%, 55.501%, 25.737%, and 84.252% compared with CP-a, CP-b, CP-d, CP-e, and CP-m, respectively. It also reduces the number of candidate causal patterns by 56.057% compared with CP-m, and the number of candidate positive patterns by 66.415% compared with CP-b and CP-d. Moreover, CP achieves approximately a 50% improvement in F1-score over NOTEARS. Unlike PC, which can only mine single-variable causal patterns, CP is capable of mining combinatorial causal patterns. These results demonstrate that CP outperforms the compared algorithms.
  • Yan Qihong, Yang Wenjun
    Accepted: 2025-07-16
    Encrypted traffic classification has attracted significant research attention. However, many existing methods extract only flow-level features, which are often unreliable for short flows due to the instability of statistical characteristics. Moreover, they tend to treat packet headers and payloads equally, failing to explore the potential correlations between individual bytes. In addition, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) struggle to capture the discriminative information embedded in raw bytes. To address these challenges, we propose a fine-grained encrypted traffic classification model (ETC-SAT), which integrates GraphSAGE and Graph Attention Networks (GAT). This method constructs byte-level traffic graphs based on pointwise mutual information (PMI). Specifically, a dual embedding layer is designed to embed the byte-level traffic graphs into the graph model. Furthermore, we develop a traffic graph encoder that combines GraphSAGE and GAT: GraphSAGE provides stable neighbor feature aggregation, while GAT introduces an attention mechanism to enhance selective aggregation, thereby improving the quality of graph feature extraction. An adaptive deep feature fusion mechanism is then employed to integrate information from separately processed packet headers and payloads, resulting in stronger feature representations. Experimental results on two public datasets, ISCX-VPN2016 and ISCX-Tor2016, demonstrate that the ETC-SAT model can effectively identify types of encrypted traffic and significantly outperforms baseline methods in terms of classification performance.
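
To make the PMI-based graph construction concrete, the following is a minimal sketch that scores byte pairs co-occurring inside a sliding window and keeps only pairs with positive PMI as edges. The window size, the positive-PMI cutoff, and the function name `byte_pmi_edges` are assumptions for illustration; the abstract does not specify how ETC-SAT builds its graphs.

```python
import math
from collections import Counter


def byte_pmi_edges(packets, window=2):
    """Return {(a, b): pmi} for byte values a < b that co-occur in a window.

    Hypothetical helper: probabilities are estimated as the fraction of
    sliding windows in which a byte (or a pair of bytes) appears.
    """
    byte_counts = Counter()
    pair_counts = Counter()
    total_windows = 0
    for payload in packets:
        for i in range(max(len(payload) - window + 1, 0)):
            seen = set(payload[i:i + window])  # distinct byte values in this window
            total_windows += 1
            for b in seen:
                byte_counts[b] += 1
            for a in seen:
                for b in seen:
                    if a < b:
                        pair_counts[(a, b)] += 1
    edges = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / total_windows
        p_a = byte_counts[a] / total_windows
        p_b = byte_counts[b] / total_windows
        pmi = math.log(p_ab / (p_a * p_b))
        if pmi > 0:  # assumption: keep only positively associated byte pairs
            edges[(a, b)] = pmi
    return edges


edges = byte_pmi_edges([bytes([1, 2, 3, 4]), bytes([1, 2, 5, 6])])
```

Each key in `edges` would become an undirected edge between two byte-value nodes, weighted by its PMI.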
  • LI Yanqing, ZHU Hongqing
    Accepted: 2025-07-15
    Magnetic resonance imaging serves as an important tool for clinical diagnosis and lesion detection. Convolutional neural network-based magnetic resonance image reconstruction has achieved significant progress in both speed and accuracy. However, existing models mostly depend on single-domain feature extraction and remain limited to local receptive fields. This leads to slightly lower reconstruction quality for complex anatomical structures. To address these issues, this paper proposes an improved hybrid attention-enhanced dual-domain multimodal reconstruction network (AMC-Net) based on a parallel framework. The network constructs the MMFF multimodal feature fusion module. This module matches and fuses shared features between multimodal inputs. It supplements missing structural features in the initial input and reduces artifactual interference. The method builds a two-branch PIRN reconstruction subnetwork based on the iterative shrinkage thresholding algorithm. The subnetwork employs multi-layer iteration and dual-domain information interaction for progressive reconstruction. It introduces an improved attention-based convolution mechanism to achieve global feature learning. Additionally, the approach integrates synergistic channel-space attention and self-attention mechanisms. These mechanisms refine reconstruction results by enhancing detailed features and suppressing artifacts. They improve the restoration quality of tiny structures and enhance overall visual clarity. AMC-Net outperforms mainstream methods on static brain MRI reconstruction using IXI and BraTS2018 datasets, adapting well to various sampling conditions. With 5× random sampling, it achieves an average PSNR of 42.66 and SSIM above 0.97, delivering clear and detailed images.
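
For readers unfamiliar with the iterative shrinkage thresholding algorithm (ISTA) named above, the sketch below shows the classical, non-learned iteration for min 0.5·||Ax − y||² + λ·||x||₁. It uses a small dense matrix `A` in place of an undersampled MRI encoding operator, and it is not the PIRN subnetwork itself, whose learned layers the abstract does not detail.

```python
def soft_threshold(v, tau):
    """Shrink each entry of v toward zero by tau (the proximal map of tau*||.||_1)."""
    return [max(abs(x) - tau, 0.0) * (1 if x > 0 else -1) for x in v]


def ista(A, y, lam, step, iters=200):
    """Classical ISTA: a gradient step on the data term, then soft-thresholding."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # residual r = A x - y
        r = [sum(a * xi for a, xi in zip(row, x)) - yi for row, yi in zip(A, y)]
        # gradient of 0.5*||A x - y||^2 is A^T r
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = soft_threshold([xi - step * g for xi, g in zip(x, grad)], lam * step)
    return x


x_hat = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.2], lam=0.5, step=1.0)
```

Unrolled networks built on ISTA typically replace the fixed threshold and operators with learned ones, one network stage per iteration.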
  • WANG Pingye, MA Lihuan, SUN Zhuo, GAO Yuan, HUI Ying
    Accepted: 2025-07-14
    The collaborative delivery problem involving multiple vehicles and multiple drones, which is closely aligned with real-world scenarios, has become a key challenge in path optimization due to its complexity. This paper proposes a collaborative delivery problem that considers multiple drones carried by each truck, the possibility of multiple deliveries within a single flight range of drones, and customer time window constraints. A mixed-integer programming model is established with the objective of minimizing total delivery cost. To solve this problem, a hybrid heuristic algorithm combining Adaptive Large Neighborhood Search (ALNS) and Simulated Annealing (SA) is designed. The double-chain encoding structure is proposed to separate the delivery sequence of customers and delivery tools, which not only visually displays the collaborative delivery paths of trucks and drones but also facilitates the quick generation of new solutions through destruction and repair operators. Numerical analysis validates the effectiveness of the algorithm and the adaptability of the double-chain encoding structure to this complex scenario. The results show that the algorithm generates high-quality solutions quickly for small, medium, and large-scale instances. Specifically, the computational time for small-scale instances is reduced by an average of 94.34% compared to the Gurobi solver, and the total cost for large-scale instances is reduced by an average of 9.11% compared to the Variable Neighborhood Search (VNS) algorithm. Furthermore, sensitivity analysis indicates that the number of drones carried by trucks and the cost of drones are key factors affecting both total delivery costs and the formulation of path plans, providing a theoretical basis for enterprises to design and optimize collaborative distribution paths.
  • Bo Kaibin, Li Yewen, Zhang Zhonghai, Tan Guangming
    Accepted: 2025-07-14
    Basecalling, the process of converting raw electrical signals into DNA sequences in nanopore sequencing, is a critical step that directly impacts the timeliness of genomic analysis. Addressing the limitations of existing basecalling tools in computational acceleration, hardware adaptation, and system-level optimization, this study proposes and implements three innovative optimization strategies that significantly enhance computational performance and enable deployment on domestic hardware platforms. The main contributions of this study are as follows: First, we developed OpenKoi, a high-performance acceleration library based on a heterogeneous computing architecture. By performing operator-level optimizations, we achieved algorithmic breakthroughs. For key operators such as LSTM and Conditional Random Fields (CRFs), we introduced a novel matrix concatenation strategy and a parallel execution scheme, reducing the number of GEMM operations required for each LSTM step from eight to one. We also implemented a block-level parallel beam search algorithm. Second, we proposed a heterogeneous pipeline architecture that overcomes traditional I/O bottlenecks by enabling three-stage parallelism: data loading, GPU computation, and result output. This architecture demonstrated linear scalability on the DCU platform. Third, we developed DCUCaller, the first basecalling system compatible with domestic DCU (Dawning Computing Unit) hardware. Its innovation lies in the co-optimization of hardware adaptation and quantization techniques. Leveraging the HIP programming framework for cross-platform compatibility, DCUCaller integrates the OpenKoi library and the heterogeneous pipeline framework to optimize basecalling throughput. 
Through innovations in algorithm design, system architecture, and hardware ecosystem integration, this study not only significantly improves the efficiency of basecalling but also provides critical technical support for the large-scale application of domestic computing platforms in bioinformatics. It holds strategic significance for promoting the independent development of genome sequencing technologies.
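
The reduction from eight GEMMs to one per LSTM step can be illustrated with the textbook concatenation trick: the four gates each need an input product and a hidden-state product, and stacking the weights as [W_x | W_h] and the vectors as [x; h] yields the same pre-activations from a single product. This is a sketch of that general idea in plain Python with a single vector (a matrix-vector product standing in for a batched GEMM); it is an assumption that OpenKoi's concatenation follows this form, since the abstract gives no details.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gates_eight_products(Wx, Wh, x, h):
    """Gate pre-activations via 4 input products plus 4 hidden products."""
    out = []
    for wx, wh in zip(Wx, Wh):  # one (wx, wh) pair per gate: i, f, g, o
        a, b = matvec(wx, x), matvec(wh, h)
        out.extend(p + q for p, q in zip(a, b))
    return out


def gates_one_product(Wx, Wh, x, h):
    """Same pre-activations from one product with stacked weights and [x; h]."""
    W = [rx + rh for wx, wh in zip(Wx, Wh) for rx, rh in zip(wx, wh)]
    return matvec(W, x + h)


# Toy sizes: input dimension 3, hidden dimension 2, integer weights for exactness.
Wx = [[[g + r + c for c in range(3)] for r in range(2)] for g in range(4)]
Wh = [[[g * r - c for c in range(2)] for r in range(2)] for g in range(4)]
x, h = [1, 2, 3], [4, 5]
same = gates_eight_products(Wx, Wh, x, h) == gates_one_product(Wx, Wh, x, h)
```

In practice the stacked weight matrix is built once, so each time step pays only for the single larger product.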
  • ZHAO Yun Kang, XU Ming
    Accepted: 2025-07-14
    To address the challenges of data confidentiality, dynamic authorization, and efficient retrieval in Underwater Acoustic Sensor Networks (UASNs), a searchable authorized keyword encryption scheme with dual-layer security is proposed. At the data layer, leveraging the quantum resistance of the Small Integer Solution (SIS) problem and Identity-Based Encryption (IBE), an anti-quantum public-key encryption mechanism using lattice trapdoor algorithms is developed. At the authorization layer, a time-constrained discrete Gaussian token distribution protocol generates lightweight authorization signatures through lattice-based rejection sampling, allowing authorities to dynamically assign and update fine-grained keyword search permissions. Authorized nodes utilize the generated authorization trapdoors for precise data retrieval. Lattice basis expansion algorithms enhance computational efficiency, while integration with IBE simplifies public key management, aligning the scheme with the low bandwidth and limited computing resources of UASNs. The proposed scheme ensures quantum security and reduces communication overhead, and rigorous analyses confirm its IND-sID-CKA and T-EUF security properties, fulfilling the demands of underwater acoustic communications.
  • ZHAI Sheping, KANG Chaoyue, YANG Rui, CAO Shilong
    Accepted: 2025-07-14
    The Practical Byzantine Fault Tolerance (PBFT) algorithm faces challenges such as high communication complexity, incomplete node management, and the lack of dynamic behavior evaluation, which limit its performance and security in large-scale blockchain systems. To address these issues, a reputation-based grouped consensus algorithm is proposed. According to a multi-dimensional reputation assessment, nodes are divided into excellent, good, and observer classes. Only excellent and good nodes participate in consensus, where leaders are primarily selected from the excellent class. Nodes with insufficient reputation but no malicious records act as observers, limited to ledger synchronization. Nodes identified with malicious actions are isolated from the consensus and synchronization process to improve system security and robustness. To reduce communication overhead, the algorithm integrates Boneh–Lynn–Shacham (BLS) aggregate signature technology, which compresses multiple signatures into a single fixed-length signature. This reduces the size of transmitted data and eases intra-group and inter-group message broadcasting. A dynamic node management mechanism is also designed to allow flexible node entry and exit, enhancing adaptability and fault tolerance. Experimental results show that compared with PBFT, DT-PBFT, and NBR-PBFT, the proposed algorithm reduces consensus latency by approximately 45.3%, 29.3%, and 17.4%, and improves throughput by around 17.4%, 10.6%, and 4.5%, respectively. These improvements demonstrate better scalability and communication efficiency in consortium blockchain environments.
  • Xie Xingang, Lu Zhaoxuan, Liang Jingkun
    Accepted: 2025-07-14
    In recent years, climate change and ocean pollution have led to the degradation of coral reefs, making automatic coral detection an urgent need for monitoring marine ecosystems. The low image contrast, complex coral shapes, and dense growth in underwater coral detection tasks limit the performance of general detection algorithms. To address these problems, a soft coral detection model based on the YOLO architecture, named CoralDet, is proposed. First, a multi-path fusion module (MPFB) is designed to capture coral features at multiple scales, which improves the robustness of the model against uneven underwater lighting and image blurring; reparameterization is used to enhance inference efficiency. Second, the lightweight GSConv and VoV-GSCSP components are introduced to reduce computational cost without sacrificing performance. Third, an Adaptive Power Transformation label assignment strategy is introduced to dynamically adjust anchor point matching metrics, and soft labels and a soft center region loss are used to focus the model on high-quality, well-aligned predictions. Finally, CoralDet was evaluated on the Soft Coral dataset, achieving an inference latency of only 9.52 milliseconds and an mAP50 of 81.9, surpassing YOLOv5 (79.9), YOLOv6 (79.4), YOLOv8 (79.5), YOLOv9 (78.3), YOLOv10 (79.5), MambaYOLO (80.1), and RT-DETR (81.6). In further experiments on the Coral-lwptl dataset, CoralDet outperformed models such as MambaYOLO, YOLOv8, and YOLOv10 on multiple key indicators. The results demonstrate the effectiveness and practicality of CoralDet in underwater coral detection.
  • Xu Shengxuan, Xu Lei, Fei Yifan
    Accepted: 2025-07-08
    This paper focuses on the civil engineering field. Using machine vision to recognize images of concrete products makes it possible to assess their performance rapidly, accurately, and non-destructively, which is of great significance for engineering applications. Currently, traditional manual inspection methods are inefficient and highly subjective. Meanwhile, existing image recognition technologies face challenges such as uneven lighting, background noise interference, diverse crack shapes, and blurry boundaries in dynamic images. There is an urgent need for intelligent solutions that can adapt to complex engineering scenarios. Through a systematic review of relevant literature, the research focuses on the assessment of two types of concrete, namely the identification of cracks and appearance defects in static hardened concrete, and the evaluation of the flowability of dynamic fresh concrete. First, from the perspectives of traditional digital image technology and neural networks, it reviews research progress on crack identification, appearance quality discrimination, and flowability assessment under different scenarios and shooting subjects. Then, it summarizes and compares the advantages, disadvantages, and application scenarios of different algorithms in the preprocessing, image segmentation, and feature extraction steps of existing processing pipelines. Finally, through comparative analysis, a set of recommended image recognition procedures and solutions for judging the appearance quality and flowability of concrete products is proposed. This provides algorithmic ideas for the intelligent recognition and assessment of structural concrete performance, promoting the application of visual technology in the civil engineering field.
  • HE Shiyu, SUN Weihong, SHAO Tiefeng, LIANG Man
    Accepted: 2025-07-08
    To address the large model size, high computational cost, and slow detection speed of existing internal pollution cocoon detection models, an internal pollution cocoon detection method based on LEF-YOLOv7 (Lightweight and Enhanced Features-YOLOv7) is proposed. Following the principle of the traditional light-based cocoon selection method, it detects targets by leveraging image differences caused by the varying transmittance of polluted and reelable cocoons. First, GhostNet is used to reconstruct the feature extraction network to reduce computational cost and memory consumption. Second, the feature fusion network is simplified, and a 3×3 convolution with a stride of 2 is employed for downsampling, which further reduces the amount of computation and memory access. Third, a convolutional attention mechanism is introduced to enhance the feature extraction ability of the model and weaken background interference. Finally, the DIoU (Distance-IoU) loss function is used to reduce the impact of the prediction box loss calculation on model performance. Experimental results show that the log-average miss rate (LAMR) of the LEF-YOLOv7 model is 0.01, which is 0.05 lower than that of the original model. The mean average precision (mAP) is 99.58%, while floating-point operations (FLOPs) and parameters are reduced by 91.84% and 83.13%, respectively. The model size is reduced by about 117.7 MB, and the detection speed reaches 36.69 FPS, about 3.3 times faster than the original model, which meets the lightweight requirements. This method reduces the complexity of the model while maintaining good performance and can provide technical support for internal pollution cocoon detection in production-oriented enterprises.
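
As a reference for the DIoU loss mentioned above, the sketch below computes the standard Distance-IoU loss for two axis-aligned boxes: 1 − IoU plus the squared distance between box centers divided by the squared diagonal of the smallest enclosing box. The `(x1, y1, x2, y2)` box format and the function name are assumptions for illustration, not details taken from LEF-YOLOv7.

```python
def diou_loss(box_a, box_b):
    """Distance-IoU loss for two boxes given as (x1, y1, x2, y2) with x1 < x2, y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Squared distance between the two box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    return 1.0 - iou + d2 / c2


loss = diou_loss((0, 0, 2, 2), (1, 1, 3, 3))
```

Unlike plain IoU loss, the center-distance term still provides a gradient signal when the predicted and ground-truth boxes do not overlap.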
  • Deng Yuhui, Deng Yueming, He Xin
    Accepted: 2025-07-08
    UAV low-altitude aerial photography has been widely used in fields such as data acquisition, security monitoring, and terrain mapping, and has significantly improved operational efficiency. However, because aerial images usually cover a wide area, the detection targets are often small, dense, and unevenly distributed, which limits detection accuracy. To address these problems, a target detection algorithm, TOD-YOLO, is proposed based on an improved YOLOv8 network. First, an RFAConv downsampling module is introduced in the backbone network, which provides effective attention to the convolutional kernel without increasing the number of parameters, allowing the network to focus on smaller targets in the feature extraction phase. Second, the feature fusion module CSPOKM and a new feature fusion path are proposed to increase the focus on small targets and improve the performance of the feature fusion network. Third, based on the depthwise separable convolution DWConv and the EMA attention mechanism, the lightweight attention detection head DWA-Head is designed to reduce the number of parameters while improving detection accuracy. Finally, because CIoU is prone to false and missed detections in complex scenes with a high density of small targets, DA-MPDIoU, a loss function that dynamically adjusts the positive and negative sample coefficients, is designed to assign higher weights to small target samples that are easy to miss and difficult to classify, thereby improving the training results. Experimental results show that, compared with the original YOLOv8 algorithm, the improved algorithm raises mAP@0.5 and mAP@0.5:0.95 on the VisDrone2019 dataset by 9.6 and 6.8 percentage points, respectively. Further generalization experiments on the DOTA dataset show that the algorithm exhibits significant advantages and potential in small target detection tasks.
  • Zhikang Li, Yu Jin
    Accepted: 2025-07-04
    In cloud computing, frequent data updates and migrations by data owners (DO) raise data management complexity and challenge traditional cloud data auditing. Traditional third-party auditor (TPA) based dynamic auditing has centralization and single-point failure issues. Although blockchain technology is adopted to replace TPA, current blockchain-based schemes suffer from high computational cost, low dynamic efficiency, and heavy DO auditing burden. Thus, a Dynamic Cloud Data Auditing Scheme empowered by Side Information and Consortium Blockchain Smart Contracts (DCASIC) is proposed. DCASIC decouples auditing metadata from data block indexes via side information auditing and correlates them with a homomorphic hash function, enhancing dynamic auditing efficiency. Smart contract parallel execution and pre-computed verification information reduce DO auditing time. Theoretical and experimental results show DCASIC significantly cuts computational cost, boosts dynamic auditing efficiency, and reduces DO time cost during auditing compared with existing blockchain-based schemes.
  • Wang Yuanyuan, Cao Hui, Wang Tingwei
    Accepted: 2025-07-04
    Skeleton-based action recognition is attracting increasing attention because of its excellent performance. In skeleton-based action recognition, coarse-grained features are an important supplement to fine-grained features and can effectively improve recognition performance. However, existing multi-granularity skeleton-based action recognition methods have two shortcomings: first, the constructed coarse-grained features do not accurately retain the structural information between locally adjacent fine-grained joints; second, they do not make good use of the global correlation between coarse-grained features for feature learning. To solve these problems, when constructing coarse-grained joints, the arithmetic mean and classical convolution operations are used to capture the position and structure information of locally adjacent fine-grained joints. A cross-attention mechanism is then used to capture the global correlation between coarse-grained and fine-grained features, which better describes part-level movement trends and improves the representational and discriminative power of coarse-grained features. The method is combined with a variety of skeleton-based action recognition models, and experiments are carried out under multiple evaluation protocols on the NTU RGB+D and NTU RGB+D 120 action recognition datasets. Experimental results show that the proposed method can extract and fuse skeleton motion features of different granularities and significantly improves the classification performance of human skeleton-based action recognition methods.
  • LIANG Ziyi, LIU Tianquan, LI Liping, ZHU Yuanfei, LU Cunyue
    Accepted: 2025-07-03
    Detection and identification of aquatic algae is an important task in ecological protection. However, in practical applications, traditional object detection models struggle to meet real-time and efficiency requirements because of their computational complexity and resource demands and the limited hardware resources of on-site water quality detection equipment. At the same time, lightweight models often fail to achieve sufficient accuracy when dealing with imbalanced sample distribution, severe target occlusion, significant scale differences, and complex backgrounds. To address these challenges, this paper proposes an improved EfficientDet object detection model aimed at improving algae detection performance under limited computational resources. To tackle the problem of insufficient rare algae samples, data augmentation techniques are employed to enhance the model's generalization ability. For algae species with similar features, a CBAM (Convolutional Block Attention Module) attention module is introduced into the backbone network to enhance feature mapping between different algae species. In the feature fusion stage, a BiFPN (Bidirectional Feature Pyramid Network) module based on a hybrid attention mechanism is used to more accurately capture the semantic information of algae in complex backgrounds. Experimental results show that the improved EfficientDet model achieves a mean average precision (mAP) of 74.2% on the test set, a 3.4 percentage point improvement over the original EfficientDet model, with a floating-point computation of 21.188 GFLOPs, an energy consumption of only 4.3 W, and a model size of 31.4 MB, just 0.1 MB larger than the original model. Compared with YOLOv5s, RetinaNet, Faster R-CNN, SSD, and other mainstream lightweight models such as YOLOv8 and YOLO-WORLD, the mAP is higher by 7.6, 1.7, 0.7, 4.0, 1.9, and 2.5 percentage points, respectively. Ablation experiments further validate the contribution of each module to the performance improvement and their collaborative optimization effects, providing an efficient and lightweight solution for applications such as water quality monitoring and ecological protection.
  • Shuai Feng, Jian Gao
    Accepted: 2025-07-03
    In the field of computer security, malicious code protection has always been an important research topic. With the rapid development of computer technology, the types and forms of malicious code are constantly evolving. Traditional feature engineering methods rely on a single feature dimension when dealing with complex malicious samples, resulting in insufficient representation ability and the inability to accurately identify various types of malicious code. Other malicious code classification methods based on feature fusion rely on expert experience to manually design features during feature extraction. Moreover, multimodal deep learning models have insufficient interpretability and high computational costs. To address these issues, this paper proposes a feature fusion method for classifying malicious code in Windows PE files. The method integrates behavioral, structural, and texture features and uses LightGBM as the classifier. The experimental results show that the proposed method achieves a test accuracy of 99.90% and a log loss (Logloss) of 0.0057 on the Microsoft Malware Classification Challenge dataset, and a test accuracy of 98.97% and a log loss of 0.042 on the Bazaar dataset. These results demonstrate that the method can comprehensively and accurately represent malicious code, and that it has important theoretical significance and practical application value. By fusing multi-dimensional features, this method provides an effective solution for malicious code detection and has broad application prospects.
  • Shang Wen, You Jinguo, Wu Kang, Fu Wantin, Li Xiaowu, Jia Lianyin
    Accepted: 2025-07-03
    In distributed computing frameworks, inefficient data transfer in the Shuffle phase has become a key bottleneck for table joins. Existing join methods have limitations: for example, both broadcast joins and hash joins in Spark are susceptible to data skew, which unbalances the load across nodes. To address this problem, this paper focuses on join-aggregate queries and proposes a table join method based on a lattice structure. The method precomputes and stores table partition data in the form of a lattice and exploits the convex-set property of equivalence classes, namely that any data cell that contains the upper bound of an equivalence class and is contained by its lower bound has the same aggregate value as the equivalence class, to achieve fast matching and computation. Because the query data cells are a compressed form of the base table data, they are smaller and less skewed; the method therefore transfers and joins query data cells instead of table data, which greatly reduces the Shuffle volume and the computational complexity. The proposed method has been implemented in Spark. Experiments on the TPC-H dataset show that it reduces the Shuffle data volume by about 45.06% in large-dataset scenarios, balances the workload across nodes better than the benchmark method, and shortens query response time by 14.23% on average.
  • Yuanhao Li, Fangli Ying
    Accepted: 2025-07-01
    Learning disentangled representations to enhance the controllability of image generation models is a key research direction in computer vision. However, existing methods face two major limitations: reliance on large-scale annotated data and difficulty in handling complex dependencies between features. To address these issues, this study proposes a universal generative disentangling method based on the Hilbert-Schmidt Independence Criterion (HSIC). The method converts HSIC into an independence regularization mechanism for the latent space of generative models. By incorporating HSIC regularization terms, it optimizes a measure of nonlinear dependency and guides the model to learn independent feature representations. Specifically, the study integrates HSIC into two mainstream generative model architectures: for Variational Autoencoders (VAEs), it combines variational inference with HSIC regularization to optimize latent distribution disentanglement; for Diffusion Models (DMs), it achieves progressive feature disentangling by embedding the HSIC regularization term into the time step optimization of the reverse process. The experimental results show that this universal method, which can be implemented in different model architectures, enhances latent representation independence and maintains stable performance in unsupervised settings, offering a new way to model complex feature dependencies. To further verify the semantic consistency of the disentangled space, this study conducted latent space interpolation experiments, which generated smoother trajectories and demonstrated that HSIC regularization constructs a linearly separable disentangled space. For evaluation, this study conducted dual validation using standard disentangling metrics and HSIC-based custom metrics, which showed a positive correlation and confirmed the objectivity of the disentangling evaluation criteria.
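
To show what an HSIC-based independence penalty measures, the following sketch implements the standard biased empirical HSIC estimator, trace(K H L H) / (n − 1)², with Gaussian kernels on two one-dimensional samples. The kernel choice, the bandwidth `sigma`, and the idea of applying it to pairs of latent dimensions are assumptions for illustration; the abstract does not state how the regularizer is computed.

```python
import math


def gaussian_gram(xs, sigma=1.0):
    """Gram matrix of a Gaussian (RBF) kernel over scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in xs] for a in xs]


def center(K):
    """Return H K H for a symmetric K, where H = I - (1/n) * ones."""
    n = len(K)
    row_mean = [sum(r) / n for r in K]  # equals the column means because K is symmetric
    grand_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC; values near zero suggest independence."""
    n = len(xs)
    Kc = center(gaussian_gram(xs, sigma))
    L = gaussian_gram(ys, sigma)
    # trace(K H L H) = sum_ij (H K H)_ij * L_ij for symmetric matrices
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
dependent = hsic(xs, xs)
constant = hsic(xs, [7.0] * len(xs))
```

A regularizer of this kind would add the HSIC value between latent dimensions to the training loss, so that minimizing the loss pushes the dimensions toward statistical independence.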
  • MENG Hui, ZHANG Luhui, YAN Xixi, TANG Yongli
    Accepted: 2025-06-30
    Currently, most Unique Ring Signature (URS) schemes are based on the discrete logarithm problem. Among them, only the URS schemes proposed by Nguyen and by Junhui Wang satisfy post-quantum security requirements. However, each of these schemes has its limitations: Nguyen's scheme utilizes zero-knowledge proofs, which results in significant computational resource consumption, while Junhui Wang's scheme leads to longer key lengths due to the design of the lattice-based structure, increasing storage overhead. In addition, both schemes rely on digital certificates to manage public keys, requiring the storage and management of a large number of certificate files, which further increases the storage and management costs of the system. To address these challenges, this paper proposes an efficient identity-based URS scheme over the NTRU lattice. First, by leveraging the relatively short public and private key lengths of the NTRU lattice cryptosystem, the scheme reduces key storage overhead. Second, a compact Gaussian sampling algorithm is employed to generate user private keys, thereby improving key generation efficiency. Finally, an identity-based mechanism is introduced to construct public keys, eliminating the reliance on digital certificates. Security analysis demonstrates that, under the Random Oracle Model (ROM), the proposed scheme achieves unconditional anonymity, unforgeability, and uniqueness, with its security reducible to the small integer solution problem on the NTRU lattice. Performance analysis shows that, compared to existing lattice-based URS schemes, this approach offers shorter public key lengths and lower computational overhead, with average reductions of about 15% and 13% in signature generation and verification time, respectively.
  • LI Dongfeng, CHEN Yuren, YU Bo
    Accepted: 2025-06-25
    In existing pavement crack detection methods based on U-Net, the interaction between features at different levels of the encoder has not been fully considered, which may lead to incomplete crack detections or missed crack detections due to information loss during down-sampling. Therefore, this paper proposes a pavement crack detection method based on multi-level feature fusion. First, in the encoding stage, crack features at different levels are extracted from the input image, forming crack feature representations from shallow to deep levels. Second, in the skip connection, the cross-level fusion strategy based on improved channel-wise cross fusion Transformer is employed, which enhances the complementarity between features at different levels and enriches the representations of crack features. Finally, in the decoding stage, the feature cross fusion module is used to optimize how the decoder utilizes the encoder's features, promoting the transmission of crack features and improving the perception capability for crack features. To verify the effectiveness of the proposed method, a series of comparative experiments and ablation experiments were conducted on the two public datasets of DeepCrack and CRACK500. The experimental results show that the comprehensive performance of the proposed method is better than the six comparison methods including DeepCrack and Swin-UNet. Specifically, on the DeepCrack dataset, the F1 score increased by 2.30% and 2.51% respectively, while on the CRACK500 dataset, it increased by 1.65% and 1.00% respectively.
  • TAN Zhong-Xia, LIU Qi-Kun, JIANG Cui-Ling, WAN Yong-Jing
    Accepted: 2025-06-20
    Due to the prolonged examination time and limited therapeutic time window for stroke, the development of a rapid and highly accurate medical image segmentation model for stroke is of significant importance for clinical diagnosis. The U-Net architecture based on Mamba, known for its low complexity and capability to handle large-scale images, has garnered widespread attention in the field of medical image processing in recent years. The fractional Fourier transform can convert signals into arbitrary fractional domains between the spatial and frequency domains, allowing the observation of features that are not prominent in the spatial or frequency domains. Therefore, by introducing the fractional Fourier transform, lesion characteristics can be observed in the fractional domain. Based on the fractional Fourier transform and the Mamba network, a novel model named FRFTMamba-UNet is proposed for stroke medical image segmentation. This model incorporates the fractional domain into the Mamba network and designs a multi-level residual module connected to the U-Net encoder. Additionally, a hierarchical feature extraction strategy is implemented in the U-Net-like network, where different feature extraction modules are designed for the shallow and deep layers. Specifically, residual convolutions based on convolutional neural networks are added to the shallow layers to effectively extract shallow features, while the Mamba architecture is utilized in the deep layers to further extract deep features. The proposed method demonstrates superior accuracy and efficiency compared to existing state-of-the-art models based on the Mamba module across three stroke datasets: AISD, ATLAS, and ISLES22. On the AISD dataset, it achieves a Dice score of 64.27%; on the ATLAS dataset, it achieves a DSC score of 62.24%; and on the ISLES22 dataset, it achieves a DSC score of 85.24%.
  • Gao Xiong, Gou Guanglei, Zhou Linjie, Jia Penghao
    Accepted: 2025-06-20
    In fine-grained image classification tasks, sufficient samples can provide rich local feature information. However, in few-shot scenarios, data sparsity makes it difficult for the model to fully capture discriminative local information. To address this issue, a few-shot learning method integrating axial attention and a scale-aware mechanism is proposed. First, a frequency-adaptive feature selection module is designed to reduce interference from background noise and non-target regions, highlighting discriminative local features and thus increasing the feature separability between different categories. Second, an axial-scale joint enhancement module is constructed to integrate global contextual information, focus on key regions, and process features with different receptive fields in parallel, improving the representation capability for details at various scales. Finally, a dual similarity measurement module is adopted to guide learning through two similarity measurement methods, enhancing the generalization of features and reducing the bias toward specific features. On the public datasets CUB_200_2011 and Stanford Dogs, the proposed method improves classification accuracy by 1.4 and 1.45 percentage points in the 1-shot and 5-shot scenarios, respectively, and by 1.86 and 3.49 percentage points on the Stanford Cars dataset. In the 1-shot scenario, it achieves state-of-the-art performance, while in the 5-shot scenario, it also achieves competitive results. Experimental results demonstrate that the proposed method effectively improves the performance of fine-grained image classification under few-shot settings and better captures discriminative feature information.
  • Kewei Zhang, Xin Wen, Wenhui Zhang, Rui Cao
    Accepted: 2025-06-20
    Drug development is a complex, costly, and low-success-rate process. Molecular property prediction is a fundamental yet challenging task in drug development, and accurately predicting molecular properties can accelerate the process and reduce costs. With the advancement of machine learning, particularly deep learning, significant progress has been made in molecular property prediction. However, many existing methods rely on single molecular representations or fail to integrate the potential relationships among multi-dimensional representations. Therefore, this study proposes a novel molecular property prediction method—the Multi-Representation Fusion Model for Molecular Property Prediction (MRFP). It innovatively designs a molecular representation fusion algorithm that integrates two distinct types of molecular representations: molecular fingerprints and molecular graphs, thereby generating a more comprehensive and detailed molecular representation, which provides more accurate input for molecular property prediction. Furthermore, to better extract features in molecular graphs, we have designed a novel molecular graph readout module named the Tri-Step Convolutional Readout Module (TCNN) based on molecular characteristics, which effectively captures the information expressed in molecular graphs. Experimental results on six classification datasets and three regression datasets from MoleculeNet demonstrate the effectiveness of our method, achieving an average improvement of 2.8% in classification metrics and a reduction of 0.47 in regression metrics. This research not only provides a new solution for molecular property prediction but also offers strong support for molecular design and screening in drug development, with broad application prospects and potential.
  • PI Chengdong, HU Bin
    Accepted: 2025-06-19
    Using traditional computer vision technology for collision detection in complex scenes is very difficult; in particular, when faced with false collision interference, models suffer from a high false alarm rate and low accuracy. To address this problem, this paper builds on the hierarchical structure of the mammalian retina and on the danger-perception characteristics that neurons in the Polysensory Zone (PZ) of the precentral gyrus of the primate cerebral cortex exhibit for a specific visual area, and proposes a bio-inspired Enhanced Collision Detection Neural Network (ECDNN) that effectively reduces false collision interference. The network consists of a presynaptic subnetwork and a postsynaptic subnetwork. The presynaptic subnetwork is based on the hierarchical processing and step-by-step transmission of mammalian retinal information; it derives a dynamic focus receptive field from the global Focus of Expansion (FOE) to obtain key visual information from low-order visual perception. The postsynaptic subnetwork integrates the membrane potential excitation intensity caused by the approaching visual stimulus in the focus receptive field, and outputs an alarm signal representing imminent collision danger. Experiments show that the model not only effectively filters false collision interference in complex scenes and reduces false detections, but also raises collision detection accuracy to over 96%, providing an important foundation for building future interactive artificial intelligence systems.
  • ZHENG Kun, ZHANG Ziyan, LI Xiaoli
    Accepted: 2025-06-19
    The measurement of physiological parameters from facial videos in online education is currently a research hotspot in intelligent education. Traditional remote photoplethysmography (rPPG) cannot adapt to the changes in the lighting environment in online education scenarios, which affects the flexibility and accuracy of physiological parameter measurement based on facial videos. Aiming at the typical lighting scenarios in online education, a method for extracting blood volume pulse (BVP) signals based on lighting adaptability is proposed, and a dual correction model for BVP signals is constructed by combining a generative adversarial network (GAN) and a convolutional neural network (CNN). Firstly, the optimal solution of the orthogonal chrominance signal under different lighting conditions is calculated based on the simulated annealing algorithm. At the same time, a lighting scene prediction mechanism for classifying lighting scenes using the average gray intensity is established to achieve the optimal chrominance signal that adapts to the lighting scene. Furthermore, the GAN and CNN models are combined to perform dual correction on the BVP signal to ensure that the finally output physiological parameters are more accurate and reliable. The model is verified on four publicly available datasets reorganized for typical educational scenarios. The experimental results show that the root mean square error (RMSE) of the heart rate is reduced by an average of 8.3 bpm, demonstrating the robustness and accuracy of the model under different lighting conditions. This model has significant advantages in improving the accuracy of heart rate and heart rate variability prediction, and can provide effective support for contactless physiological parameter detection in complex lighting environments.
  • SHI Kangwei, CHAI Yidong, QIAN Yang, JIANG Yuanchun, LIU Yezheng
    Accepted: 2025-06-19
    A Web Application Firewall (WAF) is an effective tool for protecting web applications from cyberattacks. The rapid development of web applications in recent years has made research on WAFs increasingly significant. Common approaches to building WAFs include rule-based methods and machine learning-based methods. Rule-based WAFs detect attacks using a predefined set of rules, which are often complex and difficult to update, whether dynamically or manually. Machine learning-based WAFs, which primarily use methods such as Support Vector Machines to classify payloads, struggle to identify sudden malicious payloads as effectively as rule-based methods and lack the breadth of coverage that rule-based approaches provide. To address these limitations, this paper proposes a WAF enhancement method based on pretrained language models, which strengthens rule-based WAFs. The method first fine-tunes a pretrained language model on collected malicious and benign payloads to give it preliminary discriminative capability. The model then undergoes iterative fine-tuning on malicious payloads intercepted by the WAF to learn the textual features of these payloads. During deployment, the pretrained language model is positioned in front of the WAF to perform initial payload screening. Additionally, returning deceptive responses to some of the requests intercepted by the pretrained language model further enhances the effectiveness of the proposed method. Adversarial experiments were conducted on two open-source WAFs, targeting SQL injection and cross-site scripting attacks with two attack methods. The results demonstrate that the average interception rates for payloads generated by the two attack methods increased from 40.01% and 36.07% to 96.91% and 97.13%, respectively, after enhancement with the pretrained language model, while maintaining a false positive rate of 0. These findings validate the effectiveness of the proposed method.
  • Zhang Hang, Wang Jinsong
    Accepted: 2025-06-13
    For user devices (UD) with limited computing resources, handling computation-intensive tasks is quite challenging. Edge computing helps by extending computational resources to the network edge, and one of its key enabling functions is the efficient offloading of tasks. Coordinating the computational resources of numerous edge nodes for task offloading, while ensuring data security during the offloading process, is a significant challenge. Therefore, a task security offloading method based on deep reinforcement learning (DRL) is proposed. First, an edge computing network model is constructed, and a variable security protection mechanism is designed to adaptively ensure data security. Then, the edge computing network model and objectives are formalized and further transformed into a Markov decision process (MDP). Finally, a DRL method based on a penalized action space is proposed to derive the optimal task offloading strategy. Simulation results show that the proposed method can reduce latency and energy consumption costs while ensuring security protection, and consistently maintain a zero task loss rate.
  • WANG Guanyu, GU Yijun
    Accepted: 2025-06-11
    In malicious encrypted traffic classification, algorithms enrich the discriminative representations they learn by increasing the dimensionality of traffic features. However, challenges persist, such as the mismatch between the selected models and the characteristics of malicious encrypted traffic data, insufficient feature selection, and a lack of in-depth discussion of the characteristics of encrypted traffic data. To address these issues, a classification method based on multi-representation fusion is proposed for IoT malicious encrypted traffic classification. On the one hand, an abstract representation learning module learns packet-level byte association representations and session statistical representations of traffic sessions. On the other hand, a plaintext representation learning module learns session connection representations from the unencrypted plaintext. Finally, the classification results of the two modules are fused based on the confidence scores of the abstract representation learning module to obtain the final malicious traffic classification result. To validate the method, its performance is compared with 7 benchmark methods built on different approaches. The method achieves an F1 score of 0.7694, significantly outperforming the existing benchmark methods. Additionally, to examine how well each module suits traffic representation learning and how complementary the discriminative representations in the selected features are, 10 variant models based on different inputs and model architectures are generated and compared. The results demonstrate that the proposed method has superior detection performance, confirming the suitability of the model architecture and the complementarity between the representations.
  • SHEN Xianhao, GU Ling, CHEN Yi, YANG Jiazhi
    Accepted: 2025-06-06
    With the accelerated integration of renewable energy into the grid and the intelligent transformation of the new power system, the Power Internet of Things (PIoT) has become key to realizing the intelligence of power systems. However, Power Internet of Things Devices (PIoTD) in remote areas face numerous challenges, including inadequate network coverage, limited energy harvesting, and poor communication conditions. To address these issues, a cloud-edge-device cooperation framework based on artificial intelligence is proposed, which employs Unmanned Aerial Vehicle Simultaneous Wireless Information and Power Transfer (UAV-SWIPT) to provide continuous energy to energy-constrained PIoTD. Energy replenishment and communication relay for SAG-PIoT devices are provided by deploying SWIPT services on UAVs in the low-altitude layer of the space-air-ground network. Furthermore, to optimize the collaboration of multiple UAVs and improve data relay, transmission power allocation, Global Energy Efficiency (GEE), and PIoTD association scheduling, a multi-agent deep reinforcement learning algorithm is introduced to tackle the problems of incomplete global information and high-dimensional variable coupling in dynamic environments. The simulation results show that the proposed algorithm converges faster and demonstrates superior energy efficiency compared to several other benchmark algorithms. In terms of maximizing the minimum transmission rate, MADDPG achieves the highest performance, reaching bits/s. Additionally, the optimal SWIPT power splitting ratio is observed to be approximately 0.7, at which the GEE is highest.
  • YUAN Lining, FENG Wengang, LIU Zhao
    Accepted: 2025-06-06
    To address the neglect of relational information in current academic paper classification methods, we propose a novel classification model that integrates Graph Convolutional Networks (GCN) with contrastive learning, called Contrastive Graph Convolutional Network (CGCN). First, we define two distinct types of homogeneous-heterogeneous relational information based on the content and citations of the papers, and transform them into self-supervised information for constructing the contrastive loss. Second, we enhance the feature extraction process of GCN with the contrastive loss, pulling homogeneous papers close to one another while keeping heterogeneous papers distant. Third, we use cross-entropy loss and the softmax function to complete end-to-end academic paper classification. On three benchmark academic datasets, CGCN outperforms advanced baselines in the classification task. On the Cora dataset, Micro-F1 and Macro-F1 increase by 8.29% and 7.91%, respectively, compared to the original GCN. CGCN enhances the capacity to represent latent information in papers by employing a contrastive loss based on the homogeneous-heterogeneous relationship, thereby improving prediction accuracy and generalization. This approach provides new ideas and methods for research on academic paper classification.
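The pull-together, push-apart behavior described above can be sketched with a classic margin-based pairwise contrastive loss. The abstract does not specify the form of the CGCN loss, so this is an assumed stand-in: the function names, the Euclidean distance, and the `margin` value are all hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pair_contrastive_loss(u, v, same_group, margin=1.0):
    """Margin-based contrastive loss for one pair of paper embeddings.

    Homogeneous pairs (same_group=True) are penalized by squared distance,
    which pulls them together. Heterogeneous pairs are penalized only while
    they sit closer than `margin`, which pushes them apart.
    """
    distance = euclidean(u, v)
    if same_group:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def batch_contrastive_loss(pairs, margin=1.0):
    """Mean loss over (embedding_u, embedding_v, same_group) triples."""
    return sum(pair_contrastive_loss(u, v, s, margin) for u, v, s in pairs) / len(pairs)
```

In a model of the kind the abstract describes, a term like this would be combined with the cross-entropy classification loss; the relative weighting is not given in the abstract.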
  • CHEN Haixiu, CHEN Ziang, FANG Weizhi, LU Haitao, HUANG Zijie, CHENG Rong
    Accepted: 2025-06-05
    Dense pedestrian detection is one of the key problems in developing crowd flow monitoring systems for large public places. To address the difficulty of small-target detection caused by crowd occlusion in dense pedestrian scenes, as well as the need to deploy lightweight models, this paper proposes CAD-YOLO (CGDown-Adaptive Fusion Module-Dyhead), an improved dense pedestrian detection model based on YOLOv8-n. First, an embedded CGDown downsampling module uses an efficient context information extraction mechanism to alleviate the tendency of traditional detectors to lose context features in dense scenes, and significantly enhances the ability to capture dense pedestrian features and focus on small targets. Second, a BiFPN-Adaptive structure is designed to reconstruct the neck network. By adaptively fusing feature information at different scales, the model extracts the features of occluded pedestrians and of small and medium-sized pedestrian targets more accurately, while the number of parameters and the computational cost are greatly reduced. Third, the dynamic detection head Dyhead, combined with a new 160×160 small-target detection layer, enables the model to capture the fine features of dense small-target regions more accurately, effectively alleviating missed detections in occluded scenes. Experimental results show that, compared with YOLOv8-n, CAD-YOLO improves detection accuracy on the CrowdHuman and WiderPerson datasets by 5.1% and 2.1%, respectively. Despite the significant performance improvement, CAD-YOLO has only 2.9M parameters and a computational cost of 12.3 GFLOPs, meeting the requirements of low power consumption and high precision for deployment on edge or mobile devices.
  • LIU Tao, Man Dapeng, XU Chen, LV Jiguang, FENG Zhu, ZENG Fanyi, ZHOU Xue, YANG Wu
    Accepted: 2025-06-05
    Conventional clean-label backdoor attacks often fail to establish a strong link between the trigger and the target class, resulting in a low attack success rate, and extensive experiments show that this failure is even more severe in federated learning. The main reason is that a randomly selected trigger lacks a direct connection with the target class. To this end, a learnable-trigger backdoor attack is designed for federated learning. It makes full use of the task information and the shared model issued by the central server to train a trigger that is strongly correlated with the target class, and formalizes this training process as a dual-objective optimization problem. The first objective finds the optimal perturbation under constraints to blur the original features of the image as much as possible, thereby maximizing the model's ability to learn the trigger. The second objective adds a trigger within the permitted range to these blurred images, uses them as training inputs, and minimizes their classification loss, so that the optimal trigger is generated quickly with mini-batch projected gradient descent. A backdoor attack activated with this trigger still achieves excellent attack performance in federated learning. Experimental results on three datasets show that the attack success rate of the proposed method in federated learning is much higher than that of existing clean-label backdoor attacks; on CIFAR-10 in particular, it improves by more than 82% over the baseline method. The proposed attack method presents new challenges to the security of federated learning.
  • Li Junliang, Ma Junpeng, Liu Mengxuan, Liu Yuxue, Zhang Junsan
    Accepted: 2025-06-03
    Medical report generation from images is challenging due to low image contrast and the small size of abnormal regions, which make it difficult to accurately capture abnormal features using visual information alone. Introducing external knowledge to enhance visual representation is therefore a key issue. In addition, the co-occurrence patterns of abnormal features are complex and cannot be effectively captured from a single instance, making it crucial to leverage similar cases to model such patterns. To address these challenges, a Similar-Instance Guided method for medical report generation is proposed, consisting of two main components: an Image Feature Memory Module Incorporating Heterogeneous Graphs (FMHG) and a Similar Instance Feature Fusion Module (SIFF). FMHG extracts entity relationships from the report and constructs a corresponding heterogeneous graph as a bridge, guiding the model's attention to the abnormal regions of the image and thus enhancing abnormal visual features. SIFF retrieves similar instances and integrates their abnormal visual features, thereby augmenting the representation of abnormal regions while acquiring a more comprehensive understanding of the abnormal information. Experiments conducted on the IU X-ray and MIMIC-CXR medical imaging datasets demonstrate that the proposed method performs well on the BLEU evaluation metrics, achieving BLEU-1 to BLEU-4 scores of 0.539, 0.353, 0.265, and 0.193, respectively, on the IU X-ray dataset. It also excels on the METEOR and ROUGE-L metrics, indicating that the proposed method outperforms existing methods in terms of NLG metrics as well as the accuracy and completeness of the generated reports.
  • Hu Wei, Chen Yuner, Du Puliang
    Accepted: 2025-06-03
    Aiming at the low efficiency of parameter optimization of Variational Mode Decomposition (VMD) in current short-term electricity price prediction methods, the insufficient feature expression ability of single prediction models, and the problem of feature redundancy, this paper proposes a short-term electricity price prediction method based on Multi-Strategy Improved Crested Porcupine Optimizer (MSICPO) algorithm and deep learning. First, the Crested Porcupine Optimizer (CPO) algorithm is improved by introducing Lévy flight strategy, periodic population variation, and dynamic parameter adjustment mechanism to enhance its global search ability and convergence speed. It is used to optimize the modal number and penalty factor parameters of VMD to improve the accuracy of signal decomposition. Second, a deep learning model integrating feature weighting is constructed. By designing a dynamic weighting module to suppress noise interference and enhance the impact of key features, combined with the long-term dependency capture ability of sLSTM and the parallel computing advantage of Transformer, multi-scale feature collaborative optimization processing is realized. Finally, the MSICPO-VMD-WF-sLSTM-Transformer hybrid model is constructed for electricity price prediction. Experimental results show that the Multi-Strategy Improved Crested Porcupine Optimizer algorithm achieves a refined balance of optimal solution precision and optimization efficiency in VMD parameter optimization compared with the original CPO algorithm and other traditional optimization algorithms. The proposed hybrid forecasting model performs well in prediction accuracy, with a coefficient of determination reaching 0.95. In addition, cross-regional data prediction experiments further verify the applicability and generalization ability of the model in different regional electricity markets. 
    The method proposed in this paper not only provides theoretical references for the improvement of intelligent optimization algorithms and multi-feature prediction technologies, but also offers a high-precision and strong generalization solution for short-term electricity price prediction in complex electricity markets.
  • GENG Xia, LIN Xianwen, YANG Zhi
    Accepted: 2025-06-03
    In text-based person search tasks, initializing models with parameters from pre-trained models has become a mainstream paradigm, which effectively alleviates the feature alignment bottleneck of single-modal models caused by the lack of cross-modal information. Existing methods focus on mining semantic features at different scales in the image-text joint embedding space for optimization. However, introducing a new alignment paradigm tends to trap the pre-trained model in a local minimum during fine-tuning. To solve these issues, this paper proposes a Prompt-based Information Transfer (PIT) framework. By introducing cross-modal prompt tokens into the original forward process of the single-modal encoder and the cross-modal image-text encoder, it promotes early feature fusion and implicitly guides the model to focus more on modality-invariant information. PIT includes a prompt-based contrastive loss and a prompt training strategy. The prompt-based contrastive loss aims to construct a shared feature embedding space with both intra-modal discrimination and inter-modal semantic consistency by constraining the similarity between image and text features. The prompt training strategy can be regarded as a form of self-distillation: it treats the pseudo-targets generated from non-prompt features, together with the ground truth, as another view of the image-text pair, supervising the training process so that the learned embeddings contain richer multi-modal information. With only 0.61M additional parameters introduced during fine-tuning, PIT achieves Rank-1 improvements of 1.48%, 1.5%, and 1.55% on three public datasets, respectively.
  • GU Yingshuang, GUI Tao, ZHANG Qi
    Accepted: 2025-06-03
    Factual hallucination in large language models (LLMs) refers to the generation of content that conflicts with established real-world facts. It significantly reduces model credibility and applicability in high-risk domains such as healthcare, law, and scientific research. Current methods for hallucination mitigation primarily depend on input optimization, supervised learning, or integration with external knowledge bases. However, these approaches exhibit limited generalizability, substantial dependence on extensive labeled datasets, and constraints in real-time scenarios, making it challenging to fundamentally improve the factual accuracy of LLMs. To address these limitations, this paper proposes a reinforcement learning-based framework that uses semantic entropy as feedback to mitigate factual hallucinations. Semantic entropy measures uncertainty at the semantic level, enabling an accurate assessment of the model's confidence in its generated responses. By embedding semantic entropy into the reinforcement learning process as a reward signal, the model is encouraged to proactively avoid responses with a high likelihood of hallucination. Compared to traditional predictive entropy-based methods, semantic entropy treats semantically equivalent expressions as a single meaning and enhances factual judgment capabilities without reliance on external knowledge sources. Experimental results show that the proposed method, while maintaining the richness and coherence of the generated content, improves factual judgment accuracy by up to 5.7% and factual generation accuracy by up to 7.8% compared to the best baseline model, validating its effectiveness in mitigating factual hallucination.
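To make the notion of semantic entropy concrete, the sketch below groups sampled answers into meaning clusters and computes the entropy over cluster probabilities rather than over individual strings. It is illustrative only: in practice the equivalence check is usually done with an entailment model, whereas `toy_equivalent` here is a hypothetical string-matching placeholder, and how the paper converts the entropy into a reward is not specified in the abstract.

```python
import math


def cluster_by_meaning(answers, equivalent):
    """Greedily group answer indices whose texts are judged semantically equivalent."""
    clusters = []
    for index, answer in enumerate(answers):
        for cluster in clusters:
            # Compare against the first member of each existing cluster.
            if equivalent(answers[cluster[0]], answer):
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters


def semantic_entropy(answers, probs, equivalent):
    """Entropy over meaning clusters; probs are per-answer sequence probabilities."""
    total = sum(probs)
    entropy = 0.0
    for cluster in cluster_by_meaning(answers, equivalent):
        p = sum(probs[i] for i in cluster) / total
        if p > 0.0:
            entropy -= p * math.log(p)
    return entropy


def toy_equivalent(a, b):
    """Hypothetical stand-in for an entailment check: normalized substring match."""
    na, nb = a.lower().strip(" ."), b.lower().strip(" .")
    return na in nb or nb in na


# Hypothetical sampled answers to one question, with equal sampling probability.
samples = ["Paris", "paris.", "It is Paris", "Lyon"]
weights = [0.25, 0.25, 0.25, 0.25]
entropy_value = semantic_entropy(samples, weights, toy_equivalent)
```

Three of the four samples share one meaning, so the semantic entropy is lower than the entropy over the four distinct strings (ln 4), which is the distinction from predictive entropy that the abstract draws.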
  • ZHANG Lei, LI Shihua, GAO Hao, WANG Xiaoyong
    Accepted: 2025-05-26
    With the escalating energy consumption of urban rail transit systems, improving the utilization of regenerative braking energy to reduce the energy consumption of train operation has become a critical issue. This paper focuses on optimizing the control strategy of the tracking train in multi-train cooperative operation. First, building upon the traditional operation-mode transition strategy, a “Traction-Coasting-Traction-Cruising-Coasting-Braking” strategy is proposed specifically for the tracking operation scenario. Second, the spatial-domain train dynamics model, state transition equation, and energy consumption model are constructed. By means of interpolation, the time-domain cooperative operation problem is transformed into the problem of solving for optimal switching points in the spatial domain. Subsequently, an optimization decision-making model targeting energy consumption and punctuality is constructed and efficiently solved using the Dung Beetle Optimizer. Finally, taking the Yizhuang Line of the Beijing Subway as the simulation line, comparative analyses are conducted to evaluate how the Communication-Based Train Control (CBTC) and Train Autonomous Control System (TACS) architectures, as well as different transition strategies, affect optimization performance. The results demonstrate that TACS significantly enhances the optimization performance of cooperative operation compared with CBTC. The proposed strategy not only meets the punctuality requirement but also outperforms the traditional strategy in energy consumption at various departure intervals. The net absorbed energy can be increased by up to 14.651 kWh, and the actual operational energy consumption can be decreased by up to 11.284 kWh. Therefore, the proposed operation-mode transition strategy and optimization method effectively reduce the energy consumption of train operation and offer a useful reference for the development of urban rail train operation control technology. The code has been published on GitHub: https://github.com/eva-777/Tracking-Train-Operation-Optimization.git.
  • Zukun Wan, Runming Wang, Tianming Ma, Xingdong Song, Shengrong Yuan, Yajun Ding
    Accepted: 2025-05-23
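    Visual Question Answering (VQA), which interprets an input image together with its corresponding textual question and then provides a natural-language answer relevant to the question, has become a promising research direction in cross-modal analysis. Existing work depends heavily on dataset-related factors such as spurious correlations, dataset bias, and shortcut learning, all of which pose great challenges to algorithm robustness. Existing ensemble-learning-based methods train a bias model to capture dataset bias, but because the bias model has insufficient ability to identify biased samples, it struggles to fully learn the bias information, which weakens the debiasing effect. To strengthen the bias model's ability to learn dataset bias, this paper proposes an adaptive bias learning network for the VQA task, named ABLNet. The core design of ABLNet is as follows. First, an adaptive sample reweighting mechanism is proposed that dynamically assigns weights based on each sample's gradient information, thereby strengthening the model's learning of biased features in the dataset and improving its generalization ability. Second, a network pruning strategy based on restricted learning is proposed, which limits the learning capacity of the bias model so that it relies on superficial correlations and biased features in the dataset. Extensive experiments on the challenging VQA datasets VQA-CPv1, VQA-CPv2, and VQA-v2 demonstrate the superiority of our method.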
  • CAO Xiaofei, WANG Runmin, CUI Lingxin, CHAI Xinling, Ding Yajun, Han Chang
    Accepted: 2025-05-23
    Breast ultrasound image segmentation plays a significant role in computer-aided diagnosis, but existing methods are constrained by the scarcity of annotated data. In recent years, generative models have demonstrated potential in medical image synthesis, yet current approaches struggle to ensure image realism and mask semantic consistency at the same time. To address the performance bottleneck that the limited scale of ultrasound image datasets imposes on segmentation models, this paper proposes an ultrasound image dataset augmentation method. First, from a pathological perspective, we design a mask generation module based on the characteristics of benign and malignant tumors, which efficiently generates multiple semantically plausible masks. Next, to synthesize ultrasound images corresponding to these masks, we propose a Mask-guided Diffusion Model (MDM). This model incorporates mask information into the denoising network of the diffusion model through normalization methods, thereby generating ultrasound images that exhibit high semantic consistency with the masks. Experimental results demonstrate that the proposed method significantly outperforms mainstream generative models in terms of image fidelity (FID) and semantic alignment (mIoU). Experiments that incrementally add generated data show that the performance of segmentation models improves markedly as data volume increases, confirming the effectiveness of the synthesized data.
  • Kai Chen, Zhihua Chen, Lei Dai
    Accepted: 2025-05-22
    The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm alleviates environmental non-stationarity in multi-agent path planning by introducing global information. However, in complex environments, multi-agent reinforcement learning algorithms still suffer from sparse rewards and low levels of agent collaboration. To solve these problems, a multi-agent path planning algorithm based on state-action prediction (SA-MADDPG) is proposed. In SA-MADDPG, a Novelty Reward Module based on a Long Short-Term Memory network is designed, which can give novelty reward values to the agent without relying on current observations and actions, alleviating the problem of reward sparseness. In addition, an Action Prediction Module is designed that explicitly incorporates collaborative information, together with a dynamic weight term based on Q-value gain that guides each agent in balancing the optimization of its own task strategy with the optimization of collaborative task strategies, thereby enhancing the level of collaboration among agents. Finally, a three-dimensional multi-agent path planning simulation environment based on drones is constructed to comprehensively evaluate the performance of the proposed algorithm. Experimental results show that, in the obstacle density experiment, SA-MADDPG increases the average reward by 5.26%-15.81% and decreases the average episode time by 10.96%-16.05%; in the agent number experiment, the average reward increases by 16.32%-22.9% and the average episode time decreases by 15.03%-25.15%.
  • TIAN Qing, SHEN Junyu, YU Jiangsen
    Accepted: 2025-05-22
    Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain to improve the performance of the target domain model. However, traditional UDA methods assume that the category spaces of the source and target domains are entirely consistent, so they cannot handle unknown categories in the target domain. This limitation restricts their application in real-world scenarios. Open-Set Domain Adaptation (OSDA) addresses this issue by introducing recognition of unknown categories, but effectively reducing inter-domain differences and category imbalance remains a significant challenge. Existing OSDA methods often overlook domain-specific features and simply minimize domain differences, which can blur the boundaries between categories and weaken the model’s generalization ability. To address this problem, this paper proposes Open-Set Domain Adaptation with Optimal Transport Distance Regularization and Neighborhood Clustering (OTRNC). The method maximizes the distribution distance between high-confidence and low-confidence sample sets using optimal transport distance regularization, thereby reducing the interference of unknown categories in the domain adaptation process. Subsequently, dynamic nearest neighbor retrieval and invariant feature learning are employed to reduce intra-class variation within the target domain, enhancing feature generalization. Experimental results show that OTRNC performs well across multiple benchmark datasets.
  • Gao Lingping, Xu Wei, Chen Xi, Mu Yibo, Zhang Kai
    Accepted: 2025-05-22
    As software scale and complexity grow exponentially, monitoring and analyzing program runtime behavior has become increasingly challenging. Dynamic binary instrumentation is an effective solution to this problem, with mature tools such as Pin and Valgrind supporting mainstream architectures such as x86 and ARM. However, these tools lack support for emerging domestic instruction set architectures such as LoongArch. LoongArch, an instruction set architecture developed independently in China, exhibits high levels of autonomy, advancement, and compatibility. Nevertheless, because of its relatively short development history, its ecosystem remains incomplete, particularly in the debugging toolchain. To address this gap and promote the maturation of the LoongArch ecosystem, a dynamic binary instrumentation tool for LoongArch is needed. This study designs and implements a dynamic binary instrumentation tool based on the QEMU framework to support program monitoring and analysis on LoongArch. The tool, modeled after Pin, implements five fundamental instrumentation granularities and related APIs, along with over 20 instrumentation tools for direct use or as learning resources for tool development. To enhance performance, the framework was optimized through improvements in conditional jump instruction translation, basic block linking, and instrumentation inlining. Performance tests demonstrate that the optimized framework achieves a more than 100-fold improvement in instruction-level instrumentation efficiency and a nearly 33-fold improvement in basic block-level instrumentation efficiency. Finally, the source code has been open-sourced on GitHub to facilitate the further development of the LoongArch ecosystem and provide a reference for researchers in related fields.
  • FENG Tao, HU Bin, XU Guangyuan
    Accepted: 2025-05-22
    Crowd escape behavior in public places can easily lead to serious public safety disasters. Traditional computer vision techniques can detect a few characteristics of crowd escape behavior, but they struggle with complex dynamic visual scenes. To address this issue, this paper proposes an Enhanced Crowd Escape Detection Neural Network (ECEDNN) based on the structural characteristics of the locust visual nervous system, the danger perception mechanism of the locust Lobula Giant Movement Detector (LGMD), and the mammalian retinal luminance adaptation mechanism. The proposed neural network collects the luminance changes caused by crowd activities in the field of view. With the help of the mammalian retinal luminance adaptation mechanism, the visual excitation response is tuned to adapt to the lighting of the scene. Visual excitation and suppression are mixed to filter background noise, and a center-surround mechanism is used to enhance motion edges. Finally, adaptive tuning of neural spikes is used to detect sudden crowd escape behavior and output a strong membrane potential excitation. This work studies crowd activity detection inspired by biological visual perception mechanisms and can provide new ideas and methods for crowd behavior perception and anomaly detection in artificial intelligence.
  • HU Caifu, WEI Bo, REN Ruibin
    Accepted: 2025-05-22
    As network environments evolve and new internet applications emerge, machine learning classifiers trained on earlier traffic data adapt increasingly poorly to new sample spaces. The resulting decline in identification capability means that classification models cannot meet the growing demands of network services and network security. Manually updating classifiers based on experience requires significant effort and does not guarantee the generalization performance of the new classifiers. At the same time, the continuous influx of new data makes it hard to balance training accuracy against computational and storage resources. This paper therefore proposes an incremental learning strategy with feature-space optimization for efficient network traffic classification. First, the spatial distribution of new and old traffic samples is optimized so that clusters of new and old categories are kept at least a minimum interval apart, avoiding distribution conflicts between new and old tasks that share the same feature space. Then, within the optimized feature space, a small number of old samples are replayed and combined with knowledge distillation to keep the original model parameters stable, so that only the extended part of the model is adjusted and the classifier is updated at minimal cost. Experiments on the USTC-TFC2016 dataset, including ablation studies, show that the proposed method is more stable and effective than other methods in terms of model accuracy, resource consumption, and performance.
  • XIE Qingqing, LIU Yuanyuan
    Accepted: 2025-05-20
    In the field of cybersecurity, phishing attacks are becoming increasingly complex and frequent. Traditional phishing detection schemes based on predefined reference templates rely on brand-domain mapping lists, using visual feature matching to identify brand intent and verify domain consistency for explainable detection. While these methods can counter zero-day phishing attacks, they face scalability challenges because reference lists must be continuously updated to cover emerging brands, leading to high maintenance costs. To address these issues, this paper proposes Phish-RAGLLM, a novel reference-based phishing detection scheme that leverages Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). By reframing traditional visual problems as language tasks, Phish-RAGLLM eliminates reliance on predefined reference lists, drawing on the extensive brand knowledge of LLMs while enhancing generation through RAG integration with external brand knowledge bases. This approach effectively mitigates LLM hallucination and improves detection precision and robustness. Experimental results demonstrate that, compared with the current state-of-the-art model PhishLLM, Phish-RAGLLM (using GPT-3.5-turbo-instruct as the main LLM) balances model performance, inference cost, and knowledge base completeness, achieving a 5.88% increase in F1 score and a 12.5% improvement in operational efficiency. Moreover, it shows strong robustness against dataset variations and prompt injection attacks. Owing to the characteristics of LLMs, Phish-RAGLLM adapts well to multilingual phishing websites, effectively detecting phishing webpages in different linguistic contexts. Furthermore, real-world evaluations reveal that Phish-RAGLLM has broader detection capabilities than VirusTotal (a threat intelligence source), further validating its feasibility and effectiveness.
  • WANG Chaoyang, SUN Weiwei
    Accepted: 2025-05-20
    Combinatorial optimization problems have important applications in areas such as logistics path planning, but their solution space expands exponentially with problem size, posing severe challenges for traditional methods. In recent years, neural combinatorial optimization methods based on reinforcement learning have achieved solution quality close to that of traditional solvers while keeping solving time short. The mainstream method POMO (Policy Optimization with Multiple Optima) enhances training stability through symmetry optimization, but its unidirectional sequence generation mechanism still suffers from two limitations: on the one hand, it is difficult for the traditional constructive method to fully exploit the symmetry of the problem; on the other hand, endpoint information cannot effectively participate in decisions about distant nodes. To address this problem, this paper proposes a POMO model based on a Bidirectional Construction Strategy (BCS), named BCS-POMO, which constructs the solution sequence in parallel from the start point and the end point and dynamically selects the extension direction with higher confidence, preventing the model from getting stuck because of unidirectional construction. The model exploits the symmetry of the construction sequence to share weight parameters and improves efficiency through batch-parallel computation. Experiments show that BCS-POMO effectively strengthens the role of endpoint information as a decision aid in the construction process, reducing the error by 16% and 18% for the traveling salesman problem (TSP) and the capacitated vehicle routing problem (CVRP), respectively. These results verify the effectiveness of the bidirectional construction strategy in exploiting endpoint information and the advantages of symmetry modeling.
  • Guo Ziyun, Tian Youliang, Li Mengqian
    Accepted: 2025-05-19
    Federated learning leverages client data resources to collaboratively train a global model, whose performance depends on the quality of client data and their level of participation. Clients expect to receive appropriate compensation after contributing high-quality data to enhance their motivation for participation. Additionally, since the local model parameters uploaded by clients contain information about private data resources, they face the risk of privacy leakage. To address these challenges, this paper proposes an incentive-based adaptive privacy-preserving federated learning scheme. First, a pre-decision game auction mechanism is designed to ensure that clients truthfully report their costs while achieving Nash Equilibrium (NE). Second, a training quality evaluation algorithm is developed based on training time and model loss, which determines client compensation according to the overall training quality evaluation score, thereby incentivizing high-quality data contributors to participate in training. Finally, an adaptive differential privacy technique is employed to perturb local model parameters, enhancing model utility through dynamic noise allocation. Theoretical analysis demonstrates that the proposed scheme satisfies security and privacy protection requirements, while experimental results validate its effectiveness.
  • CHEN Xinluo, ZHAO Shuang, CAO Fang
    Accepted: 2025-05-19
    With the development of multimedia technology, unauthorized forgery and the dissemination of false information have become far easier, which may lead to a series of negative consequences. Effective content authentication algorithms are urgently needed to ensure the authenticity and security of image content. In recent years, perceptual image hashing has shown excellent performance in image authentication. However, existing algorithms are not well suited to images with a large proportion of text, and they cannot effectively cope with new content-preserving manipulations such as scribbles. Therefore, a content authentication algorithm for text-picture mixed images based on perceptual hashing is proposed. The proposed algorithm adopts ring-partition image segmentation and calculates the frequency and distribution characteristics of SIFT key points within each ring. These features are rotation invariant and effectively improve the anti-collision performance of the algorithm. By using key point information, the algorithm achieves good robustness against content-preserving manipulations, including irregular scribbles. A Text-Picture Mixed Image (TPMI) dataset is constructed to validate the performance of the proposed algorithm. Compared with several representative algorithms, the proposed algorithm performs better in perceptual robustness, anti-collision, and security. For partially tampered images, it effectively identifies each tampered image as similar to the original image. In addition, experiments on real-world scribble attacks show that the algorithm can effectively identify such attack images.
  • ZHU XingPo, WANG Xiaoyang
    Accepted: 2025-05-19
    Bi-triangle (6-cycle) enumeration in bipartite graphs is essential for graph analysis tasks such as local clustering coefficient computation. As real-world bipartite graph data scales beyond single-machine capacity, efficient distributed algorithms are needed. However, the existing distributed graph partitioning (GP) enumeration algorithm struggles with large subgraph combinations, message overload, and redundant enumeration. To address these problems, two optimized algorithms are proposed based on the topological characteristics of bi-triangles. Method 1 views the bi-triangle as three wedge structures and generates subgraphs using wedge groups as the basic unit. A subgraph combination mechanism that concatenates A-type and V-type wedge groups is introduced, greatly reducing the number and scale of subgraph combinations, and bi-triangles are ultimately enumerated through wedge triplets. To prevent message overload and redundancy, a subgraph reading mechanism built on a distributed storage system and a deduplication mechanism based on vertex ordering are proposed. Method 2 decomposes the bi-triangle into two zedge structures. It first partitions the graph using wedge groups and then applies a “compressed zedge” construction and restoration mechanism for a second partition, ultimately enumerating bi-triangles through zedge pairs with lower computational complexity than Method 1. Experiments show that, compared with GP, Method 1 reduces subgraph data by 205x on average and enumeration time by at least 45x, while Method 2 achieves an average reduction of 30x and a reduction of at least 101x, respectively.
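    To make the "three wedges" view of a bi-triangle concrete, the following single-machine brute-force counter picks three vertices on one side and closes each pair with a distinct common neighbor on the other side. It is only a hedged illustration of the structure being enumerated; it does not reproduce the distributed wedge-group or zedge mechanisms of the paper, and the function name `count_bitriangles` is hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple


def count_bitriangles(edges: Iterable[Tuple[Hashable, Hashable]]) -> int:
    """Count 6-cycles in a bipartite graph given as (u, v) edges.

    The first element of each edge is on side U, the second on side V.
    A bi-triangle uses three distinct U vertices, and every pair of them
    is joined by a wedge through its own distinct V vertex.
    """
    nbrs: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)

    total = 0
    for u1, u2, u3 in combinations(list(nbrs), 3):
        w12 = nbrs[u1] & nbrs[u2]  # wedge centers joining u1 and u2
        w23 = nbrs[u2] & nbrs[u3]  # wedge centers joining u2 and u3
        w13 = nbrs[u1] & nbrs[u3]  # wedge centers joining u1 and u3
        for a in w12:
            for b in w23:
                if b == a:
                    continue
                for c in w13:
                    if c != a and c != b:
                        total += 1
    return total


# A single hexagon u1-v1-u2-v2-u3-v3-u1 contains exactly one bi-triangle.
hexagon = [("u1", "v1"), ("u2", "v1"), ("u2", "v2"),
           ("u3", "v2"), ("u3", "v3"), ("u1", "v3")]
print(count_bitriangles(hexagon))
```

    Because each unordered triple of U vertices is visited once and each wedge center is tied to a specific pair, every 6-cycle is counted exactly once; the cost of this brute force is what motivates partitioned, distributed approaches on large graphs.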
  • You Yiheng, Wang Xin, Ma Menglu, Wang Hui
    Accepted: 2025-05-19
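    As a key form of data organization in artificial intelligence, knowledge graphs are widely used in many fields amid the rapid development of big data and large models. As knowledge graphs continue to grow, existing storage structures exhibit problems such as slow data import and large storage footprints. To this end, this paper proposes a hybrid “relational + key-value” storage scheme (KGHS) and designs an entity clustering algorithm based on attribute frequency. KGHS uses this algorithm to classify entity clusters with different attribute frequencies. High-frequency attributes are stored in a relational database to exploit its high query efficiency, whereas rare attributes are stored as key-value pairs to exploit the flexibility of key-value storage in handling sparse data. This design avoids the large number of null values that relational storage produces for sparse data and reduces the repeated storage of keys in key-value storage, significantly improving storage efficiency while preserving data flexibility. Experiments on synthetic and real datasets show that, compared with existing schemes, KGHS saves more than 50% of storage space on real datasets and improves data import speed by an order of magnitude, without significantly affecting query performance. These results show that KGHS effectively solves the storage problem of large-scale knowledge graphs and provides strong storage support for the wide application of knowledge graphs in various fields, with both theoretical significance and practical value.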
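    The split between relational and key-value storage can be sketched as follows. This is a simplified illustration, not the KGHS algorithm itself: it replaces the paper's attribute-frequency-based entity clustering with a single assumed threshold (`min_share`), and the function name and in-memory dictionaries standing in for the two stores are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def split_by_attribute_frequency(
    entities: Dict[str, Dict[str, Any]],
    min_share: float,
) -> Tuple[List[str], Dict[str, Tuple[Any, ...]], Dict[Tuple[str, str], Any]]:
    """Route frequent attributes to table columns and rare ones to key-value pairs.

    An attribute is treated as frequent when at least `min_share` of the
    entities carry it (an assumed criterion for this sketch).
    """
    n = len(entities)
    freq = Counter(attr for attrs in entities.values() for attr in attrs)
    columns = sorted(a for a, c in freq.items() if c / n >= min_share)

    # Relational part: one row per entity, None where a column is missing.
    table = {eid: tuple(attrs.get(col) for col in columns)
             for eid, attrs in entities.items()}

    # Key-value part: only the rare attributes that entities actually have.
    kv = {(eid, attr): value
          for eid, attrs in entities.items()
          for attr, value in attrs.items()
          if attr not in columns}
    return columns, table, kv


entities = {
    "e1": {"name": "Ada", "born": 1815, "nickname": "Enchantress"},
    "e2": {"name": "Alan", "born": 1912},
    "e3": {"name": "Grace"},
}
columns, table, kv = split_by_attribute_frequency(entities, min_share=0.6)
print(columns, table, kv)
```

    In this toy data the rare `nickname` attribute costs one key-value entry instead of a mostly empty column, which is the null-avoidance effect the abstract describes.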
  • Xu Xinhao, Li Ziqi, Yin Hefeng, Zhang Yonghong
    Accepted: 2025-05-19
    The surface texture of a printed circuit board (PCB) is complex, and its defects are small and vary widely in shape. To detect small targets accurately, smaller-scale detection heads are often added, which significantly increases the computational cost and slows down detection. To address this issue, we propose a multi-scale feature fusion learning model for PCB small-target defect detection, named PCB-Det. Based on the YOLOv8 architecture, the model replaces the original backbone network with the lightweight PP-HGNet and incorporates the GSPPFCSPC module for multi-level feature extraction, thereby expanding the receptive field to enrich feature information. We also design the Pro-BiFPN feature fusion network to enhance the interaction between features from adjacent layers, thereby optimizing the fusion of shallow detail information and deep semantic information. In addition, the model incorporates shared feature branches to reduce the computational burden of the original detection heads and employs the Wise-IoU loss function to dynamically adjust the loss weights, thereby accelerating model convergence. The experimental results demonstrate that the proposed PCB-Det model achieves an average precision of 97.7% on the PCB_DATASET defect dataset, a 3.1% improvement over the baseline model. The model effectively reduces both missed detections and false positives, thereby enhancing the detection capability for small-target defects in PCBs.