
Just accepted

  • Zhikang Li, Yu Jin
    Accepted: 2025-07-04
    In cloud computing, frequent data updates and migrations by data owners (DOs) increase data management complexity and challenge traditional cloud data auditing. Traditional dynamic auditing based on a third-party auditor (TPA) suffers from centralization and single-point-of-failure issues. Although blockchain technology has been adopted to replace the TPA, current blockchain-based schemes suffer from high computational cost, low dynamic efficiency, and a heavy auditing burden on the DO. Thus, a Dynamic Cloud Data Auditing Scheme empowered by Side Information and Consortium Blockchain Smart Contracts (DCASIC) is proposed. DCASIC decouples auditing metadata from data block indexes via side-information auditing and correlates them with a homomorphic hash function, enhancing dynamic auditing efficiency. Parallel smart contract execution and pre-computed verification information reduce the DO's auditing time. Theoretical and experimental results show that DCASIC significantly cuts computational cost, boosts dynamic auditing efficiency, and reduces the DO's time cost during auditing compared with existing blockchain-based schemes.
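    The homomorphic hash that links auditing metadata to data blocks can be illustrated with a minimal multiplicative construction. The sketch below is a generic textbook example (exponentiation modulo a fixed prime), assuming toy parameters; it is not the concrete hash function or parameter set used in DCASIC.

    ```python
    # Minimal sketch of a multiplicative homomorphic hash H(m) = g^m mod p.
    # p and g are toy illustrative parameters, not DCASIC's actual construction.
    p = 2 ** 127 - 1   # a Mersenne prime, used here only for illustration
    g = 3

    def h(m: int) -> int:
        return pow(g, m, p)

    m1, m2 = 12345, 67890
    # Homomorphic property: H(m1 + m2) == H(m1) * H(m2) mod p, which is what lets
    # an aggregated block be verified from the per-block hash values.
    assert h(m1 + m2) == (h(m1) * h(m2)) % p
    ```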
  • Wang Yuanyuan, Cao Hui, Wang Tingwei
    Accepted: 2025-07-04
    Skeleton-based action recognition methods are attracting increasing attention because of their excellent performance. In skeleton action recognition, coarse-grained features are an important complement to fine-grained features and can effectively improve recognition performance. However, existing multi-granularity skeleton action recognition methods have two shortcomings: first, the constructed coarse-grained features do not accurately retain the structural information between locally adjacent fine-grained joints; second, they do not make good use of the global correlation between coarse-grained features for feature learning. To solve these problems, when constructing coarse-grained joints, the arithmetic mean and classical convolution operations are used to capture the position and structure information of locally adjacent fine-grained joints, and a cross-attention mechanism is used to capture the global correlation between coarse-grained and fine-grained features, which better describes part-level movement trends and improves the representation ability and discriminability of coarse-grained features. The method is combined with a variety of skeleton action recognition models, and experiments are carried out under multiple evaluation protocols of the NTU RGB+D and NTU RGB+D 120 action recognition datasets. Experimental results show that the proposed method can extract and fuse skeleton motion features of different granularities and significantly improve the classification performance of human skeleton action recognition methods.
  • LIANG Ziyi, LIU Tianquan, LI Liping, ZHU Yuanfei, LU Cunyue
    Accepted: 2025-07-03
    Detection and identification of aquatic algae is an important task in ecological protection. However, in practical applications, traditional object detection models struggle to meet real-time and efficiency requirements because of the limited hardware resources of on-site water quality detection equipment and their own computational complexity and resource demands. At the same time, lightweight models often struggle to achieve sufficient accuracy when dealing with imbalanced sample distributions, severe target occlusion, significant scale differences, and complex backgrounds. To address these challenges, this paper proposes an improved EfficientDet object detection model aimed at effectively improving algae detection performance under limited computational resources. To tackle the problem of insufficient rare-algae samples, data augmentation techniques are employed to enhance the model's generalization ability. For algae species with similar features, a Convolutional Block Attention Module (CBAM) is introduced into the backbone network to enhance feature mapping between different algae species. In the feature fusion stage, a Bidirectional Feature Pyramid Network (BiFPN) module based on a hybrid attention mechanism is used to more accurately capture the semantic information of algae in complex backgrounds. Experimental results show that the improved EfficientDet model achieves a mean average precision (mAP) of 74.2% on the test set, a 3.4 percentage point improvement over the original EfficientDet model, with a computational cost of 21.188 GFLOPs, an energy consumption of only 4.3 W, and a model size of 31.4 MB, just a 0.1 MB increase over the original model. Compared to YOLOv5s, RetinaNet, Faster R-CNN, SSD, and other mainstream lightweight models such as YOLOv8 and YOLO-WORLD, the mAP improves by 7.6, 1.7, 0.7, 4.0, 1.9, and 2.5 percentage points, respectively. Ablation experiments further validate the contribution of each module to the performance improvement and their collaborative optimization effects, providing an efficient and lightweight solution for applications such as water quality monitoring and ecological protection.
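    CBAM, as referenced above, applies channel attention followed by spatial attention to a feature map. The sketch below is a minimal generic PyTorch version of the standard module; the channel count, reduction ratio, and kernel size are illustrative, and it is not the exact variant integrated into the improved EfficientDet.

    ```python
    import torch
    import torch.nn as nn

    class CBAM(nn.Module):
        """Minimal Convolutional Block Attention Module: channel attention, then spatial attention."""
        def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
            super().__init__()
            # Channel attention: shared MLP over global average- and max-pooled descriptors.
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
            )
            # Spatial attention: conv over channel-wise average and max maps.
            self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = x.shape
            avg = self.mlp(x.mean(dim=(2, 3)))
            mx = self.mlp(x.amax(dim=(2, 3)))
            ca = torch.sigmoid(avg + mx).view(b, c, 1, 1)
            x = x * ca  # channel-refined features
            sa = torch.sigmoid(self.spatial(torch.cat(
                [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)))
            return x * sa  # spatially refined features, same shape as input

    feat = torch.randn(2, 64, 32, 32)
    out = CBAM(64)(feat)  # attention-refined feature map of identical shape
    ```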
  • Shuai Feng , Jian Gao
    Accepted: 2025-07-03
    In the field of computer security, malicious code protection has always been an important research topic. With the rapid development of computer technology, the types and forms of malicious code are constantly evolving. Traditional feature engineering methods use a single feature dimension when dealing with complex malicious samples, resulting in insufficient representation ability and the inability to accurately identify various types of malicious code. Other malicious code classification methods based on feature fusion rely on expert experience to manually design features during feature extraction, while multimodal deep learning models suffer from limited interpretability and high computational cost. To address these issues, this paper proposes an innovative feature fusion method for classifying malicious code in Windows PE files. By integrating behavioral features, structural features, and texture features, and using LightGBM as the classifier, the classification of malicious code is completed. The experimental results show that the proposed method achieves a test accuracy of 99.90% and a log loss (Logloss) of 0.0057 on the Microsoft Malware Classification Challenge dataset, and a test accuracy of 98.97% and a log loss of 0.042 on the Bazaar dataset. These results demonstrate that the method can comprehensively and accurately represent malicious code, and it has important theoretical significance and practical application value. By fusing multi-dimensional features, the method provides an effective solution for malicious code detection and has broad application prospects.
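    A hedged sketch of the overall pipeline described above: concatenate pre-extracted behavioral, structural, and texture feature vectors and train a LightGBM classifier, reporting accuracy and log loss. The feature matrices, their dimensions, and the hyperparameters are placeholders, not the paper's actual extractors or settings.

    ```python
    import numpy as np
    import lightgbm as lgb
    from sklearn.metrics import accuracy_score, log_loss
    from sklearn.model_selection import train_test_split

    # Placeholder feature matrices: rows are PE samples, columns are pre-extracted
    # behavioral, structural, and texture features respectively (assumed available).
    n = 1000
    behavioral = np.random.rand(n, 64)
    structural = np.random.rand(n, 32)
    texture = np.random.rand(n, 128)
    labels = np.random.randint(0, 9, size=n)  # e.g., 9 malware families

    # Simple early fusion: concatenate the per-sample feature vectors.
    X = np.hstack([behavioral, structural, texture])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)

    clf = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.1)
    clf.fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)
    print("accuracy:", accuracy_score(y_te, proba.argmax(axis=1)))
    print("logloss:", log_loss(y_te, proba, labels=clf.classes_))
    ```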
  • Shang Wen, You Jinguo, Wu Kang, Fu Wantin, Li Xiaowu, Jia Lianyin
    Accepted: 2025-07-03
    In distributed computing frameworks, inefficient data transfer in the Shuffle phase has become a key bottleneck for data joins. Existing methods have limitations in handling table joins; for example, both broadcast joins and hash joins in Spark are susceptible to data skew, which unbalances the load between nodes. To address this problem, the paper focuses on join-aggregation queries and proposes a table join method based on a lattice structure: table partition data are precomputed and stored in the form of a lattice structure, and the convex-set property of equivalence classes is exploited, i.e., the aggregation value of any data cell that contains the upper bound of an equivalence class and is contained by its lower bound equals the aggregation value of that equivalence class, enabling quick matching and calculation. Because query data cells are a compressed form of the base table data, their size and skew are more compact and uniform, so the paper uses query data cells instead of table data to perform data transfer and joining, which greatly reduces data Shuffle and computational complexity. The proposed method has been implemented in Spark, and experiments on the TPC-H dataset show that it reduces data Shuffle by about 45.06% in large-dataset scenarios, balances the workload among nodes better than the benchmark method, and shortens query response time by 14.23% on average.
  • Yuanhao Li, Fangli Ying
    Accepted: 2025-07-01
    Learning disentangled representations to enhance the controllability of image generation models is a key research direction in computer vision. However, existing methods face two major limitations: reliance on large-scale annotated data and difficulty in handling complex dependencies between features. To address these issues, this study proposes a universal generative disentangling method based on the Hilbert-Schmidt Independence Criterion (HSIC). The method converts HSIC into an independence regularization mechanism for the latent space of generative models: by incorporating HSIC regularization terms, it optimizes the measurement of nonlinear dependency relationships and guides the model to learn independent feature representations. Specifically, the study integrates HSIC into two mainstream generative model architectures: for Variational Autoencoders (VAEs), it combines variational inference with HSIC regularization to optimize latent distribution disentanglement; for Diffusion Models (DMs), it achieves progressive feature disentanglement by embedding the HSIC regularization term into the time-step optimization of the reverse process. Experimental results show that this universal method, which can be implemented in different model architectures, enhances latent representation independence and maintains stable performance in unsupervised settings, offering a new way to model complex feature dependencies. To further verify the semantic consistency of the disentangled space, the study conducts latent space interpolation experiments that generate smoother trajectories, demonstrating that HSIC regularization constructs a linearly separable disentangled space. For evaluation, the study performs dual validation using standard disentanglement metrics and HSIC-based custom metrics, which show a positive correlation and confirm the objectivity of the disentanglement evaluation criteria.
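    A minimal sketch of a biased empirical HSIC estimator with RBF kernels, the kind of quantity that could serve as the independence regularizer described above. The kernel bandwidth, the latent-dimension split, and the weight of the term in the loss are assumptions, not the paper's exact formulation.

    ```python
    import torch

    def rbf_kernel(x: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
        # x: (n, d) -> (n, n) Gaussian kernel matrix.
        d2 = torch.cdist(x, x).pow(2)
        return torch.exp(-d2 / (2 * sigma ** 2))

    def hsic(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
        """Biased empirical HSIC: tr(K H L H) / (n - 1)^2, near 0 when x and y are independent."""
        n = x.shape[0]
        k, l = rbf_kernel(x, sigma), rbf_kernel(y, sigma)
        h = torch.eye(n, device=x.device) - torch.full((n, n), 1.0 / n, device=x.device)
        return torch.trace(k @ h @ l @ h) / (n - 1) ** 2

    # Example: penalize dependence between two groups of latent dimensions.
    z = torch.randn(128, 16, requires_grad=True)
    reg = hsic(z[:, :8], z[:, 8:])            # independence penalty between the two groups
    task_loss = z.pow(2).mean()               # placeholder for the model's reconstruction/denoising loss
    (task_loss + 0.1 * reg).backward()
    ```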
  • MENG Hui, ZHANG Luhui, YAN Xixi, TANG Yongli
    Accepted: 2025-06-30
    Currently, most Unique Ring Signature (URS) schemes are based on the discrete logarithm problem. Among them, only the URS schemes proposed by Nguyen and by Junhui Wang satisfy post-quantum security requirements. However, each of these schemes has its limitations: Nguyen's scheme uses zero-knowledge proofs, which results in significant computational resource consumption, while Junhui Wang's scheme leads to longer key lengths due to the design of its lattice-based structure, increasing storage overhead. In addition, both schemes rely on digital certificates to manage public keys, requiring the storage and management of a large number of certificate files, which further increases the storage and management costs of the system. To address these challenges, this paper proposes an efficient identity-based URS scheme over the NTRU lattice. First, by leveraging the relatively short public and private key lengths of the NTRU lattice cryptosystem, the scheme reduces key storage overhead. Second, a compact Gaussian sampling algorithm is employed to generate user private keys, thereby improving key generation efficiency. Finally, an identity-based mechanism is introduced to construct public keys, eliminating the reliance on digital certificates. Security analysis demonstrates that, under the Random Oracle Model (ROM), the proposed scheme achieves unconditional anonymity, unforgeability, and uniqueness, with its security reducible to the small integer solution problem on the NTRU lattice. Performance analysis shows that, compared to existing lattice-based URS schemes, this approach offers shorter public key lengths and lower computational overhead, with average reductions of about 15% and 13% in signature generation and verification time, respectively.
  • LI Dongfeng , CHEN Yuren , YU Bo
    Accepted: 2025-06-25
    In existing pavement crack detection methods based on U-Net, the interaction between features at different levels of the encoder has not been fully considered, which may lead to incomplete or missed crack detections due to information loss during down-sampling. Therefore, this paper proposes a pavement crack detection method based on multi-level feature fusion. First, in the encoding stage, crack features at different levels are extracted from the input image, forming crack feature representations from shallow to deep levels. Second, in the skip connections, a cross-level fusion strategy based on an improved channel-wise cross fusion Transformer is employed, which enhances the complementarity between features at different levels and enriches the representations of crack features. Finally, in the decoding stage, a feature cross fusion module is used to optimize how the decoder utilizes the encoder's features, promoting the transmission of crack features and improving the perception capability for crack features. To verify the effectiveness of the proposed method, a series of comparative and ablation experiments were conducted on two public datasets, DeepCrack and CRACK500. The experimental results show that the comprehensive performance of the proposed method is better than six comparison methods including DeepCrack and Swin-UNet. Specifically, the F1 score increased by 2.30% and 2.51% over DeepCrack and Swin-UNet, respectively, on the DeepCrack dataset, and by 1.65% and 1.00% on the CRACK500 dataset.
  • TAN Zhong-Xia, LIU Qi-Kun, JIANG Cui-Ling, WAN Yong-Jing
    Accepted: 2025-06-20
    Due to the prolonged examination time and limited therapeutic time window for stroke, developing a fast and highly accurate medical image segmentation model for stroke is of significant importance for clinical diagnosis. The Mamba-based U-Net architecture, known for its low complexity and capability to handle large-scale images, has garnered widespread attention in medical image processing in recent years. The fractional Fourier transform can convert signals into arbitrary fractional domains between the spatial and frequency domains, allowing the observation of features that are not prominent in either domain; by introducing it, lesion characteristics can be observed in the fractional domain. Based on the fractional Fourier transform and the Mamba network, a novel model named FRFTMamba-UNet is proposed for stroke medical image segmentation. The model incorporates the fractional domain into the Mamba network and designs a multi-level residual module connected to the U-Net encoder. Additionally, a hierarchical feature extraction strategy is implemented in the U-Net-like network, where different feature extraction modules are designed for the shallow and deep layers: residual convolutions based on convolutional neural networks are added to the shallow layers to effectively extract shallow features, while the Mamba architecture is utilized in the deep layers to further extract deep features. The proposed method demonstrates superior accuracy and efficiency compared to existing state-of-the-art Mamba-based models across three stroke datasets, AISD, ATLAS, and ISLES22, achieving Dice similarity coefficient (DSC) scores of 64.27% on AISD, 62.24% on ATLAS, and 85.24% on ISLES22.
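    For reference, the fractional Fourier transform of order $\alpha$ (for $\alpha$ not a multiple of $\pi$) is commonly written with the standard continuous definition below; how FRFTMamba-UNet discretizes it and selects the order is specific to the paper.

    $$ F_{\alpha}[f](u) = \sqrt{\frac{1 - i\cot\alpha}{2\pi}} \int_{-\infty}^{\infty} \exp\!\left( i\,\frac{u^{2} + t^{2}}{2}\cot\alpha - i\,u t\,\csc\alpha \right) f(t)\,dt $$

    Setting $\alpha = \pi/2$ recovers the ordinary Fourier transform, while intermediate orders interpolate between the spatial and frequency domains, which is the property the model exploits to expose lesion features in a fractional domain.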
  • Gao Xiong, Gou Guanglei, Zhou Linjie, Jia Penghao
    Accepted: 2025-06-20
    In fine-grained image classification tasks, sufficient samples can provide rich local feature information. However, in few-shot scenarios, data sparsity makes it difficult for the model to fully capture discriminative local information. To address this issue, a few-shot learning method integrating axial attention and a scale-aware mechanism is proposed. First, a frequency-adaptive feature selection module is designed to reduce interference from background noise and non-target regions, highlighting discriminative local features and thus increasing the feature separability between different categories. Second, an axial-scale joint enhancement module is constructed to integrate global contextual information, focus on key regions, and process features with different receptive fields in parallel, improving the representation capability for details at various scales. Finally, a dual similarity measurement module is adopted to guide learning through two similarity measurement methods, enhancing the generalization of features and reducing the bias toward specific features. On the public datasets CUB_200_2011 and Stanford Dogs, the proposed method improves classification accuracy by 1.4 and 1.45 percentage points in the 1-shot and 5-shot scenarios, respectively, and by 1.86 and 3.49 percentage points on the Stanford Cars dataset. In the 1-shot scenario, it achieves state-of-the-art performance, while in the 5-shot scenario, it also achieves competitive results. Experimental results demonstrate that the proposed method effectively improves the performance of fine-grained image classification under few-shot settings and better captures discriminative feature information.
  • Kewei Zhang, Xin Wen, Wenhui Zhang, Rui Cao
    Accepted: 2025-06-20
    Drug development is a complex, costly, and low-success-rate process. Molecular property prediction is a fundamental yet challenging task in drug development, and accurately predicting molecular properties can accelerate the process and reduce costs. With the advancement of machine learning, particularly deep learning, significant progress has been made in molecular property prediction. However, many existing methods rely on a single molecular representation or fail to integrate the potential relationships among multi-dimensional representations. Therefore, this study proposes a novel molecular property prediction method, the Multi-Representation Fusion Model for Molecular Property Prediction (MRFP). It designs a molecular representation fusion algorithm that integrates two distinct types of molecular representations, molecular fingerprints and molecular graphs, thereby generating a more comprehensive and detailed molecular representation and providing more accurate input for molecular property prediction. Furthermore, to better extract features from molecular graphs, a novel molecular graph readout module named the Tri-Step Convolutional Readout Module (TCNN) is designed based on molecular characteristics, which effectively captures the information expressed in molecular graphs. Experimental results on six classification datasets and three regression datasets from MoleculeNet demonstrate the effectiveness of the method, achieving an average improvement of 2.8% in classification metrics and a reduction of 0.47 in regression metrics. This research not only provides a new solution for molecular property prediction but also offers strong support for molecular design and screening in drug development, with broad application prospects.
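    A hedged illustration of building the two molecular representations mentioned above (a Morgan fingerprint and a simple graph structure) with RDKit. The radius, bit length, atom features, and the naive pooling-plus-concatenation at the end are placeholders, not MRFP's fusion algorithm or the TCNN readout.

    ```python
    import numpy as np
    from rdkit import Chem
    from rdkit.Chem import AllChem

    smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin, used only as an example molecule
    mol = Chem.MolFromSmiles(smiles)

    # Representation 1: Morgan (ECFP-like) fingerprint as a fixed-length bit vector.
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)
    fp_vec = np.array(fp, dtype=np.float32)

    # Representation 2: molecular graph as per-atom features plus a bond (edge) list.
    atom_feats = np.array([[a.GetAtomicNum(), a.GetDegree(), int(a.GetIsAromatic())]
                           for a in mol.GetAtoms()], dtype=np.float32)
    edges = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()]

    # Naive "fusion" placeholder: mean-pool the graph side and concatenate with the fingerprint.
    graph_vec = atom_feats.mean(axis=0)
    fused = np.concatenate([fp_vec, graph_vec])
    print(fused.shape, len(edges))  # (1027,) and the number of bonds
    ```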
  • PI Chengdong, HU Bin
    Accepted: 2025-06-19
    Using traditional computer vision technology to resolve collision detection in complex scenes is very difficult; in particular, when faced with false collision interference, such models have a high false alarm rate and low accuracy. To address this problem, based on the hierarchical structure of the mammalian retina and the danger-perception characteristics of neurons in the Polysensory Zone (PZ), a specific visual area in the precentral gyrus of the primate cerebral cortex, this paper proposes a bio-inspired Enhanced Collision Detection Neural Network (ECDNN) that can effectively reduce false collision interference. The network consists of a presynaptic subnetwork and a postsynaptic subnetwork. The presynaptic subnetwork, based on the hierarchical processing and step-by-step transmission characteristics of mammalian retinal information, divides the dynamic focus receptive field from the global Focus of Expansion (FOE) to obtain key visual information from low-order visual perception. The postsynaptic subnetwork integrates the membrane potential excitation intensity caused by approaching visual stimuli in the focus receptive field and outputs an alarm signal representing imminent collision danger. Experiments show that the model can not only effectively filter false collision interference in complex scenes and reduce false detections, but also improve collision detection accuracy to over 96%, providing an important foundation for building future artificial intelligence interactive systems.
  • ZHENG Kun, ZHANG Ziyan, LI Xiaoli
    Accepted: 2025-06-19
    The measurement of physiological parameters from facial videos in online education is currently a research hotspot in intelligent education. Traditional remote photoplethysmography (rPPG) cannot adapt to the changes in the lighting environment in online education scenarios, which affects the flexibility and accuracy of physiological parameter measurement based on facial videos. Aiming at the typical lighting scenarios in online education, a method for extracting blood volume pulse (BVP) signals based on lighting adaptability is proposed, and a dual correction model for BVP signals is constructed by combining a generative adversarial network (GAN) and a convolutional neural network (CNN). Firstly, the optimal solution of the orthogonal chrominance signal under different lighting conditions is calculated based on the simulated annealing algorithm. At the same time, a lighting scene prediction mechanism for classifying lighting scenes using the average gray intensity is established to achieve the optimal chrominance signal that adapts to the lighting scene. Furthermore, the GAN and CNN models are combined to perform dual correction on the BVP signal to ensure that the finally output physiological parameters are more accurate and reliable. The model is verified on four publicly available datasets reorganized for typical educational scenarios. The experimental results show that the root mean square error (RMSE) of the heart rate is reduced by an average of 8.3 bpm, demonstrating the robustness and accuracy of the model under different lighting conditions. This model has significant advantages in improving the accuracy of heart rate and heart rate variability prediction, and can provide effective support for contactless physiological parameter detection in complex lighting environments.
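    As background for the orthogonal chrominance signal mentioned above, the sketch below applies a generic fixed chrominance projection (in the spirit of CHROM-style rPPG) to mean RGB traces and band-passes the result into a BVP estimate. The projection weights, frame rate, and band limits are illustrative assumptions; the paper instead optimizes the chrominance projection per lighting scene with simulated annealing and then applies its GAN/CNN dual correction.

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt

    fs = 30.0                              # assumed video frame rate (Hz)
    rgb = np.random.rand(300, 3)           # placeholder: mean R, G, B of the face ROI per frame

    # Normalize each channel by its temporal mean, then project to two chrominance signals.
    norm = rgb / rgb.mean(axis=0)
    x = 3 * norm[:, 0] - 2 * norm[:, 1]                    # CHROM-style X component
    y = 1.5 * norm[:, 0] + norm[:, 1] - 1.5 * norm[:, 2]   # CHROM-style Y component

    # Band-pass both components to the plausible heart-rate band (0.7-4 Hz).
    b, a = butter(3, [0.7 / (fs / 2), 4.0 / (fs / 2)], btype="band")
    xf, yf = filtfilt(b, a, x), filtfilt(b, a, y)

    # Combine into a BVP estimate; alpha balances the two chrominance components.
    alpha = np.std(xf) / np.std(yf)
    bvp = xf - alpha * yf
    ```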
  • SHI Kangwei, CHAI Yidong, QIAN Yang, JIANG Yuanchun, LIU Yezheng
    Accepted: 2025-06-19
    Web Application Firewalls (WAFs) are an effective tool for protecting web applications from cyberattacks. The rapid development of web applications in recent years has made research on WAFs increasingly significant. Common approaches to building WAFs include rule-based methods and machine learning-based methods. Rule-based WAFs detect attacks using a predefined set of rules, which are often complex and challenging to update dynamically or manually. Machine learning-based WAFs, primarily utilizing methods such as Support Vector Machines, classify payloads but struggle to identify emerging malicious payloads as effectively as rule-based methods and lack their breadth of coverage. To address these limitations, this paper proposes a WAF enhancement method based on pretrained language models, which strengthens rule-based WAFs. The method first fine-tunes a pretrained language model using collected malicious and benign payloads to endow it with preliminary discriminative capabilities. Subsequently, the model undergoes iterative fine-tuning using malicious payloads intercepted by the WAF to learn the textual features of these payloads. During deployment, the pretrained language model is positioned in front of the WAF to perform initial payload screening. Additionally, returning deceptive responses to some requests intercepted by the pretrained language model further enhances the effectiveness of the proposed method. Adversarial experiments were conducted on two open-source WAFs, targeting SQL injection and cross-site scripting attacks with two attack methods. The results demonstrate that the average interception rates for payloads generated by the two attack methods increased from 40.01% and 36.07% to 96.91% and 97.13%, respectively, after enhancement with the pretrained language model, while maintaining a false positive rate of zero. These findings validate the effectiveness of the proposed method.
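    A minimal sketch of the deployment-time screening step described above: an incoming payload is scored by a fine-tuned sequence classification model before it reaches the rule-based WAF. The checkpoint path, label convention, and threshold are placeholders, not the paper's actual model or configuration.

    ```python
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Hypothetical fine-tuned checkpoint; label 1 = malicious, 0 = benign (assumed convention).
    MODEL_DIR = "./waf-payload-classifier"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR).eval()

    def screen_payload(payload: str, threshold: float = 0.5) -> bool:
        """Return True if the payload should be blocked before reaching the rule-based WAF."""
        inputs = tokenizer(payload, truncation=True, max_length=512, return_tensors="pt")
        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=-1)
        return probs[0, 1].item() >= threshold

    print(screen_payload("1' OR '1'='1' -- "))   # a classic SQL injection probe
    ```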
  • Zhang Hang, Wang Jinsong
    Accepted: 2025-06-13
    For user devices (UD) with limited computing resources, handling computation-intensive tasks is quite challenging. Edge computing helps by extending computational resources to the network edge, and one of its key enabling functions is the efficient offloading of tasks. Coordinating the computational resources of numerous edge nodes for task offloading, while ensuring data security during the offloading process, is a significant challenge. Therefore, a task security offloading method based on deep reinforcement learning (DRL) is proposed. First, an edge computing network model is constructed, and a variable security protection mechanism is designed to adaptively ensure data security. Then, the edge computing network model and objectives are formalized and further transformed into a Markov decision process (MDP). Finally, a DRL method based on a penalized action space is proposed to derive the optimal task offloading strategy. Simulation results show that the proposed method can reduce latency and energy consumption costs while ensuring security protection, and consistently maintain a zero task loss rate.
  • WANG Guanyu, GU Yijun
    Accepted: 2025-06-11
    In the field of malicious encrypted traffic classification, algorithms enhance the richness of learned discriminative representations by increasing the dimensionality of traffic features. However, challenges persist, such as the mismatch between selected models and the characteristics of malicious encrypted traffic data, insufficient feature selection, and a lack of in-depth discussion of the characteristics of encrypted traffic data. To address these issues, a classification method based on multi-representation fusion is proposed for IoT malicious encrypted traffic classification. On one hand, an abstract representation learning module learns packet-level byte-association representations and session statistical representations of traffic sessions. On the other hand, a plaintext representation learning module learns session connection representations from unencrypted plaintext. Finally, the classification results of the two modules are fused based on the confidence scores of the abstract representation learning module to obtain the final malicious traffic classification result. To validate the advancement of the method, its performance is compared with 7 benchmark methods built on different approaches. The method achieves an F1 score of 0.7694, significantly outperforming the existing benchmark methods. Additionally, to examine the adaptability of each module to traffic representation learning and the complementarity between the discriminative representations contained in the selected features, 10 variant models with different inputs and model architectures are constructed and compared. The results demonstrate that the proposed method has superior detection performance, proving the adaptability of the model architecture and the complementarity between the representations.
  • SHEN Xianhao , GU Ling , CHEN Yi , YANG Jiazhi
    Accepted: 2025-06-06
    With the accelerated integration of renewable energy into the grid and the intelligent transformation of the new power system, the Power Internet of Things (PIoT) has become key to realizing intelligent power systems. However, Power Internet of Things Devices (PIoTD) in remote areas face numerous challenges, including inadequate network coverage, limited energy harvesting, and poor communication conditions. To address these issues, a cloud-edge-device cooperation framework based on artificial intelligence is proposed, which employs Unmanned Aerial Vehicle Simultaneous Wireless Information and Power Transfer (UAV-SWIPT) to provide continuous energy to energy-constrained PIoTD. Energy replenishment and communication relay for SAG-PIoT devices are facilitated by deploying SWIPT services on UAVs in a low-altitude network within the space-air-ground network. Furthermore, to optimize the collaborative work of multiple UAVs and jointly handle data relay, transmission power allocation, Global Energy Efficiency (GEE), and PIoTD association scheduling, a multi-agent deep reinforcement learning algorithm is introduced to tackle the problems of incomplete global information and high-dimensional variable coupling in dynamic environments. Simulation results show that the proposed algorithm converges faster and demonstrates superior energy efficiency compared to several benchmark algorithms. In terms of maximizing the minimum transmission rate, MADDPG achieves the highest performance. Additionally, it is observed that the optimal SWIPT power splitting ratio is approximately 0.7, at which the GEE is highest.
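    For context on the power-splitting ratio discussed above, a standard SWIPT power-splitting receiver divides the received signal power between energy harvesting and information decoding. The expressions below are the textbook form, with $\rho$ the splitting ratio, $\eta$ the energy conversion efficiency, $P$ the transmit power, $h$ the channel gain, and $\sigma^{2}$ the noise power; the paper's exact system model may include additional terms.

    $$ E_{\mathrm{h}} = \eta\,\rho\,P\,|h|^{2}, \qquad R = \log_{2}\!\left(1 + \frac{(1-\rho)\,P\,|h|^{2}}{\sigma^{2}}\right) $$

    A larger $\rho$ harvests more energy but lowers the achievable rate, which is why an intermediate value (around 0.7 in the reported results) can maximize global energy efficiency.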
  • YUAN Lining, FENG Wengang, LIU Zhao
    Accepted: 2025-06-06
    To address the problem that current academic paper classification methods neglect relational information, we propose a novel classification model that integrates Graph Convolutional Networks (GCN) with contrastive learning, called the Contrastive Graph Convolutional Network (CGCN). First, we define two distinct types of homogeneous-heterogeneous relational information based on the content and citations of the papers, transforming these into self-supervised signals for constructing the contrastive loss. Second, we enhance the feature extraction process of the GCN by employing the contrastive loss, pushing homogeneous papers close to one another while keeping heterogeneous papers distant. Third, we utilize cross-entropy loss and the softmax function to complete end-to-end academic paper classification. On three benchmark academic datasets, CGCN outperforms advanced baselines on the classification task; Micro-F1 and Macro-F1 are improved by 8.29% and 7.91%, respectively, compared to the original GCN on the Cora dataset. CGCN enhances the capacity to represent latent information in papers by employing a contrastive loss based on the homogeneous-heterogeneous relationship, thereby improving prediction accuracy and generalization. This approach provides new ideas and methods for research on academic paper classification.
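    A minimal sketch of a contrastive objective of the kind described above: embeddings of homogeneous paper pairs are pulled together and heterogeneous pairs pushed apart on top of GCN outputs. The temperature and the way pairs are mined from content/citation relations are assumptions, not CGCN's exact loss.

    ```python
    import torch
    import torch.nn.functional as F

    def pairwise_contrastive_loss(z: torch.Tensor,
                                  pos_pairs: torch.Tensor,
                                  neg_pairs: torch.Tensor,
                                  temperature: float = 0.5) -> torch.Tensor:
        """z: (n, d) node embeddings; pos/neg_pairs: (m, 2) index pairs (homogeneous / heterogeneous)."""
        z = F.normalize(z, dim=1)
        pos_sim = (z[pos_pairs[:, 0]] * z[pos_pairs[:, 1]]).sum(dim=1) / temperature
        neg_sim = (z[neg_pairs[:, 0]] * z[neg_pairs[:, 1]]).sum(dim=1) / temperature
        # Encourage high similarity for homogeneous pairs, low similarity for heterogeneous pairs.
        return (-F.logsigmoid(pos_sim) - F.logsigmoid(-neg_sim)).mean()

    z = torch.randn(100, 64, requires_grad=True)   # e.g., GCN output embeddings for 100 papers
    pos = torch.randint(0, 100, (256, 2))          # indices of homogeneous pairs (placeholder)
    neg = torch.randint(0, 100, (256, 2))          # indices of heterogeneous pairs (placeholder)
    loss = pairwise_contrastive_loss(z, pos, neg)
    loss.backward()
    ```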
  • CHEN Haixiu, CHEN Ziang, FANG Weizhi, LU Haitao, HUANG Zijie, CHENG Rong
    Accepted: 2025-06-05
    Dense pedestrian detection is one of the key problems in developing crowd flow monitoring systems for large public places. Aiming at the difficulty of small-target detection caused by crowd occlusion in dense pedestrian scenes and the deployment requirement of lightweight models, this paper proposes CAD-YOLO (CGDown-Adaptive Fusion Module-Dyhead), an improved YOLOv8-n dense pedestrian detection model. The embedded CGDown downsampling module, through an efficient context information extraction mechanism, effectively alleviates the tendency of traditional detectors to lose context features in dense scenes and significantly enhances the ability to capture dense pedestrian features and focus on small targets. A BiFPN-Adaptive structure is designed to reconstruct the neck network; by adaptively fusing feature information at different scales, the model extracts features of occluded pedestrians and small and medium-sized pedestrians more accurately, while the number of parameters and the computational cost are greatly reduced. The dynamic detection head Dyhead, combined with a new 160×160 small-target detection layer, enables the model to capture the fine features of dense small-target areas more accurately, effectively alleviating missed detections in occlusion scenes. The experimental results show that, compared with YOLOv8-n, the detection accuracy of CAD-YOLO on the CrowdHuman and WiderPerson datasets is improved by 5.1% and 2.1%, respectively. Despite the significant performance improvement, CAD-YOLO has a parameter count of only 2.9M and a computational cost of 12.3 GFLOPs, meeting the low-power, high-precision requirements of deployment on edge or mobile devices.
  • LIU Tao, Man Dapeng, XU Chen, LV Jiguang, FENG Zhu, ZENG Fanyi, ZHOU Xue, YANG Wu
    Accepted: 2025-06-05
    Conventional clean-label backdoor attacks often fail to establish a strong link between the trigger and the target class, resulting in a low attack success rate, and extensive experimental experience shows that this failure is even more severe in federated learning. The main reason for the failure is that randomly selecting the trigger leaves it without a direct connection to the target class. To this end, a learnable-trigger backdoor attack is designed for federated learning, which makes full use of the task information and shared model issued by the central server to train a trigger that is strongly correlated with the target class, formalizing this training process as a dual-objective optimization problem. The method finds the optimal perturbation under constraint conditions to blur the original features of the image as much as possible, thereby maximizing the model's ability to learn the trigger; these blurred images, with triggers added within the specified range, are then used as training inputs, their image classification loss is minimized, and the optimal trigger is generated quickly using mini-batch projected gradient descent. The backdoor attack activated with this trigger still guarantees excellent attack performance in federated learning. Experimental results on three datasets show that the attack success rate of the proposed method in federated learning is much higher than that of existing clean-label backdoor attacks; on CIFAR-10 in particular, it improves on the baseline method by more than 82%. The proposed attack method presents new challenges to the security of federated learning.
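    A hedged sketch of learning a bounded trigger with mini-batch projected gradient descent, in the spirit of the dual-objective procedure described above. The input size, epsilon bound, step count, and the way the shared global model and data loader are used are illustrative assumptions, not the paper's exact procedure.

    ```python
    import torch
    import torch.nn.functional as F

    def learn_trigger(model, loader, target_class: int, eps: float = 8 / 255,
                      steps: int = 100, lr: float = 0.01) -> torch.Tensor:
        """Optimize an additive trigger (projected onto an eps-ball) so the shared model maps
        triggered images to the target class; a simplified stand-in for the paper's method."""
        trigger = torch.zeros(3, 32, 32, requires_grad=True)  # assumed CIFAR-10-sized input
        model.eval()
        data_iter = iter(loader)
        for _ in range(steps):
            try:
                images, _ = next(data_iter)
            except StopIteration:
                data_iter = iter(loader)
                images, _ = next(data_iter)
            logits = model(torch.clamp(images + trigger, 0, 1))
            loss = F.cross_entropy(logits, torch.full((images.size(0),), target_class))
            grad, = torch.autograd.grad(loss, trigger)
            with torch.no_grad():
                trigger -= lr * grad.sign()   # gradient step toward the target class
                trigger.clamp_(-eps, eps)     # projection onto the allowed perturbation range
        return trigger.detach()
    ```

    In use, `model` would be the shared global model received from the server and `loader` a small batch source of local images; both names are placeholders here.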
  • Li Junliang, Ma Junpeng, Liu Mengxuan, Liu Yuxue, Zhang Junsan
    Accepted: 2025-06-03
    Medical report generation from images is challenging due to low image contrast and the small size of abnormal regions, making it difficult to accurately capture abnormal features using visual information alone. Therefore, introducing external knowledge to enhance visual representation becomes a key issue. In addition, the co-occurrence patterns of abnormal features are complex and cannot be effectively captured from a single instance, making it crucial to leverage similar cases to model such patterns. To address these challenges, a Similar-Instance Guided method for medical report generation is proposed, consisting of two main components: an Image Feature Memory Module Incorporating Heterogeneous Graphs (FMHG) and a Similar Instance Feature Fusion Module (SIFF). FMHG extracts entity relationships from the report and constructs a corresponding heterogeneous graph as a bridge, guiding the model's attention to the abnormal regions of the image and thus enhancing abnormal visual features. SIFF retrieves similar instances and integrates their abnormal visual features, thereby augmenting the representation of abnormal regions while acquiring a more comprehensive understanding of the abnormal information. Experiments conducted on the IU X-ray and MIMIC-CXR medical imaging datasets demonstrate that the proposed method performs well on the BLEU evaluation metrics, achieving BLEU-1 to BLEU-4 scores of 0.539, 0.353, 0.265, and 0.193, respectively, on the IU X-ray dataset. It also excels on the METEOR and ROUGE-L metrics, indicating that the proposed method outperforms existing methods in terms of NLG metrics as well as the accuracy and completeness of the generated reports.
  • Hu Wei, Chen Yuner, Du Puliang
    Accepted: 2025-06-03
    Aiming at the low efficiency of parameter optimization of Variational Mode Decomposition (VMD) in current short-term electricity price prediction methods, the insufficient feature expression ability of single prediction models, and the problem of feature redundancy, this paper proposes a short-term electricity price prediction method based on Multi-Strategy Improved Crested Porcupine Optimizer (MSICPO) algorithm and deep learning. First, the Crested Porcupine Optimizer (CPO) algorithm is improved by introducing Lévy flight strategy, periodic population variation, and dynamic parameter adjustment mechanism to enhance its global search ability and convergence speed. It is used to optimize the modal number and penalty factor parameters of VMD to improve the accuracy of signal decomposition. Second, a deep learning model integrating feature weighting is constructed. By designing a dynamic weighting module to suppress noise interference and enhance the impact of key features, combined with the long-term dependency capture ability of sLSTM and the parallel computing advantage of Transformer, multi-scale feature collaborative optimization processing is realized. Finally, the MSICPO-VMD-WF-sLSTM-Transformer hybrid model is constructed for electricity price prediction. Experimental results show that the Multi-Strategy Improved Crested Porcupine Optimizer algorithm achieves a refined balance of optimal solution precision and optimization efficiency in VMD parameter optimization compared with the original CPO algorithm and other traditional optimization algorithms. The proposed hybrid forecasting model performs well in prediction accuracy, with a coefficient of determination reaching 0.95. In addition, cross-regional data prediction experiments further verify the applicability and generalization ability of the model in different regional electricity markets. The method proposed in this paper not only provides theoretical references for the improvement of intelligent optimization algorithms and multi-feature prediction technologies, but also offers a high-precision and strong generalization solution for short-term electricity price prediction in complex electricity markets.
  • GENG Xia, LIN Xianwen, YANG Zhi
    Accepted: 2025-06-03
    In text-based person search tasks, initializing models with parameters from pre-trained models has become a mainstream paradigm, effectively alleviating the feature alignment bottleneck that single-modal models face due to the lack of cross-modal information. Existing methods focus on mining semantic features at different scales in the image-text joint embedding space for optimization. However, introducing a new alignment paradigm easily causes the pre-trained model to fall into local minima during fine-tuning. To solve these issues, this paper proposes a Prompt-based Information Transfer (PIT) framework. By introducing cross-modal prompt tokens into the original forward pass of the single-modal encoder and the cross-modal image-text encoder, it promotes early feature fusion and implicitly guides the model to focus more on modality-invariant information. PIT includes a prompt-based contrastive loss and a prompt training strategy. The prompt-based contrastive loss constructs a shared feature embedding space with both intra-modal discrimination and inter-modal semantic consistency by constraining the similarity between image and text features. The prompt training strategy can be regarded as a form of self-distillation: it treats the pseudo-targets generated by non-prompt features and the ground truth as another view of the image-text pair, supervising the training process and making the learned embeddings contain richer multi-modal information. With only 0.61M additional parameters introduced on top of fine-tuning, PIT achieves Rank-1 improvements of 1.48%, 1.5%, and 1.55% on three public datasets, respectively.
  • GU Yingshuang , GUI Tao , ZHANG Qi
    Accepted: 2025-06-03
    Factual hallucination in large language models (LLMs) refers to the generation of content that conflicts with established real-world facts, significantly reducing model credibility and applicability in high-risk domains such as healthcare, law, and scientific research. Current methods for hallucination mitigation primarily depend on input optimization, supervised learning, or integration with external knowledge bases. However, these approaches exhibit limited generalizability, substantial dependence on extensive labeled datasets, and constraints in real-time scenarios, making it challenging to fundamentally improve the factual accuracy of LLMs. To address these limitations, this paper proposes a reinforcement learning-based framework that incorporates semantic entropy as feedback to mitigate factual hallucinations. Semantic entropy serves as a precise measure of uncertainty at the semantic level, enabling an accurate assessment of the model's confidence in its generated responses. By embedding semantic entropy into the reinforcement learning process as a reward signal, the model is encouraged to proactively avoid responses with a high likelihood of hallucination. Compared to traditional predictive entropy-based methods, semantic entropy more effectively distinguishes semantically equivalent expressions and enhances factual judgment capabilities without reliance on external knowledge sources. Experimental results show that the proposed method, while maintaining the richness and coherence of the generated content, improves factual judgment accuracy by up to 5.7% and factual generation accuracy by up to 7.8% compared to the best baseline model, validating its superiority in factual hallucination mitigation.
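    A minimal sketch of computing semantic entropy over sampled answers and turning it into a reward-style signal, as described above. The equivalence check is a toy placeholder (a real system might use an NLI model or normalization), and the reward shaping is an assumption rather than the paper's exact design.

    ```python
    import math
    from typing import Callable, List

    def semantic_entropy(answers: List[str], equivalent: Callable[[str, str], bool]) -> float:
        """Cluster sampled answers by semantic equivalence, then compute entropy over cluster probabilities."""
        clusters: List[List[str]] = []
        for a in answers:
            for c in clusters:
                if equivalent(a, c[0]):
                    c.append(a)
                    break
            else:
                clusters.append([a])
        n = len(answers)
        probs = [len(c) / n for c in clusters]
        return -sum(p * math.log(p) for p in probs)

    # Toy equivalence: case/whitespace-insensitive match (a stand-in for a semantic comparator).
    eq = lambda x, y: x.strip().lower() == y.strip().lower()
    samples = ["Paris", "paris", "It is Paris.", "Lyon"]
    h = semantic_entropy(samples, eq)
    reward = -h   # low semantic entropy (consistent, confident answers) yields a higher reward
    print(h, reward)
    ```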
  • ZHANG Lei, LI Shihua, GAO Hao, WANG Xiaoyong
    Accepted: 2025-05-26
    With the escalating energy consumption of urban rail transit systems, enhancing the utilization of regenerative braking energy to reduce the energy consumption of train operation has become a critical issue. This paper focuses on optimizing the operation control strategy of the tracking train during multi-train cooperative operation. Firstly, building upon the traditional operation-mode transition strategy, a "Traction-Coasting-Traction-Cruising-Coasting-Braking" strategy is proposed specifically for the tracking operation scenario. Secondly, the train dynamics model in the spatial domain, the state transition equation, and the energy consumption model are constructed. By employing an interpolation method, the cooperative operation problem in the time domain is transformed into the problem of solving optimal switching points in the spatial domain. Subsequently, an optimization decision-making model targeting energy consumption and punctuality is constructed and efficiently solved using the Dung Beetle Optimizer. Finally, taking the Yizhuang Line of the Beijing Subway as the simulation line, comparative analyses are conducted to evaluate the influence of the Communication-Based Train Control (CBTC) and Train Autonomous Control System (TACS) architectures, as well as different transition strategies, on optimization performance. The results demonstrate that TACS significantly enhances the optimization performance of cooperative operation compared to CBTC. The proposed strategy not only meets the punctuality requirement but also outperforms the traditional strategy in energy consumption at various departure intervals: the net absorbed energy can be increased by up to 14.651 kWh, and the actual operational energy consumption can be decreased by up to 11.284 kWh. Therefore, the proposed operation-mode transition strategy and optimization method effectively reduce the energy consumption of train operation and provide a useful reference for the development of urban rail train operation control technology. The code has been published on GitHub: https://github.com/eva-777/Tracking-Train-Operation-Optimization.git.
  • Zukun Wan, Runming Wang, Tianming Ma, Xingdong Song, Shengrong Yuan, Yajun Ding
    Accepted: 2025-05-23
    Visual Question Answering (VQA) understands and parses an input image and its corresponding textual question, and then provides a natural-language answer relevant to the question; it has become a promising research direction in cross-modal analysis. Existing work depends heavily on certain dataset factors, such as spurious correlations, dataset bias, and shortcut learning, all of which pose great challenges to algorithm robustness. Existing ensemble-based methods capture dataset bias by training a bias model, but because the bias model's ability to identify biased samples is insufficient, it struggles to fully learn the bias information, which weakens the debiasing effect. To strengthen the bias model's ability to learn dataset bias, this paper proposes an adaptive bias learning network for the VQA task, named ABLNet. The core designs of ABLNet are as follows. First, an adaptive sample reweighting mechanism is proposed that dynamically assigns weights based on the gradient information of each sample, strengthening the model's learning of biased features in the dataset and improving its generalization ability. Second, a restricted-learning-based network pruning strategy is proposed that limits the learning capacity of the bias model, making it rely on surface correlations and biased features in the dataset. Extensive experiments are conducted on the challenging VQA-CPv1, VQA-CPv2, and VQA-v2 datasets, and the results demonstrate the superiority of the proposed method.
  • CAO Xiaofei, WANG Runmin, CUI Lingxin, CHAI Xinling, Ding Yajun, Han Chang
    Accepted: 2025-05-23
    Breast ultrasound image segmentation plays a significant role in computer-aided diagnosis, but existing methods are constrained by the scarcity of annotated data. In recent years, generative models have demonstrated potential in medical image synthesis, yet current approaches struggle to simultaneously ensure image realism and mask semantic consistency. To address the performance bottleneck of segmentation models caused by the limited scale of ultrasound image datasets, this paper proposes an innovative ultrasound image dataset augmentation method. First, from a pathological perspective, a mask generation module is designed based on the characteristics of benign and malignant tumors, which efficiently generates multiple semantically plausible masks. Next, to synthesize ultrasound images corresponding to these masks, a Mask-guided Diffusion Model (MDM) is proposed. The model incorporates mask information into the denoising network of the diffusion model through normalization, thereby generating ultrasound images with high semantic consistency with the masks. Experimental results demonstrate that the proposed method significantly outperforms mainstream generative models in terms of image fidelity (FID) and semantic alignment (mIoU). Validation of the incremental data-generation strategy shows that segmentation performance improves markedly with increasing data volume, proving the effectiveness of the synthesized data.
  • Kai Chen, Zhihua Chen, Lei Dai
    Accepted: 2025-05-22
    The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm alleviates environmental non-stationarity by introducing global information when solving multi-agent path planning problems. However, in complex environments, multi-agent reinforcement learning algorithms still suffer from sparse rewards and low levels of agent collaboration. To solve these problems, a multi-agent path planning algorithm based on state-action prediction (SA-MADDPG) is proposed. In SA-MADDPG, a Novelty Reward Module based on a Long Short-Term Memory network is designed, which assigns novelty reward values to the agent without relying on current observations and actions, alleviating the reward sparsity problem. In addition, an Action Prediction Module is designed that explicitly incorporates collaborative information, together with a dynamic weight term based on Q-value gain, to guide each agent in balancing the optimization of its own task strategy with that of collaborative task strategies, thereby enhancing the level of collaboration among agents. Finally, a three-dimensional multi-agent path planning simulation environment based on drones is constructed to comprehensively evaluate the performance of the proposed algorithm. Experimental results show that SA-MADDPG improves the average reward by 5.26%-15.81% and reduces the average episode time by 10.96%-16.05% in the obstacle-density experiment, and improves the average reward by 16.32%-22.9% and reduces the average episode time by 15.03%-25.15% in the agent-number experiment.
  • TIAN Qing, SHEN Junyu, YU Jiangsen
    Accepted: 2025-05-22
    Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain to improve the performance of the target domain model. However, traditional UDA methods assume that the category spaces of the source and target domains are entirely consistent, making it impossible to handle unknown categories in the target domain, which restricts their application in real-world scenarios. Open-Set Domain Adaptation (OSDA) addresses this issue by introducing recognition of unknown categories, but effectively reducing inter-domain differences and category imbalance remains a significant challenge. Existing OSDA methods often overlook domain-specific features and simply minimize domain differences, which can blur the boundaries between categories and weaken the model's generalization ability. To address this problem, this paper proposes Open-Set Domain Adaptation with Optimal Transport Distance Regularization and Neighborhood Clustering (OTRNC). The method maximizes the distribution distance between high- and low-confidence sample sets using optimal transport distance regularization, thereby reducing the interference of unknown categories in the domain adaptation process. Subsequently, dynamic nearest-neighbor retrieval and invariant feature learning are employed to reduce intra-class variations within the target domain, enhancing feature generalization. Experimental results show that OTRNC performs well across multiple benchmark datasets.
  • Gao Lingping, Xu Wei, Chen Xi, Mu Yibo, Zhang Kai
    Accepted: 2025-05-22
    As software scale and complexity grow exponentially, monitoring and analyzing program runtime behavior has become increasingly challenging. Dynamic binary instrumentation is an effective solution to this problem, with mature tools like Pin and Valgrind supporting mainstream architectures such as x86 and ARM. However, these tools lack support for emerging domestic instruction set architectures, such as LoongArch. LoongArch, a self-developed instruction set architecture in China, exhibits high levels of autonomy, advancement, and compatibility. Nevertheless, due to its relatively short development history, its ecosystem remains incomplete, particularly in the debugging toolchain. To address this gap and promote the maturation of the LoongArch ecosystem, developing a dynamic binary instrumentation tool for LoongArch is of significant importance. This study aims to design and implement a dynamic binary instrumentation tool based on the QEMU framework to support program monitoring and analysis on LoongArch. The tool, modeled after Pin, implements five fundamental instrumentation granularities and related APIs, along with over 20 instrumentation tools for direct use or as learning resources for tool development. To enhance performance, the framework was optimized through improvements in conditional jump instruction translation, basic block linking, and instrumentation inlining. Performance tests demonstrate that the optimized framework achieves over 100 times improvement in instruction-level instrumentation efficiency and nearly 33 times improvement in basic block-level instrumentation efficiency. Finally, the source code has been open-sourced on GitHub to facilitate the further development of the LoongArch ecosystem and provide a reference for researchers in related fields.
  • FENG Tao, HU Bin, XU Guangyuan
    Accepted: 2025-05-22
    Crowd escape behavior in public places can easily lead to serious public safety disasters. Traditional computer vision technology can detect a few characteristics of crowd escape behavior, but it struggles with complex dynamic visual scenes. To address this issue, based on the structural characteristics of the locust visual nerve, the danger perception mechanism of the locust Lobula Giant Movement Detector (LGMD), and the mammalian retinal luminance adaptation mechanism, this paper proposes an Enhanced Crowd Escape Detection Neural Network (ECEDNN). The proposed network collects the luminance changes caused by crowd activities in the field of view. With the help of the mammalian retinal luminance adaptation mechanism, the visual response excitation is tuned to adapt to the lighting scene. Visual excitation and suppression are mixed to filter background noise, and a center-surround mechanism is used to enhance motion edges. Finally, neural spike adaptive tuning is used to detect burst escape behavior in the crowd and output strong membrane potential excitation. This work contributes to research on crowd activity detection inspired by biological visual perception mechanisms, and can provide new ideas and methods for crowd behavior perception and anomaly detection in artificial intelligence.
  • HU Caifu, WEI Bo, REN Ruibin
    Accepted: 2025-05-22
    As the network environment continues to evolve and new internet applications emerge, machine learning classifiers trained on previous traffic data are becoming increasingly less adaptable to new sample spaces. This leads to a decline in the identification capability of classification models, which cannot meet the growing demands of network services and network security. Manually updating classifiers based on experience requires significant effort and does not guarantee the generalization performance of the new classifiers. At the same time, the continuous influx of new data poses a severe challenge to balancing model training accuracy with computational and storage resources. To this end, this paper proposes an incremental learning strategy with feature-space optimization to achieve efficient network traffic classification. First, by optimizing the spatial distribution of new and old traffic samples, clusters of new and old categories are kept at a minimum interval, avoiding distribution conflicts between new and old tasks caused by sharing the same feature space. Then, within the optimized feature space, a small number of old data samples are replayed and combined with knowledge distillation to maintain the stability of the original model parameters, adjusting only the extended part of the model so that the classifier is updated at minimum cost. Experiments on the USTC-TFC2016 dataset show that, compared with other methods, the proposed method demonstrates higher stability and effectiveness in terms of model accuracy, resource consumption, and overall performance, and ablation experiments further confirm the contribution of each component.
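    A minimal sketch of the knowledge distillation term used to keep the old model's behavior stable while only the extended part of the classifier is updated, as described above. The temperature, loss weighting, and toy tensors are generic placeholder choices, not the paper's configuration.

    ```python
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits: torch.Tensor,
                          teacher_logits: torch.Tensor,
                          T: float = 2.0) -> torch.Tensor:
        """KL divergence between temperature-softened teacher and student distributions
        over the OLD classes, scaled by T^2 as in standard knowledge distillation."""
        return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction="batchmean") * (T * T)

    # Toy usage: frozen old model (teacher) vs. the old-class slice of the extended new head.
    old_classes, batch = 10, 32
    teacher_logits = torch.randn(batch, old_classes)                        # from the frozen old classifier
    student_logits = torch.randn(batch, old_classes, requires_grad=True)    # old-class outputs of the new head
    ce = F.cross_entropy(student_logits, torch.randint(0, old_classes, (batch,)))
    loss = ce + 1.0 * distillation_loss(student_logits, teacher_logits)
    loss.backward()
    ```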
  • XIE Qingqing, LIU Yuanyuan
    Accepted: 2025-05-20
    In the field of cybersecurity, phishing attacks are becoming increasingly complex and frequent. Traditional phishing detection schemes based on predefined reference templates rely on brand-domain mapping lists, using visual feature matching to identify brand intent and verify domain consistency for explainable detection. While these methods can counter zero-day phishing attacks, they face scalability challenges because the reference lists must be continuously updated to cover emerging brands, leading to high maintenance costs. To address these issues, the paper proposes Phish-RAGLLM, a novel reference-based phishing detection scheme leveraging Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). By reframing traditional visual problems as language tasks, Phish-RAGLLM eliminates reliance on predefined reference lists, utilizing LLMs' extensive brand knowledge while enhancing generation capabilities through RAG integration with external brand knowledge bases. This approach effectively mitigates LLM hallucination issues and improves detection precision and robustness. Experimental results demonstrate that, compared to the current state-of-the-art model PhishLLM, Phish-RAGLLM, using GPT-3.5-turbo-instruct as the main LLM, balances model performance, inference cost, and knowledge base completeness, achieving a 5.88% increase in F1 score and a 12.5% improvement in operational efficiency. Moreover, it shows strong robustness against dataset variations and prompt injection attacks. Owing to the characteristics of LLMs, Phish-RAGLLM exhibits good adaptability to multilingual phishing websites, effectively detecting phishing webpages in different linguistic contexts. Furthermore, real-world evaluations reveal that Phish-RAGLLM has broader detection capabilities than VirusTotal (a threat intelligence source), further validating its feasibility and effectiveness.
  • WANG Chaoyang, SUN Weiwei
    Accepted: 2025-05-20
    Combinatorial optimization problems have important applications in areas such as logistics path planning, but their solution space grows exponentially with the problem size, posing severe challenges for traditional methods. In recent years, neural combinatorial optimization methods based on reinforcement learning have achieved solution quality close to that of traditional solvers while keeping solution times short. The mainstream method POMO (Policy Optimization with Multiple Optima) enhances training stability through symmetry optimization, but its unidirectional sequence generation mechanism still suffers from a double limitation: on the one hand, the traditional constructive approach struggles to fully exploit the symmetry of the problem; on the other hand, endpoint information cannot effectively inform decisions at nodes far from the endpoint. To address this, this paper proposes a Bidirectional Construction Strategy (BCS)-based POMO model, named BCS-POMO, which constructs the solution sequence in parallel from both the start point and the end point and dynamically extends the direction with higher confidence, avoiding situations where the model is trapped by purely unidirectional construction. The model exploits the symmetry of the construction sequence to share weight parameters and improves efficiency through batch parallel computation. Experiments show that BCS-POMO effectively reinforces the role of endpoint information as a decision aid during construction, reducing the error by 16% and 18% on the traveling salesman problem (TSP) and the capacitated vehicle routing problem (CVRP), respectively, verifying the effectiveness of the bidirectional construction strategy in exploiting endpoint information and the advantages of symmetry modeling.
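    A toy illustration of bidirectional construction is given below: the tour is grown from both ends and the more "confident" extension is chosen at each step, with inverse distance standing in for the learned policy confidence used by BCS-POMO.

```python
# Toy bidirectional tour construction for TSP: grow the path from both ends and,
# at each step, extend the end whose best candidate has higher "confidence"
# (here just inverse distance; BCS-POMO uses a learned policy instead).
import numpy as np

def bidirectional_tour(coords, start=0):
    n = len(coords)
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    path, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        head, tail = path[0], path[-1]
        cand = list(unvisited)
        h_best = min(cand, key=lambda j: dist[head, j])
        t_best = min(cand, key=lambda j: dist[tail, j])
        # confidence = inverse of the distance to the nearest candidate
        if 1.0 / (dist[head, h_best] + 1e-9) >= 1.0 / (dist[tail, t_best] + 1e-9):
            path.insert(0, h_best); unvisited.remove(h_best)
        else:
            path.append(t_best); unvisited.remove(t_best)
    return path

coords = np.random.rand(20, 2)
print(bidirectional_tour(coords))
```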
  • Guo Ziyun, Tian Youliang, Li Mengqian
    Accepted: 2025-05-19
    Federated learning leverages client data resources to collaboratively train a global model, whose performance depends on the quality of client data and their level of participation. Clients expect to receive appropriate compensation after contributing high-quality data to enhance their motivation for participation. Additionally, since the local model parameters uploaded by clients contain information about private data resources, they face the risk of privacy leakage. To address these challenges, this paper proposes an incentive-based adaptive privacy-preserving federated learning scheme. First, a pre-decision game auction mechanism is designed to ensure that clients truthfully report their costs while achieving Nash Equilibrium (NE). Second, a training quality evaluation algorithm is developed based on training time and model loss, which determines client compensation according to the overall training quality evaluation score, thereby incentivizing high-quality data contributors to participate in training. Finally, an adaptive differential privacy technique is employed to perturb local model parameters, enhancing model utility through dynamic noise allocation. Theoretical analysis demonstrates that the proposed scheme satisfies security and privacy protection requirements, while experimental results validate its effectiveness.
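    The adaptive perturbation step can be pictured roughly as below, where a client clips its update and adds Gaussian noise whose scale decays over rounds; the decay schedule is an assumed illustration, not the paper's noise allocation strategy.

```python
# Sketch of adaptive Gaussian perturbation of local model parameters: the client
# clips its update to bound sensitivity and adds noise whose scale shrinks as
# training progresses. The decay schedule is illustrative only.
import numpy as np

def perturb_update(update, clip_norm=1.0, sigma0=1.0, round_idx=0, decay=0.05):
    update = np.asarray(update, dtype=float)
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))   # bound sensitivity
    sigma = sigma0 / (1.0 + decay * round_idx)                # dynamic noise scale
    noise = np.random.normal(0.0, sigma * clip_norm, size=clipped.shape)
    return clipped + noise

print(perturb_update(np.ones(5), round_idx=10))
```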
  • CHEN Xinluo, ZHAO Shuang , CAO Fang
    Accepted: 2025-05-19
    With the development of multimedia technology, the difficulty of unauthorized forgery and dissemination of false information has greatly decreased, which may lead to a series of negative consequences. Effective content authentication algorithms are urgently needed to ensure the authenticity and security of image content. In recent years, perceptual image hashing has shown excellent performance in the field of image authentication. However, existing algorithms are not ideal for processing images with a large proportion of text, and they cannot effectively cope with new content-preserving manipulations such as scribbling. Therefore, a text-picture mixed image content authentication algorithm based on perceptual hashing is proposed. The proposed algorithm adopts a ring-partition image segmentation scheme and calculates the frequency and distribution characteristics of SIFT key points within each ring. These features are rotation invariant and effectively improve the anti-collision performance of the algorithm. By exploiting key point information, the proposed algorithm achieves good robustness against content-preserving manipulations, including irregular scribbling. A Text-Picture Mixed Image (TPMI) dataset is constructed to validate the performance of the proposed algorithm. Compared with several representative algorithms, this algorithm performs better in perceptual robustness, anti-collision, and security. For partially tampered images, the algorithm can still effectively identify each tampered image as similar to its original image. In addition, real-world scribble-attack experiments are conducted, and the results show that the algorithm can effectively identify such attacked images.
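    A much-simplified stand-in for the ring-partition feature extraction is sketched below (assuming OpenCV with SIFT available): keypoints are binned into concentric rings around the image centre and the normalized per-ring counts serve as a rotation-invariant descriptor; the paper's actual feature design is richer.

```python
# Rough sketch of a ring-partition perceptual descriptor: detect SIFT keypoints,
# assign them to concentric rings around the image centre, and compare the
# normalised per-ring counts. Distances from the centre are rotation-invariant.
# This is only a simplified stand-in for the paper's feature design.
import cv2
import numpy as np

def ring_hash(gray, n_rings=8):
    """gray: uint8 grayscale image."""
    h, w = gray.shape
    cy, cx = h / 2.0, w / 2.0
    r_max = np.hypot(cy, cx)
    kps = cv2.SIFT_create().detect(gray, None)
    counts = np.zeros(n_rings)
    for kp in kps:
        r = np.hypot(kp.pt[1] - cy, kp.pt[0] - cx)   # kp.pt is (x, y)
        counts[min(int(n_rings * r / r_max), n_rings - 1)] += 1
    return counts / (counts.sum() + 1e-9)            # normalised ring histogram

def hash_distance(h1, h2):
    return float(np.abs(h1 - h2).sum())              # L1 distance between histograms
```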
  • ZHU XingPo, WANG Xiaoyang
    Accepted: 2025-05-19
    Bi-triangle (6-cycle) enumeration in bipartite graphs is essential for graph analysis tasks such as local clustering coefficient computation. As real-world bipartite graph data scales beyond single-machine capacity, efficient distributed algorithms are needed. However, the existing distributed graph partitioning (GP) enumeration algorithm struggles with large subgraph combinations, message overload, and redundant enumeration. To this end, two optimized algorithms are proposed based on the topological characteristics of bi-triangles. Method 1 views a bi-triangle as three wedge structures and generates subgraphs using wedge groups as the basic unit. A subgraph combination mechanism via A-type and V-type wedge group concatenation is introduced, greatly reducing the number and scale of subgraph combinations, and bi-triangles are ultimately enumerated through wedge triplets. To prevent message overload and redundancy, a subgraph reading mechanism based on a distributed storage system and a deduplication mechanism based on vertex ordering are proposed. Method 2 decomposes a bi-triangle into two zedge structures: it first partitions the graph using wedge groups and then applies a "compressed zedge" construction and restoration mechanism for a second partition, ultimately enumerating bi-triangles through zedge pairs with lower computational complexity than Method 1. Experiments show that, compared to GP, Method 1 reduces subgraph data by 205x on average and enumeration time by at least 45x, while Method 2 achieves average reductions of 30x and at least 101x, respectively.
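    A single-machine sketch of wedge-based bi-triangle enumeration is shown below; the distributed partitioning, subgraph reading, and deduplication mechanisms of the proposed methods are not reproduced.

```python
# Single-machine sketch of enumerating bi-triangles (6-cycles) in a bipartite
# graph by combining wedges: a wedge (a, x, b) is a path through a right-side
# centre x joining two left vertices a, b. A bi-triangle consists of three
# wedges on a left triple {a, b, c} with three distinct centres.
from collections import defaultdict
from itertools import combinations

def bi_triangles(edges):
    """edges: iterable of (left_vertex, right_vertex) pairs."""
    right_adj = defaultdict(set)
    for l, r in edges:
        right_adj[r].add(l)

    wedges = defaultdict(list)            # (a, b) with a < b -> list of centres
    for x, lefts in right_adj.items():
        for a, b in combinations(sorted(lefts), 2):
            wedges[(a, b)].append(x)

    results = set()
    left_nodes = sorted({l for l, _ in edges})
    for a, b, c in combinations(left_nodes, 3):
        for x in wedges[(a, b)]:
            for y in wedges[(b, c)]:
                for z in wedges[(a, c)]:
                    if len({x, y, z}) == 3:
                        results.add((a, x, b, y, c, z))
    return results

edges = [("a", 1), ("b", 1), ("b", 2), ("c", 2), ("c", 3), ("a", 3)]
print(bi_triangles(edges))   # one 6-cycle: a-1-b-2-c-3-a
```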
  • You Yiheng, Wang Xin, Ma Menglu, Wang Hui
    Accepted: 2025-05-19
    As a key form of data organization in artificial intelligence, knowledge graphs are widely used in many fields amid the rapid development of big data and large models. As knowledge graphs continue to grow in scale, existing storage structures expose problems such as slow data import and large storage footprints. To address this, this paper proposes a hybrid "relational + key-value" storage scheme (KGHS) and designs an entity clustering algorithm based on attribute frequency. Using this algorithm, KGHS classifies entity clusters by attribute frequency: high-frequency attributes are stored in a relational database to exploit its query efficiency, while rare attributes are stored as key-value pairs to exploit the flexibility of key-value storage for sparse data. This design avoids the large number of null values that relational storage produces on sparse data and reduces redundant key storage in key-value stores, significantly improving storage efficiency while preserving data flexibility. Experiments on synthetic and real-world datasets show that, compared with existing schemes, KGHS saves more than 50% of storage space on real datasets and improves data import speed by an order of magnitude without significantly affecting query performance, demonstrating that KGHS effectively addresses the storage challenges of large-scale knowledge graphs and provides solid storage support for their wide application across domains, with both theoretical significance and practical value.
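    The storage split can be illustrated with the minimal sketch below, where attributes above a frequency threshold go into a relational table and the rest into a key-value map; the threshold and schema are illustrative and not KGHS's actual clustering algorithm.

```python
# Toy illustration of the "relational + key-value" split: attributes whose
# frequency over all entities exceeds a threshold get a relational column;
# rare attributes are kept as key-value pairs to avoid sparse null columns.
import sqlite3
from collections import Counter

entities = {
    "e1": {"name": "Alice", "type": "Person", "hobby": "chess"},
    "e2": {"name": "Bob", "type": "Person"},
    "e3": {"name": "ACME", "type": "Company", "founded": "1999"},
}

freq = Counter(k for attrs in entities.values() for k in attrs)
threshold = 0.5 * len(entities)
frequent = sorted(k for k, c in freq.items() if c >= threshold)   # -> relational
rare_store = {}                                                   # -> key-value

db = sqlite3.connect(":memory:")
cols = ", ".join(f"{c} TEXT" for c in frequent)
db.execute(f"CREATE TABLE entity (id TEXT PRIMARY KEY, {cols})")

for eid, attrs in entities.items():
    row = [attrs.get(c) for c in frequent]
    placeholders = ", ".join("?" for _ in range(len(frequent) + 1))
    db.execute(f"INSERT INTO entity VALUES ({placeholders})", [eid] + row)
    for k, v in attrs.items():
        if k not in frequent:
            rare_store[(eid, k)] = v          # sparse attributes, no null columns

print(db.execute("SELECT * FROM entity").fetchall())
print(rare_store)
```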
  • Xu Xinhao, Li Ziqi, Yin Hefeng, Zhang Yonghong
    Accepted: 2025-05-19
    The surface texture of a printed circuit board (PCB) is complex, and its defects are small and varied in shape. To detect such small targets accurately, smaller-scale detection heads are often added, which significantly increases the computational cost and slows down detection. To address this issue, we propose a multi-scale feature fusion learning model for PCB small-target defect detection, named PCB-Det. Based on the YOLOv8 architecture, the model replaces the original backbone network with the lightweight PP-HGNet and incorporates the GSPPFCSPC module for multi-level feature extraction, thereby expanding the receptive field to enrich feature information. We further devise the Pro-BiFPN feature fusion network to enhance the interaction between features from adjacent layers, optimizing the fusion of shallow detail information and deep semantic information. In addition, the model incorporates shared feature branches to reduce the computational burden of the original detection heads and employs the Wise-IoU loss function to dynamically adjust the loss weights, thereby accelerating model convergence. Experimental results demonstrate that the proposed PCB-Det model achieves an average precision of 97.7% on the PCB_DATASET defect dataset, a 3.1% improvement over the baseline model. The model effectively reduces both missed detections and false positives, enhancing its capability to detect small-target defects on PCBs.
  • XU Shaoping, WANG Zichao, TANG Yiling, XIONG Silong
    Accepted: 2025-05-16
    The human visual cortex has a hierarchical structure, in which binocular fusion and binocular rivalry first occur in the low-level visual areas. However, current deep learning-based stereoscopic image quality assessment (SIQA) models generally estimate the quality values of stereoscopic images by fusing the features of left and right view images at different levels of the network, resulting in insufficient simulation of the perception in the low-level visual areas of the human visual cortex. To address this issue, this paper proposes a SIQA method that simulates binocular rivalry to further enhance evaluation accuracy. First, we leverage the ability of deep convolutional neural networks to acquire prior knowledge of the input image and build a binocular image fusion model based on an unsupervised approach. This model takes the left and right views as learning targets to simulate the binocular fusion process in the human visual system. The gradient magnitude responses of the left and right images are used to calculate the image degradation coefficient, which is then used to obtain the fusion weights of the left and right views, simulating the binocular rivalry phenomenon. Then, we use a pre-trained ResNet50 model to extract quality-aware features from the fused image and establish a feature-quality mapping model based on support vector regression to estimate the quality value of the stereoscopic image. Experimental results demonstrate that the proposed SIQA method achieves over 0.96 on both the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC).
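    The rivalry-style weighting can be pictured with the small sketch below, which derives per-pixel fusion weights from gradient magnitude responses; the exact weighting formula is an assumption rather than the paper's degradation-coefficient definition.

```python
# Sketch of gradient-magnitude-based fusion weights for a left/right view pair:
# the view with stronger local gradient responses dominates the fused image,
# mimicking binocular rivalry. The weighting formula is an assumption.
import numpy as np

def gradient_magnitude(img):
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def fuse_views(left, right, eps=1e-6):
    gm_l, gm_r = gradient_magnitude(left), gradient_magnitude(right)
    w_l = gm_l / (gm_l + gm_r + eps)      # per-pixel rivalry weight for the left view
    return w_l * left + (1.0 - w_l) * right

left = np.random.rand(64, 64)
right = np.random.rand(64, 64)
print(fuse_views(left, right).shape)
```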
  • PAN Yincang, ZHANG Dong , LI Guanyu, Chen Heng
    Accepted: 2025-05-16
    Knowledge graph embedding techniques map complex semantic information into low-dimensional vector representations, enabling tasks such as link prediction and knowledge completion. However, existing models are constrained by a single mathematical structure, making it difficult to simultaneously accommodate three-dimensional, direction-sensitive rotations and non-commutative composition, which limits effective joint inference over complex relational patterns. To overcome this, we propose the TransQD knowledge graph embedding model, which integrates quaternion and dual complex embeddings. To address the expressiveness bottleneck of single-structure methods, TransQD introduces a collaborative mechanism between quaternion and dual complex embeddings: the quaternion component uses the Hamilton product to model three-dimensional, direction-sensitive rotations, capturing spatial interactions between entities; the dual complex component employs non-commutative multiplication to rigorously represent order-dependent relations, such as when reordering a path composition causes a semantic shift. By weighting the two components, the model achieves a complementary effect and covers a broader range of relational patterns. Finally, TransQD demonstrates outstanding performance in link prediction and path query tasks on multiple public datasets, with ablation experiments confirming the necessity of dual-component collaboration.
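    For reference, the Hamilton product used by the quaternion component is shown below on plain tuples, together with a check of its non-commutativity; TransQD's scoring function itself is not reproduced.

```python
# The Hamilton product on quaternions written as tuples (w, x, y, z). It is
# non-commutative, which is what lets quaternion (and dual complex) embeddings
# model order-sensitive relation composition.
def hamilton(q, p):
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

q, p = (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0)   # i and j
print(hamilton(q, p))   # (0, 0, 0, 1)  -> i * j =  k
print(hamilton(p, q))   # (0, 0, 0, -1) -> j * i = -k
```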
  • Yuan Huanyu, Fu Jianming
    Accepted: 2025-05-16
    Reverse engineering Android applications not only facilitates the detection of security issues such as privacy leaks and cryptographic misuses in legitimate applications but also supports the analysis of malicious application behaviors. Key challenges in this process include locating cryptographic functions within native binary code, identifying the cryptographic algorithms they employ, and determining their functionalities. Among existing methods for identifying cryptographic functions, dynamic analysis-based methods often achieve high accuracy due to their ability to capture detailed runtime information. However, existing dynamic analysis-based tools are primarily designed for x86/x64 architectures, making them less effective for Android applications, which are predominantly based on the 64-bit ARM architecture. To address this issue, this paper proposes a hook-based method for identifying cryptographic functions in the native code of Android applications. The method first filters suspected cryptographic functions using three types of static features: constant characteristics, computational instruction ratios, and cryptographic instructions. Next, the dynamic instrumentation toolkit Frida is employed to hook the filtered functions and collect runtime information, such as parameters and return values. Finally, the execution results of the hooked functions are compared with cryptographic functions from open source cryptographic libraries to identify their types and functionalities. The proposed method is tested on three popular Android applications. Experimental results demonstrate that the proposed method effectively identifies cryptographic functions in the native code of real-world Android applications.
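    The constant-characteristic filter can be illustrated as below, scanning a code blob for well-known constants such as the SHA-256 initial hash values and the start of the AES S-box; the instruction-ratio features and the Frida hooking stage are not shown.

```python
# Sketch of the constant-characteristic filter: scan a native code blob for
# well-known cryptographic constants (SHA-256 initial hash values, the start of
# the AES S-box) as one signal that a function may be cryptographic.
import struct

SHA256_IV = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
             0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]
AES_SBOX_PREFIX = bytes([0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5])

def scan_constants(blob: bytes):
    hits = []
    for v in SHA256_IV:
        for fmt in ("<I", ">I"):                      # little- and big-endian
            if struct.pack(fmt, v) in blob:
                hits.append(f"SHA-256 IV 0x{v:08x} ({fmt})")
                break
    if AES_SBOX_PREFIX in blob:
        hits.append("AES S-box prefix")
    return hits

# toy blob containing one SHA-256 constant
blob = b"\x00" * 16 + struct.pack("<I", 0x6a09e667) + b"\xff" * 16
print(scan_constants(blob))
```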
  • ZHANG Ru, SUN Weifeng, ZHANG Peng, ZHANG Chao, DAI Yongshou
    Accepted: 2025-05-16
    With the continuous updating of electromagnetic signal analysis technology and methods, higher requirements are put forward for the rapid iteration capability of electromagnetic signal analysis software functions. However, due to the high system coupling and strong inter-module dependency of traditional architecture, software maintenance is difficult and the update efficiency is low, which makes it difficult to meet the expansion requirements of new functions. In order to solve the above problems, a plug-in-based electromagnetic signal analysis software architecture design method is proposed, aiming to reduce the system coupling through plug-in design and improve the scalability and maintenance efficiency of the software. First, according to the functional requirements analysis of the software, the software architecture design principles are formulated, and on this basis, the hierarchical modular overall architecture of electromagnetic signal analysis software is designed. Then, based on the idea of "platform + plug-in", the platform extension interface and standard plug-in interface are designed to standardize the development and integration of plug-ins. At the same time, based on the dynamic plug-in loading mechanism of the Qt framework, the plug-in manager is designed and implemented, and the prototype software of electromagnetic signal analysis that supports cross-platform operation is developed. Finally, the scalability of the prototype software is evaluated based on the EMSA measurement method, and tested and verified in actual application scenarios. Experimental results show that the expansion capability of the plug-in electromagnetic signal analysis software is improved by 58.33% compared with the modular architecture, and it exhibits high stability and robustness in actual application.
  • Li Danbo, Yan Xuexiong, Mao Enhui
    Accepted: 2025-05-15
    The HTTP protocol is core infrastructure for Internet communication, and its modern communication model relies on the collaboration of multiple servers. If servers in the processing chain do not strictly follow the protocol specification or interpret its semantics differently, systemic semantic-inconsistency problems arise, leading to security threats such as access control bypass, multi-host problems, request smuggling, and cache pollution. Differential fuzz testing analyzes semantic-inconsistency problems by observing differences in how different servers process the same messages. To address the problems of an imprecise field-set scope, low mutation efficiency, and a single observation dimension in existing tools, an improved differential fuzz testing method is proposed. First, a message construction method based on key headers selects core fields to reduce the test space; a field-semantics-based mutation method combines semantic classification with vulnerability characteristics to enrich test cases; and an extended message analysis method broadens the analysis scope to both request and response messages, fully observing the communication process and covering the known scenarios of semantic-inconsistency problems. Finally, tests are conducted on 7 commonly used servers, uncovering 18 types of server processing differences and verifying 9 pairs of combinations with semantic-inconsistency problems. Compared with similar tools such as t-reqs, the method reduces the size of the test set by an order of magnitude, increases the average proportion of valid test cases by 12.67%, discovers two additional types of difference problems from the same observation angle, and expands the test scope to cover four scenarios of current semantic-inconsistency problems.
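    Conceptually, the differential observation boils down to sending the same ambiguous request to different servers and comparing their responses, as in the sketch below; the hosts are placeholders, and the paper's field selection and mutation strategies are not reproduced.

```python
# Conceptual differential check: send the same ambiguous request (conflicting
# Content-Length and Transfer-Encoding headers) to two servers and compare the
# status lines. Hosts are placeholders under the tester's control.
import socket

REQUEST = (b"POST / HTTP/1.1\r\n"
           b"Host: example.test\r\n"
           b"Content-Length: 4\r\n"
           b"Transfer-Encoding: chunked\r\n"
           b"\r\n"
           b"0\r\n\r\n")

def status_line(host, port=80, timeout=5):
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(REQUEST)
        return s.recv(4096).split(b"\r\n", 1)[0]

def differs(host_a, host_b):
    a, b = status_line(host_a), status_line(host_b)
    return a != b, (a, b)

# Example (placeholder hosts):
# print(differs("server-a.local", "server-b.local"))
```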
  • LIU Ziyi, SHA Ying
    Accepted: 2025-05-15
    The Vision and Language Navigation (VLN) task aims to guide an agent to a target location in a 3D or real-world environment based on language instructions. However, traditional end-to-end VLN algorithms have limitations: when an erroneous action occurs during navigation planning, the agent tends to enter incorrect paths, resulting in an inability to continue following the instructions or in exploration of unnecessary areas. To address this issue, an agent named Nav-Explore is proposed, which is based on a large language model and an exploration module. The agent leverages the reasoning capabilities of the large language model to predict the next action from the language instructions and the current visual information, and uses an exploration module to balance exploration and exploitation. The exploration module employs an epsilon-greedy strategy to switch between normal navigation and exploration modes. When the random probability is below epsilon, the agent explores possible future paths to assess the feasibility of the next actions, thus avoiding wrong decisions; when the probability exceeds epsilon, it directly uses the large language model's output for navigation. This modular design enables Nav-Explore to effectively enhance navigation success rates and improve the agent's generalization ability in unseen environments. Experimental results demonstrate that Nav-Explore achieves superior performance on two outdoor VLN benchmark datasets, Touchdown and Map2seq, significantly increasing navigation success rates. Furthermore, Nav-Explore also exhibits strong generalization capabilities, effectively completing navigation tasks in different environments.
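    The epsilon-greedy switch can be sketched as follows, with llm_propose and lookahead_score as stand-in callables rather than Nav-Explore's actual interfaces.

```python
# Toy version of the epsilon-greedy switch in the exploration module: with
# probability epsilon the agent first probes candidate next actions before
# committing, otherwise it directly follows the language model's proposal.
import random

def next_action(observation, candidates, llm_propose, lookahead_score, epsilon=0.2):
    if random.random() < epsilon:
        # exploration mode: score each candidate's simulated future feasibility
        scored = {a: lookahead_score(observation, a) for a in candidates}
        return max(scored, key=scored.get)
    # normal navigation: trust the LLM's proposed action directly
    return llm_propose(observation, candidates)

# example with trivial stand-ins
act = next_action(
    "at intersection, instruction says turn left",
    ["forward", "left", "right"],
    llm_propose=lambda obs, cands: "left",
    lookahead_score=lambda obs, a: {"forward": 0.2, "left": 0.9, "right": 0.1}[a],
)
print(act)
```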
  • YAN Shize, FANG Zhijun
    Accepted: 2025-05-15
    As a graph-structured way of organizing data, knowledge graphs provide recommender systems with richer semantic information and context, enabling them to handle complex user behaviors and item characteristics effectively. However, existing knowledge-graph-based recommendation methods still face problems such as information over-smoothing and the handling of anomalous data; especially in large-scale data processing scenarios, over-smoothing often prevents the model from capturing personalized user needs, and interference from anomalous data can also harm the accuracy and robustness of the recommendation results. To this end, a knowledge graph recommendation model based on fused user-behavior features and outlier detection is proposed. By introducing fused user-behavior features, the model effectively avoids the information over-smoothing problem. It also incorporates an outlier detection mechanism that identifies and removes noisy data and anomalous behaviors, significantly improving the accuracy and robustness of the recommendation results and reducing the influence of bad data. To verify the effectiveness of the model, experiments were conducted on three real-world datasets. The results show that, compared with the best existing baseline models, the proposed model improves AUC and F1 by 6.77% and 5.09% on average, respectively; the performance gains are especially pronounced on datasets with high sparsity, effectively alleviating the problems caused by data sparsity.
  • SHI Xu, XIE Qing, TANG MengZi, WANG YuHan, LIU YongJian
    Accepted: 2025-05-15
    With the rapid development of Internet technology, personalized recommendation systems play a crucial role in helping users filter content of interest. Traditional recommendation methods have limitations in processing large-scale data and capturing users' complex preferences. Existing recommendation methods based on Graph Neural Networks (GNN) primarily focus on mining direct interactions between users and items; although they improve recommendation accuracy, they often ignore the integration and utilization of multimodal information such as text, images, audio, and video. Metapaths, which describe composite relationships between nodes in heterogeneous graphs, can further improve embedding quality and recommendation performance. However, existing models either ignore node content features, discard intermediate nodes on metapaths, or consider only a single metapath. To address these challenges in existing multimodal recommendation systems, this study proposes a metapath-guided multimodal recommendation algorithm (MAMGNN). It first constructs a multimodal heterogeneous information network to integrate information from different modalities, then uses metapaths to guide the propagation and aggregation of information both within and across metapaths. It further introduces graph neural networks and attention mechanisms to learn high-quality embedding representations of users and items, generating more accurate and explainable recommendation results. Extensive experiments on two real-world datasets, MovieLens-20M and H&M, together with a small-scale user survey, demonstrate that MAMGNN significantly enhances the prediction of users' preferences for items, outperforming baseline models in Precision@10, Recall@10, and NDCG@10 by approximately 2.93%, 1.98%, and 2.12%, and 3.43%, 1.18%, and 2.40% on the two datasets, respectively.
  • DAI Zinan , ZHANG Jie, CHEN Chongchong, CHEN Zhangyi, CHEN Fulong
    Accepted: 2025-05-14
    Chinese medical named entity recognition aims to identify entities with specific meanings from medical texts, such as diseases, drugs, symptoms, and anatomical parts. This task provides robust support for clinical decision-making, medical information integration, and medical record management. However, existing research on Chinese medical named entity recognition has not fully addressed the complexity of medical texts, which are characterized by the abundance of specialized terminology, limited embedding diversity, and insufficient utilization of semantic information. To address these issues, this paper proposes a Chinese medical named entity recognition model that integrates multi-granularity features. The model first employs the BERT pre-trained model to generate character embeddings for the text. It then uses both one-dimensional and two-dimensional convolutional neural networks to extract character shape and stroke features, while external lexicons are incorporated to introduce word-level features, enhancing the representation of word and entity boundaries. Additionally, sentence-level features are included to capture global semantic information. A cross-attention mechanism is utilized to iteratively fuse these multi-granularity features, resulting in embeddings enriched with deep semantic information. Finally, conditional random fields (CRF) are used to output the entity recognition results. Experimental results on the CCKS2017 and CCKS2019 datasets demonstrate that the proposed model achieves F1 scores of 92.88% and 87.86%, respectively, outperforming mainstream models in recognition performance.
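    A minimal cross-attention fusion of two feature granularities might look like the sketch below (character-level queries attending over word-level keys and values); the dimensions and single fusion step are illustrative, not the paper's full iterative design.

```python
# Minimal cross-attention fusion of two feature granularities: character-level
# queries attend over word/lexicon-level keys and values, followed by a
# residual connection and layer normalization.
import torch
import torch.nn as nn

class CrossGranularityFusion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, char_feats, word_feats):
        # char_feats: (B, Lc, dim) queries; word_feats: (B, Lw, dim) keys/values
        fused, _ = self.attn(char_feats, word_feats, word_feats)
        return self.norm(char_feats + fused)     # residual + norm

fusion = CrossGranularityFusion()
out = fusion(torch.randn(2, 10, 128), torch.randn(2, 6, 128))
print(out.shape)   # torch.Size([2, 10, 128])
```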
  • LI Hongbang, DONG Li, WANG Rangding, YAN Diqun, LI Yuanman, LIAO Xin
    Accepted: 2025-05-14
    As an efficient means of information storage and transmission, QR codes are widely used in payment, advertising, and logistics. However, the robustness and concealment of existing QR code steganography techniques in noisy and disturbed environments are still insufficient for high-security information transmission. To this end, a robust QR code steganography algorithm based on path planning and pixel flipping is proposed. The algorithm treats the QR code as a maze and selects the pixels to be flipped using a path planning algorithm, enhancing the anti-interference ability of the secret information without affecting normal recognition of the QR code. In terms of implementation, a path planning algorithm is first designed to select the optimal pixel points and reduce the impact of information embedding on QR code image quality; second, the secret information is embedded via pixel flipping, and its performance is analyzed under different noise conditions. Experiments test the algorithm under typical interference scenarios such as noise attacks, image perturbation, and physical distortion, using embedding capacity, image quality, and information recovery rate as evaluation indexes. The results show that the algorithm has significant advantages in improving the robustness and concealment of QR code steganography, making it suitable for scenarios with high information security requirements and offering new directions for the further development of QR code steganography techniques.
  • LIN Shu, HUANG Jiawei, SHAO Jing, LI Sitan, LIANG Qi, WANG Qile, ZHAO Yilin
    Accepted: 2025-05-14
    In the fields of network communication and traffic management, the ability to quickly and accurately identify heavy flows is of great significance for tasks such as congestion control and malicious traffic monitoring. However, the extremely high transmission rates of data flows in real-world network environments make heavy flow detection highly complex and challenging. Most existing heavy flow detection methods rely primarily on single-dimensional statistical data, typically using only flow size estimation to perform traffic statistics and analysis. The limitation of these approaches lies in their neglect of other critical dimensions of information, such as the distribution characteristics of packet intervals, which may play a key role in accurately identifying heavy flows. To address these issues, this paper proposes a novel heavy flow detection algorithm called IntervalSketch. The algorithm introduces two key traffic features: flow size estimation and packet interval distribution characteristics. By leveraging these two dimensions, IntervalSketch optimizes the protection of heavy flows and the replacement of small flows. Specifically, by incorporating the packet interval distribution, IntervalSketch effectively distinguishes between heavy flows and small flows, thereby significantly improving detection performance under low-memory conditions. To evaluate the accuracy and effectiveness of IntervalSketch, two real-world network traffic datasets, CAIDA and MAWI, were used for experimental analysis. The results demonstrate that IntervalSketch exhibits significant advantages across various memory configurations and traffic distributions. Compared to existing methods, IntervalSketch not only maintains high detection accuracy in memory-constrained environments but also achieves substantial improvements in F1 score, with gains of up to 2.4 times over current state-of-the-art designs.
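    A simplified interval-aware tracker is sketched below: each hashed bucket keeps a flow's packet count and an exponential moving average of its packet intervals, and eviction considers both; the replacement rule here is an assumption and not IntervalSketch's actual design.

```python
# Simplified interval-aware heavy-flow tracker: each hashed bucket keeps a flow
# id, a packet count, the last timestamp, and an EMA of packet intervals. On a
# collision, the incumbent is only evicted if its count is low AND its interval
# EMA is large (small-flow-like). The replacement rule is an assumption.
import zlib

class IntervalAwareTracker:
    def __init__(self, n_buckets=1024, alpha=0.3):
        self.buckets = [None] * n_buckets      # (flow_id, count, last_ts, interval_ema)
        self.alpha = alpha

    def update(self, flow_id, ts):
        i = zlib.crc32(flow_id.encode()) % len(self.buckets)
        b = self.buckets[i]
        if b is None:
            self.buckets[i] = (flow_id, 1, ts, 0.0)
        elif b[0] == flow_id:
            gap = ts - b[2]
            ema = self.alpha * gap + (1 - self.alpha) * b[3]
            self.buckets[i] = (flow_id, b[1] + 1, ts, ema)
        elif b[1] < 4 and b[3] > 1.0:
            # incumbent looks like a small flow: few packets, sparse intervals
            self.buckets[i] = (flow_id, 1, ts, 0.0)

    def heavy_flows(self, threshold):
        return [(b[0], b[1]) for b in self.buckets if b and b[1] >= threshold]

t = IntervalAwareTracker(n_buckets=8)
for k, ts in [("f1", 0.0), ("f1", 0.1), ("f2", 0.5), ("f1", 0.2), ("f1", 0.3)]:
    t.update(k, ts)
print(t.heavy_flows(threshold=3))
```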