Just accepted

  • Zhang Hang, Wang Jinsong
    Accepted: 2025-06-13
    For user devices (UDs) with limited computing resources, handling computation-intensive tasks is challenging. Edge computing helps by extending computational resources to the network edge, and efficient task offloading is one of its key enabling functions. Coordinating the computational resources of numerous edge nodes for task offloading while ensuring data security during offloading is a significant challenge. Therefore, a secure task offloading method based on deep reinforcement learning (DRL) is proposed. First, an edge computing network model is constructed, and a variable security protection mechanism is designed to adaptively ensure data security. Then, the edge computing network model and objectives are formalized and transformed into a Markov decision process (MDP). Finally, a DRL method based on a penalized action space is proposed to derive the optimal task offloading strategy. Simulation results show that the proposed method reduces latency and energy consumption costs while ensuring security protection, and consistently maintains a zero task loss rate.
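The abstract does not define the penalized action space. The following is a minimal Python sketch of one common reading, assuming each action is an offloading target (local execution or an edge node) and that targets without enough free CPU cycles for the task receive a large penalty before greedy selection. `feasibility_mask`, `penalized_q`, `select_action`, and `INFEASIBLE_PENALTY` are hypothetical names, not the authors' code.

```python
import random

# Hypothetical penalty subtracted from the Q-value of an infeasible target (assumption).
INFEASIBLE_PENALTY = 1e6


def feasibility_mask(task_cycles, free_cycles_per_target):
    """True where a target (local CPU or edge node) has enough free cycles for the task."""
    return [free >= task_cycles for free in free_cycles_per_target]


def penalized_q(q_values, feasible, penalty=INFEASIBLE_PENALTY):
    """Subtract a large penalty from every infeasible action's Q-value."""
    return [q - (0.0 if ok else penalty) for q, ok in zip(q_values, feasible)]


def select_action(q_values, feasible, epsilon=0.0, rng=None):
    """Epsilon-greedy selection that never explores infeasible targets."""
    rng = rng or random.Random(0)
    candidates = [i for i, ok in enumerate(feasible) if ok]
    if not candidates:
        raise ValueError("no feasible offloading target")
    if rng.random() < epsilon:
        return rng.choice(candidates)
    scores = penalized_q(q_values, feasible)
    return max(range(len(scores)), key=scores.__getitem__)


# Target 0 = local, 1 and 2 = edge nodes; node 1 lacks capacity for this task.
mask = feasibility_mask(5e8, [1e9, 2e8, 3e9])
print(mask, select_action([0.2, 0.9, 0.5], mask))  # [True, False, True] 2
```

Penalizing rather than deleting infeasible actions keeps the network's output size fixed, which is convenient when the set of usable edge nodes changes from step to step.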
  • WANG Guanyu, GU Yijun
    Accepted: 2025-06-11
    In the field of malicious encrypted traffic classification, existing algorithms enrich the learned discriminative representations by increasing the dimensionality of traffic features. However, challenges persist, such as a mismatch between the selected models and the characteristics of malicious encrypted traffic data, insufficient feature selection, and a lack of in-depth analysis of the characteristics of encrypted traffic data. To address these issues, a classification method based on multi-representation fusion is proposed for IoT malicious encrypted traffic. On one hand, an abstract representation learning module learns packet-level byte association representations and session statistical representations of traffic sessions. On the other hand, a plaintext representation learning module learns session connection representations from the unencrypted plaintext. Finally, the classification results of the two modules are fused based on the confidence scores of the abstract representation learning module to obtain the final malicious traffic classification result. To validate the method's effectiveness, its performance is compared with 7 benchmark methods built on different techniques. The method achieves an F1 score of 0.7694, significantly outperforming the existing benchmark methods. Additionally, to examine how well each module suits traffic representation learning and whether the discriminative representations contained in the selected features complement each other, 10 variant models with different inputs and model architectures are constructed and compared. The results demonstrate that the proposed method has superior detection performance, confirming both the suitability of the model architecture and the complementarity between the representations.
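The confidence-based fusion step can be sketched in Python as below. The gating threshold of 0.9 and the confidence-weighted blend used below the threshold are assumptions for illustration; the abstract only states that fusion is driven by the abstract module's confidence.

```python
def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def fuse_predictions(abstract_probs, plaintext_probs, threshold=0.9):
    """Confidence-gated fusion of two class-probability vectors.

    If the abstract module is confident enough, trust it outright; otherwise blend
    both vectors, weighting the abstract module by its own confidence."""
    confidence = max(abstract_probs)
    if confidence >= threshold:
        return argmax(abstract_probs)
    fused = [confidence * a + (1.0 - confidence) * p
             for a, p in zip(abstract_probs, plaintext_probs)]
    return argmax(fused)


print(fuse_predictions([0.95, 0.05], [0.1, 0.9]))  # 0: abstract module is confident
print(fuse_predictions([0.55, 0.45], [0.1, 0.9]))  # 1: plaintext evidence tips the blend
```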
  • SHEN Xianhao, GU Ling, CHEN Yi, YANG Jiazhi
    Accepted: 2025-06-06
    With the accelerated integration of renewable energy into the grid and the intelligent transformation of the new power system, the Power Internet of Things (PIoT) has become key to realizing intelligent power systems. However, Power Internet of Things Devices (PIoTD) in remote areas face numerous challenges, including inadequate network coverage, limited energy harvesting, and poor communication conditions. To address these issues, an artificial-intelligence-based cloud-edge-device cooperation framework is proposed, which employs Unmanned Aerial Vehicle Simultaneous Wireless Information and Power Transfer (UAV-SWIPT) to provide continuous energy to energy-constrained PIoTD. By deploying SWIPT services on UAVs in the low-altitude layer of the space-air-ground (SAG) network, the framework provides energy replenishment and communication relaying for SAG-PIoT devices. Furthermore, to optimize multi-UAV collaboration and jointly improve data relaying, transmission power allocation, Global Energy Efficiency (GEE), and PIoTD association scheduling, a multi-agent deep reinforcement learning algorithm is introduced to tackle incomplete global information and high-dimensional variable coupling in dynamic environments. Simulation results show that the proposed algorithm converges faster and achieves higher energy efficiency than several benchmark algorithms. In terms of maximizing the minimum transmission rate, MADDPG achieves the highest performance, reaching bits/s. Additionally, GEE is highest when the SWIPT power splitting ratio is approximately 0.7.
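To make the power splitting ratio and GEE concrete, here is a simplified Python sketch of a standard power-splitting SWIPT receiver: a fraction `rho` of the received power goes to energy harvesting with conversion efficiency `eta`, and the remaining `1 - rho` goes to information decoding. The single noise term, `eta = 0.5`, and the function names are illustrative assumptions, not the paper's system model.

```python
import math


def ps_swipt(received_power_w, rho, bandwidth_hz, noise_w, eta=0.5):
    """Power-splitting receiver: returns (harvested power in W, achievable rate in bit/s)."""
    harvested_w = eta * rho * received_power_w
    snr = (1.0 - rho) * received_power_w / noise_w
    rate_bps = bandwidth_hz * math.log2(1.0 + snr)
    return harvested_w, rate_bps


def global_energy_efficiency(rates_bps, tx_powers_w, circuit_power_w):
    """GEE = total achievable rate / total consumed power, in bit/J."""
    return sum(rates_bps) / (sum(tx_powers_w) + circuit_power_w)


for rho in (0.3, 0.7):
    harvested, rate = ps_swipt(1e-6, rho, 1e6, 1e-9)
    print(f"rho={rho}: harvested={harvested:.2e} W, rate={rate:.3e} bit/s")
```

Raising `rho` trades decoding SNR for harvested energy; sweeping `rho` is one simple way to look for an optimum like the one the abstract reports, although the actual optimum depends on the paper's full model.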
  • YUAN Lining, FENG Wengang, LIU Zhao
    Accepted: 2025-06-06
    To address the neglect of relational information in current academic paper classification methods, we propose a novel classification model that integrates Graph Convolutional Networks (GCN) with contrastive learning, called Contrastive Graph Convolutional Network (CGCN). First, we define two distinct types of homogeneous-heterogeneous relational information based on the content and citations of papers and transform them into self-supervised signals for constructing the contrastive loss. Second, we enhance the feature extraction of GCN with the contrastive loss, pulling homogeneous papers close to one another while keeping heterogeneous papers apart. Third, we use cross-entropy loss and the softmax function to complete end-to-end academic paper classification. On three benchmark academic datasets, CGCN outperforms advanced baselines on the classification task. On the Cora dataset, Micro-F1 and Macro-F1 improve by 8.29% and 7.91%, respectively, compared with the original GCN. By employing a contrastive loss based on homogeneous-heterogeneous relationships, CGCN better represents latent information in papers, thereby improving prediction accuracy and generalization. This approach provides new ideas and methods for research on academic paper classification.
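The contrastive objective can be illustrated with an InfoNCE-style loss over cosine similarities, a common choice assumed here; CGCN's exact loss, temperature `tau`, and sampling of homogeneous and heterogeneous papers may differ, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contrastive_loss(anchor, positives, negatives, tau=0.5):
    """InfoNCE-style loss averaged over positives.

    positives: embeddings of homogeneous papers (to be pulled close)
    negatives: embeddings of heterogeneous papers (to be pushed apart)"""
    neg_sum = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    losses = []
    for p in positives:
        pos = math.exp(cosine(anchor, p) / tau)
        losses.append(-math.log(pos / (pos + neg_sum)))
    return sum(losses) / len(losses)


# Loss is small when the homogeneous paper is already close to the anchor.
print(contrastive_loss([1.0, 0.0], [[0.9, 0.1]], [[-1.0, 0.0]]))
print(contrastive_loss([1.0, 0.0], [[-1.0, 0.0]], [[0.9, 0.1]]))
```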
  • CHEN Haixiu, CHEN Ziang, FANG Weizhi, LU Haitao, HUANG Zijie, CHENG Rong
    Accepted: 2025-06-05
    Dense pedestrian detection is one of the key problems in developing crowd flow monitoring systems for large public places. To address the difficulty of detecting small targets under crowd occlusion in dense pedestrian scenes, as well as the need for lightweight deployable models, this paper proposes CAD-YOLO (CGDown-Adaptive Fusion Module-Dyhead), an improved YOLOv8-n model for dense pedestrian detection. The embedded CGDown downsampling module uses an efficient context extraction mechanism to alleviate the tendency of conventional detectors to lose contextual features in dense scenes, significantly enhancing the capture of dense pedestrian features and the focus on small targets. A BiFPN-Adaptive structure is designed to reconstruct the neck network. By adaptively fusing feature information at different scales, the model extracts features of occluded pedestrians and of small and medium-sized pedestrians more accurately, while greatly reducing the number of parameters and the computational cost. The dynamic detection head Dyhead, combined with a new 160×160 small-target detection layer, enables the model to capture fine features of dense small-target regions more accurately, effectively alleviating missed detections in occluded scenes. Experimental results show that, compared with YOLOv8-n, CAD-YOLO improves detection accuracy by 5.1% on the CrowdHuman dataset and 2.1% on the WiderPerson dataset. Despite this significant improvement, CAD-YOLO has only 2.9M parameters and a computational cost of 12.3 GFLOPs, meeting the low-power, high-precision requirements of deployment on edge or mobile devices.
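The abstract does not give the fusion rule of BiFPN-Adaptive. For reference, the original BiFPN (EfficientDet) uses fast normalized fusion, O = Σ_i w_i I_i / (ε + Σ_j w_j), with ReLU-clamped learnable weights. The Python sketch below shows that rule on flattened feature maps, under the assumption that the adaptive variant builds on something similar.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse equally sized (flattened) feature maps with learnable scalar weights.

    Weights pass through ReLU so they stay non-negative, then are normalized by
    their sum plus eps, as in BiFPN's fast normalized fusion."""
    w = [max(0.0, x) for x in weights]
    total = sum(w) + eps
    size = len(features[0])
    if any(len(f) != size for f in features):
        raise ValueError("feature maps must be resized to the same shape first")
    return [sum(wi * f[k] for wi, f in zip(w, features)) / total for k in range(size)]


# A high-resolution map and an upsampled coarser map (both flattened), equally weighted.
print(fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))
```

Compared with softmax-normalized weights, this avoids exponentials while still keeping each weight in [0, 1], which suits lightweight models targeting edge devices.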
  • LIU Tao, Man Dapeng, XU Chen, LV Jiguang, FENG Zhu, ZENG Fanyi, ZHOU Xue, YANG Wu
    Accepted: 2025-06-05
    Conventional clean-label backdoor attacks often fail to establish a strong link between the trigger and the target class, resulting in a low attack success rate, and extensive experiments show that this failure is even more severe in federated learning. The main reason is that a randomly selected trigger lacks a direct connection with the target class. To this end, a learnable-trigger backdoor attack is designed for federated learning. It makes full use of the task information and the shared model issued by the central server to train a trigger strongly correlated with the target class, formalizing this training process as a bi-objective optimization problem and solving it. First, the optimal perturbation under constraints is found to blur the original features of the images as much as possible, thereby maximizing the model's ability to learn the trigger; then, triggers constrained within a specified range are added to these blurred images as inputs, and their image classification loss is minimized, quickly generating the optimal trigger via mini-batch projected gradient descent. Backdoor attacks activated with this trigger still achieve excellent attack performance in federated learning. Experimental results on three datasets show that the attack success rate of the proposed method in federated learning is much higher than that of existing clean-label backdoor attacks of all kinds, especially on CIFAR-10, where it improves by more than 82% over the baseline method. The proposed attack poses new challenges to the security of federated learning.
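The trigger-generation step relies on mini-batch projected gradient descent under an L-infinity budget. A minimal pure-Python sketch of that optimizer follows; the toy quadratic loss stands in for the real classification loss, and `pgd_trigger`, `toy_grad`, `TARGET`, and the budget `eps` are hypothetical.

```python
def sign(x):
    return (x > 0) - (x < 0)


def project_linf(delta, eps):
    """Clip every component of the trigger into [-eps, eps]."""
    return [min(eps, max(-eps, d)) for d in delta]


def pgd_trigger(batches, grad_fn, dim, eps=0.1, step=0.01, iters=50):
    """Mini-batch projected (signed) gradient descent on an additive trigger.

    grad_fn(x, delta) must return d loss / d delta for one input x."""
    delta = [0.0] * dim
    for t in range(iters):
        batch = batches[t % len(batches)]
        grads = [grad_fn(x, delta) for x in batch]
        avg = [sum(g[k] for g in grads) / len(grads) for k in range(dim)]
        delta = [d - step * sign(g) for d, g in zip(delta, avg)]  # descend the loss
        delta = project_linf(delta, eps)                          # stay inside the budget
    return delta


# Toy stand-in for the classification loss: ||x + delta - TARGET||^2.
TARGET = [0.5, -0.5]


def toy_grad(x, delta):
    return [2.0 * (xi + di - ti) for xi, di, ti in zip(x, delta, TARGET)]


batches = [[[0.0, 0.0], [0.1, -0.1]], [[0.05, 0.0]]]
print(pgd_trigger(batches, toy_grad, dim=2))  # clipped at the budget: [0.1, -0.1]
```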
  • Li Junliang, Ma Junpeng, Liu Mengxuan, Liu Yuxue, Zhang Junsan
    Accepted: 2025-06-03
    Medical report generation from images is challenging because of low image contrast and the small size of abnormal regions, which make it difficult to capture abnormal features accurately from visual information alone. Introducing external knowledge to enhance visual representations therefore becomes a key issue. In addition, the co-occurrence patterns of abnormal features are complex and cannot be effectively captured from a single instance, so leveraging similar cases to model such patterns is crucial. To address these challenges, a Similar-Instance Guided method for medical report generation is proposed, consisting of two main components: the Image Feature Memory Module Incorporating Heterogeneous Graphs (FMHG) and the Similar Instance Feature Fusion Module (SIFF). FMHG extracts entity relationships from the report and constructs a corresponding heterogeneous graph as a bridge, guiding the model's attention to abnormal image regions and thus enhancing abnormal visual features. SIFF retrieves similar instances and integrates their abnormal visual features, augmenting the representation of abnormal regions while acquiring a more comprehensive understanding of the abnormal information. Experiments on the IU X-ray and MIMIC-CXR medical imaging datasets show that the proposed method performs well on the BLEU metrics, achieving BLEU-1 to BLEU-4 scores of 0.539, 0.353, 0.265, and 0.193, respectively, on the IU X-ray dataset. It also excels on the METEOR and ROUGE-L metrics, indicating that the proposed method outperforms existing methods in terms of NLG metrics as well as the accuracy and completeness of the generated reports.
  • Hu Wei, Chen Yuner, Du Puliang
    Accepted: 2025-06-03
    To address the low efficiency of Variational Mode Decomposition (VMD) parameter optimization in current short-term electricity price prediction methods, the insufficient feature expression ability of single prediction models, and feature redundancy, this paper proposes a short-term electricity price prediction method based on a Multi-Strategy Improved Crested Porcupine Optimizer (MSICPO) algorithm and deep learning. First, the Crested Porcupine Optimizer (CPO) algorithm is improved by introducing a Lévy flight strategy, periodic population variation, and a dynamic parameter adjustment mechanism to enhance its global search ability and convergence speed. The improved algorithm is used to optimize the number of modes and the penalty factor of VMD to improve decomposition accuracy. Second, a deep learning model with feature weighting is constructed. A dynamic weighting module suppresses noise interference and strengthens the influence of key features; combined with the long-term dependency modeling of sLSTM and the parallel computing advantage of the Transformer, this achieves collaborative optimization of multi-scale features. Finally, the MSICPO-VMD-WF-sLSTM-Transformer hybrid model is constructed for electricity price prediction. Experimental results show that, compared with the original CPO algorithm and other traditional optimization algorithms, MSICPO achieves a better balance between solution precision and optimization efficiency in VMD parameter optimization. The proposed hybrid model performs well in prediction accuracy, with a coefficient of determination (R²) of 0.95. In addition, cross-regional prediction experiments further verify the applicability and generalization ability of the model in different regional electricity markets.
    The method proposed in this paper not only provides theoretical references for improving intelligent optimization algorithms and multi-feature prediction techniques, but also offers a high-precision, well-generalizing solution for short-term electricity price prediction in complex electricity markets.
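MSICPO's exact update rules are not given in the abstract. The Lévy flight component is commonly implemented with Mantegna's algorithm; the Python sketch below assumes that choice, and `beta = 1.5`, the step scale `alpha`, and the move scaled by the distance to the current best solution are illustrative assumptions. For VMD, the mode number K would additionally need rounding to an integer, and both parameters clipping to their search bounds.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """Draw one Lévy-distributed step with Mantegna's algorithm."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_move(position, best, alpha=0.01, beta=1.5, rng=random):
    """Perturb a candidate (e.g. [K, penalty factor]) by Lévy steps scaled by its distance to the best."""
    return [x + alpha * levy_step(beta, rng) * (x - b) for x, b in zip(position, best)]


rng = random.Random(0)
print(levy_move([5.0, 2000.0], [4.0, 1500.0], rng=rng))
```

Heavy-tailed Lévy steps are mostly small with occasional long jumps, which is why they are used to help a population escape local optima.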
  • GENG Xia, LIN Xianwen, YANG Zhi
    Accepted: 2025-06-03
    In text-based person search, initializing models with parameters from pre-trained models has become the mainstream paradigm, effectively alleviating the feature alignment bottleneck that single-modal models face due to the lack of cross-modal information. Existing methods focus on mining semantic features at different scales in the joint image-text embedding space for optimization. However, introducing a new alignment paradigm can easily cause the pre-trained model to fall into a local minimum during fine-tuning. To solve this issue, this paper proposes a Prompt-based Information Transfer (PIT) framework. By introducing cross-modal prompt tokens into the original forward pass of the single-modal encoders and the cross-modal image-text encoder, PIT promotes early feature fusion and implicitly guides the model to focus on modality-invariant information. PIT includes a prompt-based contrastive loss and a prompt training strategy. The prompt-based contrastive loss constructs a shared feature embedding space with both intra-modal discrimination and inter-modal semantic consistency by constraining the similarity between image and text features. The prompt training strategy can be regarded as a form of self-distillation: it treats the pseudo-targets generated from non-prompt features, together with the ground truth, as another view of the image-text pair, supervising training so that the learned embeddings contain richer multi-modal information. With only 0.61M additional parameters introduced on top of fine-tuning, PIT achieves Rank-1 improvements of 1.48%, 1.5%, and 1.55% on three public datasets, respectively.
  • GU Yingshuang, GUI Tao, ZHANG Qi
    Accepted: 2025-06-03
    Factual hallucination in large language models (LLMs) refers to the generation of content that conflicts with established real-world facts; it significantly reduces model credibility and applicability in high-risk domains such as healthcare, law, and scientific research. Current hallucination mitigation methods primarily depend on input optimization, supervised learning, or integration with external knowledge bases. However, these approaches exhibit limited generalizability, heavy dependence on large labeled datasets, and constraints in real-time scenarios, making it difficult to fundamentally improve the factual accuracy of LLMs. To address these limitations, this paper proposes a reinforcement learning framework that uses semantic entropy as feedback to mitigate factual hallucinations. Semantic entropy measures uncertainty at the semantic level, enabling an accurate assessment of the model's confidence in its generated responses. By embedding semantic entropy into the reinforcement learning process as a reward signal, the model is encouraged to proactively avoid responses with a high likelihood of hallucination. Compared with traditional predictive-entropy-based methods, semantic entropy better accounts for semantically equivalent expressions and enhances factual judgment without relying on external knowledge sources. Experimental results show that, while maintaining the richness and coherence of the generated content, the method improves factual judgment accuracy by up to 5.7% and factual generation accuracy by up to 7.8% compared with the best baseline model, validating its effectiveness in mitigating factual hallucinations.
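Semantic entropy groups sampled answers into meaning-equivalent clusters and takes the entropy over those clusters. Below is a minimal count-based Python sketch. In the original semantic-entropy formulation, clusters come from bidirectional entailment judged by an NLI model; here a crude normalize-and-compare `canonicalize` function stands in purely for illustration, and using negative entropy directly as the reward is likewise an assumed reward shape.

```python
import math
from collections import Counter


def canonicalize(answer):
    """Crude stand-in for entailment-based clustering: case- and whitespace-insensitive match."""
    return " ".join(answer.lower().split())


def semantic_entropy(answers, key=canonicalize):
    """Entropy (in nats) of the empirical distribution over meaning clusters."""
    clusters = Counter(key(a) for a in answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in clusters.values())


def entropy_reward(answers):
    """Confident (low-entropy) answer sets earn a higher reward."""
    return -semantic_entropy(answers)


samples = ["Paris", "paris", " Paris ", "Lyon"]
print(semantic_entropy(samples), entropy_reward(samples))
```

Token-level predictive entropy would count "Paris" and "paris" as different outputs; clustering by meaning first is what keeps paraphrases from inflating the uncertainty estimate.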
  • ZHANG Lei, LI Shihua, GAO Hao, WANG Xiaoyong
    Accepted: 2025-05-26
    With the escalating energy consumption of urban rail transit systems, making better use of regenerative braking energy to reduce train operation energy consumption has become a critical issue. This paper focuses on optimizing the operation control strategy of the tracking train during multi-train cooperative operation. First, building on the traditional operation-mode transition strategy, a “Traction-Coasting-Traction-Cruising-Coasting-Braking” strategy is proposed specifically for the tracking operation scenario. Second, a spatial-domain train dynamics model, state transition equation, and energy consumption model are constructed. Using interpolation, the time-domain cooperative operation problem is transformed into the problem of solving for optimal switching points in the spatial domain. Subsequently, an optimization decision model targeting energy consumption and punctuality is constructed and solved efficiently with the Dung Beetle Optimizer. Finally, taking the Yizhuang Line of Beijing Subway as the simulation line, comparative analyses evaluate how the Communication-Based Train Control (CBTC) and Train Autonomous Control System (TACS) architectures, as well as different transition strategies, influence optimization performance. The results demonstrate that TACS significantly enhances the optimization performance of cooperative operation compared with CBTC. The proposed strategy not only meets the punctuality requirement but also outperforms the traditional strategy in energy consumption at various departure intervals. The net absorbed energy can be increased by up to 14.651 kWh, and the actual operational energy consumption can be decreased by up to 11.284 kWh.
    Therefore, the proposed operation-mode transition strategy and optimization method effectively reduce the energy consumption of train operation and offer a useful reference for the development of urban rail train operation control technology. The code has been published on GitHub: https://github.com/eva-777/Tracking-Train-Operation-Optimization.git.
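Because the model is formulated in the spatial domain, running time (and hence punctuality) has to be recovered from a speed profile sampled at positions. A small Python sketch follows, assuming constant acceleration within each segment so that each segment takes Δs divided by its mean speed; the function names and the punctuality tolerance are hypothetical, not taken from the linked repository.

```python
def running_time(positions_m, speeds_mps):
    """Trip time (s) from speeds sampled at increasing positions along the line."""
    total_s = 0.0
    for i in range(len(positions_m) - 1):
        ds = positions_m[i + 1] - positions_m[i]
        v_mean = (speeds_mps[i] + speeds_mps[i + 1]) / 2.0
        if v_mean <= 0.0:
            raise ValueError("train must be moving within every segment")
        total_s += ds / v_mean  # exact for constant acceleration within the segment
    return total_s


def is_punctual(actual_s, scheduled_s, tolerance_s=5.0):
    """Check the running time against the timetable within a tolerance."""
    return abs(actual_s - scheduled_s) <= tolerance_s


# Accelerate 0 -> 20 m/s over 100 m, cruise 400 m, brake to a stop over 100 m.
t = running_time([0, 100, 500, 600], [0, 20, 20, 0])
print(t, is_punctual(t, 42.0))  # 40.0 True
```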
  • Zukun Wan, Runming Wang, Tianming Ma, Xingdong Song, Shengrong Yuan, Yajun Ding
    Accepted: 2025-05-23
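    Visual Question Answering (VQA), which understands and parses an input image and its corresponding text question to provide a relevant natural-language answer, has become a promising research direction in cross-modal analysis. Existing work depends heavily on certain dataset factors, such as spurious correlations, dataset bias, and shortcut learning, all of which pose great challenges to algorithm robustness. Existing ensemble-learning-based methods train a bias model to capture dataset bias, but because the bias model is insufficiently able to identify biased samples, it struggles to fully learn the bias information, which weakens the debiasing effect. To enhance the bias model's ability to learn dataset bias, this paper proposes an Adaptive Bias Learning Network (ABLNet) for the VQA task. The core designs of ABLNet are as follows. First, an adaptive sample reweighting mechanism dynamically assigns weights based on each sample's gradient information, strengthening the model's learning of bias features in the dataset and improving its generalization ability. Second, a network pruning strategy based on restricted learning limits the learning capacity of the bias model so that it relies on superficial correlations and bias features in the dataset. Extensive experiments on the challenging VQA datasets VQA-CPv1, VQA-CPv2, and VQA-v2 demonstrate the superiority of our method.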
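The abstract says ABLNet weights each sample by its gradient information but not how. A minimal Python sketch of one plausible rule follows, assuming a softmax cross-entropy bias model: the gradient of the loss with respect to the logits is p - y, and each sample's weight is its gradient norm divided by the batch mean. Whether ABLNet gives larger or smaller weights to larger gradients is not stated; this sketch assumes larger, and the function names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_weights(batch_logits, labels):
    """Per-sample weights from ||p - onehot(y)||, the cross-entropy gradient w.r.t. the logits.

    Weights are normalized to have mean 1 over the batch."""
    norms = []
    for logits, y in zip(batch_logits, labels):
        p = softmax(logits)
        grad = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        norms.append(math.sqrt(sum(g * g for g in grad)))
    mean = sum(norms) / len(norms)
    return [n / mean for n in norms] if mean > 0 else [1.0] * len(norms)


# Sample 0 is fit well (small gradient); sample 1 is confidently wrong (large gradient).
print(gradient_weights([[5.0, 0.0], [0.0, 5.0]], [0, 0]))
```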
  • CAO Xiaofei, WANG Runmin, CUI Lingxin, CHAI Xinling, Ding Yajun, Han Chang
    Accepted: 2025-05-23
    Breast ultrasound image segmentation plays a significant role in computer-aided diagnosis, but existing methods are constrained by the scarcity of annotated data. In recent years, generative models have shown potential in medical image synthesis, yet current approaches struggle to ensure both image realism and semantic consistency with the masks. To address the performance bottleneck of segmentation models caused by the limited scale of ultrasound image datasets, this paper proposes a novel ultrasound image dataset augmentation method. First, from a pathological perspective, we design a mask generation module based on the characteristics of benign and malignant tumors, which efficiently generates multiple semantically plausible masks. Next, to synthesize ultrasound images corresponding to these masks, we propose a Mask-guided Diffusion Model (MDM). This model injects mask information into the denoising network of the diffusion model through normalization, generating ultrasound images that are highly consistent with the masks. Experimental results demonstrate that the proposed method significantly outperforms mainstream generative models in image fidelity (FID) and semantic alignment (mIoU). Experiments that incrementally add generated data show that segmentation performance improves markedly as the data volume grows, demonstrating the effectiveness of the synthesized data.
  • Kai Chen, Zhihua Chen, Lei Dai
    Accepted: 2025-05-22
    The Multi-Agent Deep Deterministic Policy Gradient algorithm (MADDPG) alleviates environmental non-stationarity by introducing global information when solving multi-agent path planning problems. However, in complex environments, multi-agent reinforcement learning algorithms still suffer from sparse rewards and low levels of agent collaboration. To solve these problems, a multi-agent path planning algorithm based on state-action prediction (SA-MADDPG) is proposed. In SA-MADDPG, a Novelty Reward Module based on a Long Short-Term Memory network is designed, which gives agents novelty rewards without relying on current observations and actions, alleviating reward sparsity. In addition, an Action Prediction Module that explicitly incorporates collaborative information is designed, together with a dynamic weight term based on Q-value gain that guides each agent in balancing the optimization of its own task strategy against that of the collaborative task strategy, thereby enhancing collaboration among agents. Finally, a three-dimensional, drone-based multi-agent path planning simulation environment is constructed to comprehensively evaluate the proposed algorithm. Experimental results show that, in the obstacle density experiment, SA-MADDPG increases the average reward by 5.26%-15.81% and decreases the average episode time by 10.96%-16.05%; in the agent number experiment, it increases the average reward by 16.32%-22.9% and decreases the average episode time by 15.03%-25.15%.
  • TIAN Qing, SHEN Junyu, YU Jiangsen
    Accepted: 2025-05-22
    Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain to improve model performance on the target domain. However, traditional UDA methods assume that the category spaces of the source and target domains are identical, so they cannot handle unknown categories in the target domain, which restricts their application in real-world scenarios. Open-Set Domain Adaptation (OSDA) addresses this issue by introducing recognition of unknown categories, but effectively reducing inter-domain discrepancies and category imbalance remains a significant challenge. Existing OSDA methods often overlook domain-specific features and simply minimize domain discrepancies, which can blur the boundaries between categories and weaken the model's generalization ability. To address this problem, this paper proposes Open-Set Domain Adaptation with Optimal Transport Distance Regularization and Neighborhood Clustering (OTRNC). The method maximizes the distribution distance between high- and low-confidence sample sets using optimal transport distance regularization, thereby reducing the interference of unknown categories in the domain adaptation process. Subsequently, dynamic nearest-neighbor retrieval and invariant feature learning are employed to reduce intra-class variation within the target domain, enhancing feature generalization. Experimental results show that OTRNC performs well across multiple benchmark datasets.
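OTRNC's regularizer relies on an optimal transport distance, which in practice is often approximated with entropic regularization and Sinkhorn iterations; that approximation is assumed here. The pure-Python sketch below computes the distance between two small weighted point sets. In OTRNC this distance would be computed between high- and low-confidence sample sets and then maximized, which is not shown; the names, regularization strength `reg`, and iteration count are illustrative.

```python
import math


def sinkhorn(a, b, cost, reg=0.5, iters=200):
    """Entropic OT between histograms a and b with cost matrix cost.

    Returns (transport plan, transport cost sum_ij P_ij * C_ij)."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    distance = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, distance


def sq_cost(xs, ys):
    return [[(x - y) ** 2 for y in ys] for x in xs]


weights = [0.5, 0.5]
_, d_same = sinkhorn(weights, weights, sq_cost([0.0, 1.0], [0.0, 1.0]))
_, d_shift = sinkhorn(weights, weights, sq_cost([0.0, 1.0], [2.0, 3.0]))
print(d_same, d_shift)
```

The entropic term biases the result slightly upward (the shifted pair gives about 4.12 rather than the exact OT cost of 4); a smaller `reg` gets closer to the exact value but needs more iterations and risks numerical underflow in `K`.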
  • Gao Lingping, Xu Wei, Chen Xi, Mu Yibo, Zhang Kai
    Accepted: 2025-05-22
    As software scale and complexity grow exponentially, monitoring and analyzing program runtime behavior has become increasingly challenging. Dynamic binary instrumentation is an effective solution to this problem, and mature tools such as Pin and Valgrind support mainstream architectures such as x86 and ARM. However, these tools lack support for emerging domestic instruction set architectures such as LoongArch. LoongArch, an instruction set architecture independently developed in China, offers a high degree of autonomy, advanced design, and compatibility. Nevertheless, because of its relatively short development history, its ecosystem remains incomplete, particularly the debugging toolchain. To fill this gap and promote the maturation of the LoongArch ecosystem, developing a dynamic binary instrumentation tool for LoongArch is of significant importance. This study designs and implements a dynamic binary instrumentation tool based on the QEMU framework to support program monitoring and analysis on LoongArch. Modeled after Pin, the tool implements five fundamental instrumentation granularities and related APIs, along with more than 20 instrumentation tools that can be used directly or serve as learning resources for tool development. To enhance performance, the framework is optimized through improvements in conditional jump instruction translation, basic block linking, and instrumentation inlining. Performance tests show that the optimized framework improves instruction-level instrumentation efficiency by more than 100 times and basic-block-level instrumentation efficiency by nearly 33 times. Finally, the source code has been released on GitHub to facilitate further development of the LoongArch ecosystem and provide a reference for researchers in related fields.
  • FENG Tao, HU Bin, XU Guangyuan
    Accepted: 2025-05-22
    Crowd escape behavior in public places can easily lead to serious public safety disasters. Traditional computer vision techniques can detect a few characteristics of crowd escape behavior but struggle with complex, dynamic visual scenes. To address this issue, drawing on the structure of the locust visual nervous system, the danger perception mechanism of the locust Lobula Giant Movement Detector (LGMD), and the luminance adaptation mechanism of the mammalian retina, this paper proposes an Enhanced Crowd Escape Detection Neural Network (ECEDNN). The network collects luminance changes caused by crowd activity in the field of view. Using the mammalian retinal luminance adaptation mechanism, the visual response excitation is tuned to the lighting conditions of the scene. Visual excitation and inhibition are combined to filter background noise, and a center-surround mechanism is used to enhance motion edges. Finally, adaptive tuning of neural spikes is used to detect sudden crowd escape behavior and output strong membrane potential excitation. This work contributes to research on crowd activity detection inspired by biological visual perception mechanisms and can provide new ideas and methods for crowd behavior perception and anomaly detection in artificial intelligence.
  • HU Caifu, WEI Bo, REN Ruibin
    Accepted: 2025-05-22
    As the network environment evolves and new internet applications emerge, machine learning classifiers trained on earlier traffic data adapt less and less to new sample spaces. This degrades the identification capability of classification models, which then cannot meet the growing demands of network services and network security. Manually updating classifiers based on experience requires significant effort and does not guarantee the generalization performance of the new classifiers. At the same time, the continuous influx of new data makes it hard to balance model training accuracy against computational and storage resources. Considering this, this paper proposes a spatial optimization technique for incremental learning to achieve efficient network traffic classification. First, by optimizing the spatial distribution of new and old traffic samples, clusters of new and old categories are kept separated by a minimum margin, avoiding distribution conflicts between new and old tasks that share the same feature space. Then, within the optimized feature space, a small number of old samples are replayed and combined with knowledge distillation to keep the original model parameters stable, adjusting only the extended part of the model so that the classifier is updated at minimum cost. Experiments on the USTC-TFC2016 dataset show that, compared with other methods, the proposed method demonstrates higher stability and effectiveness across model accuracy, resource consumption, overall performance, and ablation experiments.
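The abstract combines replay of a few old samples with knowledge distillation. A minimal Python sketch of a standard Hinton-style distillation term follows, assuming the old classifier acts as the teacher and only the old-class logits of the expanded model are distilled. The temperature and names are illustrative, and the paper's feature-space optimization and replay selection are not shown.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions.

    Only the old classes are distilled: the student's extra logits for newly
    added classes are ignored by slicing to the teacher's output size."""
    old = len(teacher_logits)
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits[:old], temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


teacher = [2.0, 0.5, -1.0]        # old model: 3 traffic classes
student = [1.8, 0.7, -0.9, 0.3]   # expanded model: 3 old + 1 new class
print(distillation_loss(student, teacher))
```

The T² factor keeps the gradient scale of the distillation term comparable to the ordinary cross-entropy term when the temperature changes.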
  • XIE Qingqing, LIU Yuanyuan
    Accepted: 2025-05-20
    In cybersecurity, phishing attacks are becoming increasingly complex and frequent. Traditional phishing detection schemes based on predefined reference templates rely on brand-domain mapping lists, using visual feature matching to identify brand intent and verify domain consistency for explainable detection. While these methods can counter zero-day phishing attacks, they face scalability challenges because their reference lists must be continuously updated to cover emerging brands, leading to high maintenance costs. To address this, the paper proposes Phish-RAGLLM, a novel reference-based phishing detection scheme leveraging Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). By reframing the traditional visual problem as a language task, Phish-RAGLLM eliminates reliance on predefined reference lists, exploiting LLMs' extensive brand knowledge while enhancing generation through RAG integration with external brand knowledge bases. This approach effectively mitigates LLM hallucination and improves detection precision and robustness. Experimental results show that, compared with the current state-of-the-art model PhishLLM, Phish-RAGLLM, using GPT-3.5-turbo-instruct as the main LLM, balances model performance, inference cost, and knowledge base completeness, achieving a 5.88% increase in F1 score and a 12.5% improvement in operational efficiency. Moreover, it shows strong robustness against dataset variations and prompt injection attacks. Owing to the characteristics of LLMs, Phish-RAGLLM adapts well to multilingual phishing websites, effectively detecting phishing webpages in different linguistic contexts. Furthermore, real-world evaluations reveal that Phish-RAGLLM has broader detection capabilities than VirusTotal (a threat intelligence source), further validating its feasibility and effectiveness.
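The final reference-based consistency check can be sketched in Python as follows, assuming the LLM has already named the target brand and the retrieval step returns that brand's legitimate domains. `BRAND_KB` is a toy, hypothetical knowledge base, and taking the last two host labels as the registered domain is a simplification; a real system should consult the Public Suffix List (for example, `example.co.uk` breaks this heuristic).

```python
from urllib.parse import urlparse

# Hypothetical brand knowledge base: brand -> legitimate registered domains.
BRAND_KB = {
    "paypal": {"paypal.com"},
    "microsoft": {"microsoft.com", "live.com", "office.com"},
}


def registered_domain(url):
    """Naive registered-domain extraction: the last two labels of the hostname."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def retrieve_domains(brand, kb=BRAND_KB):
    """Stand-in for the retrieval step: look up the brand's legitimate domains."""
    return kb.get(brand.strip().lower())


def is_phishing(url, predicted_brand, kb=BRAND_KB):
    """Flag a page whose claimed brand does not match its domain.

    Returns None when the brand is unknown to the knowledge base."""
    legit = retrieve_domains(predicted_brand, kb)
    if legit is None:
        return None
    return registered_domain(url) not in legit


print(is_phishing("https://login.paypal.com.evil.io/x", "PayPal"))  # True
print(is_phishing("https://www.paypal.com/signin", "paypal"))      # False
```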
  • WANG Chaoyang, SUN Weiwei
    Accepted: 2025-05-20
    Combinatorial optimization problems have important applications in areas such as logistics path planning, but their solution space grows exponentially with problem size, posing severe challenges for traditional methods. In recent years, reinforcement learning-based neural combinatorial optimization methods have achieved solution quality close to that of traditional solvers while keeping solving time short. The mainstream method POMO (Policy Optimization with Multiple Optima) improves training stability by exploiting symmetry, but its unidirectional sequence generation mechanism still suffers from two limitations: on the one hand, the traditional constructive approach struggles to fully exploit the symmetry of the problem; on the other hand, endpoint information cannot effectively inform decisions at nodes far from the endpoint. To address these problems, this paper proposes a POMO model based on a Bidirectional Construction Strategy (BCS), named BCS-POMO, which constructs the solution sequence in parallel from the start point and the end point and dynamically selects the extension direction with higher confidence, preventing the model from being trapped by the dead ends of unidirectional construction. The model exploits the symmetry of the construction sequence to share weight parameters and improves efficiency through batch-parallel computation. Experiments show that BCS-POMO effectively strengthens the role of endpoint information as a decision aid during construction, reducing the error by 16% and 18% on the traveling salesman problem (TSP) and the capacitated vehicle routing problem (CVRP), respectively, which verifies the effectiveness of the bidirectional construction strategy in exploiting endpoint information and the advantages of symmetry modeling.
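The bidirectional construction idea can be illustrated without a neural policy. This is a minimal sketch under a loudly stated assumption: "confidence" is replaced by a hypothetical greedy proxy (a shorter nearest-neighbor distance counts as higher confidence), whereas BCS-POMO uses the probabilities of a learned policy. The function names are illustrative only.

```python
import math

Point = tuple[float, float]


def _nearest(frm: Point, points: list[Point], candidates: set[int]) -> tuple[float, int]:
    """Return (distance, index) of the nearest unvisited node; ties broken by index."""
    return min((math.dist(frm, points[i]), i) for i in candidates)


def bidirectional_greedy_tour(points: list[Point], start: int, end: int) -> list[int]:
    """Grow a path from both ends, extending whichever end is more 'confident'.

    Confidence here is a stand-in (hypothetical): a shorter nearest-neighbour
    distance counts as higher confidence. A learned policy would replace it.
    """
    head, tail = [start], [end]
    unvisited = set(range(len(points))) - {start, end}
    while unvisited:
        d_head, i_head = _nearest(points[head[-1]], points, unvisited)
        d_tail, i_tail = _nearest(points[tail[-1]], points, unvisited)
        if d_head <= d_tail:  # extend from the start side
            head.append(i_head)
            unvisited.remove(i_head)
        else:  # extend from the end side
            tail.append(i_tail)
            unvisited.remove(i_tail)
    # The tail was built backwards from the end point, so reverse it before joining.
    return head + tail[::-1]
```

With points `[(0, 0), (10, 0), (1, 0), (9, 0)]`, start `0` and end `1`, the start side claims node 2 and the end side claims node 3, giving `[0, 2, 3, 1]`: each end contributes the decision it is best placed to make.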
  • Guo Ziyun, Tian Youliang, Li Mengqian
    Accepted: 2025-05-19
    Federated learning leverages client data resources to collaboratively train a global model, whose performance depends on the quality of client data and the clients' level of participation. Clients expect appropriate compensation for contributing high-quality data, which in turn motivates their participation. Additionally, because the local model parameters uploaded by clients carry information about their private data, they face the risk of privacy leakage. To address these challenges, this paper proposes an incentive-based adaptive privacy-preserving federated learning scheme. First, a pre-decision game auction mechanism is designed that ensures clients truthfully report their costs and reaches a Nash equilibrium (NE). Second, a training quality evaluation algorithm based on training time and model loss determines client compensation from the overall training quality score, thereby incentivizing high-quality data contributors to participate. Finally, an adaptive differential privacy technique perturbs local model parameters, improving model utility through dynamic noise allocation. Theoretical analysis shows that the proposed scheme meets security and privacy protection requirements, and experimental results validate its effectiveness.
  • CHEN Xinluo, ZHAO Shuang, CAO Fang
    Accepted: 2025-05-19
    With the development of multimedia technology, unauthorized forgery and the dissemination of false information have become much easier, which may lead to a series of negative consequences. Effective content authentication algorithms are urgently needed to ensure the authenticity and security of image content. In recent years, perceptual image hashing has shown excellent performance in image authentication. However, existing algorithms perform poorly on images containing a large proportion of text, and they cannot effectively cope with newer content-preserving manipulations such as scribbling. Therefore, a perceptual hashing-based content authentication algorithm for text-picture mixed images is proposed. The algorithm segments the image by ring partition and calculates the frequency and distribution characteristics of SIFT keypoints within each ring. These features are rotation invariant and effectively improve the anti-collision performance of the algorithm. By exploiting keypoint information, the algorithm achieves good robustness against content-preserving manipulations, including irregular scribbles. A Text-Picture Mixed Image (TPMI) dataset is constructed to validate its performance. Compared with several representative algorithms, the proposed algorithm performs better in perceptual robustness, anti-collision, and security. In partial-tampering experiments, it effectively identifies each tampered image as similar to its original image. In addition, real-world scribble-attack experiments are conducted, and the results show that the algorithm can effectively identify such attacked images.
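Why per-ring keypoint counts are rotation invariant can be seen in a few lines. This minimal sketch assumes the keypoint coordinates are already available (for example from a SIFT detector such as OpenCV's `cv2.SIFT_create`) and uses equal-width rings for simplicity; some ring-partition schemes use equal-area rings instead, and `ring_histogram` is a hypothetical helper, not the paper's feature extractor.

```python
import math


def ring_histogram(keypoints, center, max_radius, num_rings):
    """Count keypoints falling in each of `num_rings` equal-width concentric rings.

    Counts depend only on distance from the centre, so rotating the image about
    that centre leaves the histogram unchanged. Points at or beyond max_radius
    are ignored.
    """
    counts = [0] * num_rings
    width = max_radius / num_rings
    cx, cy = center
    for x, y in keypoints:
        r = math.hypot(x - cx, y - cy)
        if r < max_radius:
            # min() guards against floating-point rounding at the outer edge.
            counts[min(int(r // width), num_rings - 1)] += 1
    return counts
```

Rotating every keypoint by 90 degrees about the centre, `(x, y) -> (-y, x)`, changes no distance and therefore no ring count.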
  • ZHU XingPo, WANG Xiaoyang
    Accepted: 2025-05-19
    Bi-triangle (6-cycle) enumeration in bipartite graphs is essential for graph analysis tasks such as local clustering coefficient computation. As real-world bipartite graphs grow beyond single-machine capacity, efficient distributed algorithms are needed. However, the existing distributed graph partitioning (GP) enumeration algorithm suffers from large subgraph combinations, message overload, and redundant enumeration. To address this, two optimized algorithms are proposed based on the topological characteristics of bi-triangles. Method 1 views a bi-triangle as three wedge structures and generates subgraphs using wedge groups as the basic unit. It introduces a subgraph combination mechanism based on concatenating A-type and V-type wedge groups, which greatly reduces the number and scale of subgraph combinations, and finally enumerates bi-triangles through wedge triplets. To prevent message overload and redundancy, it adds a subgraph reading mechanism backed by a distributed storage system and a deduplication mechanism based on vertex ordering. Method 2 decomposes a bi-triangle into two zedge structures. It first partitions the graph using wedge groups and then applies a “compressed zedge” construction and restoration mechanism for a second partition, finally enumerating bi-triangles through zedge pairs with lower computational complexity than Method 1. Experiments show that, compared with GP, Method 1 reduces subgraph data by 205x on average and enumeration time by at least 45x, while Method 2 achieves average reductions of 30x and at least 101x, respectively.
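The "three wedges" view and vertex-ordering deduplication can be shown on a single machine. This is a minimal reference sketch, not the distributed Method 1: it assumes an adjacency map from upper-layer vertices to sets of lower-layer neighbors and enumerates each bi-triangle a-p-b-q-c-r-a exactly once by iterating upper triples in sorted order. `count_bitriangles` is a hypothetical name.

```python
from itertools import combinations


def count_bitriangles(adj: dict[str, set[str]]) -> int:
    """Count 6-cycles (bi-triangles) u1-v1-u2-v2-u3-v3-u1 in a bipartite graph.

    adj maps each upper-layer vertex to its set of lower-layer neighbours.
    A bi-triangle on upper vertices a < b < c is three wedges a-p-b, b-q-c,
    c-r-a with distinct lower vertices p, q, r. Iterating upper triples in
    sorted order (vertex ordering) produces each cycle exactly once.
    """
    count = 0
    for a, b, c in combinations(sorted(adj), 3):
        ab = adj[a] & adj[b]  # middle vertices of wedges a-p-b
        bc = adj[b] & adj[c]  # middle vertices of wedges b-q-c
        ca = adj[c] & adj[a]  # middle vertices of wedges c-r-a
        for p in ab:
            for q in bc:
                if q == p:
                    continue
                for r in ca:
                    if r != p and r != q:
                        count += 1
    return count
```

As a sanity check, the complete bipartite graph K(3,3) contains (3!)^2 / 6 = 6 six-cycles, and a single hexagon contains exactly one.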
  • You Yiheng, Wang Xin, Ma Menglu, Wang Hui
    Accepted: 2025-05-19
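    As a key form of data organization in artificial intelligence, knowledge graphs are widely used across many fields amid the rapid development of big data and large models. As knowledge graphs keep growing in scale, existing storage structures suffer from slow data import and large storage footprints. To address this, this paper proposes a hybrid "relational + key-value" storage scheme (KGHS) and designs an entity clustering algorithm based on attribute frequency. Using this algorithm, KGHS classifies entity clusters by their attribute frequencies. High-frequency attributes are stored in a relational database to exploit its high query efficiency, while rare attributes are stored as key-value pairs to take advantage of the flexibility of key-value storage for sparse data. This design avoids the large number of null values that relational storage produces for sparse data and reduces the repeated storage of keys in key-value storage, significantly improving storage efficiency while preserving data flexibility. Experiments on synthetic and real-world datasets show that, compared with existing schemes, KGHS saves more than 50% of storage space on real-world datasets and speeds up data import by an order of magnitude without significantly affecting query performance. These results show that KGHS effectively addresses the storage challenges of large-scale knowledge graphs and provides solid storage support for their wide application across domains, with significant theoretical and practical value.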
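The frequency-based split between relational columns and key-value pairs can be illustrated as follows. This is a minimal sketch under stated assumptions: a single global threshold `min_freq` stands in for KGHS's attribute-frequency entity clustering (which classifies entity clusters and is not reproduced here), and the relational part is modeled as plain Python lists rather than a database table. `split_attributes` is a hypothetical helper.

```python
from collections import Counter


def split_attributes(entities: dict[str, dict[str, str]], min_freq: float):
    """Split entity attributes into relational rows and key-value triples.

    Attributes present on at least a `min_freq` fraction of entities become
    table columns (None marks a missing value); all other attributes are
    emitted as (entity, key, value) triples.
    """
    freq = Counter(k for attrs in entities.values() for k in attrs)
    n = len(entities)
    columns = sorted(k for k, c in freq.items() if c / n >= min_freq)
    table = [
        [eid] + [attrs.get(col) for col in columns]
        for eid, attrs in sorted(entities.items())
    ]
    kv = [
        (eid, k, v)
        for eid, attrs in sorted(entities.items())
        for k, v in sorted(attrs.items())
        if k not in columns
    ]
    return columns, table, kv
```

With three entities where `name` is universal, `type` appears twice, and `mayor` once, a threshold of 0.5 keeps `name` and `type` as columns (one null for the entity lacking `type`) and stores `mayor` as a single key-value triple instead of two nulls.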
  • Xu Xinhao, Li Ziqi, Yin Hefeng, Zhang Yonghong
    Accepted: 2025-05-19
    The surface texture of a printed circuit board (PCB) is complex, and its defects are small and varied in shape. To detect small targets accurately, smaller-scale detection heads are often added, which significantly increases computational cost and slows detection. To address this issue, we propose PCB-Det, a multi-scale feature fusion learning model for PCB small-target defect detection. Built on the YOLOv8 architecture, the model replaces the original backbone with the lightweight PP-HGNet and incorporates the GSPPFCSPC module for multi-level feature extraction, expanding the receptive field to enrich feature information. We further design the Pro-BiFPN feature fusion network to strengthen the interaction between features from adjacent layers, improving the fusion of shallow detail information and deep semantic information. In addition, the model uses shared feature branches to reduce the computational burden of the original detection heads and adopts the Wise-IoU loss function to dynamically adjust loss weights, accelerating convergence. Experimental results show that PCB-Det achieves an average precision of 97.7% on the PCB_DATASET defect dataset, a 3.1% improvement over the baseline model. The model effectively reduces both missed detections and false positives, enhancing the detection of small-target defects in PCBs.
  • XU Shaoping, WANG Zichao, TANG Yiling, XIONG Silong
    Accepted: 2025-05-16
    The human visual cortex has a hierarchical structure in which binocular fusion and binocular rivalry first occur in the low-level visual areas. However, current deep learning-based stereoscopic image quality assessment (SIQA) models generally estimate the quality of stereoscopic images by fusing left- and right-view features at different levels of the network, and therefore insufficiently simulate perception in the low-level visual areas of the human visual cortex. To address this issue, this paper proposes a SIQA method that simulates binocular rivalry to further improve evaluation accuracy. First, we exploit the ability of deep convolutional neural networks to capture prior knowledge of the input image and build an unsupervised binocular image fusion model. This model takes the left and right views as learning targets to simulate the binocular fusion process of the human visual system. The gradient magnitude responses of the left and right images are used to calculate an image degradation coefficient, from which the fusion weights of the two views are derived, simulating binocular rivalry. Then, a pre-trained ResNet50 model extracts quality-aware features from the fused image, and a feature-quality mapping model based on support vector regression estimates the quality of the stereoscopic image. Experimental results demonstrate that the proposed SIQA method achieves over 0.96 on both the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC).
  • PAN Yincang, ZHANG Dong, LI Guanyu, Chen Heng
    Accepted: 2025-05-16
    Knowledge graph embedding techniques map complex semantic information into low-dimensional vector representations, enabling tasks such as link prediction and knowledge completion. However, existing models are constrained by a single mathematical structure, making it difficult to accommodate both three-dimensional, direction-sensitive rotations and non-commutative composition at the same time. This limits effective joint inference over complex relational patterns. To overcome this, we propose TransQD, a knowledge graph embedding model that integrates quaternion and dual complex embeddings. To break the expressiveness bottleneck of single-structure methods, TransQD introduces a collaborative mechanism between the two embeddings: the quaternion component uses the Hamilton product to model three-dimensional, direction-sensitive rotations, capturing spatial interactions between entities, while the dual complex component uses non-commutative multiplication to rigorously represent order-dependent relations, such as path compositions whose semantics change when their order changes. By weighting the two components, the model achieves a complementary effect that covers a broader range of relational patterns. TransQD achieves outstanding performance on link prediction and path query tasks across multiple public datasets, and ablation experiments confirm the necessity of dual-component collaboration.
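The Hamilton product mentioned above is what gives quaternion embeddings their direction-sensitive, non-commutative behavior. The sketch below shows only that standard operation and a unit-quaternion rotation built from it; how TransQD weights it against the dual complex component is not shown, and the helper names are illustrative.

```python
Quaternion = tuple[float, float, float, float]  # (w, x, y, z) = w + xi + yj + zk


def hamilton_product(q1: Quaternion, q2: Quaternion) -> Quaternion:
    """Standard quaternion (Hamilton) product q1 * q2; not commutative."""
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def conjugate(q: Quaternion) -> Quaternion:
    w, x, y, z = q
    return (w, -x, -y, -z)


def rotate(v: tuple[float, float, float], q: Quaternion) -> tuple[float, float, float]:
    """Rotate 3D vector v by unit quaternion q via q * (0, v) * conj(q)."""
    _, x, y, z = hamilton_product(hamilton_product(q, (0.0, *v)), conjugate(q))
    return (x, y, z)
```

Non-commutativity is visible directly: `hamilton_product(i, j)` gives `k` while `hamilton_product(j, i)` gives `-k`. Rotating `(1, 0, 0)` by the unit quaternion `(cos 45°, 0, 0, sin 45°)` yields approximately `(0, 1, 0)`, a 90-degree turn about the z-axis.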
  • Yuan Huanyu, Fu Jianming
    Accepted: 2025-05-16
    Reverse engineering Android applications not only facilitates the detection of security issues such as privacy leaks and cryptographic misuse in legitimate applications but also supports the analysis of malicious application behavior. Key challenges in this process include locating cryptographic functions within native binary code, identifying the cryptographic algorithms they employ, and determining their functionality. Among existing methods for identifying cryptographic functions, dynamic analysis-based methods often achieve high accuracy because they capture detailed runtime information. However, existing dynamic analysis tools are designed primarily for x86/x64 architectures, making them less effective for Android applications, which predominantly run on the 64-bit ARM architecture. To address this issue, this paper proposes a hook-based method for identifying cryptographic functions in the native code of Android applications. The method first filters suspected cryptographic functions using three types of static features: constant characteristics, computational instruction ratios, and cryptographic instructions. Next, the dynamic instrumentation toolkit Frida is used to hook the filtered functions and collect runtime information such as parameters and return values. Finally, the execution results of the hooked functions are compared with those of functions from open-source cryptographic libraries to identify their types and functionality. The method is tested on three popular Android applications, and the experimental results demonstrate that it effectively identifies cryptographic functions in the native code of real-world Android applications.
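The "constant characteristics" filter can be illustrated with a byte-level scan for well-known algorithm constants. This is a minimal sketch, assuming little-endian storage (as on AArch64 Android) and a deliberately tiny constant table; a real filter would use far more signatures, and constants materialized by instruction immediates (for example AArch64 `MOVZ`/`MOVK` pairs) will not appear as contiguous bytes. `find_crypto_constants` is a hypothetical helper; the Frida hooking stage is not shown.

```python
import struct

# A few well-known constants (hypothetical, deliberately small table).
KNOWN_CONSTANTS = {
    "MD5/SHA-1 initial state word": struct.pack("<I", 0x67452301),
    "SHA-256 initial state word": struct.pack("<I", 0x6A09E667),
    "AES S-box prefix": bytes([0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5]),
}


def find_crypto_constants(code: bytes) -> dict[str, list[int]]:
    """Return the byte offsets at which each known constant occurs in `code`."""
    hits: dict[str, list[int]] = {}
    for name, pattern in KNOWN_CONSTANTS.items():
        offsets, start = [], 0
        while (pos := code.find(pattern, start)) != -1:
            offsets.append(pos)
            start = pos + 1  # allow overlapping matches
        if offsets:
            hits[name] = offsets
    return hits
```

Functions whose bodies or referenced data contain such hits would be kept as suspects for the dynamic stage; the other two static features (instruction ratios and cryptographic instructions) would further narrow the set.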
  • ZHANG Ru, SUN Weifeng, ZHANG Peng, ZHANG Chao, DAI Yongshou
    Accepted: 2025-05-16
    As electromagnetic signal analysis techniques and methods continue to evolve, electromagnetic signal analysis software must support increasingly rapid functional iteration. However, the high system coupling and strong inter-module dependencies of traditional architectures make the software difficult to maintain and slow to update, so it struggles to accommodate new functions. To solve these problems, a plug-in-based architecture design method for electromagnetic signal analysis software is proposed, which reduces system coupling through plug-in design and improves the scalability and maintenance efficiency of the software. First, software architecture design principles are formulated from an analysis of the functional requirements, and on this basis a hierarchical, modular overall architecture for electromagnetic signal analysis software is designed. Then, following the "platform + plug-in" idea, a platform extension interface and a standard plug-in interface are designed to standardize plug-in development and integration. Meanwhile, a plug-in manager is designed and implemented on the dynamic plug-in loading mechanism of the Qt framework, and a cross-platform prototype of the electromagnetic signal analysis software is developed. Finally, the scalability of the prototype is evaluated with the EMSA measurement method and verified in real application scenarios. Experimental results show that the extensibility of the plug-in-based software improves by 58.33% over the modular architecture, and the software exhibits high stability and robustness in practical use.
  • Li Danbo, Yan Xuexiong, Mao Enhui
    Accepted: 2025-05-15
    The HTTP protocol is core infrastructure for Internet communication, and its modern communication model relies on the collaboration of multiple servers. If servers in the processing chain do not strictly follow the protocol specifications or interpret semantics differently, systemic semantic inconsistencies arise, leading to security threats such as access control bypass, multi-host problems, request smuggling, and cache pollution. Differential fuzz testing analyzes semantic inconsistencies by observing differences in how different servers process the same messages. To address the inaccurate field-set scope, low mutation efficiency, and single observation dimension of existing tools, an improved differential fuzz testing method is proposed. First, a key-header-based message construction method selects the core fields to shrink the test space; next, a field-semantics-based mutation method combines semantic classification with vulnerability characteristics to enrich test cases; and an extended message analysis method broadens the analysis to both request and response messages, fully observing the communication process and covering the known scenarios of semantic inconsistency. Finally, tests on 7 commonly used servers uncover 18 types of server processing differences and verify 9 server-pair combinations with semantic inconsistency problems. Compared with similar tools such as t-reqs, the method reduces the test set size by an order of magnitude, increases the average proportion of valid test cases by 12.67%, discovers two additional types of differences from the same observation angle, and extends the test scope to cover four scenarios of current semantic inconsistency problems.
  • LIU Ziyi, SHA Ying
    Accepted: 2025-05-15
    The Vision and Language Navigation (VLN) task aims to guide an agent to a target location in a 3D or real-world environment based on language instructions. However, traditional end-to-end VLN algorithms have limitations: when an erroneous action occurs in navigation planning, the agent tends to follow incorrect paths, so it can no longer follow the instructions or explores unnecessary areas. To address this issue, an agent named Nav-Explore is proposed, built on a large language model and an exploration module. The agent leverages the reasoning capabilities of the large language model to predict the next action from the language instructions and current visual information, and uses the exploration module to balance exploration and exploitation. The exploration module employs an epsilon-greedy strategy to toggle between normal navigation and exploration modes. When a random draw falls below epsilon, the agent explores possible future paths to assess the feasibility of the next actions, thus avoiding wrong decisions; otherwise, it directly uses the large language model's output for navigation. This modular design enables Nav-Explore to improve navigation success rates and the agent's generalization ability in unseen environments. Experimental results demonstrate that Nav-Explore achieves superior performance on two outdoor VLN benchmark datasets, Touchdown and Map2seq, significantly increasing navigation success rates. Nav-Explore also exhibits strong generalization, effectively completing navigation tasks in different environments.
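The epsilon-greedy toggle described above reduces to a single comparison per step. In this minimal sketch, `llm_action` stands for the action predicted by the language model and `explore` is a hypothetical callable standing for the look-ahead routine; neither is the paper's API.

```python
import random
from typing import Callable, Optional


def choose_action(
    llm_action: str,
    explore: Callable[[], str],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> tuple[str, str]:
    """Return (mode, action) using an epsilon-greedy switch.

    With probability epsilon, run the (hypothetical) exploration routine, e.g.
    looking ahead along candidate paths; otherwise take the LLM's action.
    """
    rng = rng or random.Random()
    if rng.random() < epsilon:  # random() is in [0, 1)
        return "explore", explore()
    return "navigate", llm_action
```

Setting `epsilon=0.0` always trusts the language model and `epsilon=1.0` always explores; intermediate values trade extra look-ahead cost for fewer irreversible wrong turns. Passing a seeded `random.Random` makes runs reproducible.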
  • YAN Shize, FANG Zhijun
    Accepted: 2025-05-15
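    As a graph-structured form of data organization, knowledge graphs provide recommender systems with richer semantic information and context, enabling them to handle complex user behaviors and item features effectively. Existing knowledge graph-based recommendation methods still face problems such as information over-smoothing and anomalous data. Especially in large-scale data scenarios, over-smoothing often prevents the model from capturing personalized user needs, and interference from anomalous data can degrade the accuracy and robustness of recommendations. To this end, a knowledge graph recommendation model based on fused user-behavior features and outlier detection is proposed. By introducing fused user-behavior features, the model effectively avoids information over-smoothing. It also incorporates an outlier detection mechanism that identifies and removes noisy data and anomalous behaviors, significantly improving the accuracy and robustness of recommendations and reducing the influence of bad data. To verify the model's effectiveness, experiments were conducted on three real-world datasets. The results show that, compared with the best existing baseline, the proposed model improves AUC and F1 by 6.77% and 5.09% on average, respectively; the gains are especially pronounced on highly sparse datasets, showing that the model effectively alleviates the problems caused by data sparsity.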
  • SHI Xu, XIE Qing, TANG MengZi, WANG YuHan, LIU YongJian
    Accepted: 2025-05-15
    With the rapid development of Internet technology, personalized recommendation systems play a crucial role in helping users filter content of interest. Traditional recommendation methods are limited in processing large-scale data and capturing users' complex preferences. Existing recommendation methods based on Graph Neural Networks (GNNs) focus primarily on mining direct interactions between users and items; although they improve recommendation accuracy, they often ignore the integration and use of multimodal information such as text, images, audio, and video. A metapath, a concept describing composite relationships between nodes in heterogeneous graphs, can further improve embedding quality and recommendation effectiveness. However, existing models either ignore node content features, discard intermediate nodes on metapaths, or consider only a single metapath. To address these challenges of existing multimodal recommendation systems, this study proposes a metapath-guided multimodal recommendation algorithm (MAMGNN). First, it constructs a multimodal heterogeneous information network to integrate information from different modalities, and then uses metapaths to guide the propagation and aggregation of information at both the intra-metapath and inter-metapath levels. It further introduces Graph Neural Networks and attention mechanisms to learn high-quality embeddings of users and items, generating more accurate and explainable recommendations. Extensive experiments on two real-world datasets, MovieLens-20M and H&M, together with a small-scale user survey, demonstrate that MAMGNN significantly improves the prediction of users' preferences for items, outperforming baseline models in Precision@10, Recall@10, and NDCG@10 by approximately 2.93%, 1.98%, and 2.12%, and by 3.43%, 1.18%, and 2.40%, respectively.
  • DAI Zinan, ZHANG Jie, CHEN Chongchong, CHEN Zhangyi, CHEN Fulong
    Accepted: 2025-05-14
    Chinese medical named entity recognition aims to identify entities with specific meanings in medical texts, such as diseases, drugs, symptoms, and anatomical parts, providing robust support for clinical decision-making, medical information integration, and medical record management. However, existing research on Chinese medical named entity recognition has not fully addressed the complexity of medical texts, which are rich in specialized terminology, and existing models suffer from limited embedding diversity and insufficient use of semantic information. To address these issues, this paper proposes a Chinese medical named entity recognition model that integrates multi-granularity features. The model first uses the pre-trained BERT model to generate character embeddings for the text. It then uses one-dimensional and two-dimensional convolutional neural networks to extract character shape and stroke features, and incorporates external lexicons to introduce word-level features that sharpen the representation of word and entity boundaries. Sentence-level features are also included to capture global semantic information. A cross-attention mechanism iteratively fuses these multi-granularity features, yielding embeddings enriched with deep semantic information. Finally, a conditional random field (CRF) outputs the entity recognition results. Experimental results on the CCKS2017 and CCKS2019 datasets show that the proposed model achieves F1 scores of 92.88% and 87.86%, respectively, outperforming mainstream models.
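The final CRF layer turns per-character tag scores into the best-scoring tag sequence with Viterbi decoding. This minimal sketch assumes emission scores (such as those produced from the fused multi-granularity embeddings) and a transition matrix are given in log space; start and end transitions are omitted for brevity, and `viterbi` is an illustrative helper rather than the paper's code.

```python
def viterbi(emissions: list[list[float]], transitions: list[list[float]]) -> list[int]:
    """Highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][j]: score of tag j at position t.
    transitions[i][j]: score of moving from tag i to tag j.
    Scores are additive (log-space), so the best path maximises their sum.
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        best_prev, new_score = [], []
        for j in range(n_tags):
            # Best previous tag for landing in tag j at position t.
            i_best = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            best_prev.append(i_best)
            new_score.append(score[i_best] + transitions[i_best][j] + emissions[t][j])
        score = new_score
        backpointers.append(best_prev)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    return path[::-1]
```

With tags 0 = O, 1 = B, 2 = I and a large negative score on the O-to-I transition, emissions that individually favor `[O, I]` decode to the valid sequence `[B, I]`; this boundary consistency is why a CRF is placed after the feature encoder.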
  • LI Hongbang, DONG Li, WANG Rangding, YAN Diqun, LI Yuanman, LIAO Xin
    Accepted: 2025-05-14
    As an efficient means of storing and transmitting information, QR codes are widely used in payment, advertising, and logistics. However, the robustness and concealment of existing QR code steganography techniques in noisy and disturbed environments are still insufficient for high-security information transmission. To this end, a robust QR code steganography algorithm based on path planning and pixel flipping is proposed. By treating the QR code as a maze and selecting the pixels to be flipped with a path planning algorithm, the algorithm strengthens the anti-interference ability of the secret information without affecting normal QR code recognition. In terms of implementation, a path planning algorithm is first designed to select the optimal pixels, reducing the impact of information embedding on QR code image quality; the secret information is then embedded through pixel flipping, and its performance is analyzed under different noise conditions. The experiments test the algorithm in typical interference scenarios such as noise attacks, image perturbation, and physical distortion, using embedding capacity, image quality, and information recovery rate as evaluation metrics. The results show that the algorithm significantly improves the robustness and concealment of QR code steganography, making it suitable for scenarios with high information security requirements, and it offers new ideas for the further development of QR code steganography.
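The "QR code as a maze" idea can be illustrated with a plain breadth-first search over a module grid. This is a minimal sketch under stated assumptions: cells marked 1 are hypothetical protected modules (for example finder or timing patterns) that must never be flipped, all other cells cost the same, and the returned path is the candidate set of pixels to flip. The paper's actual path planning algorithm and cost function are not described in the abstract; a weighted cost (such as per-pixel visual impact) would call for Dijkstra's algorithm instead.

```python
from collections import deque


def shortest_path(grid: list[list[int]], start: tuple[int, int], goal: tuple[int, int]):
    """BFS shortest 4-connected path through cells marked 0 (flippable).

    Cells marked 1 are protected and never entered. Returns the list of
    (row, col) cells from start to goal, or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk the predecessor chain back to start
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

On the grid `[[0, 0, 0], [1, 1, 0], [0, 0, 0]]` the path from `(0, 0)` to `(2, 0)` must detour around the protected middle row, so it touches seven cells.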
  • LIN Shu, HUANG Jiawei, SHAO Jing, LI Sitan, LIANG Qi, WANG Qile, ZHAO Yilin
    Accepted: 2025-05-14
    In the fields of network communication and traffic management, the ability to quickly and accurately identify heavy flows is of great significance for tasks such as congestion control and malicious traffic monitoring. However, the extremely high transmission rates of data flows in real-world network environments make heavy flow detection highly complex and challenging. Most existing heavy flow detection methods rely primarily on single-dimensional statistical data, typically using only flow size estimation to perform traffic statistics and analysis. The limitation of these approaches lies in their neglect of other critical dimensions of information, such as the distribution characteristics of packet intervals, which may play a key role in accurately identifying heavy flows. To address these issues, this paper proposes a novel heavy flow detection algorithm called IntervalSketch. The algorithm introduces two key traffic features: flow size estimation and packet interval distribution characteristics. By leveraging these two dimensions, IntervalSketch optimizes the protection of heavy flows and the replacement of small flows. Specifically, by incorporating the packet interval distribution, IntervalSketch effectively distinguishes between heavy flows and small flows, thereby significantly improving detection performance under low-memory conditions. To evaluate the accuracy and effectiveness of IntervalSketch, two real-world network traffic datasets, CAIDA and MAWI, were used for experimental analysis. The results demonstrate that IntervalSketch exhibits significant advantages across various memory configurations and traffic distributions. Compared to existing methods, IntervalSketch not only maintains high detection accuracy in memory-constrained environments but also achieves substantial improvements in F1 score, with gains of up to 2.4 times over current state-of-the-art designs.
  • Shen Qinfeng, Huang Luyao
    Accepted: 2025-05-14
    As a deep learning paradigm that can train on continuous data streams, continuous learning suits increasingly open and complex intelligent application scenarios. The main challenge for continuous (incremental) learning is catastrophic forgetting, the precipitous drop in performance on previously learned tasks after a new one is learned. State-of-the-art continuous learning methods ignore the impact of uncertainty on model training. In addition, existing works mainly focus on mitigating forgetting in the phases after the initial one, largely neglecting the role of the initial phase. Motivated by this, we propose a trusted continuous learning method based on uncertainty correction, which constrains the uncertainty of the model in the initial phase. This constraint alleviates errors caused by model parameter drift and thereby relieves catastrophic forgetting. Because our method can be combined with other continuous learning methods, it is broadly applicable. For example, we apply it to three classic continuous learning methods, and the improved models outperform the originals, with average accuracy gains of 1.2% to 19.1% on two datasets. Moreover, we use expected calibration error to evaluate model reliability. The models improved by our method have lower expected calibration error, showing that our method improves the reliability of the original models.
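Expected calibration error, the reliability metric used above, averages the gap between accuracy and confidence over confidence bins, weighted by bin size. The sketch below implements the common equal-width-bin definition, ECE = sum over bins b of (|B_b| / n) * |acc(B_b) - conf(B_b)|; the number of bins the paper uses is not stated, so the default of 10 is an assumption.

```python
def expected_calibration_error(confidences, predictions, labels, n_bins: int = 10) -> float:
    """ECE with equal-width confidence bins [b/n_bins, (b+1)/n_bins).

    A confidence of exactly 1.0 is placed in the last bin.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, pred, label in zip(confidences, predictions, labels):
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b].append((conf, pred == label))
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            acc = sum(ok for _, ok in members) / len(members)
            ece += len(members) / n * abs(acc - avg_conf)
    return ece
```

For instance, two predictions at confidence 0.9 with one correct, plus two at 0.6 with one correct, give 0.5 * |0.5 - 0.9| + 0.5 * |0.5 - 0.6| = 0.25; a lower value means the model's confidence better matches its accuracy.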
  • SONG Shuhan, TIAN Youliang, WANG Shuai
    Accepted: 2025-05-14
    Federated learning enables collaborative model training while preserving data privacy. However, challenges remain in privacy protection, participant trust, and defense against adversarial attacks. In hierarchical federated learning (HFL), untrusted central servers, intermediaries, and edge devices pose risks of data leakage and malicious manipulation. Additionally, adversarial clients may upload abnormal gradients that compromise model performance. Efficient security verification and adversarial detection in HFL are therefore critical. To address mutual distrust among participants and Byzantine attacks in HFL, a secure aggregation scheme with non-interactive verification under a hierarchical architecture is proposed. First, a mutual verification mechanism for HFL is designed based on a commitment scheme, allowing participants to verify one another. Second, a constraint and detection scheme for malicious updates is constructed using non-interactive zero-knowledge range proofs, enabling the server to detect and exclude malicious users. Third, a noise masking scheme based on the Chinese Remainder Theorem supports user dropout and reconnection while preserving local user privacy. Finally, security analysis and experimental evaluation demonstrate that the proposed scheme achieves secure mutual verification and efficient malicious detection.
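The noise masking scheme above builds on the Chinese Remainder Theorem (CRT). The abstract does not describe how masks are mapped to residues, so this sketch shows only the underlying primitive: recovering a value from its residues modulo pairwise-coprime moduli. `crt` is an illustrative helper, not the paper's construction.

```python
from math import prod


def crt(residues: list[int], moduli: list[int]) -> int:
    """Smallest non-negative x with x ≡ residues[i] (mod moduli[i]).

    Assumes the moduli are pairwise coprime.
    """
    m = prod(moduli)
    x = 0
    for r_i, m_i in zip(residues, moduli):
        big_m = m // m_i
        # pow(big_m, -1, m_i) is the modular inverse of big_m mod m_i (Python 3.8+).
        x += r_i * big_m * pow(big_m, -1, m_i)
    return x % m
```

For example, `crt([2, 3, 2], [3, 5, 7])` returns 23. Any value smaller than the product of the moduli is uniquely determined by its residues, which is the property a CRT-based masking scheme can exploit when shares must be recombined after participants drop out and reconnect.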
  • Wenqian ZHU, Lijuan SONG, Xinru GUO, Zirui MA
    Accepted: 2025-05-13
    In multi-view 3D reconstruction based on neural implicit surface learning, the representations of the geometric shape and appearance of complex objects are inherently ambiguous. As a result, fine geometric details are easily lost in texture-sparse areas, at boundaries, and on large smooth surfaces, making accurate recovery difficult. To address this issue, a neural implicit surface reconstruction method based on multi-view mixed consistency constraints is proposed. The method uses multi-view stereo (MVS), multi-view photometric consistency, feature consistency, and volume rendering to optimize the implicit surface representation, enabling the reconstruction of object models with fine geometric details. First, a dense point generation module based on MVS generates dense points that supplement detail in texture-sparse and boundary regions of the object surface, achieving multi-view geometric optimization of the surface. Second, a multi-view mixed consistency constraints module uses the signed distance function (SDF) to locate the zero-level set. It applies multi-view photometric consistency constraints to impose geometric constraints on the smooth regions of the object and supervise the extracted implicit surface. In addition, multi-view feature consistency constraints are applied to surface points at the zero-crossings of the linearly interpolated SDF, compensating for pixel matching errors in texture-sparse or structurally complex regions and thereby optimizing the reconstructed model. Finally, volume rendering produces high-quality images from the implicit SDF, enabling precise surface reconstruction.
Experimental results show that, compared with methods such as COLMAP, the proposed method improves peak signal-to-noise ratio (PSNR) by over 40.3% on the DTU dataset and accurately reconstructs object surfaces.
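    For reference, PSNR is derived from the mean squared error (MSE) between a rendered image and its ground truth as 10·log10(MAX²/MSE), in decibels. A minimal sketch over flattened pixel values, assuming an 8-bit range (MAX = 255) by default; the exact evaluation protocol used on DTU is not given in the abstract.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


print(psnr([0, 0, 0, 0], [1, 1, 1, 1], max_value=10.0))  # 20.0
```

Because PSNR is logarithmic in MSE, a fixed dB gain corresponds to a multiplicative reduction in squared error.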
  • YE Yuhang, REN Xiaoning, WU Yuming
    Accepted: 2025-05-09
    This paper proposes TGMM, a large-scale code clone detection method based on parse trees and GPU acceleration, which addresses the limitations of existing tools in multi-language adaptation and large-scale analysis. The method uses a three-stage architecture. First, it generates standardized parse trees from each programming language's lexical and syntactic rules and then extracts subtrees that meet the granularity requirements. Second, it simplifies the subtrees through pruning and removes non-functional differences via semantic equivalence transformations. Finally, it constructs a global suffix array in parallel on GPUs to rapidly calculate the similarity of code blocks. The method is evaluated in terms of clone detection efficiency and language scalability. On the public benchmark dataset BigCloneBench, TGMM achieves a precision of 97%, significantly outperforming seven mainstream tools, and reduces average execution time by over 50% compared with the second-best tool while maintaining comparable recall across clone types. In the language scalability test, TGMM successfully parses 25 of 30 mainstream programming languages. In addition, TGMM is applied to a multi-granularity clone analysis of the top 45 GitHub projects (covering 9 programming languages), which reveals significant differences in clone density across languages; an in-depth analysis of the underlying causes provides practical references for software maintenance.
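    The final stage relies on a suffix array to find shared token runs quickly. As a rough sequential illustration (not TGMM's parallel GPU construction), the sketch below builds a suffix array over two token sequences joined by a unique separator, computes the LCP array with Kasai's algorithm, and reports the longest contiguous token run shared by both blocks. The naive sort is only suitable for small inputs, and the separator value and whitespace tokenization in the example are assumptions.

```python
def longest_shared_run(a, b):
    """Length of the longest contiguous token run shared by token lists a and b."""
    # Map tokens to integers >= 0; -1 is reserved as a separator that occurs once.
    vocab = {}
    seq = [vocab.setdefault(t, len(vocab)) for t in a]
    seq.append(-1)
    seq += [vocab.setdefault(t, len(vocab)) for t in b]
    n = len(seq)

    # Naive suffix array: suffix start positions sorted lexicographically.
    sa = sorted(range(n), key=lambda i: seq[i:])

    # Kasai's algorithm: lcp[k] = common-prefix length of suffixes sa[k-1] and sa[k].
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and seq[i + h] == seq[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0

    # Shared runs show up where adjacent suffixes start in different sequences.
    split = len(a)
    best = 0
    for k in range(1, n):
        if (sa[k - 1] < split) != (sa[k] < split):
            best = max(best, lcp[k])
    return best


print(longest_shared_run("int x = a + b ;".split(), "return a + b ;".split()))  # 4
```

Dividing the shared-run length by the shorter block's length is one possible similarity score; the abstract does not state the formula TGMM uses.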
  • Luo Zhengdong, Zhang Guohao, Han Yunfei, Wang Yi, Zhou Xi
    Accepted: 2025-05-09
    Existing methods for tabular data prediction focus primarily on classical classification and regression tasks. However, in some tabular data the labels have an ordinal relationship, and predicting such labels is called tabular ordinal classification. Current methods for this task mainly retrieve similar features and augment the sample feature representation by fusing those features with the ordinal distance between classes, but they do not fully exploit label ordinal knowledge. To address this, a method based on ordinal label entropy optimization is proposed, which guides the model to learn ordinal information by mining the ordinal entropy embedded in label order knowledge. Specifically, an ordinal entropy calculation module is first established to quantify ordinal entropy from the ranking differences between predicted and true labels. Through step-by-step analysis and derivation, the ordinal label entropy is formulated as a novel rank loss function and introduced into the model as a regularization term. This encourages the model to learn the ordinal relationship between labels and reduces the information loss caused by unordered predictions. The ordinal-entropy-optimized rank loss is then combined with the model's original loss function to jointly improve predictive ability. Finally, experimental results on multiple ordinal tabular datasets show that the method outperforms various baseline models, demonstrating the effectiveness of ordinal entropy optimization for tabular ordinal classification.
  • PAN Minmin, ZHAO Qilu
    Accepted: 2025-05-09
    Self-supervised learning has demonstrated strong potential in computer vision tasks. However, effectively fusing features extracted from multiple self-supervised tasks remains a major challenge. Traditional multi-task learning methods struggle to integrate heterogeneous self-supervised features because of issues such as input conflicts and architectural incompatibilities, and existing feature fusion methods (e.g., subspace learning) often over-compress the feature space, losing task-specific information. This paper proposes a multi-self-supervised feature fusion method based on a feature regression task, which treats feature fusion as a multi-view learning problem. The goal is to learn a shared latent space across views and maximize the correlation between different self-supervised features. The model first treats the multiple self-supervised features as complementary "multi-view" representations and constructs a feature interaction network centered on a Transformer encoder. The feature regression task then takes masked features as input and uses self-attention to explore cross-task correlations and reconstruct the original features, forcing the model to preserve unique information while maximizing shared information. The resulting features therefore contain both shared and view-specific information about the image, which makes them more generalizable. Image classification experiments on multiple well-known datasets show that the fused features generalize significantly better than the features before fusion, validating the effectiveness of the fusion method.
  • JIAO Luyao, YANG Xiaoya, MENG Yaofei, LIU Songhua
    Accepted: 2025-05-09
    Existing time series prediction methods fail to fully account for the spatiotemporal dependencies among variables, which limits prediction accuracy. Spatial modeling methods based on graph neural networks also face limitations in graph structure construction: 1) statically predefined graphs struggle to capture the dynamic interactions between variables; 2) adaptive graph structure learning is highly sensitive to parameter initialization and can easily fall into local optima. To address these issues, this paper proposes a multivariate time series prediction model that combines spatiotemporal networks with Kolmogorov-Arnold networks (KANs). In the spatial dimension, a graph structure learning module uses the Pearson correlation coefficient to establish an initial adjacency matrix for the variables, and learnable parameters are introduced to dynamically adjust and optimize the graph structure. By stacking multiple graph convolutional network layers, the model captures the spatial dependencies between variables. In the temporal dimension, a multi-head self-attention mechanism combined with gated recurrent units extracts temporal dependencies in different subspaces, capturing both local temporal patterns and global key information. To further improve the representation ability of the model, KANs replace the traditional multilayer perceptron, and nonlinear fusion of spatiotemporal features is achieved through learnable activation functions. Experimental results show that the proposed model achieves average reductions of 36.9 percentage points in mean squared error and 24.8 percentage points in mean absolute error across seven benchmark datasets. Its generalization performance is verified on the Australian electricity load dataset.
Compared with other mainstream models, the proposed model accurately captures the dependencies between variables and effectively integrates spatiotemporal features, improving both prediction accuracy and robustness.
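    The spatial module starts from an adjacency matrix built from Pearson correlation coefficients between variables. A minimal sketch, assuming each variable is a list of equally spaced observations and that an edge weighted by |r| is kept only when |r| exceeds a hypothetical threshold (0.5 here); the learnable adjustment of this matrix described in the abstract is not modeled.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant series: correlation undefined, treated as "no edge"
    return cov / (sx * sy)


def initial_adjacency(series, threshold=0.5):
    """Symmetric adjacency: |r| between variables i and j if above threshold, else 0."""
    k = len(series)
    adj = [[0.0] * k for _ in range(k)]  # diagonal (self-loops) left at 0
    for i in range(k):
        for j in range(i + 1, k):
            r = abs(pearson(series[i], series[j]))
            if r > threshold:
                adj[i][j] = adj[j][i] = r
    return adj


variables = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, 3, 1, 3]]
for row in initial_adjacency(variables):
    print([round(v, 3) for v in row])
```

Using |r| keeps strongly anti-correlated variables connected, which is a design choice; a signed or softmax-normalized matrix would be equally plausible starting points.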
  • LIU Zhongmin, LUO Qiang
    Accepted: 2025-05-09
    To address the challenges of insufficient contextual feature extraction and information loss in Dunhuang mural inpainting, this paper proposes an inpainting method based on efficient feature representation. The proposed method integrates attention mechanisms with multi-scale feature fusion to enhance the quality of inpainted murals. Specifically, built upon an encoder-decoder architecture, a dual attention module is introduced to refine features in both spatial and channel dimensions, thereby enhancing the contextual feature representation of murals and improving semantic consistency and detail reconstruction in damaged regions. Furthermore, a multi-scale gating module is incorporated into the skip connections to capture and transmit feature information from the encoding stage to the decoding stage, thereby strengthening long-range dependencies and improving the fusion of global structures with local details. To further reduce computational complexity while preserving effective feature information, a nonlinear activation-free block is proposed, optimizing both computational efficiency and inpainting quality. To evaluate the effectiveness of the proposed method, extensive experiments are conducted on the Dunhuang mural dataset and the FFHQ face dataset. Experimental results demonstrate that the proposed method not only performs well in mural inpainting tasks but also exhibits strong generalization ability across different image inpainting tasks, generating visually natural inpainting results and achieving superior performance in comparison to existing algorithms.
  • YANG Yang, WEI Hongkai, SUN Shijie, HU Hongli, WANG Rong, WANG Tiantian
    Accepted: 2025-05-09
    Biomedical imaging plays a crucial role in the diagnosis and treatment of many diseases. Applying deep learning to medical image analysis can enhance the readability of medical images and provide more reliable support for clinical decision-making. However, traditional medical image processing methods have limitations in capturing spatial features and complex structural information in 3D images, especially complex 3D medical images generated by different imaging modalities, which challenges model accuracy and generalization. To address this challenge, the MTM3D model is proposed for medical image classification. It combines the strong performance of the Mamba model on complex sequential tasks with the external memory of an improved Token Turing Machine (TTM) network. By introducing a cyclic chain storage structure, MTM3D enables features from different spatial structures to interact effectively within memory units, enhancing its ability to capture complex spatial relationships. Mamba further strengthens the interaction between the memory and processing units, giving the model stronger generalization capability across different medical imaging datasets. Experimental results on the MedMNIST v2 dataset demonstrate MTM3D's strong medical image understanding: compared with the current best medical image analysis networks, it improves average accuracy (ACC) by 3.97% and average area under the curve (AUC) by 2.00%, showing its potential for medical image interpretation and for assisting healthcare professionals in diagnosis and treatment planning.
  • Ding Lei, Li Siwei, Huang Ruiting, Yu Huikun, Yu Lie
    Accepted: 2025-05-09
    To address the privacy leakage risks posed by the labeled sensor data that wearable fitness action recognition methods collect, and the difficulty traditional centralized training has in adapting to new users, this paper proposes a fitness action recognition method based on a wearable sports knee sleeve and personalized federated learning. The method recognizes both the fitness action type and the user's performance level, enabling users to gain insight into their fitness performance and improve their exercise outcomes. First, the method treats each user as an independent task and uses a federated approach to meta-train a global embedding network that learns representations shared across users, enabling effective generalization to any user. Then, in an adaptation process, the local classification network is fine-tuned in two stages on top of the global embedding network, producing a personalized model for each user. Finally, extensive experiments on real-world fitness datasets demonstrate that the proposed method achieves 100% accuracy in fitness action type recognition and 95.94% accuracy in user performance level recognition, significantly outperforming existing state-of-the-art methods. The results indicate that the system not only protects user privacy but also generalizes well to new users.
  • ZHAI Sheping, MA Mengyao, ZHANG Wenjing, YANG Rui
    Accepted: 2025-05-08
    Existing knowledge graph completion methods do not adequately differentiate the semantics of paths at different hierarchical levels, and their relation representations do not sufficiently leverage neighborhood context for dynamic adjustment, resulting in an incomplete understanding of contextual semantics. To address these issues, a knowledge graph completion model, RCSKGC, is proposed, which enhances the semantic representations learned from paths and neighborhood information. First, multi-hop paths at various levels are encoded locally and globally using bidirectional gated recurrent units and attention mechanisms, extracting relevant path information effectively. Relation embedding contrastive learning is then employed to further refine the fine-grained semantic features of the path information. Next, a dual attention mechanism and a dynamic weighting strategy capture hierarchical neighborhood information, and relation semantics are learned through a “neighborhood-entity-relation” framework. Finally, the two types of relation representations are aggregated into the final relation representations, which are fed into the decoder for completion. Experimental results demonstrate that on the FB15k-237 dataset, RCSKGC outperforms the best baseline results by 1.4, 0.8, 1.3, and 2.1 percentage points in MRR, Hits@1, Hits@3, and Hits@10, respectively. On the WN18RR dataset, RCSKGC matches the best baseline in Hits@1 while improving MRR and Hits@3 by 0.8 and 1.0 percentage points, respectively, validating the efficacy of the proposed method.
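    MRR and Hits@k, the metrics reported above, are computed from the rank the model assigns to the correct entity for each test triple. A minimal sketch, assuming the (typically filtered) ranks have already been computed; both metrics lie in [0, 1], which is why gains are quoted in percentage points.

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """MRR and Hits@k from 1-based ranks of the correct entity."""
    if not ranks or min(ranks) < 1:
        raise ValueError("ranks must be non-empty and 1-based")
    n = len(ranks)
    # Mean reciprocal rank: average of 1/rank over all test queries.
    metrics = {"MRR": sum(1.0 / r for r in ranks) / n}
    for k in ks:
        # Hits@k: fraction of queries whose correct entity ranks in the top k.
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


print(ranking_metrics([1, 2, 4, 20]))  # MRR ≈ 0.45, Hits@1 = 0.25, Hits@3 = 0.5, Hits@10 = 0.75
```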
  • Zhu Li, Xu Wanru, Gao Jingkai, Zhu Chunqiang, Deng Fan
    Accepted: 2025-05-08
    Accurate generation of multivariate time series data is an effective way to address insufficient data scale and is crucial for downstream tasks such as power load forecasting and wind and solar power generation assessment. However, existing methods struggle to capture long- and short-term dependencies and correlations between variables, and they lack interpretability, so they cannot meet the needs of energy system analysis. Therefore, an Interpretable Diffusion Model for Multivariate Time Series (IDMTS) is proposed. First, a Transformer architecture containing Triplet Attention (TA) is introduced into the denoising network of the diffusion model to capture long- and short-term dependencies and associations between variable features. Then, after a multiscale trend-seasonal decomposition, the trend term is modeled with Bidirectional Long Short-Term Memory (BiLSTM) and the seasonal term with Fourier Attention (FA), improving the accuracy and interpretability of the generated data. Meanwhile, generation quality is optimized with a multiscale Adaptive Maximum Mean Discrepancy (Ada-MMD) loss function. Experimental results show that the generation accuracy of IDMTS on four public datasets is significantly better than that of the baseline methods: the Context-FID score, correlational score, discriminative score, and predictive score are reduced by 51.5% to 84.5%, 4.1% to 26.8%, 24% to 68.8%, and 0.3% to 40%, respectively. IDMTS also shows good interpretability and generalization ability in interpretability, conditional interpolation, and prediction experiments.
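    The Ada-MMD loss builds on maximum mean discrepancy (MMD), which compares two sample sets through kernel similarities. The abstract does not specify the adaptive multiscale weighting, so the sketch below uses a plain sum of Gaussian kernels at several fixed bandwidths (a hypothetical stand-in) with the biased estimator of squared MMD, on scalar samples for brevity.

```python
import math


def multiscale_kernel(a, b, bandwidths):
    """Sum of Gaussian (RBF) kernels over several bandwidths."""
    d2 = (a - b) ** 2
    return sum(math.exp(-d2 / (2.0 * s * s)) for s in bandwidths)


def mmd2(xs, ys, bandwidths=(0.5, 1.0, 2.0)):
    """Biased estimate of squared MMD between scalar samples xs and ys."""
    def mean_k(p, q):
        total = sum(multiscale_kernel(a, b, bandwidths) for a in p for b in q)
        return total / (len(p) * len(q))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


real = [0.0, 0.1, 0.2]
generated = [3.0, 3.1, 3.2]
print(mmd2(real, real), mmd2(real, generated))
```

Used as a loss, the value is minimized so that generated samples match the real distribution across all kernel scales.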
  • Zeyuan Cui, Wenhan Ge, Junfeng Wang
    Accepted: 2025-05-08
    Cyber Threat Hunting enables rapid response to attack events through proactive discovery of attack clues and malicious evidence. While existing Cyber Threat Hunting methods can search across extensive information sources, they face challenges in real-world scenarios due to the problems of insufficient prior knowledge and sparse feedback. To address these problems, this paper proposes a Cyber Threat Hunting algorithm, RE-HUNTER, based on Large Language Models and Reinforcement Learning. To address the lack of prior knowledge, this method constructs a contextual vector database and leverages domain expertise from Large Language Models as well as unstructured knowledge from Cyber Threat Intelligence for cold-start decision-making to initialize Reinforcement Learning weights. To address the problem of sparse feedback, this method improves the Monte Carlo Tree Search algorithm by introducing a recursive update mechanism and a method similarity mechanism to amplify the feedback on the execution results of both entities and methods. Experiments conducted on 186 real-world attack cases demonstrate that this model significantly improves search efficiency compared to the current state-of-the-art baseline methods. Within the 0–2000-step range, the average recall rate achieves an 18.24% relative improvement. Notably, in the 0–250-step cold-start scenario, the average recall rate attains an 86.28% relative improvement over the best baseline method. Furthermore, ablation experiments indicate that each component of the proposed method positively contributes to the overall performance, effectively reducing the cost of Cyber Threat Hunting.
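    RE-HUNTER modifies Monte Carlo Tree Search (MCTS). For orientation only, the sketch below shows the standard UCT selection rule and plain reward backpropagation that such modifications start from; the recursive update and method-similarity mechanisms are not described in enough detail to reproduce, and the exploration constant c = √2 is a conventional default rather than a value from the paper.

```python
import math


class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0  # sum of rewards backed up through this node
        if parent is not None:
            parent.children.append(self)


def uct_select(node, c=math.sqrt(2)):
    """Pick the child maximizing mean reward + c * sqrt(ln N_parent / n_child)."""
    def score(child):
        if child.visits == 0:
            return math.inf  # try unvisited children first
        exploit = child.value / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore

    return max(node.children, key=score)


def backpropagate(leaf, reward):
    """Add one visit and the reward to every node on the path back to the root."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
```

With sparse rewards, most backed-up values are zero, which is the feedback problem the abstract's amplification mechanisms target.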
  • Zheng Mingyu, Shao Huichao, Shao Yanhua, Chu Hongyu
    Accepted: 2025-05-07
    Object detection and multi-target tracking technologies are becoming increasingly mature. However, in aerial multi-target tracking in complex scenarios, small target sizes, large size variations, and occlusion still lead to unsatisfactory detection and tracking performance. Therefore, this paper proposes YBTrack, an aerial multi-target tracking algorithm based on an improved YOLOv8 and ByteTrack. First, a detector, MSA-YOLO, is constructed. The original convolution in YOLOv8 is replaced with a space-to-depth convolution, which transforms spatial information into the channel dimension, preserving target details and reducing the missed and false detections caused by information loss during multi-scale feature map fusion. In addition, a lightweight accelerated space-channel attention module is designed for the neck convolutions to reduce computational complexity; this module also serves as a feature refinement module before the detection head, further enhancing the extraction of target features. Next, to improve tracking performance, the ByteTrack tracking model is optimized. A spatial-appearance similarity matrix (ASM) is designed to enhance the model's ability to distinguish similar targets, and a target correction function is proposed to reduce the error accumulation of the Kalman filter, lowering target offset and loss rates. Finally, MSA-YOLO and the optimized ByteTrack are combined for multi-target tracking experiments. MSA-YOLO achieves a 9.4% improvement in mAP@0.5 on the VisDrone2019-DET dataset, and the tracking algorithm improves MOTA by 11.2% and 8.3% and IDF1 by 8.9% and 7.4% on the VisDrone2019-MOT and MOT17 datasets, respectively. These results demonstrate that the proposed method significantly improves tracking performance, and comparisons with other multi-target tracking algorithms further confirm its superiority.
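    Space-to-depth layers are typically used in place of strided downsampling: each 2×2 spatial block is moved into the channel dimension, so an H×W×C map becomes (H/2)×(W/2)×4C and no values are discarded before the following convolution. A minimal pure-Python sketch of that rearrangement for a block size of 2; the channel ordering is an assumption, and the convolution that follows in MSA-YOLO is not shown.

```python
def space_to_depth(feature_map, block=2):
    """Rearrange an H x W x C nested list into (H/block) x (W/block) x (C*block*block).

    Channels of each output cell are ordered by (row offset, column offset, channel).
    """
    h, w = len(feature_map), len(feature_map[0])
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for i in range(0, h, block):
        row = []
        for j in range(0, w, block):
            cell = []
            for di in range(block):
                for dj in range(block):
                    cell.extend(feature_map[i + di][j + dj])
            row.append(cell)
        out.append(row)
    return out


# 4 x 4 single-channel map holding 0..15 in row-major order.
fm = [[[r * 4 + c] for c in range(4)] for r in range(4)]
print(space_to_depth(fm)[0][0])  # [0, 1, 4, 5]
```

Every input value appears exactly once in the output, which is why this step preserves fine detail for small targets better than a stride-2 convolution.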