
Just accepted

Accepted, unedited articles published online and citable. The final edited and typeset version of record will appear in the future.
  • PU Zhenyu, LIU Zhiwei, HUANG Bo, HE Shufeng, CHEN Nanxi, HAO Wenzeng
    Accepted: 2025-04-25
    In the modern industrial sector, the perception and analysis of text data have become essential for promoting intelligent manufacturing and optimizing production processes. However, industrial text data is typically characterized by high specialization, diversity, and complexity, along with high annotation costs, making traditional large-scale annotation methods unsuitable. Existing few-shot named entity recognition (NER) methods often use prototypical networks to classify entities, where the prototype is the average of the features of all samples belonging to the same category. These methods, however, are highly sensitive to the support set and prone to sample selection bias. To address this, we propose a few-shot named entity recognition model based on distribution calibration, DC-NER (Distribution Calibration-based Named Entity Recognition). The model innovatively decomposes the task into two phases: span detection and entity classification. During the entity classification phase, a precise distance measurement function is employed to identify source-domain categories that are similar to those in the target domain. Based on this, the distribution of samples in the target domain is calibrated to generate more accurate class prototypes. Experimental results on both the in-domain dataset (Few-NERD) and the cross-domain dataset (Cross-NER) demonstrate that DC-NER significantly outperforms comparative models in terms of F1 score, validating its effectiveness in few-shot named entity recognition.
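    The distribution-calibration step can be illustrated with a minimal Python sketch (not the authors' DC-NER code; the function names, the use of Euclidean distance, and all constants are illustrative assumptions): the statistics of the most similar source-domain classes are transferred to the few support samples of a target-domain class before its prototype is computed.

```python
import numpy as np

def calibrate(support, source_stats, k=2, alpha=0.2):
    """Borrow the statistics of the k most similar source-domain classes.

    support      : (n_shot, dim) support features of one target-domain class
    source_stats : list of (mean, cov) pairs for source-domain classes
    """
    centre = support.mean(axis=0)
    dists = [np.linalg.norm(centre - mean) for mean, _ in source_stats]
    nearest = np.argsort(dists)[:k]                    # most similar source classes
    calib_mean = np.stack([source_stats[i][0] for i in nearest] + [centre]).mean(axis=0)
    calib_cov = np.stack([source_stats[i][1] for i in nearest]).mean(axis=0)
    return calib_mean, calib_cov + alpha * np.eye(support.shape[1])

def prototype_from_calibration(support, source_stats, n_sample=50, seed=0):
    """Sample extra features from the calibrated distribution and average them
    with the real support features to reduce sample-selection bias."""
    mean, cov = calibrate(support, source_stats)
    sampled = np.random.default_rng(seed).multivariate_normal(mean, cov, size=n_sample)
    return np.vstack([support, sampled]).mean(axis=0)
```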
  • ZHANG Kejia, WANG Xiaofan, LIU Tao, LIU Zongbao, ZHANG Yan, WANG Chenyu, WANG Haoran
    Accepted: 2025-04-25
    The annotation of diagenetic facies samples is a crucial step to ensure the accuracy of intelligent diagenetic facies recognition. In response to the problems of large sample demand and low accuracy in the automatic annotation technology of diagenetic facies samples, this paper proposes an automatic annotation method - AP-GCN, which combines Affinity Propagation Clustering and Graph Convolutional Neural Network. This method fully integrates the advantages of Affinity Propagation Clustering in capturing complex correlation relationships and the ability of Graph Convolutional Neural Network in mining spatial distribution features. The Fuyu oil layer in the Zhouliu block of the Sanzhao depression in the Songliao Basin is selected as the target area to achieve the automatic annotation of diagenetic facies samples. Firstly, the diagenetic facies types are summarized and the logging curve data is preprocessed, with a small number of labels annotated, thereby constructing an automatic annotation dataset, laying the foundation for the subsequent automatic annotation process. Secondly, the graph structure is constructed by using Affinity Propagation Clustering to establish the correlation between the depth nodes of the logging curves. Then, the node features are aggregated through the graph convolutional layer to achieve rapid and accurate annotation of diagenetic facies. Finally, a comparative experiment is designed to verify the effectiveness of the proposed method. The experimental results show that the Precision of AP-GCN method for various diagenetic facies annotations is above 86%, the Recall is above 90%, and the F1 score is above 88%. The accuracy of automatic annotation of diagenetic facies samples is 90.6%, which proves the effectiveness and practicality of this method and provides a new solution for the automatic annotation of diagenetic facies samples.
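    A minimal sketch of the graph-construction idea described above (illustrative only, not the paper's AP-GCN implementation; `build_adjacency` and `gcn_layer` are assumed names): Affinity Propagation groups the depth nodes of the logging curves, edges connect nodes within the same cluster, and one graph-convolution step aggregates neighbour features with the standard symmetric normalization.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def build_adjacency(features):
    """Connect depth nodes that Affinity Propagation places in the same cluster."""
    labels = AffinityPropagation(random_state=0).fit_predict(features)
    same_cluster = labels[:, None] == labels[None, :]
    return same_cluster.astype(float)

def gcn_layer(features, adjacency, weight):
    """One graph-convolution step: add self-loops, normalize symmetrically, aggregate."""
    a_hat = adjacency + np.eye(len(adjacency))
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    return np.maximum(norm @ features @ weight, 0.0)   # ReLU activation

# toy usage: 100 depth nodes, 5 logging-curve features, 16 hidden units
x = np.random.rand(100, 5)
h = gcn_layer(x, build_adjacency(x), np.random.randn(5, 16) * 0.1)
```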
  • Zhang Linghao, Tan Haibo, Zhao He, Chen Zhong
    Accepted: 2025-04-24
    In recent years, large models have become one of the most important tools in human life. As a bridge for communication between humans and large models, prompts play a crucial role. While high-quality prompts can fully unleash the potential of large models, their design requires both specialized skills and substantial resources, leading to the emergence of prompt trading markets and licensing service models. However, treating prompts as commodities presents three main challenges: 1. Once prompt text is leaked, it can be easily copied and disseminated, losing its value. 2. There is a lack of objective standards and pricing systems for prompt quality. 3. Users have difficulty holding providers of false prompts accountable, and the actions of centralized platforms can also affect the rights of providers. To address these challenges, this paper proposes PromptDEX, a blockchain-based service platform for large model prompts. Through a prompt rental mechanism based on smart contracts and service templates built on LangChain, the platform enables direct interaction between providers and demanders, reducing the impact of centralized institutions on participants’ interests and privacy. All transaction records are transparently stored on the blockchain, ensuring security, reliability, and accountability. Additionally, a dynamic pricing mechanism based on demand-side ratings is implemented. Experimental results show that the additional cost incurred by integrating blockchain is negligible, calculated in terms of fractions of a cent. Providers can build API services with only about 10 lines of code, requiring minimal network bandwidth, demonstrating strong feasibility and practicality.
  • Du Kangning, Yang Xiaochen, Zhang Benkui, Wang Jinxiao, Song Peiran, Cao Lin
    Accepted: 2025-04-24
    With the frequent occurrence of terrorist attacks, the crowd evacuation path planning problem in indoor public places has received increasing attention. Aiming at the crowd evacuation path planning problem in indoor public places, a path planning method based on Proximal Policy Optimization (PPO) algorithm is proposed to improve the efficiency and safety of pedestrian evacuation. First, the indoor terrorist attack scenario is described, and the static obstacles, idle locations, dynamic obstacles, exits and pedestrians in indoor public places are modeled using a cellular automata model. On this basis, a feature construction method based on distance information is proposed to construct pedestrian features including shortest path features and safe path features by combining the distance from pedestrians to exits in non-threatening environments and threat-facing scenarios, so as to portray the escape difficulty of evacuation paths. Finally, by describing the evacuation path planning problem as a reinforcement learning problem, a reward function based on evacuation efficiency, death penalty and successful escape reward is designed. Through the feedback of the real-time environment, the evacuation strategy is provided to the pedestrians, which in turn realizes the overall optimization of the escape path by the PPO algorithm. Compared with existing field methods, this method can improve the efficiency and safety of crowd evacuation in different simulation scenarios, especially in complex and high-density environments. Meanwhile, the effectiveness of the shortest path feature and the safe path feature is verified by ablation experiments.
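    The reward design described above (evacuation efficiency, death penalty, successful-escape reward) can be sketched as follows; the coefficients and argument names are hypothetical simplifications, not values from the paper.

```python
def evacuation_reward(prev_dist, new_dist, reached_exit, died,
                      step_cost=-0.01, escape_bonus=10.0, death_penalty=-10.0):
    """Reward combining evacuation efficiency (progress toward an exit plus a
    small per-step cost), a death penalty, and a successful-escape bonus."""
    if died:
        return death_penalty
    if reached_exit:
        return escape_bonus
    progress = prev_dist - new_dist   # positive when the pedestrian moves closer to an exit
    return step_cost + 0.1 * progress
```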
  • LIU Yanghong, FUYANG Youran, DONG Xingping
    Accepted: 2025-04-24
    The generation of high-definition (HD) environmental semantic maps plays a crucial and irreplaceable role in autonomous driving systems. To address the modality discrepancy between cameras and LiDAR in perception tasks, this paper proposes an innovative multi-modal fusion paradigm, HDMapFusion. Unlike traditional methods that directly fuse raw sensor data, this approach achieves physically interpretable fusion of multi-modal information by unifying camera and LiDAR features into a bird's-eye view (BEV) representation. Experimental results on the nuScenes benchmark dataset demonstrate that HDMapFusion significantly outperforms existing baseline models in HD map generation accuracy, with a 23.0% improvement in IoU score, fully validating the effectiveness and superiority of the proposed method.
  • GAO Yufei, JIA Xin, HUANG Zhangchi, XU Zhinan, HUO Pengfei, LU Zhiyin
    Accepted: 2025-04-21
    Multi-view 3D reconstruction aims to reconstruct the 3D shape of a given object from multiple 2D images. However, existing methods usually ignore the learning of both rotation invariance and regional consistency of objects, making it difficult to aggregate features from multiple views accurately and resulting in the loss of fine-grained details in reconstruction results. To address this challenge, a Dual-view Point cloud reconstruction method based on Rotation-invariant Regional consistency, called DPR2, is proposed. It takes two RGB images as input and learns the regional consistency across views on the basis of exploring the rotation invariance of object regions to promote feature aggregation, reconstructing a refined point cloud. In encoding, a point cloud initialization network is introduced to initialize a rough point cloud for each view. Besides, a region-level rotation-invariant feature extraction network is presented, which captures rotation-invariant features from different regions of the rough point clouds by utilizing the Euclidean distances between points. In decoding, a dual-stage cross-attention mechanism is devised, which learns high-quality region consistency across the point clouds and thus aggregates features accurately. Moreover, a point cloud refinement network is developed to refine the rough point cloud into a point cloud with fine-grained details and smooth surfaces using the aggregated features. Extensive experiments on the ShapeNet and Pix3D datasets show that DPR2 outperforms existing SOTA methods in terms of reconstruction performance. Compared to the SOTA methods P2M++ and MVP2M++, the CD metric improves by 23.62% and 9.06%, respectively.
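    The rotation-invariance idea rests on a simple fact: Euclidean distances between points do not change when a point cloud is rotated. Below is a generic sketch of a descriptor built from such distances (not DPR2's network; `k` and the chosen statistics are illustrative assumptions).

```python
import numpy as np

def rotation_invariant_descriptor(points, k=8):
    """Per-point descriptor from distances to the k nearest neighbours; these
    distances, and hence the descriptor, are unchanged by any rotation."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    knn = np.sort(dists, axis=1)[:, 1:k + 1]   # skip the zero distance to itself
    return np.stack([knn.mean(axis=1), knn.std(axis=1), knn.max(axis=1)], axis=1)

# sanity check: the descriptor is identical before and after an orthogonal transform
pts = np.random.rand(64, 3)
q, _ = np.linalg.qr(np.random.randn(3, 3))
assert np.allclose(rotation_invariant_descriptor(pts),
                   rotation_invariant_descriptor(pts @ q), atol=1e-8)
```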
  • Yuebing Liang, Bohao Qian, Mengying Zhu , Xiaolin Zheng
    Accepted: 2025-04-21
    With the widespread adoption of microservices in digital service networks, the large scale of service nodes and the complexity of call graphs in these networks present significant challenges for operations management. Although distributed tracing technology has made significant progress, it still has limitations, such as the need for intrusive source-code changes and reliance on specific middleware, which lead to insufficient accuracy and completeness of tracing chains and affect the reliability of downstream analysis tasks based on observability data. To address this problem, we propose a Trace-Metric Oriented Heterogeneous Dynamic Graph Neural Network, named TM-HEDGE. First, we construct a heterogeneous dynamic directed acyclic graph by incorporating metric data. Then, we propose an intra-snapshot heterogeneous attention encoder and an inter-snapshot Transformer encoder to learn heterogeneous spatiotemporal node representations. Finally, we complete missing tracing chains through link completion, achieving tracing reconstruction. Experimental results show that, on tracing reconstruction tasks, TM-HEDGE improves accuracy by 5.22% on average compared to existing state-of-the-art link completion GNNs on three public datasets, significantly enhancing the completeness and accuracy of tracing chains in digital service networks.
  • GUO Yu-xin, JIA Xiang-dong, LI Yue
    Accepted: 2025-04-21
    Reconfigurable intelligent surfaces (RIS) are considered a promising technology for future wireless communication. Unmanned aerial vehicles (UAV) are also gaining increasing attention in the wireless communication field due to their unique advantages. However, as communication environments grow more complex, information transmission faces increasingly severe eavesdropping risks, and the integration of RIS with UAV opens up new solutions and possibilities for future wireless communications. Therefore, a secure wireless communication system based on RIS-UAV is proposed. The system utilizes the flexibility of UAVs and the dynamic adjustment capability of RIS to counter potential eavesdropping behavior effectively. From the physical layer security perspective, the system jointly considers the phase shift of the RIS and the flight trajectory of the UAV, aiming to maximize the average secrecy rate of the system. Since the problem is inherently non-convex, it is decomposed into two subproblems. First, a closed-form solution for the phase shift is derived from the system's particular structure. Subsequently, the UAV trajectory subproblem is converted into a convex optimization problem via first-order Taylor expansion, and the iterative convex approximation method is used to solve it. The simulation results indicate that, in comparison with the benchmark scheme, the average secrecy rate of this system increases by approximately . The performance also improves significantly compared with the fixed-RIS and no-RIS schemes. Additionally, this scheme demonstrates distinct advantages in aspects such as security and computational complexity, with a computational complexity of .
  • ZHANG Ruijia, MA Huifang , ZHANG Yingyue, PENG Shengjiang
    Accepted: 2025-04-21
    Dissolved Gas Analysis (DGA) aims to identify faults in oil-immersed power transformers by monitoring the dissolved gases in the insulating oil. However, the effectiveness of current DGA methods is hindered by the lack of labeled data, resulting in sub-optimal performance. To address this limitation, this study introduces a novel Graph Knowledge Distillation approach (GKDG) to enhance the accuracy and efficiency of DGA. The approach employs a dual-perspective graph construction strategy to leverage additional supervision from sample neighborhoods, facilitating direct information aggregation through propagation. Furthermore, by distilling knowledge from a teacher Graph Neural Network (GNN) into the student GNN model, the student model is equipped to adeptly capture and interpret the intricate relationships among dissolved gases. We then introduce diverse knowledge to align the student and teacher graphs in the embedding space, enhancing the learning capacity of the student model and enabling it to benefit more effectively from the teacher model. Extensive experiments have confirmed the significant effectiveness of this method in enhancing DGA performance, thereby providing robust support for the maintenance and fault detection of power equipment.
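    A hedged sketch of the teacher-to-student transfer described above (generic knowledge-distillation terms, not the exact GKDG objective; the loss weights and the cosine alignment term are assumptions): soft logits are distilled with a temperature-scaled KL term while the two embedding spaces are pulled together.

```python
import torch
import torch.nn.functional as F

def graph_kd_loss(student_logits, teacher_logits, student_emb, teacher_emb,
                  labels, temperature=2.0, alpha=0.5, beta=0.1):
    """Supervised loss + soft-label distillation + embedding-space alignment."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    align = 1.0 - F.cosine_similarity(student_emb, teacher_emb, dim=-1).mean()
    return ce + alpha * kd + beta * align
```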
  • LUO Li, FENG Quanyou, ZHOU Li , TIE Junbo, GONG Rui, PAN Guoteng
    Accepted: 2025-04-17
    As core configurations grow, cache coherence protocols are becoming increasingly complex, and the verification of cache coherence protocols for chiplet-based microprocessors is of significant value. The directory is the most widely used circuit for implementing hardware coherence. This paper focuses on the agile verification of directory controllers for chiplet-based multi-core processors. By adopting an optimized random testing approach based on the negative selection algorithm, we achieved a 28% increase in functional coverage while reducing random test stimuli by 40%. By combining random and directed test stimuli, we ultimately achieved 100% functional coverage. Furthermore, we designed a coherence checker that monitors the lifecycle of coherence transaction flows. This tool enables rapid identification of design bugs, precise tracing of transaction scenarios, and localization of 90% of protocol bugs. Overall, these efforts have significantly improved both the efficiency and quality of the verification process.
  • LIANG Ziyi, WANG Zihao, LI Liping, LIU Tianquan, ZHU Yuanfei, LU Cunyue
    Accepted: 2025-04-16
    In the field of clinical medicine, pathological analysis of patient tissue sections is considered the gold standard for assessing complex diseases. Traditional super-resolution methods often fail to effectively capture fine structures and textures in pathological images, leading to suboptimal reconstruction performance. To address this issue, this paper proposes a novel super-resolution generative adversarial network based on a parallel attention mechanism, referred to as PASRGAN (Parallel Attention Super-Resolution Generative Adversarial Network). The proposed algorithm adopts a parallel execution of channel and spatial attention mechanisms to overcome the information dispersion issues inherent in traditional attention mechanisms. Furthermore, a feature grouping and channel shuffle strategy is introduced, which enhances feature diversity while maintaining low computational costs, thereby significantly improving the reconstruction performance of pathological images. Considering that most existing super-resolution studies on pathological images are conducted on simulated datasets, which fail to fully reflect the challenges of real-world image degradation, this paper constructs paired low-resolution and high-resolution image datasets based on the Camelyon16 dataset to validate the proposed algorithm's effectiveness in real-world scenarios. Experimental results demonstrate that, compared to state-of-the-art super-resolution methods (e.g., ESRGAN, CWT-Net, Histo-Diffusion, and URCDM), PASRGAN achieves superior performance with a PSNR of 25.33 dB, SSIM of 0.6659, and perceptual index (PI) of 5.14. In addition, PASRGAN achieves significantly lower parameter complexity (10.8M) and floating-point operations (FLOPs, 48.9G) compared to traditional methods, confirming its computational efficiency. Ablation studies further analyze the contributions of the parallel attention mechanism, the shuffle operation, and the improvements in the generator and discriminator structures, verifying their effectiveness.
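    The parallel attention idea can be sketched as a small PyTorch module (an illustrative approximation, not the PASRGAN implementation; the kernel sizes, reduction ratio, and group count are assumptions): channel attention and spatial attention are applied to the input in parallel rather than sequentially, and a channel-shuffle step mixes information across feature groups.

```python
import torch
import torch.nn as nn

class ParallelAttention(nn.Module):
    """Channel and spatial attention computed in parallel, followed by channel shuffle."""
    def __init__(self, channels, groups=4, reduction=8):
        super().__init__()
        self.groups = groups
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.spatial_att = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        out = x * self.channel_att(x) + x * self.spatial_att(x)   # parallel branches
        b, c, h, w = out.shape                                    # channel shuffle
        out = out.view(b, self.groups, c // self.groups, h, w)
        return out.transpose(1, 2).reshape(b, c, h, w)
```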
  • Li Tianran, Piao Yong, Kong Zihan
    Accepted: 2025-04-16
    Nowadays, open source software is widely used in various industries, especially in key fields such as aerospace and automotive electronics, but most open source software contains security vulnerabilities. In China, component analysis and verification of open source software are largely missing from software development and evaluation, which makes it difficult to ensure the safety of software in key areas. Therefore, software component analysis is indispensable for ensuring software security, and accurate identification of third-party dependencies (TPDs) is the key to software vulnerability management and compliance assessment. To solve the above problems, this paper proposes a lightweight dependency analysis method for software component analysis, which improves the accuracy of TPD identification and the efficiency of large-scale project file processing. The main contents are as follows. First, the method includes an analysis algorithm for Java Maven projects, which constructs the project structure model and extracts third-party dependency information by identifying the project's build configuration files. Second, the method includes a redundant dependency detection algorithm based on the Winnowing algorithm, which detects the actual use of third-party dependencies and eliminates redundant dependencies by comparing, step by step, the code files against the hash fingerprints that identify the third-party dependency information. Third, based on the proposed algorithms, a lightweight component analysis framework is designed and implemented. The framework wraps each analysis algorithm in a specific analyzer class and registers and executes analysis tasks using the ServiceLoader API in Java. To verify the effectiveness of the method, we built a database containing 56,870 different TPD versions and collected four real open source projects from GitHub for experimental verification. The results show that the proposed algorithm performs well in detection accuracy: compared with a machine learning-based clustering algorithm and a code similarity comparison technique, it achieves higher accuracy and F1 score with lower detection time. In addition, the ServiceLoader API used in the system makes it more extensible and convenient for adding different analysis algorithms, and it has strong practicability, laying the foundation for subsequent multilingual TPD detection.
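    The redundant-dependency check builds on the standard Winnowing fingerprinting scheme; a minimal sketch follows (the k-gram size, window size, and overlap threshold are illustrative assumptions, and `is_dependency_used` is a hypothetical helper, not the paper's API).

```python
import hashlib

def winnow(text, k=5, window=4):
    """Standard Winnowing: hash every k-gram, then keep the minimum hash of each
    sliding window as the document fingerprint."""
    grams = [text[i:i + k] for i in range(len(text) - k + 1)]
    hashes = [int(hashlib.md5(g.encode()).hexdigest(), 16) % (1 << 32) for g in grams]
    fingerprint = set()
    for i in range(len(hashes) - window + 1):
        fingerprint.add(min(hashes[i:i + window]))
    return fingerprint

def is_dependency_used(code_text, dependency_fingerprint, threshold=0.1):
    """Flag a declared third-party dependency as actually used when enough of its
    fingerprint also appears in the project's own code."""
    code_fp = winnow(code_text)
    overlap = len(code_fp & dependency_fingerprint) / max(len(dependency_fingerprint), 1)
    return overlap >= threshold
```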
  • Wei Fangda, Liu Miao, Sun Yi, Wang Jing, Zhao Shenghui
    Accepted: 2025-04-15
    Deep learning has achieved significant success in the fields of computer vision and speech signal processing. However, its rapid development has also brought negative impacts: all kinds of fake videos and voices are flooding the Internet. Some criminals use deep learning technology to replace the face in an original video, edit facial attributes, and synthesize or clone a speaker's voice. By producing pornographic videos, fake news, political rumors, and the like, criminals can cause social unrest and chaos, threatening personal interests and national security. Many scholars have proposed solutions from different perspectives to eliminate these negative effects. Early forgery mainly focused on a single modality, so most current solutions focus on single-modal forgery recognition and fail to fully consider the intrinsic relationship between audio and video. Existing single-modal detection methods often exhibit suboptimal recognition performance when both audio and video are forged. Recently, with deepening research, some scholars have begun to explore the use of multi-modal models for forgery detection and have achieved remarkable results. This survey reviews video and voice forgery and forgery detection technologies, collects and organizes video, voice, and audio-visual forgery datasets, and summarizes multi-modal forgery detection methods. Finally, the existing problems and research directions of current detection technology are analyzed and suggestions are given.
  • XU Zhigang, YU Hao
    Accepted: 2025-04-14
    Murals, as an important part of cultural heritage, have received widespread attention for their digital preservation and restoration in recent years. However, the super-resolution reconstruction of mural images often faces challenges such as texture blurring and the loss of original information. To address these issues, this paper proposes a Reference-based Two-stage Mural Image Super-Resolution Reconstruction (RTMISR) method. First, a multi-scale residual feature extraction module is employed to accurately capture the feature relationships between high-resolution and low-resolution mural images, ensuring the complete retention of low-resolution image information and achieving an initial reconstruction of mural contours and partial details. Then, a texture feature enhancement module is introduced, utilizing a coarse-to-fine feature matching method to extract high-quality texture information from reference images and effectively integrate it into the reconstructed images to enhance texture detail representation. Moreover, to ensure the relevance and quality of the reference images, a reference image selection module is designed to select reference images that are highly correlated with the target low-resolution image. Experimental results on mural datasets show that, compared to representative super-resolution methods such as SRGAN, MADNet, and ESRT, RTMISR achieves superior performance in objective metrics: for ×2 super-resolution, PSNR is improved by an average of 2.83 dB, and SSIM by 0.04; for ×4 super-resolution, PSNR is improved by an average of 2.00 dB, and SSIM by 0.02. In terms of subjective visual quality, RTMISR effectively retains the original information of murals while enhancing the texture details of mural images, achieving a better balance between model complexity and reconstruction performance.
  • SUN Yu, WANG Honejie, DU Yanhui, LIU Nan
    Accepted: 2025-04-14
    Deep neural network language models are vulnerable to adversarial attacks during application, where adversarial samples can be generated by adding small perturbations to original samples to mislead models into making incorrect decisions. Research on adversarial sample generation methods effectively reveals and evaluates model robustness deficiencies. Existing Chinese adversarial sample generation methods mostly focus on improving attack success rates while neglecting quality metrics like sample stealthiness. This research focuses on Chinese text adversarial sample generation techniques and proposes CMSPSO, a multi-level adversarial sample generation method that combines Chinese character glyph and semantic information, considering the unique characteristics of Chinese characters in glyph structure and semantic features. CMSPSO uses particle swarm optimization algorithms to search for suitable replacement combinations in pre-designed replacement knowledge bases to generate adversarial samples. CMSPSO-M combines visually similar multi-language character features and constructs a high-quality visual replacement character knowledge base through trained Siamese neural networks to calculate visual similarity for character-level adversarial sample generation. CMSPSO-S builds semantic replacement word knowledge bases based on HowNet and WordNet to generate word-level adversarial samples, evaluated through attack effectiveness and attack cost metrics. Experimental results demonstrate that CMSPSO exhibits significant attack effectiveness across multiple models and datasets. In particular, CMSPSO-M achieves an attack success rate of 84.22% against the RoBERTa model on the XNLI dataset. Furthermore, CMSPSO shows clear advantages in attack cost metrics and outperforms baseline methods in overall performance.
  • LIU Genhao, ZHANG Neng, ZHENG Zibin
    Accepted: 2025-04-11
    API usage constraints are conditions or restrictions that developers must follow when invoking APIs to ensure correct usage and prevent misuse. API documentation serves as an important source for extracting these constraints. Existing NLP-based methods for extracting API usage constraints often rely on syntactic patterns but have limited ability to handle complex coordinated sentences and impose strict requirements on syntactic structures. To address these issues, this paper proposes an API usage constraint knowledge extraction method based on large language models, referred to as AUCK. AUCK first preprocesses Java API documentation and extracts sentences containing API usage constraints. Then, it summarizes syntactic patterns of coordinated sentences and designs corresponding cases to guide the large language model in decomposing coordinated sentences into simple sentences. Finally, it summarizes syntactic patterns of triplets and designs cases to guide the large language model in extracting API usage constraint triplets. Experimental results on Java API documentation show that AUCK achieves an accuracy of 92.23% and a recall of 93.14%, significantly outperforming existing methods, including DRONE (accuracy: 80.61%, recall: 86.81%), the mainstream triplet extraction tool OpenIE (accuracy: 76.92%, recall: 52.63%), and the large language model ChatGPT-3.5 (accuracy: 82.23%, recall: 67.71%). In addition, applying AUCK to Android and Python API documentation verifies its good transferability.
  • LI Yang, JIANG Yi, CHEN Shuai, YAN Shichao, WANG Lei, MA Li
    Accepted: 2025-04-11
    Personalized federated learning algorithms have great advantages in handling non-independent and identically distributed (Non-IID) datasets and in client-side model personalization. Hypernetwork-based personalized federated learning utilizes each client's own hypernetwork to obtain a personalized client model. However, the impact of sharing client-side hypernetwork parameters and client-side data on the accuracy of client-side personalized models is still unclear. A multi-layer hypernetwork personalized federated learning (pFedMHN) framework is proposed to optimize client models through the weighted aggregation of local and global hypernetworks. The server learns a global hypernetwork and each client's multi-layer local hypernetworks, then aggregates them. Clients use the aggregated hypernetwork parameters to iteratively update their models, resulting in more accurate personalized models. Experimental results show that on general public datasets, pFedMHN outperforms four benchmark algorithms in terms of accuracy, effectively addresses the data heterogeneity and model accuracy problems faced by personalized federated learning on Non-IID datasets, and achieves more accurate personalized client models by utilizing hypernetwork parameter and client data sharing.
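    The weighted aggregation of local and global hypernetworks can be illustrated in a few lines (a sketch under the assumption that both hypernetworks expose their parameters as name-to-tensor dictionaries; the mixing weight is illustrative, not pFedMHN's setting).

```python
def aggregate_hypernetworks(global_params, local_params, mix=0.5):
    """Per-layer weighted average of the global hypernetwork and a client's
    local hypernetwork; the result parameterizes the client's personalized model."""
    return {name: mix * local_params[name] + (1.0 - mix) * global_params[name]
            for name in global_params}
```

    A larger `mix` keeps the client closer to its own hypernetwork, which is the usual lever for trading personalization against shared knowledge.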
  • LIANG Xuning, WANG Siqi, YANG Hailong, LUAN Zhongzhi, LIU Yi, QIAN Depei
    Accepted: 2025-04-11
    Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing tasks. However, LLMs come with an extremely large number of parameters, posing significant challenges for inference tasks under GPU memory bottlenecks. To address these issues, AdaptiveLLM is proposed to select the optimal offloading strategy between tensor swapping and tensor re-computation, with awareness of the real-time workload. To extract workload characteristics during inference, AdaptiveLLM employs a black-box ML model for tensor swapping overhead prediction based on operator-level complexity analysis, and conducts fine-grained KV cache memory usage modeling for tensor re-computation overhead estimation. To select the offloading strategy adaptively, AdaptiveLLM adopts a cost-aware memory optimization algorithm during the preemptive scheduling phase, selecting the method with lower overhead under limited GPU memory. AdaptiveLLM also introduces a fairness-based request scheduling strategy during the startup scheduling phase, handling a larger batch of user requests following fairness-oriented principles when GPU memory is available. Experiments show that, compared with mainstream LLM inference baselines, AdaptiveLLM improves the overall throughput while achieving fairness-oriented scheduling by reducing the average weighted turnaround time.
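    At its core, the cost-aware choice between tensor swapping and re-computation reduces to comparing two time estimates. The sketch below is a deliberate simplification: the abstract describes a learned overhead predictor and a KV-cache memory model, whereas the figures and formulas here are illustrative assumptions only.

```python
def choose_offload(tensor_bytes, pcie_bytes_per_s, recompute_flops, gpu_flops_per_s):
    """Pick the cheaper way to free GPU memory for one KV-cache block:
    copy it to host memory (swap) or drop it and recompute it later."""
    swap_time = tensor_bytes / pcie_bytes_per_s        # seconds to move over PCIe
    recompute_time = recompute_flops / gpu_flops_per_s # seconds to recompute on GPU
    return "swap" if swap_time <= recompute_time else "recompute"

# example: a 256 MiB KV block, 16 GB/s effective PCIe, 2 TFLOP recompute on a 100 TFLOP/s GPU
print(choose_offload(256 * 2**20, 16e9, 2e12, 1e14))   # -> "swap"
```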
  • WEI Xin, PENG Ningning
    Accepted: 2025-04-11
    To address the limited stability of complex network construction and the poor classification performance of shape features extracted in complex situations in existing complex-network-based image shape classification algorithms, this paper proposes a shape classification algorithm based on persistent homology and complex networks. The algorithm combines complex networks with Vietoris-Rips filtration to build a persistent complex network on the image contour point cloud. Persistent homology is used to compute global topological features at different dimensions. Local shape features are extracted from the degree distribution, and these are fused with the global topological features to form two feature sets, PHCND and PHCNJD, enhancing the image's shape representation. The fused feature vectors are classified using Linear Discriminant Analysis (LDA). Experiments on nine benchmark datasets show that our algorithm outperforms traditional methods and ResNet-50. Ablation studies confirm the effectiveness of the global topological features and the complementarity between local and global features in the persistent complex network. Results demonstrate that our algorithm achieves the highest accuracy and F1 score on five datasets, with accuracy improvements of 2.2%–30.3% and F1 score improvements of 2.2%–30.9% compared to seven traditional algorithms. These findings validate the effectiveness and robustness of the algorithm for image shape classification.
  • Wang XiaoLong, Wang JiaLiang, Ji Qing, Hou FengYao
    Accepted: 2025-04-10
    In recent years, domestic deep learning accelerators have developed rapidly. Their hardware resources keep changing and a series of tensor core instructions have been introduced, which makes it a huge challenge for developers to manually adapt and optimize convolution operators on such accelerators. To this end, this paper proposes a convolution code generator for a domestic accelerator to simplify the adaptation and optimization of convolution operators. The generator exposes configuration parameters as its external interface, and users only need to configure parameters to generate specific convolution operators. The generator itself consists of a three-layer architecture: the instruction layer encapsulates the underlying instructions and distinguishes them according to the hardware architecture; the component layer organizes the corresponding instructions according to preset hardware architecture information and provides highly abstract and reusable functional components from the perspective of thread blocks and warps; the operator construction layer splices the functional components according to the implicit convolution algorithm and finally generates the convolution operator. To ensure the computing performance of the convolution operator, the generator is optimized from two aspects: vectorization and thread partitioning algorithms are used to optimize global memory access performance, and a transposition algorithm is used to transform the thread structure of the multiply-accumulate instruction to optimize write-back performance. The test results show that the generator's optimization algorithms can significantly improve operator performance; under two hardware versions, the convolution operator with the NHWC storage layout reaches 95% and 90% of the official operator performance, respectively. The generator provides a new solution for the adaptation and optimization of convolution operators on domestic accelerators.
  • LIU Xiangbin, FANG Cheng, LIU Shuai
    Accepted: 2025-04-10
    As one of the core tasks in computer vision, real-time semantic segmentation plays a critical role in many applications such as autonomous driving and traffic control systems. Existing real-time semantic segmentation algorithms based on the encoder-decoder structure usually achieve real-time performance at the cost of segmentation accuracy: to ensure real-time performance, such algorithms usually have a small receptive field, which leads to poor segmentation of large-scale objects in road scenes. Therefore, this paper proposes a real-time semantic segmentation algorithm for road scenes based on the encoder-decoder structure to solve this problem. Firstly, a multi-scale feature fusion mechanism is introduced in the feature extraction stage to effectively fuse receptive field features across a wide range of scales and improve the segmentation of large-scale objects. Then a polarized self-attention mechanism is designed at the end of the encoder to enhance local perception within the large-scale receptive fields and further improve the segmentation of large-scale objects. The algorithm was implemented and tested on the Cityscapes and CamVid datasets. The experimental results show that, on a single NVIDIA RTX 3090, it achieves an mIoU of 80.6 and 81.1 at 43.5 FPS and 91.2 FPS on the two datasets, respectively, achieving better segmentation accuracy.
  • WU Zhengjiang, WANG Mengsong, WU Xingchen
    Accepted: 2025-04-10
    To address the problem of redundant tolerance classes under the asymmetric tolerance relation and to improve the computational efficiency of approximations, this paper introduces a Boolean matrix representation of the upper and lower approximations based on an improved tolerance relation, designs a block-matrix algorithm, and accelerates the approximation calculation process on the GPU. Specifically, this paper proposes a nearest tolerance relation to construct the structure between objects in incomplete information systems; these structures are transformed into multiple local relation matrices to quickly calculate the nearest tolerance classes, and the vectorized classes are loaded in blocks into memory to calculate the approximations. Experimental results on UCI datasets and a user-defined dataset show that the tolerance classes are reduced and that the block-matrix algorithm based on the nearest tolerance relation effectively accelerates the approximation computation process on the GPU; compared with CPU and distributed computing, the average execution speed of the block-matrix algorithm increases by 16.69 times and 3.89 times, respectively.
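    The Boolean-matrix view of the approximations can be sketched as follows (a generic rough-set formulation, not the paper's nearest-tolerance construction or its GPU block-matrix kernel): row i of the relation matrix encodes the tolerance class of object i, and the lower/upper approximations test whether that class is contained in, or intersects, the target concept.

```python
import numpy as np

def approximations(relation, target):
    """Lower/upper approximations of a target concept from a Boolean relation matrix.

    relation : (n, n) bool matrix, relation[i, j] = objects i and j are tolerant
    target   : (n,)   bool vector, membership of each object in the concept X
    """
    classes = relation.astype(bool)
    lower = np.array([target[row].all() for row in classes])   # class fully inside X
    upper = np.array([target[row].any() for row in classes])   # class meets X
    return lower, upper

# toy usage on 4 objects split into two tolerance classes
r = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]], dtype=bool)
x = np.array([True, True, True, False])
print(approximations(r, x))
```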
  • He Mingyan, Li Siyuan, Liu Peng, Huang Jianhua
    Accepted: 2025-04-10
    As a key technology in multi-agent game confrontation, opponent modeling aims to learn the behavior of the opponent to reduce the uncertainty of the environment and aid decision-making. However, most existing opponent modeling methods adopt an offline-training, online-adaptation structure. In offline training, the traditional neural dynamics model predicts the agent step by step, which easily produces single-step errors that accumulate. In addition, when facing an unknown opponent during online adaptation, the planned state of the controlled agent deviates from the distribution of the dataset. To solve the above problems, a framework based on a diffusion model and cross-attention is proposed to establish correlation with the opponent. The cumulative bias problem is solved by exploiting the diffusion model's ability to generate a multi-step planning sequence at once. The concept of a strategy set is proposed, which not only addresses the deviation problem through online fine-tuning but also prevents the offline strategy from being destroyed in the initial stage of online training. Experimental results in both open dense-reward and sparse-reward competitive environments fully demonstrate the superior performance of this method.
  • Mao Yuyang, Xu Chongjun, Yang Huayu, Zhai Xilin, Zhang Hua
    Accepted: 2025-04-10
    The number of malicious websites accessed through in-vehicle third-party services has been rapidly increasing, posing a significant threat to the security of the Internet of Vehicles (IoV). There are currently three major challenges in IoV malicious website detection: traditional tools exhibit high detection latency when processing large-scale website data, obfuscated malicious URLs reduce detection accuracy, and the difficulty of obtaining malicious website datasets further hinders effective detection. These factors collectively limit both the efficiency and accuracy of detection. To address these issues, this paper proposes a multi-stage rapid malicious website detection method based on logistic regression. The method uses search engines for preliminary filtering of legitimate websites to reduce wasted computational resources. It designs matching rules by analyzing and summarizing malicious obfuscation techniques and introduces a heuristic rule-based malicious website filtering method to effectively filter obfuscated website URLs, overcoming the limitations of traditional tools in detecting maliciously obfuscated URLs. To further enhance detection accuracy, it constructs a comprehensive and lightweight set of malicious website detection features and employs logistic regression for feature extraction and classification. Experimental results demonstrate that the proposed MSHL method significantly outperforms traditional methods in terms of accuracy and efficiency in malicious website detection, achieving an accuracy of 98.1% on public datasets and reducing detection time by approximately 75%.
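    The three-stage flow described above can be sketched roughly as follows (the allow-list lookup, the obfuscation rules, and the character n-gram features are illustrative assumptions, not the MSHL feature set; the classifier must be fitted on labeled URLs before `detect` is used).

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stage 2: heuristic rules for common URL-obfuscation tricks (patterns are illustrative)
OBFUSCATION_RULES = [
    re.compile(r"https?://\d{1,3}(\.\d{1,3}){3}"),   # raw IP instead of a domain
    re.compile(r"%[0-9a-fA-F]{2}"),                  # heavy percent-encoding
    re.compile(r"@"),                                # userinfo trick: real host hidden after '@'
]

def rule_filter(url):
    return any(rule.search(url) for rule in OBFUSCATION_RULES)

# Stage 3: lightweight character-level features + logistic regression
classifier = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
    LogisticRegression(max_iter=1000),
)
# classifier.fit(train_urls, train_labels) must be called before detect() is used

def detect(url, whitelisted):
    """Skip known-legitimate URLs, flag obviously obfuscated ones by rule,
    then fall back to the trained classifier."""
    if whitelisted(url):           # stage 1: e.g. search-engine / allow-list lookup
        return "benign"
    if rule_filter(url):           # stage 2: heuristic obfuscation rules
        return "malicious"
    return "malicious" if classifier.predict([url])[0] == 1 else "benign"
```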
  • HAN Yanling , ZHU Xiaojun , WANG Jing , PAN Haiyan, ZHANG Yun
    Accepted: 2025-04-09
    Sea ice thickness, a key parameter in global climate change research, plays a vital role in regulating the Earth's climate system, ocean circulation, and heat exchange. However, because the physical characteristics of sea ice vary with its height, accurate inversion of sea ice thickness faces great challenges. In response to these issues, this paper presents an improved passive microwave remote sensing method for sea ice thickness inversion, SIT-TransNET, which utilizes brightness temperature data from the AMSR2 satellite. The method incorporates auxiliary data, including surface snow temperature, sea surface salinity, and 1.4 GHz brightness temperature, to explore complex correlations with sea ice thickness. Analysis of the importance of different features and the establishment of various feature fusion schemes enhance the effective representation of sea ice thickness. Through the self-attention and multi-head attention mechanisms of the SIT-TransNET model, the contributions of different features and their combinations to sea ice thickness inversion are captured, allowing dynamic adjustment of feature weights to achieve precise inversion of sea ice thickness. Experimental results demonstrate that the SIT-TransNET method significantly improves the accuracy of sea ice thickness inversion, with a coefficient of determination (R²) reaching 0.96 and a root mean square error (RMSE) of 6 cm. This method proves suitable for sea ice thickness inversion and provides an effective technical means for large-scale sea ice thickness monitoring and climate change research.
  • PAN Keyue, GUO Wei, CHENG Xiang, LIU Yi
    Accepted: 2025-04-09
    Scene elements are fundamental for understanding urban geographic information, and their accurate extraction is essential for smart city development and geographic information systems. To address the complexity of street view images, the limitations of existing deep learning models in interpreting complex scenes, and the challenges of associating visual data with context, a method based on large multimodal models for extracting typical scene elements from street view images is proposed. Firstly, the approach extends LLaVA by integrating a multilayer perceptron and a high-resolution visual encoder to create GeoLLaVA. Secondly, a Street View Visual-Instruction Following Dataset is constructed for scene element extraction tasks, providing multidimensional instructions. The model is fine-tuned with these visual instructions to enhance its contextual understanding, and Low-Rank Adaptation (LoRA) is used to optimize computational efficiency. Finally, GeoLLaVA generates multidimensional scene descriptions from street view images and extracts key element keywords for effective scene element extraction. In comparative experiments with semantic segmentation, object detection, and other multimodal models, GeoLLaVA demonstrates significant advantages, achieving F1 scores of 0.938, 0.842, and 0.829 for the extraction of traffic signals, intersections, and parking lots, respectively. The comparison of the model before and after fine-tuning clearly demonstrates the effectiveness of the fine-tuning process. Ablation studies further validate the performance improvements achieved by the modified GeoLLaVA architecture, and LoRA effectively reduces computational resource consumption. In regional application experiments using batch inference on street view images with geographic coordinates, a comparison with OpenStreetMap (OSM) data not only confirms the model's accuracy but also highlights the limitations of OSM data in providing comprehensive element information.
  • LIU Junping, WANG Runpeng, HU Xinrong, PENG Tao, WANG Bangchao, YANG Huali, ZHU Qiang
    Accepted: 2025-04-09
    Entity Linking (EL) is the task of linking entity mentions in texts to the corresponding entities in a knowledge base. It plays a crucial role in information retrieval and question answering systems. The challenge in entity linking lies in leveraging the context of mentions and the feature information of entities in the knowledge base to generate candidate entities and select the correct one. Although some approaches rely on certain strategies to generate relevant candidate entities and use feature information to select the appropriate entity, they fail to learn deeper semantic information, resulting in low-quality candidate entities and possibly excluding some gold entities. Additionally, in certain specialized domains, the lack of sufficient entity information resources makes it difficult for some methods to interact at multiple levels. To address these issues, this paper adopts a two-stage EL method that first generates high-quality candidate entities and then integrates entity feature information for re-ranking. Specifically, the method employs a contrastive learning approach based on mixed negative sampling to retrieve high-quality candidates. It then predicts fine-grained entity types through weakly supervised learning and re-ranks the candidates based on the coarse- and fine-grained entity types. Finally, extensive experiments on three public datasets confirm that the method improves EL performance.
  • Yipeng Wu, Zhikun Huo, Mengzhi Han
    Accepted: 2025-04-09
    Currently, General-Purpose Graphics Processing Units (GPGPUs) are widely utilized for various computational tasks due to their robust parallel processing capabilities. However, GPGPUs employing the Single Instruction Multiple Threads (SIMT) execution model often encounter divergent control flow during kernel execution, leading to warp divergence and a subsequent decline in overall accelerator performance. To address the performance degradation caused by divergent control flow in kernel execution, this paper introduces a branch compilation optimization technique tailored for specific scenarios—MergeCFG. During the intermediate code optimization phase in the compiler, MergeCFG conducts control flow analysis to identify consecutive branch structures in the control flow graph that share identical conditional branches, thereby pinpointing potential optimization opportunities. Subsequently, based on instruction analysis, it assesses the feasibility of optimization to determine whether there exists an opportunity to reduce branch operations. Finally, by employing basic block duplication and merging techniques, it optimizes the control flow structure to minimize branch operations, thereby simplifying control flow and enhancing program execution efficiency. To validate the feasibility of this method, experiments were conducted on a domestic GPGPU using seven suitable benchmark test suites. The results demonstrate that this method effectively reduces branch operations within programs, leading to significant performance improvements in the optimized test cases. The average speedup across the evaluated cases ranged from 2% to 12%, with certain test cases exhibiting performance enhancements exceeding fivefold.
  • Lai Xiaoling, He Manman, Hu Wei, Zhang Yi, Du Puliang, Liu Rui, Song Xiaotong, Zheng Tingting
    Accepted: 2025-04-08
    Aiming at the problems of low accuracy and large noise in load data in traditional power load forecasting methods, this paper proposes a multi-factor power load forecasting method based on improved variational mode decomposition (VMD), a convolutional neural network (CNN), and the Mogrifier long short-term memory network (Mogrifier LSTM). Firstly, it utilizes the Sparrow Search Algorithm (SSA) to optimize the variational mode decomposition and obtain the best-performing decomposition subsequences, which effectively reduces the influence of load data noise on prediction accuracy. Secondly, it analyzes the influence mechanism of each factor on load forecasting, derives the correlation between each influencing factor and the load using Pearson's correlation coefficient, and eliminates redundant features, which greatly reduces the probability of model inaccuracy. Finally, it uses the CNN to extract feature vectors: the decomposed load data and feature data such as temperature and humidity are fed into the CNN-Mogrifier LSTM deep network model, which analyzes the feature data in multiple dimensions and thereby improves short-term load prediction accuracy. The results show that the proposed multi-factor power load prediction model has good adaptability and prediction performance. Compared with the sub-optimal VMD-CNN-Mogrifier LSTM model, the prediction accuracy of the proposed model on two real datasets improves by 0.5 and 2.4 percentage points, respectively, providing a feasible solution for short-term power load forecasting.
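    The correlation-based redundancy elimination step can be sketched in a few lines (the column names and the 0.3 threshold are illustrative assumptions, not values from the paper).

```python
import numpy as np
import pandas as pd

def select_features(df, target_col="load", threshold=0.3):
    """Keep only exogenous features whose absolute Pearson correlation with the
    load series exceeds a threshold; everything else is treated as redundant."""
    corr = df.corr(method="pearson")[target_col].drop(target_col)
    keep = corr[corr.abs() >= threshold].index.tolist()
    return df[keep + [target_col]], corr

# toy usage on random data
df = pd.DataFrame({
    "temperature": np.random.rand(100),
    "humidity": np.random.rand(100),
    "load": np.random.rand(100),
})
filtered, correlations = select_features(df)
```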
  • MAYILAMU Musideke, GAO Yuxin, ZHANG Situo, FENG Ke, ABUDUKELIMU Abulizi, HALIDANMU Abudukelimu
    Accepted: 2025-04-08
    With the rapid advancement of general artificial intelligence technology, the application of foundational models in various fields has gained increasing attention. In the domain of image segmentation, the "Segment Anything Model" (SAM), as a core foundational model, has demonstrated significant advantages in improving both image understanding and processing efficiency. While SAM has shown strong performance in image segmentation tasks, there remains considerable room for optimization in areas such as power consumption, computational efficiency, and adaptability to diverse application scenarios. This paper provides an in-depth exploration of potential improvements to SAM across several key dimensions, including enhancing speed and computational efficiency, improving model accuracy and robustness, increasing adaptability and generalization, optimizing prompt engineering, and boosting data utilization and transfer learning capabilities. These enhancements aim to enable SAM to not only sustain high efficiency in more complex tasks but also better meet the requirements of various fields and application contexts. Additionally, this paper summarizes the practical applications of SAM in various fields, including medical imaging, remote sensing, and mechanical industries, demonstrating its suitability and challenges in different scenarios. Moreover, this paper provides a detailed overview of commonly used datasets and evaluation metrics in the field of image segmentation. Through experimental comparative analyses, the impact of Vision Transformer variants on SAM’s performance is assessed, alongside performance evaluations of enhanced models such as Efficient SAM, EfficientViT-SAM, MobileSAM, and Robust SAM. The challenges faced by SAM and its improved models in real-world applications are also discussed, and future research directions are proposed. The aim is to provide researchers with a comprehensive understanding of the advancements and applications of SAM and its variants, offering insights that may inform the development of new models.
  • Zhong Yihui, Ma Yin, Jiang Shaojin, Yang Fengyu
    Accepted: 2025-04-08
    In the field of industrial production, defect classification plays a crucial role in ensuring product quality and safety. However, industrial defect datasets are characterized by large intra-class variations and small inter-class differences, coupled with a limited number of defect samples, which leads to the poor performance of existing defect classification models in actual industrial environments. To address this issue, this paper innovatively proposes a defect classification algorithm for industrial applications based on variational feature disentanglement and sharpness-awareness. Firstly, a variational autoencoder is introduced to disentangle defect features into class-discriminative features and intra-class variance features. Then, the intra-class variance features are enhanced through a resampling strategy and combined with the original features to improve the discriminative power of feature representation. VFD enables the model to focus more on the class-discriminative features of defects while having a certain tolerance for details and backgrounds irrelevant to defect categories, thereby enhancing the defect classification performance of the model. Additionally, by introducing a sharpness-awareness training strategy, the geometric shape of the loss function is optimized, further improving the model's generalization ability. Experiments on the NEU-CLS steel rolling defect dataset, the GC10-DET metal defect dataset, and a self-made fastener defect dataset show that the accuracy of VFD-SA reaches 100%, 93.52%, and 99.48% respectively, significantly outperforming existing defect classification algorithms and fully meeting the defect classification requirements in various industrial scenarios.
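    The sharpness-awareness training strategy follows the general sharpness-aware minimization recipe: perturb the weights toward the locally steepest loss increase, take the gradient there, then restore the weights and step. The sketch below is that generic recipe, not the exact VFD-SA procedure; `rho` is an assumed neighbourhood radius.

```python
import torch

def sam_step(model, loss_fn, batch, optimizer, rho=0.05):
    """One sharpness-aware update: (1) climb to the worst-case nearby weights,
    (2) compute the gradient there, (3) restore the weights and apply the step."""
    inputs, targets = batch

    loss_fn(model(inputs), targets).backward()
    grad_norm = torch.sqrt(sum((p.grad ** 2).sum()
                               for p in model.parameters() if p.grad is not None))

    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)                  # ascend to the sharpest nearby point
            eps.append(e)
    optimizer.zero_grad()

    loss_fn(model(inputs), targets).backward()   # gradient at the perturbed weights
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)              # restore the original weights
    optimizer.step()
    optimizer.zero_grad()
```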
  • WANG Xinxin , HU Haifeng, ZHANG Suofei, ZHOU Feifei and GONG Rui
    Accepted: 2025-04-07
    Previous studies on cross-view geo-localization have primarily focused on determining whether a query image matches a specific location within a predefined gallery. However, this approach often overlooks the rich multi-scale structural information present in geographic environments. For enhanced localization accuracy, it is crucial for a model to capture not only the fine-grained architectural details but also to comprehend the spatial relationships among various targets, including building clusters and environmental features, across different spatial scales. To tackle these challenges, we introduce a new task: multi-scale cross-view geo-localization. We also present the ML-Campus dataset, which is specifically designed for this purpose. The ML-Campus dataset comprises multi-view, multi-source building images, each annotated with detailed geographic labels at multiple levels. These annotations reflect the relationships and continuity across various spatial scales. Using this dataset, we perform an empirical evaluation of current cross-view geo-localization methods, establishing a benchmark for their performance in this novel context. To further enhance model performance, we use the proposed CV-HAPPIER method for training, aiming to improve the model feature representation across different spatial scales. Extensive experimental results on the ML-Campus dataset show that CV-HAPPIER significantly improves the spatial robustness of cross-view geo-localization retrieval rankings.
  • Wang Yaqi, Wang Mingwen
    Accepted: 2025-04-07
    To address the conflict between privacy protection and recognition performance in facial recognition technology, this paper proposes a facial privacy-preserving recognition method combining generator and embedding networks. The method first uses a convolutional neural network-based generator model to apply random perturbations to the pixel values of facial images, generating distorted facial images that are imperceptible to the human eye but recognizable by specific deep neural networks, thus forming cancelable facial templates. Then, features are extracted using a pre-trained FaceNet embedding network model for recognition. During the training process, to ensure the recognizability of the facial templates, a residual structure is employed in the generator network model to effectively extract key features from the original image, enhancing the image’s expressiveness and reducing information loss to some extent. To increase the difference between the original image and the generated image and improve the diversity of the generated images, a generative hybrid loss function and a diversity loss function are introduced. To improve recognition accuracy, an improved triplet loss function is used to optimize the model. Experimental results show that this method not only enhances the privacy security of facial templates but also strengthens the diversity between generated images, improving the model's robustness through the diversity loss function. Experiments on the Aberdeen, GT, and LFW datasets demonstrate that the improved triplet loss function achieves more representative feature representations in the cosine embedding space, with recognition accuracies reaching 99.87%, 99.29%, and 98.59%, respectively.
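    The triplet objective in cosine embedding space can be written compactly; the sketch below shows the generic form only (the margin value is an assumption, and the paper's improved variant may add further terms).

```python
import torch
import torch.nn.functional as F

def cosine_triplet_loss(anchor, positive, negative, margin=0.3):
    """Push the anchor-negative cosine similarity at least `margin` below the
    anchor-positive similarity; inputs are batches of embedding vectors."""
    pos_sim = F.cosine_similarity(anchor, positive, dim=-1)
    neg_sim = F.cosine_similarity(anchor, negative, dim=-1)
    return F.relu(neg_sim - pos_sim + margin).mean()
```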
  • Jia Shuting, Wen Xin, Hao Yanrong, Cao Rui
    Accepted: 2025-04-07
    Steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs) face classification performance bottlenecks due to individual variability and non-target stimulus interference. However, existing methodologies have not yet thoroughly investigated the quantitative correlation between visual distraction interference and inter-individual variability. To address this, we propose a multi-domain collaborative decoding algorithm with adaptive visual distraction compensation, comprising two core components: an adaptive label smoothing technique for visual distraction mitigation and a multi-domain joint decoding model. An adaptive quantification model based on visual crowding theory links signal amplitude to label noise. By dynamically adjusting label smoothing intensity, it quantifies individual distraction levels while reducing overfitting, non-target interference, and inter-subject variability. A multi-domain joint decoding model is proposed, establishing deep temporal-frequency-spatial synergy via hierarchical feature extraction and employing Bi-LSTM for global temporal dependencies. This hybrid architecture yields composite features with local receptive fields and long-range contextual awareness. Extensive experiments on three public SSVEP datasets under two time windows (0.5s and 1.0s) demonstrate the algorithm’s superiority. Results show consistent improvements in average classification accuracy and information transfer rate (ITR) across all settings. Ablation studies confirm the efficacy of the adaptive visual distraction compensation mechanism, particularly under short time windows (0.5s), where accuracy improves by up to 18 percentage points. This work provides a novel methodological framework for individualized adaptation and spatio-spectral-temporal feature fusion in neural decoding.
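    The adaptive label smoothing idea can be sketched as a per-sample smoothing factor tied to an estimated distraction level (the linear mapping and `max_eps` below are illustrative assumptions, not the paper's visual-crowding quantification model).

```python
import torch
import torch.nn.functional as F

def adaptive_label_smoothing_loss(logits, targets, distraction, max_eps=0.2):
    """Cross-entropy with a per-sample smoothing factor that grows with an
    estimated visual-distraction score in [0, 1].

    logits      : (batch, n_classes)
    targets     : (batch,) integer class labels
    distraction : (batch,) distraction scores in [0, 1]
    """
    n_classes = logits.size(-1)
    eps = (max_eps * distraction).unsqueeze(-1)            # per-sample smoothing strength
    one_hot = F.one_hot(targets, n_classes).float()
    soft = one_hot * (1.0 - eps) + eps / n_classes         # smoothed target distribution
    return -(soft * F.log_softmax(logits, dim=-1)).sum(-1).mean()
```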
  • FU Wei, JI Qing-ran, CHEN Lu-cheng, CHU Dian-hui, TU Zhi-ying, QIN Cheng-gang, DONG Li-yang
    Accepted: 2025-04-07
    To address the resource waste caused by operations shared across different processes in a multi-process layout workshop, a multi-objective optimization model for collaborative scheduling of multi-workshop job tasks is established with the goals of minimizing the makespan, the total processing cost, and the total processing energy consumption, so as to improve the utilization of workshop resources and achieve cost reduction and efficiency gains. A hybrid genetic algorithm, TSNSGA-II, which combines tabu search with fast non-dominated sorting, is proposed. After the crossover step of the genetic algorithm, new individuals are generated from the offspring chromosomes with a tabu-search mutation strategy, enhancing the algorithm's ability to explore the search space. Finally, the analytic hierarchy process (AHP) is used to weight the three objectives from the factory's perspective and select the optimal scheduling solution. The experiments first verify the effectiveness of TSNSGA-II on a simulated dataset, then compare it with the MOGWO and ENSGA-II metaheuristics on standard datasets of different sizes, and perform ablation comparisons against NSGA-II alone and the TS module alone. The results show that when the total processing cost has the highest priority, the algorithm obtains the lowest total processing cost on 90% of the Brandimarte mk datasets with a shorter solution time than ENSGA-II, an improvement of 1.6% over the original NSGA-II; when the makespan has the highest priority, it obtains the minimum makespan on 80% of the datasets, an improvement of 2.2% over the original NSGA-II.
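    As an illustration of the final selection step only, the sketch below scores Pareto-front solutions with AHP-style weights over the three normalized objectives; the weights and candidate values are invented for the example and the AHP pairwise-comparison step itself is not reproduced.
```python
import numpy as np

def pick_solution(front, weights):
    """front: (n, 3) array of [makespan, cost, energy]; weights: AHP-derived
    priorities summing to 1. Returns the index of the best-scoring solution."""
    lo, hi = front.min(axis=0), front.max(axis=0)
    norm = (front - lo) / np.where(hi > lo, hi - lo, 1.0)   # min-max normalize
    scores = norm @ np.asarray(weights)                     # lower is better
    return int(scores.argmin())

front = np.array([[120, 9800, 450], [135, 9100, 470], [128, 9500, 430]])
print(pick_solution(front, weights=[0.2, 0.6, 0.2]))  # cost given top priority
```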
  • SHEN Sitong, WANG Yaowu, XIE Zaipeng, TANG Bin
    Accepted: 2025-04-01
    Multi-agent reinforcement learning (MARL) plays a crucial role in solving complex cooperative tasks. However, traditional methods face significant limitations in dynamic environments and information non-stationarity. To address these challenges, this paper proposes a role-based multi-agent reinforcement learning framework (RoMAC). The framework employs role division based on action attributes and uses a role assignment network to dynamically allocate roles to agents, thereby enhancing the efficiency of multi-agent collaboration. Additionally, the framework adopts a hierarchical communication design, including inter-role communication based on attention mechanisms and inter-agent communication guided by mutual information. In inter-role communication, RoMAC leverages attention mechanisms to generate efficient communication messages for coordination between role delegates. In inter-agent communication, mutual information is used to produce targeted information, improving decision-making quality within role groups. Experiments conducted in the StarCraft Multi-Agent Challenge (SMAC) environment show that RoMAC achieves an average win rate improvement of approximately 8.62%, a reduction in convergence time by 0.92 million steps, and a 28.18% average decrease in communication load. Ablation studies further validate the critical contributions of each module in enhancing performance, showcasing the robustness and flexibility of the model. Overall, experimental results indicate that RoMAC offers significant advantages in multi-agent reinforcement learning and cooperative tasks, providing reliable support for efficiently addressing complex challenges.
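    One way to picture the inter-role communication step is attention-weighted aggregation of messages between role delegates; the single-head scaled dot-product form and the dimensions below are assumptions, not RoMAC's exact design.
```python
import torch
import torch.nn.functional as F

def role_message_attention(queries, messages, d_k=32):
    """queries: (R, d) role-delegate states; messages: (R, d) outgoing messages.
    Each role attends over all roles' messages with scaled dot-product attention."""
    scores = queries @ messages.t() / d_k ** 0.5   # (R, R) attention logits
    weights = F.softmax(scores, dim=-1)            # per-role attention weights
    return weights @ messages                      # aggregated incoming messages

roles = 4
q = torch.randn(roles, 32)
m = torch.randn(roles, 32)
print(role_message_attention(q, m).shape)          # torch.Size([4, 32])
```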
  • LI Weigan, GUI Ning, QIAN Yurong, CHEN Jiaying
    Accepted: 2025-04-01
    Multimodal Relation Extraction (MRE) methods assist relation extraction by exploiting multimodal information. To achieve good extraction performance, existing MRE models usually need to be trained on large amounts of labeled data, and they perform poorly in few-shot settings. To address this issue, this paper makes full use of the rich semantic and prior knowledge carried by relation labels and proposes a new prompt-tuning hierarchical network. First, a text prompt module based on knowledge injection is designed: the entity information implicit in relation labels is exploited and virtual entity type words are introduced to construct prompt templates, so that the model can perceive the potential range of entity types in a sample, while the introduced virtual relation answer words are continuously optimized with context to express the most appropriate semantics, improving few-shot performance. Second, exploiting the mutual constraints between entity pairs and relations, an entity-relation collaborative optimization module is designed to further improve relation extraction. Finally, a visual-prefix-based attention mechanism is introduced into each self-attention layer of the text encoder to deeply fuse hierarchical multi-scale visual features with the textual information, producing more effective and robust text representations and significantly reducing the model's sensitivity to errors. Experimental results on the multimodal neural relation extraction dataset (MNRE) show that the precision, recall, and F1 score of the model reach 84.97%, 83.91%, and 84.43%, respectively, all improvements over the baseline model. In the few-shot setting in particular, the proposed model is significantly better than the baseline, demonstrating good relation extraction performance.
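    A rough sketch of how a knowledge-injected prompt with virtual entity-type tokens and a masked relation slot might be assembled around a sentence; the marker names and template wording are placeholders, not the paper's actual template.
```python
def build_prompt(sentence, head, tail):
    """Wrap an entity pair with virtual entity-type markers and a [MASK] slot
    that a prompt-tuned encoder fills with a virtual relation answer word."""
    return (f"{sentence} In this sentence, [HEAD_TYPE] {head} "
            f"is [MASK] of [TAIL_TYPE] {tail}.")

print(build_prompt("Messi joined Barcelona in 2000.", "Messi", "Barcelona"))
```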
  • Xiao Bin, Xie Shan , Wang Min, Liu Deqi , Yao Ruiying , Li Yuru
    Accepted: 2025-04-01
    Electric load forecasting is a critical component of power grid optimization and scheduling. However, existing purely data-driven methods and strategies that incorporate domain knowledge often struggle to accurately capture long-term trends and periodic patterns in the context of complex dynamic environments and non-stationary load characteristics, thereby affecting forecasting accuracy and robustness. To address these challenges, this paper proposes a power load forecasting model based on the integration of domain prior knowledge, named DPK-ELF. The model utilizes a prior knowledge extraction module to deeply analyze the dynamic behavior characteristics of time series data, constructing domain-specific prior knowledge tailored to the data. It employs a dynamic segmented stacked moving average smoothing method to extract prior trends in power loads. Subsequently, the prior trend decomposition module decomposes the power load series into prior smoothed trends and residual local stochastic fluctuations, which are then predicted using the PatchTST data-driven model. Additionally, during model training, a soft constraint optimization technique is applied, treating domain prior knowledge as boundary constraints within the loss function to enhance model robustness. Validation on four public power load datasets demonstrates that DPK-ELF significantly outperforms advanced baseline models such as PatchTST, DLinear, Autoformer, and Informer in three key performance metrics: MSE, MAE, and RSE. Specifically, compared to the PatchTST model, DPK-ELF achieves improvements of up to 28.31% in MSE, 19.57% in MAE, and 14.94% in RSE on the Australian electricity price and load dataset; and 12.25% in MSE, 7.77% in MAE, and 6.29% in RSE on the PDB power demand dataset. These results strongly demonstrate the significant advantages of the DPK-ELF model in improving forecasting accuracy.
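    For intuition on the trend decomposition step, the sketch below splits a load series into a smoothed prior trend and a residual component using a plain centered moving average; the window length and synthetic series are placeholders, and the paper's dynamic segmented stacked smoothing is not reproduced.
```python
import numpy as np

def decompose_load(series, window=24):
    """Return (trend, residual): a centered moving-average trend plus the
    local fluctuations left for the data-driven model to predict."""
    pad = window // 2
    padded = np.pad(series, pad, mode="edge")
    kernel = np.ones(window + 1) / (window + 1)
    trend = np.convolve(padded, kernel, mode="valid")   # same length as input
    return trend, series - trend

hours = np.arange(24 * 7)
load = 100 + 20 * np.sin(2 * np.pi * hours / 24) + np.random.randn(hours.size)
trend, residual = decompose_load(load)
print(trend.shape, residual.shape)
```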
  • Li Yan, Sun Jiuxing, Chen Xin, Xue Yiming, Liu Jian
    Accepted: 2025-03-27
    The development of quantum computing threatens the security of current public-key cryptographic systems. To prevent "harvest now, decrypt later" attacks, migration to post-quantum cryptographic systems is imminent. The Aigis-sig digital signature scheme and the Aigis-enc key encapsulation scheme, constructed on ideal lattices and winners of the first prize in China's cryptographic algorithm design competition, are resistant to quantum attacks. To run Aigis-sig/enc efficiently on limited hardware resources, this paper integrates the code of the two schemes to improve resource utilization, designs two sets of butterfly operations in the hardware module, and significantly improves the computational efficiency of the fast Number Theoretic Transform (NTT) through pipelined operations. On this basis, a hardware-software co-design of the Aigis-sig and Aigis-enc schemes is proposed. The experimental results show that, compared with a pure software implementation, the proposed design achieves considerable performance improvements: ROM usage is reduced by 65%, the average digital signature/verification run time decreases by 29% and 11%, and the average key encapsulation/decapsulation run time is reduced by 13% and 21%, respectively. This research provides a valuable reference for the practical deployment of post-quantum cryptography.
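    For context on the NTT kernel being accelerated, here is a minimal software sketch of an iterative Cooley-Tukey NTT built from the classic butterfly; the toy modulus q=17 and length n=8 are illustrative only, and neither the actual Aigis parameters nor the hardware pipelining is reflected.
```python
def ntt(a, q, w):
    """Iterative NTT over Z_q; w must be a primitive len(a)-th root of unity
    mod q. Each inner step is one Cooley-Tukey butterfly."""
    a, n = list(a), len(a)
    j = 0
    for i in range(1, n):                      # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        wlen = pow(w, n // length, q)
        for start in range(0, n, length):
            wk = 1
            for k in range(length // 2):
                u = a[start + k]
                v = a[start + k + length // 2] * wk % q
                a[start + k] = (u + v) % q                # butterfly sum
                a[start + k + length // 2] = (u - v) % q  # butterfly difference
                wk = wk * wlen % q
        length <<= 1
    return a

print(ntt([1, 2, 3, 4, 5, 6, 7, 8], q=17, w=2))  # 2 is a primitive 8th root mod 17
```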
  • LI Jinru, PAN Qingxian, GAO Zhaolong, WEI Kai, FAN Zequn
    Accepted: 2025-03-27
    To alleviate data sparsity in crowdsourcing task recommendation and improve recommendation accuracy, this paper proposes a crowdsourcing task recommendation method based on a collaborative knowledge graph and a hybrid neural network. The method first uses task entity alignment to fuse the worker-task bipartite graph with the crowdsourcing task knowledge graph into a worker-task collaborative knowledge graph, alleviating the data sparsity problem. Second, a bidirectional gated recurrent unit encodes the multiple paths between workers and tasks, and an attention mechanism, taking the correlations among paths into account, aggregates the encoded path information with learned weights so as to learn worker preferences more accurately and thereby recommend crowdsourcing tasks more precisely. Meanwhile, a graph convolutional network captures the high correlation among crowdsourcing tasks to fully exploit the complex semantic information of entities. Finally, recommendations are made to workers based on the learned embeddings of workers and tasks. Experimental results on six public datasets, MovieLens-1M, Yelp, Book-Crossing, Music, Zhu-Bajie, and CHI, show that compared with the baseline models the proposed method improves AUC by 5.8%, 7.85%, 5.75%, 6.3%, 5.47%, and 4.58% on average, respectively, and also improves on the other metrics. The results demonstrate the effectiveness and stability of the proposed method and offer a research direction for crowdsourcing task recommendation.
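    As a rough sketch of the path-encoding step, the snippet below runs a bidirectional GRU over worker-task paths and fuses the path representations with learned attention weights; the embedding sizes and the simple linear attention form are assumptions, not the paper's exact architecture.
```python
import torch
import torch.nn as nn

class PathAggregator(nn.Module):
    """Encode each worker-task path with a BiGRU, then fuse the path
    representations with attention weights."""
    def __init__(self, emb_dim=32, hidden=32):
        super().__init__()
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)

    def forward(self, paths):                          # paths: (P, L, emb_dim)
        _, h = self.gru(paths)                         # h: (2, P, hidden)
        path_repr = torch.cat([h[0], h[1]], dim=-1)    # (P, 2*hidden)
        weights = torch.softmax(self.att(path_repr), dim=0)
        return (weights * path_repr).sum(dim=0)        # fused preference vector

paths = torch.randn(5, 4, 32)  # 5 paths of length 4 between a worker and a task
print(PathAggregator()(paths).shape)                   # torch.Size([64])
```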
  • Zheng Desheng, Zheng Shuntian, Li Xiaoyu, Yin Hao, Wang Cong
    Accepted: 2025-03-25
    Deep Neural Networks (DNNs) are vulnerable to adversarial examples: adding a small perturbation to a clean image can cause a classifier to misclassify it. Decision-based attacks are a class of black-box attacks that rely only on the hard-label outputs predicted by the target model. They treat the target model as a black box and simply query its outputs, without access to the model's internal structure or parameters, which poses a serious threat to real-world applications. Current decision-based attack methods usually rely on gradient estimation to attack near the decision boundary of the target model, which incurs a high query cost and produces low-quality adversarial examples with severe distortion. This paper observes that the low-frequency information in the frequency space of an image effectively captures its important features, and that performing a decision-based attack in the low-frequency space not only helps reduce the number of queries but also yields high-quality adversarial examples. To this end, this research proposes a black-box attack method based on the geometric properties of circles, called CBA. The method applies the discrete cosine transform and, through continuous iterations, exploits the geometric properties of circles near the decision boundary to obtain adversarial examples in the frequency space; the inverse discrete cosine transform then maps them back into the input space. It avoids gradient estimation and significantly reduces the number of queries while maintaining the attack success rate. Experimental results on the ImageNet dataset show that, at query budgets of 500, 1000, and 2000, the attack success rate of CBA is higher than that of the latest black-box attack methods that exploit the geometric nature of decision boundaries, and CBA also achieves a higher success rate under different constraints for the same query budget. These results show that CBA reduces the number of queries required to generate adversarial examples and produces adversarial examples with less distortion and better image quality. In addition, the effectiveness of CBA was verified on a real-world model.
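    To illustrate why working in the low-frequency subspace shrinks the search dimension, here is a minimal sketch that perturbs only the top-left (low-frequency) block of an image's 2D DCT and maps back to pixel space; the band size and noise scale are placeholders, and the circle-geometry boundary search itself is not shown.
```python
import numpy as np
from scipy.fft import dctn, idctn

def low_freq_perturb(image, band=32, scale=0.05, rng=None):
    """Add random noise to the low-frequency band of the image's 2D DCT and
    reconstruct the perturbed image with the inverse DCT."""
    rng = rng or np.random.default_rng(0)
    coeff = dctn(image, norm="ortho")
    coeff[:band, :band] += scale * rng.standard_normal((band, band))
    return idctn(coeff, norm="ortho")

img = np.random.rand(224, 224)     # grayscale stand-in for an ImageNet input
adv = low_freq_perturb(img)
print(np.abs(adv - img).max())
```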
  • LI Zhongwei, LI Keyi, LIU Xin
    Accepted: 2025-03-25
    In recommender systems, sequential recommendation aims to predict a user's future interests based on their historical interaction sequences. However, existing deep learning models typically focus on capturing users' long-term behavior patterns while neglecting fine-grained modeling of temporal information, which limits the improvement of recommendation performance. To address this issue, we propose a method that combines global and local self-attention mechanisms. The method processes user interaction sequences in chunks and introduces a weight decay mechanism that assigns differentiated weights to sequence blocks according to their temporal distance, thereby capturing changes in users' short-term and long-term interests more accurately. However, this approach introduces more parameters and increases model complexity. While Stochastic Shared Embedding (SSE) can reduce the parameter count and mitigate overfitting, its random embedding replacement may introduce noisy data and hurt recommendation accuracy. To solve this, we propose a strategy that combines a Generative Adversarial Network (GAN) with SSE: the GAN generates high-quality interaction data that matches user interest distributions, and this generated data is combined with SSE's random replacement mechanism for data augmentation, reducing the risk of introducing noise while retaining SSE's advantage in curbing overfitting. Experiments are conducted on three public datasets: Movielens-1M, Amazon Beauty, and Yahoo Music. The results show that the proposed method performs excellently in terms of Normalized Discounted Cumulative Gain (NDCG), Hit Rate (HR), and Mean Reciprocal Rank (MRR).
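    For intuition on the chunked weight decay, here is a minimal sketch that assigns exponentially decaying weights to sequence blocks by their temporal distance from the most recent interactions; the block size and decay rate are placeholders, not the paper's settings.
```python
import torch

def block_decay_weights(seq_len, block_size=10, decay=0.8):
    """Split a sequence of length seq_len into blocks and give older blocks
    (further from the most recent interaction) smaller weights."""
    num_blocks = -(-seq_len // block_size)            # ceiling division
    block_ids = torch.arange(seq_len) // block_size   # 0 = oldest block
    distance = (num_blocks - 1) - block_ids           # 0 = newest block
    return decay ** distance.float()                  # (seq_len,) weights

weights = block_decay_weights(seq_len=35)
print(weights[:12], weights[-12:])   # older positions get smaller weights
```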
  • ZHAO Tao, DONG Lihong, QING Yi
    Accepted: 2025-03-25
    Under the background of intelligent coal mine construction, real-time monitoring of underground personnel is of great significance for ensuring mine safety. However, the detection models commonly deployed underground have too many parameters to meet real-time monitoring requirements, and the complex underground environment makes personnel detection prone to missed and false detections. Therefore, an underground personnel detection algorithm that integrates attention and lightweight networks is proposed. First, to address the difficulty of deploying an oversized model, the C2f module of the backbone network is replaced with the lightweight C2f_RepGhost module. Second, to improve detection accuracy, the EMA attention mechanism is added to the backbone. Then, to enhance the detection of small personnel targets, the DyHead dynamic detection head is introduced. Finally, the original loss function is replaced by Inner-CIoU to improve target localization accuracy. Comparative experiments are carried out on the PASCAL VOC 2012 dataset and a self-built coal mine underground dataset. The results show that on VOC 2012 the improved model increases accuracy by 1.3% and recall by 1.2% over the original model. On the self-built dataset, the parameter count is reduced by 29.6%, while accuracy and recall increase by 2.4% and 3.5% over the original model, reaching 95.3% and 90%, respectively. The improved model not only reduces parameters but also alleviates missed and false detections, meeting the practical requirements of underground personnel detection in coal mines.
  • LI Zheng, LI Zhixiao, QIN Jinlei, GUO Changzhen
    Accepted: 2025-03-21
    To address the challenges of high noise levels, strong volatility, and difficulties in extracting periodic information in the loads of integrated energy systems, a multivariate load forecasting method based on pattern cross-correlation and temporal patch association mechanisms is proposed. The method analyzes the cross-lag relationships between external influencing factors and multivariate loads using the cross-correlation function, determining the most relevant time lags for data reconstruction and embedding. Building on this, a pattern cross-correlation mechanism is introduced to abstract the data into patterns based on their variation trends, mitigating the effects of fluctuations and noise. This is followed by the identification and extraction of key moments and periodic information based on cross-correlation theory. A temporal patch association mechanism is designed to divide the sequence into multiple subsequences, using mutual information methods to analyze and filter subsequences, thereby enhancing the model's ability to capture local continuity information in sequences. Multiple ablation and comparison experiments were conducted on the Comprehensive Energy System dataset from Arizona State University, Tempe campus. The ablation experiment results show that the data reconstructed through cross-lag analysis effectively improved the model's prediction accuracy. The pattern cross-correlation mechanism and temporal patch association mechanism enhanced the model's ability to identify key moments and capture local information, respectively. The comparison experiment results indicate that the proposed method outperforms five mainstream prediction models in multiple evaluation metrics, demonstrating higher prediction accuracy.
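    As an illustration of the cross-lag analysis, the sketch below uses the cross-correlation function to find the lag at which an external factor best aligns with the load series; the data are synthetic and the single-peak selection rule is a simplification of the paper's reconstruction step.
```python
import numpy as np

def best_lag(factor, load, max_lag=48):
    """Return the lag (in steps) that maximizes the normalized cross-correlation
    between `factor` and `load`, searched over [-max_lag, max_lag]."""
    f = (factor - factor.mean()) / factor.std()
    y = (load - load.mean()) / load.std()
    corr = np.correlate(y, f, mode="full") / len(y)
    lags = np.arange(-len(f) + 1, len(y))
    keep = np.abs(lags) <= max_lag
    return int(lags[keep][np.argmax(corr[keep])])

t = np.arange(500)
temperature = np.sin(2 * np.pi * t / 96)
load = np.roll(temperature, 6) + 0.1 * np.random.randn(t.size)  # load lags by 6
print(best_lag(temperature, load))
```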
  • Jia Junjie, Li Tianle, Liu Shilong
    Accepted: 2025-03-21
    With the rapid development of internet platforms, recommendation systems face the challenge of reduced recommendation accuracy due to shilling attacks, while still providing personalized services. Existing shilling attack detection algorithms mostly focus on a single or a few evaluation metrics from the perspective of user rating differences, and seldom consider the preference correlation of the items selected by users. This leads to insufficient modeling of user behavior characteristics, resulting in high misdetection rates or limited applicability to different attack patterns. To address this issue, a multi-perspective feature fusion shilling attack detection algorithm is proposed. Based on the latent features and distribution characteristics of user ratings obtained through a variational autoencoder, the algorithm learns the spatiotemporal distribution characteristics of user profiles from the perspectives of short- and long-term dependencies of ratings and probability density distributions. By combining the historical preference correlation of users, the algorithm uses a neural network model for multi-perspective feature fusion to form a comprehensive user profile representation with enhanced detection capability, thereby improving shilling attack detection accuracy. Experimental results show that the proposed algorithm significantly improves the detection accuracy of fake users, achieving over 95% accuracy in most cases. The algorithm also demonstrates good detection performance under different filling rates and attack scales, as well as strong robustness.
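    A rough sketch of the final fusion step: concatenating per-perspective user features and passing them through a small network that scores whether a profile is an attacker. The feature dimensions, layer sizes, and perspective names are invented for illustration and are not the paper's configuration.
```python
import torch
import torch.nn as nn

class FusionDetector(nn.Module):
    """Fuse latent, temporal, density, and preference-correlation features of a
    user profile and output the probability that it is a shilling attacker."""
    def __init__(self, dims=(16, 8, 8, 4)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sum(dims), 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, *views):                 # one tensor per perspective
        return self.net(torch.cat(views, dim=-1))

latent, temporal, density, pref = (torch.randn(5, d) for d in (16, 8, 8, 4))
print(FusionDetector()(latent, temporal, density, pref).shape)   # (5, 1)
```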
  • JIN Jing, HU Chudi, CHEN Gang
    Accepted: 2025-03-20
    In recent years, Transformer models have been widely applied to medical image segmentation thanks to their outstanding ability to capture global information and their strong representation power, achieving significant success. However, these methods divide images into fixed-size patches during serialization and extract only single-scale global features, which fragments the semantic features of the images to some extent and degrades segmentation accuracy. To address this issue, this paper proposes a Multi-Scale Self-Attention Transformer architecture (MultiFormer). The architecture first processes images using sequential convolution and downsampling modules, then replaces the original 1×1 projection module with a multi-scale convolutional projection module, and finally introduces deformable convolution into the feature maps generated by the self-attention module. Compared with the traditional Transformer serialization process, the sequential convolution effectively enlarges the receptive field while producing features of the same resolution, preserving the spatial correlation of 2D images and avoiding the semantic information loss caused by fixed-position, fixed-size patches. The multi-scale convolutional projection module captures contextual information with four convolutional kernels of different sizes and fuses the multi-scale features through channel concatenation, modeling local interactions at varying scales rather than a single scale, which strengthens the model's ability to aggregate semantic information across scales and further alleviates semantic fragmentation. Moreover, deformable convolution adds a convolutional layer that learns an offset field, allowing the kernel to flexibly adjust its shape to morphologically diverse lesions or organs and improving the model's ability to handle complex medical images. The module is inserted into the SETR, TransUnet, and TransFuse architectures and tested on the ACDC cardiac dataset and the ISIC2018 skin lesion dataset; the Dice coefficient increases by 3.63%, 1.06%, and 2.30% on ACDC and by 1.22%, 2.31%, and 3.01% on ISIC2018 for the three architectures, respectively. MultiFormer is plug-and-play and can be easily integrated into various downstream medical image analysis tasks.
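    A minimal sketch of a multi-scale convolutional projection that could replace a 1×1 projection: four kernel sizes applied in parallel and fused by channel concatenation. The channel split and kernel sizes are assumptions, not MultiFormer's exact configuration.
```python
import torch
import torch.nn as nn

class MultiScaleProjection(nn.Module):
    """Project feature maps with 1x1, 3x3, 5x5, and 7x7 convolutions in
    parallel and concatenate the results along the channel dimension."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        assert out_ch % 4 == 0
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch // 4, k, padding=k // 2) for k in (1, 3, 5, 7)])

    def forward(self, x):                        # x: (B, in_ch, H, W)
        return torch.cat([branch(x) for branch in self.branches], dim=1)

x = torch.randn(2, 64, 28, 28)
print(MultiScaleProjection(64, 64)(x).shape)     # torch.Size([2, 64, 28, 28])
```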
  • Rao Dongning, Ma Zhuowen
    Accepted: 2025-03-20
    Event extraction is a crucial task in the field of information extraction, aimed at identifying and extracting specific event or factual information from natural language text. A key challenge in event extraction is event overlap, where a single word may act as a trigger for different event types or play varying roles across events. Existing methods for overlapping event extraction often fail to adequately consider the relationships between argument roles, which somewhat affects the performance of event extraction. To address the nested event challenge in the event extraction domain, this study proposes a cascading type event extraction model based on an argument association graph, called AAGCTEE (Argument Association Graph-based Cascading Type Event Extraction Model). AAGCTEE utilizes XLM-RoBERTa-base for deep encoding and enhances the representation between word pairs through conditional layer normalization and dilated convolution techniques. In the trigger recognition module, AAGCTEE's Cascade Type Predictor (CTP) accurately identifies triggers related to multiple event types, effectively solving the common issue of nested triggers in traditional event extraction. For the argument recognition and classification module, AAGCTEE employs an Argument Association Graph (AAG) and a Global Normalization Decoding (GND) strategy to efficiently handle complex nested argument structures. The average F1 scores for trigger identification, trigger classification, argument identification, and argument classification of AAGCTEE outperform those of comparative models by 9.67%, 9.22%, 18.95%, and 37.31% on Chinese datasets DUEE and FewFC, and English datasets PHEE and CASIE, respectively. Compared with ablation experiments lacking the cascade type predictor, argument association graph, and other components, AAGCTEE demonstrates an average increase of 6.92%, 6.68%, and 6.45% in the average F1 score across the four evaluation metrics, verifying its effectiveness in extracting complex events.
  • PENG-LI Xiangsong, ZHANG Zhuhong
    Accepted: 2025-03-18
    Aspect-based sentiment analysis aims to determine the sentiment polarity of specific aspect terms in a sentence. Existing aspect-based sentiment analysis models built on dependency trees and graph neural networks often suffer from low accuracy in detecting the sentiment polarity of specific aspect terms, because syntactic structures and deep semantic features cannot be extracted well and such models lack effective feature fusion mechanisms. This work therefore develops a new DeBERTa-based aspect-based sentiment analysis model named DeBERTa-ABSA. First, DeBERTa and an aspect-aware attention mechanism are used to generate word embedding vectors of the text and to extract aspect term features, respectively. Second, abstract meaning representation (AMR) is chosen to capture the syntactic structure of the text, so that the subsequent sentiment analysis is not affected by incomplete feature extraction. Third, a new triangular multiplication mechanism is introduced to merge syntactic structures with deep semantic features. Finally, a triangular self-attention mechanism and a fully connected network map the sentiment polarity features of the aspect terms to the sentiment classification layer, effectively avoiding interference from irrelevant noise and improving the accuracy of sentiment polarity detection.
  • FU Yirui, CHEN Haiyan, ZHOU Zhihui, YUAN Ligang
    Accepted: 2025-03-18
    Airspace traffic complexity is an important factor affecting the efficiency and safety of civil aviation operations. To further improve the accuracy of airspace traffic complexity assessment, this paper proposes a complexity evaluation method based on multi-scale airspace traffic spatio-temporal images and deep metric learning. Specifically, traffic flow data are mapped as pixels onto grid-based images of the target airspace, and spatio-temporal interpolation is performed to capture the dynamic changes of traffic flow over both time and space, generating 20 sets of airspace traffic spatio-temporal images at different scales. An airspace traffic complexity assessment model based on deep metric learning is then proposed, taking the multi-scale airspace traffic image sets as input. The model uses a ranking proxy anchor loss function to optimize the distribution of sample distances in the high-dimensional embedding space, pulling same-class samples closer together and pushing different-class samples further apart. Finally, experiments are conducted using real traffic data from the South-Central airspace to generate the multi-scale spatio-temporal image set, followed by a series of comparative experiments. The results show that the spatio-temporal scale of the airspace traffic image sets has an important impact on the assessment results, and that compared with existing methods the proposed method significantly improves the assessment performance of airspace traffic complexity.
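    Since the paper's ranking variant is not detailed here, the sketch below shows the standard Proxy-Anchor loss it builds on, as one way to picture how class proxies pull same-class embeddings closer and push other-class embeddings apart; the dimensions, hyperparameters, and class count are placeholders.
```python
import torch
import torch.nn.functional as F

def proxy_anchor_loss(embeddings, labels, proxies, alpha=32.0, delta=0.1):
    """Standard Proxy-Anchor loss on cosine similarities between embeddings
    and learnable class proxies."""
    sim = F.normalize(embeddings) @ F.normalize(proxies).t()   # (B, C)
    pos_mask = F.one_hot(labels, proxies.size(0)).bool()
    pos_term = torch.log1p((torch.exp(-alpha * (sim - delta)) * pos_mask).sum(dim=0))
    neg_term = torch.log1p((torch.exp(alpha * (sim + delta)) * ~pos_mask).sum(dim=0))
    with_pos = pos_mask.any(dim=0)             # proxies seen in this batch
    return pos_term[with_pos].mean() + neg_term.mean()

emb = torch.randn(16, 64)                      # image embeddings
labels = torch.randint(0, 5, (16,))            # 5 complexity classes
proxies = torch.randn(5, 64, requires_grad=True)
print(proxy_anchor_loss(emb, labels, proxies))
```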
  • TAN Taizhe, GONG Zhiyuan, Yan Zhou
    Accepted: 2025-03-14
    Reconstructing high dynamic range (HDR) images from multiple low dynamic range (LDR) images with different exposures is a challenging task, especially when camera and object motion are present. In such cases, motion regions often introduce artifacts that degrade the quality of the final reconstructed image. The root cause of this issue lies in the poor alignment of content across the multiple LDR images, where geometric discrepancies between the images significantly affect the reconstruction results. To address this problem, this paper proposes a feature pre-alignment-based HDR image reconstruction network, designed to improve HDR reconstruction through the pre-alignment of features. The network consists of two main components: the feature pre-alignment module and the HDR reconstruction module. In the feature pre-alignment module, a feature alignment network is introduced, which guides the alignment of the input image features with those of a reference image, thereby reducing motion-induced artifacts. The reconstruction module models the global context of the pre-aligned features using a selective state-space model and generates the final HDR image via a simplified HDR recovery network. To evaluate the performance of the proposed network, extensive experiments were conducted on the Kalantari dataset. The results show that the network outperforms existing methods across multiple objective metrics and demonstrates superior subjective visual quality. Furthermore, to validate the generalization capability of the network, the proposed model was trained on the Kalantari training set and subsequently tested on the Sen dataset. The results indicate that the proposed network exhibits a certain degree of generalization ability.