
Just accepted

  • Guo Wei, Meng Qiaoqiao, Jin Haibo, Tian Congcong
    Accepted: 2025-11-20
    In industrial quality inspection, steel surface defect detection commonly suffers from insufficient fusion of target features, missed detection of fine edge defects, and class imbalance among samples. To address these problems, a steel surface defect detection algorithm based on multi-scale interaction and dynamic collaboration is proposed. In the backbone network, shifted sparse convolution is fused with an inverted residual structure to strengthen the interactive fusion of defect features across different receptive fields and improve the representation of multi-scale defects. A large separable kernel attention mechanism is introduced to dynamically enhance the feature response in fine defect regions and reduce the missed detection rate for cracks and inclusions. In the neck network, the DySample dynamic upsampling strategy performs upsampling driven by defect content, which sharpens the contours of small defects while reducing computational redundancy, making the model suitable for deployment on edge devices. In addition, an EMASlideLoss loss function that integrates exponential moving average and sliding threshold mechanisms is designed to dynamically balance the learning weights of hard and easy samples, thereby reducing the detection bias caused by the uneven distribution of defect samples. Experiments on the NEU-DET dataset show that the mAP50 of the algorithm reaches 84.4%, which is 5.8% higher than that of the original YOLO11n; precision and recall increase by 5.2% and 4.8%, respectively, while the computational load decreases by 8%. The algorithm thus improves detection accuracy while reducing computational cost, and better meets the detection requirements of industrial scenarios.
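    The abstract does not give the weighting rule behind EMASlideLoss. As a rough illustration only, the sketch below assumes it follows the published Slide Loss scheme (weight 1 for IoU at or below μ − 0.1, a constant boost e^(1−μ) just under μ, and e^(1−IoU) at or above μ), with the threshold μ tracked as an exponential moving average of the batch mean IoU. The class name, decay value, and initial μ are hypothetical.

```python
import math


class EMASlideWeight:
    """Hypothetical sketch: Slide-Loss-style sample weights with an EMA-smoothed IoU threshold.

    The piecewise rule mirrors the published Slide Loss; its use inside EMASlideLoss
    is an assumption, since the abstract does not give the formula.
    """

    def __init__(self, decay: float = 0.999, init_mu: float = 0.5) -> None:
        self.decay = decay  # hypothetical EMA decay
        self.mu = init_mu   # running estimate of the mean IoU (the "slide" threshold)

    def update(self, batch_mean_iou: float) -> float:
        # Exponential moving average: mu <- decay * mu + (1 - decay) * batch mean IoU
        self.mu = self.decay * self.mu + (1.0 - self.decay) * batch_mean_iou
        return self.mu

    def weight(self, iou: float) -> float:
        mu = self.mu
        if iou <= mu - 0.1:
            return 1.0                  # low-IoU samples (negatives): unit weight
        if iou < mu:
            return math.exp(1.0 - mu)   # hard samples just below the threshold: boosted
        return math.exp(1.0 - iou)      # samples above the threshold: boost fades as IoU -> 1
```

    In a training loop, `update` would presumably be called once per batch with the mean IoU of positive samples, and `weight` would scale each sample's classification loss.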
  • LIU Ying, ZHANG Runyu, YANG Chaoshu
    Accepted: 2025-11-20
    The Log-Structured Merge tree (LSM-tree) has been widely adopted in key-value storage systems because sequential writes give it high write performance. However, it also suffers from high read/write amplification, significant compaction overhead, and data redundancy. Traditional optimization approaches improve system performance by modifying the tree structure, refining compaction strategies, or adopting key-value separation. In the era of big data, rapidly growing data volumes make write and compaction operations in LSM-tree systems increasingly frequent, placing continuous pressure on CPU resources and gradually turning the CPU into a performance bottleneck. Moreover, traditional solutions cannot fundamentally avoid the substantial I/O traffic between the host and storage devices, and therefore incur high overhead from redundant data movement. Computational storage offers a promising solution to these challenges. By integrating computing resources into the storage layer, it enables task offloading to relieve the CPU's workload and supports near-data processing to reduce the overhead of data migration. This survey focuses on computational-storage-based optimization strategies for LSM-trees. First, the architecture of computational storage is reviewed. Then, targeting the major bottlenecks in the big data context, existing solutions are classified and compared from two perspectives: compaction optimization and data migration optimization. Finally, potential future research directions are outlined to provide insights for this field.
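    To make the compaction bottleneck concrete, the following is a minimal, host-side sketch of what a compaction does: a k-way merge of sorted runs in which the newest version of each key wins and deletion markers (tombstones) are dropped. It is a generic illustration, not a design from the surveyed work; the run layout and the use of `None` as a tombstone are assumptions. The computational-storage approaches the survey classifies would move this merge loop from the host CPU onto the device.

```python
import heapq
from typing import Iterator, List, Optional, Tuple

# A sorted run is a list of (key, value) pairs; value None marks a deletion (tombstone).
Run = List[Tuple[str, Optional[str]]]


def _tag(run: Run, age: int) -> Iterator[Tuple[str, int, Optional[str]]]:
    # Attach the run's age so that, for equal keys, the newest run (age 0) sorts first.
    for key, value in run:
        yield key, age, value


def compact(runs: List[Run], drop_tombstones: bool = True) -> Run:
    """Merge sorted runs (runs[0] is newest) into one sorted run; newest version wins."""
    merged = heapq.merge(*(_tag(run, age) for age, run in enumerate(runs)))
    out: Run = []
    last_key: Optional[str] = None
    for key, _age, value in merged:
        if key == last_key:
            continue  # an older, shadowed version of a key already handled
        last_key = key
        if value is None and drop_tombstones:
            continue  # deleted key: drop it entirely (e.g., when compacting into the last level)
        out.append((key, value))
    return out
```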
  • Gong Tong, Lu Xiaoli, Sang Yu, Li Siman, Yu Bowen
    Accepted: 2025-11-19
    Nighttime object detection is challenging because targets have low luminance and manually annotating large-scale nighttime datasets is costly, which makes supervised training difficult. To address these issues, DTN-DETR, a domain adaptation method for nighttime object detection based on an improved RT-DETR, is proposed. First, a Photometric Consistency Matching method is designed to generate a synthetic dataset resembling the nighttime domain by aligning the photometric properties of the daytime source domain with those of the nighttime target domain. Second, a Bidomain Refinement Module (BRM) is proposed to improve the backbone network; it comprises two key components, the Feature Refinement Module (FRM) and the Bidomain Information Interaction (BII) module. The FRM eliminates redundant information in the feature channels. The BII module exploits the interaction between the frequency and spatial domains to handle glare and noise with inconsistent frequency characteristics, addressing the coupling of multiple local light sources in nighttime scenes. Finally, a P2 detection head is introduced to enhance the perception of small objects in nighttime scenes through multi-level feature fusion. Experimental results on the public BDD100K, SODA10M, and Foggy Cityscapes datasets show that the proposed method significantly outperforms existing state-of-the-art approaches, validating its effectiveness and robustness.
  • TAN Taizhe, YANG Yang, ZHAN Yinwei, YANG Zhuo
    Accepted: 2025-11-14
    The complex lighting environment in underground coal mines produces images with low contrast and blurry details. Existing image enhancement algorithms have limited feature capture capability and fuse semantic features from different levels inefficiently. This paper proposes an underground coal mine image enhancement method (ICM) that combines convolution with MLLA (Mamba-Like Linear Attention). In the convolution stage, multiple degradation-aware mixture-of-experts modules are stacked so that the model can adaptively restore local texture details lost during enhancement, reducing artifacts and unclear details. An MLLA module with background perception capability models long-range dependencies in the image to improve the global structural consistency and texture fidelity of the enhanced output. Interactive fusion branches encode the stage-wise correlation between backbone features and reconstructed features, making effective use of both local and global features to assist enhancement. A segmented loss function sets different objectives for different enhancement stages, allowing the network to optimize adaptively at each stage. Compared with recent state-of-the-art deep learning methods, ICM achieves the best PSNR, SSIM, NIQE, and LPIPS scores, at 30.524 dB, 0.946, 3.06, and 0.23, respectively. It effectively improves the brightness, contrast, and clarity of low-light coal mine images, providing reliable visual support for mine safety monitoring and intelligent decision-making.
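    For reference, PSNR (the first metric reported above) is computed from the mean squared error between a reference image and an enhanced image as 10·log10(MAX²/MSE). A minimal sketch on flattened pixel lists follows; the peak value of 255 assumes 8-bit images, and the function is illustrative rather than the authors' evaluation code.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if not reference or len(reference) != len(test):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```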
  • Jie DUAN, Lijuan SONG, Zirui MA
    Accepted: 2025-11-13
    Deep learning-based survival prediction has advanced the integration of whole-slide images (WSIs) and genomics, yet the ultra-high resolution of WSIs and the high dimensionality of transcriptomics pose substantial challenges for feature extraction and cross-modal fusion. Although prototype aggregation reduces the computational burden by compressing tiles and gene expressions into morphological and pathway prototypes, two key bottlenecks remain: accurately capturing fine-grained interactions between modality-specific prototypes, and addressing the pronounced representational heterogeneity between WSI morphological prototypes and genomic pathway prototypes. To tackle these issues, we propose MOTSurv, a weakly supervised survival prediction model based on multi-level optimal transport, comprising three synergistic components. First, a dual-modality prototype encoder, which integrates a Pyramid Position Encoding Generator (PPEG) in the pathology encoder and models intra-pathway dependencies in the pathway encoder, strengthens intra-modality structure while preserving modality specificity. Second, a cascaded multi-level optimal transport fusion mechanism performs coarse global alignment followed by refined matching with error correction, balancing alignment accuracy and information preservation. Third, an Orthogonal Disentanglement Module (ODM) enforces multi-level orthogonality constraints (between the specific features of different modalities, between specific and shared features within each modality, and between specific and shared features globally) to achieve explicit feature disentanglement and enhance interpretability. Experiments on the TCGA BLCA, BRCA, and LUAD datasets show that MOTSurv improves the C-index by an average of 4.22% over state-of-the-art methods. Ablation studies further validate the independent and synergistic contributions of each module, highlighting the model's advantages in multimodal alignment, structured representation, and biological interpretability.
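    The abstract does not specify how the optimal transport problems in MOTSurv are solved. A common choice for aligning two prototype sets is entropy-regularized OT solved with Sinkhorn iterations; the sketch below assumes that choice. It takes a cost matrix between pathology prototypes (rows) and pathway prototypes (columns), for example 1 − cosine similarity, and returns a transport plan whose column sums match `b` after the final update and whose row sums converge to `a`. The regularization `eps` and the iteration count are hypothetical.

```python
import math
from typing import List, Sequence


def sinkhorn(cost: Sequence[Sequence[float]], a: Sequence[float], b: Sequence[float],
             eps: float = 0.1, n_iters: int = 200) -> List[List[float]]:
    """Entropy-regularized OT plan between weights a (rows) and b (columns)."""
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / eps); smaller eps gives a sharper, closer-to-exact plan.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```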
  • WANG Zeyu, JI Genlin, ZHU Wei
    Accepted: 2025-11-13
    Zero-shot skeleton-based action recognition uses text label descriptions and skeleton action sequences to distinguish between seen and unseen action categories. Existing methods are usually limited by the low quality of the generated visual features, which prevents accurate semantic alignment and leads to poor performance in distinguishing similar actions. To address this issue, this paper proposes a method based on dual discriminators and spatiotemporal self-calibration (DD-STSC) for visual-semantic alignment. The method combines variational autoencoders and generative adversarial networks, using adversarial training between discriminators and generators to mine the differential information among different features. At the same time, it better separates useful from useless information during disentanglement, further improving the quality of the generated samples. In addition, this paper introduces an action self-calibration module (ASCM), which learns skeleton information along the spatial and temporal directions to capture key motion information more effectively and thus improve classification accuracy. Experiments on the widely used NTU60, NTU120, and pku51 datasets show that the proposed method outperforms existing mainstream methods.
  • XU Haizhe, HUANG Lingxiao, YAO Xinbo, GAO Yongzhan, ZHOU Kaiyuan
    Accepted: 2025-11-13
    This study addresses critical challenges in weakly supervised semantic segmentation (WSSS) based on contrastive language-image pre-training (CLIP), namely inadequate fine-grained semantic alignment of images, limited perception of local details in the text context, and insufficient local detail perception together with noise propagation in pseudo-labels. To tackle these issues, we propose the Feature Fusion Contrastive Learning framework (FFCLIP), which uses a frozen CLIP model as the backbone and integrates three modules, Panoramic Perception Attention (PPA), the Rectangular Calibration Module (RCM), and Weighted Cross-modal Fusion (WFF), to enhance cross-modal semantic alignment, refine local boundary perception, and improve the quality of the generated pseudo-labels. Within a multi-stage WSSS training framework built on the CLIP backbone, FFCLIP achieves mIoU scores of 76.9% and 77.5% on the VOC2012 validation and test sets, respectively, surpassing the mainstream method CTI by 2.8% and 4.3%. On the COCO2014 dataset, it attains an mIoU of 47.1%, significantly outperforming baselines such as CPAL. Experimental results show that FFCLIP substantially improves segmentation accuracy under weak supervision while keeping computational overhead low, with only 6M additional parameters and a peak GPU memory consumption of 6.2 GB, offering a new direction for integrating multimodal learning with weakly supervised segmentation. Code link: https://github.com/xuwudang/FFCLIP
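    The mIoU figures above average the per-class intersection-over-union. A minimal sketch over flattened label maps follows; treating 255 as the ignore label matches the usual PASCAL VOC convention, and skipping classes absent from both prediction and ground truth is one common choice. In practice the counts are accumulated over the whole dataset before dividing, which is what a single long pixel list represents here.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU over classes present in the prediction or the ground truth."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel
        union[t] += 1
        if p == t:
            inter[t] += 1
        elif 0 <= p < num_classes:
            union[p] += 1  # a false positive for class p enlarges its union
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```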
  • SU Na, PEI Houqing, XU Li, WANG Jingjun, JI Shujuan
    Accepted: 2025-11-11
    Existing log anomaly detection techniques often neglect temporal context in semantic modeling, have limited modality fusion capability, and generally rely heavily on log parsing. These limitations make it difficult for models to capture complex patterns in which abrupt changes in semantic content coexist with anomalous temporal behavior. To address these challenges, this paper proposes LogSTF (Log Spatio-Temporal Fusion), a model that requires no log parsing. The model uses a dual-branch architecture for semantic and temporal processing. The semantic branch extracts context-aware semantic features, while the temporal branch models both local bursts and global evolution at two granularities, the time level and the sequence level. On this basis, bidirectional cross-attention fuses the two modalities and explicitly establishes fine-grained dependencies between semantics and time, enhancing the model's ability to represent and discriminate complex log behaviors. Experiments on three public log datasets, HDFS, BGL, and Thunderbird, show that LogSTF achieves F1 scores of 99.64%, 98.45%, and 99.67%, respectively. Compared with the two state-of-the-art models LAnoBERT and LogFormer, LogSTF achieves average relative F1 improvements of 5.20% and 2.03%, respectively. Ablation experiments confirm the critical role of temporal information and modality collaboration in the performance gains. Robustness tests under lightweight semantic perturbations confirm LogSTF's stability and generalization under suboptimal log conditions. The approach achieves high-precision detection of complex anomaly patterns without requiring log parsing.
  • Li Xu, Luo Dezhe, Wang Hongjun
    Accepted: 2025-11-10
    With the rapid development of global maritime transportation, ship trajectory prediction plays an important role in shipping safety and management. However, achieving high-precision, physically feasible continuous trajectory prediction remains a key challenge due to the large scale of ship trajectory data and the uncertainty of complex maritime environments. Traditional prediction methods have limitations in handling complex maritime environments and large-scale dynamic data. To address these challenges, this paper proposes a geographically constrained, multi-method fusion model for ship trajectory prediction. The model introduces a geographical constraint loss function to optimize prediction accuracy, heading stability, and physical feasibility. In addition, a multi-method fusion network combining bidirectional gated recurrent units, attention mechanisms, and multi-scale convolutions is designed to strengthen temporal feature extraction and multi-scale information integration. Experimental results show that the proposed model achieves lower prediction errors than existing models on multiple maritime datasets, with particularly significant advantages in long-term prediction. The study confirms that the model offers high accuracy and stability in ship trajectory prediction, providing effective support for practical maritime applications.
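    The abstract does not define the geographical constraint loss. One plausible ingredient, shown here purely as a hypothetical sketch, combines a great-circle (haversine) position error with a penalty on implied speeds that exceed a vessel's physical limit. The speed cap, weight, and time step are placeholders, and the heading-stability term mentioned in the abstract is omitted.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius
Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(p: Point, q: Point) -> float:
    """Great-circle distance between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))  # clamp guards rounding


def geo_constraint_loss(pred: Sequence[Point], target: Sequence[Point], dt_hours: float,
                        max_speed_kmh: float = 55.0, speed_weight: float = 1.0) -> float:
    """Hypothetical loss: mean position error plus a penalty on infeasible speeds."""
    # Accuracy term: mean great-circle error between predicted and true positions.
    dist_err = sum(haversine_km(p, t) for p, t in zip(pred, target)) / len(pred)
    # Feasibility term: penalize implied speeds above the (placeholder) vessel speed cap.
    excess = [max(0.0, haversine_km(a, b) / dt_hours - max_speed_kmh)
              for a, b in zip(pred, pred[1:])]
    penalty = sum(excess) / len(excess) if excess else 0.0
    return dist_err + speed_weight * penalty
```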
  • YAO Xun, HE Yuan, HU Xinrong, YANG Jie
    Accepted: 2025-11-10
    Sequential recommender systems excel at capturing users' dynamic interests, yet their openness makes them highly vulnerable to data poisoning attacks. Attackers can effectively manipulate recommendation outcomes by altering the textual descriptions of items, which poses a severe challenge to model robustness. Existing defense strategies, which primarily rely on static rules or fixed-intensity perturbations, struggle to counter increasingly complex and variable semantic-level textual attacks. To address this challenge, we propose RADAR, a two-stage collaborative defense framework that combines robustness enhancement at the training stage with real-time protection at the inference stage. First, during training, it employs dynamic adversarial training to strengthen the model's intrinsic resilience to unknown textual perturbations. Second, at inference, it uses a Large Language Model (LLM) for precise semantic-level anomaly detection and content restoration. Experimental results demonstrate the superior defense performance of RADAR. In attack tests on the Scientific dataset, compared with the strongest baseline (Cert-LLM), RADAR reduces the exposure increase of malicious items from 3.1796% to 0.9921%. This validates the framework's effectiveness in enhancing the security and robustness of sequential recommender systems.
  • GUO Yang, SUN Jing-yu
    Accepted: 2025-11-07
    With the development of quantum computing, traditional image encryption algorithms face the challenge of insufficient resistance to quantum attacks, while existing quantum image encryption algorithms suffer from high qubit consumption and the limited parameter space of their chaotic systems. To address these problems, this paper proposes a dual-quantum image encryption algorithm based on a chaotic system, aiming to balance low resource consumption with high security. First, a dual-bit-plane quantum image representation model (DBRQI) is proposed, which requires only 2n+4 qubits to store a grayscale image, reducing qubit consumption by 50% compared with the BRQI model. Second, a 3D hyperchaotic system (3D-CHCMM) is constructed: the parameter space of its 4 control parameters is 33% larger than that of existing systems, and all 3 of its Lyapunov exponents are positive. Moreover, the system passes 15 NIST tests, so it can generate pseudorandom sequences with high randomness. The algorithm maps quantum states through DBRQI, scrambles pixel information via odd-even bit-plane scrambling and random row-column scrambling, and then XORs the result with the pseudorandom sequences to produce the ciphertext. Experimental results show that the horizontal correlation of the encrypted image is as low as 0.0041, the information entropy reaches 7.9993, and the NPCR is 99.6251%, indicating significantly enhanced attack resistance and anti-interference capability. The algorithm provides an efficient solution for image encryption under the current constraints of limited quantum hardware.
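    Two of the reported security metrics are straightforward to compute on classical pixel data. The sketch below implements NPCR (the percentage of positions at which two cipher images differ, typically the ciphertexts of two plain images that differ in a single pixel) and Shannon information entropy for 8-bit images. For an ideally random 8-bit cipher image the maximum entropy is 8 bits and the expected NPCR is about 99.61% (that is, 1 − 1/256), which is the yardstick for reading the reported 7.9993 and 99.6251%. Flattened lists stand in for images here.

```python
import math
from collections import Counter
from typing import Sequence


def npcr(c1: Sequence[int], c2: Sequence[int]) -> float:
    """Number of Pixels Change Rate, in percent, between two equal-size cipher images."""
    if len(c1) != len(c2) or not c1:
        raise ValueError("images must be non-empty and the same size")
    changed = sum(1 for x, y in zip(c1, c2) if x != y)
    return 100.0 * changed / len(c1)


def shannon_entropy(pixels: Sequence[int]) -> float:
    """Information entropy in bits; the maximum for 8-bit pixels is 8."""
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())
```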
  • Zhang Yao, Zhang Junsan, Ma Junpeng, Yao Zongquan, Liu Tianyi
    Accepted: 2025-11-07
    This paper proposes CAFR-YOLO, an improved YOLOv8-based model, to address insufficient cross-level feature interaction and limited feature representation in multi-scale object detection in complex scenes. First, a cross-scale feature reorganization pipeline is designed in the form of the Channel Attention-guided Feature Reorganization (CAFR) module. Using a specific layer as the fusion backbone and combining scale alignment, attention-weighted fusion, and feature subset splicing, it alleviates the insufficient cross-level interaction of traditional feature pyramid structures. Second, at the local level, the C2f_DCNv3 module is introduced into the backbone network, significantly enhancing the model's geometric adaptability through the dynamic sampling of deformable convolution. At the global level, the C2f_SAConv module is constructed by combining Switchable Atrous Convolution (SAC) with the C2f module, optimizing multi-scale semantic feature fusion through dynamic adjustment of atrous rates. Together, these two modules improve the model's robustness in complex scenes. Finally, SPDConv replaces conventional convolution structures, strengthening feature representation through spatial-channel reorganization while reducing computational complexity. Experimental results show that CAFR-YOLO achieves 86.3% mAP@0.5 and 67.2% mAP@0.5:0.95 on the PASCAL VOC dataset at a computational cost comparable to that of the original model. On the MS COCO dataset, it improves mAP@0.5 and mAP@0.5:0.95 by 3.5% and 3.9%, respectively. Compared with existing state-of-the-art methods, CAFR-YOLO shows significant advantages across multiple metrics. The proposed model substantially improves multi-scale object detection accuracy and robustness while maintaining computational efficiency, providing a new solution for real-time object detection.
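    SPDConv, as described in the SPD-Conv literature, first rearranges each 2×2 spatial block into channels (space-to-depth) and then applies a non-strided convolution, so that downsampling discards no pixels. The sketch below shows only the rearrangement step on a nested-list `[channels][height][width]` tensor; the following convolution is omitted, and how CAFR-YOLO configures it is not stated in the abstract. The order in which the four slices are stacked is a convention: any fixed order works because the next convolution mixes all channels.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # indexed [channel][row][column]


def space_to_depth(x: Tensor3D, scale: int = 2) -> Tensor3D:
    """Move each scale x scale spatial block into channels: (C, H, W) -> (C*s*s, H/s, W/s)."""
    height, width = len(x[0]), len(x[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    out: Tensor3D = []
    for dx in range(scale):          # column offset within each block
        for dy in range(scale):      # row offset within each block
            for channel in x:
                # Keep every scale-th row starting at dy and every scale-th column starting at dx.
                out.append([row[dx::scale] for row in channel[dy::scale]])
    return out
```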
  • TIAN Hongpeng, LI Zhiqiang, YANG Sai
    Accepted: 2025-11-05
    Lightweight object detection in images captured by small UAVs commonly faces low detection accuracy, complex backgrounds, large variations in target scale, dense target distributions, and relatively large model parameter counts. Therefore, this paper proposes an improved RT-DETR algorithm for UAV object detection. First, an enhanced C2f-Heat-Lsk module is developed by integrating the HeatBlock thermal conduction module and the LskBlock spatial selective attention mechanism into the C2f structure. Together with the original C2f module, this module is used to redesign the RT-DETR backbone network, improving spatial feature extraction while reducing model parameters. Second, a new feature fusion structure, SOFEP, replaces the original feature pyramid to mitigate detail loss for small objects and enhance their feature representation. Third, a combined Focaler-MPDIoU loss function is constructed by integrating the Focaler-IoU and MPDIoU losses, which improves bounding box regression accuracy and reduces the missed detection rate. Experimental results on the VisDrone test set show that the improved model has 16.9% fewer parameters than RT-DETR while improving mAP0.5 by 2.6% and mAP0.5:0.95 by 1.9%. The model also outperforms RT-DETR on the DOTAv1.0 and HIT-UAV datasets. These results show that the proposed method achieves higher detection accuracy with lower computational complexity, meeting the requirements of small object detection in UAV aerial images.
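    The abstract names the two ingredients of the Focaler-MPDIoU loss without giving its form. The sketch below follows the formulations published for MPDIoU (IoU minus the squared distances between the top-left corners and between the bottom-right corners, each normalized by the squared image width plus squared image height) and Focaler-IoU (a linear remapping of IoU onto the interval [d, u]), combined as L = L_MPDIoU + IoU − IoU_focaler. Whether the authors combine them exactly this way, and their values of d and u, are assumptions; the defaults below are placeholders.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), top-left and bottom-right corners


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mpdiou(pred: Box, gt: Box, img_w: float, img_h: float) -> float:
    norm = img_w ** 2 + img_h ** 2
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # top-left corner distance squared
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2  # bottom-right corner distance squared
    return iou(pred, gt) - d1 / norm - d2 / norm


def focaler_iou(i: float, d: float = 0.0, u: float = 0.95) -> float:
    # Linearly rescale IoU on [d, u] to [0, 1], clipping outside that interval.
    return min(1.0, max(0.0, (i - d) / (u - d)))


def focaler_mpdiou_loss(pred: Box, gt: Box, img_w: float, img_h: float,
                        d: float = 0.0, u: float = 0.95) -> float:
    i = iou(pred, gt)
    return (1.0 - mpdiou(pred, gt, img_w, img_h)) + i - focaler_iou(i, d, u)
```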
  • Liang Shichao, Wen Wen, Feng Yali, Zheng Jiabi, Hao Zhifeng
    Accepted: 2025-11-05
    How to model and learn users' behavior patterns is a crucial issue in temporal recommendation. However, most existing research focuses on pattern learning within a single type of behavior, which prevents models from taking full advantage of the diverse behavior patterns revealed by different types of behaviors, such as clicking, purchasing, and adding to favorites. As a result, the potential for improving recommendation performance remains underexplored. To address this gap, this research examines the multi-seasonal sequential dependencies within individual behaviors and the intricate dependencies among different types of behaviors over time. Specifically, we propose a multi-seasonal multi-behavior (MSMB) model for learning temporal patterns across multiple behaviors. The model employs a dual-channel sequence encoder that incorporates a multi-scale exponential moving average (EMA) mechanism to capture the multi-seasonal temporal dependencies within individual behavior sequences. In addition, a cross-behavior dependency module accounts for different periodic granularities, enabling the model to capture time-varying dependencies across behavior types. Extensive experiments on three benchmark datasets demonstrate the effectiveness and superiority of MSMB for temporal recommendation.
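    A multi-scale EMA runs several exponential moving averages with different smoothing factors over the same sequence, so that each one summarizes a different time horizon: a larger α reacts to recent behavior, while a smaller α retains longer-term, seasonal trends. The minimal sketch below applies this to a scalar sequence; in MSMB the averages would presumably act on behavior embeddings, and the α values shown are hypothetical.

```python
from typing import List, Sequence


def ema(seq: Sequence[float], alpha: float) -> List[float]:
    """s_0 = x_0, s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out: List[float] = []
    for t, x in enumerate(seq):
        out.append(x if t == 0 else alpha * x + (1.0 - alpha) * out[-1])
    return out


def multi_scale_ema(seq: Sequence[float],
                    alphas: Sequence[float] = (0.9, 0.5, 0.1)) -> List[List[float]]:
    """One EMA channel per smoothing factor (short to long memory)."""
    return [ema(seq, a) for a in alphas]
```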
  • CHEN Haozhi, CAI Ruichu, LI Zijian, HAO Zhifeng
    Accepted: 2025-11-05
    Time series segmentation, an important task in time series analysis, has been widely applied in fields such as biological behavior analysis and physical system analysis. However, most existing time series segmentation methods do not account for the nonstationary dynamics induced by distribution shifts, which limits their ability to segment accurately in nonstationary regimes. To solve this problem, this paper first proposes a hypothesis about the causal generation process of the data, grounded in real-world scenarios. Under this hypothesis, the latent variables underlying the observed data can be decomposed into stationary and nonstationary latent variables: the stationary variables represent information that is unchanged or changes periodically, while the nonstationary variables represent dynamically changing information. Second, based on this hypothesis, a Stationary Nonstationary Disentangle Model (SNDM) is designed. The model disentangles stationary and nonstationary variables, allowing it to focus more closely on the nonstationary dependencies in the time series. Moreover, to disentangle and extract the variables accurately, the loss function of the model is constructed from the evidence lower bound (ELBO) of variational inference. Building on this ELBO, stationary and nonstationary prior neural network modules are introduced to improve the accuracy of latent variable disentanglement. Finally, experiments show that the model outperforms several state-of-the-art time series segmentation methods on various benchmark datasets, highlighting its advantages in practical scenarios.
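    In variational models of this kind, the ELBO splits into a reconstruction term and a KL divergence between the approximate posterior of each latent variable and its prior; with learned prior networks, the prior's mean and variance are network outputs rather than a fixed standard normal. The sketch below assumes diagonal Gaussian posteriors and priors (an assumption, since the abstract does not state the distributions) and computes the negative ELBO from a given reconstruction negative log-likelihood using the closed-form Gaussian KL.

```python
import math
from typing import Sequence


def gaussian_kl(mu_q: Sequence[float], logvar_q: Sequence[float],
                mu_p: Sequence[float], logvar_p: Sequence[float]) -> float:
    """KL(N(mu_q, diag(exp(logvar_q))) || N(mu_p, diag(exp(logvar_p)))), summed over dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def neg_elbo(recon_nll: float, mu_q: Sequence[float], logvar_q: Sequence[float],
             mu_p: Sequence[float], logvar_p: Sequence[float]) -> float:
    """Negative ELBO = reconstruction NLL + KL(posterior || learned prior); minimized in training."""
    return recon_nll + gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
```

    Under this assumption, the KL term would be computed separately for the stationary and nonstationary latents, each against the output of its own prior network, and the two terms summed.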
  • Zhao Weiyue, Wu Jingya, Lu Wenyan, Li Xiaowei and Yan Guihai
    Accepted: 2025-11-05
    Emerging datacenter applications have introduced a large volume of large-granularity RDMA communication. RDMA relies on physical addresses, and when large-granularity data are accessed, the Page Table Entries (PTEs) required for address translation exceed the cache capacity of hardware devices. Current high-performance commercial solutions store PTEs in host memory. However, in this architecture a large-granularity transfer can proceed only after its PTEs have been fetched from host memory, which adds PCIe traversal and host memory access latency, severely degrading address translation efficiency and increasing host CPU overhead. To achieve efficient large-granularity RDMA, this paper designs XiRang, a configurable, high-performance address mapping structure. XiRang efficiently extends the access granularity through a streaming prefetch mechanism and a hierarchical cache design, and achieves flexible, high-throughput address translation through a configurable address translation array. The XiRang prototype is implemented on a DPU.
    Experiments show that: 1) XiRang effectively offloads the address translation load of the RDMA data plane, decoupling it from the host CPU; 2) the streaming prefetch extension mechanism of XiRang effectively reduces storage overhead, with cache consumption only at the 10-byte level in concurrent modes and negligible concurrent storage overhead; 3) under a high number of concurrent memory access requests, XiRang maintains a translation table lookup hit rate close to 100%, reducing the idle time of the translation engine by 2 to 3 orders of magnitude compared with the RNIC architecture; 4) the translation throughput of XiRang is more than 60 times that of the RNIC translation architecture and more than 3.5 times that of the basic DPU address mapping structure; 5) in performance enhancement mode, XiRang's address translation speed can support a data transfer bandwidth of 1.4 TB/s.
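    The abstract attributes XiRang's near-100% hit rate to streaming prefetch and hierarchical caching. The toy model below illustrates the idea under stated assumptions only: a single-level LRU cache of page table entries that, on a miss, fetches the missing PTE together with the next `depth - 1` sequential PTEs in one burst. The page size, capacity, depth, and class name are hypothetical, and the real design (hierarchical caches, a configurable translation array, DPU hardware) is not modeled.

```python
from collections import OrderedDict
from typing import Dict

PAGE_SIZE = 4096  # hypothetical page size in bytes


class PrefetchingTranslator:
    """Toy PTE cache: LRU eviction plus sequential (streaming) prefetch on a miss."""

    def __init__(self, page_table: Dict[int, int], capacity: int = 64, depth: int = 8) -> None:
        if depth > capacity:
            raise ValueError("prefetch depth must not exceed cache capacity")
        self.page_table = page_table  # virtual page number -> physical page number (host copy)
        self.cache: "OrderedDict[int, int]" = OrderedDict()
        self.capacity = capacity
        self.depth = depth
        self.hits = 0
        self.misses = 0

    def _insert(self, vpn: int, ppn: int) -> None:
        self.cache[vpn] = ppn
        self.cache.move_to_end(vpn)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def translate(self, vaddr: int) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)  # refresh LRU position
        else:
            self.misses += 1
            if vpn not in self.page_table:
                raise KeyError(f"no mapping for virtual page {vpn}")
            # Streaming prefetch: fetch this PTE and the next depth - 1 PTEs in one burst.
            for nxt in range(vpn, vpn + self.depth):
                if nxt in self.page_table:
                    self._insert(nxt, self.page_table[nxt])
        return self.cache[vpn] * PAGE_SIZE + offset
```

    For a sequential 32-page transfer with `depth=8`, this toy misses only on pages 0, 8, 16, and 24, which is the intuition behind high hit rates for large, contiguous RDMA transfers.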
  • Jia Xinglong, Qin Junping, Yan Kai, Liu Zheng, Wang Dan, Shao Xinran, Shao Zezhou
    Accepted: 2025-11-05
    To address the insufficient accuracy of identifying endangered animals against complex backgrounds in the wild, this study improves the YOLOv8 model. First, Dynamic Snake Convolution (DSConv) is introduced into the backbone network to enhance detection performance under occlusion. Second, the global attention mechanism (GAM) is introduced into the neck network to strengthen the model's attention to information related to endangered animals, suppress irrelevant features such as the environment, and reduce redundant information. Third, a small-target detection head that fuses shallow feature maps is added to the head network to improve the network's perception and localization of small targets. Finally, the bounding box regression loss based on minimum point distance (MPDIoU) replaces the conventional CIoU loss, improving convergence speed and localization accuracy. Experimental results show that the detection accuracy and average precision of the proposed model for endangered animals in complex backgrounds are 96.2% and 97.2%, respectively, which are 2.1 and 2.4 percentage points higher than those of the baseline YOLOv8n. In comparative experiments on the same dataset, the average precision is 28.7, 22.5, 3.5, and 2.4 percentage points higher than that of Faster-RCNN, SSD, YOLOv5, and YOLOv7, respectively. The experiments show that the improved YOLOv8 model can provide a theoretical basis for detecting endangered animals in complex backgrounds.
  • WU Shixun, TANG Peiyao, LAN Zhangli, Xu Kai, ZHANG Miao
    Accepted: 2025-11-04
    WiFi fingerprint positioning based on received signal strength indication (RSSI) has attracted wide attention because it is easy to deploy and cost-effective. However, existing fingerprinting methods typically rely on large-scale training data, while data augmentation often produces virtual samples of uneven quality, limiting positioning accuracy and generalization. To address these issues, this study proposes a multi-parameter optimization WiFi fingerprinting method driven by few-shot learning (FSL). The method integrates an attention-enhanced convolutional neural network (CNN) with a meta-learning framework to enable rapid adaptation with limited data, while particle swarm optimization (PSO) performs automated data selection and joint hyperparameter tuning under physical constraints. Experimental results show that the proposed method achieves average positioning errors of 0.52 m on the CJU dataset and 6.88 m on the public Tampere dataset, improving accuracy by at least 49.5% and 8.7%, respectively, over baseline methods. In addition, a generalization test on the CJU-2024 dataset shows that the model adapts effectively to new environments with only a small amount of data, achieving an average positioning error of 2.17 m and an accuracy improvement of at least 26.7%. These results confirm that the proposed method significantly improves indoor positioning accuracy while maintaining strong generalization capability.
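    The method uses particle swarm optimization for data selection and joint hyperparameter tuning. As a generic illustration, the sketch below is a standard global-best PSO that minimizes a black-box objective inside box bounds, with clamping as a simple stand-in for the physical constraints mentioned in the abstract. In the described method the objective would presumably be a validation positioning error; here a toy quadratic is used, and the inertia and acceleration coefficients are common textbook values rather than the authors' settings.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(objective: Callable[[List[float]], float], bounds: Sequence[Tuple[float, float]],
        n_particles: int = 20, n_iters: int = 100, w: float = 0.7, c1: float = 1.5,
        c2: float = 1.5, seed: int = 0) -> Tuple[List[float], float]:
    """Global-best PSO: returns (best position, best objective value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best position so far
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # clamp to box constraints
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```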
  • YANG Yingying, CHE Jin, BAI Xuebing, XIAO Long, JIAN Liqiong
    Accepted: 2025-11-04
    Existing unsupervised person Re-ID methods focus only on pedestrians' global features and lack data diversity, which leads to global feature bias and impairs recognition accuracy. To address this, this paper proposes DAFP, a ViT-based method that integrates Multi-level Data Augmentation (MDAM) and Feature Purification (FP). First, MDAM, which includes geometric spatial transformations, appearance perturbations, and occlusion simulation, expands the diversity of training samples and enhances the model's cross-camera robustness. Second, the FP module splits the local features output by the Transformer into upper and lower parts according to spatial position, fuses them with global features through adaptive weighting via a multi-view distance matrix, and generates high-quality pseudo-labels with DBSCAN, effectively alleviating the misclustering of similar pedestrians caused by over-reliance on global features alone in traditional methods. Finally, a global-local clustering contrastive loss dynamically updates global and local cluster centers to strengthen fine-grained feature learning. Experimental results on Market1501, DukeMTMC-reID, and MSMT17 show mAP/Rank-1 scores of 90.5%/96.0%, 77.6%/87.6%, and 64.5%/86.0%, respectively, significantly surpassing current state-of-the-art methods and verifying the method's superior performance.
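    The FP module fuses global and part-level distances and then clusters with DBSCAN to obtain pseudo-labels. The sketch below assumes a fixed weighted sum in place of the paper's adaptive weighting (the weight `w_global` and the function names are hypothetical) and implements DBSCAN directly on the fused, precomputed distance matrix; samples labeled -1 are noise and would typically be left out of pseudo-label training.

```python
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def fuse_distances(d_global: Matrix, d_upper: Matrix, d_lower: Matrix,
                   w_global: float = 0.5) -> List[List[float]]:
    """Hypothetical fixed-weight fusion; the paper's adaptive weighting is not reproduced."""
    w_part = (1.0 - w_global) / 2.0  # split the remaining weight between the two body parts
    n = len(d_global)
    return [[w_global * d_global[i][j] + w_part * (d_upper[i][j] + d_lower[i][j])
             for j in range(n)] for i in range(n)]


def dbscan(dist: Matrix, eps: float, min_samples: int) -> List[int]:
    """DBSCAN on a precomputed distance matrix; returns cluster ids, -1 for noise."""
    n = len(dist)
    labels: List[Optional[int]] = [None] * n  # None means not yet visited

    def neighbors(i: int) -> List[int]:
        return [j for j in range(n) if dist[i][j] <= eps]  # includes i itself

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but does not expand it
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return [label if label is not None else -1 for label in labels]
```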
  • Tang Weilin, Wang Junfeng, Ge Wenhan, Zhang Chengcheng, Zhan Weilu
    Accepted: 2025-11-04
    Cyber Threat Intelligence (CTI) plays a pivotal role in mitigating the asymmetry between cyber attack and defense. However, current methods for extracting Tactics, Techniques, and Procedures (TTPs) predominantly rely on supervised language models trained on manual annotations, which makes them inefficient and inconsistent. The MITRE ATT&CK framework has mitigated problems in describing TTPs through standardized classification. Even so, existing NLP-based approaches still face three major challenges: limited generalization, delayed adaptation to new ATT&CK versions, and poor interpretability. To address these challenges, this paper proposes DetecTTive, a zero-shot TTP extraction method based on large language models (LLMs) that combines the prior knowledge of LLMs with trustworthy external knowledge. The framework uses the official ATT&CK knowledge base as its external knowledge source. It combines vector-based semantic retrieval, graph-enhanced association reasoning, and an agent workflow to achieve automated white-box reasoning, which improves zero-shot performance while keeping results traceable. Experiments show that the proposed zero-shot approach achieves an F1 score of 80.02% and a recall of 83.46% on benchmark datasets. The method effectively addresses the data bias and version adaptation issues inherent in conventional models, providing an interpretable and cost-efficient solution for TTP extraction in dynamic threat environments.
  • Fan Qinlong, Sun Yepeng, Lu Jicang, Zhu Taojie and Liu Yilin
    Accepted: 2025-11-04
    With the spread of the internet, the massive volume of user-generated comments on trending topics, together with their wide dissemination, profoundly influences how real-world events unfold. Consequently, mining public stances and attitudes toward trending topics holds significant practical value for domains such as online public opinion monitoring and social security governance. Stance detection aims to identify user attitudes toward specific targets from user-generated texts. Although numerous studies have proposed diverse task scenarios and technical methods, a unified classification framework for stance detection tasks is still lacking. This paper first presents a comprehensive review of stance detection along two dimensions, task scenarios and technical methods, systematically organizing the current research landscape and development trends. From the task scenario perspective, we classify stance detection into three paradigms: target-specific, target transfer, and target generalization. This classification highlights the field's evolution from domain-specific applications toward broader adaptability. From the methodological perspective, we categorize stance detection approaches into three primary classes and analyze the strengths and limitations of each: model-based engineering, knowledge-driven engineering, and data-centric engineering. Additionally, we conduct statistical and experimental analyses of publicly available benchmark datasets across multiple dimensions, revealing their key characteristics and development trends. Finally, the paper concludes with a summary and outlines persistent challenges and prospective research directions.
  • Wu Qiannan, Ding Weiping, Fan Xiaoxue, Ju Hongrong, Zhou Linlin, Wang Jing
    Accepted: 2025-10-31
    Feature selection can effectively identify informative features from complex data to improve information processing efficiency. However, in partially labeled data scenarios, traditional feature selection methods face significant challenges due to inherent label ambiguity, complex inter-sample relationships, and difficulties in feature importance evaluation. To address these challenges, this paper proposes MFG-FS, an effective feature selection framework for partially labeled datasets. First, to tackle label ambiguity, we design an end-to-end disambiguation method based on the MLP-Mixer model and contrastive learning, which optimizes the feature representation space to enhance discriminative power and obtain more reliable label confidence distributions. Second, to accurately characterize complex sample relationships in partially labeled data, we construct fuzzy similarity relations and information granules that integrate multi-source information, effectively combining local feature-space structures, global correlations from disambiguated labels, and label constraints. Subsequently, based on the constructed fuzzy information granules, we define and employ a fuzzy mutual information measure for feature evaluation, which quantifies the relevance between feature subsets and labels while assessing internal redundancy, thereby providing a robust basis for high-quality feature subset selection. Finally, extensive experiments on five synthetic and four real-world datasets demonstrate that MFG-FS can select more discriminative and robust feature subsets, achieving superior performance in partial label disambiguation and classification accuracy.
  • HUANG Yuqi, YANG Xiaoxia, YANG Ronghao, LIAO Fangzhou, YAN Le, GUO Junqiang, LI Minghan
    Accepted: 2025-10-30
    Object detection for autonomous driving perception aims to locate and identify traffic participants such as motor vehicles, non-motor vehicles, and pedestrians in onboard camera views in real time. It provides accurate input for the environmental perception module to support decision-making and control in autonomous driving systems. Because of complex road backgrounds, diverse object shapes, and large scale variations, the perception system suffers from high false and missed detection rates. Specific challenges include low accuracy in detecting deformed objects, insufficient multi-scale detection, and weak global perception. To address these issues, an improved algorithm named YOLOv8-DDL based on YOLOv8n is proposed. First, deformable attention is introduced to improve the C2f module in the backbone network. It dynamically learns feature offsets to better capture the varied object shapes in traffic scenes, improving the model's adaptability to complex spatial distributions and effectively reducing false detections. Second, large separable kernel attention is integrated into the spatial pyramid pooling fast (SPPF) module. It expands the receptive field through large-kernel convolution to strengthen global context modeling and robustness to complex backgrounds. Finally, a dynamic multi-scale adaptive fusion module and a dynamic feature pyramid network are designed to reconstruct the neck network. They dynamically fuse high-level and low-level features to enhance multi-scale feature representation and improve multi-scale object detection. Experimental results on the public SODA10M dataset show that, compared with YOLOv8n, YOLOv8-DDL improves precision, recall, F1-score, and mean average precision by 5.9%, 1.3%, 3%, and 1.5%, respectively. Additional validation on the public BDD100K dataset confirms improvements of 2%, 0.6%, 1%, and 2% in these metrics, respectively.
  • CHEN Junhong, ZHOU Feng, TIAN Youliang, YANG Kedi, ZHANG Qijia
    Accepted: 2025-10-29
    As the demand for training data grows across industries, data has become a key factor of production. Data rights confirmation clarifies data ownership and the allocation of benefits, preventing unauthorized use. However, existing schemes suffer from uncontrollable rights and low confirmation efficiency in the collection, storage, and use of rights. To address these challenges, this paper proposes a rights-controllable data rights confirmation scheme based on trapdoor hashing. First, to prevent the loss of data rights during data transfer, a rights confirmation model that separates holding, management, and usage rights is constructed, achieving a fine-grained allocation of rights. Second, existing related algorithms cannot control how management rights are generated. To address this, a data rights confirmation algorithm based on trapdoor hashing is proposed, which enables controllable generation of data management rights as rights change while also improving efficiency. In addition, an authorization-traceable data transaction mechanism is designed on top of blockchain technology. It achieves non-repudiation and traceability of data transactions by finely controlling data collection and access and by uploading evidence information. Finally, security and performance analyses show that, compared with traditional schemes, the proposed scheme has lower computation and storage overhead while ensuring that rights signatures cannot be forged.
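The scheme rests on a trapdoor (chameleon) hash. As a rough illustration only, the sketch uses the classic discrete-logarithm chameleon hash of Krawczyk and Rabin, CH(m, r) = g^m · h^r mod p with h = g^x. Anyone can evaluate the hash, but only the holder of the trapdoor x can compute a randomness r' that makes a new message m' hash to the same value.

The abstract does not say which trapdoor hash construction the paper uses. The tiny group parameters below are toy values and offer no security.

```python
import secrets

# Toy group: p = 2q + 1 with p and q prime; g = 4 generates the subgroup of order q.
# These sizes are for illustration only and are NOT secure.
P, Q, G = 1019, 509, 4


def keygen():
    """Return (trapdoor x, public key h = g^x mod p)."""
    x = secrets.randbelow(Q - 1) + 1          # x in [1, q-1], hence invertible mod q
    return x, pow(G, x, P)


def chash(h: int, m: int, r: int) -> int:
    """Chameleon hash CH(m, r) = g^m * h^r mod p (exponents reduced mod q)."""
    return (pow(G, m % Q, P) * pow(h, r % Q, P)) % P


def collide(x: int, m: int, r: int, m_new: int) -> int:
    """With trapdoor x, find r' such that CH(m_new, r') == CH(m, r).

    Since CH(m, r) = g^(m + x*r), we need m + x*r = m_new + x*r' (mod q),
    so r' = r + (m - m_new) * x^(-1) mod q.
    """
    return (r + (m - m_new) * pow(x, -1, Q)) % Q


if __name__ == "__main__":
    x, h = keygen()
    r_new = collide(x, 42, 7, 100)
    print(chash(h, 42, 7) == chash(h, 100, r_new))
```

In a rights-confirmation setting, this property would let the trapdoor holder update a bound record without changing its published hash value. How the paper maps holding, management, and usage rights onto such keys is not described in the abstract.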
  • Gao Jianwei, Zhao Shutong, Huang Ningbo
    Accepted: 2025-10-28
    Against the background of rapid progress in artificial intelligence, this paper proposes a group-intelligence emergency decision-making method based on a large language model (LLM) and retrieval-augmented generation (RAG). The method addresses insufficient public participation and heavy dependence on specialized knowledge in current emergency decision-making. It integrates public social media data with a domain knowledge base and constructs a public-expert collaborative multi-attribute decision-making model, aiming to make disaster response more scientific and effective in emergency management. First, a Python web crawler collects public comments from a microblogging platform to build an emergency disaster demand database. Second, an emergency management knowledge base is integrated through RAG to enhance the model's generation ability. Prompt engineering guides topic classification, a topic-word co-occurrence network is constructed and clustered with the Louvain algorithm, and experts check and refine the results to generate the attribute set for emergency decision-making. Third, importance and cohesion factors are combined to construct an attribute weight measurement model. Finally, taking the decision makers' psychological behavior into account, the TODIM method is used to rank the candidate emergency plans and select the best one. In a case study of the 7-20 Henan rainstorm, the method generates emergency decision-making topics that meet public demand, with topic consistency and diversity scores of 0.583 and 0.943, respectively. These results verify the soundness and effectiveness of the proposed method.
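The final step ranks alternatives with TODIM, which models a decision maker's loss aversion through an attenuation factor θ. Below is a minimal sketch of classic TODIM ranking on a decision matrix of benefit criteria already normalized to [0, 1].

The matrix, the weights, and θ = 1 are illustrative assumptions. In the paper, the attribute weights come from its own importance/cohesion model, which is not reproduced here.

```python
import math
from typing import List, Sequence


def todim(matrix: Sequence[Sequence[float]], weights: Sequence[float], theta: float = 1.0) -> List[float]:
    """Classic TODIM: return the normalized global value of each alternative (higher is better).

    matrix[i][c] is the normalized benefit score of alternative i on criterion c;
    all weights are assumed to be strictly positive.
    """
    n, m = len(matrix), len(weights)
    w_ref = max(weights)                      # reference criterion = the one with the largest weight
    w_rc = [w / w_ref for w in weights]       # weights relative to the reference criterion
    total = sum(w_rc)

    def phi(i: int, j: int, c: int) -> float:
        diff = matrix[i][c] - matrix[j][c]
        if diff > 0:   # gain of i over j on criterion c
            return math.sqrt(w_rc[c] * diff / total)
        if diff < 0:   # loss, amplified by 1/theta (loss aversion)
            return -math.sqrt(total * (-diff) / w_rc[c]) / theta
        return 0.0

    # Overall dominance of each alternative over all the others.
    dominance = [sum(phi(i, j, c) for j in range(n) for c in range(m)) for i in range(n)]
    lo, hi = min(dominance), max(dominance)
    if hi == lo:
        return [1.0] * n
    return [(d - lo) / (hi - lo) for d in dominance]


if __name__ == "__main__":
    plans = [[0.9, 0.8], [0.5, 0.5], [0.1, 0.2]]   # hypothetical normalized plan scores
    print(todim(plans, weights=[0.6, 0.4]))
```

Smaller values of θ make losses weigh more heavily than gains of the same size, which is how TODIM reflects loss-averse behavior.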
  • ZHAO Shuxu, CHEN Yanhong, WANG Xiaolong, JIANG Kaijun
    Accepted: 2025-10-28
    To address resource mismatch, load bottlenecks, and service instability caused by demand fluctuations and large-scale bursty tasks in mobile edge computing, a cooperative supply strategy based on approximate Shapley values (ASVC) is proposed. First, a task allocation model based on bidirectional preference matching is constructed, which considers both the performance requirements of user tasks and the resource status of edge nodes. The Gale-Shapley algorithm is used to obtain a stable supply-demand matching. Second, to reduce the computational complexity of Shapley value estimation during coalition formation, an adaptive sampling-based optimization scheme is introduced. This scheme significantly reduces the computation time of Shapley values while maintaining accuracy. Finally, task data is allocated in proportion to each node's contribution, improving system fairness and resource utilization efficiency. Simulation results show that, compared with existing algorithms, ASVC improves service quality, delay control, task completion rate, and system load balancing by approximately 27.8%, 31.0%, 30.8%, and 21%, respectively.
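Exact Shapley values require evaluating every coalition, which is exponential in the number of edge nodes. Sampling-based estimators instead average each node's marginal contribution over random join orders. The sketch below shows plain Monte Carlo permutation sampling followed by proportional task allocation.

The paper's adaptive sampling scheme is not reproduced. The characteristic function `v` and the node capacities are hypothetical.

```python
import random
from typing import Callable, Dict, FrozenSet, Hashable, List


def shapley_monte_carlo(players: List[Hashable],
                        v: Callable[[FrozenSet], float],
                        n_samples: int = 200,
                        seed: int = 0) -> Dict[Hashable, float]:
    """Estimate Shapley values by averaging marginal contributions over random permutations."""
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(n_samples):
        order = players[:]
        rng.shuffle(order)
        coalition: FrozenSet = frozenset()
        prev = v(coalition)
        for p in order:
            coalition = coalition | {p}
            cur = v(coalition)
            phi[p] += cur - prev      # marginal contribution of p in this join order
            prev = cur
    return {p: s / n_samples for p, s in phi.items()}


def allocate(total_data: float, contrib: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Split task data in proportion to each node's (non-negative) estimated contribution."""
    pos = {p: max(c, 0.0) for p, c in contrib.items()}
    s = sum(pos.values())
    if s == 0:
        return {p: 0.0 for p in pos}
    return {p: total_data * c / s for p, c in pos.items()}


if __name__ == "__main__":
    capacity = {"node_a": 2, "node_b": 3, "node_c": 5}         # hypothetical node capacities
    v = lambda S: sum(capacity[p] for p in S)                  # hypothetical coalition value
    contrib = shapley_monte_carlo(list(capacity), v)
    print(allocate(100.0, contrib))
```

For an additive `v` like the toy one above, every permutation yields the same marginal contributions, so the estimate is exact. With non-additive values (for example, load caps), the Monte Carlo error shrinks roughly as 1/sqrt(n_samples).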
  • Yanli Lv, Yiwen Jiang, Hanyu Feng, Zhenqi Guo, Sheng Xiang
    Accepted: 2025-10-28
    As generative AI technologies become increasingly integrated into sensitive industries, large generative models tend to memorize training data during fine-tuning. This poses a growing risk of privacy leakage, since user identities, behavioral traces, and other sensitive information may be reconstructed during inference. To address this issue, a novel fine-tuning approach combining Differential Privacy (DP) with Low-Rank Adaptation (LoRA) is proposed. This method freezes the parameters of the pre-trained model and updates only the inserted LoRA modules. Additionally, Differentially Private Stochastic Gradient Descent (DP-SGD) is introduced, applying gradient norm clipping and Gaussian noise injection on a per-sample basis to minimize the model's dependence on individual training samples. Based on the Qwen2-1.5B language model, a task-specific fine-tuning dataset incorporating user profiles is constructed. Adversarial samples targeting typical sensitive fields, such as identity markers, behavioral characteristics, and location data, are developed to compare the anti-leakage capabilities of traditional full-parameter fine-tuning and the DP-LoRA approach. Experimental results show that fully fine-tuned models exhibit a high sensitive-information match rate of 73.07% across 130 adversarial samples, indicating severe privacy vulnerabilities. In contrast, the DP-LoRA fine-tuned models achieve a significantly reduced match rate of only 1.5%, with generated content showing minimal correlation to the original training data. This approach effectively mitigates the risk of sensitive information disclosure, offering a cost-efficient and highly adaptable training strategy for deploying generative models in real-world scenarios with stringent data security requirements.
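The DP-SGD step described above clips each example's gradient to a fixed L2 norm and adds Gaussian noise to the sum before averaging. A minimal, framework-free sketch of one aggregation step follows.

- The `clip_norm` and `noise_multiplier` values are illustrative.
- Restricting the update to LoRA parameters only is indicated by a comment; this reflects the abstract's description, not code from the paper.
- Privacy accounting (computing ε) is out of scope.

```python
import math
import random
from typing import List, Optional, Sequence


def clip_by_l2(grad: Sequence[float], clip_norm: float) -> List[float]:
    """Scale one example's gradient so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads: Sequence[Sequence[float]],
                    clip_norm: float = 1.0,
                    noise_multiplier: float = 1.0,
                    seed: Optional[int] = None) -> List[float]:
    """Privatized batch gradient: clip each example, sum, add N(0, (sigma*C)^2) noise, average."""
    rng = random.Random(seed)
    clipped = [clip_by_l2(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(c[d] for c in clipped) for d in range(dim)]
    std = noise_multiplier * clip_norm          # noise scale is tied to the clipping bound
    noisy = [s + rng.gauss(0.0, std) for s in summed]
    return [x / len(clipped) for x in noisy]


def sgd_step(params: Sequence[float], grad: Sequence[float], lr: float = 0.01) -> List[float]:
    """Plain SGD update; under DP-LoRA only the trainable LoRA parameters would be passed here."""
    return [p - lr * g for p, g in zip(params, grad)]
```

Clipping bounds any single example's influence on the update to `clip_norm`, which is what allows Gaussian noise of scale `noise_multiplier * clip_norm` to mask it.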
  • Guozheng Yang, Dongzhen Qi, Pan Chen, Zhaobin Shen, Pengyu Yin, Yanlin Huo
    Accepted: 2025-10-27
    Resource Public Key Infrastructure (RPKI) is an important mechanism for safeguarding BGP routing security; it verifies the legitimacy of BGP announcements through Route Origin Authorizations (ROAs) and Route Origin Validation (ROV). As RPKI deployment advances globally, its deployment status and actual defensive effect have become a research focus. In recent years, researchers have conducted extensive studies on ROA configuration problems and ROV deployment measurement, characterizing the operational status and protection capability of RPKI in real networks from different dimensions. Existing RPKI surveys focus mainly on theoretical research on the RPKI system itself, emphasizing its architectural vulnerabilities. They do not systematically organize or summarize the key challenges encountered in actual RPKI deployment or the studies that address them. This review systematically summarizes recent studies on RPKI deployment issues. It focuses on classifying common types of ROA configuration errors, including benign ROA conflicts and loose ROA registrations, and analyzes their causes and their impact on routing security. Finally, it outlines future research directions on RPKI deployment issues, providing a theoretical foundation and methodological reference for subsequent work on RPKI deployment optimization, security assessment, and strategy. This will help promote the widespread adoption of RPKI and strengthen defenses against BGP prefix hijacking.
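ROV classifies a BGP announcement against validated ROA payloads (VRPs). The sketch below follows the route-origin validation procedure of RFC 6811:

- A route is NotFound if no VRP covers its prefix.
- It is Valid if some covering VRP matches its origin AS and allows its prefix length.
- It is Invalid otherwise.

The VRPs are made-up examples. The second one is a loose ROA whose maxLength admits more-specific routes, one of the misconfiguration types the survey discusses.

```python
import ipaddress
from typing import Iterable, NamedTuple


class VRP(NamedTuple):
    prefix: str       # e.g. "192.0.2.0/24"
    max_length: int
    asn: int


def validate(route_prefix: str, origin_asn: int, vrps: Iterable[VRP]) -> str:
    """Route origin validation in the style of RFC 6811: "valid", "invalid" or "not-found"."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        net = ipaddress.ip_network(vrp.prefix)
        # The VRP must be of the same address family and contain the route prefix.
        if net.version != route.version or not route.subnet_of(net):
            continue
        covered = True
        # AS 0 VRPs only mark a prefix as not to be routed (RFC 6483), so they never validate a route.
        if vrp.asn != 0 and vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


VRPS = [
    VRP("192.0.2.0/24", 24, 64496),   # tight ROA
    VRP("10.0.0.0/16", 24, 64500),    # loose ROA: any /17../24 inside the /16 is authorized
]

if __name__ == "__main__":
    print(validate("192.0.2.128/25", 64496, VRPS))   # more specific than maxLength allows
    print(validate("10.0.5.0/24", 64500, VRPS))      # permitted only because the ROA is loose
```

With the loose ROA, a forged-origin hijack that announces 10.0.5.0/24 with AS 64500 as the apparent origin would still validate. This is why the maxLength of an ROA should match the prefixes actually announced.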
  • Liu Meigui, Zhang Neng, Li Jiale, Zhao Yuqi, Li Zengyang
    Accepted: 2025-10-27
    Redundant dependencies in software projects can increase build size, performance overhead, and long-term maintenance burden. Existing studies have investigated redundant dependencies in the Maven ecosystem. However, their distribution across dependency scopes (e.g., compile and test), their evolution patterns, and their impact on project popularity have not been analyzed in depth. To address this gap, we select 2,214 open-source Java Maven projects from GitHub as study subjects. We use an mvn command to identify dependencies that are declared but not actually used, and quantitatively analyze redundancy ratios by scope. Furthermore, we apply the non-parametric Mann-Kendall trend test to 3,817 historical versions of 698 projects to identify trends in the evolution of redundant dependencies. To assess the relationship between redundant dependencies and project popularity or community activity, we construct five GitHub-based popularity and activity metrics and perform Pearson correlation analysis. These metrics include star growth rate, fork growth rate, and issue closing rate. The results show that redundant dependencies are concentrated mainly in the compile and test scopes, with median redundancy ratios of 33.33% and 30.00%, respectively. In terms of evolution, 48.1% of the projects maintained a stable redundancy ratio, 36.2% fluctuated, and a small proportion showed an increasing or decreasing trend. In the correlation analysis, only the issue closing rate shows a weak but statistically significant negative correlation with the redundancy ratio. These findings give developers a detailed perspective on dependency management and can help optimize project configurations and improve software maintainability.
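The Mann-Kendall test used for the evolution analysis is distribution-free. It sums the signs of all pairwise differences over time (positive for increases, negative for decreases) and compares the normalized statistic with a standard normal. A compact sketch with the usual tie correction and continuity correction follows.

The significance level α = 0.05 and the example redundancy-ratio series are assumptions, not values taken from the study.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def mann_kendall(x: Sequence[float], alpha: float = 0.05) -> Tuple[str, float, float]:
    """Return (trend, S, two-sided p-value) for the Mann-Kendall trend test."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S = sum over all pairs i < j of sign(x_j - x_i)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = x[j] - x[i]
            s += (d > 0) - (d < 0)
    # Variance of S, corrected for groups of tied values
    ties = Counter(x).values()
    var_s = (n * (n - 1) * (2 * n + 5) - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18
    if var_s == 0:                       # all values identical
        return "no trend", float(s), 1.0
    # Continuity-corrected standard normal statistic
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    if p < alpha:
        return ("increasing" if z > 0 else "decreasing"), float(s), p
    return "no trend", float(s), p


if __name__ == "__main__":
    # Hypothetical redundancy ratios of one project across successive versions.
    print(mann_kendall([0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.33, 0.40]))
```

A "stable" project in the study's sense would typically show "no trend" here. Distinguishing stable from fluctuating series needs an additional criterion that the abstract does not specify.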
  • GAO Song, GAO Bo-lin, LU Jian, WU Yue-long, WANG He, XU Yue-yun
    Accepted: 2025-10-27
    Quantifying the discrepancy between different sensor perception algorithms' mappings of the physical world and identifying boundary data is a key challenge in automating the extraction of high-value boundary data. This paper proposes a discrepancy engine based on multi-source sensor data for the autonomous discovery of boundary data. The engine consists of two main modules: the discrepancy cognition module and the discrepancy rate calculation module. In the discrepancy cognition module, a discrepancy rate was defined, and an association model linking the discrepancy rate with perception mapping discrepancies was established. The average discrepancy rate of a dataset was used as the baseline discrepancy rate to quantify mapping discrepancies and identify boundary data. In the experiments, the baseline discrepancy rates of LiDAR, millimeter-wave radar, and vision-based perception algorithms were calculated as 0.17, 0.23, and 0.19, respectively. In the discrepancy rate calculation module, a 2D pixel distance matching strategy combining the chi-square distribution and the Welsch loss was used to match camera-detected objects with those detected by LiDAR, millimeter-wave radar, and other cameras. Compared to a fusion algorithm that used only a 3D distance matching strategy, the proposed approach achieved discrepancy rates of 0.16 and 0.14 relative to the ground truth on the test dataset, demonstrating that the improved matching strategy significantly enhanced the accuracy of the fusion algorithm. The results indicate that the discrepancy engine achieves average recognition accuracies of 0.85, 0.74, and 0.82 for the boundary data of LiDAR, millimeter-wave radar, and vision-based perception algorithms, respectively. Validation in real-world road scenarios, including straight urban roads, simple intersections, and complex intersections, confirms the engine's effectiveness in identifying perception boundary data.
  • Yu Chengwen, Xie Bin, Zhou BoBo, Li Xiang
    Accepted: 2025-10-27
    Extremely Large-scale Multiple-Input Multiple-Output (XL-MIMO) systems are considered one of the key technologies for realizing 6G communications. However, because XL-MIMO systems greatly increase the number of antennas, the channel exhibits hybrid-field (mixed near-field and far-field) characteristics, which makes channel estimation highly challenging. To address this problem, this paper proposes a deep learning-based channel estimation algorithm, the Adaptive Frequency Filter Parallel Joint Convolutional Network (AFF-PJCN). First, the received signal is processed by the adaptive frequency filter network. Its learnable filters automatically optimize their filtering parameters according to the input data, enabling adaptive signal analysis and modeling in the frequency domain and effectively filtering out noise interference. Then, in the parallel joint convolutional network, multi-scale convolutions arranged in parallel capture both the global and local features of the received signal, further improving channel estimation performance. To enhance the generalization ability of the model, a segmented hybrid data training strategy is adopted. The training set is constructed by sampling independently and at random within different signal-to-noise ratio (SNR) intervals, so that the model maintains robust performance under diverse channel conditions. Experimental results show that, in the hybrid-field channel model of XL-MIMO systems, AFF-PJCN achieves superior estimation accuracy and stronger generalization and robustness compared with existing channel estimation schemes.
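The segmented hybrid training strategy draws SNR values independently within several intervals and mixes the resulting samples into one training set. A minimal sketch of that data-generation step follows.

The interval edges, the equal count per interval, and the use of circularly symmetric complex Gaussian noise are assumptions for illustration. The abstract does not give the paper's values.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_snrs(intervals: Sequence[Tuple[float, float]], per_interval: int, seed: int = 0) -> List[float]:
    """Draw SNR values (dB) independently and uniformly within each interval, same count per interval."""
    rng = random.Random(seed)
    return [rng.uniform(lo, hi) for lo, hi in intervals for _ in range(per_interval)]


def add_awgn(signal: Sequence[complex], snr_db: float, rng: random.Random) -> List[complex]:
    """Add complex Gaussian noise so that mean signal power / noise power = 10^(snr_db / 10)."""
    p_sig = sum(abs(s) ** 2 for s in signal) / len(signal)
    p_noise = p_sig / (10 ** (snr_db / 10))
    sigma = math.sqrt(p_noise / 2)            # noise power split evenly between real and imaginary parts
    return [s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for s in signal]


if __name__ == "__main__":
    rng = random.Random(1)
    pilot = [1 + 0j] * 8                      # hypothetical received pilot sequence
    dataset = [(add_awgn(pilot, snr, rng), snr)
               for snr in sample_snrs([(-10, 0), (0, 10), (10, 20)], per_interval=4)]
    print(len(dataset))
```

Sampling every interval equally keeps low-SNR cases from being underrepresented, which is the property the strategy relies on for robustness.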
  • FAN Zhengwei, CHANG Daofang, MAN Xingyu, WANG Chongwen
    Accepted: 2025-10-21
    X-ray inspection, an intuitive means of nondestructive testing (NDT) for pipeline weld defects, plays a key role in preventing pipeline safety accidents. However, accurately identifying tiny defects in low-grayscale, low-contrast, dark-toned X-ray images remains challenging. Therefore, a method is proposed that optimizes the display of X-ray images of pipe welds under low-light conditions and improves the accuracy of defect detection. First, an improved Retinex-Net framework is introduced in which residual blocks with an attention mechanism are added to the network. The framework restores illumination and enhances details in low-light X-ray images while suppressing noise and artifacts. It outputs natural enhanced images without obvious distortion, providing high-quality input for subsequent detection. Second, a weld positioning and feature extraction algorithm based on a drift Gaussian algorithm is designed. It adaptively tracks irregular long welds and automatically crops the weld area, which significantly reduces background interference and improves processing efficiency. Finally, the weld defect detection algorithm based on cross-layer feature fusion is optimized. An encoder-decoder feature architecture built from RSU modules is constructed, and an attention mechanism is integrated into the feature extraction stage to strengthen cross-layer multi-scale feature fusion, improving detection accuracy and reducing the missed detection rate. The results show that the proposed method significantly improves performance on the public GDXray dataset. It effectively enhances image quality and achieves highly automated, fast-response weld defect detection, demonstrating its efficiency and accuracy in practical application scenarios.
  • ZHANG Bin, LI Run-hao, FENG Chao
    Accepted: 2025-10-20
    Automatic heap memory layout manipulation is the core technology for automatically generating exploits for software memory corruption vulnerabilities. Its goal is to construct the memory layout conditions required for exploitation by precisely controlling the allocation state of heap memory. However, existing search-based and solving-based methods for automatic memory layout manipulation have significant efficiency limitations. To address these challenges, this paper proposes a Large Language Model (LLM)-based approach for automatic memory layout manipulation. The method first uses LLMs to learn the target heap manager's operating mechanisms and key characteristics from its public documentation, source code comments, and analysis materials. Building on this, it uses the reasoning and feedback-driven capabilities of LLMs to follow an iterative "plan-verify-replan" layout strategy. It continuously incorporates feedback from debugger execution results to refine the layout plan until the target memory layout is achieved automatically. Experiments show that the approach achieves precise memory layout for 12 real-world Linux user-space vulnerabilities and a 94.54% layout success rate on a benchmark of 3,735 test samples across six different heap managers. Compared with the search-based Gollum system, it improves layout manipulation speed by 2.33 times. Compared with the solving-based MAZE and BAGUA systems, it reduces the time needed to learn heap allocator behavior from weeks to an average of 7.3 minutes without significantly compromising layout speed. These results show that the approach combines high efficiency with scalability, offering a new technical paradigm for LLM-based research on automated vulnerability exploitation.
  • Bojia Chen, Tingnian He, Lianjie Zhang, Shu'an Chen
    Accepted: 2025-10-20
    Cross-domain recommendation systems are widely applied in e-commerce and content platforms. Although the dual-target cross-domain recommendation (DTCDR) proposed in recent years has achieved a breakthrough in simultaneously improving the performance of both domains, it still faces two major challenges: 1) the generated user-item representations lack sufficient correlation and diversity; 2) the semantic noise mixed in the shared preferences leads to negative transfer problems. To address these issues, a dual-target cross-domain recommendation model based on heterogeneous graph and hierarchical preference disentanglement (HGPD-DTCDR) is proposed. Its core innovations include: 1) a heterogeneous graph collaborative learning framework is proposed to integrate user-item interactions, user social networks, and item attribute similarities, constructing a multi-relation heterogeneous graph, and generating high-order semantic representations through a relation graph convolutional network (R-GCN) to enhance the diversity and correlation of the representations; 2) a two-stage decoupling process is designed, first separating domain-specific and shared preferences through a variational graph encoder, and then introducing a semantic filtering network to optimize the quality of shared preferences. Experiments on five real cross-domain datasets show that the performance improvement of this model stems from the synergistic effect of heterogeneous graph modeling and hierarchical decoupling mechanisms. Compared with the best baseline, it achieves average improvements of 3.55%, 7.27%, and 15.57% in hit rate, normalized discounted cumulative gain, and mean reciprocal rank, respectively. In data-sparse scenarios, the performance improvement is even more significant, with an average gain of 10.35%. Ablation studies further verify the effectiveness of each technical component and their synergistic effects.
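Hit rate, NDCG, and MRR are all computed from where the held-out item lands in each user's ranked list. The sketch below assumes the common leave-one-out protocol, with a single relevant item per user and a cutoff K. It truncates MRR at K as well, although some papers use the untruncated MRR.

The paper's exact protocol (K, number of sampled negatives) is not given in the abstract.

```python
import math
from typing import Sequence, Tuple


def rank_metrics(ranks: Sequence[int], k: int = 10) -> Tuple[float, float, float]:
    """HR@K, NDCG@K and MRR@K, where ranks[u] is the 0-based position of user u's held-out item."""
    hr = ndcg = mrr = 0.0
    for r in ranks:
        if r < k:
            hr += 1.0
            ndcg += 1.0 / math.log2(r + 2)   # ideal DCG is 1 when there is a single relevant item
            mrr += 1.0 / (r + 1)
    n = len(ranks)
    return hr / n, ndcg / n, mrr / n


if __name__ == "__main__":
    # Hypothetical positions of the held-out item for three users.
    print(rank_metrics([0, 1, 20], k=10))
```

Because NDCG and MRR discount lower positions, they react more strongly than hit rate when a model moves relevant items toward the top. This is consistent with the larger relative gains the abstract reports for those two metrics.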
  • Xu Haoyu, Zhang Jing, Zhang Jiamin
    Accepted: 2025-10-20
    To address the challenges of small target scale, complex background, and insufficient feature representation in the detection of potential hazards on high-voltage overhead transmission lines, this paper proposes an improved lightweight real-time detection model, LG-DETR. First, a lightweight backbone network, ResNet-WT, is designed by introducing wavelet transform convolution to enhance multi-scale feature extraction while reducing computational complexity. Meanwhile, a frequency-separated self-attention mechanism is adopted in the feature fusion stage to improve the feature interaction module HL-AIFI, thereby mitigating background interference. Then, a cross-level multi-scale information aggregation feature pyramid network CMIAFPN is proposed to optimize feature transmission paths, combined with a gating module to improve feature retention efficiency and prevent detail loss in high-level features. Furthermore, by incorporating the scaling factor of Focal Loss into Wise-IoU, a novel Focal-WIoU loss function is developed to dynamically adjust the weighting of hard and easy samples, thereby enhancing the detection accuracy of small targets. Experimental results demonstrate that LG-DETR achieves a 6.94 percentage point improvement in and 23.9% reduction in parameters on a high-voltage overhead transmission line hazard dataset, verifying the effectiveness of the proposed improvements.
  • Wang Ruixuan, Li Yan, Zhong Jinghua, Yao Dengfeng, Xu Cheng, Ren Tianyu
    Accepted: 2025-10-17
    Chinese Braille is the script used by people with visual impairments in China and an important part of the National Commonly-Used Language and Script. Although some methods have been developed for automatically translating Chinese text into Braille, they still have shortcomings. Braille word segmentation is a crucial step in Chinese-Braille translation that strongly affects the final translation result, and it is also an important task in Braille informatization research. Pre-trained models are widely used in Chinese natural language processing but are still rarely applied to Braille informatization. Braille and Chinese characters express the same language in different writing systems, so the two share similarities that allow knowledge to transfer between them. This gives pre-trained models great potential in Braille informatization. This paper introduces the BERT pre-trained model into the Braille word segmentation task. BERT extracts feature vectors, which a CRF decodes, and the whole-word masking strategy is applied, yielding BERT-CRF-wwm, a word segmentation model with an encoder-decoder structure. Because the Chinese word segmentation information learned by the original BERT model may interfere with Braille word segmentation, a new Braille embedding is concatenated at the embedding layer, yielding the BeBERT-CRF-wwm model. On the Chinese-Braille Corpus, it achieves a precision of 98.80% and a recall of 98.71%, outperforming existing Braille word segmentation methods on the evaluation metrics.
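In a BERT-CRF segmenter, the CRF layer turns per-token emission scores into the highest-scoring tag sequence through Viterbi decoding. Its transition scores can forbid invalid moves such as B→B.

The sketch uses the common B/M/E/S segmentation tags and hand-set toy scores in place of BERT outputs. The tag set, the hard-coded transition constraints, and the scores are assumptions, not details from the paper. A trained CRF would learn its transition scores rather than fix them.

```python
from typing import Dict, List, Sequence

TAGS = ["B", "M", "E", "S"]   # begin / middle / end of a word, single-token word
NEG = -1e9                    # score for forbidden transitions

# Allowed BMES transitions: after B or M comes M or E; after E or S comes B or S.
ALLOWED = {"B": {"M", "E"}, "M": {"M", "E"}, "E": {"B", "S"}, "S": {"B", "S"}}
TRANS = {a: {b: (0.0 if b in ALLOWED[a] else NEG) for b in TAGS} for a in TAGS}
START = {"B": 0.0, "S": 0.0, "M": NEG, "E": NEG}


def viterbi(emissions: Sequence[Dict[str, float]]) -> List[str]:
    """Return the best tag path given per-position emission scores (log-space)."""
    score = {t: START[t] + emissions[0][t] for t in TAGS}
    back: List[Dict[str, str]] = []
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: score[p] + TRANS[p][t])
            new_score[t] = score[prev] + TRANS[prev][t] + em[t]
            ptr[t] = prev
        score = new_score
        back.append(ptr)
    # A word must end at the last position, so only E or S may close the sequence.
    last = max(("E", "S"), key=lambda t: score[t])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def segment(tokens: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Group tokens into words according to their BMES tags."""
    words, cur = [], ""
    for tok, tag in zip(tokens, tags):
        cur += tok
        if tag in ("E", "S"):
            words.append(cur)
            cur = ""
    if cur:
        words.append(cur)
    return words


if __name__ == "__main__":
    em = [{"B": 2, "M": 0, "E": 0, "S": 1},
          {"B": 0, "M": 0, "E": 2, "S": 1},
          {"B": 0, "M": 0, "E": 0, "S": 2}]
    tags = viterbi(em)
    print(tags, segment("abc", tags))   # "abc" stands in for three Braille tokens
```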
  • Huang Yinglai, Xiong Xueshan, Wan Langyi, He Yang, Yang Liusong
    Computer Engineering.
    Accepted: 2025-10-17
    Accurate classification of brain tumors is essential in medical imaging diagnosis. However, conventional approaches that rely heavily on expert experience are inefficient. Existing deep learning approaches struggle to model long-range dependencies and to balance global modeling with local feature extraction, resulting in suboptimal recognition accuracy. To address these issues, a Hierarchical Collaborative Residual Transformer Network (HCR-TNet) is proposed. First, a Conv-Pool-Transformer Composite Block (CPT-Block) is introduced to enhance local feature extraction and cross-level contextual modeling, improving the representation of heterogeneous tumor regions. Second, a High-frequency Feature Extraction module (HFFE) is incorporated to better capture textural details at tumor boundaries and subtle lesion characteristics while effectively suppressing noise. Finally, a Multi-scale Residual Block (MSRB) is designed to perform residual fusion with the CPT-Block, enabling cross-scale feature optimization from macro to micro structures. On a public brain tumor MRI dataset, the proposed method achieves a classification accuracy of 98.26%, a Kappa coefficient of 97.52%, and an MCC score of 97.52%. Compared with the ViT model, accuracy improves by 1.48% and the Kappa coefficient by 2.08%. Ablation studies and comparative experiments confirm the effectiveness of HCR-TNet in brain tumor classification, providing useful methods and ideas for medical image analysis and automatic diagnosis systems.
  • Lin Hai, Yu Guo, Yin Zeming, Xu Xianchong, Liu Yuhai
    Accepted: 2025-10-17
    In long-context and high-concurrency scenarios, large language model (LLM) inference faces significant challenges because the key-value (KV) cache of the self-attention mechanism grows rapidly with context length and concurrency, leading to excessive GPU memory consumption and limited throughput. Although KV cache sparsification methods have been proposed to address this issue, existing approaches still suffer from a large memory footprint, complex sliding-window designs, and high computation and memory-access overhead. This paper proposes DoubleSparse++, a triple-optimization framework that addresses these limitations with three techniques: (1) a ring-buffer-based sliding window decouples the KV cache size from the text length and reduces the buffer update complexity from O(L) to O(1); (2) an exponential-decay sparse equilibrium strategy dynamically allocates token sparsity according to the layer index, achieving progressive sparsification across layers; (3) the sparse inference kernel is optimized with operator fusion and asynchronous device-stream pipelines, overlapping computation with memory access in long-context inference, which significantly raises computational intensity while reducing memory-access frequency. Experiments on domestic accelerators with mainstream LLMs (OPT-6.7B, Vicuna-7B-v1.5, LLaMA-2-7B, LLaMA-3.1-8B and Qwen-2.5-7B) show that, for 4K-token generation tasks, DoubleSparse++ achieves a 1.31X inference speedup over DoubleSparse and reduces the memory footprint to 0.72X of DoubleSparse's. In 13K-token scenarios, the memory footprint further drops to 0.56X of the baseline. Comprehensive performance analysis confirms that DoubleSparse++ is an efficient KV cache sparsification method that is well suited to long-context LLM inference and streaming deployment.
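    The two memory-side ideas in (1) and (2) can be sketched as follows. This is an illustration under assumptions, not the DoubleSparse++ implementation: the window stores placeholder objects instead of real key/value tensors, and `layer_budgets` with its `base_tokens`, `decay` and `min_tokens` parameters is a hypothetical form of a layer-wise exponential-decay schedule.

```python
class RingKVWindow:
    """Fixed-size sliding window over KV entries; append is O(1) regardless of context length."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = [None] * capacity
        self.head = 0   # slot the next append overwrites (the oldest entry once full)
        self.size = 0

    def append(self, kv):
        self.buf[self.head] = kv                      # overwrite in place, no shifting
        self.head = (self.head + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def window(self):
        """Entries from oldest to newest."""
        start = (self.head - self.size) % self.capacity
        return [self.buf[(start + i) % self.capacity] for i in range(self.size)]


def layer_budgets(num_layers, base_tokens, decay, min_tokens):
    """Hypothetical exponential-decay schedule: deeper layers keep fewer tokens."""
    return [max(min_tokens, round(base_tokens * decay ** layer)) for layer in range(num_layers)]


win = RingKVWindow(capacity=4)
for step in range(10):
    win.append(step)          # stand-in for a (key, value) pair
print(win.window())           # [6, 7, 8, 9]
print(layer_budgets(6, 1024, 0.5, 64))  # [1024, 512, 256, 128, 64, 64]
```

    A shifting list would copy up to L entries on every eviction; the ring buffer only moves an index, which is where the O(L) to O(1) update comes from.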
  • Li Shiyou, Lian Demeng, Zhou Xin, Han Mengzhi
    Accepted: 2025-10-17
    CUTLASS-Sparse, the sparse matrix template library within the CUDA linear algebra template library CUTLASS, is used to build customizable, high-performance sparse matrix-dense matrix multiplication (SpMM) kernels, which play an important role in fields such as scientific computing and deep learning. However, it is implemented and optimized only for NVIDIA GPUs and cannot be used on domestic accelerators. To solve this problem, a scheme for porting and optimizing CUTLASS-Sparse on domestic accelerators is proposed. In the porting stage, the data access, data computation and data write-back modules are adapted to the hardware architecture of domestic accelerators. In the optimization stage, two shared-memory data reordering algorithms, a data pipeline strategy based on data prefetching and register double buffering, and a data write-back strategy based on data aggregation are proposed to address high shared-memory bank conflict rates, low shared-memory bandwidth utilization, low data pipeline parallelism and low write-back efficiency. Experimental results show that all three optimization methods significantly improve the performance of the ported CUTLASS-Sparse. For the TF32 and FP16 data types, the overall performance of the optimized CUTLASS-Sparse increases by an average of 30% and 115%, respectively, over the unoptimized version, reaching on average 76% and 60%, respectively, of the performance of CUTLASS-Sparse on an NVIDIA L20 GPU. On two hardware versions, the ported and optimized CUTLASS-Sparse is on average 2.36 and 3.09 times as fast, respectively, as the SPARSE math library on domestic accelerator platforms. These results verify the effectiveness of the porting and optimization scheme.
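    Shared-memory bank conflicts, one of the problems the reordering algorithms target, can be illustrated by counting how many distinct words a warp requests from the same bank. The sketch below assumes 32 banks of 4-byte words and uses a generic XOR swizzle; it is not one of the paper's two reordering algorithms, whose details the abstract does not give.

```python
BANKS = 32  # assumed: 32 banks of 4-byte words (common on GPUs; the target accelerator may differ)


def conflict_degree(word_addresses):
    """Max number of distinct words one warp request needs from a single bank."""
    per_bank = {}
    for a in word_addresses:
        per_bank.setdefault(a % BANKS, set()).add(a)  # same word is broadcast, not a conflict
    return max(len(words) for words in per_bank.values())


def tile_address(row, col, width=32, swizzle=False):
    """Word address of element (row, col) in a row-major tile, optionally XOR-swizzled."""
    if swizzle:
        col ^= row % width   # permute columns differently in each row
    return row * width + col


# 32 threads each read one element of column 5 (a typical transposed access).
naive = [tile_address(t, 5) for t in range(32)]
swizzled = [tile_address(t, 5, swizzle=True) for t in range(32)]
# Row access stays conflict-free after swizzling.
row_access = [tile_address(3, t, swizzle=True) for t in range(32)]
print(conflict_degree(naive), conflict_degree(swizzled), conflict_degree(row_access))  # 32 1 1
```

    The naive column read serialises into 32 bank accesses; the swizzle spreads the same 32 words over all banks while keeping row reads conflict-free.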
  • Yue Minghui, He Yuxuan, Ren Yuanxin, ZHANG Liye
    Accepted: 2025-10-16
    Video understanding tasks face two major challenges: insufficient computational resources and scarce video datasets. Current video models are large and computationally intensive, relying on expensive hardware and lengthy training, while scarce datasets prevent models from being trained adequately and generalizing well. To address these problems, an efficient transfer learning method, the adapter training strategy, is adopted. By freezing all weights of the pre-trained Vision Transformer (ViT) and fine-tuning only the adapter parameters, resource consumption is significantly reduced while the representational advantages of the pre-trained model are fully retained. Based on this strategy, a hierarchical adapter and a ViT backbone are designed to jointly form the Video ViT Adapter (VVA) model. The hierarchical adapter employs three spatiotemporal convolutions of different dimensions, which helps balance fine details against the global spatiotemporal context. In addition, the Contrastive Language-Image Pre-training (CLIP) model, which has strong cross-modal learning capabilities, is used as the pre-trained model, providing VVA with rich feature representations and facilitating effective fusion across data modalities. With only 9.50M trainable parameters, VVA achieves excellent results on three standard action recognition datasets, with accuracies of 79.32% on Kinetics-400, 97.77% on UCF101, and 81.78% on HMDB51. These results demonstrate that the efficiency and convenience of adapters can effectively address these challenges.
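    The adapter training strategy can be sketched with a generic residual bottleneck adapter, the common form in which only a down-projection and an up-projection are trained on top of frozen features. This is an assumption-laden illustration in pure Python: VVA's hierarchical adapter uses spatiotemporal convolutions instead of these linear layers, and the dimensions here are toy values.

```python
import random


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class BottleneckAdapter:
    """Residual bottleneck adapter: x + W_up · ReLU(W_down · x). Only these weights train."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
        # Zero-initialised up-projection: at the start the adapter is an identity map,
        # so the frozen pre-trained features pass through unchanged.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x):
        h = [max(0.0, v) for v in matvec(self.down, x)]
        return [xi + yi for xi, yi in zip(x, matvec(self.up, h))]

    def num_params(self):
        return sum(map(len, self.down)) + sum(map(len, self.up))


adapter = BottleneckAdapter(dim=8, bottleneck=2)
x = [float(i) for i in range(8)]
print(adapter(x) == x)        # True: identity at initialisation
print(adapter.num_params())   # 2 * 8 * 2 = 32 weights (biases omitted)
```

    Zero-initialising the up-projection is a common design choice: training starts from exactly the frozen pre-trained model, and the adapter departs from identity only as its weights are learned.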
  • DING Lin, YANG Yang, GUO Caili, GUO JianZhang, LI Zheng
    Accepted: 2025-10-16
    The text-to-SQL task aims to automatically convert natural language queries into Structured Query Language (SQL) and is a key technology for enabling non-technical users to access databases efficiently, thereby significantly improving data utilization. To address the problem that large language models insufficiently understand the database schema information in text-to-SQL prompts, this paper proposes a fine-tuning method for large language models based on table creation information. Existing approaches often rely on complex, lengthy prompt templates or extensive fine-tuning data and face two major bottlenecks: (1) including the complete prompt content in the templates dilutes the few critical cues, dispersing attention in long-context understanding and consequently reducing inference performance; (2) tens of thousands of samples must be manually collected and processed for large-scale fine-tuning before the model achieves stable comprehension in text-to-SQL tasks. To mitigate these issues, we propose a hybrid text-to-SQL generation strategy that integrates prompt engineering with fine-tuning. The method selects semantically relevant table creation information based on question similarity and combines it with concise prompt templates to construct a lightweight, manually curated fine-tuning dataset. Through supervised fine-tuning, this dataset guides large language models to better comprehend the table schema information in prompts and to capture relationships between tables and queries, thereby generating more accurate SQL statements. Experimental results demonstrate that the proposed method effectively reduces the model's reliance on extraneous information in prompt templates and mitigates attention dispersion during reasoning. The generated SQL queries achieve an execution accuracy of 83.37%, a 0.49 percentage point improvement over the baseline approach.
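    The schema-selection step can be sketched as ranking CREATE TABLE statements by their similarity to the question and placing only the best matches in a short prompt. The sketch assumes a bag-of-words cosine similarity as a simple stand-in for whatever semantic similarity the authors use, and the prompt layout in `build_prompt` is hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)   # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_schemas(question, ddl_statements, top_k=1):
    """Rank CREATE TABLE statements by lexical similarity to the question."""
    q = tokens(question)
    ranked = sorted(ddl_statements, key=lambda ddl: cosine(q, tokens(ddl)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, ddl_statements):
    """Hypothetical concise prompt: selected schemas, then the question."""
    schema = "\n".join(select_schemas(question, ddl_statements))
    return f"{schema}\n-- Question: {question}\n-- SQL:"


ddl = [
    "CREATE TABLE courses (id INT, title TEXT, credits INT)",
    "CREATE TABLE students (id INT, name TEXT, grade INT)",
]
print(build_prompt("What is the average grade of students?", ddl))
```

    Keeping only the relevant table definitions is what keeps the prompt short, which is the attention-dispersion problem the method targets.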
  • He Guangcheng, Li Deshi
    Accepted: 2025-10-16
    With the development of the industrial Internet, the traditional best-effort forwarding mode can no longer meet the needs of communication with deterministic delay, and the IEEE 802.1 working group proposed the Cyclic Queuing and Forwarding (CQF) mechanism to achieve deterministic transmission. However, because CQF forwards in fixed-granularity time slots, it occupies excessive resources and supports only a limited range of deterministic delays. Therefore, for time-triggered traffic with strict latency requirements, a hierarchical cyclic queuing and forwarding mechanism is proposed that reduces time-triggered traffic delay and resource occupation through fast forwarding. An optimization model that maximizes network throughput is constructed to determine each flow's forwarding mode and injection time slot. Because this problem is NP-hard, a heuristic priority-based iterative incremental scheduling algorithm is proposed, which uses traffic clustering, priority-order updates and incremental scheduling to compute schedules for large-scale deterministic traffic. Experimental results show that, compared with the CQF mechanism, the proposed mechanism has stronger scheduling capability, halves the lower bound of deterministic delay, and reduces resource occupation by 25.77% on average. In multiple sets of experiments with various topologies and different traffic characteristics and scales, the proposed algorithm outperforms the four comparison schemes in network throughput, with average increases of 3.52%, 2.04% and 51.77% over Tabu Search, IRFS and Naive, respectively.
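    Incremental scheduling of injection slots under CQF can be sketched as a greedy loop that places one flow at a time and never moves flows already placed. This sketch assumes one CQF slot per hop, a per-link byte capacity per slot, and a hypothetical deadline-first priority; it omits the paper's hierarchical fast forwarding, traffic clustering and priority updates.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    path: list      # link ids in hop order
    period: int     # in CQF slots; must divide the hyperperiod
    size: int       # bytes per frame
    deadline: int   # end-to-end delay budget in slots


def schedule(flows, hyperperiod, slot_capacity):
    """Greedy incremental injection-slot assignment; returns {flow name: offset}."""
    load = {}      # (link, slot) -> bytes already reserved
    offsets = {}
    # Hypothetical priority: tightest deadline first, larger frames first on ties.
    for f in sorted(flows, key=lambda f: (f.deadline, -f.size)):
        # Simplification: one slot per hop, so an h-hop path needs about h slots.
        if len(f.path) > f.deadline:
            continue
        for off in range(f.period):
            # A frame injected in slot s crosses hop i in slot s + i (mod hyperperiod).
            cells = [(link, (off + k * f.period + i) % hyperperiod)
                     for k in range(hyperperiod // f.period)
                     for i, link in enumerate(f.path)]
            if all(load.get(c, 0) + f.size <= slot_capacity for c in cells):
                for c in cells:                      # commit: placed flows never move
                    load[c] = load.get(c, 0) + f.size
                offsets[f.name] = off
                break
    return offsets


flows = [Flow("f1", ["A"], 2, 100, 4), Flow("f2", ["A"], 2, 100, 5), Flow("f3", ["A"], 2, 100, 6)]
print(schedule(flows, hyperperiod=2, slot_capacity=100))  # {'f1': 0, 'f2': 1}
```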
  • Yang Hongju, Liu Na, Li Yao, Cao Fuyuan
    Accepted: 2025-10-16
    Sketch-guided image inpainting holds significant application value in photo restoration and creative editing but faces dual challenges of scarce user sketch data and restoration distortion caused by geometric deviations. Existing methods rely on edge detection to generate pseudo-sketches while neglecting user-drawn deviations (e.g., hand tremors, stroke breaks), leading to structural misalignment and detail blurring in complex scenes. To address these challenges, this study proposes an innovative framework combining a deformable sketch generation network with dual-stage guided inpainting. First, a deformable sketch generation network is constructed to model typical hand-drawn deviations, generating a large-scale sketch-image paired dataset with realistic geometric deformation features, effectively alleviating data scarcity. Second, a two-stage inpainting framework is designed: the first stage corrects geometric misalignment and repairs structural breaks in input sketches to optimize the sketches, while the second stage effectively integrates the optimized sketch information into the inpainting network to achieve collaborative optimization of global structural constraints and local texture generation. Experiments on benchmark datasets validate the method's effectiveness, achieving a peak signal-to-noise ratio (PSNR) of 25.78 dB and a structural similarity index (SSIM) of 0.852 on the CelebA-HQ dataset. The results fully demonstrate that this method effectively addresses the challenges of scarce user sketch data and geometric deviations while significantly improving the structural accuracy and perceptual quality of sketch-guided image inpainting.
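    The kind of hand-drawn deviation the sketch generation network models can be illustrated procedurally: slow drift for geometric deviation, per-point noise for tremor, and random pen lifts for stroke breaks. This is a hand-written stand-in under assumptions (the parameter names and noise model are hypothetical), not the learned deformable network itself.

```python
import random


def perturb_sketch(points, tremor=1.0, drift=0.3, break_prob=0.05, seed=None):
    """Turn a clean polyline into hand-drawn-like strokes.

    tremor: std-dev of independent per-point noise (hand shake)
    drift: std-dev of a random-walk offset (slow geometric deviation)
    break_prob: chance of lifting the pen after a point (stroke break)
    """
    rng = random.Random(seed)
    dx = dy = 0.0
    strokes, current = [], []
    for x, y in points:
        dx += rng.gauss(0.0, drift)
        dy += rng.gauss(0.0, drift)
        current.append((x + dx + rng.gauss(0.0, tremor), y + dy + rng.gauss(0.0, tremor)))
        if len(current) > 1 and rng.random() < break_prob:
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes


line = [(float(i), 0.0) for i in range(50)]
strokes = perturb_sketch(line, break_prob=0.1, seed=42)
print(len(strokes), sum(len(s) for s in strokes))  # stroke count depends on the seed; 50 points total
```

    Pairing each perturbed sketch with the original image gives the sketch-image pairs used for training, without any user drawing.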
  • SUN Wei, CHEN Jun Jie
    Accepted: 2025-10-13
    Maize is a vital economic crop, widely used in industry, animal husbandry, and grain-oil processing. Timely identification of maize diseases is crucial for ensuring stable yield. Currently, deep learning methods such as Convolutional Neural Networks (CNNs) have been widely applied to disease recognition. However, most existing methods rely solely on image information, overlooking features from other modalities. Moreover, their large parameter sizes and high deployment costs hinder practical applications. To address these challenges, we propose a lightweight image-text multimodal cache model, MF-cache, which contains only 0.061M parameters, achieving both low computational cost and high recognition accuracy. The model leverages the multimodal pre-trained model CLIP to extract image and text features, which are fused in parallel to form a key-value cache structure enriched with domain knowledge. Additionally, a weighted two-stage fusion mechanism is introduced to dynamically adjust the contribution of each modality to the classification outcome, enhancing both stability and interpretability. To improve robustness, various data augmentation strategies are employed to increase sample diversity and mitigate overfitting in low-data scenarios. Experimental results on a self-constructed dataset CornI&T and the public PlantVillage dataset demonstrate the effectiveness of the proposed method, achieving 99.72% and 98.80% accuracy, respectively. These results indicate that the method achieves excellent recognition performance while maintaining low computational overhead, offering an efficient and practical solution for crop disease detection. Furthermore, it highlights the potential of combining multimodal pre-trained models with few-shot learning in intelligent agricultural applications.
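    A key-value cache classifier of this kind can be sketched in the style of Tip-Adapter: cached features act as keys, their labels as values, and a query's logits are affinity-weighted votes. The sketch assumes pre-extracted CLIP-style features, and the single weight `w_image` is a hypothetical simplification of the paper's weighted two-stage fusion.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cache_logits(query, keys, labels, num_classes, beta=5.0):
    """Each cached key votes for its label with affinity exp(-beta * (1 - cosine))."""
    q = normalize(query)
    logits = [0.0] * num_classes
    for key, label in zip(keys, labels):
        cos = sum(a * b for a, b in zip(q, normalize(key)))
        logits[label] += math.exp(-beta * (1.0 - cos))
    return logits


def fused_logits(query, image_keys, text_keys, labels, num_classes, w_image=0.5, beta=5.0):
    """Hypothetical weighted fusion of an image-feature cache and a text-feature cache."""
    li = cache_logits(query, image_keys, labels, num_classes, beta)
    lt = cache_logits(query, text_keys, labels, num_classes, beta)
    return [w_image * a + (1.0 - w_image) * b for a, b in zip(li, lt)]


image_keys = [[1.0, 0.0], [0.0, 1.0]]   # toy image features, one per class
text_keys = [[0.9, 0.1], [0.1, 0.9]]    # toy text features for the same classes
scores = fused_logits([0.2, 1.0], image_keys, text_keys, labels=[0, 1], num_classes=2)
print(max(range(2), key=scores.__getitem__))  # 1
```

    Only the fusion weight and similar scalars would need tuning here, which is consistent with the very small parameter count such cache models can have.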
  • JIANG Yuhong, JIANG Qingquan, Zhang Rui, XI Huijuan, WU Jiongtao
    Accepted: 2025-10-13
    On e-commerce platforms, the volume of user click data is growing rapidly, and accurately modeling users' long-term behavior sequences is crucial for capturing their preferences in recommendation systems. Two-stage click-through rate (CTR) prediction models are currently widely used to predict CTR for users with long behavior sequences: the first stage uses approximate retrieval to filter subsequences related to the target item from massive historical behaviors, and the second stage performs fine-grained interest modeling on these subsequences. However, two-stage models have two key issues: first, the second stage pays insufficient attention to the trend characteristics of user behavior; second, a cross-stage semantic mismatch prevents the second-stage subsequences from fully conveying the user's true interest structure. To address these issues, we propose a trend-aware probabilistic attention architecture. The model captures temporal trends in user behavior and unifies interest representations across stages, significantly improving CTR prediction accuracy for long sequences. Experiments on two real-world e-commerce datasets show that our model outperforms state-of-the-art baselines, achieving up to a 1.14% improvement in AUC and up to a 4.2% reduction in Logloss. These results show that the model can identify the trend characteristics and dynamic preference structures in user behavior, and they also confirm the value of optimizing cross-stage semantic consistency.
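    The two-stage pipeline can be sketched as a similarity-based top-k retrieval followed by target attention over the retrieved behaviors. This is a simplified illustration: the linear recency penalty in `user_interest` is a hypothetical stand-in for trend awareness, not the paper's trend-aware probabilistic attention, and the embeddings are toy vectors.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def retrieve(history, target, k):
    """Stage 1: keep the k behaviors most similar to the target, returned in time order."""
    top = sorted(range(len(history)), key=lambda i: dot(history[i], target), reverse=True)[:k]
    return sorted(top)


def user_interest(history, target, k, recency=0.1):
    """Stage 2: target attention over the retrieved subsequence.

    The recency term is a hypothetical stand-in for trend awareness: older behaviors
    (further from the end of the history) get a linearly larger penalty.
    """
    idx = retrieve(history, target, k)
    n = len(history)
    weights = softmax([dot(history[i], target) - recency * (n - 1 - i) for i in idx])
    vec = [sum(w * history[i][d] for w, i in zip(weights, idx)) for d in range(len(target))]
    return vec, idx


history = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # toy behavior embeddings
interest, idx = user_interest(history, target=[1.0, 0.0], k=2)
print(idx)  # [0, 2]
```

    Using the same similarity in both stages is one simple way to avoid the cross-stage mismatch the abstract describes; the paper's own unification mechanism may differ.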
  • YANG Chunxia, WANG Xin'ao, WANG Yulong
    Accepted: 2025-10-11
    High-accuracy air pollution prediction is crucial for environmental management and public health protection. To address spatiotemporal heterogeneity and multi-feature coupling in prediction tasks, this paper proposes a Multi-Decoupled Spatio-Temporal Dynamic Graph Convolutional Network (MD-STDGCN). The model aims to precisely capture both the specific temporal patterns of local pollutant emissions and the dynamic interactions of cross-regional pollutant transport. First, the model employs a dual-path self-supervised masked pretraining strategy for feature enhancement: the temporal path improves temporal feature extraction through local subsequence reconstruction, while the spatial path captures spatial heterogeneity through node sequence reconstruction. This mitigates the representation degradation caused by distribution shift and heterogeneity. Second, the model introduces a multi-level residual decomposition and hierarchical prediction framework that progressively extracts global temporal patterns, local spatiotemporal patterns and short-term disturbances from the spatiotemporal series. The framework integrates channel-independent convolutions and multi-scale causal temporal attention for long-term trend modeling, adaptive weight gating with dynamic graph convolution for directional and lagged transport, and GRUs for short-term fluctuations. Finally, the multi-branch predictions are fused with the dual-path enhanced representations to achieve end-to-end multi-step forecasting. Experimental results show that MD-STDGCN outperforms all baseline models on all datasets: on KnowAir, Yangtze River Delta and KnowAir_V2, the average MAE is reduced by 7.34%, 1.88% and 12.57%, and the RMSE by 7.64%, 2.44% and 11.29%, respectively. By combining dual-path feature enhancement, multi-level decoupling and dynamic graph learning, MD-STDGCN effectively alleviates the impact of feature entanglement and heterogeneity, improving both prediction accuracy and robustness, and it can provide reliable support for air quality monitoring and governance decision-making.
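    The multi-level residual decomposition can be illustrated with moving averages: each level smooths what the previous level left behind, and the final remainder holds the short-term disturbances. This sketch assumes fixed odd moving-average windows; MD-STDGCN itself uses learned modules (convolutions, attention, dynamic graph convolution and GRUs) at each level.

```python
def moving_average(x, window):
    """Centred moving average with edge replication; window must be odd."""
    assert window % 2 == 1, "use an odd window so the average is centred"
    half = window // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [sum(padded[i:i + window]) / window for i in range(len(x))]


def multilevel_decompose(x, windows=(13, 3)):
    """Peel off a long-window trend, then a short-window local pattern; the rest is residual."""
    components, residual = [], list(x)
    for w in windows:
        smooth = moving_average(residual, w)
        components.append(smooth)
        residual = [r - s for r, s in zip(residual, smooth)]
    components.append(residual)   # short-term disturbances
    return components


series = [10 + 0.5 * t + (3 if t % 7 == 0 else 0) for t in range(30)]  # toy pollutant series
trend, local, short_term = multilevel_decompose(series)
```

    Because each level subtracts exactly what it extracts, the components always sum back to the input, which is what allows separate branch predictions to be recombined.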

  • FENG Guoping, CHEN Zhijian, Lin Zhiyu, HONG Liang
    Accepted: 2025-10-11
    This study explores automatic term recognition in the electric power domain to address challenges arising during the domain's digital transformation, such as data silos and insufficient use of knowledge. To improve the identification of specialized and newly coined terms, a dynamic graph-assisted method combining large and small models is proposed. The approach improves recall and precision through candidate term extraction and term classification. An initial knowledge graph is built from existing term databases, and nodes related to the target text are queried and filtered using term features. A retrieval-augmented large language model extracts candidate terms, and a deep learning model for term classification is then trained with adversarial training. The dynamic term knowledge graph is iteratively updated based on the classification results, forming a positive feedback loop. Experimental results show that the method's accuracy, recall and F1 score improve over the iterations, reaching 0.8647, 0.8565 and 0.8542, respectively, outperforming other term recognition methods.
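    The positive feedback loop can be sketched with the extractor and classifier passed in as functions. Both are hypothetical stand-ins here (a toy bigram extractor and an accept-all classifier) for the retrieval-augmented LLM and the adversarially trained classifier, and the graph is reduced to a set of term nodes.

```python
def grow_term_graph(texts, seed_terms, extract, classify, max_rounds=5):
    """Iterate: retrieve known terms in each text, extract candidates, keep accepted ones."""
    graph = set(seed_terms)   # stand-in for the term knowledge graph (nodes only)
    for _ in range(max_rounds):
        accepted = set()
        for text in texts:
            context = {t for t in graph if t in text}      # nodes related to this text
            for cand in extract(text, context):
                if cand not in graph and classify(cand, context):
                    accepted.add(cand)
        if not accepted:                                   # converged: nothing new learned
            break
        graph |= accepted                                  # feedback: new terms become context
    return graph


def bigram_extract(text, context):
    """Toy extractor: bigrams that share a word with a known term."""
    words = text.split()
    known = {w for term in context for w in term.split()}
    return [f"{a} {b}" for a, b in zip(words, words[1:]) if {a, b} & known]


terms = grow_term_graph(
    ["power transformer fault", "transformer fault diagnosis"],
    seed_terms={"transformer"},
    extract=bigram_extract,
    classify=lambda cand, ctx: True,   # toy classifier accepts everything
)
print(sorted(terms))  # ['fault diagnosis', 'power transformer', 'transformer', 'transformer fault']
```

    In this toy run, "transformer fault" is accepted in the first round, and only then does "fault diagnosis" become reachable in the second round, which is the feedback effect the method relies on.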
  • LI Guang, ZHOU Yiqiang, GAO Xindan
    Accepted: 2025-09-29
    RGB-T (RGB-Thermal) semantic segmentation is a solution that enables reliable semantic scene understanding under poor lighting conditions or in complete darkness. Thermal imaging captures object infrared radiation features, providing stable edge detection under low-light conditions. This effectively compensates for the loss of texture details in RGB images under such environments. However, existing RGB-T semantic segmentation methods fail to fully utilize effective cross-modal information during multi-level interactions, leading to inaccurate predictions. To address this issue, this work constructs CMFANet (Cross-Modal Fusion Attention Network). First, it designs a cross-modal fusion module to establish complementary relationships between RGB and thermal features. Second, considering the importance of multi-dimensional and multi-scale information, a multi-dimensional attention module is introduced at the encoder to enhance deep feature extraction, while a multi-scale feature aggregation module is added at the decoder to capture texture details and contour information. Finally, the decoder integrates wavelet transforms with convolutional operations to improve segmentation accuracy. On the MFNet dataset, CMFANet achieves 73.8% in mean accuracy (mAcc) and 59.0% in mean intersection-over-union (mIoU). On the PST900 dataset, it attains 90.71% mAcc and 85.15% mIoU. Compared with existing cutting-edge methods, the model performs particularly well on key targets (such as cars, persons and bikes in MFNet, and survivors and backpacks in PST900). Visualization results verify its ability to effectively fuse RGB and thermal imaging modality information, restore texture details and target contours in low-light scenarios, and demonstrate better segmentation performance and strong generalization capabilities.
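    The mAcc and mIoU figures are per-class averages computed from a pixel confusion matrix. A minimal sketch with toy counts (not MFNet or PST900 data); how classes absent from the ground truth are handled is an assumption, as conventions differ between benchmarks:

```python
def seg_metrics(conf):
    """mAcc and mIoU from a confusion matrix with rows = ground truth, columns = prediction."""
    n = len(conf)
    accs, ious = [], []
    for k in range(n):
        tp = conf[k][k]
        gt = sum(conf[k])                              # pixels whose true class is k
        pred = sum(conf[i][k] for i in range(n))       # pixels predicted as k
        if gt == 0:
            continue   # class absent from ground truth: skipped here (conventions differ)
        accs.append(tp / gt)                           # per-class pixel accuracy
        ious.append(tp / (gt + pred - tp))             # intersection over union
    return sum(accs) / len(accs), sum(ious) / len(ious)


macc, miou = seg_metrics([[3, 1], [1, 5]])   # toy 2-class counts
print(macc, miou)
```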
  • Xu Dai, Zhang Xiuzai, Yang Changjun, Zhong Yang, Guo Lin
    Accepted: 2025-09-29
    Accurate identification of water bodies in plateau lakes from high-resolution remote sensing images is of great significance for regional ecological protection and water resources management. In plateau scenes, water bodies occupy a small proportion of the image and their fine details are easily lost, which causes insufficient multi-scale feature fusion and attenuation of high-frequency details, and in turn blurred boundaries, missed small water bodies and mis-segmentation in complex scenes. To address this, we propose the Wavelet-ResNet-Swin Network (WRS-Net), a two-branch multi-level fusion network based on frequency-domain and spatial-domain synergy. Adaptive Wavelet Decomposition extracts the low-frequency contour and high-frequency detail features of water bodies, while a multi-stage ResNet50 captures spatial semantic information, with high-frequency gating units at the end of each stage enhancing the texture response. A Cross Attention Fusion Module is then designed to co-optimize multi-scale semantics and details, combined with a Feature Alignment Module that resolves cross-layer feature misalignment; finally, a Swin Transformer performs global context modeling. Experiments on a self-constructed plateau lake dataset show that WRS-Net achieves an Acc of 96.52% and an mIoU of 93.44%, outperforming the other networks compared and improving the accuracy of recognizing plateau lake water bodies in remote sensing images.
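    The low-frequency/high-frequency split that wavelet decomposition provides can be illustrated with a one-level 2D Haar transform. This sketch uses a fixed, unnormalised Haar basis on a toy patch; the paper's Adaptive Wavelet Decomposition is learned and may differ.

```python
def haar_dwt2(img):
    """One-level 2D Haar transform of an image with even height and width.

    Returns (LL, row_detail, col_detail, diag_detail): LL is the low-frequency
    approximation; the other three hold high-frequency edge/texture detail.
    """
    h, w = len(img), len(img[0])
    bands = [[[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            bands[0][i // 2][j // 2] = (a + b + c + d) / 4  # average (low frequency)
            bands[1][i // 2][j // 2] = (a + b - c - d) / 4  # top minus bottom
            bands[2][i // 2][j // 2] = (a - b + c - d) / 4  # left minus right
            bands[3][i // 2][j // 2] = (a - b - c + d) / 4  # diagonal
    return tuple(bands)


def haar_idwt2(ll, rd, cd, dd):
    """Exact inverse of haar_dwt2."""
    h, w = 2 * len(ll), 2 * len(ll[0])
    img = [[0.0] * w for _ in range(h)]
    for i in range(len(ll)):
        for j in range(len(ll[0])):
            s, r, c, d = ll[i][j], rd[i][j], cd[i][j], dd[i][j]
            img[2 * i][2 * j] = s + r + c + d
            img[2 * i][2 * j + 1] = s + r - c - d
            img[2 * i + 1][2 * j] = s - r + c - d
            img[2 * i + 1][2 * j + 1] = s - r - c + d
    return img


patch = [[5, 5, 5, 1]] * 4          # toy patch: water (5) with a vertical shoreline edge
ll, rd, cd, dd = haar_dwt2(patch)
print(cd)  # [[0.0, 2.0], [0.0, 2.0]]
```

    The vertical edge inside each right-hand 2x2 block appears only in the left-minus-right detail band, which is the kind of boundary cue a high-frequency branch is meant to preserve.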
  • LI Jie, LI Linsen
    Accepted: 2025-09-29
    With the growth of the logistics industry, collaborative delivery by unmanned aerial vehicle (UAV) swarms has become a key means of reducing costs and improving efficiency. Considering the demands of traditional delivery services and the constraints of the UAVs themselves, a green collaborative delivery mechanism for UAV swarms under time window constraints is proposed. First, a multi-task-point delivery scenario is constructed with parameters such as task time windows, task priorities, UAV payload capacity and attitude-dependent flight energy consumption, and a multi-constraint model is established with the objectives of maximizing task benefits and minimizing energy consumption. Then, the Zebra Optimization Algorithm is discretized to adapt it to the discrete problems of UAV swarm path planning and task allocation, and an individual encoding rule is designed to guide the population to search the solution space efficiently and generate delivery plans. Finally, simulation environments with different task scales and constraint conditions are built to systematically test and comparatively verify the proposed mechanism. Experimental results show that the proposed mechanism significantly outperforms the IGCPA, AGA and ACO algorithms in energy consumption control, task benefits and convergence speed. It improves delivery efficiency and reduces energy consumption while meeting complex task constraints, showing promising prospects for engineering application.
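    An individual encoding for joint task allocation and path planning can be sketched as a task permutation that is split into per-UAV routes when payload or time-window limits would be violated. This is a common encoding choice assumed for illustration; the paper's actual coding rule, the discrete Zebra Optimization Algorithm updates and the energy model are not reproduced.

```python
import math


def decode(perm, tasks, capacity, depot=(0.0, 0.0), speed=1.0):
    """Split a task permutation into per-UAV routes.

    tasks[j] = ((x, y), demand, (earliest, latest)). Tasks are taken in permutation
    order; a new UAV route starts when the payload limit or the time window would be
    violated. A task infeasible even from the depot still gets its own route, so the
    fitness function must penalise it.
    """
    routes, route, load, clock, pos = [], [], 0.0, 0.0, depot
    for j in perm:
        loc, demand, (earliest, latest) = tasks[j]
        arrive = clock + math.dist(pos, loc) / speed
        if route and (load + demand > capacity or max(arrive, earliest) > latest):
            routes.append(route)                    # close this UAV's route
            route, load, clock, pos = [], 0.0, 0.0, depot
            arrive = math.dist(pos, loc) / speed
        route.append(j)
        load += demand
        clock = max(arrive, earliest)               # wait if the UAV arrives early
        pos = loc
    if route:
        routes.append(route)
    return routes


tasks = [((1.0, 0.0), 2, (0, 100)), ((2.0, 0.0), 2, (0, 100)), ((3.0, 0.0), 2, (0, 100))]
print(decode([0, 1, 2], tasks, capacity=4))  # [[0, 1], [2]]
```

    With this encoding, any permutation decodes to a valid allocation, so discrete search operators (swaps, insertions) only need to act on the permutation.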