
Just accepted

  • Guozheng Yang, Dongzhen Qi, Pan Chen, Zhaobin Shen, Pengyu Yin, Yanlin Huo
    Accepted: 2025-10-27
    Resource Public Key Infrastructure (RPKI) is an important mechanism for safeguarding BGP routing security, verifying the legitimacy of BGP announcements through Route Origin Authorization (ROA) and Route Origin Validation (ROV). As RPKI deployment continues to advance globally, its deployment status and actual defensive effect have become a focus of research. In recent years, researchers have carried out a great deal of research on ROA configuration problems and ROV deployment measurement, portraying the operational status and protection capability of RPKI in real networks from different dimensions. Existing RPKI-related surveys mainly focus on theoretical research on the RPKI system itself, emphasizing its architectural vulnerabilities, without systematically organizing or deeply summarizing the key challenges encountered in actual RPKI deployment and the studies addressing them. This review systematically summarizes recent studies on deployment issues of the RPKI system. It classifies common types of ROA configuration errors, including benign ROA conflicts and loose ROA registrations, and provides a systematic analysis of their causes and their impact on routing security. Finally, the review outlines future research directions in RPKI deployment, providing a theoretical foundation and methodological reference for subsequent work on RPKI deployment optimization, security assessment, and strategy research. This will help promote the widespread adoption of RPKI and strengthen defenses against BGP prefix hijacking.
  • Liu Meigui, Zhang Neng, Li Jiale, Zhao Yuqi, Li Zengyang
    Accepted: 2025-10-27
    Redundant dependencies in software projects can lead to increased build size, performance overhead, and a long-term maintenance burden. Although existing studies have investigated redundant dependencies in the Maven ecosystem, there remains a lack of analysis of their distribution across dependency scopes (e.g., compile and test), their evolutionary patterns, and their relationship to project popularity. To address this gap, we select 2,214 Java Maven open-source projects from GitHub as study subjects. We employ a Maven command to identify dependencies that are declared but not actually used, and conduct a quantitative analysis of redundancy ratios by scope. Furthermore, we apply the Mann-Kendall non-parametric trend test to 3,817 historical versions from 698 projects to identify trends in the evolution of redundant dependencies. To assess the relationship between redundant dependencies and project popularity or community activity, we construct five GitHub-based popularity and activity metrics, including star growth rate, fork growth rate, and issue closing rate, and perform Pearson correlation analysis. Experimental results show that redundant dependencies are primarily concentrated in the compile and test scopes, with median redundancy ratios of 33.33% and 30.00%, respectively. In terms of evolutionary trends, 48.1% of the projects maintained a stable redundancy ratio, 36.2% exhibited fluctuations, and a small proportion showed an increasing or decreasing trend. In the correlation analysis, only the issue closing rate shows a significant, weakly negative correlation with the redundancy ratio. These findings provide developers with a detailed perspective on dependency management and can help optimize project configurations and improve software maintainability.
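The Mann-Kendall trend test applied to the version histories can be sketched in a few lines of Python (a minimal illustration of the statistic, not the authors' tooling; the example redundancy-ratio series is hypothetical, and the variance formula omits the tie correction):

```python
import math

def mann_kendall(series):
    """Minimal Mann-Kendall trend test: returns the S statistic,
    the normalized Z score, and a trend label at the 5% level."""
    n = len(series)
    # S counts concordant minus discordant pairs across all (i, j), i < j
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    # Variance of S under the null hypothesis (no ties correction here)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    if abs(z) < 1.96:          # 1.96 ≈ two-sided 5% critical value
        trend = "stable/fluctuating"
    else:
        trend = "increasing" if z > 0 else "decreasing"
    return s, z, trend

# A monotonically rising redundancy ratio is flagged as increasing
ratios = [0.10, 0.12, 0.15, 0.18, 0.22, 0.25, 0.30, 0.33, 0.36, 0.40]
print(mann_kendall(ratios)[2])  # → increasing
```

A series with no monotone tendency falls into the "stable/fluctuating" bucket, matching the paper's three-way classification of project histories.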
  • GAO Song, GAO Bo-lin, LU Jian, WU Yue-long, WANG He, XU Yue-yun
    Accepted: 2025-10-27
    Quantifying the discrepancy between different sensor perception algorithms' mappings of the physical world and identifying boundary data is a key challenge in automating the extraction of high-value boundary data. This paper proposes a discrepancy engine based on multi-source sensor data for the autonomous discovery of boundary data. The engine consists of two main modules: a discrepancy cognition module and a discrepancy rate calculation module. In the discrepancy cognition module, a discrepancy rate is defined and an association model linking the discrepancy rate with perception mapping discrepancies is established. The average discrepancy rate of a dataset is used as the baseline discrepancy rate to quantify mapping discrepancies and identify boundary data; in the experiments, the baseline discrepancy rates of the LiDAR, millimeter-wave radar, and vision-based perception algorithms were calculated as 0.17, 0.23, and 0.19, respectively. In the discrepancy rate calculation module, a 2D pixel distance matching strategy combining the chi-square distribution and the Welsh loss is used to match camera-detected objects with those detected by LiDAR, millimeter-wave radar, and other cameras. Compared to a fusion algorithm using only a 3D distance matching strategy, the proposed approach achieved discrepancy rates of 0.16 and 0.14 relative to the ground truth on the test dataset, demonstrating that the improved matching strategy significantly enhances the accuracy of the fusion algorithm. The results indicate that the discrepancy engine achieves average recognition accuracies of 0.85, 0.74, and 0.82 for the boundary data of the LiDAR, millimeter-wave radar, and vision-based perception algorithms, respectively. Validation in real-world road scenarios, including straight urban roads, simple intersections, and complex intersections, confirms the engine's effectiveness in identifying perception boundary data.
  • Yu Chengwen, Xie Bin, Zhou BoBo, Li Xiang
    Accepted: 2025-10-27
    Extremely Large-scale Multiple-Input Multiple-Output (XL-MIMO) systems are considered one of the key technologies for realizing 6G communications. However, due to the significant increase in the number of antennas in XL-MIMO systems, the channel exhibits hybrid-field characteristics, posing a great challenge to channel estimation. To address this problem, this paper proposes a deep learning-based Adaptive Frequency Filter Parallel Joint Convolutional Network (AFF-PJCN) channel estimation algorithm. First, the received signal is processed by the adaptive frequency filter network, whose learnable filters automatically optimize the filtering parameters according to the input data, enabling adaptive signal analysis and modeling in the frequency domain and effectively filtering out noise interference. Then, the multi-scale convolutional operations of the parallel joint convolutional network capture the global and local features of the received signal, further enhancing channel estimation performance. To enhance the generalization ability of the model, a segmented hybrid data training strategy is adopted: the training set is constructed by independent random sampling within different signal-to-noise ratio intervals, ensuring that the model maintains robust performance under diverse channel conditions. Experimental results show that, in the hybrid-field channel model of XL-MIMO systems, the proposed AFF-PJCN algorithm not only achieves superior estimation accuracy but also demonstrates stronger generalization and robustness than existing channel estimation schemes.
  • FAN Zhengwei, CHANG Daofang, MAN Xingyu, WANG Chongwen
    Accepted: 2025-10-21
    X-ray inspection, as an intuitive means of nondestructive testing (NDT) of pipeline weld defects, plays a key role in preventing pipeline safety accidents. However, it remains challenging to accurately identify tiny defects in low-grayscale, low-contrast, dark-toned X-ray images. Therefore, an innovative method is proposed to optimize the display of X-ray images of pipe welds under low-light conditions and thereby improve defect detection accuracy. First, an improved Retinex-Net framework is introduced, with attention-mechanism residual blocks added to the network to restore illumination and enhance detail in low-light X-ray images, suppress noise and artifacts, and output natural enhanced images without obvious distortion, providing high-quality input for subsequent detection. Second, a weld positioning and feature extraction algorithm based on a drift Gaussian algorithm is designed, which adaptively tracks irregular long welds and automatically crops the weld area, significantly reducing background interference and improving processing efficiency. Finally, the weld defect detection algorithm based on cross-layer feature fusion is optimized: a feature codec architecture based on the RSU module is constructed, and an attention mechanism is integrated in the feature extraction stage to strengthen cross-layer multi-scale feature fusion, improving detection accuracy and reducing the missed detection rate. The results show that the proposed method significantly improves performance indicators on the public GDXray dataset: it not only effectively enhances image quality but also achieves highly automated, fast-response weld defect detection, demonstrating its efficiency and accuracy in practical application scenarios.
  • ZHANG Bin, LI Run-hao, FENG Chao
    Accepted: 2025-10-20
    Automatic heap memory layout manipulation is the core technology for generating exploit code for software memory corruption vulnerabilities; its goal is to construct the memory layout conditions necessary for vulnerability exploitation by precisely controlling the allocation state of heap memory. However, existing automatic memory layout manipulation methods based on search and solving exhibit significant efficiency limitations. To address these challenges, this paper proposes a Large Language Model (LLM)-based approach for automatic memory layout manipulation. The method first leverages LLMs to learn automatically from the target heap manager's public documentation, source code comments, and analysis materials, acquiring the allocator's operational mechanisms and key characteristics. Building on this foundation, the approach employs the powerful reasoning and feedback-driven thinking capabilities of LLMs in an iterative "plan-verify-replan" layout strategy: by continuously incorporating feedback from debugger execution results to refine the layout plan, it ultimately achieves automated memory layout. Experimental validation demonstrates that the solution achieves precise memory layout in 12 real-world Linux user-space vulnerabilities and attains a 94.54% layout success rate on a benchmark comprising 3,735 test samples across six different heap managers. Compared to the search-based Gollum system, it improves layout manipulation speed by 2.33 times. Relative to the solving-based MAZE and BAGUA systems, it reduces heap allocator behavior learning time from weeks to an average of 7.3 minutes without significantly compromising layout speed. These results verify that the proposed solution balances efficiency and scalability, offering a new technical paradigm for LLM-based research on automated vulnerability exploitation.
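The "plan-verify-replan" loop can be sketched generically in Python; the toy planner and verifier below are hypothetical stand-ins for the LLM and the debugger in the real system:

```python
def plan_verify_replan(plan_fn, verify_fn, max_rounds=5):
    """Generic 'plan-verify-replan' loop: plan_fn proposes a candidate
    layout, verify_fn returns (ok, feedback), and the feedback is fed
    back into the next planning round until verification succeeds."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        candidate = plan_fn(feedback)        # LLM call in the real system
        ok, feedback = verify_fn(candidate)  # debugger check in the real system
        if ok:
            return candidate, round_no
    return None, max_rounds

# Toy stand-ins: the "planner" guesses an allocation count, and the
# "verifier" hints whether to go up or down until the target is hit.
TARGET = 7

def toy_plan(feedback):
    if feedback is None:
        return 1                             # initial plan
    guess, hint = feedback
    return guess + 1 if hint == "too few" else guess - 1

def toy_verify(guess):
    if guess == TARGET:
        return True, None
    return False, (guess, "too few" if guess < TARGET else "too many")

layout, rounds = plan_verify_replan(toy_plan, toy_verify, max_rounds=10)
```

The loop structure, not the toy logic, is the point: each verification failure produces structured feedback that narrows the next plan.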
  • Bojia Chen, Tingnian He, Lianjie Zhang, Shu'an Chen
    Accepted: 2025-10-20
    Cross-domain recommendation systems are widely applied in e-commerce and content platforms. Although the dual-target cross-domain recommendation (DTCDR) proposed in recent years has achieved a breakthrough in simultaneously improving the performance of both domains, it still faces two major challenges: 1) the generated user-item representations lack sufficient correlation and diversity; 2) the semantic noise mixed in the shared preferences leads to negative transfer problems. To address these issues, a dual-target cross-domain recommendation model based on heterogeneous graph and hierarchical preference disentanglement (HGPD-DTCDR) is proposed. Its core innovations include: 1) a heterogeneous graph collaborative learning framework is proposed to integrate user-item interactions, user social networks, and item attribute similarities, constructing a multi-relation heterogeneous graph, and generating high-order semantic representations through a relation graph convolutional network (R-GCN) to enhance the diversity and correlation of the representations; 2) a two-stage decoupling process is designed, first separating domain-specific and shared preferences through a variational graph encoder, and then introducing a semantic filtering network to optimize the quality of shared preferences. Experiments on five real cross-domain datasets show that the performance improvement of this model stems from the synergistic effect of heterogeneous graph modeling and hierarchical decoupling mechanisms. Compared with the best baseline, it achieves average improvements of 3.55%, 7.27%, and 15.57% in hit rate, normalized discounted cumulative gain, and mean reciprocal rank, respectively. In data-sparse scenarios, the performance improvement is even more significant, with an average gain of 10.35%. Ablation studies further verify the effectiveness of each technical component and their synergistic effects.
  • Xu Haoyu, Zhang Jing, Zhang Jiamin
    Accepted: 2025-10-20
    To address the challenges of small target scale, complex backgrounds, and insufficient feature representation in the detection of potential hazards on high-voltage overhead transmission lines, this paper proposes an improved lightweight real-time detection model, LG-DETR. First, a lightweight backbone network, ResNet-WT, is designed by introducing wavelet transform convolution to enhance multi-scale feature extraction while reducing computational complexity. Meanwhile, a frequency-separated self-attention mechanism is adopted in the feature fusion stage to improve the feature interaction module HL-AIFI, thereby mitigating background interference. Then, a cross-level multi-scale information aggregation feature pyramid network, CMIAFPN, is proposed to optimize feature transmission paths, combined with a gating module to improve feature retention efficiency and prevent detail loss in high-level features. Furthermore, by incorporating the scaling factor of Focal Loss into Wise-IoU, a novel Focal-WIoU loss function is developed to dynamically adjust the weighting of hard and easy samples, thereby enhancing the detection accuracy of small targets. Experimental results demonstrate that LG-DETR achieves a 6.94 percentage point improvement and a 23.9% reduction in parameters on a high-voltage overhead transmission line hazard dataset, verifying the effectiveness of the proposed improvements.
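The idea of focal modulation of an IoU loss can be illustrated with a simplified sketch (a stand-in only: it omits Wise-IoU's distance-based penalty and dynamic focusing, and shows just how a focal-style exponent re-weights hard, low-IoU samples):

```python
def iou(box_a, box_b):
    """Axis-aligned IoU for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def focal_iou_loss(box_pred, box_gt, gamma=0.5):
    """Focal-style modulation of an IoU loss: the (1 - IoU)^gamma factor
    up-weights hard (low-IoU) samples relative to easy ones. This is a
    simplified illustration, not the paper's exact Focal-WIoU."""
    i = iou(box_pred, box_gt)
    return (1.0 - i) ** gamma * (1.0 - i)

# A perfectly matched box incurs zero loss; a fully disjoint box incurs loss 1
```

Relative to a plain 1 − IoU loss, the extra exponent flattens the penalty for near-perfect boxes while keeping it steep for poor ones, which is the hard/easy re-weighting effect the abstract describes.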
  • Wang Ruixuan, Li Yan, Zhong Jinghua, Yao Dengfeng, Xu Cheng, Ren Tianyu
    Accepted: 2025-10-17
    Chinese Braille is a script used by people with visual impairment in China and is an important part of the National Commonly-Used Language and Script. Although some methods have been developed for the automatic translation of Chinese text into Braille, shortcomings remain. Braille word segmentation is a crucial step in Chinese-Braille translation that strongly affects the final translation result, and it is also an important task in Braille informationization research. Although pre-trained models have been widely used in Chinese natural language processing, they are currently less common in Braille informationization. Braille and Chinese characters are expressions of the same language in different writing systems, with similarities and transferability between the two, so pre-trained models have great potential in this field. This paper introduces the BERT pre-trained model into the Braille word segmentation task. We use BERT to extract feature vectors and decode them with a CRF, combined with the whole-word masking strategy, implementing a word segmentation model of encoder-decoder structure, BERT-CRF-wwm. To address the issue that the BERT model's original Chinese word segmentation information may interfere with Braille word segmentation, a new Braille embedding is concatenated at the embedding layer, yielding the BeBERT-CRF-wwm model. On the Chinese-Braille Corpus, it ultimately achieves a precision of 98.80% and a recall of 98.71%. Compared with existing Braille word segmentation methods, it achieves better results across evaluation metrics.
  • Huang Yinglai, Xiong Xueshan, Wan Langyi, He Yang, Yang Liusong
    Computer Engineering (计算机工程)
    Accepted: 2025-10-17
    Accurate classification of brain tumors is essential in medical imaging diagnosis. However, conventional approaches that rely heavily on expert experience suffer from low efficiency, while existing deep learning approaches struggle to model long-range dependencies and to balance global modeling with local feature extraction, resulting in suboptimal recognition accuracy. To address these issues, a Hierarchical Collaborative Residual Transformer Network (HCR-TNet) is proposed. First, a Conv-Pool-Transformer Composite Block (CPT-Block) is introduced to enhance local feature extraction and cross-level contextual modeling, thereby improving the representation of heterogeneous tumor regions. Second, a High-frequency Feature Extraction (HFFE) module is incorporated to better capture textural details at tumor boundaries and subtle lesion characteristics while effectively suppressing noise. Finally, a Multi-Scale Residual Block (MSRB) is designed to perform residual fusion with the CPT-Block, enabling cross-scale feature optimization from macro to micro structures. Experimental results on a public brain tumor MRI dataset show that the proposed method achieves a classification accuracy of 98.26%, a Kappa coefficient of 97.52%, and an MCC score of 97.52%. Compared to the ViT model, accuracy is improved by 1.48% and the Kappa coefficient by 2.08%. Ablation studies and comparative experiments confirm the effectiveness of HCR-TNet in brain tumor classification tasks, providing valuable methods and ideas for medical image analysis and automatic diagnosis systems.
  • Lin Hai, Yu Guo, Yin Zeming, Xu Xianchong, Liu Yuhai
    Accepted: 2025-10-17
    In long-context, high-concurrency scenarios, large language models (LLMs) encounter significant challenges during inference: the quadratic growth of the memory footprint caused by the key-value (KV) cache in self-attention mechanisms leads to excessive GPU memory consumption and limited throughput. Although KV cache sparsification methods have been proposed to address this issue, existing approaches still suffer from deficiencies in memory footprint, sliding-window design complexity, and computation and memory access overhead. This paper proposes DoubleSparse++, a triple-optimization framework that addresses these limitations through three techniques: (1) a ring buffer-based sliding window decouples KV cache size from text length while reducing buffer update complexity from O(L) to O(1); (2) an exponential-decay sparsity equilibrium strategy dynamically allocates token sparsity according to layer index, achieving progressive sparsification across layers; (3) an optimized sparse inference kernel implements operator fusion and asynchronous device stream pipelines, overlapping computation and memory access in long-context inference and significantly increasing computational intensity while reducing memory access frequency. Experimental validation on domestic accelerators with mainstream LLMs (including OPT-6.7B, Vicuna-7B-v1.5, LLaMA-2-7B, LLaMA-3.1-8B, and Qwen-2.5-7B) demonstrates that DoubleSparse++ achieves a 1.31X inference speedup over DoubleSparse while reducing the memory footprint to 0.72X of the baseline for 4K token generation tasks; in 13K token scenarios, the memory footprint drops further to 0.56X of the baseline. Comprehensive performance analysis confirms that DoubleSparse++ is an efficient KV cache sparsification method with strong applicability for LLM long-context inference and streaming deployment.
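The O(1) ring-buffer sliding window can be illustrated with a small sketch (toy integers stand in for real KV tensors; the class and its names are hypothetical):

```python
class RingKVWindow:
    """Fixed-capacity sliding window over per-token KV entries.
    Appending overwrites the oldest slot in place, so each update is
    O(1) regardless of how long the generated text grows — unlike a
    list-shifting window, whose update cost is O(L) in window length."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = [None] * capacity
        self.head = 0          # next slot to write
        self.size = 0

    def append(self, kv):
        self.buf[self.head] = kv               # O(1): no shifting, no realloc
        self.head = (self.head + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def window(self):
        """Return the retained entries, oldest to newest."""
        if self.size < self.capacity:
            return self.buf[:self.size]
        return self.buf[self.head:] + self.buf[:self.head]

w = RingKVWindow(4)
for token_kv in range(6):      # pretend each int is one token's KV pair
    w.append(token_kv)
# w.window() now holds only the last 4 entries: [2, 3, 4, 5]
```

The buffer size is fixed at construction, which is exactly what decouples the KV cache footprint from generated-text length.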
  • Li Shiyou, Lian Demeng, Zhou Xin, Han Mengzhi
    Accepted: 2025-10-17
    The CUDA sparse matrix template library (CUTLASS-Sparse) within the CUDA linear algebra template library (CUTLASS) is used to build customizable, high-performance sparse matrix-dense matrix multiplication (SpMM) kernels, which play an important role in fields such as scientific computing and deep learning. However, it is implemented and optimized only for NVIDIA GPUs and cannot be applied to domestic accelerators. To solve this problem, a scheme for porting and optimizing CUTLASS-Sparse on domestic accelerators is proposed. In the porting stage, the data access, data computation, and data write-back modules are adapted to the hardware architecture of domestic accelerators. In the optimization stage, two shared-memory data reordering algorithms, a data pipeline strategy based on data prefetching and register double buffering, and a data write-back strategy based on data aggregation are proposed to address high conflict rates on shared-memory physical storage units (banks), low shared-memory bandwidth utilization, low data pipeline parallelism, and low data write-back efficiency. Experimental results show that all three optimization methods significantly improve the performance of the ported CUTLASS-Sparse. For the TF32 and FP16 data types, the overall performance of the optimized CUTLASS-Sparse increases by an average of 30% and 115% over the unoptimized version, respectively, reaching an average of 76% and 60% of the performance of CUTLASS-Sparse on the NVIDIA L20 GPU. Across two hardware versions, the performance of the ported and optimized CUTLASS-Sparse is on average 2.36 times and 3.09 times that of the SPARSE math library on domestic accelerator platforms, respectively. These results verify the effectiveness of the porting and optimization scheme.
  • Yue Minghui, He Yuxuan, Ren Yuanxin, ZHANG Liye
    Accepted: 2025-10-16
    Video understanding tasks face two major challenges: insufficient computational resources and scarce video datasets. Current video models are massive and computationally intensive, relying on expensive equipment and lengthy training periods, and dataset scarcity prevents models from training and generalizing adequately. To address these problems, an efficient transfer learning method is introduced: the adapter training strategy. By freezing all weights of the pre-trained Vision Transformer (ViT) model and fine-tuning only the parameters in the adapter, resource consumption can be significantly reduced while fully retaining the representational advantages of the pre-trained model. Based on this strategy, a hierarchical adapter and a ViT backbone network are designed to jointly construct the Video ViT Adapter (VVA) model. The hierarchical adapter employs three spatiotemporal convolutions of different dimensions, helping to balance spatiotemporal relationships between details and the global context. Additionally, the Contrastive Language-Image Pre-training (CLIP) model, which possesses strong cross-modal learning capabilities, is introduced as the pre-trained model, providing the VVA model with rich feature representations and facilitating effective fusion across data modalities. VVA achieves excellent results on three standard action recognition datasets with only 9.50M training parameters: accuracies of 79.32% on Kinetics-400, 97.77% on UCF101, and 81.78% on HMDB51. This performance fully demonstrates that the adapter's efficiency and convenience can effectively address the challenges above.
  • DING Lin, YANG Yang, GUO Caili, GUO JianZhang, LI Zheng
    Accepted: 2025-10-16
    The text-to-SQL task aims to automatically convert natural language queries into Structured Query Language (SQL), serving as a key technology for enabling non-technical users to access databases efficiently and thereby significantly improving data utilization. To address the challenge that large language models insufficiently understand database schema information in prompts for text-to-SQL tasks, this paper proposes a fine-tuning method for large language models based on table creation information. Existing approaches often rely on complex, lengthy prompt templates or extensive fine-tuning data, and face two major bottlenecks: (1) including complete prompt content in the templates dilutes the few critical cues, leading to attention dispersion in long-context understanding and consequently reducing inference performance; (2) tens of thousands of samples must be manually collected and processed for large-scale fine-tuning before the model achieves stable comprehension of text-to-SQL tasks. To mitigate these issues, we propose a hybrid text-to-SQL generation strategy that integrates prompt engineering with fine-tuning. The method selects semantically relevant table creation information based on question similarity and combines it with concise prompt templates to construct a lightweight, manually curated fine-tuning dataset. Through supervised fine-tuning, the dataset guides large language models to better comprehend table schema information in prompts, enhancing their ability to capture relationships between tables and queries and thereby generating more accurate SQL statements. Experimental results demonstrate that the proposed method effectively reduces the model's reliance on extraneous information in prompt templates and mitigates attention dispersion during reasoning. The generated SQL queries achieve an execution accuracy of 83.37%, a 0.49 percentage point improvement over the baseline approach.
  • He Guangcheng, Li Deshi
    Accepted: 2025-10-16
    With the development of the industrial Internet, the traditional best-effort forwarding mode can no longer meet the needs of deterministic-delay communication, and the IEEE 802.1 working group has proposed the Cyclic Queuing and Forwarding (CQF) mechanism to achieve deterministic transmission. However, due to fixed-granularity slot forwarding, CQF suffers from problems such as excessive resource occupation and a limited deterministic delay range. Therefore, for time-triggered traffic scheduling with strict latency requirements, a hierarchical cyclic queuing and forwarding mechanism is proposed to reduce time-triggered traffic delay and resource occupation through fast forwarding. An optimization model maximizing network throughput is constructed to determine the forwarding mode and injection time slot of the flows. Because the problem is NP-hard, a heuristic priority iterative incremental scheduling algorithm is proposed, which uses traffic clustering, priority order updating, and incremental scheduling to handle large-scale deterministic traffic. Experimental results show that, compared with the CQF mechanism, the proposed mechanism's scheduling ability is enhanced, the lower bound of deterministic delay is halved, and resource occupation decreases by 25.77% on average. In multiple experiments spanning various topologies and different traffic characteristics and scales, the proposed algorithm outperforms the comparison schemes in network throughput, with average improvements of 3.52%, 2.04%, and 51.77% over the Tabu Search, IRFS, and Naive schemes, respectively.
  • Yang Hongju, Liu Na, Li Yao, Cao Fuyuan
    Accepted: 2025-10-16
    Sketch-guided image inpainting holds significant application value in photo restoration and creative editing but faces dual challenges of scarce user sketch data and restoration distortion caused by geometric deviations. Existing methods rely on edge detection to generate pseudo-sketches while neglecting user-drawn deviations (e.g., hand tremors, stroke breaks), leading to structural misalignment and detail blurring in complex scenes. To address these challenges, this study proposes an innovative framework combining a deformable sketch generation network with dual-stage guided inpainting. First, a deformable sketch generation network is constructed to model typical hand-drawn deviations, generating a large-scale sketch-image paired dataset with realistic geometric deformation features, effectively alleviating data scarcity. Second, a two-stage inpainting framework is designed: the first stage corrects geometric misalignment and repairs structural breaks in input sketches to optimize the sketches, while the second stage effectively integrates the optimized sketch information into the inpainting network to achieve collaborative optimization of global structural constraints and local texture generation. Experiments on benchmark datasets validate the method's effectiveness, achieving a peak signal-to-noise ratio (PSNR) of 25.78 dB and a structural similarity index (SSIM) of 0.852 on the CelebA-HQ dataset. The results fully demonstrate that this method effectively addresses the challenges of scarce user sketch data and geometric deviations while significantly improving the structural accuracy and perceptual quality of sketch-guided image inpainting.
  • SUN Wei, CHEN Jun Jie
    Accepted: 2025-10-13
    Maize is a vital economic crop, widely used in industry, animal husbandry, and grain-oil processing. Timely identification of maize diseases is crucial for ensuring stable yield. Currently, deep learning methods such as Convolutional Neural Networks (CNNs) have been widely applied to disease recognition. However, most existing methods rely solely on image information, overlooking features from other modalities. Moreover, their large parameter sizes and high deployment costs hinder practical applications. To address these challenges, we propose a lightweight image-text multimodal cache model, MF-cache, which contains only 0.061M parameters, achieving both low computational cost and high recognition accuracy. The model leverages the multimodal pre-trained model CLIP to extract image and text features, which are fused in parallel to form a key-value cache structure enriched with domain knowledge. Additionally, a weighted two-stage fusion mechanism is introduced to dynamically adjust the contribution of each modality to the classification outcome, enhancing both stability and interpretability. To improve robustness, various data augmentation strategies are employed to increase sample diversity and mitigate overfitting in low-data scenarios. Experimental results on a self-constructed dataset CornI&T and the public PlantVillage dataset demonstrate the effectiveness of the proposed method, achieving 99.72% and 98.80% accuracy, respectively. These results indicate that the method achieves excellent recognition performance while maintaining low computational overhead, offering an efficient and practical solution for crop disease detection. Furthermore, it highlights the potential of combining multimodal pre-trained models with few-shot learning in intelligent agricultural applications.
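The key-value cache classification idea can be sketched in plain Python (toy features stand in for CLIP embeddings; the exp(α·(sim − 1)) affinity weighting is the common form used by cache-based adapters and is an assumption here, not the paper's exact formula):

```python
import math

def cosine(u, v):
    """Cosine similarity between two plain-list vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def cache_predict(query, cache_keys, cache_values, alpha=1.0):
    """Training-free key-value cache classification: similarities between
    the query feature and cached keys weight the cached one-hot labels,
    and the highest-scoring class wins. Real keys/values would come from
    CLIP image and text encoders; these are toy vectors."""
    scores = [0.0] * len(cache_values[0])
    for key, value in zip(cache_keys, cache_values):
        w = math.exp(alpha * (cosine(query, key) - 1.0))  # affinity weight
        for c, v in enumerate(value):
            scores[c] += w * v
    return scores.index(max(scores))

# Two cached classes with orthogonal toy features and one-hot labels
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1, 0], [0, 1]]
```

A query feature close to the first cached key is assigned class 0, and one close to the second key is assigned class 1; the weighted fusion of the two modalities in the paper would adjust how the keys themselves are built.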
  • JIANG Yuhong, JIANG Qingquan, Zhang Rui, XI Huijuan, WU Jiongtao
    Accepted: 2025-10-13
    In e-commerce platforms, the volume of user click data is increasing rapidly. Accurately modeling the long-term behavior sequences of e-commerce users is crucial for capturing their preferences in recommendation systems. Currently, two-stage Click-Through Rate (CTR) prediction models are widely used to forecast the CTR of users with long behavioral sequences: the first stage employs approximate retrieval to filter subsequences related to the target item from massive historical behaviors, while the second stage performs fine-grained interest modeling on these subsequences. However, two-stage models have two key issues: first, the second stage pays insufficient attention to the trend characteristics of user behavior; second, a cross-stage semantic mismatch causes the second-stage subsequences to fail to fully convey the user's true interest structure. To address these issues, we propose a trend-aware probabilistic attention architecture that captures temporal trends in user behavior and unifies interest representations across stages, significantly improving CTR prediction accuracy for long sequences. Experiments on two real-world e-commerce datasets show that our model outperforms state-of-the-art baselines, achieving up to a 1.14% improvement in AUC and a 4.2% improvement in Logloss. This demonstrates that the model can identify trend characteristics and dynamic preference structures in user behavior, and verifies the optimization value of cross-stage semantic consistency.
  • YANG Chunxia, WANG Xin'ao, WANG Yulong
    Accepted: 2025-10-11
    High-accuracy air pollution prediction is crucial for environmental management and public health protection. To address the issues of spatiotemporal heterogeneity and multi-feature coupling in prediction tasks, this paper proposes a Multi-Decoupled Spatio-Temporal Dynamic Graph Convolutional Network (MD-STDGCN). The model aims to precisely capture the specific temporal patterns of local pollutant emissions and the dynamic interactions of cross-regional pollutant transport. The model first employs a dual-path self-supervised masked pretraining strategy for feature enhancement. The temporal path improves the ability to extract temporal features through local subsequence reconstruction, while the spatial path captures spatial heterogeneity via node sequence reconstruction. This mitigates the issue of representation degradation caused by distribution shift and heterogeneity. Second, the model introduces a multi-level residual decomposition and hierarchical prediction framework to progressively extract global temporal patterns, local spatiotemporal patterns, and short-term disturbances from the spatiotemporal series. The framework integrates channel-independent convolutions and multi-scale causal temporal attention for long-term trend modeling, adaptive weight gating with dynamic graph convolution for directional and lagged transport, and GRUs for short-term fluctuations. Finally, multi-branch predictions are fused with dual-path enhanced representations to achieve end-to-end multi-step forecasting. Experimental results show that MD-STDGCN outperforms all baseline models with significant improvements in prediction accuracy across all datasets: on KnowAir, Yangtze River Delta, and KnowAir_V2, the average MAE is reduced by 7.34%, 1.88%, and 12.57%, and the RMSE by 7.64%, 2.44%, and 11.29%, respectively. By leveraging dual-path feature enhancement, multi-level decoupling, and dynamic graph learning, MD-STDGCN effectively alleviates the impact of feature entanglement and heterogeneity, improving both prediction accuracy and robustness, and can provide reliable support for air quality monitoring and governance decision-making.

  • FENG Guoping, CHEN Zhijian, Lin Zhiyu, HONG Liang
    Accepted: 2025-10-11
    This study explores automatic term recognition in the electric power domain, addressing challenges faced during its digital transformation, such as data silos and underutilization of knowledge. To improve the identification of specialized and new terms, a dynamic graph-assisted method combining large and small models is proposed. The approach enhances recall and precision through candidate term extraction and term classification. An initial knowledge graph is built using existing term databases. Target text-related nodes are queried and filtered with term features. A retrieval-augmented large language model extracts candidate terms, followed by adversarial training to develop a deep learning model for term classification. The dynamic term knowledge graph is iteratively updated based on classification results, forming a positive feedback loop. Experimental results show that the method's accuracy, recall, and F1 score improve over iterations, reaching 0.8647, 0.8565, and 0.8542, respectively, demonstrating superior performance compared to other term recognition methods.
  • LI Guang, ZHOU Yiqiang, GAO Xindan
    Accepted: 2025-09-29
    RGB-T (RGB-Thermal) semantic segmentation is a solution that enables reliable semantic scene understanding under poor lighting conditions or in complete darkness. Thermal imaging captures object infrared radiation features, providing stable edge detection under low-light conditions. This effectively compensates for the loss of texture details in RGB images under such environments. However, existing RGB-T semantic segmentation methods fail to fully utilize effective cross-modal information during multi-level interactions, leading to inaccurate predictions. To address this issue, this work constructs CMFANet (Cross-Modal Fusion Attention Network). First, it designs a cross-modal fusion module to establish complementary relationships between RGB and thermal features. Second, considering the importance of multi-dimensional and multi-scale information, a multi-dimensional attention module is introduced at the encoder to enhance deep feature extraction, while a multi-scale feature aggregation module is added at the decoder to capture texture details and contour information. Finally, the decoder integrates wavelet transforms with convolutional operations to improve segmentation accuracy. On the MFNet dataset, CMFANet achieves 73.8% in mean accuracy (mAcc) and 59.0% in mean intersection-over-union (mIoU). On the PST900 dataset, it attains 90.71% mAcc and 85.15% mIoU. Compared with existing cutting-edge methods, the model performs particularly well on key targets (such as cars, persons and bikes in MFNet, and survivors and backpacks in PST900). Visualization results verify its ability to effectively fuse RGB and thermal imaging modality information, restore texture details and target contours in low-light scenarios, and demonstrate better segmentation performance and strong generalization capabilities.
  • Xu Dai, Zhang Xiuzai, Yang Changjun, Zhong Yang, Guo Lin
    Accepted: 2025-09-29
    Accurate identification of water bodies in plateau lakes from high-resolution remote sensing images is of great significance for regional ecological protection and water resources management. In plateau scenes, the low proportion of water pixels and the easy loss of fine details cause insufficient multi-scale feature fusion and high-frequency detail attenuation, which in turn lead to blurred boundaries, omission of fine water bodies, and mis-segmentation of complex scenes. To address these issues, we propose a two-branch multi-level fusion network based on frequency-domain and spatial-domain synergy, the Wavelet-ResNet-Swin Network (WRS-Net). Adaptive wavelet decomposition extracts the low-frequency contours and high-frequency details of water bodies, while a multi-stage ResNet50, equipped with high-frequency gating units at the end of each stage to enhance texture responses, captures spatial semantic information. A cross-attention fusion module is then designed to jointly optimize multi-scale semantics and details, combined with a feature alignment module that resolves cross-layer feature misalignment; finally, a Swin Transformer performs global context modeling. Experiments on the self-constructed plateau lake dataset show that WRS-Net achieves 96.52% Acc and 93.44% mIoU, outperforming the other comparison networks and improving the recognition accuracy of plateau lake water bodies in remote sensing images.
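The frequency-domain split underlying the wavelet branch can be illustrated with a single-level 2D Haar decomposition: the LL band keeps the low-frequency contour, while the remaining sub-bands keep high-frequency detail. The fixed Haar filter below is an illustrative stand-in for the paper's learned adaptive wavelet decomposition.

```python
import numpy as np

def haar_dwt2(img):
    """Split an (H, W) array with even H, W into (LL, LH, HL, HH) sub-bands."""
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # LL: low-frequency approximation (contour)
    lh = (a - b + c - d) / 4.0   # high-frequency detail sub-bands
    hl = (a + b - c - d) / 4.0
    hh = (a - b - c + d) / 4.0
    return ll, lh, hl, hh

img = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(img)
```

On a perfectly flat image all three detail sub-bands vanish, which is why the detail bands isolate edges such as water boundaries.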
  • LI Jie, LI Linsen
    Accepted: 2025-09-29
    With the development of logistics business, the collaborative delivery of unmanned aerial vehicle (UAV) swarms has become a key solution for cost reduction and efficiency improvement. In response to the demands of traditional delivery services and the constraints of UAVs themselves, a green collaborative delivery mechanism for UAV swarms under time window constraints is proposed. Firstly, a multi-task point delivery scenario is constructed, with parameters such as task time windows, task priorities, UAV payload capacity, and flight attitude-related energy consumption set. A multi-constraint model is established with the optimization goals of maximizing task benefits and minimizing energy consumption. Then, by discretizing the Zebra Optimization Algorithm, it is adapted to the discrete problems of UAV swarm path planning and task allocation. An individual coding rule is designed to guide the population to efficiently search in the solution space and generate delivery plans. Finally, simulation environments are built under different task scales and constraint conditions to systematically test and comparatively verify the proposed mechanism. Experimental results show that the proposed mechanism significantly outperforms IGCPA, AGA, and ACO algorithms in terms of energy consumption control, task benefits, and convergence speed. It can enhance delivery efficiency and reduce energy consumption while meeting complex task constraints, demonstrating promising engineering application prospects.
  • ZHANG Lina, ZHANG Chenyu, WANG Boyi, JIANG Tian, SHEN Tengfei
    Accepted: 2025-09-29
    The global spread of cardiovascular disease has made electrocardiogram (ECG) signal analysis a key tool for clinical diagnosis. However, multi-label classification of ECG signals typically relies on the complete 12 leads, and faces challenges such as insufficient fusion of spatio-temporal features across leads and class imbalance. To this end, an end-to-end deep learning model based on a few leads is proposed. The time-domain features of ECG signals are extracted by a lightweight multi-scale inverted residual feature extraction module, and the temporal dependencies in the signals are captured by a temporal convolutional network and a bidirectional gated recurrent unit, improving the model's ability to represent complex spatio-temporal features. To optimize the feature fusion process, a bidirectional spatio-temporal cross-attention module is designed to adaptively fuse multi-lead spatio-temporal information. To address class imbalance, a dynamically weighted focal loss function is designed that enhances minority-class recognition by dynamically adjusting sample weights. Experimental results on the CPSC-2018 dataset show that the model reaches a mean F1-score of 0.841 using only leads I, II, and V1, with F1-scores of 0.942, 0.906, and 0.951 for atrial fibrillation and left/right bundle branch block, respectively. Results on the PTB-XL dataset are also strong, confirming the model's application potential in resource-constrained environments and providing new ideas for multi-label ECG classification with reduced leads.
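The class-imbalance remedy named here can be sketched with a class-weighted focal loss in numpy. The weighting rule below (inverse positive-class frequency, renormalized per batch) is an assumption standing in for the paper's dynamic weighting scheme, not its exact formula.

```python
import numpy as np

def dynamic_weighted_focal_loss(probs, labels, gamma=2.0, eps=1e-7):
    """probs, labels: (batch, classes); labels are multi-hot {0, 1}."""
    probs = np.clip(probs, eps, 1.0 - eps)
    # Dynamic per-class weights: rarer positive classes get larger weight.
    freq = labels.mean(axis=0) + eps
    w = (1.0 / freq) / (1.0 / freq).sum()
    pt = np.where(labels == 1, probs, 1.0 - probs)    # prob of the true label
    focal = -((1.0 - pt) ** gamma) * np.log(pt)       # down-weight easy samples
    return float((w * focal).mean())

labels = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 1]], float)
good = np.where(labels == 1, 0.9, 0.1)   # confident, correct predictions
bad = np.where(labels == 1, 0.6, 0.4)    # hesitant predictions
```

The `(1 - pt)**gamma` factor shrinks the loss of well-classified samples, so gradient mass concentrates on hard and minority-class examples.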
  • Zhang Dong, Peng Changgen, Tan Weijie, Cai Chuanda
    Accepted: 2025-09-29
    The proposal of searchable encryption provides an effective solution for encrypted search over cloud data, effectively alleviating the problem of limited local storage and computing resources. However, most current schemes rely mainly on keyword frequency statistics or single semantic retrieval and cannot support retrieval tasks that combine keywords and semantics; moreover, most schemes adopt a tree-based storage structure, which is inefficient for retrieval over large-scale datasets. Therefore, this paper proposes an efficient hybrid ciphertext retrieval scheme based on the Milvus vector database and its built-in Hierarchical Navigable Small World (HNSW) data structure. The scheme uses the third-generation general text embedding model (BAAI General Embedding Model v3, BGE-M3) released by the Beijing Academy of Artificial Intelligence to extract high-quality document semantic vectors and keyword vectors, encrypts the original vectors through cryptographic techniques such as AES, the HMAC-based Extract-and-Expand Key Derivation Function (HKDF), and random matrix transformation, and uses the encrypted vectors to construct HNSW indexes stored in the Milvus vector database. During retrieval, the semantic and keyword retrieval results are re-ranked through dynamic weighted fusion, achieving real-time, efficient ciphertext retrieval in large-scale data environments. The scheme also supports dynamic insertion, update, and deletion operations and has good scalability. Experimental results on real datasets show that the proposed scheme improves retrieval efficiency and accuracy while ensuring data security and reducing computational overhead.
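The role of the random matrix transformation can be illustrated with a small sketch: multiplying embeddings by a secret orthogonal matrix hides their coordinates while exactly preserving the inner products and distances that HNSW search depends on. This shows one ingredient only, under the assumption of a seed-derived key; it is not the scheme's full AES/HKDF pipeline.

```python
import numpy as np

def secret_orthogonal(dim, seed):
    """Derive a random orthogonal matrix from a secret seed (QR of a Gaussian)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q

dim, secret_seed = 16, 42            # the seed plays the role of a key here
Q = secret_orthogonal(dim, secret_seed)
rng = np.random.default_rng(0)
x, y = rng.normal(size=dim), rng.normal(size=dim)
ex, ey = Q @ x, Q @ y                # "encrypted" vectors stored in the index
```

Because Q is orthogonal, `ex @ ey == x @ y` up to floating-point error, so nearest-neighbor rankings computed over the transformed vectors match those over the plaintext vectors.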
  • LIU Haonan, ZHOU Gang, LIU Jiangtao, JIA Zhenhong, WANG Jiajia
    Accepted: 2025-09-25
    Population dynamics of various insects during cotton growth directly impact agricultural decisions, making accurate population density data for different insect types a key basis for scientific cotton farming and pest management. In pest detection tasks, although current small-object detection algorithms can effectively detect small insects, they often fail when dealing with larger ones. For this reason, this study proposes MSDSR-YOLO (Multi-scale Dynamic Super-Resolution Reconstruction YOLO), an object detection model that organically combines image super-resolution and dynamic convolution to enhance small-object detection while further optimizing detection performance at other scales. The model designs a new feature-map super-resolution reconstruction network, SMAR-SRNet (Self-Modulated Attention-Residual Super-Resolution Network), and embeds it into the YOLOv11 model together with a P5-to-P3 feature fusion strategy, realizing accurate reconstruction of deep backbone features and cross-layer fusion with the original shallow features, and enhancing both the detection of small-object samples and the capture of local and non-local features. This paper then introduces omni-dimensional dynamic convolution (ODConv) into the backbone and neck of the network and constructs the C3K2-OD module by combining it with the C3K2 block, which improves the model's ability to capture rich contextual cues through omni-dimensional dynamic convolution kernels and enhances the network's robustness to multi-scale insect detection. Finally, this study constructed XJ-CottonPest2024, a yellow-sticky-board cotton field insect dataset from the Xinjiang region containing cotton field insects of seven different scales. Experiments show that the proposed method achieves the best mAP50 values on both the self-built dataset and a public dataset. Comparative analysis of detection at different scales further demonstrates the network's advantage in insect detection dominated by small objects with multiple coexisting scales, which is conducive to its application in smart agriculture.
  • JiaKun LI, YanQing LIU, Fang DU, ZhenHua YU, Yu FENG, Hui WANG, XianHao HUO
    Accepted: 2025-09-25
    To address the challenges faced by general-purpose medical large language models (LLMs) in the field of brain tumor care—namely the scarcity of domain-specific data, limited clinical adaptability, and insufficient accuracy of generated content—this paper proposes BrainTumorLLM, a specialized large language model tailored for brain tumor diagnosis and treatment. Built upon the Meta-Llama-3-8B-Instruct foundation model, BrainTumorLLM is optimized through Supervised Fine-tuning (SFT) and Reinforcement Learning with Human Feedback (RLHF), and trained using a self-constructed, high-quality dataset named BrainTumorQA. This dataset comprises 11,000 question-answer pairs, encompassing both macro-level medical knowledge (symptoms, diagnostic methods, treatment strategies) and micro-level clinical cases, including 1,252 de-identified real-world brain tumor MRI reports, with privacy safeguarded via anonymization and information constraint strategies. From a technical perspective, Low-Rank Adaptation (LoRA) is employed to enhance training efficiency. A two-tier prompting framework is designed to guide the model in generating domain-specific responses at both macro and micro levels. Furthermore, human feedback learning is integrated through an expert preference-driven optimization mechanism and the Proximal Policy Optimization (PPO) algorithm, reinforcing the clinical consistency of the generated content. Experimental results demonstrate that BrainTumorLLM significantly outperforms both general-purpose and medical-domain models on brain tumor-related question answering tasks. In automatic evaluations, it achieves BLEU-1 and BLEU-2 scores of 0.3383 and 0.2684, respectively, and ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.3237, 0.1466, and 0.2611. Moreover, the model’s perplexity is substantially reduced from 20.362 (base model) to 7.674, highlighting its domain-specific precision, professional accuracy, and potential for clinical application. BrainTumorLLM offers a robust AI-powered tool to support brain tumor diagnosis, treatment planning, and medical research.
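The LoRA technique mentioned above has a compact forward form worth noting: the frozen weight W is augmented with a trainable low-rank product B @ A scaled by alpha / r, so only r * (d_in + d_out) parameters are trained instead of d_in * d_out. The dimensions and zero-initialization of B below are standard LoRA conventions, shown as an illustrative numpy sketch rather than the paper's training code.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha, r):
    """x: (d_in,), W: (d_out, d_in) frozen, A: (r, d_in), B: (d_out, r)."""
    return W @ x + (alpha / r) * (B @ (A @ x))

d_in, d_out, r, alpha = 64, 32, 4, 8
rng = np.random.default_rng(1)
W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                 # standard init: B = 0, so training starts at W
x = rng.normal(size=d_in)
```

With B initialized to zero, the adapted model is exactly the base model at step zero, and only 4 * (64 + 32) = 384 parameters are trained versus 2,048 in W.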
  • GENG Yongkang, PANG Chunying, Li Jia, Zhou Weikun, Ma Shengzhe
    Accepted: 2025-09-23
    In recent years, multimodal magnetic resonance imaging (MRI) technology has demonstrated significant advantages in brain disease diagnosis and brain network analysis. However, challenges remain in effectively optimizing and correlating multimodal data such as rs-fMRI (resting-state functional MRI, reflecting functional connectivity) and DTI (diffusion tensor imaging, reflecting white matter fiber structure), while extracting low-dimensional brain network features with strong topological representation capabilities. To address feature optimization and the capture and use of topological information in multimodal (rs-fMRI/DTI) brain network analysis, this paper proposes a joint optimization framework. First, to mitigate feature distribution shifts and modal heterogeneity, we propose a SPAMS-based multimodal dictionary learning data augmentation strategy. By jointly optimizing functional connectivity brain networks and diffusion tensor brain networks, we construct shared sparse dictionaries to generate anatomically and functionally consistent augmented data, thereby improving inter-group similarity and feature quality. Second, to effectively capture complex topological information in brain networks, we introduce a Riemannian manifold-constrained loss autoencoder (RM-Loss AE). This model treats the feature space as a manifold of symmetric positive definite matrices and incorporates reconstruction losses based on the log-Euclidean metric. Comprehensive experiments on the ADNI (Alzheimer's disease) and ABIDE-II (autism spectrum disorder) datasets demonstrate that the proposed method significantly improves key metrics including feature separability (Fisher score), classification performance (AUC), and the coupling strength between the rs-fMRI and DTI modalities, establishing a novel paradigm for multimodal brain network representation learning and advancing its application in precision medicine.
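The log-Euclidean metric invoked above has a compact closed form for symmetric positive definite (SPD) connectivity matrices: d(A, B) = ||logm(A) - logm(B)||_F, with the matrix logarithm computed via eigendecomposition. This minimal sketch illustrates the metric itself, not the RM-Loss AE.

```python
import numpy as np

def spd_logm(S):
    """Matrix logarithm of a symmetric positive definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(np.log(vals)) @ vecs.T

def log_euclidean_dist(A, B):
    """log-Euclidean distance: Frobenius norm of the difference of matrix logs."""
    return float(np.linalg.norm(spd_logm(A) - spd_logm(B), 'fro'))

A = np.array([[2.0, 0.5], [0.5, 1.0]])   # toy SPD connectivity matrix
I = np.eye(2)
```

Unlike the plain Frobenius distance, this metric respects the SPD geometry: for instance d(I, c*I) grows as log(c), not linearly in c.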
  • YAN Ping, YANG Jielong, HUANG Daoyuan, ZHONG Shifeng
    Accepted: 2025-09-19
    Reinforcement learning faces the challenge of reward-function design in robot control, while imitation learning, although it avoids reward engineering, relies on high-cost expert motion data. To this end, this study proposes a zero-motion imitation learning framework for robotic arms based on predictive-collaborative optimization. The method integrates model predictive control (MPC) with maximum a posteriori (MAP) Bayesian correction, achieving precise control of the robotic arm through multi-step action sequence optimization while eliminating reliance on expert action data and manual reward design. The core of the framework is the rolling optimization mechanism of MPC, which minimizes multi-step state errors, dynamically adjusts the action sequence, and enhances robustness to noise and prediction uncertainty. Within this process, the MAP method is introduced into single-step optimization, where each action is corrected through a prior distribution and likelihood, improving the local rationality and efficiency of action optimization. Unlike traditional methods, the framework relies only on expert states rather than expert actions: it generates target states through a prediction model, avoiding the difficulty of collecting expert action data while also mitigating accumulated prediction errors. Experimental results show that the method outperforms existing baseline methods in various robotic-arm simulation tasks, with an average return increase of approximately 45.8% and a prediction-error reduction of approximately 50.7%. It demonstrates higher action-execution accuracy and adaptability to complex environments, and achieves stable control on a real robotic-arm platform, verifying its potential for cross-platform engineering application.
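The rolling (receding-horizon) optimization at the heart of MPC can be sketched with a toy random-shooting controller: at each step, sample candidate action sequences, roll them through a known dynamics model, keep the sequence minimizing the multi-step state error to a target, execute only its first action, and re-plan. The 1-D linear dynamics and sampling budget here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def mpc_step(state, target, dynamics, horizon=3, n_samples=256, rng=None):
    """One receding-horizon step: pick the first action of the best sampled plan."""
    if rng is None:
        rng = np.random.default_rng(0)
    cands = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, state.size))
    best_cost, best_action = np.inf, None
    for seq in cands:
        s, cost = state.copy(), 0.0
        for a in seq:                      # multi-step rollout through the model
            s = dynamics(s, a)
            cost += np.sum((s - target) ** 2)
        if cost < best_cost:
            best_cost, best_action = cost, seq[0]
    return best_action                     # execute only this, then re-plan

dynamics = lambda s, a: 0.9 * s + 0.1 * a  # assumed known linear model
state, target = np.array([1.0]), np.array([0.0])
for _ in range(30):                        # closed loop: plan, act, repeat
    state = dynamics(state, mpc_step(state, target, dynamics))
```

Re-planning at every step is what gives MPC its robustness: prediction errors made at one step are corrected at the next rather than accumulating over the whole horizon.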
  • CHEN Ziliang, ZHONG Yuan, LI Ping
    Accepted: 2025-09-19
    Under the federated learning framework, participants collaboratively train global models by sharing model parameters instead of raw data; while this distributed training approach protects data privacy, it also brings new security challenges. Because distributed local training is difficult to supervise, federated learning systems are particularly vulnerable to model poisoning attacks. Most existing model poisoning attacks operate on all parameters of the model, and such large-scale changes are more easily detected by statistical similarity checks. To further analyze the possible stealthy variants of this attack class, a model poisoning attack method based on sensitive-parameter perturbation in federated learning (FedMSP) is investigated. The method precisely identifies the sensitive parameters that most strongly affect model performance by analyzing gradient changes of the model parameters, and applies perturbations only to these sensitive parameters, improving the locally poisoned models' resistance to detection while degrading overall model performance. In addition, an attack mechanism based on distance and direction invariance is proposed: by keeping the distance and direction of the attack vectors unchanged, it enables the attacker to effectively circumvent existing defense mechanisms and significantly improves the success rate of the poisoning attack. Experimental results show that, for federated prediction models on the Fashion-MNIST and CIFAR-100 datasets without any defense, the attack reduces the models' test accuracy from 99.48% and 61.37% to 14.43% and 8.27%, respectively; with defense mechanisms in place, accuracy rebounds to 15.75% and 10.87%, but remains far below normal levels. Moreover, FedMSP achieves optimal or near-optimal attack effects under multiple secure aggregation algorithms, demonstrating its ability to degrade model performance and slow convergence, and providing new perspectives and challenges for federated learning security research.
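The sensitive-parameter idea can be sketched simply: rank coordinates of the shared update by gradient magnitude and perturb only the top fraction, leaving the bulk of the vector statistically close to a benign update. The specific flip-and-scale rule below is an assumption for illustration, not FedMSP's exact perturbation.

```python
import numpy as np

def poison_sensitive(update, grads, k_ratio=0.05, scale=5.0):
    """Flip and amplify only the most gradient-sensitive coordinates of an update."""
    flat = update.ravel().copy()
    k = max(1, int(k_ratio * flat.size))
    idx = np.argsort(-np.abs(grads.ravel()))[:k]   # most sensitive parameters
    flat[idx] = -scale * flat[idx]                 # targeted perturbation
    return flat.reshape(update.shape)

rng = np.random.default_rng(0)
update = rng.normal(size=1000)     # benign local update
grads = rng.normal(size=1000)      # per-parameter gradient signal
poisoned = poison_sensitive(update, grads)
changed = np.sum(poisoned != update)
```

Touching only 5% of the coordinates keeps aggregate statistics (norm, cosine to the mean) close to benign values, which is exactly what similarity-based defenses measure.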
  • YAN Yan, WANG Long, KOU Xinyu
    Accepted: 2025-09-19
    Addressing the issues of low trajectory utility and inadequate privacy protection in existing trajectory privacy protection methods, this paper proposes a generative adversarial network-based trajectory privacy protection scheme utilizing Peephole LSTM. This scheme designs a generator model that integrates a peephole connection mechanism, enabling each gate unit to adjust adaptively based on real-time cell-state values, thereby more effectively perceiving contextual information and capturing dependencies within trajectory sequences; the discriminator uses a long short-term memory network to determine the authenticity of synthesized trajectories. Through adversarial training between the generator and discriminator, trajectory data that aligns with the original statistical features is generated, reducing the probability of attackers identifying users and thereby enhancing the privacy protection of user trajectory information. Given the multidimensional nature of trajectory generation tasks, a new trajectory loss function is designed to measure the similarity loss between synthetic and real trajectories in terms of spatial, temporal, and point-of-interest category dimensions. Experiments conducted on the real-world semantic trajectory dataset Foursquare NYC, including trajectory-user linking tasks, demonstrate that compared to models such as LSTM-TrajGAN and TCAC-GAN, the synthetic trajectories generated by this approach not only reduce the probability of re-identification but also better preserve the spatial, temporal, and POI category attribute features of the original trajectories. This effectively balances the privacy and utility of trajectory data, ensuring its effectiveness in spatio-temporal analysis and geospatial applications.
  • Li Wei, Li Xiaoling, Liu Ziqiong, Huang Ying
    Accepted: 2025-09-19
    The most critical aspect of constrained multi-objective optimization is balancing the algorithm's diversity and convergence while satisfying the constraints and minimizing the objectives. Existing decomposition-based constrained multi-objective optimization algorithms cannot make good use of infeasible-solution information on problems with complex constrained fronts, and struggle to balance population convergence and diversity. To address this problem, a constraint-decomposition multi-objective evolutionary algorithm based on a reinforcement-learning dual population is proposed. The algorithm uses a reinforcement-learning-based adaptive ε-constraint strategy and a dual-population cooperative information learning strategy to help the population converge to the true constrained front. The former introduces Q-learning into the adaptive selection of ε-constraint handling methods, allowing the population to choose the optimal ε-constraint method according to the real-time evolutionary state, thereby enhancing global search ability and enabling the algorithm to better approximate the true front. The latter balances population convergence and diversity through cooperative information exchange between the two populations and distinct offspring generation and selection strategies, guiding the algorithm to make full use of infeasible-solution information to locate the true constrained front. Finally, the proposed algorithm is compared with six state-of-the-art constrained multi-objective optimization algorithms on 33 test problems and applied to the real-world four-bar truss design problem; the experimental results show that it outperforms the other algorithms on both the theoretical and practical problems.
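The Q-learning layer that selects an ε-constraint handling rule can be sketched as a tiny tabular example: states are coarse evolutionary stages, actions are candidate ε-update rules, and the update Q(s,a) += lr * (r + gamma * max Q(s',·) - Q(s,a)) steers the choice. The 3-state / 3-action setup and toy reward are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, lr=0.1, gamma=0.9):
    """Tabular Q-learning update toward r + gamma * max_a' Q(s', a')."""
    Q[s, a] += lr * (r + gamma * Q[s_next].max() - Q[s, a])

n_states, n_actions = 3, 3                 # evolutionary stages / epsilon rules
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)
for _ in range(500):
    s = int(rng.integers(n_states))
    a = int(rng.integers(n_actions))       # pure exploration for the sketch
    r = 1.0 if a == s else 0.0             # toy reward: rule a suits stage s
    q_update(Q, s, a, r, int(rng.integers(n_states)))
best = Q.argmax(axis=1)                    # learned stage -> rule mapping
```

After enough updates, the greedy policy recovers the stage-appropriate rule for each state, which is the mechanism the algorithm uses to adapt its ε-constraint handling online.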
  • Lide Xue, Mingzheng Wang, Chuqiao Xiao, Hao Yang
    Accepted: 2025-09-19
    With the rapid penetration of distributed energy resources, smart grids face increasingly severe challenges in energy transaction efficiency, practicality, and security. Existing blockchain solutions struggle to meet the requirements of resource-constrained grid edge devices due to excessive storage and computational overhead and limited scalability, particularly under dynamic load scenarios. This paper proposes E-chain, a lightweight and highly scalable blockchain system, to optimize the performance of distributed energy trading systems, reduce node operational costs, and enhance grid resource utilization. E-chain introduces a spatiotemporal-optimization-driven two-layer blockchain architecture that leverages the spatiotemporal aggregation characteristics of energy transactions, and integrates innovative lightweight on-chain data structures with off-chain transaction verification mechanisms. This dual mechanism effectively relieves the burden on the main chain while achieving system-level load balancing and high security for off-chain transactions, making it well suited to edge grid devices. Formal analysis and large-scale prototype experiments show that E-chain achieves grid resource utilization of at least 90% in 10,000-node scenarios, stable transaction confirmation latency on the order of 10 seconds, and node communication and computational costs that are decoupled from network scale and system runtime, remaining near-constant. These results represent significant improvements over existing blockchain protocols for distributed energy trading. Through spatiotemporal decoupling and optimization, E-chain resolves the core tension between the resource constraints of grid edge devices and system scalability, providing an innovative solution for building large-scale dynamic energy networks.
  • Zhou Jieqin, Feng Yixiong, Jin Kebing, Tang Jianhang, Wu Xuanyu, Xiao Xi, Tan Jianrong
    Accepted: 2025-09-18
    UAV edge-computing systems deploy UAVs as mobile edge servers for cost-effective, low-profile services, but uneven user geography and limited onboard resources make placement critical: misdeployment causes coverage holes, inflated cooperative-communication latency, and load imbalance. We pursue an optimal balance among coverage, communication quality, and energy efficiency by integrating dynamic collaborative offloading with hybrid intelligent algorithms that couple discrete placement with continuous offloading. Tasks are intelligently partitioned with dynamic offloading ratios for real-time load balancing. Subject to latency constraints, we jointly optimize deployment, cooperative offloading, and compute/communication allocation within a nonconvex mixed-integer framework. Placement is handled by a hybrid metaheuristic with adaptive mutation/crossover for faster convergence, while offloading/resource control uses an enhanced DDPG (DP-Hybrid) for coordinated decisions. Simulations demonstrate a superior energy–latency trade-off and substantial reductions in overall system cost versus state-of-the-art baselines.
  • Junhan Deng , Bin Wang, Zehua Zhang
    Accepted: 2025-09-18
    In complex intelligent decision-making tasks, domain annotation bias can degrade the quality of model training data, which in turn affects the generalization ability and decision-making performance of the system. Such bias usually stems from two causes: (1) sparsity of expert-labeled data due to scarce expert resources, which limits the performance of traditional supervised learning methods, and (2) heterogeneity of expert knowledge, where differing expert tendencies (including differences in professional background and diversity of risk preferences) trigger decision conflicts. Existing studies have not yet effectively addressed the uncertainty caused by expert labeling sparsity, multiple expert tendencies, and conflicts in expert knowledge fusion. To this end, this paper proposes a multi-expert, multi-perspective approach to the domain annotation bias problem, Decision Making with MoE (DM-MoE), which integrates the Mixture-of-Experts (MoE) strategy with uncertainty reasoning to build a collaborative decision-making framework. The method constructs a multi-agent system through prompt engineering, in which LLMs (including DeepSeek, GPT-4, and Literally One Mind) act as cross-domain experts for different domains and dynamically generate decision annotations according to the experts' real-time tendency changes. A dynamic three-way decision mechanism is used to model the decision information of multi-tendency experts. Finally, a two-stage optimization strategy is designed: for uncertain cases in the deferred (boundary) region, multi-criteria weights are specified via LLM-based AHP (Analytic Hierarchy Process) analysis, combined with the TOPSIS method for iterative multi-criteria optimization. Experiments show that DM-MoE achieves superior accuracy and stability compared with traditional decision-making methods.
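The TOPSIS step used in the second optimization stage is a standard procedure: alternatives are scored by relative closeness to the ideal solution under criteria weights (which the paper derives with LLM-assisted AHP; here the weights are given directly as an illustrative assumption, and all criteria are treated as benefit-type).

```python
import numpy as np

def topsis(X, w):
    """X: (alternatives, criteria), w: criteria weights summing to 1."""
    V = X / np.linalg.norm(X, axis=0) * w          # normalized, weighted matrix
    ideal, anti = V.max(axis=0), V.min(axis=0)     # ideal / anti-ideal points
    d_pos = np.linalg.norm(V - ideal, axis=1)      # distance to ideal
    d_neg = np.linalg.norm(V - anti, axis=1)       # distance to anti-ideal
    return d_neg / (d_pos + d_neg)                 # relative closeness in [0, 1]

X = np.array([[0.9, 0.8, 0.7],
              [0.5, 0.5, 0.5],
              [0.2, 0.3, 0.4]])
w = np.array([0.5, 0.3, 0.2])
scores = topsis(X, w)
ranking = np.argsort(-scores)
```

An alternative that dominates on every criterion coincides with the ideal point and scores exactly 1, while a fully dominated one scores 0; the weights let AHP-derived priorities reshape the ordering in less clear-cut cases.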
  • Xu Xiong, Yang Xinyu , Zhu Xuekang, Du Bo, Su Lei, Tong Bingkui, Lei Zeyu, Zhou Jizhe
    Accepted: 2025-09-18
    In the field of image manipulation localization (IML), the small quantity and poor quality of existing datasets have long posed major issues. A dataset containing various types of manipulation significantly improves the accuracy of IML models. Images on public forums, such as those in online image modification communities, frequently undergo manipulation with diverse techniques, so building a dataset from them would greatly enhance the diversity of manipulation types in the data. However, due to resolution and clarity issues, images sourced from the internet often carry noise, which prevents obtaining clean masks by simply subtracting the manipulated image from the original. This noise is difficult to eliminate, leaving the masks unusable for IML models. Drawing inspiration from the field of change detection, the authors treat the original and manipulated images as temporal changes of the same image and approach data generation as a change detection task. Because of clarity differences between images, traditional change detection models perform poorly; to address this, the authors introduce a super-resolution module and propose the Manipulation Mask Manufacturer (MMM) framework, which enhances the resolution of both original and tampered images to enable better comparison. The framework also transforms the original and tampered images into feature embeddings and combines them, effectively capturing context. Additionally, the authors employ the MMM framework to build the Manipulation Mask Manufacturer Dataset (MMMD), which covers a broad spectrum of manipulation techniques. Through MMM and MMMD, the authors aim to contribute to image forensics and manipulation detection by supplying more realistic manipulation data.
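The naive mask extraction that the abstract says fails under noise can be illustrated with a minimal sketch. The synthetic images and the threshold value are illustrative assumptions, not part of the MMM pipeline:

```python
import numpy as np

def naive_mask(original, manipulated, threshold=30):
    """Naive mask extraction: absolute pixel difference + threshold.

    Works only when both images are clean and perfectly aligned;
    compression noise or resolution mismatch litters the mask with
    spurious pixels, which is the failure mode MMM addresses.
    """
    diff = np.abs(original.astype(int) - manipulated.astype(int))
    return (diff.max(axis=-1) > threshold).astype(np.uint8)

rng = np.random.default_rng(0)
original = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
manipulated = original.copy()
manipulated[20:40, 20:40] = 255          # a genuine 20x20 edit
# Simulate internet-sourced noise on top of the manipulated image.
noisy = np.clip(manipulated.astype(int)
                + rng.normal(0, 25, manipulated.shape), 0, 255).astype(np.uint8)

clean_mask = naive_mask(original, manipulated)
noisy_mask = naive_mask(original, noisy)
# The clean pair recovers the edit; the noisy pair adds false positives.
```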
  • LU Kaiwen, YANG Yating, DONG Rui, MA Bo, WANG Lei, ZHOU Xi, MA Rong
    Accepted: 2025-09-18
    Reinforcement learning methods based on direct preference optimization have shown excellent results in many downstream tasks of large language models. However, when applied directly to machine translation, this approach often suffers from over-optimization due to its global reward maximization strategy: the model focuses excessively on matching the distribution of reference translations, losing the potential for local translation diversity and global optimization. To address these issues, this paper investigates the performance degradation of direct preference optimization in large language model machine translation and proposes a machine translation method based on local preference optimization. The method identifies frequently mistranslated low-frequency phrases through dynamic temperature sampling and reference-free evaluation by the large language model. Furthermore, a preference data construction method that combines global differences with local key differences is introduced. Considering both overall translation quality and local translation diversity, global and token-level local loss functions are proposed. Finally, a two-phase curriculum learning strategy gradually adjusts the model's output preference for low-frequency phrases. The proposed method was validated on the FLORES-200 dataset, using fourteen multilingual translation tasks with complex morphologies. Experimental results show that the proposed method scores 80.7, 89.9, and 30.2 on XCOMET, COMET-22, and BLEU, respectively. Compared with several strong multilingual machine translation baselines, it outperforms the baseline models across all translation directions, confirming its effectiveness.
  • Wang Xuguang, Liu Wangjie, Jiao Qiantian, Zhang Mi
    Accepted: 2025-09-16
    Reliable short-term prediction of photovoltaic (PV) power generation is important for the dispatch and safety of new-energy power systems and for the planning and operation of energy storage systems. However, there is often a time-domain alignment bias between PV power and related meteorological factors, which makes it difficult for a prediction model to learn a stable quantitative relationship between future PV power and historical meteorological factors, resulting in low prediction accuracy. In this paper, a delay embedding model is used to describe the quantitative relationship between future PV power and historical meteorological factors, and the time-domain alignment bias between them is characterized by a delay parameterization. Simulation and real-data experiments show that correcting the alignment bias effectively improves prediction accuracy.
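One simple way to estimate such a time-domain alignment bias, shown here only as an illustration and not as the paper's delay-embedding formulation, is to scan candidate delays and keep the one that maximizes the correlation between power and the lagged meteorological driver:

```python
import numpy as np

def estimate_lag(power, driver, max_lag=12):
    """Return the delay (in samples) of `driver` that best explains `power`.

    Scans candidate lags and picks the one maximizing the Pearson
    correlation between power[t] and driver[t - lag].
    """
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        a = power[lag:] if lag else power
        b = driver[:-lag] if lag else driver
        corr = np.corrcoef(a, b)[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

# Synthetic example: irradiance drives power 3 steps later.
t = np.arange(200)
irradiance = np.sin(2 * np.pi * t / 24)
power = np.roll(irradiance, 3) + 0.05 * np.random.default_rng(1).normal(size=200)
lag = estimate_lag(power, irradiance)
```

The estimated lag would then be used to re-align the inputs before training the forecaster.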
  • Chen Ran, Han Jinyu, He Wenwen, Du Weiwen
    Accepted: 2025-09-16
    Cardiovascular diseases seriously threaten human health, and applying deep learning to ECG analysis can significantly improve diagnostic accuracy. However, existing ECG classification algorithms often lack effective modeling of multi-resolution temporal features and channel coordination. This paper proposes TCC-ResNeXt, a multi-scale temporal and channel-coordinated ECG classification algorithm. The method combines a Period-Adaptive Module (PAM) for extracting complex temporal features with an ECG-ACmix module that adaptively fuses multi-head attention and convolutional features across channels. Experiments on the CPSC-2018, Chapman, and DS-COM datasets show that the proposed approach achieves superior performance, with average F1 scores of 0.798, 0.968, and 0.751, respectively, outperforming methods such as MobileNetV3, MVMSNet, and EcgTransformer in AUC, Recall, and F1. These results confirm the effectiveness of TCC-ResNeXt for automated ECG classification and intelligent cardiovascular disease diagnosis. In addition, the framework demonstrates strong generalization and robustness across datasets, providing a promising direction for practical clinical ECG analysis and real-world deployment.
  • Wang Zhiyuan, Zhang Wei, Guan Bingzheng, Yang Huili
    Accepted: 2025-09-16
    In industrial production, taking the tire molding process as an example, this paper constructs a high-performance question-answering system for private, security-sensitive data in a low-cost environment. Existing RAG methods are ill-suited to such environments: Self-RAG increases computational complexity, while Corrective-RAG leads to excessively long contexts. A Multi-Agent Sequential Collaboration Graph Retrieval-Augmented Generation (MSCG-RAG) method is therefore proposed. Each agent performs a single task and uses structured data as contextual information, which avoids excessively long contexts and reduces the difficulty for large models of understanding the context, ultimately realizing a question-answering service for the tire molding process. MSCG-RAG achieves 75.0%, 75.8%, and 85.7% on the general RAG metrics of Context Relevance, Faithfulness, and Answer Correctness, respectively. When high-performance large models serve as domain experts for scoring, the method scores 7.833, 7.826, and 8.301 under three large language models (DeepSeek-R1, Qwen-plus, and Qwen-turbo), all higher than the Basic Graph RAG (BG-RAG) and Graph-Vector Hybrid RAG (GVH-RAG) methods. Ablation results show that link filtering has the greatest impact on context relevance: removing it reduces context relevance by 18.5 percentage points. The result-correction component mainly affects faithfulness: removing it reduces the faithfulness of generated results by 12.6 percentage points. Base-model replacement experiments show that MSCG-RAG performs stably across different combinations of large models, demonstrating high practicability and feasibility.
  • Zhang Jiaqing, Ma Xiujuan, Ma Fuxiang, Zhou Bin, Yin Jun
    Accepted: 2025-09-09
    Addressing the limitations of traditional Graph Neural Networks (GNNs) in modeling higher-order relationships and multi-way interactions, this paper proposes a novel heterogeneous hypergraph recommendation model, termed HNSGCN, which integrates node similarity associations with a hypergraph attention mechanism. Within this framework, users are abstracted as hyperedges and items as nodes. Leveraging contextual semantic features of both users and items, the model constructs user-user and item-item similarity matrices using cosine similarity and Jaccard similarity coefficients, effectively transforming the conventional dyadic interaction network into a heterogeneous hypernetwork. Building upon this hypergraph structure, the model incorporates hypergraph convolution operations and a hierarchical attention mechanism, enabling adaptive aggregation of structural information across different levels, effectively capturing complex higher-order latent relationships between users and items and significantly enhancing recommendation accuracy. To validate the model's efficacy, comprehensive comparative experiments were conducted on two real-world datasets, Amazon and Yelp-1K. Comparisons against multiple state-of-the-art recommendation baselines demonstrate that the proposed HNSGCN model achieves significantly superior performance on all three evaluation metrics: Recall@K, Precision@K, and NDCG@K. Furthermore, ablation studies confirm that both the node similarity associations and the multi-layer attention aggregation mechanism play crucial roles in the model's performance gains.
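The similarity-matrix construction can be sketched as follows. This is a generic illustration with toy interaction data; the thresholding that turns similarities into hypernetwork connections is omitted:

```python
import numpy as np

def cosine_sim_matrix(features):
    """Pairwise cosine similarity between row vectors."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def jaccard_sim_matrix(interaction):
    """Pairwise Jaccard similarity between rows of a binary matrix."""
    inter = interaction @ interaction.T              # |A intersect B|
    row_sums = interaction.sum(axis=1)
    union = row_sums[:, None] + row_sums[None, :] - inter
    return np.where(union > 0, inter / np.maximum(union, 1), 0.0)

# Toy user-item interactions (3 users x 4 items).
R = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)
user_cos = cosine_sim_matrix(R)
user_jac = jaccard_sim_matrix(R)
# A similarity threshold on these matrices would then decide which
# users (hyperedges) and items (nodes) are linked in the hypernetwork.
```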
  • Huang Keming, Liu Miao
    Accepted: 2025-09-09
    Federated Learning (FL), as a distributed edge training framework, enables model training without centralizing clients' private data, offering significant advantages in data privacy and security. In practice, however, clients not only face communication constraints but, more commonly, suffer performance degradation from inconsistent data distributions (Non-Independent and Identically Distributed, Non-IID). To address this challenge, this paper proposes a Multi-stream Feature-aware Network, FedMFP. Specifically, the method employs a dual-stream feature-decoupling architecture to separately extract global and fine-grained features from clients: the global-stream network uses feature perturber/compensator mechanisms to capture inter-sample correlations from a holistic perspective, while the fine-grained-stream network adopts a multi-stream architecture to extract personalized multi-scale information. Concurrently, distinct loss functions are designed to effectively decouple these two types of features, minimizing mutual interference between them. Extensive experiments demonstrate that FedMFP achieves average test accuracy improvements of 13.27% and 14.41% over nine baseline algorithms on the classic Non-IID benchmarks CIFAR-100 and Tiny-ImageNet, significantly enhancing the model's generalization and robustness under Non-IID data distributions.
  • Yang Yu, Shijie Hu, Kangkang Fan, Wei Guo, Yazhou Hu, Dawei Zhang
    Accepted: 2025-09-09
    Traditional visual perception methods can only capture objects within the line of sight and cannot detect objects occluded by obstacles in the scene. Non-line-of-sight (NLOS) methods instead reconstruct information about occluded objects by analyzing light or electromagnetic signals reflected or projected onto visible relay surfaces. However, after years of research, existing NLOS methods still struggle to capture the faint signal components that have undergone multiple reflections in outdoor environments, which poses considerable challenges for NLOS perception in complex, dynamic, real-world outdoor scenarios. To address this, this paper proposes using cost-effective millimeter-wave radar to detect and track hidden targets in large-scale dynamic scenes; such radar is widely adopted in the automotive industry and supports low-cost mass production. After converting radar point clouds into pseudo-images, the proposed two-stage attention network (TSAN) is applied for detection and tracking of hidden targets. Experiments show that TSAN significantly improves detection performance across multiple categories under various Intersection over Union (IoU) thresholds, achieving a mean average precision (mAP) of 75.62%, a 5.99% improvement over existing results that outperforms current state-of-the-art methods. In addition, a prototype built on the proposed method provides a low-cost solution for NLOS target detection and tracking systems and verifies its effectiveness for real-time NLOS detection and tracking.
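The conversion of radar point clouds into pseudo-images can be approximated by a bird's-eye-view intensity grid. This is a common simplification; the grid extents, cell size, and (x, y, intensity) point format here are assumptions, not taken from the paper:

```python
import numpy as np

def pointcloud_to_pseudo_image(points, x_range=(0, 40), y_range=(-20, 20), cell=0.5):
    """Rasterize (x, y, intensity) radar points into a 2-D grid.

    Each cell stores the maximum intensity of the points falling in it,
    yielding an image-like tensor that a CNN/attention detector can consume.
    """
    h = int((x_range[1] - x_range[0]) / cell)
    w = int((y_range[1] - y_range[0]) / cell)
    img = np.zeros((h, w), dtype=np.float32)
    for x, y, inten in points:
        if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]:
            i = int((x - x_range[0]) / cell)   # row index along range
            j = int((y - y_range[0]) / cell)   # column index across track
            img[i, j] = max(img[i, j], inten)
    return img

# Toy frame: three reflections, one of them outside the grid.
pts = [(10.2, 1.3, 0.8), (10.4, 1.1, 0.5), (55.0, 0.0, 0.9)]
pseudo = pointcloud_to_pseudo_image(pts)
```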
  • LIU Tianquan, LU Cunyue, WANG Xiaolong, LUO Runshu
    Accepted: 2025-09-09
    Underwater image generation technology is a crucial means of filling data gaps in marine exploration, and the authenticity and diversity of generated images directly affect the reliability of subsequent analytical studies. Existing models typically have enormous parameter counts and prolonged training and inference; the generated underwater images suffer from insufficient clarity, with distortions in the structures and edges of image subjects; and inference has not adequately considered the unique optical properties of underwater environments, so the authenticity of generated images remains to be improved. To resolve these issues, this paper proposes UW-ControlNet (Underwater ControlNet), a network built on the ControlNet architecture that fine-tunes the parameters of a pretrained Stable Diffusion model. It combines structural constraints from conditional images with semantic guidance from textual prompts, achieving cross-modally controllable generation of underwater images. A lightweight feature extraction network is introduced to optimize feature extraction from conditional images, improving the model's convergence and inference speed. A correlation matrix-based channel attention module is designed to decouple and recouple the global channel features corresponding to the background and the local channel features corresponding to the subject, optimizing text-image multimodal alignment during generation and enhancing the credibility of the results. A Structure-Semantics Constraint Enhancement Module is constructed to prevent the loss of constraint information caused by downsampling, ensuring structural consistency between generated and conditional images. Experimental results confirm that UW-ControlNet surpasses existing methods in both quantitative metrics and qualitative evaluations, demonstrating significant application potential.
  • Wu Jiang, Li Ziqi, Zhang Yonghong
    Accepted: 2025-09-05
    The joint classification of hyperspectral imagery (HSI) and light detection and ranging (LiDAR) data can fully leverage their complementary advantages in spectral and spatial-structural information, and has become an important research focus in the field of remote sensing. However, due to significant differences in their imaging mechanisms, HSI and LiDAR exhibit a high degree of heterogeneity in terms of data dimensionality and feature distribution, which poses severe challenges for semantic representation and efficient fusion of multimodal data. To address these challenges, we propose a Multi-Scale Hybrid Convolution Mamba Network (MHCMNet) for joint HSI and LiDAR data classification. Specifically, the framework first employs a Multi-Scale Feature Extraction Module (MFEM) to extract spectral, spatial, and elevation features from the two modalities. Subsequently, the parallel Feature Tokenization Module (FTM) transforms the features of both modalities into unified feature tokens. To further enhance the collaborative representation of multimodal features, MHCMNet innovatively introduces a Mamba-based Feature Fusion Module (MFFM), which leverages its powerful long-range dependency modeling capability to achieve deep intra- and inter-modal feature interaction and efficient fusion. Experimental results demonstrate that MHCMNet achieves the highest overall accuracy (OA) of 99.03%, 90.71%, and 91.47% on the Trento, Houston2013, and MUUFL datasets, respectively, while maintaining low model complexity. In addition, ablation studies validate the effectiveness of each module in performance improvement, further confirming the superiority of the proposed method in multi-source remote sensing data classification.
  • ZHANG Wei, ZHENG Hao, ZHU Shiyi, XIAO Yimei, ZENG Xinyao
    Accepted: 2025-09-03
    Course recommendation is crucial for enhancing learners' efficiency and engagement, and modeling learners' learning sequences is a key part of it, because these sequences not only contain learners' dynamic learning interests but also imply the evolutionary patterns of learning behaviors. However, existing methods focus on sequential relationships within sequences and fail to consider the impact of the time interval between courses on the dynamic evolution of learners' interests. In addition, most models characterize learners' behaviors with a single vector, failing to portray the dynamic evolution of their multidimensional learning interests and the associations between different interests, resulting in biased interest modeling. To address these issues, this paper proposes a time interval-enhanced multi-interest dynamic evolution network for course recommendation (TIMIR). The method treats learners' interaction histories as sequences with varying time intervals and designs a spatio-temporal dual self-attention mechanism whose dual-path design differentiates the effects of long- and short-term intervals, capturing the persistence and transfer patterns of evolving interests; it generates multiple learner interest vectors via the dynamic routing mechanism of a capsule network; and it constructs a multi-interest dynamic evolution network that models the temporal evolution of, and the associations among, learners' multiple interests, improving prediction accuracy for long-term learning behaviors and recommendation coverage in complex interest scenarios.
Experimental results on the MOOCCourse dataset demonstrate that TIMIR outperforms other advanced recommendation models by 2.56% on HT@20 and 4.18% on NDCG@20; on the MOOCCube dataset, the two metrics outperform other advanced recommendation models by 1.27% and 1.71%, respectively, validating its effectiveness in enhancing recommendation accuracy.
  • Jin Kexin, Chen Donglin
    Accepted: 2025-09-03
    Existing models often fail to handle trend, seasonality, and nonlinear disturbances simultaneously in seasonal time series forecasting, limiting their adaptability in complex scenarios. To address this issue, this study proposes a novel STL-ARIMA-Prophet-LSTM hybrid model. First, the method applies STL to decompose the original time series into trend, seasonal, and residual components. Then, a dedicated model is used for each component according to its characteristics: the ARIMA model captures linear trends in the trend component, the Prophet model handles periodic patterns and holiday effects in the seasonal component, and the LSTM model captures nonlinear variations in the residual component. Finally, following the strategy adopted during STL decomposition, the predicted results of each component are recombined to obtain the final forecast. The model is evaluated on three real-world datasets: university financial reimbursement volume, e-commerce platform transaction volume, and regional power load. Experimental results show that the hybrid model achieves the best or second-best forecasting performance against five baseline models across all datasets. Further ablation studies confirm that the integration of STL decomposition and multi-model collaboration significantly enhances both accuracy and robustness. These results demonstrate that the STL-ARIMA-Prophet-LSTM hybrid model provides excellent forecasting performance and holds strong potential for seasonal time series prediction tasks.
  • YU Xiaosheng , LI Sheng , LI Songpu
    Accepted: 2025-09-02
    Noise interference and low resolution degrade feature expression, causing loss of key details and degradation of semantic information, which limits model robustness and generalization in complex scenes. To address this problem, a vision-language-model-driven dual-branch anomaly detection network, MSRA-CLIP (Multi-Scale and Residual Attention CLIP), is constructed. First, two parallel branches process the image: the upper branch designs a combined multi-scale attention unit, which balances computational complexity and performance while improving image super-resolution quality; the lower branch uses a residual attention module comprising residual attention and skip connections, whose large number of residual attention blocks and skip connections captures rich global and local features. The image features processed by the two branches are then concatenated. Finally, the processed image features are mapped into the joint embedding space by an image-text multi-level alignment module and compared with the text features to generate anomaly maps. Experiments on five medical anomaly detection datasets (Brain MRI, Liver CT, etc.) demonstrate MSRA-CLIP's superiority over MVFA, with average AUC improvements of 5% in zero-shot anomaly classification, 1.1% in anomaly segmentation, and 0.93% in few-shot classification.
  • KONG Yulong, LIN Suzhen, JIN Zanxia
    Accepted: 2025-09-02
    Video captioning aims to deeply analyze video content and describe it accurately and fluently in natural language. Concepts, corresponding to objects, actions, and attributes in video content, can serve as a medium for video captioning. Although some studies have explored concept-guided video captioning, two main issues remain: limited concept detection accuracy and insufficient concept utilization. To address them, this paper proposes a multimodal video captioning approach guided by global and local concepts (CGMVC) to improve the quality of generated descriptions. First, it extracts multimodal features of videos using different backbone networks and leverages the HMMC model to provide textual information for videos via hierarchical-matching video-to-text retrieval. Then, it uses a multimodal feature fusion and concept detection network to detect concepts precisely. To fully exploit the detected concepts, a concept projection module uncovers the latent themes of videos to guide decoding globally, while a semantic attention module and a cross-attention module optimize decoding locally using the concepts and the videos' multimodal features. By fully utilizing concepts and information from different modalities, more natural and accurate descriptions are generated. Experiments on the MSVD and MSR-VTT datasets show that the CGMVC model achieves CIDEr scores of 111.2% and 64.1%, and BLEU@4 scores of 57.1% and 51.2%, respectively. Comparative and ablation studies demonstrate the superiority of CGMVC over baseline approaches and other state-of-the-art methods.
  • Qi Hui, Zhang SiQi, Shi Ying, Qi XiaoBo
    Accepted: 2025-09-02
    With rapid socio-economic development, resident happiness has emerged as a critical indicator of social progress, and its accurate prediction is essential for policy formulation and social resource allocation. However, existing methods exhibit systematic limitations in cross-group applicability and policy interpretability. To address these challenges, this paper proposes a Feature Interaction-Optimized Dynamic Weighted Ensemble Model (FIO-DWEM) for happiness prediction. First, a feature interaction optimization mechanism generates second-order interaction features through polynomial expansion, combined with correlation filtering and Recursive Feature Elimination (RFE) to extract high-information features. Subsequently, randomized search integrated with leave-one-out cross-validation is employed for hyperparameter tuning of the base models, whose weights are dynamically adjusted based on error ratios and whose probabilistic outputs are integrated through a soft voting mechanism. Experimental results demonstrate superior performance across multiple datasets: FIO-DWEM achieves performance improvements of 0.54%–39.86% on the Somerville dataset and maintains cross-domain validation accuracy ranging from 89.57% to 98.89%. SHAP analysis reveals the impact mechanisms of key features (e.g., availability of urban service information) on happiness, providing interpretable technical support for policy-making and individualized assessment.
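The error-ratio weighting and soft voting described above can be sketched generically. The inverse-error weighting form and the toy probabilities are assumptions for illustration, not the paper's exact formula:

```python
import numpy as np

def dynamic_weights(val_errors):
    """Weight each base model inversely to its validation error ratio."""
    inv = 1.0 / np.asarray(val_errors, dtype=float)
    return inv / inv.sum()

def soft_vote(probas, weights):
    """Weighted average of per-model class-probability matrices."""
    stacked = np.stack(probas)                # (n_models, n_samples, n_classes)
    avg = np.tensordot(weights, stacked, axes=1)
    return avg.argmax(axis=1), avg

# Toy example: three base models, two samples, binary happiness label.
errors = [0.10, 0.20, 0.40]                   # validation error of each model
w = dynamic_weights(errors)
p1 = np.array([[0.9, 0.1], [0.3, 0.7]])
p2 = np.array([[0.6, 0.4], [0.4, 0.6]])
p3 = np.array([[0.2, 0.8], [0.5, 0.5]])
labels, avg = soft_vote([p1, p2, p3], w)
# The best-performing model (lowest error) dominates the vote.
```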