
Just accepted

  • PANG Ruixiang, WAN Li, ZHANG Zhi, WU Lulu, ZHOU Youlong
    Accepted: 2026-01-09
    To address the demand for multi-algorithm adaptability, a reconfigurable application-specific instruction set processor is designed to efficiently support block ciphers and hash functions. The architecture adopts a Very Long Instruction Word (VLIW) structure, combined with symmetric clustered execution units and a cross-cluster register access mechanism, enabling parallel processing of logic operations, shifts, and table lookups. In instruction set design, fused logic–lookup instructions, multi-mode shift instructions, and vector operations are introduced to reduce pipeline stalls and enhance instruction density. The pipeline is organized into three stages (fetch, decode, and execute), while a bypass mechanism is employed to resolve data hazards and shorten the critical path. In algorithm mapping and optimization, block ciphers such as SM4 and AES leverage T-box lookups and four-cluster parallel scheduling, reducing each round to 4 and 7 cycles, respectively; hash functions such as SHA-256 and SM3 utilize multi-mode shift and fused Boolean logic instructions, achieving 8 cycles per round; SHA-3 is mapped through a three-phase strategy that reorganizes its five steps into three pipelined stages, effectively mitigating dependency-induced stalls. For hardware implementation, synthesis is carried out on the Xilinx Kintex-7 FPGA (XC7K325TFFG676-2), consuming 11,105 look-up tables (LUTs), 1,564 flip-flops (FFs), and 25 block RAMs (BRAMs), operating at a frequency of 125 MHz. Under these conditions, the processor achieves throughputs of 125 Mbps for SM4, 228.6 Mbps for AES, 125 Mbps for SHA-256, 125 Mbps for SM3, and 75.6 Mbps for SHA-3. The experimental results demonstrate that this architecture achieves unified acceleration of multiple algorithms with low resource overhead, outperforming general-purpose processor extensions, while offering high flexibility and scalability.
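The reported throughputs for SM4, AES, SHA-256, and SM3 can be cross-checked from the clock frequency and the cycles per round given in the abstract. The sketch below is only a back-of-the-envelope check: the round counts (32 for SM4, 10 for AES with a 128-bit key, 64 for SHA-256 and SM3) are the standard values for those algorithms rather than figures from the abstract, and zero per-block overhead is assumed. SHA-3 is left out because the abstract gives no cycles-per-round figure for it.

```python
# Cross-check of reported throughput from clock rate and cycles per round.
# Assumptions (not stated in the abstract): standard round counts, AES with a
# 128-bit key, and no per-block setup or I/O overhead.

CLOCK_HZ = 125e6  # operating frequency reported in the abstract

# name: (block size in bits, rounds per block, cycles per round)
ALGORITHMS = {
    "SM4": (128, 32, 4),
    "AES-128": (128, 10, 7),
    "SHA-256": (512, 64, 8),
    "SM3": (512, 64, 8),
}


def throughput_mbps(block_bits, rounds, cycles_per_round, clock_hz=CLOCK_HZ):
    """Bits processed per second, in Mbps, for one block every rounds * cycles."""
    cycles_per_block = rounds * cycles_per_round
    return block_bits * clock_hz / cycles_per_block / 1e6


if __name__ == "__main__":
    for name, params in ALGORITHMS.items():
        print(f"{name}: {throughput_mbps(*params):.1f} Mbps")
```

Under these assumptions the computed values agree with the reported 125 Mbps and 228.6 Mbps figures, which suggests the published numbers are steady-state rates without per-block overhead.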
  • Xunsheng Ji, Kaixuan Fu
    Accepted: 2026-01-09
    Time series forecasting has been widely applied in fields such as finance, meteorology, and transportation. In multi-resolution forecasting scenarios, the demand for predictions at different temporal granularities is increasingly prominent. Traditional deterministic models struggle to capture the uncertainty of future sequences, while existing generative models, such as variational autoencoders, often suffer from limitations in generation quality and modeling flexibility. To address these challenges, we propose a Covariate-Conditioned Diffusion Model for Multi-Resolution Time Series Forecasting (MrC²DM). The model takes historical time series and future covariates as conditional inputs, introduces a resolution-category embedding to control prediction granularity, and employs a diffusion-based generative mechanism to progressively denoise and reconstruct future sequences from noise, thereby enabling uncertainty modeling and high-quality forecasting of future dynamics. Experimental results show that MrC²DM outperforms the best deterministic baselines by 5.4% (MAE) and 13.5% (MSE), and surpasses the best generative models by 28.1% (CRPS) across seven public datasets. Moreover, MrC²DM maintains higher stability and generalization ability in cross-resolution forecasting tasks.
  • Yang Chunxia, Wang Yulong, Wang Xin'ao
    Accepted: 2026-01-06
    With the rapid advancement of urbanization and industrialization, air pollution has become an increasingly severe issue. Accurate prediction of the Air Quality Index (AQI) is of great significance for public health and environmental protection. However, existing spatiotemporal graph neural network-based methods for air quality prediction still exhibit notable limitations. On one hand, due to structural constraints, these models struggle to effectively capture the influence of other stations on target stations through complex spatiotemporal propagation paths over long-term historical data. On the other hand, current dynamic graph learning approaches primarily rely on short-term sequences, failing to extract more representative spatial dependency patterns from long-term observational data. To address these issues, this paper proposes a Spatiotemporal Context-Aware Graph Network (ST-CAGN). The model incorporates a long-sequence spatiotemporal context extraction module based on a pre-trained encoder, which encodes lengthy historical data into low-dimensional representations rich in semantic information and efficiently captures long-range spatiotemporal dependencies across stations. Additionally, a multi-scale dynamic graph learning mechanism based on long sequences is introduced to overcome the limitations of constructing dynamic graphs solely from short-term sequences. This mechanism extracts steady-state spatial dependency features from low-dimensional representations of long-term historical sequences and adaptively integrates them with transient spatial correlations captured from recent fluctuations, thereby more accurately modeling the complex dynamic spatial dependencies between stations. Experimental results demonstrate that ST-CAGN significantly outperforms mainstream baseline models on three real-world air quality datasets. 
For 6-hour, 12-hour, and 24-hour prediction tasks, the MAE decreased by an average of 4.19%, 5.47%, and 6.53%, respectively, while the RMSE was reduced by an average of 2.10%, 3.14%, and 3.95%, validating the effectiveness and superiority of the proposed model in long-sequence spatiotemporal forecasting tasks.
  • WANG Qijun, LIU Qingcheng, GU Yang, YU Yanheng
    Accepted: 2026-01-06
    Object detection in Unmanned Aerial Vehicle (UAV) imagery is severely challenged by tiny object sizes and strong background clutter, where traditional algorithms often suffer from feature degradation and information loss, leading to a decline in accuracy. To address these challenges, this paper proposes a tiny object detection algorithm based on hybrid dynamic reparameterization, termed HDR-YOLO. First, to overcome the limitations of conventional convolutions in extracting features from tiny objects, the C3K2-PC module is reconstructed by incorporating the Pinwheel-shaped Convolution (PConv), which significantly enhances the backbone network's ability to perceive and capture fine-grained details. Second, to tackle the problem of information degradation during multi-scale fusion, this work designs the Hybrid Dynamic Reparameterization Module (HDRep), which achieves high-fidelity multi-scale feature reconstruction through a combination of low-distortion scale transformation and deep feature refinement. Building upon this, a Multi-Scale Feature Fusion Neck (MSFPN) is introduced, which optimizes cross-scale information flow to effectively boost the model's robustness in complex backgrounds. Experimental results on the VisDrone-2019 dataset demonstrate that HDR-YOLO achieves an mAP@50 of 43.7% and an mAP@50:95 of 26.5%, outperforming the YOLOv11n baseline by 10.2% and 7.0%, respectively. Furthermore, experiments on the public AI-TOD dataset and a self-built HVL-Cond dataset validate the superior generalization and stability of the proposed algorithm.
  • ZHAI Jie, MENG Tian-xin, RUAN Tong, LIU Jing-Ping, LI Bin-bin
    Accepted: 2026-01-05
    Online "Light Consultation" decision trees are designed to give patients with minor health issues guidance on appropriate departments, preliminary diagnoses, or treatment suggestions. However, constructing such decision trees based solely on medical literature fails to meet the diverse needs of patients in real-world light consultation scenarios and suffers from delays in reflecting the latest advances in specific disease areas. While medical experts can manually construct these trees, the process is inefficient and lacks standardized representation due to reliance on individual experience. To address these limitations, this paper proposes a novel task: generating decision trees from online light consultation dialogue texts (DTGOLC). For this task, we introduce two methods: a large language model-based approach for light consultation decision text summarization generation (LCDTSG-LLM), and a medical decision path fusion method for decision tree generation (MDPFDT). Our study constructs 5,547 decision paths and generates nearly 30 light consultation decision trees. Finally, these decision trees are integrated as an external knowledge base in a retrieval-augmented generation (RAG) framework. Experimental results demonstrate that the proposed decision trees significantly outperform baseline models in assisting lightweight diagnostic decision-making tasks, achieving an average improvement in F1-score of 27.58% compared to the baseline using original consultation dialogues as the knowledge base.
  • GUO Wei, FAN Zixi, QU Haicheng
    Accepted: 2026-01-05
    This paper proposes a time–frequency collaborative attention algorithm for insulator defect detection in UAV power inspection. The method addresses the high missed-detection rate for small defects caused by large differences in target scale and the low detection accuracy under complex backgrounds. First, a Wavelet Transform Convolution Module (WTCM) is integrated into the backbone network to enlarge the receptive field and enhance the extraction of low-frequency information. Building on this, a Multi-scale Convolutional Attention Augmentation Module (MCAAM) is designed. It combines channel and spatial attention mechanisms to further suppress interference from complex backgrounds. Second, a Frequency-domain Modulation Attention Mechanism (FMAM) is introduced to improve the model’s robustness in complex environments. This mechanism fuses frequency and spatial information, enabling the model to perceive image features more comprehensively and ensuring detection stability. Finally, an Adaptive Weighted Feature Fusion (AWFF) module is designed. It dynamically adjusts feature fusion weights to enhance cross-dimensional feature interaction, which further improves the network's representation capability. Experimental results show that the proposed algorithm achieves 92.4% in mAP50, an improvement of 4.8% over the baseline model. The recall rate for small defects increases by 5.2%. The inference speed (FPS) increases from 112 to 132. Furthermore, the AP values for three defect categories (insulator damage, hammer, and flashover) improve by 7.6%, 1.7%, and 9.8%, respectively. Compared to the original YOLO11n model, the improved model demonstrates superior performance in both detection accuracy and inference efficiency.
  • XU Zhixia, WANG Rui, SHEN Xiaowei, HE Bing, KANG Weijie
    Accepted: 2026-01-05
    Networked radar jamming resource allocation is a typical NP-hard problem whose solution requires a variety of optimization algorithms. To address the slow computation and poor adaptability of traditional jamming resource allocation algorithms, this paper reviews the progress of intelligent optimization algorithms in this field. First, a mathematical model and solution framework for networked radar jamming resource allocation are constructed, the difficulties in solving the model are analyzed, and the advantages of intelligent optimization algorithms in computational efficiency, global optimization capability, and robustness are highlighted. Then, taking the genetic algorithm, particle swarm optimization, ant colony optimization, and their improved variants as typical examples, the implementation processes, solution effectiveness, and strengths and weaknesses of intelligent optimization algorithms in networked radar jamming resource allocation are analyzed in detail. In addition, the application of fusion algorithms and other bionic or machine-learning-based intelligent optimization algorithms in this field is summarized, and the algorithms are compared in terms of adaptability, convergence, and global search capability, giving an overview of the current state of development in this application direction. Finally, in light of the challenges currently facing networked radar jamming resource allocation, future development directions for intelligent optimization algorithms are outlined from four aspects: algorithm comparison, optimization speed, fusion innovation, and dynamic adaptability. The review provides a reference for research and engineering practice on intelligent optimization algorithms in networked radar jamming resource allocation.
  • Hu jing, Zhao xinyu, Peng mingchao
    Accepted: 2026-01-05
    Cross-modal image-text retrieval, a core task in multimodal understanding, faces inherent heterogeneity between images and texts in modal expression, semantic abstraction levels, and structural organization. How to achieve high-precision semantic alignment and bridge the cross-modal gap is a key challenge in current research. To address this, this paper proposes DPNet, an image-text retrieval model based on cross-domain feature decoupling and semantic prototype guidance, aiming to enhance fine-grained image-text matching and retrieval robustness in complex scenarios. The model is designed with frequency-spatial joint decoupling, hierarchical semantic enhancement, and dual-modal interactive attention mechanisms, realizing the structured reconstruction of cross-modal features and the enhancement of discriminative expression. To tackle the modeling flaw that traditional methods struggle to balance spatial structure and frequency-domain texture modeling, the proposed frequency-spatial decoupling module adopts a heterogeneous multi-head attention mechanism. It preserves local spatial semantics while mining global periodic patterns, achieving multi-dimensional collaborative expression of visual features. To compensate for the imbalance between local vocabulary and global semantic alignment, the semantic enhancement module integrates part-of-speech tagging and depthwise separable convolution, guiding the model to focus on key semantic regions and improving its ability to model semantic patterns like factual descriptions and subjective evaluations. Additionally, to address imbalanced training samples and noise sensitivity, the proposed dynamic boundary triplet loss adaptively adjusts the similarity discrimination boundary. Combined with semantic prototype contrastive learning, it further enhances intra-class compactness and inter-class separability. 
Experimental results on Flickr30K and MSCOCO show that the proposed method achieves 1.0%, 0.1%, 0.2% and 1.4%, 0.6%, 0.3% improvements in R@1, R@5, R@10 metrics respectively on MSCOCO for fine-grained image-text retrieval, significantly outperforming existing state-of-the-art methods. This study provides an efficient and feasible solution for high-precision and real-time retrieval in complex cross-modal scenarios.
  • Ouyang Ling, Li Hui, Lan Ju Long, Wu Jiang Xing
    Accepted: 2026-01-05
    The Dynamic Heterogeneous Redundancy (DHR) architecture uses multi-dimensional dynamic reconfiguration to achieve heterogeneity and redundancy of executors, and uses closed-loop iteration based on policy adjudication to update the system dynamically, giving the system inherent security properties and natural proactive defense capabilities. However, DHR usually requires highly heterogeneous executors to prevent attacks from escaping detection through shared vulnerabilities, and the differences introduced by this heterogeneity can lead to inconsistent application state transitions and inconsistent encrypted output across executors, so that the output results cannot be adjudicated. To address these problems, this paper proposes a hidden-leader distributed consensus algorithm based on distributed consensus theory. The algorithm adopts a relative-time-based process synchronization method to resolve the out-of-step running states of heterogeneous executors, and adopts a secret source normalization strategy to resolve the differences in data encryption and random numbers in the messages of heterogeneous executors. The operating mechanism of the algorithm is described in detail and the algorithm flow is given. Finally, a verification platform is built to test and compare the effectiveness of the algorithm. Test results show that in complex process scheduling scenarios, compared with existing algorithms, this method improves the process synchronization success rate by 0.82% and 5.65%, respectively, and achieves correct adjudication of encrypted data. Compared with the ciphertext adjudication method based on encryption and decryption, throughput is improved by approximately 68.38%.
  • Xu Chongcong, Zhou Zhifeng
    Accepted: 2026-01-05
    The diagnosis of scoliosis relies on precise measurement of the Cobb angle. Traditional manual measurement is subjective, inefficient, and inconsistent, which makes it difficult to meet clinical requirements for standardization and efficiency. To solve this problem, this study proposes an automatic measurement method for spinal Cobb angles based on a geometric-constraint hybrid-attention SwinUNet (GHA-SwinUNet). The method is built on the U-Net architecture, introduces the Swin Transformer module to strengthen global structural modeling, incorporates Mixed Local Channel Attention (MLCA) to improve the perception of local vertebral details, and applies a geometric-constraint post-processing strategy to resolve vertebral adhesion. In the Cobb angle calculation stage, an endplate line-fitting method is adopted to avoid the geometric deviation of the traditional midpoint method. The experimental results show that this method achieves excellent segmentation performance on the self-built spinal X-ray dataset: the Dice similarity coefficient (DSC) reached 0.9483, the precision was 0.9504, and the mean intersection over union (mIoU) was 0.9483; the DSC was 1.11% higher than that of the traditional U-Net and 0.27% higher than that of MA-Net. Meanwhile, in the cross-validation of the Synapse and AASCE2019 public datasets, the model maintained stable performance (DSC values were 0.9512 and 0.9425, respectively). The intraclass correlation coefficient (ICC) between the automatic and manual measurements of Cobb angles is greater than 0.90, and the mean absolute deviation (MAD) is approximately 3°, indicating good consistency. In conclusion, this method not only ensures the accuracy of segmentation and measurement but also takes efficiency into account. 
Moreover, it has strong generalization ability of multi-source images, providing reliable technical support for the quantitative assessment and clinical auxiliary diagnosis of scoliosis.
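The endplate line-fitting step for the Cobb angle can be illustrated with a small sketch. The abstract does not give the fitting details, so the version below makes its own assumptions: an ordinary least-squares line is fitted to sampled points on the upper endplate of the upper end vertebra and on the lower endplate of the lower end vertebra, and the Cobb angle is taken as the acute angle between the two fitted lines. The function names and the sample coordinates are hypothetical.

```python
import math


def fit_line_angle(points):
    """Inclination (radians) of the least-squares line y = a*x + b through points.

    Assumes the endplate is not vertical in image coordinates, which holds for
    typical standing spinal radiographs.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("points are vertically aligned; cannot fit y on x")
    return math.atan(sxy / sxx)


def cobb_angle_deg(upper_endplate, lower_endplate):
    """Acute angle in degrees between the two fitted endplate lines."""
    diff = abs(fit_line_angle(upper_endplate) - fit_line_angle(lower_endplate))
    deg = math.degrees(diff)
    return min(deg, 180.0 - deg)


if __name__ == "__main__":
    # Hypothetical pixel coordinates sampled along two segmented endplates.
    upper = [(10, 100), (30, 104), (50, 108), (70, 112)]
    lower = [(12, 300), (32, 294), (52, 288), (72, 282)]
    print(f"Cobb angle: {cobb_angle_deg(upper, lower):.1f} degrees")
```

Fitting a line to many endplate points, rather than joining two corner landmarks, averages out segmentation noise at the vertebral corners, which is one plausible reason for preferring it over a midpoint construction.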
  • LAN Chenxi, SHEN Zongliang, FENG Jianzhou, ZHANG Hua
    Accepted: 2025-12-30
    Large Language Models exhibit powerful in-context learning and text generation capabilities, showing significant potential in tasks such as information retrieval and presentation writing. However, their ability is often insufficient when dealing with tasks that demand high timeliness, truthfulness, and specificity/format requirements, such as generating formatted documents in specific domains, where effective methods are still lacking. Consequently, it is necessary to integrate both agent technology and model fine-tuning techniques. This paper proposes a formatted document generation method that combines an LLM-based Agent architecture with Large Language Model fine-tuning. The LLM-based Agent architecture is utilized to acquire and verify real-time news information, which in turn is used to construct a domain-specific LLM fine-tuning dataset. Subsequently, fine-tuning techniques are employed to enhance the model's ability to generate style-compliant (normative) text. The method was tested, optimized, and validated using datasets from different domains. Experimental results demonstrate that the proposed method outperforms baseline approaches across evaluation metrics such as semantic similarity and text similarity. This indicates that the proposed method effectively strengthens the model's understanding and text generation capabilities for specific domains, and provides reliable guarantees for the timeliness and truthfulness of the generated text.
  • GAO Liulong, HUANG Zhengkun, JIANG Xiaowei, SUN Gongxing, LI Jiafeng
    Accepted: 2025-12-30
    In recent years, deep learning has achieved tremendous success in application fields such as computer vision and natural language processing. This has led researchers in high-energy physics to also turn their attention to deep learning technologies and explore their application in hadronic jet tagging tasks. Initially, researchers converted jet data into image and sequence data, and used convolutional neural networks and recurrent neural networks to tag jets. However, these approaches suffered from problems such as low computational efficiency and poor interpretability. To address these issues, researchers have made improvements to network architectures from multiple perspectives and conducted training on various constructed jet tagging datasets, thereby enhancing the classification performance of the models. This paper provides an in-depth analytical review of the key modules of new network models, including methods for representing jets based on sets, the application of equivariant neural networks, and the exploration of jet foundation models. Meanwhile, the paper analyzes and compares various tagging classifiers, evaluates the performance of different network architectures, analyzes and summarizes the current status of relevant models, and discusses the application prospects of deep learning models in jet tagging tasks.
  • Zhen Han, Li Yu
    Accepted: 2025-12-30
    Small object detection in remote sensing images faces challenges because of weak feature representation, complex background interference, and multi-scale variations. These challenges are more severe in resource-constrained environments, where both detection accuracy and model efficiency are required. This paper proposes an efficient detection framework named Multi-Scale Spatial Attention YOLO (MSSA-YOLO). The framework uses three lightweight modules to improve performance. The Hierarchical Feature Block (HFBlock) enhances small object features by dynamic scale selection and dual-axis multi-scale convolution. The Lightweight Downsampling Module (LDSample) applies efficient downsampling with residual connections to retain critical information. The Focal-WIoU Loss refines bounding box regression by adaptive weighting and gradient suppression. Experiments are conducted on three public datasets: VEDAI, VisDrone2019, and AI-TOD. MSSA-YOLO achieves mAP50 values of 0.754, 0.436, and 0.519, respectively. Compared with YOLOv11s, the parameter count is reduced by 8.9%, while detection accuracy improves by 7%, 4.4%, and 18.5%. The framework also outperforms advanced models such as SP-YOLOv8s and SMN-YOLO. The results show that MSSA-YOLO achieves a balanced trade-off between accuracy and efficiency. The method is suitable for real-time small object detection and generalizes well to objects of different scales in remote sensing scenarios.
  • XIAO Xiang, ZHONG Yongyan, YAN Wen, PAN Wenyi
    Accepted: 2025-12-30
    Dense pedestrian detection constitutes a critical component in smart city systems for crowd monitoring and behavioral analysis. To address limitations in existing models, such as low accuracy in small-object detection, excessive parameter size, and deployment constraints, this paper proposes DRS-YOLO—an improved lightweight dense pedestrian detection algorithm based on YOLO11. A DualConv module is introduced into the neck network of YOLO11 to replace the standard convolution structure, enhancing cross-scale feature fusion and spatial modeling capabilities while mitigating the insufficient extraction of contextual information by traditional convolutions in dense scenes. This modification reduces computational redundancy while improving detection accuracy. Additionally, an RSBlock is designed to strengthen semantic feature reconstruction and global information modeling, thereby enhancing model robustness and generalization under complex occlusion scenarios while effectively reducing parameter count. A SASP module is constructed to alleviate the loss of small-object details during downsampling, reinforcing the model's focus perception and contextual understanding of small targets. Experimental results demonstrate that the improved algorithm achieves increases of 1.8%, 2.7%, 1.4%, and 0.6% in Precision, Recall, mAP50, and mAP50:95 respectively on the WiderPerson dataset; 1.7%, 1.7%, 1.2%, and 0.8% on CrowdHuman; and 2.1%, 1.0%, 1.0%, and 0.5% on BDD100K, while the model size is reduced to 4.9 MB. Deployed on an RK3588-based embedded device, the algorithm achieves an average inference time of 61.4 ms per frame with an mAP50 of 80.3%, indicating an optimal balance between lightweight design, detection accuracy, and real-time performance.
  • Le Chen, Zhongliang Xiao, Jia Chen, Lihua Chen, Xiaolei Chen, Peng Wang, Wei Wang
    Accepted: 2025-12-30
    Text-to-SQL technology aims to lower the barrier to database querying, enabling non-technical users to interact with databases through natural language. However, existing approaches face two major challenges: first, large language models have limited capability in generating complex SQL queries; second, in real-world production environments, databases are often large-scale, and directly inputting the complete database structure leads to excessively long prompts, increased computational costs, and reduced generation accuracy. The simplicity of traditional benchmark datasets compared with the complexity of real-world scenarios further exacerbates this issue. To address these problems, this study proposes a Text-to-SQL method based on hierarchical entity indexing. The core idea is to enhance retrieval-augmented generation by dynamically filtering database information relevant to user queries, thereby enriching the contextual knowledge provided in prompts. Experiments conducted on open-source datasets and production data verify the effectiveness of the proposed approach. The results show that the SQL generation accuracy of this method is only 0.4% lower than the top-ranked (undisclosed) approach on the Spider leaderboard, while outperforming the second-ranked method by 4.2%, demonstrating its effectiveness. Future research directions include refining entity partitioning strategies and optimizing the index architecture to support real-time retrieval in ultra-large-scale databases. This work provides an efficient and scalable solution for practical Text-to-SQL systems.
  • Li Xiang, Yu Xinsheng, Yu Weidong, Quan Shuilong, Wu Yue, Meng Xuanzhe
    Accepted: 2025-12-30
    To reduce the cost of mimic transformation of application systems and the complexity of data maintenance, having multiple heterogeneous business executors access shared data services through mimic data middleware is the optimal solution that combines generality and performance. This paper studies mimic data middleware built on Netty. Following the overall idea of "many-to-one requests; one-to-many replies", data access requests from multiple executors are normalized and forwarded, and data service responses are distributed and returned to the executors. During request normalization and response distribution, a security consensus analysis based on multi-mode consistency arbitration produces secure and trustworthy data for the response, enhancing the security of data access. The underlying architecture of the mimic data middleware is implemented in Java and is cross-platform. It supports the MySQL and MQTT access protocols, providing important engineering design support for the mimic construction of application systems.
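The "many-to-one requests; one-to-many replies" idea can be sketched in a few lines. The abstract does not specify the arbitration rule, so this sketch substitutes a plain strict-majority vote for the multi-mode consistency arbitration it mentions, and it is written in Python for brevity even though the middleware itself is described as Java on Netty. The names `normalize`, `arbitrate`, and `fan_out` are hypothetical.

```python
from collections import Counter


def normalize(message):
    """Hypothetical normalization: collapse whitespace so that cosmetic
    differences between executors do not affect the vote."""
    return " ".join(message.split())


def arbitrate(messages):
    """Many-to-one: return the request agreed on by a strict majority of
    executors, or None when no such majority exists."""
    if not messages:
        return None
    quorum = len(messages) // 2 + 1
    payload, votes = Counter(normalize(m) for m in messages).most_common(1)[0]
    return payload if votes >= quorum else None


def fan_out(response, executor_ids):
    """One-to-many: deliver the single data-service response to every executor."""
    return {executor_id: response for executor_id in executor_ids}


if __name__ == "__main__":
    requests = ["SELECT name FROM t", "SELECT  name FROM t", "DROP TABLE t"]
    agreed = arbitrate(requests)
    print(agreed)  # the request backed by two of the three executors
    print(fan_out("rows: [alice]", ["exec-a", "exec-b", "exec-c"]))
```

In this simplified form, a request from a single compromised executor is outvoted and never reaches the shared data service, which is the security property the arbitration step is meant to provide.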
  • LUO Guang, SUN Liping, WANG Saiqi, WANG Liguo, DING Wei
    Accepted: 2025-12-24
    Multimodal recommendation aims to enhance item representation by introducing multimodal content features such as visual and textual information, to effectively alleviate data sparsity and cold start problems while more accurately capturing user interests. However, existing methods mostly rely on hypergraph propagation mechanisms based on ID embedding, often failing to adequately exploit the rich semantic information in multimodal features. To address these issues, this paper proposes a Semantic Enhanced Multimodal Hypergraph Recommendation Model. First, the model constructs a user-item interaction view and an item-item semantic view, and utilizes Graph Convolutional Networks to extract high-order collaborative signals from behavioral data and to uncover deep semantic relationships between items based on multimodal content, respectively. Second, the model designs a modality-aware fusion module to dynamically aggregate the multimodal representations of users and items, balancing the contributions of different modalities. Subsequently, user-user and item-item hypergraphs are constructed to explicitly model the group interest preferences of users and the high-order semantic relationships between items. Finally, to enhance the mutual information between multimodal and behavioral features, the model introduces a collaborative contrastive learning mechanism and designs two auxiliary contrastive tasks: a modality alignment loss that ensures consistency between ID embeddings and multimodal semantics, and a neighbor aggregation loss that enhances the local robustness of the interaction structure, thus achieving global semantic alignment and local structure preservation in a collaborative manner. 
Experimental results on three real-world datasets, namely Tiktok, Sports, and Clothing, show that the proposed model improves the Recall@20 metric by 1.32%, 5.99%, and 6.58%, respectively, compared to the best baseline models, and the NDCG@20 metric by 5.69%, 2.00%, and 7.61%, respectively.
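For readers unfamiliar with the two ranking metrics quoted above, the following sketch shows how Recall@K and NDCG@K are commonly computed for a single user with binary relevance. This is the usual textbook definition, not necessarily the exact evaluation code behind the reported numbers; the item identifiers are made up.

```python
import math


def recall_at_k(ranked_items, relevant_items, k=20):
    """Fraction of the user's relevant items that appear in the top k."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)


def ndcg_at_k(ranked_items, relevant_items, k=20):
    """Discounted cumulative gain of the top k, divided by the best achievable
    value. Relevance is binary: 1 if the item is relevant, else 0."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    ideal_hits = min(k, len(relevant_items))
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / ideal if ideal else 0.0


if __name__ == "__main__":
    ranking = ["i3", "i7", "i1", "i9"]  # model output, best first
    held_out = {"i3", "i1"}             # items the user actually interacted with
    print(recall_at_k(ranking, held_out, k=2))
    print(ndcg_at_k(ranking, held_out, k=2))
```

Recall@K ignores where in the top K a hit lands, while NDCG@K rewards hits nearer the top, which is why the two metrics can improve by different amounts on the same dataset.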
  • ZENG Bohan, HU Zhiyong, ZHANG Chen, ZHANG Zhaoxiang, XU Yuelei
    Accepted: 2025-12-24
    The high-precision coupling and docking technology between tanker and receiver aircraft plays a critical role in tasks such as collaborative formation and drone recovery. Traditional manual docking methods face challenges such as high complexity, low success rates, and poor reliability, while the precise recognition of the drogue is a key factor for the future of intelligent docking technologies. Existing drogue detection methods mostly rely on anchor box or point initialization and post-processing techniques such as non-maximum suppression, which leads to issues of low efficiency, high debugging costs, and poor robustness when deployed on embedded platforms. While DETR-based detectors offer end-to-end capabilities, they suffer from slow inference and weak performance with small targets. This paper proposes Drogue-DETR, a real-time drogue detector for embedded platforms. It introduces an adaptive region selection attention mechanism to reduce computational complexity, enhances feature extraction efficiency, and incorporates a frequency-domain filtering module to retain small target edges while suppressing background noise. Additionally, a multi-scale attention aggregation module improves contextual understanding and boosts detection robustness. Experimental results on a custom drogue dataset and VisDrone dataset show that Drogue-DETR outperforms existing methods, meeting the requirements for embedded airborne devices.
  • WAN Yuhao, ZHANG Xin, YAN Yilun, WANG Zhenzhong, SHEN Xi, ZHANG Ya, LIU Shan
    Accepted: 2025-12-24
    In the production process of granular traditional Chinese medicine solid dosage forms, the active ingredients mainly exist in granular and powdered states. Particle size, as a critical quality indicator, directly influences the solubility and bioavailability of traditional Chinese medicines, and plays an essential role in subsequent formulation processes, product quality control, and ensuring medication safety. To address the problems of missed detection and low accuracy in the analysis of Chinese medicine powder particles, this study proposes an intelligent detection system based on improved YOLOv11 and active learning. YOLOv11 is selected as the benchmark model considering real-time performance and computational resources. By introducing Space-to-Depth Non-stride Convolution (SPD-Conv) and an attention mechanism, the Cross-Subblock Multi-Kernel Attention (CSMKA) module is designed to replace the traditional strided convolution, thereby enhancing the feature learning ability for small particles. The improved model is employed to conduct reverse evaluation of the training set, where, based on the idea of active learning, sample images with labeling deviations are automatically identified and handed over to experts for fine correction, thus improving training data quality and model generalization performance. After particle detection, a linear regression model is constructed to predict the weight percentage of particles, enabling accurate assessment of weight characteristics. Experimental results show that after introducing the CSMKA module, the mAP@0.5 reaches 72.8%, representing an improvement of 3.0 percentage points over the original YOLOv11. After incorporating active learning optimization, the performance further improves to 75.0%. The relative error of the particle weight percentage prediction model is controlled at 12.6%.
This study provides a comprehensive system that integrates particle detection, data annotation optimization driven by active learning, and particle weight percentage prediction for traditional Chinese medicine powder, offering efficient and reliable technical support for quality control.
  • GUO Yanan, HE Chaoqun, CHANG Ying, ZHANG Benkui, HE Kangjian, CAO Lin
    Accepted: 2025-12-19
    Recent breakthroughs in 3D Gaussian Splatting (3DGS) have revolutionized novel view synthesis, accelerating its adoption in medical services. However, under sparse-view conditions, 3DGS tends to overfit training views and learn incorrect scene geometry owing to insufficient constraints. To address this limitation, this paper proposes a novel view synthesis method under sparse inputs (GMMSplat). This method corrects scene representation through prior-guided depth regularization and a photometric constraint based on fine-grained local cropping. Specifically, for training views, GMMSplat employs a Gaussian Mixture Model to dynamically adjust confidence thresholds based on monocular depth confidence maps. Depths with confidence below the threshold are discarded, ensuring only high-confidence depth data constrains rendered depth. This effectively mitigates geometric collapse caused by depth errors. Furthermore, to alleviate overfitting, GMMSplat generates warped images from virtual viewpoints derived from training views. A local-crop strategy is applied to these warped images, with higher weights assigned to center-cropped regions based on image quality. This strategically guides scene appearance reconstruction. Evaluations on the LLFF, Mip-NeRF360, and ZED2 datasets show GMMSplat surpasses state-of-the-art (SOTA) performance on key metrics, significantly enhancing few-shot novel view synthesis quality. Specifically, on the LLFF dataset (at 1/8 resolution), the PSNR increases by 3.75%, the inference speed improves by 14.52%, and the storage size decreases by 49%.
  • BAI Liang, WANG Kun, WANG Shiyu, HAN Yong, CHEN Ao, QI Yibo
    Accepted: 2025-12-19
    To solve the problem of low pose estimation accuracy caused by the lack of texture information on the surface of workpieces in industrial application scenarios, a pose estimation method for weakly textured workpieces based on RGB images is proposed. Firstly, an improved ResNeXt feature extraction network is utilized to obtain the feature information of workpieces. Dense connections are employed between convolutional blocks to reduce the loss of feature information during the transfer process. Grouped convolutional residual blocks are introduced to enhance the model's perception ability of multi-channel spatial features, and an attention module is added before the residual connection to learn the weights of each channel and locate key regions. Then, the pose estimation problem is transformed, and a cascaded convolutional pose estimation network is used to obtain the pixel positions of key points and the directional vector field. Finally, the perspective projection transformation algorithm is used to solve the pose of the workpiece. To verify the effectiveness of the proposed method, a synthetic dataset containing 20 types of backgrounds and 20,000 images is constructed, covering scenarios with different occlusion levels, illumination conditions, and observation distances. Ablation experiments show that the ADD pass rate of the proposed method is increased by 27.2% to 88.5%, with a parameter count of 70.1M and an inference speed of 1.47 F/S. On the YCB-Video dataset, the proposed method achieves 89.2%, 95.6%, and 94.2% in the three metrics of ADD(-S), AUC of ADD-S, and AUC of ADD(-S), respectively. On the Linemod Occlusion dataset, the average ADD(-S) metric is 88.7%, which is significantly higher than that of mainstream models such as DOPE and RePose.
Experimental results demonstrate that the proposed method exhibits superior pose estimation accuracy and generalization ability in complex environments such as weak texture, occlusion, and illumination changes.
  • FENG Guang, SU Xu, LIN Yibao, ZHAO Zhiwen, HUANG Junhui, SUN Xiangli, LIAO Beirong
    Accepted: 2025-12-15
    Multimodal sentiment analysis leverages the complementary information of speech, text, and visual modalities to enhance emotion recognition accuracy and robustness. However, existing approaches still face three major challenges: (1) the lack of unified modeling for multi-scale emotional dynamics across fast and slow temporal rhythms; (2) the difficulty in explicitly characterizing semantic dominance and subordination among modalities; and (3) the limited ability to adaptively regulate modality intensity and information contribution. To address these issues, this paper proposes a multimodal sentiment analysis framework that integrates multi-scale encoding with a polarity-aware fusion mechanism. Specifically, a Multi-Scale Mamba encoder (MS-Mamba) is introduced for visual and audio modalities to jointly capture global and local temporal dependencies; a Polarity-Aware Fusion (PAF) module is designed to explicitly model inter-modal enhancement and suppression through semantic residuals and signed weights; and a Polarity-Driven Gating (PDG) mechanism is developed to adaptively control information flow via a saliency–direction disentanglement strategy. These components collaboratively form a closed-loop structure of “temporal modeling–polarity alignment–global gating.” Experimental results on the CMU-MOSI and CMU-MOSEI datasets demonstrate that the proposed model achieves binary classification accuracies of 86.58% and 86.50%, with F1 scores of 86.59% and 86.26%, respectively, yielding an average improvement of approximately 1.3% over mainstream baselines. The results validate the effectiveness and robustness of the proposed method in semantic alignment, temporal modeling, and adaptive fusion.
  • ZHAO Yingying, ZHU Shuaishuai
    Accepted: 2025-12-15
    Knowledge graphs, as a structured semantic knowledge representation with entities as nodes and relationships as edges, can accurately depict various things in the real world and their complex associations, and have become a core supporting technology across multiple domains, including artificial intelligence, natural language processing, recommendation systems, and intelligent question-answering, providing an important foundation for machines to understand semantics and achieve cognitive intelligence. First, this paper expounds the basic concepts and system architecture of the knowledge graph, clarifies the knowledge representation unit with the “entity-relationship-attribute” triple as the core, and analyzes the applicable scenarios and technical characteristics of both top-down and bottom-up construction approaches. Secondly, this paper focuses on the technical evolution of three core stages in the knowledge graph construction process, namely information extraction, knowledge fusion, and knowledge reasoning, systematically traces their technical development, and compares the advantages and limitations of different methods. Thirdly, through in-depth analysis of the differences in technical route selection between two typical knowledge graphs, DBpedia and Baidu, the theoretical methods are connected with actual knowledge graph construction scenarios. Finally, the challenges faced by current knowledge graph construction in terms of data quality, semantic consistency, and dynamic evolution are summarized, and future research directions are outlined, aiming to provide comprehensive guidance for both theoretical research and practical applications in knowledge graph construction, thereby advancing technological development in this field.
  • HUANG Jiahui, XU Ming
    Accepted: 2025-12-15
    Federated learning, as a distributed learning approach that does not require centralizing raw data, demonstrates significant potential in collaborative sensing and decision-making for underwater autonomous vehicle fleets. However, challenges in underwater communication environments, such as severe acoustic channel fading and limited communication bandwidth, cause traditional federated learning methods to suffer from reduced aggregation accuracy and excessive energy consumption, making them inadequate for long-term tasks and battery-powered devices. To address these issues, this paper proposes an IRS-assisted underwater federated learning joint optimization framework (IRS-JOFL). By introducing intelligent reflecting surfaces (IRS) and over-the-air computation mechanisms, the framework enhances uplink quality and improves gradient aggregation efficiency, while jointly optimizing device selection and power control strategies. This approach ensures model accuracy while significantly reducing communication energy consumption. Experimental results show that, on the Fashion-MNIST dataset, IRS-JOFL achieves an accuracy of 86.73%, which is an improvement of about 5.4% and 3.6% compared to traditional FedAvg and Air-FL without IRS, respectively, while reducing total energy consumption by approximately 16.3% and 14.1%. On the Fish dataset, the proposed scheme achieves a final Top-1 accuracy of approximately 86.6% and maintains the lowest energy consumption when reaching the 80% accuracy threshold.
  • LI Hui, LIU Jiayu, XU Yaping
    Accepted: 2025-12-15
    Medical image segmentation enables pixel-level localization of lesions or anatomical structures in multimodal imaging data, serving as a key foundation for computer-aided diagnosis and clinical decision-making. Addressing the rapid evolution of medical image segmentation network architectures and the inherent limitations (semantic ambiguity, statistical instability) of existing evaluation metrics, this paper aims to systematically analyze the relationship among network structure, task characteristics, and evaluation metrics, revealing the method development path and performance boundaries, and establishing a Structure-Metric matching mechanism tailored for practical clinical needs. Based on representative literature from the Web of Science Core Collection between 2020 and 2025, this study first reviews the design mechanisms and evolutionary pathways of core architectures such as Transformer, graph neural networks (GNNs), and diffusion models, and further summarizes the essential characteristics of lightweight, hybrid, and prompt-guided paradigms. Subsequently, by integrating empirical studies on public datasets, a quantitative comparison is conducted across different architectures in typical segmentation tasks involving organs, tumors, and brain tissues, covering common metrics such as DSC and HD95. The results indicate that HD95 exhibits high variability in boundary-complex tasks, DSC shows limited sensitivity to small targets, and IoU presents insufficient structural discrimination capability. Furthermore, this study reveals the statistical causes underlying metric misapplication and task–metric mismatch, constructs a task–structure-to-metric recommendation mapping, proposes a task-granularity-based metric selection strategy, and explores how dynamic networks, self-supervised learning, and cross-modal modeling contribute to enhancing model generalization.
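    As a reference point for the overlap metrics discussed above, the following minimal sketch computes the Dice similarity coefficient (DSC) and intersection over union (IoU) for two binary masks. The function name and the toy one-dimensional masks are illustrative assumptions; the review's own evaluation pipeline is not described in the abstract.

```python
def dice_and_iou(pred, target):
    """Return (DSC, IoU) for two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_size = sum(1 for p in pred if p)
    target_size = sum(1 for t in target if t)
    union = pred_size + target_size - intersection
    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0, 1.0
    dsc = 2.0 * intersection / (pred_size + target_size)
    iou = intersection / union
    return dsc, iou

# Hypothetical 6-pixel masks that overlap on two of three foreground pixels.
prediction = [0, 0, 1, 1, 1, 0]
reference = [0, 1, 1, 1, 0, 0]
dsc, iou = dice_and_iou(prediction, reference)
print(dsc, iou)
```

    The two scores are monotonically related by DSC = 2·IoU / (1 + IoU), so they rank segmentations of a single case identically, although averages over many cases can differ.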
  • ZHENG LEYU, LI KE, REN YI, ZHANG LEI
    Accepted: 2025-12-12
    Evolutionary algorithms have demonstrated strong performance in solving constrained multi-objective optimization problems (CMOP). However, when the unconstrained Pareto front (UPF) and constrained Pareto front (CPF) do not intersect and are far apart, the evolutionary process often lacks effective differentiation. This leads to the transfer of negative individuals and a lack of diverse feasible solutions, which can hinder population convergence and overall optimization performance. To address these issues, this paper proposes a Problem-Type Guided Dynamic Knowledge Transfer Cooperative Evolutionary Algorithm (DKTCEA), which includes two phases: independent exploration and cooperative evolution. In the independent exploration phase, the main task leverages prior knowledge from the auxiliary task to navigate infeasible regions, identify the problem type, and design a differentiated evolutionary strategy for guiding population evolution in the next phase. In the cooperative evolution phase, the auxiliary task introduces an improved ε-constraint handling mechanism to enhance solution feasibility. Furthermore, an improved knowledge transfer strategy is employed to select individuals from the source task to transfer to the target task. This minimizes the transfer of negative individuals, improving population quality and enhancing the global convergence of the main task population. Compared to five state-of-the-art constrained multi-objective optimization algorithms, DKTCEA achieved 14 and 11 optimal results in Inverted Generational Distance (IGD) and Hypervolume (HV) across 23 problems in the MW and DOC test sets, respectively. Ablation experiments further validate the effectiveness of the proposed strategies.
  • TANG Na, LI Hao, LI Jing-Jing, CHEN Wei-Qi, TANG Yong
    Accepted: 2025-12-12
    With the development of mobile terminal positioning technology, the scale of trajectory data has increased dramatically. The storage and rapid query of massive trajectory data have become research hotspots, and distributed frameworks can provide efficient data processing capabilities. This paper first proposes a local trajectory index, TRindex, which effectively preserves the proximity of temporal and spatial data and supports spatiotemporal queries. TRindex uses a multi-layer range circle mapping method that maps the spatial minimum bounding rectangle (MBR) to a one-dimensional axis, establishes an order based on the distance from each trajectory to the center of the range circle, and builds a spatial range tree based on this order. This design preserves spatial proximity for range queries. It also forms an ordered relationship between the distances from trajectories to reference points, enabling efficient pruning of K-nearest neighbor queries and effectively reducing duplicate calculations in those queries. Based on TRindex, this paper then constructs a distributed trajectory index (DTRindex), which consists of three main components: data partitioning, local indexing, and global indexing. The global index is a modified R*-tree with a Bloom filter applied to each node, effectively improving query efficiency. DTRindex supports three spatiotemporal query algorithms: spatiotemporal range queries, K-nearest neighbor queries, and mobile object trajectory queries. Finally, the Hadoop-based distributed trajectory index HadoopTrajectory, the single-machine index PM-tree, and the NoSQL-based distributed trajectory index TMan were selected as experimental counterparts for comparison.
Simulation experiments show that DTRindex performs better across multiple metrics: in spatiotemporal range query efficiency, it achieves average improvements of approximately 57%, 74%, and 25% compared to HadoopTrajectory, PM-tree, and TMan, respectively; for K-nearest neighbor queries, performance improves by 40%, 48%, and 20% on average; for mobile object trajectory queries, efficiency increases by 50%, 53%, and 30%. Furthermore, ablation experiments validate the effectiveness of each core module. The spatial range tree layer contributes most significantly, achieving an overall average performance improvement of 2.5 times. The temporal index layer contributes second most, yielding an average performance improvement of 1.2 times. The moving object doubly linked list contributes approximately 90% to the average performance improvement, and its contribution is most substantial in moving object trajectory queries, where efficiency increases nearly fourfold.
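    To illustrate why a per-node Bloom filter can speed up lookups in a tree index, here is a minimal, generic Bloom filter sketch. It is not the paper's implementation: the class name, the SHA-256 based hashing scheme, the sizes, and the object identifiers are all assumptions made for illustration.

```python
import hashlib

class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key):
        return all(self.bits[position] for position in self._positions(key))

# Hypothetical use: an index node summarizes the moving-object IDs below it.
node_filter = BloomFilter()
for object_id in ("obj-17", "obj-42"):
    node_filter.add(object_id)

# A query for one object can skip any subtree whose filter reports absence.
print(node_filter.might_contain("obj-17"))
```

    A negative answer is always correct, so a subtree can be pruned safely; a positive answer only means the subtree must still be searched.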
  • HUANG Jie, TANG Jianhang, ZHANG Yang, DU Luole, FENG Yixiong
    Accepted: 2025-12-12
    In Industry 5.0, the smart grid has a rich power infrastructure with many kinds of widely distributed load detection devices, which leads to strong heterogeneity in the user load data collected by edge load detection devices. Using distributed federated learning to train larger load models is prone to unstable model convergence. To address this issue, an efficient training method for a partitioned federated learning model for the smart grid is proposed. This method applies the training of neural network models to the area from substations to users. By using the split layer, the global model for power load prediction is divided into the top model and the bottom model. The server first collects the resource information of the load detection devices, then uses the freshness index of the load prediction model to define the priority to select the training set of the load detection devices, and allocates appropriate batches for heterogeneous load detection devices for the training of the bottom model. The server merges the heterogeneous load detection device features in the training set to generate a larger mixed feature sequence, reducing the impact of device heterogeneity on the training data and improving the model accuracy. The KL-divergence is used to measure the distribution difference of the training set, and the batch size is fine-tuned to reduce the distribution difference. On a public power load curve dataset, the method is compared with three baseline methods. On non-independent and identically distributed (non-IID) data, its accuracy is up to 3.6%, 11.7%, and 12.9% higher than that of the baseline methods.
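    The KL-divergence step can be pictured with a small sketch. It assumes, purely for illustration, that the quantity being compared is a histogram of load values in a mixed training batch against a reference histogram; the function names, bin counts, and that choice of distributions are assumptions, not details given in the abstract.

```python
import math

def normalize(counts):
    """Turn non-negative bin counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [count / total for count in counts]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same bins."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    # Bins with p_i = 0 contribute nothing; eps guards against q_i = 0.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical load histograms (low / medium / high load bins).
reference = normalize([40, 40, 20])
mixed_batch = normalize([10, 30, 60])
print(kl_divergence(mixed_batch, reference))
```

    A scheduler could, under this assumption, adjust per-device batch sizes and keep the adjustment that lowers the divergence of the merged batch from the reference.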
  • Chen Haiyun, Deng Zhouyao, Xiang Haorui
    Accepted: 2025-12-12
    Small-target detection in aerial imagery is challenged by tiny object sizes, complex backgrounds and large scale variations, while existing detectors still underperform in feature extraction, multi-scale fusion and small-target awareness. To address these limitations, we present MA-DETR, an improved RT-DETR-based aerial small-target detection algorithm. First, a Dual Adaptive Perception Network (DAPN) is embedded in the backbone, leveraging a spatial–scale separation module and a dual adaptive pooling mechanism to enhance perception across diverse scales. Second, an Adaptive Multi-Scale Feature Fusion Network (AMSFN) is designed with a multi-module collaborative architecture that establishes a bidirectional multi-path feature transmission mechanism to boost small-target representation. Additionally, a small-target detection layer based on Adaptive Wavelet Convolution (AWC) is introduced, serially combining wavelet convolution with a remote-sensing anchor attention mechanism to strengthen small-target features in both the frequency and spatial domains. Finally, a CF-CGDL loss integrating a core focusing mechanism and a corner geometric distance loss is proposed to refine bounding-box regression. Experiments on VisDrone2019 yield 43.5% mAP@50, outperforming the baseline by 6.4% while reducing parameters by 1.1 × 10⁶; generalization tests on DOTA v1.0 and RSOD reach 71.8% and 95.5% mAP@50, gains of 3.1% and 7.1% respectively, demonstrating the method’s effectiveness and robustness.
  • Li Luyang, Yan Jinlong, Fang Zeru, Jin Qiqi, Xue Hongxin
    Accepted: 2025-12-12
    In 3D object detection from point clouds, the inherent sparsity of LiDAR data poses pronounced challenges for small objects. Few effective points lead to weak structural cues and blurry boundaries; limited contextual awareness hinders spatial reasoning and semantic completion, causing localization bias; and the difficulty of precise spatial localization, weak channel expressiveness, and background dominance jointly constrain accuracy. To mitigate the impact of the above issues on detection accuracy, we propose a dynamic-aware 3D detector that integrates dynamic feature extraction with feature-enhancement mapping, targeting the two critical stages of small-object detection—feature extraction and candidate generation. Specifically, we introduce a dynamic point-feature prediction network that adaptively predicts and supplements sampling points to strengthen structural perception of small objects; we then build a feature-enhancement mapping network that deeply fuses the original features with those produced by the dynamic module to yield context-rich 2D feature maps, thereby compensating for contextual deficiency and improving localization; finally, we design a point-cloud feature-enhancement module to sharpen focus on key small-object regions along both channel and spatial dimensions. Experiments on the nuScenes dataset demonstrate that our approach surpasses mainstream detectors: relative to the CenterPoint baseline, mean Average Precision (mAP) increases from 56.1% to 59.4%, and the nuScenes Detection Score (NDS) rises from 64.4% to 67.4%.
  • HUANG Zhengting, CHEN Xuexin, LIN Zhiyong, CAI Ruichu
    Accepted: 2025-12-12
    Predicting synthetic lethality (SL) interactions holds significant promise for anticancer drug discovery. However, existing interpretable SL prediction methods typically assume a fixed number of explanation patterns, limiting their ability to capture the inherent diversity underlying SL mechanisms. In this study, we propose DiSE4SL, a model that formulates the generation of explanatory subgraphs as a stochastic process in function space, thereby addressing the critical challenge of adaptively determining the number of explanatory patterns. Built upon the neural process framework, DiSE4SL first leverages a base SL predictor to obtain prediction scores and node embeddings for gene pairs. A context encoder then integrates structural features with predictive semantics into a unified vector representation, which parameterizes the conditional posterior of a Gaussian Mixture Model (GMM), mapping distinct explanatory patterns to different Gaussian components. During training, latent variables are sampled via the Gumbel–Softmax mechanism, and mode-aware attention weights sparsify local subgraphs to yield explanations. In addition, contrastive loss and Lipschitz regularization are introduced to encourage discriminative yet smooth explanatory patterns across components. Finally, by sampling latent variables and applying clustering without a preset number of clusters, DiSE4SL can adaptively extract multiple explanatory subgraphs for each gene pair. The effectiveness of DiSE4SL is validated on benchmark datasets, where it delivers competitive predictive performance (AUPR 0.9337) against the strongest baseline and significantly enhances explanation diversity and fidelity by 29.1% and 9.5%, respectively, compared to the second-best method.
  • Ren Haimeng, Yu Hongfei, Ai Xin
    Accepted: 2025-12-10
    To address the issues of insufficient feature interaction depth and weak long-term sequence modeling capability in existing trajectory prediction models, a vehicle trajectory prediction model based on coarse and fine-grained feature interaction and long short-term memory enhancement is proposed. This model aims to achieve interactive enhancement of coarse and fine-grained features in the scene, deeply integrating the inherent advantages of dual perspectives. It extracts coarse-grained features such as road structure and traffic flow distribution from the scene center perspective to construct a macroscopic motion framework; and extracts fine-grained features such as the relative motion between the target vehicle and surrounding agents and local interaction relationships from the agent center perspective to depict microscopic behavior details. Through the dynamic constraint and deep interaction of fine-grained features on coarse-grained features, the problem of insufficient feature interaction depth is effectively improved, achieving precise refinement of the end positions of multi-modal predicted trajectories. Meanwhile, to effectively alleviate the weakness in long-term sequence modeling capability, a long short-term memory enhancement module with dual memory units is designed to capture long-distance temporal dependency features. Through feature weighting and trajectory endpoint correction strategies, the model's prediction capability for long-term trajectories is effectively enhanced. Experimental results show that compared with mainstream trajectory prediction models, the proposed method has significant improvements in key indicators. On the Argoverse 1 dataset, the average improvements in the minimum final displacement error, minimum average displacement error, and minimum final displacement error are 4.4%, 5.4%, and 4.9% respectively. On the Argoverse 2 dataset, the corresponding indicators are improved by an average of 5.1%, 6.3%, and 5.8% respectively. 
This result not only proves the improvement in trajectory prediction accuracy of the proposed model but also verifies its generalization effectiveness in different data distribution scenarios.
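    The displacement-error metrics named above can be sketched as follows for a single agent with K predicted modes. This is a minimal illustration that takes the minimum of each metric independently, which is one common convention; a benchmark's official tooling may instead pick the best mode by endpoint error. The function name and the toy trajectories are assumptions.

```python
import math

def min_ade_fde(predictions, ground_truth):
    """Return (minADE, minFDE) over K predicted trajectories.

    predictions: list of K trajectories, each a list of (x, y) points.
    ground_truth: list of (x, y) points with the same horizon.
    """
    ades, fdes = [], []
    for trajectory in predictions:
        if len(trajectory) != len(ground_truth):
            raise ValueError("prediction and ground truth horizons differ")
        distances = [math.dist(p, g) for p, g in zip(trajectory, ground_truth)]
        ades.append(sum(distances) / len(distances))  # average over time steps
        fdes.append(distances[-1])                    # error at the final step
    return min(ades), min(fdes)

# Hypothetical two-mode forecast over three time steps.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
modes = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],
]
print(min_ade_fde(modes, truth))
```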
  • Wang Fatang, Song Ran, Huang Yuxin, Xiang Yan
    Accepted: 2025-12-10
    Multi-modal Entity Alignment (MMEA) aims to integrate structural, textual, and visual information to identify nodes in different multi-modal knowledge graphs that represent the same real-world entity. Existing methods often ignore the inconsistency in attribute-type descriptions between knowledge graphs when fusing multi-modal features, which leads to deviation in entity representation and affects alignment performance. To address this issue, this paper proposes an MMEA method based on attribute filtering enhancement and multi-round instruction reasoning. The method consists of three main modules. First, multi-modal information is integrated and entity similarity is calculated to obtain candidate entity sequences. Secondly, in the entity information processing section, an attribute filtering enhancement mechanism is employed to select semantically similar entity attribute types between knowledge graphs, thereby mitigating the interference caused by differences in attribute descriptions and redundant information. Finally, the alignment task is modeled as a multiple-choice problem, where the filtered attributes and natural language information of entities are combined to fine-tune large language models. During reasoning, a multi-round reasoning strategy is introduced, dividing the large number of candidate entities into subsequences to enhance the model's ability to distinguish semantic differences among the entities in the subsequences, thereby improving the accuracy of the final alignment reasoning. Experiments are conducted on multiple public datasets (FB-DB15K, FB-YAGO15K, EN-FR-15K V2, and EN-DE-15K V2), and the results demonstrate consistent improvements in entity alignment performance of our method over the baseline methods.
Specifically, on the FB-DB15K, EN-FR-15K V2, and EN-DE-15K V2 datasets, our method achieves absolute gains in MRR of 2%, 1%, and 0.2%, respectively, compared to the second-best model. Notably, the method reaches substantial margins of 9.1% in MRR and 7.8% in Hits@1.
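    For reference, the two ranking metrics quoted above are typically computed as in the following minimal sketch. The function name, entity identifiers, and candidate lists are illustrative assumptions and do not come from the paper.

```python
def mrr_and_hits(ranked_candidates, gold_targets, k=1):
    """Return (MRR, Hits@k) for a batch of alignment queries.

    ranked_candidates: one candidate list per source entity, best match first.
    gold_targets: the correct target entity for each source entity.
    """
    if len(ranked_candidates) != len(gold_targets) or not gold_targets:
        raise ValueError("need one non-empty ranking per gold target")
    reciprocal_ranks = []
    hits = 0
    for candidates, gold in zip(ranked_candidates, gold_targets):
        if gold in candidates:
            rank = candidates.index(gold) + 1  # ranks are 1-based
            reciprocal_ranks.append(1.0 / rank)
            if rank <= k:
                hits += 1
        else:
            reciprocal_ranks.append(0.0)  # gold entity was not retrieved
    total = len(gold_targets)
    return sum(reciprocal_ranks) / total, hits / total

# Hypothetical rankings for three source entities.
rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
gold = ["a", "y", "r"]
print(mrr_and_hits(rankings, gold, k=1))
```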
  • HAN Song, CHE Chang-chang, WANG He-long
    Accepted: 2025-12-10
    With the rapid development of autonomous driving technology, accurate trajectory prediction has become essential for ensuring driving safety. In this context, Adversarial Multimodal LSTM–Informer for Integrated Driving Intention and Trajectory Prediction (AMLI-DIR) is proposed. The model adopts a hierarchical architecture. In the intention recognition layer, a GATv2-BiLSTM network is constructed to extract the spatial and temporal features of the target vehicle and its surrounding vehicles. Meanwhile, a spatiotemporal cross-attention mechanism is introduced to effectively fuse these features, thereby achieving precise driving intention recognition. In the trajectory prediction layer, independent prediction models are built for lane-keeping and lane-changing scenarios, and a multi-criteria generator is employed to produce accurate predicted trajectories. During the prediction stage, the AMLI-DIR model first identifies the most probable driving intention and then activates the corresponding trajectory prediction model, enabling intention-specific trajectory prediction. The model is trained, validated, and tested using the NGSIM and CQSkyEyeX datasets based on real-world traffic scenarios. Experimental results demonstrate that the AMLI-DIR model outperforms all comparison models across multiple evaluation metrics. Notably, in long-term prediction (3 s), it achieves the lowest RMSE of 1.05 m, which is approximately 22.2% lower than that of the second-best model, STEI. Furthermore, the RMSE of the AMLI-DIR model increases by only 0.26 m from 1 s to 3 s, significantly lower than other models, further validating its effectiveness and superiority in trajectory prediction tasks.
  • CHEN Lingqiang, HU Haifeng, ZHANG Suofei
    Accepted: 2025-12-09
    To address the issues of limited single-round generation quality and low search efficiency in small-scale language models for automated workflow generation, a workflow generation method based on Monte Carlo Tree Search and Self-Refine (WGM-MCTSR) is proposed. This method enhances workflow generation performance through two core mechanisms: first, a workflow self-refine optimization mechanism is designed, which employs iterative generation-evaluation-reconstruction cycles to perform structural reconstruction or correction of workflows using feedback evaluation information, thereby compensating for the limited reasoning capabilities of small-scale language models; second, the selection and backpropagation phases of the Monte Carlo Tree Search algorithm are improved by introducing an Upper Confidence Bound applied to Trees (UCT) selection strategy to replace the traditional soft-mix selection probability, and implementing a child node score backpropagation mechanism to dynamically adjust parent node selection probabilities, thus optimizing the search direction. Experimental results on six datasets including GSM8K, MATH, DROP, HotpotQA, HumanEval, and MBPP demonstrate that the method achieves solve rates of 70.11% and 23.45% in mathematical reasoning tasks, F1 scores of 54.87% and 52.47% in question-answering tasks, and pass rates of 81.83% and 58.82% in code generation tasks. Compared with existing workflow generation methods, the method achieves performance improvements of 5.4% on GSM8K and 9.6% on MATH, obtaining optimal results across all task types, which validates the effectiveness of the improved mechanisms in enhancing workflow generation efficiency and quality for small-scale language models.
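    The UCT selection rule mentioned above can be sketched in its textbook form: each child is scored by its mean value plus an exploration bonus that shrinks as the child is visited more often. The node layout, function names, exploration constant, and numbers below are illustrative assumptions; the paper's modified backpropagation is not reproduced here.

```python
import math

def uct_score(value_sum, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCT: mean value plus c * sqrt(ln(parent visits) / child visits)."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    exploitation = value_sum / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration

def select_child(children, parent_visits, c=math.sqrt(2)):
    """Pick the child (a dict with 'value_sum' and 'visits') with the top score."""
    return max(
        children,
        key=lambda child: uct_score(child["value_sum"], child["visits"], parent_visits, c),
    )

# Hypothetical candidate workflows under one parent node visited 12 times.
children = [
    {"name": "A", "value_sum": 8.0, "visits": 10},  # well explored, mean 0.8
    {"name": "B", "value_sum": 1.4, "visits": 2},   # barely explored, mean 0.7
]
print(select_child(children, parent_visits=12)["name"])
print(select_child(children, parent_visits=12, c=0.0)["name"])
```

    With the default constant the less-visited child wins on its exploration bonus, while setting `c=0.0` reduces the rule to a greedy choice of the highest mean.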
  • Li Qian, Liu Peng, Yao Lian, Wu Jigang
    Accepted: 2025-12-08
    The Memristive Crossbar Array (MCA) serves as the fundamental hardware component of the Computing-in-Memory (CIM) architecture, enabling matrix operations to be performed with O(1) time complexity. However, due to the limited bit-width of the devices, existing methods often require a large number of memory cells to represent numerical values, which increases hardware resource consumption and makes it difficult to achieve both high precision and high energy efficiency. To address this issue, this paper proposes a crossbar-aware mixed-precision quantization method. The method first employs K-means clustering to optimize output channel rearrangement, enhancing weight distribution consistency within sublayers to reduce quantization error and improve post-quantization model accuracy. Building upon this, sublayers are partitioned according to the physical constraints of the MCA, ensuring that the output channel count aligns with the parallel processing capacity of the MCA. This reduces the number of dequantization operations and lowers computational complexity. In addition, an array-aware regularization term is introduced, combining the number of MCAs required per sublayer with group Lasso regularization. This dynamically induces bit-level sparsity in the weights, reducing hardware resource overhead while compressing the bit width. Experiments on different neural networks (ResNet/VGG) show that the method quantizes the network model to an average of 1.3 bits with no more than 0.2% accuracy loss and reduces hardware area overhead by about 74% compared to traditional quantization methods. Compared with existing quantization schemes, the proposed method achieves a joint optimization of accuracy and hardware resources at very low bit-width.
  • ZHAO Peiyuan, GONG Xiaoliang
    Accepted: 2025-12-04
    Addressing issues in existing rehabilitation robot simulation research, such as the mismatch between biomechanical characteristics and robot control strategies, and insufficient automation in human-robot coupling simulations, this study innovatively integrates robot kinematics analysis, training trajectory planning and design, and biomechanical characteristics of musculoskeletal models to construct a human-robot joint simulation system for upper limb rehabilitation robots based on OpenSim and MATLAB, and proposes an automated human-robot coupling simulation process. The system enables synchronous joint angle adjustment and motion playback visualization for matched models. At the robot simulation layer, it provides forward and inverse kinematics calculations, and offers four trajectory planning algorithms for different application scenarios. The computational results are converted in format and then transmitted to the biomechanical simulation layer. In the biomechanical simulation layer, residual reduction is combined with computational muscle control to compensate for unmodeled external forces (i.e., indirectly compensate for robot external force data errors) and optimize muscle activation solutions. It also supports the visualization of biological information such as muscle activation levels and muscle fiber lengths in simulation results, helping rehabilitation physicians more accurately assess patient recovery outcomes. The system is validated through joint kinematic and dynamic simulation experiments focused on the flexion of the right upper limb elbow. Compared to traditional methods, this innovative system significantly improves efficiency and automation while simplifying the complexity of cross-platform simulation operations.
  • CUI Haoran, QUAN Ting, CHEN Maowei, DAI Rong
    Accepted: 2025-12-04
    In the process of solving computational fluid dynamics (CFD) problems, the Algebraic Multigrid (AMG) algorithm can effectively accelerate the solution process. As the most widely used open-source CFD software, OpenFOAM employs the Geometric Agglomerated Algebraic Multigrid (GAMG) algorithm based on the Lower-Diagonal-Upper (LDU) matrix format to accelerate flow field solutions on CPUs. In recent years, CPU+GPU heterogeneous parallel computing systems have flourished, and domestic GPGPUs have achieved breakthroughs, enabling localized substitution. Extensive research has been conducted on GPU-accelerated CFD algorithms for such heterogeneous computing systems. Implementing a heterogeneous parallel version of the GAMG algorithm in OpenFOAM on domestic platforms can fully leverage domestic computing power and significantly improve simulation efficiency. Targeting a heterogeneous computing platform composed of CPUs and domestic GPGPU accelerator cards, this work designs and implements a parallel acceleration method for the LDU-based GAMG algorithm. By fully utilizing the multithreading capabilities of GPUs, all components of the GAMG algorithm are optimized for parallel execution on the GPU. Benchmark tests on the 3D lid-driven cavity flow and motorBike flow-over cases are conducted to verify the correctness and evaluate the performance of the heterogeneous GAMG algorithm at different problem scales. Experimental results show that the proposed algorithm maintains the same computational accuracy as the original version. The heterogeneous GAMG implementation configured with a Jacobi smoother achieves a 10–27× speedup compared to the CPU serial implementation configured with a Gauss-Seidel smoother. Performance analysis indicates that the computational speed of the time-dominant restriction and smoothing operators has been significantly improved. These results validate the effectiveness and computational potential of the GAMG parallel solver framework on domestic heterogeneous platforms and provide a feasible approach and technical foundation for the heterogeneous parallelization and engineering application of CFD solvers on domestic GPGPU systems.
  • WANG Hao, QIN Jin, YANG Changhao
    Accepted: 2025-12-03
    The Ant Colony Optimization (ACO) algorithm is widely employed for solving combinatorial optimization problems, where effective heuristic information facilitates rapid convergence to high-quality solutions. Existing neural ant colony optimization algorithms, such as Deep Ant Colony Optimization (DeepACO) and Generative Flow Ant Colony Sampler (GFACS), leverage deep reinforcement learning to automatically design heuristic information, substantially enhancing the solution quality of existing ACO algorithms. However, these algorithms generate heuristic information solely from the static features of problem instances, neglecting the temporal characteristics of the partial solutions constructed by individual ants. This limitation prevents the heuristic information from effectively guiding differentiated search behaviors among ants during exploration, thereby compromising population diversity. Moreover, when aggregating information via graph neural networks (GNNs), these algorithms process only node features without integrating edge features prior to aggregation, so the GNN captures insufficient information. To address these issues, the Temporal Edge Feature enhanced Neural Ant Colony Optimization (TEF-NACO) algorithm is proposed. TEF-NACO extracts the temporal features of each ant via a Recurrent Neural Network (RNN) and subsequently integrates them with global graph structural information. Furthermore, during the node aggregation phase of the GNN, both node and edge features are incorporated to enhance the network’s information capture capacity. Additionally, an edge-attention-based regularization term is introduced into the loss function to improve training stability. Experiments on 24 combinatorial optimization tasks show that TEF-NACO achieves the best overall performance, outperforming ACO, DeepACO, and GFACS on 100%, 87.5%, and 75% of the tasks, respectively, with average accuracy improvements of 21.5%, 3.4%, and 3.2%, respectively.
  • Wu Qingbo, Wu Youxin, Yu Chengyuan
    Accepted: 2025-12-03
    3D Gaussian Splatting (3DGS) has shown remarkable performance in novel view synthesis and high-precision scene reconstruction. However, its excessively high model storage overhead significantly limits its practical applications. To address this issue, a lightweight compression method is proposed to reduce the storage cost of 3DGS models and enhance rendering efficiency. First, an importance score metric based on local color differences and redundancy is introduced to identify and eliminate redundant Gaussian primitives. Furthermore, a progressive training strategy that combines Gaussian filtering and downsampling is proposed to improve the stability and efficiency of training. On this basis, a hybrid quantization scheme is applied to different properties of the Gaussian primitives to further improve the compression ratio. Finally, Morton encoding and residual encoding are utilized to compress the coordinate attributes of the Gaussian primitives, further reducing the model size. To validate the effectiveness of the proposed method, experiments were conducted on multiple real-world datasets and compared with various existing compression models. The results show that the proposed method reduces the model size by 97.8% compared to the original 3DGS, and by an additional 38.8% compared to Reduced-3DGS, while maintaining comparable rendering quality to Reduced-3DGS. It also enhances both training and rendering efficiency, demonstrating significant advantages over other existing compression models. The model achieves a good balance between compression ratio and rendering quality, providing an effective solution for advancing the practical application of 3DGS in 3D scene reconstruction.
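The coordinate compression step mentioned in this abstract (Morton encoding followed by residual encoding) can be sketched as follows. This is a generic illustration rather than the authors' pipeline: the 10-bit quantization grid, the min-max normalization, and the function names are assumptions.

```python
def quantize_coords(points, bits=10):
    """Map float (x, y, z) points onto an integer grid with 2**bits levels per axis."""
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    levels = (1 << bits) - 1
    return [
        tuple(
            round((p[a] - lo[a]) / (hi[a] - lo[a]) * levels) if hi[a] > lo[a] else 0
            for a in range(3)
        )
        for p in points
    ]


def morton3d(x, y, z, bits=10):
    """Interleave the bits of x, y, z (x in the lowest position) into one Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def delta_encode(sorted_codes):
    """Residual encoding: store each code as the difference from its predecessor."""
    residuals, prev = [], 0
    for c in sorted_codes:
        residuals.append(c - prev)
        prev = c
    return residuals


def delta_decode(residuals):
    """Invert delta_encode with a running sum."""
    codes, total = [], 0
    for r in residuals:
        total += r
        codes.append(total)
    return codes
```

Sorting primitives by Morton code places spatially close points next to each other, so the residuals between consecutive codes tend to be small and compress better than the raw coordinates.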
  • Wangjing Lv, Zhaobo Qi, Xinyan Liu, Beichen Zhang, Weigang Zhang
    Accepted: 2025-12-02
    Long-term action anticipation, a crucial task in computer vision, aims to predict the sequence of actions a person is likely to perform in the distant future based on first-person video. The main challenge of this task lies in the inherent uncertainty of future behaviors: actors in similar contexts may follow multiple plausible action trajectories, while most video samples in existing datasets cover only one. This limits the model’s ability to learn action diversity. Moreover, the input video segments are short relative to the extended range of future prediction, so the gap between insufficient observations and long-range reasoning further increases the difficulty. To address these challenges, we propose a predictive framework named Vision and LLM Cooperative Network (ViLLCoNet), which is based on a cooperative mechanism between a lightweight model and a large-scale model. These two modules are responsible for predictive modeling and for constraining the prediction space, respectively. The lightweight model comprises a visual encoder, a visual auxiliary information extractor, and an action predictor. It encodes the input video, extracts visual auxiliary cues, and generates the future action distribution. The visual auxiliary extractor introduces a cross-attention mechanism to capture interactions between hands and object regions by fusing hand cues and object features. The large-scale auxiliary module, built upon a large language model, identifies low-probability object nouns in the current scene and uses them to constrain the predictor of the lightweight model. By masking semantically implausible candidates in the prediction space, this mechanism improves both the accuracy and the plausibility of predictions. In addition, the loss function is optimized by introducing a noun temporal smoothing loss, which constrains the predicted noun distribution to exhibit temporal coherence. The proposed method is evaluated on the Ego4D and 50Salads datasets. Experimental results demonstrate that, compared with the baseline model, the proposed ViLLCoNet achieves an 8.9% improvement in noun prediction and a 4.2% improvement in verb prediction on the Ego4D dataset.
  • Li Haoxuan, Zhang Zhiyuan, Liu Rui, Xu Peihua, Tian Xin
    Accepted: 2025-11-27
    High-resolution climate data is crucial for local and regional-scale production and livelihoods, and deep learning-based downscaling techniques can effectively bridge the gap between existing low-resolution climate data and application requirements. However, existing methods are often constrained by fixed scaling factors, leading to high training costs in multi-scale scenarios. Meanwhile, their results on climate data are usually blurred and inaccurate in high-frequency details. To address these limitations, this study proposes a deep learning super-resolution network that fuses implicit neural representation and adaptive feature encoding for arbitrary-scale climate downscaling. Specifically, a dynamic pixel feature aggregation module is designed to adjust the feature encoding process through a learnable modulator, allowing it to adapt to different scaling factors. In addition, an implicit neural representation of the images is designed to predict continuous-domain pixel values by fusing coordinate linear difference features and neighborhood nonlinear features via an attention mechanism. Finally, combined with a high-order degradation training strategy, experiments on the ECMWF HRES and ERA5 datasets demonstrate that the proposed method achieves a PSNR improvement of at least 0.7 dB at the ×2 scaling factor compared to fixed-ratio methods, and outperforms existing arbitrary-ratio methods by at least 0.48 dB under the same scaling condition. These quantitative results demonstrate that the approach is superior to existing methods and provides a more flexible and efficient solution for meteorological data processing.
  • SONG Chengqun, ZHANG Ke, YANG Mengjie, CHENG Jun
    Accepted: 2025-11-26
    To address the inefficiency and safety risks of manual patrols in large facilities and complex venues, this study aims to balance global coverage and the prioritization of high-risk areas while improving the efficiency and robustness of path planning. We propose a risk-aware Intelligent Patrol Strategy (IPS): (i) model patrol as a combination of comprehensive and single patrols; (ii) build static/dynamic risk heat map via a Gaussian Mixture Model (GMM); and (iii) design a tanh-based target-point updating method to suppress clustering and balance risk and spatial distribution. For path generation, we develop a Multi-Target Rapidly-exploring Random Tree (MT-RRT) algorithm comprising Multi-Target Feasible Path Planning (MTFPP) and Information Subset Optimization (ISO). MTFPP estimates feasible inter-point costs with an improved RRT-Connect and determines the visiting order using Ant Colony Optimization (ACO), yielding a single feasible path through all targets. ISO samples within an ellipse-shaped informed subset and applies RRT*-style rewiring to iteratively refine that path into a shorter and smoother one. Simulations show that, compared with Euclidean-distance baselines, our method significantly reduces final path length and improves success rate and convergence under limited iterations; it achieves full-area coverage while assigning higher patrol frequency to high-risk regions, making it suitable for industrial plants, hazardous-material warehouses, and large public buildings.
  • ZHANG Longyao, Wen Dongxin, MA Zhuangyu, SHU Yanjun, LI Qing, LIU Mingyi, ZUO Decheng
    Accepted: 2025-11-26
    Large Language Model-based Multi-Agent Systems have demonstrated significant potential in handling complex tasks. However, their distributed nature and interaction uncertainty can lead to diverse anomalies, threatening system reliability. To systematically identify and classify such anomalies, this study conducts a comprehensive review. Seven representative multi-agent systems and their corresponding datasets were selected, 13,418 operational traces were collected, and a hybrid data analysis method combining preliminary LLM analysis with expert manual validation was employed. A fine-grained, four-level anomaly classification framework was constructed, encompassing Model Understanding and Perception Anomalies, Agent Interaction Anomalies, Task Execution Anomalies, and External Environment Anomalies, and typical cases were analyzed to reveal the underlying logic and external causes of each type of anomaly. Statistical analysis indicates that Model Understanding and Perception Anomalies account for the highest proportion, with "Context Hallucination" and "Task Instruction Misunderstanding" being the primary issues. Agent Interaction Anomalies represent 16.8%, primarily caused by "Information Concealment." Task Execution Anomalies make up 27.1%, mainly characterized by "Repetitive Decision Errors." External Environment Anomalies constitute 18.3%, with "Memory Conflicts" as the predominant factor. In addition, Model Understanding and Perception Anomalies often act as root causes, triggering anomalies at other levels, which highlights the importance of enhancing the fundamental capabilities of the model. This classification and root cause analysis aim to provide theoretical support and practical reference for building highly reliable LLM-based multi-agent systems.
  • Wang Wen, Yang Kuiwu, Tong Songsong, Wei Jianghong, Xue Yan, Zhou Rongkui
    Accepted: 2025-11-26
    Model intellectual property protection has become an issue that cannot be ignored in model security. Watermarking technology, as the core means of model traceability, provides technical support for copyright verification by embedding special identifiers into model parameters or generated content. However, trained watermarked models are easily copied and distributed, which enables attackers to destroy or remove the watermarks embedded in DNN models through techniques such as fine-tuning, pruning, or adversarial sample attacks, making it impossible to verify model ownership. To gain a deeper understanding of model watermarking attack methods, this paper first introduces model watermarking attacks and then classifies them into two categories, white-box watermarking attacks and black-box watermarking attacks, based on the attacker's access rights to the target model and ability to acquire information about it. It also reviews and analyzes the motives, hazards, attack principles, and specific implementation methods of DNN model watermarking attacks. Meanwhile, it compares and summarizes existing research on model watermarking attacks in terms of attacker capabilities and performance impacts. Finally, it explores the potential positive role of neural network model watermarking attacks in future research and provides suggestions for in-depth research in the fields of model security and intellectual property protection.
  • ZHANG Junna, WANG Hongzun, DING Chuntao
    Accepted: 2025-11-25
    Post-Training Quantization (PTQ) is an efficient model compression method that converts the parameters of high-precision floating-point models into low-bit integer representations without retraining, using only a small amount of unlabeled calibration data or none at all. This method significantly reduces storage and computational overhead while retaining as much of the original model's inference accuracy as possible, making it widely recognized and adopted in both academia and industry. This paper systematically summarizes the research progress of PTQ from four dimensions: quantization steps, method classification, tool ecosystem, and application advancements. First, a clear framework for the quantization process is constructed, covering steps such as dynamic range statistics, quantization parameter calculation, weight and activation quantization, error optimization, and model generation. Second, a complete classification system for quantization methods is proposed, which includes quantization granularity, bit width, calibration methods, and structure-guided quantization. Third, the tool ecosystem supporting the large-scale application of PTQ is analyzed, discussing its value in hardware adaptation and engineering deployment. Finally, the paper summarizes the integration and application progress of PTQ methods and highlights the challenges faced in practice, especially those related to cross-modal consistency, extremely low-bit semantic collapse, and hardware adaptation. These practical challenges not only reveal the limitations of current technologies but also provide important directions for future research. This review provides a reference framework for PTQ methods for both academia and industry, facilitating the widespread application of artificial intelligence in resource-constrained scenarios.
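The first three quantization steps listed in this abstract (dynamic range statistics, quantization parameter calculation, and weight or activation quantization) can be sketched with the common asymmetric min-max scheme. This is one standard PTQ baseline chosen for illustration, not a method taken from the review; the function names and the 8-bit default are assumptions.

```python
def calibrate(values, num_bits=8):
    """Dynamic range statistics plus parameter calculation (min-max, asymmetric)."""
    qmax = (1 << num_bits) - 1
    # The range is widened to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = min(qmax, max(0, round(-lo / scale)))
    return scale, zero_point


def quantize_tensor(values, scale, zero_point, num_bits=8):
    """Map floats to unsigned integers in [0, 2**num_bits - 1], clamping outliers."""
    qmax = (1 << num_bits) - 1
    return [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize_tensor(q_values, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [(q - zero_point) * scale for q in q_values]
```

A small calibration set would be passed to `calibrate` once, after which the resulting `scale` and `zero_point` are reused for every tensor from the same layer; the per-element reconstruction error of this scheme is bounded by roughly one quantization step.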
  • ZHANG Ke, CHEN Jiahao
    Accepted: 2025-11-21
    The Multi-Hop Graph Convolutional Network (Multi-Hop GCN) has achieved certain results in alleviating the over-compression problem. However, the multi-hop propagation design incurs parametric information compression loss during information aggregation and is sensitive to the local topological structure, which makes it difficult for this type of model to achieve ideal prediction performance on node classification tasks. To address these problems, this paper considers both the intra-layer and inter-layer perspectives of the multi-hop graph convolutional model and applies a decoupling technique inspired by predictive propagation decoupling together with a knowledge jump module, thereby constructing a new type of multi-hop graph convolutional network, the Knowledge-Semi-Decoupled Multi-Hop Network (DrJK-Net). First, a semi-decoupling technique that retains the activation function is proposed to simplify the intra-layer structure of the multi-hop propagation layer. By removing the linear layer in the hidden layer, the number of feature changes during multi-hop propagation is reduced, and the parametric information compression loss is decreased. Then, a knowledge jump connection is added between the propagation layers. By connecting all hidden layer embeddings, the model's ability to adaptively select hidden layer embeddings is improved, and its sensitivity to the local topological structure is reduced. Subsequently, the multi-hop graph convolutional skeleton is combined with the semi-decoupling technique, which simplifies intra-layer information propagation, and the knowledge jump connection module, which establishes inter-layer information channels, yielding the DrJK-Net framework with lower parametric information compression loss and stronger adaptability to the local topological structure. Finally, comparative experiments and ablation experiments are carried out on multiple public paper networks such as Citeseer, CoraFull, and Actor, as well as social network datasets. The results of the comparative experiments show that DrJK-Net surpasses most cutting-edge models in node classification accuracy and has a significant advantage in running speed. The results of the ablation experiments further verify the effectiveness of the proposed semi-decoupling technique and the introduced knowledge jump connection mechanism, providing new ideas and methods for the development of multi-hop graph convolutional networks.
  • NIU Yan, SUN Yang, LI Jun
    Accepted: 2025-11-21
    Multimodal emotion recognition aims to understand complex human emotion expressions. However, existing methods generally suffer from insufficient accuracy and robustness when dealing with the nuances of emotion expressions and complex inter-modal interactions. Specifically, traditional speech feature extraction methods struggle to comprehensively capture emotion information across multiple time scales, existing fusion strategies are limited in their efficiency in integrating complementary information and handling complex inter-modal associations, and category imbalance and boundary sample problems often degrade model performance. To address these problems, this paper proposes a new method for multimodal emotion recognition using speech and facial images. First, a multiscale attention mechanism is introduced in the speech feature extraction stage, replacing the traditional multilayer perceptron; it adaptively focuses on and captures emotion features ranging from microscopic phoneme changes to macroscopic rhythmic patterns, realizing more comprehensive emotion information extraction. Second, an adaptive multi-expert collaborative decision-making architecture is designed, in which expert networks and an Adaptive Multimodal Expert Coordination Network efficiently integrate complementary information from different modalities and handle complex interactions between modalities; finally, a boundary
  • Guo Wei, Meng Qiaoqiao, Jin Haibo, Tian Congcong
    Accepted: 2025-11-20
    In the field of industrial quality inspection, steel surface defect detection commonly suffers from insufficient fusion of target features, missed detection of fine edge defects, and unbalanced sample classification. Therefore, a steel surface defect detection algorithm based on multi-scale interaction and dynamic collaboration is proposed. In the backbone network, shifted sparse convolution is fused with an inverted residual structure to strengthen the interactive fusion of defect features under different receptive fields and improve the feature expression ability for multi-scale defects. A large separable kernel attention mechanism is introduced to dynamically enhance the feature response to fine defect areas and reduce the missed detection rate of cracks and inclusions. In the neck network, the DySample dynamic upsampling strategy is adopted to achieve upsampling based on defect content, which not only improves the clarity of the defect contours of small targets but also reduces computational redundancy, making the model suitable for deployment on edge devices. In addition, an EMASlideLoss loss function integrating exponential moving average and sliding threshold mechanisms is designed to dynamically balance the learning weights of difficult and easy samples, thereby mitigating the detection bias caused by the uneven distribution of defect samples. Experiments on the NEU-DET dataset show that the mean average precision (mAP50) of this algorithm reaches 84.4%, which is 5.8% higher than that of the original YOLO11n. Precision and recall increase by 5.2% and 4.8%, respectively, while the computational load decreases by 8%. The algorithm improves both computational efficiency and detection accuracy, and better meets the detection requirements of industrial scenarios.
  • LIU Ying, ZHANG Runyu, YANG Chaoshu
    Accepted: 2025-11-20
    The Log-Structured Merge tree (LSM-tree) has been widely adopted in key-value storage systems due to its high write performance enabled by sequential write operations. However, it also suffers from issues such as high read/write amplification, significant compaction overhead, and data redundancy. Traditional optimization approaches aim to improve system performance by modifying tree structures, refining compaction strategies, and adopting key-value separation mechanisms. In the era of big data, the rapid growth of data volume leads to increasingly frequent write and compaction operations in LSM-tree systems, placing continuous pressure on CPU computing resources and gradually turning them into performance bottlenecks. Moreover, traditional solutions fail to fundamentally avoid the substantial I/O traffic between the host and storage devices, resulting in high overhead due to redundant data movement. Computational storage technology offers a promising solution to these challenges. By integrating computing resources at the storage layer, it enables task offloading to alleviate the CPU's workload and supports near-data processing to reduce the performance overhead caused by data migration. This survey focuses on optimization strategies for LSM-tree based on computational storage. First, the architecture of computational storage is reviewed. Then, in response to the major bottlenecks under the big data context, existing solutions are classified and compared from two perspectives: compaction optimization and data migration optimization. Finally, potential future research directions are suggested to provide insights in this field.