
Just accepted

  • Zhou Jieqin, Feng Yixiong, Jin Kebing, Tang Jianhang, Wu Xuanyu, Xiao Xi, Tan Jianrong
    Accepted: 2025-09-18
    UAV edge-computing systems deploy UAVs as mobile edge servers for cost-effective, low-profile services, but uneven user geography and limited onboard resources make placement critical: misdeployment causes coverage holes, inflated cooperative-communication latency, and load imbalance. We pursue an optimal balance among coverage, communication quality, and energy efficiency by integrating dynamic collaborative offloading with hybrid intelligent algorithms that couple discrete placement with continuous offloading. Tasks are intelligently partitioned with dynamic offloading ratios for real-time load balancing. Subject to latency constraints, we jointly optimize deployment, cooperative offloading, and compute/communication allocation within a nonconvex mixed-integer framework. Placement is handled by a hybrid metaheuristic with adaptive mutation/crossover for faster convergence, while offloading/resource control uses an enhanced DDPG (DP-Hybrid) for coordinated decisions. Simulations demonstrate a superior energy–latency trade-off and substantial reductions in overall system cost versus state-of-the-art baselines.
  • Junhan Deng, Bin Wang, Zehua Zhang
    Accepted: 2025-09-18
    In complex intelligent decision-making tasks, domain annotation bias can degrade the quality of model training data, which in turn affects the generalization ability and decision-making performance of the system. Such bias usually stems from two causes: (1) the sparsity of expert-labeled data caused by scarce expert resources, which limits the performance of traditional supervised learning methods, and (2) the heterogeneity of expert knowledge arising from experts' differing tendencies (differences in professional background, diversity of risk preferences, etc.), which triggers decision-making conflicts. Existing studies have not yet effectively resolved the uncertainty caused by expert labeling sparsity, expert multi-tendency, and conflicts in expert knowledge fusion. To this end, this paper proposes a multi-expert, multi-perspective approach (Decision Making with MoE, DM-MoE) for the domain labeling bias problem, which integrates the Mixture-of-Experts (MoE) strategy with uncertainty reasoning to construct a collaborative decision-making framework. The method builds a multi-agent system through prompt engineering, so that LLMs (including DeepSeek, GPT-4, and Literally One Mind) act as cross-domain experts for different domains and dynamically generate decision annotations according to the experts' real-time tendency changes. A dynamic three-way decision-making mechanism is then used to model the decision information of multi-propensity experts. Finally, a two-stage optimization strategy is designed: LLM-based AHP (Analytic Hierarchy Process) assigns multi-criteria weights to the uncertain, deferred decision region, and the TOPSIS method is then applied for iterative multi-criteria optimization. Experiments show that DM-MoE achieves superior accuracy and stability compared with traditional decision-making methods.
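    The TOPSIS step mentioned in this abstract follows the standard closeness-to-ideal ranking; as a rough illustration only (generic TOPSIS with placeholder criteria scores and weights, not the authors' implementation), it can be sketched as:

```python
import numpy as np

def topsis_rank(decision_matrix, weights, benefit_mask):
    """Rank alternatives with standard TOPSIS.

    decision_matrix: (n_alternatives, n_criteria) raw scores
    weights:         criteria weights (e.g., from AHP), summing to 1
    benefit_mask:    True where larger values are better
    """
    X = np.asarray(decision_matrix, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Vector-normalize each criterion column, then apply weights.
    norm = X / np.linalg.norm(X, axis=0)
    V = norm * w

    # Ideal and anti-ideal points depend on whether a criterion is a benefit or a cost.
    ideal = np.where(benefit_mask, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit_mask, V.min(axis=0), V.max(axis=0))

    # Closeness coefficient: larger means closer to the ideal solution.
    d_pos = np.linalg.norm(V - ideal, axis=1)
    d_neg = np.linalg.norm(V - anti, axis=1)
    closeness = d_neg / (d_pos + d_neg)
    return np.argsort(-closeness), closeness

# Toy example: 3 candidate decisions scored on 3 criteria with AHP-style weights.
ranking, scores = topsis_rank([[0.7, 0.2, 0.5],
                               [0.4, 0.8, 0.6],
                               [0.9, 0.5, 0.1]],
                              weights=[0.5, 0.3, 0.2],
                              benefit_mask=[True, True, False])
print(ranking, scores)
```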
  • Xu Xiong, Yang Xinyu, Zhu Xuekang, Du Bo, Su Lei, Tong Bingkui, Lei Zeyu, Zhou Jizhe
    Accepted: 2025-09-18
    In the field of image manipulation localization (IML), the small quantity and poor quality of existing datasets consistently pose major issues. A dataset that contains various types of manipulations significantly improves the accuracy of IML models. Images available on public forums, such as those in online image modification communities, frequently undergo manipulation with diverse techniques. Building a dataset from these images can therefore greatly enhance the diversity of manipulation types in the data. However, due to resolution and clarity issues, images sourced from the internet often carry noise, which complicates efforts to obtain clean masks by simply subtracting the manipulated image from the original. This noise proves difficult to eliminate, and as a result, the masks remain unusable for IML models. Drawing inspiration from the field of change detection, researchers treat the original and manipulated images as temporal changes of the same image and approach the data generation task as a change detection challenge. Due to clarity differences between images, traditional change detection models perform poorly. To address this, researchers introduce a super-resolution module and propose the Manipulation Mask Manufacturer (MMM) framework, which enhances the resolution of both original and tampered images to facilitate better comparison. At the same time, the framework transforms the original and tampered images into feature embeddings and combines them, effectively capturing the context. Additionally, researchers employ the MMM framework to develop the Manipulation Mask Manufacturer Dataset (MMMD), which encompasses a broad spectrum of manipulation techniques. Through MMM and MMMD, researchers aim to contribute to the fields of image forensics and manipulation detection by supplying more realistic manipulation data.
  • LU Kaiwen, YANG Yating, DONG Rui, MA Bo, WANG Lei, ZHOU Xi, MA Rong
    Accepted: 2025-09-18
    Reinforcement learning methods based on direct preference optimization have shown excellent results in many downstream tasks of large language models. However, when applied directly to machine translation, this approach often leads to over-optimization problems due to the global reward maximization strategy. Specifically, it causes the model to overly focus on consistency with the distribution of reference translations, thereby losing the potential for local translation diversity and global optimization. To address the aforementioned issues, the problem of performance degradation of direct preference optimization methods in large language model machine translation was investigated. Based on this, a large language model machine translation method based on local preference optimization was proposed. This method identifies frequently mistranslated low-frequency phrases in translations through dynamic temperature sampling and reference-free evaluation of the large language model. Furthermore, a preference data construction method that combines global differences and local key differences is introduced. Considering both the overall translation quality of the model and the local translation diversity, global loss and local loss functions at the token level are proposed. Finally, a two-phase curriculum learning strategy is employed to gradually adjust the model's output preference for low-frequency phrases. The proposed method was validated on the FLORES-200 dataset, selecting fourteen multilingual translation tasks with complex morphologies for testing. The experimental results showed that the scores of the proposed method on XCOMET, COMET-22, and BLEU were 80.7, 89.9, and 30.2, respectively. By comparing with several strong baselines in multilingual machine translation, the proposed method outperformed the baseline models across all translation directions, confirming the effectiveness of the method.
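    The method builds on direct preference optimization; for orientation, a minimal sketch of the vanilla DPO objective it modifies is shown below (tensor names are illustrative; the paper's token-level local loss and curriculum strategy are not reproduced):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard direct preference optimization loss over sequence log-probs.

    Each argument is a tensor of shape (batch,) holding the summed log-probability
    of the preferred ("chosen") or dispreferred ("rejected") translation under the
    trainable policy or the frozen reference model.
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # Maximize the log-odds that the policy prefers the chosen translation
    # more strongly than the reference model does.
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Toy example with random log-probabilities for a batch of 4 preference pairs.
lp = {k: torch.randn(4) for k in ("pc", "pr", "rc", "rr")}
print(dpo_loss(lp["pc"], lp["pr"], lp["rc"], lp["rr"]))
```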
  • Wang Xuguang, Liu Wangjie, Jiao Qiantian, Zhang Mi
    Accepted: 2025-09-16
    Reliable short-term prediction of photovoltaic (PV) power generation is very important for the dispatch and safety of new energy power and for the planning and operation of energy storage systems. However, there is often a time-domain alignment bias between PV power and related meteorological factors, which makes it difficult for a prediction model to learn a stable quantitative relationship between future PV power and historical meteorological factors and leads to low PV power prediction accuracy. In this paper, a delay embedding model is used to describe the quantitative relationship between future PV power and historical meteorological factors, and the time-domain alignment bias between PV power and related meteorological factors is described by delay parameterization. Simulation and real-data experimental results show that correcting the alignment bias can effectively improve prediction accuracy.
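    Delay embedding with an estimated lag is a standard construction; a simplified NumPy sketch (the lag here is picked by cross-correlation, which is an assumption and may differ from the paper's delay parameterization) is:

```python
import numpy as np

def estimate_lag(power, weather, max_lag=24):
    """Pick the lag (in samples) that maximizes |cross-correlation|."""
    p = (power - power.mean()) / power.std()
    w = (weather - weather.mean()) / weather.std()
    corrs = [np.corrcoef(p[lag:], w[:len(w) - lag])[0, 1] for lag in range(max_lag + 1)]
    return int(np.argmax(np.abs(corrs)))

def delay_embed(weather, power, lag, dim=3):
    """Pair power[t] with weather[t-lag], ..., weather[t-lag-dim+1] as features."""
    start = lag + dim - 1
    X = np.stack([weather[start - lag - i:len(weather) - lag - i] for i in range(dim)], axis=1)
    y = power[start:]
    return X, y

# Toy data: PV power lags irradiance by 2 samples plus noise.
rng = np.random.default_rng(0)
irradiance = rng.random(200)
pv_power = np.roll(irradiance, 2) + 0.05 * rng.standard_normal(200)
lag = estimate_lag(pv_power, irradiance)
X, y = delay_embed(irradiance, pv_power, lag, dim=3)
print(lag, X.shape, y.shape)
```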
  • Chen Ran, Han Jinyu, He Wenwen, Du Weiwen
    Accepted: 2025-09-16
    Cardiovascular diseases seriously threaten human health, and applying deep learning to ECG analysis can significantly improve diagnostic accuracy. However, existing ECG classification algorithms often lack effective modeling for multi-resolution temporal features and channel coordination. This paper proposes TCC-ResNeXt, a multi-scale temporal and channel-coordinated ECG classification algorithm. The method combines a Period-Adaptive Module (PAM) for extracting complex temporal features and an ECG-ACmix module for adaptively fusing multi-head attention with convolutional features across channels. Experiments on CPSC-2018, Chapman, and DS-COM datasets show that the proposed approach achieves superior performance, with average F1 scores of 0.798, 0.968, and 0.751, respectively, outperforming methods like MobileNetV3, MVMSNet, and EcgTransformer in AUC, Recall, and F1. These results confirm the effectiveness of TCC-ResNeXt for automated ECG classification and intelligent cardiovascular disease diagnosis. In addition, the framework demonstrates strong generalization and robustness across datasets. It provides a promising direction for practical clinical ECG analysis and real-world deployment.
  • Wang Zhiyuan, Zhang Wei, Guan Bingzheng, Yang Huili
    Accepted: 2025-09-16
    In the field of industrial production, taking the tire molding process as an example, this paper constructs a high-performance question-answering system for private data security in a low-cost environment. Existing RAG methods have limitations: Self-RAG increases computational complexity, and Corrective-RAG leads to excessively long contexts, making them unsuitable for use in low-cost environments. A Multi-Agent Sequential Collaboration Graph Retrieval Augmented Generation (MSCG-RAG) method is proposed. Each agent performs a single task and uses structured data as contextual information, which avoids excessively long contexts and reduces the difficulty for large models to understand the context, ultimately realizing the question-answering service for the tire molding process. The performance of the MSCG-RAG method in terms of general RAG metrics, namely Context Relevance, Faithfulness, and Answer Correctness, is 75.0%, 75.8%, and 85.7% respectively. In the evaluation where high-performance large models are used as domain experts for scoring, the method scores 7.833, 7.826, and 8.301 respectively under the scoring of three large language models: DeepSeek-R1, Qwen-plus, and Qwen-turbo, all of which are higher than those of the Basic Graph RAG (BG-RAG) method and the Graph-Vector Hybrid RAG (GVH-RAG) method. The results of the ablation experiment show that link filtering has the greatest impact on context relevance; the loss of filtering ability will reduce the context relevance by 18.5 percentage points. The result correction part mainly affects the faithfulness of the generated results; the loss of result correction ability will reduce the faithfulness of the generated results by 12.6 percentage points. The results of the base model replacement experiment show that the MSCG-RAG method performs stably on different combinations of large models, with high practicability and feasibility.
  • Zhang Jiaqing, Ma Xiujuan, Ma Fuxiang, Zhou Bin, Yin Jun
    Accepted: 2025-09-09
    Addressing the limitations of traditional Graph Neural Networks (GNNs) in modeling higher-order relationships and multi-way interactions, this paper proposes a novel heterogeneous hypergraph recommendation model, termed HNSGCN, which integrates node similarity associations with a hypergraph attention mechanism. Within this framework, users are abstracted as hyperedges and items as nodes. Leveraging contextual semantic features of both users and items, the model constructs user-user and item-item similarity matrices utilizing cosine similarity and Jaccard similarity coefficients. This process effectively transforms the conventional dyadic interaction network into a heterogeneous hypernetwork. Building upon this hypergraph structure, the model incorporates hypergraph convolutional operations and a hierarchical attention mechanism. This enables the adaptive aggregation of structural information across different levels, thereby effectively capturing complex higher-order latent relationships between users and items and significantly enhancing recommendation accuracy. To rigorously validate the model's efficacy, comprehensive comparative experiments were conducted on two real-world datasets, Amazon and Yelp-1K. Comparisons against multiple state-of-the-art recommendation baselines demonstrate that the proposed HNSGCN model achieves significantly superior performance across all three evaluation metrics: Recall@K, Precision@K, and NDCG@K. Furthermore, ablation studies confirm that both the incorporation of node similarity associations and the multi-layer attention aggregation mechanism play crucial roles in driving the model's performance gains.
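    The cosine/Jaccard similarity matrices described above can be built with a few lines of NumPy; the following sketch assumes a binary user-item interaction matrix and item feature vectors as inputs (illustrative only, not the authors' code):

```python
import numpy as np

def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between row vectors (e.g., item semantic features)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def jaccard_similarity_matrix(interactions):
    """Pairwise Jaccard coefficient between binary rows (e.g., users' item sets)."""
    inter = interactions @ interactions.T                      # |A ∩ B|
    row_sums = interactions.sum(axis=1)
    union = row_sums[:, None] + row_sums[None, :] - inter      # |A ∪ B|
    return np.divide(inter, union, out=np.zeros_like(inter, dtype=float), where=union > 0)

# Toy data: 4 users x 6 items (binary interactions), 6 items x 8-dim features.
rng = np.random.default_rng(1)
user_item = (rng.random((4, 6)) > 0.5).astype(float)
item_feat = rng.standard_normal((6, 8))
user_sim = jaccard_similarity_matrix(user_item)   # could be thresholded to build hyperedges
item_sim = cosine_similarity_matrix(item_feat)
print(user_sim.shape, item_sim.shape)
```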
  • Huang Keming, Liu Miao
    Accepted: 2025-09-09
    Federated Learning (FL), as a distributed edge training framework, enables model training without centralizing clients' private data, thus offering significant advantages in terms of data privacy and security. However, in practical applications, clients not only face communication constraints but, more commonly, suffer from performance degradation due to inconsistent data distributions (Non-Independent and Identically Distributed, Non-IID). To address this challenge, this paper proposes a Multi-stream Feature-aware Network, FedMFP. Specifically, the method employs a dual-stream feature decoupling architecture to separately extract global features and fine-grained features from clients: the global stream network utilizes feature perturber/compensator mechanisms to capture inter-sample correlations from a holistic perspective, while the fine-grained stream network adopts a multi-stream architecture to extract personalized multi-scale information. Concurrently, distinct loss functions are designed to effectively decouple these two types of features, minimizing mutual interference between them. Extensive experimental results demonstrate that FedMFP achieves average test accuracy improvements of 13.27% and 14.41% compared to nine baseline algorithms on classic Non-IID datasets including Cifar100 and Tiny-ImageNet, significantly enhancing the model's generalization capability and robustness under Non-IID data distributions.
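    For orientation, the underlying federated aggregation step (standard FedAvg-style weighted averaging, not the paper's dual-stream decoupling) can be sketched as:

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of client model parameters, as in standard FedAvg.

    client_weights: list of dicts {param_name: np.ndarray} from each client
    client_sizes:   number of local samples per client (aggregation weights)
    """
    total = float(sum(client_sizes))
    global_weights = {}
    for name in client_weights[0]:
        global_weights[name] = sum(
            (n / total) * w[name] for w, n in zip(client_weights, client_sizes)
        )
    return global_weights

# Toy round: 3 clients with unbalanced (Non-IID-style) data volumes.
clients = [{"fc.weight": np.full((2, 2), i, dtype=float)} for i in range(3)]
print(fedavg_aggregate(clients, client_sizes=[10, 30, 60])["fc.weight"])
```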
  • Yang Yu, Shijie Hu, Kangkang Fan, Wei Guo, Yazhou Hu, Dawei Zhang
    Accepted: 2025-09-09
    Traditional visual perception methods can only capture information about objects within the line of sight and are unable to detect objects obscured by obstacles in the scene. Non-line-of-sight (NLOS) methods, on the other hand, reconstruct information about these occluded objects by analyzing light or electromagnetic signals reflected or projected onto visible relay surfaces. However, after years of research, existing NLOS methods still face a significant challenge in capturing faint signal components that have undergone multiple reflections in outdoor environments. This poses considerable challenges for non-line-of-sight (NLOS) perception applications in complex, dynamic outdoor real-world scenarios. To address this, this paper proposes the use of cost-effective millimeter-wave radar to detect and track hidden targets in large-scale dynamic scenes. Such radar has been widely adopted in the automotive industry and supports low-cost mass production. After converting radar point clouds into pseudo-images, we apply the proposed two-stage attention network (TSAN) for the detection and tracking of hidden targets. Experiments show that the TSAN network model significantly improves detection performance across multiple categories under various Intersection over Union (IoU) thresholds, achieving a mean average precision (mAP) of 75.62%. Compared with existing results, the TSAN network yields a 5.99% improvement in mAP, outperforming current state-of-the-art methods. In addition, the prototype built based on the method described in this paper provides a low-cost solution for NLOS target detection and tracking systems, and verifies its effectiveness in achieving cost-effective real-time NLOS target detection and tracking.
  • LIU Tianquan, LU Cunyue, WANG Xiaolong, LUO Runshu
    Accepted: 2025-09-09
    Underwater image generation technology acts as a crucial solution for filling data gaps in marine exploration, and the authenticity and diversity of generated images directly affect the reliability of subsequent analytical studies. Existing models typically have enormous parameter counts and prolonged training and inference; the generated underwater images suffer from insufficient clarity, with distortions in the structures and edges of image subjects; and the inference process does not adequately consider the unique optical properties of underwater environments, so the authenticity of the generated images remains to be improved. To resolve these issues, this paper proposes UW-ControlNet (Underwater ControlNet), a novel network built upon the ControlNet architecture that fine-tunes the parameters of a pretrained Stable Diffusion model. It combines structural constraints from conditional images with semantic guidance from textual prompts, achieving cross-modally controllable generation of underwater images. A lightweight feature extraction network is introduced to optimize the feature extraction process of conditional images, thereby enhancing the convergence speed and inference speed of the model. A correlation matrix-based channel attention module is designed to decouple and re-couple global channel features corresponding to the background with local channel features corresponding to the subject, optimize text-image multimodal alignment in the generation process, and enhance the credibility of the generation results. A Structure-Semantics Constraint Enhancement Module is constructed to prevent constraint information loss caused by downsampling, ensuring structural consistency between generated images and conditional images. Experimental results confirm that UW-ControlNet surpasses existing methods in both quantitative metrics and qualitative evaluations, demonstrating significant application potential.
  • Wu Jiang, Li Ziqi, Zhang Yonghong
    Accepted: 2025-09-05
    The joint classification of hyperspectral imagery (HSI) and light detection and ranging (LiDAR) data can fully leverage their complementary advantages in spectral and spatial-structural information, and has become an important research focus in the field of remote sensing. However, due to significant differences in their imaging mechanisms, HSI and LiDAR exhibit a high degree of heterogeneity in terms of data dimensionality and feature distribution, which poses severe challenges for semantic representation and efficient fusion of multimodal data. To address these challenges, we propose a Multi-Scale Hybrid Convolution Mamba Network (MHCMNet) for joint HSI and LiDAR data classification. Specifically, the framework first employs a Multi-Scale Feature Extraction Module (MFEM) to extract spectral, spatial, and elevation features from the two modalities. Subsequently, the parallel Feature Tokenization Module (FTM) transforms the features of both modalities into unified feature tokens. To further enhance the collaborative representation of multimodal features, MHCMNet innovatively introduces a Mamba-based Feature Fusion Module (MFFM), which leverages its powerful long-range dependency modeling capability to achieve deep intra- and inter-modal feature interaction and efficient fusion. Experimental results demonstrate that MHCMNet achieves the highest overall accuracy (OA) of 99.03%, 90.71%, and 91.47% on the Trento, Houston2013, and MUUFL datasets, respectively, while maintaining low model complexity. In addition, ablation studies validate the effectiveness of each module in performance improvement, further confirming the superiority of the proposed method in multi-source remote sensing data classification.
  • ZHANG Wei, ZHENG Hao, ZHU Shiyi, XIAO Yimei, ZENG Xinyao
    Accepted: 2025-09-03
    Course recommendation is crucial for enhancing learners' learning efficiency and engagement, and modeling learners' learning sequences is a key part of course recommendation, because these sequences not only contain learners' dynamic learning interests but also imply the evolutionary patterns of learning behaviors. However, existing methods focus on sequential relationships within sequences and fail to consider the impact of the time interval between courses on the dynamic evolution of learners' interests. In addition, most models characterize learners' behaviors with a single vector, failing to portray the dynamic evolution of their multidimensional learning interests and the associations between different interests, resulting in biased interest modeling. To address these issues, this paper proposes a time interval-enhanced multi-interest dynamic evolution network for course recommendation (TIMIR). The method treats learners' interaction histories as sequences with different time intervals and designs a spatio-temporal dual self-attention mechanism, in which a dual-path design differentiates the effects of long- and short-term time intervals to capture the persistence and transfer patterns of learners' evolving interests. Multiple learner interest vectors are then generated by combining this with the dynamic routing mechanism of the capsule network, and a multi-interest dynamic evolution network is constructed to model the temporal evolution of learners' multiple interests and the associations between them, improving the prediction accuracy of long-term learning behaviors and recommendation coverage in complex interest scenarios. Experimental results on the MOOCCourse dataset demonstrate that TIMIR outperforms other advanced recommendation models by 2.56% on HT@20 and 4.18% on NDCG@20; on the MOOCCube dataset, the two metrics outperform other advanced recommendation models by 1.27% and 1.71%, respectively, validating its effectiveness in enhancing recommendation accuracy.
  • Jin Kexin, Chen Donglin
    Accepted: 2025-09-03
    Existing models often fail to handle trend, seasonality, and nonlinear disturbances simultaneously in seasonal time series forecasting, leading to limited adaptability in complex scenarios. To address this issue, this study proposes a novel STL-ARIMA-Prophet-LSTM hybrid model. First, the method applies STL to decompose the original time series into trend, seasonal, and residual components. Then, different models are used for each component based on its characteristics: the ARIMA model captures linear trends in the trend component, the Prophet model handles periodic patterns and holiday effects in the seasonal component, and the LSTM model captures nonlinear variations in the residual component. Finally, according to the strategy adopted during STL decomposition, the predicted results of each component are recombined to obtain the final forecast. The model is evaluated on three real-world datasets: financial reimbursement volume from universities, transaction volume from e-commerce platforms, and regional power load. Experimental results show that the hybrid model achieves superior or second-best forecasting performance compared with five baseline models across all datasets. Further ablation studies confirm that the integration of STL decomposition and multi-model collaboration significantly enhances both accuracy and robustness. The results demonstrate that the STL-ARIMA-Prophet-LSTM hybrid model provides excellent forecasting performance and holds strong potential for application in seasonal time series prediction tasks.
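    The decompose-then-forecast-then-recombine pipeline can be sketched with statsmodels; in this simplified stand-in the Prophet and LSTM components are replaced by a naive seasonal repeat and a zero residual forecast, purely to show how the components are recombined additively:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL
from statsmodels.tsa.arima.model import ARIMA

# Toy monthly series: trend + seasonality + noise.
rng = np.random.default_rng(0)
idx = pd.date_range("2018-01-01", periods=72, freq="MS")
y = pd.Series(0.5 * np.arange(72) + 10 * np.sin(2 * np.pi * np.arange(72) / 12)
              + rng.standard_normal(72), index=idx)

# 1) STL splits the series into trend, seasonal, and residual components.
stl_result = STL(y, period=12).fit()
h = 12  # forecast horizon

# 2) Forecast each component separately (stand-ins for ARIMA / Prophet / LSTM).
trend_fc = ARIMA(stl_result.trend, order=(1, 1, 1)).fit().forecast(steps=h)
seasonal_fc = stl_result.seasonal.to_numpy()[-12:][:h]   # naive seasonal repeat
resid_fc = np.zeros(h)                                   # placeholder for the LSTM part

# 3) Recombine additively, mirroring the STL decomposition strategy.
forecast = trend_fc.to_numpy() + seasonal_fc + resid_fc
print(forecast[:3])
```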
  • YU Xiaosheng, LI Sheng, LI Songpu
    Accepted: 2025-09-02
    Noise interference and low resolution degrade feature expression, causing loss of key details and degradation of semantic information, which limits model robustness and generalization in complex scenes. To address this problem, a visual language model-driven dual-branch anomaly detection network, MSRA-CLIP (Multi-scale and Residual Attention CLIP), was constructed. First, two parallel branches are used to process the image. The upper branch employs a combined multi-scale attention unit, which balances computational complexity and performance while improving the quality of image super-resolution. The lower branch uses a residual attention module that includes residual attention and skip connections; through numerous residual attention blocks and skip connections, rich global and local features are captured, and the image features processed by the two branches are then concatenated. Finally, the processed image features are mapped to the joint embedding space using an image-text multi-level alignment module and compared with the text features to generate anomaly maps. Experiments on five medical anomaly detection datasets (Brain MRI, Liver CT, etc.) demonstrate MSRA-CLIP's superiority over MVFA, with average AUC improvements of 5% in zero-shot anomaly classification, 1.1% in anomaly segmentation, and 0.93% in few-shot classification.
  • KONG Yulong, LIN Suzhen, JIN Zanxia
    Accepted: 2025-09-02
    Video captioning aims to deeply analyze video content and describe it accurately and fluently in natural language. Concepts, corresponding to objects, actions, and attributes in video content, can serve as a medium for video captioning. Although some studies have explored concept-guided video captioning, two main issues remain: limited concept detection accuracy and insufficient concept utilization. To address these issues, this paper proposes a multimodal video captioning approach guided by global and local concepts (CGMVC) to improve the quality of generated descriptions. First, it extracts multimodal features of videos using different backbone networks and leverages the HMMC model to provide textual information for videos via hierarchical-matching video-to-text retrieval. Then, it uses a multimodal feature fusion and concept detection network to precisely detect concepts. To fully utilize the detected concepts, a concept projection module is employed to uncover the latent themes of videos and globally guide decoding, while a semantic attention module and a cross-attention module locally optimize decoding by leveraging concepts and multimodal features of videos. By fully utilizing concepts and information from different modalities, more natural and accurate descriptions are generated. Experiments on the MSVD and MSR-VTT datasets show that the CGMVC model achieves CIDEr scores of 111.2% and 64.1%, and BLEU@4 scores of 57.1% and 51.2%, respectively. Comparative and ablation studies demonstrate the superiority of the CGMVC method over baseline approaches and other state-of-the-art methods.
  • Qi Hui, Zhang SiQi, Shi Ying, Qi XiaoBo
    Accepted: 2025-09-02
    With the rapid development of socio-economics, resident happiness has emerged as a critical indicator for measuring social progress. Accurate prediction of resident happiness is essential for policy formulation and social resource allocation. However, existing methods exhibit systematic limitations in cross-group applicability and policy interpretability. To address these challenges, this paper proposes a Feature Interaction-Optimized Dynamic Weighted Ensemble Model (FIO-DWEM) for happiness prediction. First, a feature interaction optimization mechanism is constructed by generating second-order interaction features through polynomial expansion, combined with correlation filtering and Recursive Feature Elimination (RFE) to extract high-information features. Subsequently, randomized search integrated with leave-one-out cross-validation is employed for hyperparameter tuning of base models, dynamically adjusting their weights based on error ratios, and integrating probabilistic outputs through a soft voting mechanism. Experimental results demonstrate the superior performance of FIO-DWEM across multiple datasets: it achieves performance improvements of 0.54%–39.86% on the Somerville dataset and maintains cross-domain validation accuracy ranging from 89.57% to 98.89%. SHAP analysis reveals the impact mechanisms of key features (e.g., urban service information availability) on happiness, providing interpretable technical support for policy-making and individualized assessment.
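    The interaction-feature generation, RFE selection, and soft voting described above correspond to standard scikit-learn components; a rough sketch with placeholder base models and static weights (not the paper's dynamic error-ratio weighting or leave-one-out tuning) is:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Second-order interaction features, then recursive feature elimination,
# then a soft-voting ensemble whose weights stand in for the dynamic error-ratio weights.
pipeline = Pipeline([
    ("interactions", PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)),
    ("scale", StandardScaler()),
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=15)),
    ("ensemble", VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(n_estimators=100, random_state=0))],
        voting="soft",
        weights=[0.4, 0.6],   # placeholder static weights
    )),
])
pipeline.fit(X, y)
print(pipeline.score(X, y))
```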
  • LI Zekai, YAN Zhidan, CHEN Can
    Accepted: 2025-09-01
    When pursuing favorable instantaneous response characteristics in gimbal servo control systems, a reduction in system stability margin often occurs. Concurrently, the large integral component introduced in controllers to ensure high positioning accuracy constrains the system's response rate and induces phase lag in the 60–120 Hz frequency band—where the system is susceptible to mechanical resonance. This significantly compromises anti-interference capability. To address these challenges, this paper proposes a gimbal control method based on lag compensation and disturbance suppression. Building upon the traditional cascade dual-loop Proportional-Integral-Derivative (PID) controller, we introduce an additional Linear Extended State Observer (LESO). This LESO treats motor control phase lag, moment-of-inertia identification errors, mechanical vibrations, and other internal/external disturbances as extended state variables. The LESO’s output is then utilized as a feedforward control signal. Compared to conventional cascade PID control, the proposed method achieves balanced high responsiveness and stability, effectively mitigates phase lag issues, enhances the gimbal servo system’s anti-interference performance, and demonstrates strong practical value in engineering applications.
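    A linear extended state observer of this kind treats the lumped disturbance as an extra state; a minimal discrete-time sketch for a second-order plant, using the common bandwidth parameterization with illustrative gains (not the paper's tuned design), is:

```python
import numpy as np

def leso_step(z, y_meas, u, b0, omega_o, dt):
    """One Euler step of a 3rd-order linear extended state observer (LESO).

    z = [z1, z2, z3]: estimates of position, velocity, and the lumped disturbance.
    Observer gains follow the common bandwidth parameterization (omega_o).
    """
    beta1, beta2, beta3 = 3 * omega_o, 3 * omega_o**2, omega_o**3
    e = z[0] - y_meas
    z1_dot = z[1] - beta1 * e
    z2_dot = z[2] + b0 * u - beta2 * e
    z3_dot = -beta3 * e
    return z + dt * np.array([z1_dot, z2_dot, z3_dot])

# Toy loop: the z3 estimate can be fed forward to cancel the disturbance term.
z = np.zeros(3)
dt, b0, omega_o = 1e-3, 50.0, 200.0
for k in range(5):
    y_meas = 0.01 * k            # placeholder encoder reading
    u = 0.2                      # placeholder control command
    z = leso_step(z, y_meas, u, b0, omega_o, dt)
print(z)
```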
  • Xiao Liyang, Ai Xinyang, Xie Wei, Gu Kaijie
    Accepted: 2025-09-01
    Unmanned aerial vehicles (UAVs) are being extensively applied in agricultural operations, drawing increasing research attention to the optimization of efficient operational strategies. In agricultural spraying tasks, UAVs are constrained by both battery capacity and tank volume; to handle this, a multi-trip operation mode is introduced. The research formulates an integer linear programming model that simultaneously addresses three critical components: spraying sequence optimization, flight path planning, and multi-UAV scheduling coordination, with the objective of minimizing operational costs. To effectively solve this complex combinatorial optimization problem, an improved Adaptive Large Neighborhood Search (ALNS) algorithm is proposed. Four removal operators and three insertion operators are designed based on the characteristics of the problem. A Simulated Annealing mechanism is also used to accept worse solutions and improve global search ability. By tracking operator scores, the algorithm dynamically adjusts its operator selection strategy, thereby enhancing solution performance. Parameter values are determined through preliminary experiments. Extensive computational experiments on benchmark instances of varying sizes demonstrate that the proposed algorithm significantly outperforms both the commercial solver CPLEX and a sequence-based method in terms of solution quality and computational efficiency. Furthermore, the ALNS algorithm is compared with two mainstream metaheuristics, Genetic Algorithm (GA) and Ant Colony Optimization (ACO). The results show that ALNS consistently yields better solution quality on both medium-scale and large-scale instances. On medium-scale instances, it achieves average improvements of 6.90% over GA and 3.55% over ACO, while on large-scale instances, the improvements are 7.84% and 4.47%, respectively.
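    The adaptive large neighborhood search loop with simulated-annealing acceptance and score-based operator weights follows a well-known template; a condensed sketch with toy operators (the problem-specific removal/insertion operators and cost model are assumptions) is:

```python
import math
import random

def alns(initial_solution, cost, destroy_ops, repair_ops,
         iterations=2000, start_temp=10.0, cooling=0.995, reaction=0.2):
    """Generic ALNS skeleton: roulette-wheel operator selection, simulated-annealing
    acceptance, and adaptive score updates; problem-specific logic lives in the operators."""
    current = best = initial_solution
    weights_d = [1.0] * len(destroy_ops)
    weights_r = [1.0] * len(repair_ops)
    temp = start_temp

    for _ in range(iterations):
        di = random.choices(range(len(destroy_ops)), weights=weights_d)[0]
        ri = random.choices(range(len(repair_ops)), weights=weights_r)[0]
        candidate = repair_ops[ri](destroy_ops[di](current))

        delta = cost(candidate) - cost(current)
        score = 0.0
        if cost(candidate) < cost(best):
            best = current = candidate
            score = 3.0                                   # new global best
        elif delta < 0 or random.random() < math.exp(-delta / temp):
            current = candidate
            score = 1.0                                   # accepted, possibly worse
        # Adaptive weights steer selection toward operators that keep paying off.
        weights_d[di] = max(0.1, (1 - reaction) * weights_d[di] + reaction * score)
        weights_r[ri] = max(0.1, (1 - reaction) * weights_r[ri] + reaction * score)
        temp *= cooling
    return best

# Micro-demo: reorder a sequence of numbers (a stand-in for a routing problem).
random.seed(0)
data = random.sample(range(10), 10)
best = alns(
    tuple(data),
    cost=lambda s: sum(abs(v - i) for i, v in enumerate(s)),
    destroy_ops=[lambda s: tuple(v for v in s if random.random() > 0.3)],
    repair_ops=[lambda partial: partial + tuple(v for v in range(10) if v not in partial)],
)
print(best, "cost:", sum(abs(v - i) for i, v in enumerate(best)))
```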
  • JIANG Huan, HAN Hua, HUANG Li, A. A. M. MUZAHID
    Accepted: 2025-09-01
    Deep learning models have become increasingly prevalent in practical applications, but they remain vulnerable to adversarial examples. In recent years, physical adversarial examples have emerged as a research hotspot. Most existing approaches focus on enhancing the attack strength and specificity of adversarial examples, yet they often overlook the commonalities across different models, as well as the generalizability and visual naturalness of adversarial samples. To address these limitations, this paper proposes a physical adversarial camouflage generation method based on color perception constraints, aiming to improve the transferability and naturalness of the camouflage. Specifically, the method first preprocesses a given 3D car model to generate multi-layer attention maps. Then, using derived binary masks, it separates attention regions across layers. For each connected subgraph, the corresponding pixel set is extracted from the texture and mapped to the printable color space. Subsequently, a joint loss combining attention and color constraints is optimized to generate the optimal adversarial camouflage. After processing all subgraphs, global consistency optimization is performed to eliminate abrupt boundaries and color discontinuities between subgraphs, thereby enhancing visual comfort. The proposed method is independent of specific model architectures, demonstrating strong cross-model transferability and practical potential. Extensive experiments show that the color-constrained physical adversarial camouflage method outperforms baseline approaches in both digital and physical environments.
  • Yin Xinyu, Li Wenxi, Xu Gang, He Sheng
    Accepted: 2025-09-01
    With the advancement of regional intelligentization, data-intensive and latency-sensitive services have proliferated. Although the introduction of edge computing alleviates pressure on existing regional dedicated networks, the increasingly stringent comprehensive requirements for metrics such as device energy consumption and latency necessitate research into higher-performance edge computing offloading strategies. To address these challenges, this paper proposes an energy-efficient collaborative task offloading model tailored for regional environments. The model holistically integrates task deadlines, queue backlog states, and bandwidth resource constraints while incorporating awareness of sudden channel condition variations. By establishing a cloud-edge-terminal tripartite collaborative framework, we jointly optimize multi-stage task completion latency, multi-user offloading ratios, and bandwidth allocation. Leveraging Lyapunov optimization, the long-term stochastic optimization problem is transformed into an online decision-making framework. An enhanced particle swarm optimization (PSO) algorithm is introduced to construct a Lyapunov-PSO hybrid optimization architecture, strengthening global exploration capability under non-convex constraints and achieving multi-objective collaborative optimization. Furthermore, a hierarchical threshold-mapping encoding method resolves mapping conflicts between discrete offloading decisions and continuous optimization spaces. Experimental results demonstrate that, compared to standalone heuristic algorithms and artificial intelligence methods, the proposed algorithm achieves holistic resource optimization configuration and further reduces energy consumption during task processing.
  • LIU Fengchun, HAN Hongshuai, ZHANG Chunying, MA Jiang
    Accepted: 2025-09-01
    Current thyroid nodule segmentation methods often lead to blurred boundaries or detail loss during image feature analysis, and the low quality and high noise of thyroid ultrasound images further hinder precise feature extraction. To address these issues, we propose FMVM-DFFT, a thyroid nodule ultrasound image segmentation network based on the latest visual state space model (VMamba) that integrates a factorized VSS and feature frequency-band separation. The network architecture has three key innovations: (1) by combining Factorization Machine (FM) and External Attention (EA), a factorized variant of VSS, namely FMVSS, is proposed to efficiently extract features from multiple dimensions of the input images and adaptively adjust fusion weights, enhancing the capture of critical information and local details; (2) a DFFT module with a dual-branch fast Fourier transform is designed to dynamically separate and finely extract high-frequency and low-frequency features of the encoder outputs, improving the network's frequency-domain perception, and is combined with Channel Attention (CA) to optimize feature selection and fusion for better detail capture; (3) a Laplacian operator-based edge optimization strategy combined with the novel BDELoss is proposed and applied during training to further enhance the network's ability to learn image edge regions. Comparative experiments on the TN3K and DDTI datasets show that FMVM-DFFT outperforms mainstream and recent segmentation networks, achieving DSC and IoU scores of 88.50% and 79.37% on TN3K, and 78.85% and 65.09% on DDTI, respectively.
  • JIAO Ruixuan, QIN Jia, QIN Pinle, ZENG Jianchao, CHAI Rui
    Accepted: 2025-09-01
    Deformable 3D medical image registration faces many challenges due to the complex and diverse morphological changes of human organs. Although various advanced registration models have demonstrated promising results, the limitation of the convolutional neural network’s fixed receptive field and kernel size restricts its ability to capture global contextual information during feature extraction. To address this issue, frequency domain information is incorporated into deformable 3D medical image registration, resulting in the Spatial-Frequency Deformable Registration Network (SFDR-Net). This network synergistically combines spatial and frequency domains with a dynamic gating mechanism to enhance the representation and interaction of multi-scale features. Specifically, given that the Fourier transform is sensitive to deformations while extracting high- and low-frequency information, it is integrated into deformable 3D medical image registration, leading to the development of an efficient Space-Frequency Dual-Domain Transformer Block (SFTB). The SFTB leverages the Fast Fourier Transform (FFT) to extract compact global structural information, which is then combined with multi-scale convolutions in the spatial domain to precisely estimate extensive deformations through the interaction of features at different granularities. Furthermore, a Dynamic Gating Fusion Module (DGFM) is employed to fuse and enhance multi-scale spatial-frequency optimized features, selectively incorporating them into subsequent deformation estimation stages to avert inaccuracies arising from the degradation of long-range feature information. SFDR-Net achieved average Dice scores of 64.33%, 81.89%, and 79.81% on the Mindboggle-101, OASIS, and IXI datasets, respectively, representing average improvements of 5.2%, 2.75%, and 2.34% compared to other state-of-the-art networks. This highlights its superior capability to integrate overall features with fine details and adaptively balance multi-scale deformation features for more precise registration across various deformation scenarios.
  • Gao Lei, Jiang Hailong, Min Fan, Yang Mei
    Accepted: 2025-08-29
    Surface waves in seismic data are typical coherent noise. They pose major denoising challenges due to their strong energy, complex propagation directions, and waveform similarity to valid signals. Existing deep learning methods often rely on deep network stacking or single-modal feature representation. While these approaches improve surface wave suppression, they suffer from insufficient multi-scale feature fusion and limited long-range dependency modeling, leading to structural blurring or low-frequency loss in valid signals. In this paper, we propose a Multi-scale Attention and Dilated Convolution Fusion Network (MA-DCNet) for surface wave suppression. MA-DCNet comprises four main modules: a direction-adaptive feature enhancement module (DAFEM), a multi-scale feature fusion module (MSFFM), a channel-localized attention module (CLAM), and a global context self-attention module (GCSAM). DAFEM employs multi-axis self-attention to adaptively enhance direction-critical information. MSFFM constructs multi-scale receptive fields through windmill convolution. CLAM integrates channel attention with depthwise separable convolution to preserve event continuity. GCSAM establishes full-trace-gather dependency relationships using global contextual attention to discriminate surface waves from valid signals. Experimental results demonstrate that MA-DCNet outperforms four state-of-the-art deep learning methods, achieving superior surface wave suppression while better preserving seismic signal integrity.
  • Zhao Ya, Zhu Wanzhen, Jia Di, Shan Kexin, Yao Wenda
    Accepted: 2025-08-29
    Although traditional reconstruction methods can effectively fit the global shape of the human face and the basic topological structure, they have certain limitations when capturing complex facial expression changes and high-frequency detail features. To solve this problem, this paper proposes a 3D face reconstruction method that integrates expression perception and detail enhancement, aiming to achieve high-fidelity reconstruction of the face model through semantic mapping of expression parameters and extraction of local high-frequency details. The expression perception module builds an expression encoder based on the EfficientViT network, combines the attention mechanism and the expression basis matrix to dynamically represent the facial geometric deformation under different expression changes, and designs the expression cross-entropy loss to optimize the discriminability of expression parameters and improve the accuracy of expression modeling. Secondly, the detail enhancement module adopts deformable convolutional networks to extract high-frequency texture features of the face, and fuses mask information and multi-scale semantic features to guide the detail reconstruction of the facial area. Meanwhile, the local detail consistency loss based on wavelet transform is introduced to constrain the detail features in different frequency domain subbands and enhance the expressiveness of facial details. The experimental results show that, compared with the existing reconstruction methods, the method proposed in this paper performs well in key indicators such as root mean square error (RMSE, 1.36) and normalized mean square error (NME, 3.04), verifying its outstanding performance in the accuracy of expression reconstruction and the ability to restore details. At the same time, it demonstrates strong robustness to extreme expressions and large pose head changes.
  • Lin Rongxin, Li Shuohao, Dong Liming, Hao Siqi
    Accepted: 2025-08-28
    With the exponential growth of information dissemination on social media platforms, false news detection has become a critical challenge in the field of information authenticity verification. Existing research methods focus on single-modal semantic analysis, which inadequately models cross-modal semantic contradictions in multimodal news, while also suffering from limited interpretability due to the absence of explainable auxiliary information. To address the above issues, this study proposes a multimodal fake news detection framework utilizing a large vision-language model. The framework introduces the following innovations: 1) utilization of the large vision-language model Qwen2.5-VL to reason over news content and generate multimodal description sets that enhance interpretability; 2) design of a multi-granularity co-attention mechanism to achieve cross-modal feature alignment across textual semantics, visual features, and auxiliary descriptions. We design news prompting templates to guide Qwen2.5-VL in extracting key objects, scene elements, and contextual semantic enhancements from news, thereby generating explainable auxiliary decision-making evidence. Based on co-attention layers, the multi-grained co-attention fusion mechanism employs hierarchical feature interactions to capture latent fake patterns in multimodal news within high-dimensional semantic spaces. This study conducted experiments on three multimodal fake news datasets Weibo, GossipCop and Pheme, and the experimental results showed that the accuracy rates reached 90.4%, 99.7% and 86.6% respectively.
  • ZHONG Zishan, TANG Jianhang, JIN Kebing, ZHANG Yang, DU Luole, YAO Hui
    Accepted: 2025-08-28
    Digital Twin (DT) technology builds virtual models that mirror real physical entities. To keep a DT updated as its physical counterpart changes, the physical entity should periodically send real-time status and information to the DT. In a digital twin system, the processing of real-time data and the storage and analysis of historical data correspond to the real-time digital twin (real-time DT) and the historical digital twin (historical DT), respectively. In edge computing, the information-interaction latency between virtual and physical devices is a key factor limiting the performance of real-time DT services. In practical deployments, critical edge nodes with heavy traffic face shortages of storage, bandwidth, and computing resources, while lightly loaded edge nodes leave resources idle. Service capability and available resources are distributed very unevenly across edge servers in different geographic locations, resulting in low overall resource utilization and difficulty in guaranteeing service quality. This study proposes a deep reinforcement learning-based joint deployment strategy that optimizes the edge placement of real-time DTs and historical DTs. The method establishes a joint edge placement model that accounts for the high timeliness of real-time data streams and the coupling between real-time and historical DTs, jointly considers their deployment costs, and formulates a deployment-time minimization problem; deep Q-learning is then used to balance resource allocation, latency optimization, and quality-of-service guarantees between real-time and historical DTs. For an incoming task request, variational approximation of mutual information is used to select the real-time DT and historical DT most relevant to the task and provide service for it. Simulation experiments show that the proposed deep reinforcement learning algorithm adaptively adjusts the resource allocation strategies of edge devices in a variety of scenarios, reducing virtual-physical information synchronization latency by 34% on average and improving memory utilization by 7% on average compared with baseline algorithms.
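    The deep Q-learning component rests on the standard Q-value update; a tabular sketch of that update with an illustrative placement environment (states, actions, and rewards here are placeholders, not the paper's state encoding or network) is:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 8, 4          # e.g., load levels x candidate edge nodes (illustrative)
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(state, action):
    """Placeholder environment: random next state, reward = negative synthetic latency."""
    next_state = rng.integers(n_states)
    reward = -float(rng.random() + 0.1 * action)   # pretend some placements cost more
    return next_state, reward

state = 0
for _ in range(5000):
    # Epsilon-greedy action selection over candidate DT placements.
    action = rng.integers(n_actions) if rng.random() < epsilon else int(Q[state].argmax())
    next_state, reward = step(state, action)
    # Standard Q-learning temporal-difference update.
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print(Q.round(2))
```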
  • Liu Yujie, Wang Yiwen
    Accepted: 2025-08-27
    Micro-gestures are subtle, unconscious movements driven by internal emotions, with significant value in affective computing. Due to their transient nature in time and subtle, ambiguous patterns in space, they are difficult to capture using traditional methods. This paper proposes a multi-modal collaborative framework for micro-gesture recognition by integrating video, skeleton, and text as complementary representations. The skeleton modality is introduced as a kinematic prior to bridge visual and semantic gaps. Two collaborative modules are designed: the Video-Pose Collaborative Module (VPCM), which fuses visual details with global motion features and uses cross-temporal attention to enhance temporal modeling, and the Text-Pose Collaborative Module (TPCM), which leverages semantic priors through a Top-K fusion strategy to enhance skeleton-text alignment. A two-stage training strategy was adopted, pre-training unimodal encoders before collaborative learning with lightweight adapters. Experiments show the proposed method achieves 70.40% accuracy, outperforming existing approaches.
  • HAO Jinlong, ZHANG Zhen, LI Xiuhua, ZENG Hushuang, HUANG Hepeng, CAI Chunmao
    Accepted: 2025-08-27
    To address the challenge of balancing high accuracy with low parameter count and computational complexity in multi-scale object detection for resource-constrained edge scenarios, this paper proposes LMS-YOLO, a lightweight multi-scale detection method based on YOLOv8. The approach introduces four key innovations: a lightweight channel-spatial attention module integrated into the backbone's CSP blocks that combines efficient channel attention with multi-scale depthwise separable low-rank convolutions for dual-dimensional feature enhancement; a cross-layer adaptive weighted fusion module establishing skip connections to dynamically integrate shallow detail features with deep semantic information; replacement of standard bottlenecks with generalized inverted bottlenecks in the neck network to reduce computation while maintaining accuracy; and a novel focal scale-aware dynamic IoU loss that adaptively adjusts error penalties based on target scale and detection difficulty. Comprehensive evaluations on the BDD100K dataset demonstrate that LMS-YOLO-m achieves superior performance compared to YOLOv8m, with 0.5% and 0.1% improvements in mAP@50 and mAP respectively, while reducing parameters by 2.4% and computation by 5.8%, making it particularly suitable for deployment in resource-constrained edge computing environments where efficiency and accuracy must be carefully balanced.
  • LI Bowen, TAN Tai, LI Jie, ZHANG Jianwei, ZHANG Xiangrui
    Accepted: 2025-08-27
    The six degrees of freedom (6-DOF) unmanned aerial vehicle (UAV) air combat scenario is highly challenging, involving high-dimensional continuous state and action space, as well as nonlinear dynamics. To address the difficulties of decision-making in such scenarios, this paper proposes a Progressive Multi-objective Strategy Optimization (PMSO) algorithm, which enhances policy learning performance by dynamically adjusting the granularity of the action space and incorporating multi-objective reward functions. To overcome the challenges caused by the high dimensionality of the continuous action space and the excessively large search space, which often result in difficulty in decision-making or the failure to learn effective strategies, this paper designs a progressive discretization mechanism. In the initial stage, coarse-grained discrete action commands are adopted to facilitate rapid exploration of the strategy space, leveraging the local similarity in the control effectiveness of action commands to reduce the search space. As training iterations progress and task difficulty increases, the degree of discretization gradually decreases, thereby preserving the precision of action control. To address the sparse reward problem prevalent in air combat tasks, multi-objective reward functions are designed, incorporating objectives such as angle, distance and altitude. The coordination of these reward functions guides the algorithm to better understand the impact of current action commands on the overall air combat task, accelerating convergence. Simulation experiments in randomized air combat scenarios, including advantageous, neutral, and disadvantaged situations, demonstrate that the proposed PMSO algorithm achieves rapid convergence and learns effective air combat strategies. The convergence speed and the performance of the learned strategies outperform existing air combat algorithms.
  • WANG Yunhan, HU Yabing, CHEN Yujie, LIU Ying
    Accepted: 2025-08-27
    During the execution of tasks, identifying potential collision risks and taking necessary maneuvering measures are critical to ensuring the safe flight of Unmanned Aerial Vehicles (UAVs). To address the challenges of obstacle avoidance in external environments and inter-UAV collision avoidance among multiple drones, this paper proposes a cooperative collision avoidance algorithm based on an adaptive artificial potential field (APF) method. Firstly, the proposed algorithm considers time and distance factors for conflict detection and introduces an adaptive conflict detection coefficient to reduce unnecessary collision avoidance maneuvers. Then, an adaptive method for adjusting the repulsive force gain coefficient is proposed to prevent collisions caused by maneuverability limitations and improper initial settings. Moreover, to reduce redundant actions during collision avoidance between UAVs, a priority-based conflict resolution strategy is developed. In addition, based on the UAV kinematic model, errors caused by communication delay and packet loss are compensated by predicting neighboring UAVs' positions using the most recent information. Considering maneuverability constraints, the proposed algorithm outperforms the traditional APF by effectively avoiding collisions with a small repulsive gain coefficient and reducing the total path length by about 1.76%. When the data link latency is under 200 ms and packet loss is below 50%, the proposed method shows good performance in avoiding collisions.
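    The attractive/repulsive force construction of the artificial potential field method is standard; a minimal 2-D sketch with fixed placeholder gains (the paper's adaptive coefficients and priority strategy are not reproduced) is:

```python
import numpy as np

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=2.0, d0=3.0):
    """Classic artificial potential field: attraction to the goal plus repulsion
    from obstacles that lie within the influence radius d0."""
    force = k_att * (goal - pos)                      # attractive component
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        if 1e-6 < d < d0:
            # Repulsion grows as the UAV approaches the obstacle.
            force += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)
    return force

# Toy rollout: integrate the force as a velocity command.
pos = np.array([0.0, 0.0])
goal = np.array([10.0, 10.0])
obstacles = [np.array([5.0, 5.2])]
for _ in range(200):
    pos = pos + 0.05 * apf_force(pos, goal, obstacles)
print(pos.round(2))
```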
  • LI Xuexiang, ZHENG Yongli, ZHANG Yize, DUAN Pengsong
    Accepted: 2025-08-27
    With the popularization of the Internet and the diversification of applications, fine-grained classification of massive network traffic has become key to optimizing quality of service and analyzing user behavior patterns. This paper surveys machine learning-based and pre-trained model-based network traffic analysis methods, aiming to promote further research in this field through multi-dimensional comparison and analysis. First, the complete traffic classification workflow is analyzed, covering data acquisition, preprocessing, and feature extraction, and the practical value of data balancing techniques is examined. The data formats, scales, and applicable scenarios of mainstream public datasets are introduced and compared from multiple perspectives, pointing out their problems with data distribution, feature redundancy, and timeliness. Second, the limitations of traditional algorithms in high-dimensional data processing and real-time performance are summarized at the methodological level, and trends in applying pre-trained model technology to traffic analysis are summarized through comparative analysis of experimental results, including Transformer-based pre-trained models such as BERT, models that fuse large models with deep learning, and breakthroughs of lightweight large models in traffic classification. Finally, in light of current research trends, the opportunities and challenges of applying pre-trained models in the future are discussed, their limitations regarding computational cost and privacy protection are analyzed, and future research directions and prospects are proposed.
  • Haifeng Zhu, Changyan Yi, Hao Wu, Hao Zheng, Xingan Dai, Kun Zuo, Youhua Gu
    Accepted: 2025-08-27
    The aerospace servo system, due to its unique operating environment, faces challenges when driving loads with high-order nonlinear motion characteristics using permanent magnet synchronous motors. In this scenario, inaccurate load position feedback causes traditional closed-loop feedback-based control algorithms, such as PID three-loop control, to exhibit low tracking accuracy and insufficient command adaptability. To address these issues, a dual-delay deep deterministic policy gradient algorithm is employed to train a reinforcement learning agent. This agent fine-tunes the motor position feedback that approximates the load position in the position loop, overcoming the accuracy loss caused by the semi-closed loop and enhancing the controller's performance across multiple tasks. The policy model of the agent is then made lightweight and deployed on a TMS320C6713B DSP to verify its real-time operation. Experimental results show that the proposed deep reinforcement learning-based optimization achieves a 2.07% improvement in load position testing and a 59% improvement in speed testing compared to the comparative control algorithms; in load frequency characteristic testing, it generally outperforms the comparative control scheme and can be deployed on edge controllers with limited computing power to achieve real-time control.
  • ZENG Yi, GAO Yan, SHI Xianhui, GUO Xincheng
    Accepted: 2025-08-27
    Handling heterogeneous faults in power grids faces challenges such as difficult identification of overlapping triples and inefficient multi-modal feature fusion. For heterogeneous fault information, the representation differences between data types and the difficulty of mining correlations increase the complexity of problem-solving. Therefore, this work proposes a joint optimization framework based on adversarial training and adaptive relation weights (Heterogeneous Knowledge Graph with Adaptive-Weight Graph Convolutional Networks, HKG-AWGCN). First, a domain ontology for the power grid is constructed: five types of entities and eight types of relations are defined, and standardized entity-relation mapping rules are established. In the knowledge extraction stage, a multi-stage adversarial training mechanism is designed: after extracting basic triples with the (BERT-BiLSTM)ATT-CRF model, FGM adversarial perturbations are injected into the CRF layer to optimize entity boundary recognition, and a relation-aware attention module is adopted to resolve conflicts among overlapping relation paths. In the knowledge optimization stage, an adaptive-weight heterogeneous graph convolutional network is proposed: multi-modal sub-graph features of fault propagation are aggregated through relation weights constrained by electrical parameters, and a joint loss function is designed to optimize node embeddings and topological structure synchronously. This work conducts comparative experiments on sequential modeling, graph-structured data processing, and multi-modal feature fusion. Compared with eight baseline models such as BiLSTM-CRF and GraphTransformer, HKG-AWGCN reaches 96.07% accuracy, 95.58% recall, and 95.15% F1-score, providing interpretable decision-making support for power grid fault handling.
  • Feng Guoping, Wang Haiji, Hong Liang, Fang Jialiang
    Accepted: 2025-08-21
    The dissemination of misinformation on social media poses severe threats to public safety. Existing external-knowledge-enhanced detection methods often suffer from performance limitations due to knowledge redundancy and noise interference. This paper introduces a key feature extraction method based on Conditional Optimal Transport (COT), which uses the global semantics of the raw text as a prior condition to minimize the conditional Kantorovich-Rubinstein (KR) distance between external knowledge generated by Large Language Models (LLMs) and the original texts, thereby extracting critical features from the external knowledge. Furthermore, we design a spatial-sequential mapping module that explicitly models textual positional information to preserve structural features, while integrating cross-attention mechanisms and cosine similarity to dynamically weight the external knowledge for adaptive fusion. Experiments on the public Weibo and GossipCop datasets demonstrate that the proposed knowledge-enhanced detection method outperforms the best baseline models by 3.1% and 1.3% in F1-score, respectively. Ablation studies confirm the effectiveness of both the COT module and the spatial-sequential mapping module. In addition, parameter sensitivity analysis shows that the model remains stable under hyperparameter fluctuations (F1-score variation below ±0.015), indicating strong robustness. This study provides a new theoretical paradigm and technical pathway for knowledge-enhanced misinformation detection.
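    To make the fusion step concrete, here is a minimal sketch (not the authors' code) of weighting LLM-generated knowledge by cross-attention and a cosine-similarity gate before adding it to the text representation; the dimensions and gating form are assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeFusion(nn.Module):
    """Cross-attention over knowledge tokens, gated by cosine similarity
    between the text summary and the attended knowledge (illustrative only)."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_tokens, knowledge_tokens):
        # text attends to external knowledge
        attended, _ = self.attn(text_tokens, knowledge_tokens, knowledge_tokens)
        text_vec = text_tokens.mean(dim=1)          # global text semantics
        know_vec = attended.mean(dim=1)
        gate = F.cosine_similarity(text_vec, know_vec, dim=-1)   # in [-1, 1]
        gate = gate.clamp(min=0).unsqueeze(-1).unsqueeze(-1)     # keep only supportive knowledge
        return text_tokens + gate * attended

fusion = KnowledgeFusion()
out = fusion(torch.randn(2, 32, 256), torch.randn(2, 48, 256))
```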
  • YANG Yuan , ZHANG En , LI Gongli
    Accepted: 2025-08-21
    To address the problems that users' private data may be leaked during model aggregation in federated learning, and that the server may tamper with the aggregation result for illicit benefit, an efficient and verifiable privacy-preserving federated learning scheme based on game theory under a dual-server architecture is proposed. First, a single-mask scheme based on a homomorphic pseudorandom generator protects data privacy, and the Shamir (t, n) threshold secret sharing scheme distributes and reconstructs the mask, so that the scheme preserves privacy while tolerating user dropouts caused by network instability. Second, a lightweight verification method based on the Hadamard product is constructed, so that users only need simple vector product operations to verify the correctness of the aggregation result, reducing the computational overhead of verification. Finally, the prisoner contract and betrayal contract from game theory are introduced, and incentive strategies discourage the two servers from colluding, solving the server collusion problem of the dual-server architecture and ensuring the security of user privacy and the credibility of the global model. Experimental results show that the proposed scheme aggregates gradients securely without affecting model accuracy and, compared with existing schemes, improves computational and communication efficiency, with the advantage becoming more pronounced when users drop out.
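    For readers unfamiliar with the building block, the following is a minimal Shamir (t, n) secret sharing sketch over a prime field (the textbook construction, not the paper's implementation); in the scheme above it would be applied to the mask seed rather than the toy integer used here.
```python
import random

PRIME = 2**61 - 1   # a large prime field (chosen only for illustration)

def share(secret, t, n):
    """Split `secret` into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

shares = share(123456789, t=3, n=5)
assert reconstruct(shares[:3]) == 123456789   # any 3 of the 5 shares suffice
```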
  • LIN Mingsheng, SHEN Liwei, DONG Zhen
    Accepted: 2025-08-21
    In home service scenarios, multi-robot systems need to process natural language instructions issued by non-professional users, which places higher demands on automated task scheduling. To address the shortcomings of existing multi-robot scheduling methods in task understanding, dependency management, and resource optimization, this study proposes the DAG-LLM scheduling method to achieve full-process automation from natural language input to multi-robot collaboration. The method first uses a Large Language Model (LLM) combined with environmental information for semantic parsing and task decomposition, generating a set of subtasks with execution constraints through the Chain-of-Abstraction (CoA) mechanism; it then automatically constructs a Directed Acyclic Graph (DAG) over the subtasks with the LLM, replacing traditional manual modeling and accurately representing task dependencies; finally, a backtracking algorithm matches robot skills to subtask requirements, an asynchronous execution strategy improves execution efficiency, and dynamic scheduling reduces waiting time while preserving dependency order. To verify the effectiveness of the method, three types of household tasks with different complexities (covering four scenario sets) were designed in the AI2-THOR simulation environment for comparative experiments. Experimental data show that DAG-LLM improves the task success rate by 43.3% over SMART-LLM and 60.0% over AutoTAMP, and shortens the running time by 32.8% and 39.4%, respectively. Ablation experiments further demonstrate that task dependency modeling and asynchronous execution play a crucial role in system performance. The method requires no manual involvement in task decomposition or dependency modeling, and is suitable for efficient collaborative scheduling of multi-robot agents in natural-language-driven application scenarios such as homes.
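    A minimal illustration of DAG-based dispatching with asynchronous execution (the task set, robot names, and unit-time execution model are hypothetical; this is not the paper's scheduler): subtasks become ready as soon as all of their predecessors finish, so independent subtasks can run on different robots in parallel.
```python
from collections import defaultdict, deque

def schedule(deps, robots):
    """deps: {task: [prerequisite tasks]}. Greedily dispatch every ready task
    to an idle robot; here 'execution' is simulated in whole time steps."""
    indeg = {t: len(p) for t, p in deps.items()}
    children = defaultdict(list)
    for t, ps in deps.items():
        for p in ps:
            children[p].append(t)
    ready = deque(t for t, d in indeg.items() if d == 0)
    t_step, plan = 0, []
    while ready:
        batch = [ready.popleft() for _ in range(min(len(ready), len(robots)))]
        plan.append((t_step, list(zip(robots, batch))))
        for task in batch:                      # all tasks in the batch finish this step
            for c in children[task]:
                indeg[c] -= 1
                if indeg[c] == 0:
                    ready.append(c)
        t_step += 1
    return plan

deps = {"find_cup": [], "wash_cup": ["find_cup"], "boil_water": [], "make_tea": ["wash_cup", "boil_water"]}
for step, assignments in schedule(deps, robots=["robot_A", "robot_B"]):
    print(step, assignments)
```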
  • Sun Haifeng, Yao Junping, Li Xiaojun, †, Liu Yanfei, Gu Hongyang
    Accepted: 2025-08-21
    Short-term action anticipation, a crucial task in video understanding, aims to model the spatiotemporal and semantic features of historical actions to infer behavioral intentions and goals from observed physical motions, enabling precise prediction of interactive behaviors within the next few seconds. It has broad application prospects in human-machine collaboration, security surveillance, autonomous driving, and augmented reality. In recent years, with breakthroughs in deep learning, particularly in feature extraction models and high-quality datasets for video understanding, short-term action anticipation has transitioned from knowledge-driven machine learning paradigms to data-driven deep learning frameworks. This survey systematically reviews the latest advances in deep learning methods for short-term action anticipation, aiming to provide references and insights for related research and practical applications. The analysis establishes a classification framework along three dimensions: model architecture innovation, training strategy, and contextual modeling. It examines core technologies and challenges, and details the characteristics, applicable scenarios, and research progress of each category of methods. Finally, potential future research directions are summarized, including multi-view collaborative prediction, real-time model inference validation, weakly supervised learning from untrimmed data, few-shot class-incremental learning, dynamic open-scene adaptation, and variable time interval anticipation.
  • Lai Guoyan, Chen Hui
    Accepted: 2025-08-20
    Aiming at the problem of data privacy protection in multilingual machine translation, a multilingual text-to-text transfer transformer (mT5) translation model incorporating a differential privacy mechanism is proposed, which protects user privacy while maintaining translation quality. First, gradient clipping is introduced during the fine-tuning phase to limit the norm of each sample's gradient, bounding the influence of any single sample on parameter updates, thereby reducing overall sensitivity and providing the theoretical basis for differential privacy. Second, Gaussian noise satisfying the differential privacy constraints is added to the clipped gradients, and perturbing the aggregated gradients enhances the model's resistance to membership inference attacks. Finally, based on differential privacy theory, the privacy budget is set and the number of training rounds and the noise intensity are adjusted to achieve an optimal trade-off between privacy protection and translation performance. Experiments were conducted on a standard multilingual translation dataset, and performance was evaluated with the Bilingual Evaluation Understudy (BLEU) metric. Ablation experiments further verify the synergy of the three techniques. The results show that translation quality degradation is kept between 9% and 28%, a reasonable range for practical applications. Comparison with other machine translation models on the same dataset shows that although the BLEU score drops by about 5% to 6% on average, the model's privacy protection is effectively improved while translation quality is maintained. In membership inference attack experiments, the attack success rate against a standard Transformer model is 78.3%, while against the differentially private mT5 model it drops to 52.4%, further demonstrating the model's advantage in privacy protection.
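    The clip-then-noise step described above is essentially the DP-SGD recipe; a minimal per-sample sketch is shown below (the clip norm, noise multiplier, and toy gradients are assumptions, not the paper's settings).
```python
import numpy as np

def dp_sgd_step(per_sample_grads, clip_norm=1.0, noise_multiplier=1.1, rng=np.random.default_rng(0)):
    """per_sample_grads: array of shape (batch, dim). Clip each sample's gradient
    to `clip_norm`, sum, add Gaussian noise scaled to the sensitivity, and average."""
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale                      # per-sample clipping
    summed = clipped.sum(axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return (summed + noise) / len(per_sample_grads)         # noisy average gradient

grads = np.random.default_rng(1).normal(size=(8, 4))        # toy per-sample gradients
print(dp_sgd_step(grads))
```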
  • Wang ShanShan, Du CunPeng, Wang XingTong, Ma Hao, Chen ZhenXiang, Yang Bo
    Accepted: 2025-08-15
    Large Language Models (LLMs) have demonstrated remarkable performance across various domains. However, their security vulnerabilities, particularly with respect to jailbreak attacks, have raised significant concerns. In this work, we propose a novel and more practical indirect jailbreak attack method, named Self-Retrieval Induced Self-Jailbreak (SRIS). It retrieves knowledge from within the LLM itself, using internally generated information to induce the model to produce harmful responses. Our approach does not rely on external knowledge, making the jailbreak attack more feasible and easier to execute. We conducted extensive experiments on seven state-of-the-art LLMs. The results show that SRIS significantly outperforms existing methods in attack success rate, with a maximum success rate of 74.76% on GPT-3.5 and 56.8% on GPT-4. In most question domains, SRIS significantly outperforms other methods, highlighting its robustness and broad applicability. Our findings emphasize the importance of careful training data selection for LLMs, and we advocate further research into safer development practices to improve the overall security and reliability of LLMs in practical applications.
  • Huan Zhan, Wang Yi, Wang Cheng , Wang WenTao , Lin Zhi Quan
    Accepted: 2025-08-15
    With the rapid development of Industrial Internet of Things (IIoT) technology, the growing number of smart devices imposes substantial computational and transmission burdens on traditional Ethernet networks. Moreover, ensuring deterministic transmission remains a significant challenge due to the heterogeneity of industrial tasks, which vary in type, priority, and computational resource requirements. To address these challenges, integrating edge computing (EC) with time-sensitive networking (TSN) can better meet the requirements of task offloading and resource allocation in complex industrial networks. Specifically, a MAPPO-Conv strategy is proposed to minimize the average total delay and energy consumption while accounting for critical factors such as task deadlines, device computational power, maximum transmit power, and energy constraints. The algorithm builds on Multi-Agent Proximal Policy Optimization (MAPPO) with reward functions tailored to tasks of multiple priorities, enabling devices to make real-time, efficient power control and offloading decisions within each time slot. Furthermore, a convex optimization approach allocates computational resources and optimizes energy consumption according to task requirements. Experimental results show that the proposed scheme offers clear advantages in latency and energy consumption and ensures real-time processing of high-priority tasks in a resource-constrained network environment; in a simulation with 30 terminal devices, compared with other multi-agent reinforcement learning algorithms (IPPO, MADDPG) and benchmark algorithms, latency is reduced by an average of 25.9% and energy consumption by an average of 18.9%.
  • Cao Tianya, Li Kang, Jia Junjie
    Accepted: 2025-08-15
    Collaborative filtering recommendation systems, due to their open architecture, face the issue of malicious users manipulating the model through shilling attacks to disrupt recommendation accuracy. Existing detection methods extract features from ratings based on the characteristics of different attack models. However, due to the limited range of the ratings themselves, the extracted features lack sufficient distinctiveness. Additionally, existing methods often overlook interaction information among users, resulting in incomplete detection features. To address these limitations, we propose a novel shilling attack detection algorithm based on weighted rating and weighted interaction. The algorithm employs item popularity to weight ratings and transforms them into grayscale image representations, enabling convolutional neural networks to extract more distinctive rating features. Simultaneously, it utilizes user similarity to weight interaction data, with fully connected neural networks effectively capturing inter-user relational features. Finally, feature fusion is performed through a neural network to obtain more comprehensive user features, thereby enhancing the performance of the classifier in shilling attack detection. Experimental results on the MovieLens-100k dataset demonstrate that the algorithm can effectively and stably detect various attack models under different filler rates and attack intensities. Results on the MovieLens-1M dataset show that the algorithm can handle more complex scenarios where four attack models coexist, exhibiting strong robustness.
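    To illustrate the rating-to-image idea (a sketch under assumed shapes and scaling, not the authors' pipeline): each user's ratings are weighted by item popularity, rescaled to 0-255, and reshaped into a square grayscale image that a CNN can consume.
```python
import numpy as np

def user_rating_image(ratings, popularity, side=32):
    """ratings, popularity: 1-D arrays over items (0 = unrated).
    Returns a (side, side) uint8 grayscale image of popularity-weighted ratings."""
    weighted = ratings * popularity                     # emphasize popular items
    img = np.zeros(side * side, dtype=np.float32)
    n = min(len(weighted), side * side)
    img[:n] = weighted[:n]
    if img.max() > 0:
        img = img / img.max() * 255.0                   # rescale to 0-255
    return img.reshape(side, side).astype(np.uint8)

rng = np.random.default_rng(0)
ratings = rng.integers(0, 6, size=1000).astype(np.float32)      # 0-5 stars
popularity = rng.random(1000).astype(np.float32)                # item popularity in [0, 1)
image = user_rating_image(ratings, popularity)
print(image.shape, image.dtype)                                  # (32, 32) uint8
```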
  • ZHENG Yang, WANG Lei, SHENG Jie
    Accepted: 2025-08-13
    Multimodal sentiment analysis utilizes multimodal data to infer human emotions. However, existing models significantly degrade in performance when faced with issues such as modal information loss, text dependency, and cross-modal conflicts. To address this, a Generative Completion and Dynamic Knowledge Fusion Model (GC-DKF) for multimodal sentiment analysis is proposed. First, the model employs a generative prompt learning module to complete the missing intra-modal and inter-modal information in the original data, generating missing modal features to enhance the model's adaptability to uncertain modal scenarios. Then, a dynamic dominant modality selection mechanism is designed to dynamically select the dominant modality based on emotional proportion factors. Meanwhile, a knowledge encoder is introduced to strengthen the representation capability of a single modality, obtaining knowledge-enhanced representations of each modality. Finally, guided by the features of the dominant modality, the model further learns other secondary modalities to generate complementary multimodal fusion joint representations, achieving more efficient and accurate multimodal sentiment analysis. Experiments on the public CMU-MOSI and CMU-MOSEI datasets demonstrate that the proposed model outperforms existing mainstream multimodal sentiment recognition methods in terms of metrics such as binary classification accuracy, F1 score, mean absolute error, and Pearson correlation coefficient, with sentiment recognition accuracies reaching as high as 83.55% and 83.02%, respectively. This fully demonstrates that the proposed model has strong competitiveness in multimodal sentiment recognition tasks.
  • YANG Longfei, LAI Huicheng, DU Haohao, ZHANG Guo
    Accepted: 2025-08-12
    Aiming at the problems of slow convergence and single-strategy search of the Sine-Cosine Algorithm (SCA) in optimization problems, as well as the zig-zag paths and oscillation that arise in dynamic obstacle avoidance during local dynamic path planning, this paper proposes a Multi-Strategy Adaptive Network Sine-Cosine Algorithm (MANSCA). First, by introducing differential, global update, and local update strategies, a multi-strategy adaptive network is constructed, and the strategy weights are dynamically adjusted via a roulette wheel selection mechanism to enhance the algorithm's global exploration and local exploitation abilities. Second, for path planning, a goal-oriented strategy is proposed: the gravitational potential field function is improved to balance the attraction of the target point and reduce path oscillation. At the same time, a dynamic obstacle avoidance strategy is designed in which the robot's avoidance direction is adjusted according to the moving direction of obstacles, avoiding the direction-missing problem of the traditional node-deletion method. The effectiveness of MANSCA is verified on the CEC2015 and CEC2022 benchmark function suites, where it is competitive with other recent metaheuristics. When applied to multi-robot local path planning in complex environments with static and dynamic obstacles, the proposed algorithm reduces the total travel distance and the maximum node number by about 62.6% and 63%, respectively, compared with SCA.
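    As a small illustration of adaptive strategy selection by roulette wheel (a generic sketch, not the MANSCA implementation; the weight update rule and the stand-in objective are assumptions), a strategy is sampled in proportion to its weight, and its weight is nudged up whenever it improves the best fitness.
```python
import random

def roulette_select(weights):
    """Pick an index with probability proportional to its weight."""
    r = random.uniform(0, sum(weights))
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

strategies = ["differential", "global_update", "local_update"]
weights = [1.0, 1.0, 1.0]
best = float("inf")
for it in range(100):
    i = roulette_select(weights)
    candidate = random.gauss(0, 1) ** 2          # stand-in for one optimization step
    if candidate < best:                         # reward strategies that improve the best solution
        best = candidate
        weights[i] += 0.1
    else:
        weights[i] = max(0.1, weights[i] * 0.99) # gently decay unproductive strategies
print(best, dict(zip(strategies, [round(w, 2) for w in weights])))
```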
  • WANG Xiangqian, PAN Shiwei, LV Yafei, XIANG Yunlong, JING ku
    Accepted: 2025-08-12
    Efficiency optimization for deep learning models remains a key research focus in artificial intelligence applications. When deploying deep learning models, efficiency gains can be achieved by reducing operator scheduling overhead and improving operator execution efficiency. This paper targets the Bi-directional Long Short-Term Memory (Bi-LSTM) structure widely used in temporal networks. Leveraging the input reuse between the forward and reverse Long Short-Term Memory (LSTM) cells in this architecture, the study proposes an efficiency optimization method for Bi-LSTM operators on LUNA chips based on operator fusion and tensor computation consolidation. The method improves the execution efficiency of the Bi-LSTM operator by eliminating redundant operations, reusing data, and merging tensor computations to reduce time overhead, and it also extends to other temporal network operators such as Bi-RNN and Bi-GRU. An experimental platform is established on the edge heterogeneous computing chip LUNA to validate the optimization. Test results show that the proposed Bi-LSTM efficiency optimization achieves a maximum improvement of 30%.
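    One way to picture the tensor consolidation (a simplified numpy sketch under assumed shapes, not the LUNA kernel): because the forward and backward cells read the same input sequence, their input-to-gate projections can be merged into a single larger matrix multiplication.
```python
import numpy as np

rng = np.random.default_rng(0)
T, D, H = 16, 32, 64                       # time steps, input size, hidden size (assumed)
x = rng.standard_normal((T, D))
W_fwd = rng.standard_normal((D, 4 * H))    # input-to-gate weights, forward cell
W_bwd = rng.standard_normal((D, 4 * H))    # input-to-gate weights, backward cell

# Unfused: two separate matmuls over the same input sequence.
pre_fwd = x @ W_fwd
pre_bwd = x @ W_bwd

# Fused: the shared input is multiplied once against the concatenated weights,
# then the result is split back into forward/backward pre-activations.
pre_both = x @ np.concatenate([W_fwd, W_bwd], axis=1)
assert np.allclose(pre_both[:, :4 * H], pre_fwd) and np.allclose(pre_both[:, 4 * H:], pre_bwd)
```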
  • GUO Hui, DING Chuntao, ZHANG Junna
    Accepted: 2025-08-12
    With the widespread application of Internet of Things devices, efficiently deploying robust Convolutional Neural Networks (CNNs) on resource-constrained IoT devices has become a significant challenge. Existing methods that rely on cloud servers to assist CNN training reduce the parameter transmission between the cloud and devices, but they neither reduce inference computation nor offer strong robustness. To address this, a group-based method for generating filters using multiple nonlinear transformation functions (GroupMNL) is proposed. The method first randomly generates a small number of standard filters in each convolutional layer as seed filters. These seed filters are grouped, and different nonlinear transformation functions are applied to different groups to generate additional filters; because the generated filters are not learned, the number of trainable parameters in the CNN is reduced. The seed filters and generated filters are then concatenated to form a complete convolutional layer, and group convolution is used during the convolution operation to reduce computational cost. Finally, to further enhance robustness, group normalization is introduced and combined with the multiple nonlinear transformations to strengthen the model's regularization. Experimental results show that, compared with the standard model, a ResNet101 based on GroupMNL reduces learnable parameters by 87%, lowers computational cost by 71%, and improves robustness by 6.09%.
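    A rough sketch of the seed-filter idea follows (the transforms, group count, and shapes are illustrative assumptions, not the paper's configuration): a few learnable seed filters are expanded by fixed nonlinear transforms into extra, non-trainable filters, and the concatenated filter bank is used with group convolution.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupSeedConv(nn.Module):
    """A few learnable seed filters are expanded by fixed (non-trainable)
    nonlinear transforms; convolution then runs over the concatenated bank."""
    def __init__(self, in_ch=16, seed_out=8, k=3, groups=2):
        super().__init__()
        self.seed = nn.Parameter(torch.randn(seed_out, in_ch // groups, k, k) * 0.1)
        self.transforms = [torch.tanh, torch.sin, lambda w: F.softshrink(w, 0.05)]
        self.groups = groups

    def forward(self, x):
        generated = [t(self.seed).detach() for t in self.transforms]   # no gradients flow to these
        weight = torch.cat([self.seed] + generated, dim=0)             # (4*seed_out, in_ch//groups, k, k)
        return F.conv2d(x, weight, padding=1, groups=self.groups)

layer = GroupSeedConv()
y = layer(torch.randn(2, 16, 32, 32))
print(y.shape)   # torch.Size([2, 32, 32, 32])
```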
  • LIN Jiarong, LIU Li
    Accepted: 2025-07-31
    Multi-object tracking (MOT) faces numerous challenges in the field of computer vision, such as target occlusion and appearance similarity, which significantly constrain tracking accuracy and robustness. To address these issues, a new multi-object tracking method, TBSTrack, is proposed. The method consists of three core modules: temporal prediction, feature extraction, and stage-wise matching. The temporal prediction module constructs a temporal information buffer and uses a self-attention mechanism to compute predictions for the current frame, enhancing the spatiotemporal association of targets and accurately predicting their positions. The feature extraction module handles occluded targets through segmentation, employing a convolutional neural network (CNN) to extract features from each segment and then merging them according to the occlusion status, effectively eliminating interference and enabling robust target feature representation. The stage-wise matching module adopts a two-stage matching strategy, using learnable anchors to recover missed targets during matching and to mine potential targets from the background; the final tracking results are obtained by integrating both and updating the temporal information. To evaluate performance, experiments are conducted on the MOT17, DanceTrack, and SportsMOT datasets. The results show that the method achieves HOTA scores of 63.9%, 57.3%, and 75.6%, and IDF1 scores of 79.6%, 56.7%, and 78.8%, respectively. These results demonstrate that the method significantly improves the accuracy and robustness of multi-object tracking, especially in complex scenarios, providing an effective solution for multi-object tracking.
  • FENG Guang, XIANG Feng, HUANG Rongcan, ZHOU Yuanhua, ZHENG Runting, YANG Yanru, LIU Tianxiang, LI Weichen
    Accepted: 2025-07-31
    In multimodal sentiment analysis, traditional methods rely on directly fusing multimodal information, while modality-specific private features are often overlooked in cross-modal interactions. This may reduce accuracy and robustness in handling complex sentiment expressions. Particularly in smart education scenarios, teachers need to accurately assess students' learning states and emotional fluctuations by analyzing their speech, facial expressions, and textual feedback; thus, enhancing the precision of multimodal sentiment analysis is crucial for personalized learning and classroom interaction. To address this issue, this study proposes a sentiment analysis model that integrates private feature learning and contrastive learning. First, to fully leverage private features, the model compares shared features with the original text, audio, and visual features to identify modality-specific information that is often overlooked in cross-modal interactions. The private features are then fused with the shared features to enhance the model's expressive capability. Second, a Modality-Agnostic Contrastive Loss (MACL) is introduced to perform contrastive learning on the fused multimodal features, effectively capturing sentiment information from different modalities while mitigating cross-modal discrepancies to obtain a unified sentiment representation. Experimental results on the CMU-MOSI and CMU-MOSEI datasets show that the proposed model achieves F1 scores of 85.98% and 85.95%, with binary classification accuracy reaching 86.01% and 85.97%, respectively, a significant improvement over state-of-the-art models that validates the effectiveness of the proposed approach.
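    As context for the contrastive component, below is a generic NT-Xent-style contrastive loss over fused multimodal embeddings (the temperature and pairing scheme are assumptions; this is an illustration of contrastive learning in general, not the paper's MACL definition).
```python
import torch
import torch.nn.functional as F

def ntxent_loss(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two views of the same samples.
    Matching rows are positives; all other rows act as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # (batch, batch) cosine similarities
    labels = torch.arange(z1.size(0))           # positive pair sits on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

loss = ntxent_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```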
  • Wang Keke, Yan Nannan
    Accepted: 2025-07-31
    Due to its tamper-proof and traceable characteristics, blockchain provides transparency and security guarantees for data storage and transactions, and it is widely applicable in fields such as finance, livelihood services, and public administration. As blockchain technology penetrates diverse application scenarios, its underlying architecture must cope with the dual pressures of massive data storage and high-frequency transaction processing. To overcome the scalability bottleneck of blockchain systems, sharding technology has emerged, which effectively improves overall system throughput through distributed parallel processing and has been validated in Ethereum 2.0. However, current blockchain network sharding still suffers from frequent cross-shard transactions, high reconfiguration costs, and uneven load between shards. To address these issues, this article proposes a dynamic blockchain network sharding strategy based on incremental graph partitioning. Incremental graph partitioning is used to partition the blockchain transaction graph, and the shards are adjusted promptly as nodes in the network change: when a new node joins the transaction graph, only the newly added part is processed rather than re-partitioning from scratch, which reduces the proportion of cross-shard transactions and adapts better to the dynamics of the blockchain network. A Dynamic Sharding Load Optimization algorithm (DSLO) is further proposed to compute the transaction weights on each shard, and a frequency-time counter (LFU-TTL) is introduced into the algorithm to predict shard load from transaction frequency and survival time, optimizing load balancing. Experimental results show that this scheme effectively reduces the proportion of cross-shard transactions and the data replication cost during shard reconfiguration, and improves load balancing between shards.
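    To make the LFU-TTL idea concrete, here is a small counter sketch (the decay form and parameters are assumptions, not the paper's definition) that scores a shard by combining how often it transacts with how recently those transactions occurred.
```python
import time

class LfuTtlCounter:
    """Per-shard load score: access frequency discounted by age (TTL-style decay)."""
    def __init__(self, ttl=60.0):
        self.ttl = ttl            # seconds after which an access stops counting
        self.events = {}          # shard_id -> list of access timestamps

    def record(self, shard_id, now=None):
        now = time.time() if now is None else now
        self.events.setdefault(shard_id, []).append(now)

    def load(self, shard_id, now=None):
        now = time.time() if now is None else now
        fresh = [t for t in self.events.get(shard_id, []) if now - t < self.ttl]
        self.events[shard_id] = fresh                       # drop expired events
        # Linear decay: recent transactions contribute close to 1, old ones close to 0.
        return sum(1.0 - (now - t) / self.ttl for t in fresh)

counter = LfuTtlCounter(ttl=60.0)
for ts in (0, 10, 50):
    counter.record("shard_3", now=ts)
print(round(counter.load("shard_3", now=55), 2))   # hotter shards score higher
```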
  • WANG Xindi, CHAI Xiaoli, XU Xiaofei, SHE Ping
    Accepted: 2025-07-31
    Most existing trajectory similarity measurement methods in the maritime domain rely on traditional algorithms, which often suffer from high computational complexity. Although a few deep learning-based approaches have been proposed, some lack joint modeling of spatial and temporal features, resulting in limited accuracy and robustness. To address these issues, this paper proposes MDU-net (Marine Density U-Net), which automatically extracts low-dimensional features from ship trajectories, enabling efficient and reliable retrieval of trajectories similar to a specified target. Specifically, the trajectory data is first interpolated at equal time intervals, and kernel density estimation is applied to generate grayscale maps that integrate spatial and velocity information, thereby pixelizing the trajectories. An unsupervised U-Net-based neural network then learns low-dimensional representations of the trajectories. Finally, the cosine distance between feature vectors is computed to construct a similarity matrix and quantify trajectory similarity. Experimental results demonstrate that MDU-net significantly outperforms traditional methods and mainstream deep learning models across multiple evaluation metrics. Compared with the classical Dynamic Time Warping (DTW) method, MDU-net improves the Top-10 hit rate by 7.7 percentage points, and by approximately 14.7 percentage points over the Hausdorff distance. Against deep learning models the advantage is even more pronounced: MDU-net achieves a 25 percentage point improvement in Top-10 hit rate over the Convolutional Autoencoder (CAE), fully validating its effectiveness in ship trajectory similarity measurement.
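    The retrieval step reduces to a cosine-similarity matrix over the learned embeddings; a minimal sketch is shown below (the embedding values are random placeholders, not MDU-net outputs).
```python
import numpy as np

def topk_similar(embeddings, query_idx, k=10):
    """embeddings: (n_trajectories, dim) feature vectors from the encoder.
    Returns indices of the k trajectories most cosine-similar to the query."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[query_idx]           # cosine similarity to the query
    sims[query_idx] = -np.inf                   # exclude the query itself
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(0)
emb = rng.standard_normal((500, 64))            # placeholder trajectory embeddings
print(topk_similar(emb, query_idx=42, k=10))
```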