Blockchain has gradually evolved into a critical infrastructure that supports the digital economy. However, its inherent characteristics such as anonymity, cross-chain interoperability, and multi-party participation have led to frequent security incidents, including fraud, money laundering, and cyberattacks, which pose serious threats to the stability and compliance of the blockchain ecosystem. Although existing analytical tools and methods have made notable progress in blockchain service security, they suffer from limited generalizability, insufficient reasoning capabilities, and poor adaptability to the evolution of complex business logic. The rapid development of generative Large Language Models (LLMs) has significantly reshaped the service computing paradigm. With their strong capabilities in natural language understanding, knowledge reasoning, and multimodal integration, LLMs provide new perspectives and technical pathways for research on blockchain service security. This paper systematically reviews the progress of LLM applications in three major areas: pre-event smart contract auditing, in-event anomaly detection, and post-event cross-chain behavior correlation. Further, it summarizes their advantages and limitations and highlights representative practices of LLM-enabled blockchain security. Finally, open research challenges and future directions are discussed, aiming to provide insights for building a trustworthy, interpretable, and efficient framework for blockchain service computing and governance.
Large Language Model (LLM)-based Multi-Agent Systems (MASs) have demonstrated significant potential in handling complex tasks. However, their distributed nature and interaction uncertainty can lead to diverse anomalies that threaten system reliability. This paper presents a comprehensive review that identifies and classifies these anomalies systematically. Seven representative multi-agent systems and their corresponding datasets are selected, accounting for 13 418 operational traces, and a hybrid data analysis method is employed that combines preliminary LLM analysis with expert manual validation. A fine-grained, four-level anomaly classification framework is constructed, encompassing the following anomalies: model understanding and perception, agent interaction, task execution, and external environment. Typical cases are analyzed to reveal the underlying logic and external causes of each type of anomaly. Statistical analysis indicates that model understanding and perception anomalies account for the highest proportion, with "context hallucination" and "task instruction misunderstanding" being the primary issues. Agent interaction anomalies represent 16.8%, primarily caused by "information concealment". Task execution anomalies account for 27.1%, mainly characterized by "repetitive decision errors". External environment anomalies account for 18.3%, with "memory conflicts" as the predominant factor. In addition, model understanding and perception anomalies often act as root causes that trigger anomalies at other levels, highlighting the importance of enhancing fundamental model capabilities. These classification and root cause analyses aim to provide theoretical support and practical reference for building highly reliable LLM-based multi-agent systems.
Post-Training Quantization (PTQ) is an efficient model compression method that converts the parameters of high-precision floating-point models into low-bit integer representations without requiring retraining, using only a small amount of unlabeled calibration data. This method significantly reduces storage and computational overhead while maximizing the retention of the original model's inference accuracy; therefore, it is widely recognized and adopted in both academia and industry. This paper systematically summarizes the progress of research on PTQ from four dimensions: quantization steps, method classification, tool ecosystem, and application advancements. First, a clear framework for the quantization process is constructed, covering steps such as dynamic range statistics, quantization parameter calculation, weight and activation quantization, error optimization, and model generation. Second, a complete classification system for quantization methods is proposed, which includes quantization granularity, bit width, calibration methods, and structure-guided quantization. Third, the tool ecosystem supporting the large-scale application of PTQ is analyzed, and its value in hardware adaptation and engineering deployment is discussed. Finally, this paper summarizes the progress in the integration and application of PTQ methods and highlights practical challenges, particularly those related to cross-modal consistency, extremely low-bit semantic collapse, and hardware adaptation. These challenges not only reveal the limitations of current technologies but also provide important directions for future research. This review provides a reference framework for PTQ methods in academia and industry, thereby facilitating the widespread application of artificial intelligence in resource-constrained scenarios.
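As a minimal illustration of the quantization steps surveyed above (dynamic range statistics, quantization parameter calculation, and weight/activation quantization), the following Python sketch performs asymmetric 8-bit min/max calibration followed by a quantize/dequantize round trip; the bit width, calibration rule, and tensor shapes are illustrative assumptions and do not reflect any particular PTQ toolchain.

```python
import numpy as np

def calibrate_minmax(calib_batches):
    """Dynamic range statistics: track min/max over unlabeled calibration batches."""
    lo = min(float(b.min()) for b in calib_batches)
    hi = max(float(b.max()) for b in calib_batches)
    return lo, hi

def compute_qparams(lo, hi, num_bits=8):
    """Quantization parameter calculation: scale and zero-point for an unsigned range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = int(np.clip(round(-lo / scale), qmin, qmax))
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    qmin, qmax = 0, 2 ** num_bits - 1
    return np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# Toy example: quantize a weight tensor and inspect the reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(128, 128)).astype(np.float32)
scale, zp = compute_qparams(*calibrate_minmax([w]))
w_hat = dequantize(quantize(w, scale, zp), scale, zp)
print("mean absolute quantization error:", float(np.abs(w - w_hat).mean()))
```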
With the rapid development of the Internet, cloud computing, and artificial intelligence, service recommendation has become a key technique in service computing. It helps users find appropriate services quickly and accurately, improves resource utilization, and enhances user experience. This paper presents a systematic review of the research progress in service recommendation and summarizes representative studies. The review covers three main categories of recommendation methods: traditional, context-aware, and neural network-based. Each category is described in terms of fundamental principles, typical applications, advantages, and limitations. This paper also discusses the major challenges in service recommendation, including data sparsity and cold start; incomplete and noisy Quality of Service (QoS) data; dynamic changes in services and contexts; insufficient explainability; and issues of real-time performance, scalability, privacy, and security. Finally, this paper presents an overview of the limitations of current research and explores future research directions. Emerging technologies, such as big data analytics, Knowledge Graphs (KGs), deep learning, Large Language Models (LLMs), and reinforcement learning, are highlighted as promising approaches for improving the intelligence, personalization, and trustworthiness of service recommendations. This review provides a comprehensive understanding of the field and serves as a valuable reference for further research and practical applications.
As the paradigm of the "decentralized next-generation Internet," Web3, built on blockchain technology, has become an emerging field with great potential in the digital intelligence service ecosystem. However, Web3 phishing websites pose a serious threat to the health of this ecosystem. Phishers carefully design domain names as the primary bait, inducing users to visit and perform high-risk operations so that their digital assets can be stolen. Current anti-phishing research on Web3 focuses primarily on phishing account detection, phishing transaction detection, and phishing gang mining, whereas existing phishing domain name detection mainly targets traditional phishing websites and suffers from limitations such as insufficient adaptability and a lack of systematic analysis. To this end, a detection method called WPWHunter is proposed for Web3 phishing website domain names; it conducts multidimensional analysis on detected real Web3 phishing websites and explores the potential application of Large Language Models (LLMs) in web page analysis. WPWHunter detects three features of Web3 phishing domain names: inducing words, visual deception, and item name imitation. The experimental results show that WPWHunter can effectively detect suspicious Web3 phishing domains with a G-means index of 0.769 on the test set, which is 0.048 higher than that of the best-performing baseline method. Additionally, as a supplementary exploratory experiment, three general-purpose LLMs are used to analyze the content of Web3 phishing websites that WPWHunter fails to detect, and the logic used by the LLMs to identify Web3 phishing websites is summarized.
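For intuition only, the following Python sketch mimics the three domain-name feature checks named above with toy heuristics: a hypothetical inducing-word list, a Unicode-normalization test for visual deception, and fuzzy matching against a hypothetical list of imitated names. These rules are illustrative and are not WPWHunter's actual detection logic.

```python
import unicodedata
from difflib import SequenceMatcher

INDUCING = {"airdrop", "claim", "reward", "bonus", "free", "mint"}   # hypothetical list
PROJECTS = {"uniswap", "opensea", "pancakeswap", "metamask", "arbitrum"}  # hypothetical list

def wpw_features(domain: str):
    """Toy versions of the three feature checks (illustrative heuristics only)."""
    label = domain.split(".")[0].lower()
    inducing = any(w in label for w in INDUCING)                      # inducing words
    ascii_label = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode()
    visual = ascii_label != label                                     # homoglyph-style visual deception
    imitation = any(0.75 <= SequenceMatcher(None, label, p).ratio() < 1.0 for p in PROJECTS)
    return {"inducing_words": inducing, "visual_deception": visual, "name_imitation": imitation}

print(wpw_features("uniswap-airdrop.xyz"))
print(wpw_features("uniswaр.org"))   # Cyrillic 'р' imitating Latin 'p'
```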
Purpose-driven Artificial Intelligence (AI) systems must exhibit adaptive purpose perception, dynamic adjustment, and multi-level feedback when operating in complex and evolving environments. However, traditional AI models lack a unified mechanism for modeling the purpose lifecycle, which causes challenges in behavior traceability, control, and optimization and, in turn, limits interpretability and long-term effectiveness. This paper proposes a Data-Information-Knowledge-Wisdom-Purpose (DIKWP)-based semantic framework for purpose lifecycle management oriented toward cognitive evolutionary pathways. The mechanism consists of five semantic stages: data-layer dynamic verification, information-layer migration response, knowledge-layer logical reconstruction, wisdom-layer value evolution, and purpose-layer goal closure and conflict regulation, forming a multi-level, multi-goal, and multi-feedback semantic governance structure. In addition, multi-layer graph modeling and cognitive space differentiation are introduced, specifically between conceptual and semantic spaces, to enable structured and visual modeling of purpose generation, updating, and tuning. By integrating the dual-loop "experience-narrative" structure from artificial consciousness theory, the purpose stability and adaptability of the system in interactive environments are enhanced. The proposed mechanism is theoretically validated in smart home and smart city scenarios. The experimental results demonstrate its generality, scalability, and robustness, offering theoretical and engineering support for value alignment, semantic safety, and autonomous evolution in sovereign AI systems.
Link prediction is an important task in graph machine learning that aims to recover missing edges in a graph or predict potential future connections between nodes. Link prediction has various applications across graphs of different types, such as friend recommendation in social networks, recommendation systems on user-item bipartite graphs, and knowledge graph completion. With the advancement of Graph Neural Networks (GNNs), GNN-based methods have become increasingly important in link prediction. These methods can be broadly categorized into node-based and subgraph-based approaches. Compared with node-based methods, subgraph-based approaches better capture the topological structure between nodes and avoid the ambiguity caused by node isomorphism. Current subgraph-based methods utilize enclosing subgraphs that include the target nodes and their first- or second-order neighbors. However, these enclosing subgraphs can be overly large and susceptible to the influence of central nodes. To address this issue, this paper proposes a link prediction method based on simple path graphs. Under certain order constraints, simple path graphs are proven to be subgraphs of enclosing subgraphs, effectively reducing subgraph size. Furthermore, even when these order constraints are relaxed, simple path graphs remain smaller than enclosing subgraphs. Experimental results show that the method based on simple path graphs outperforms other methods on datasets both with and without node features and achieves better link prediction performance.
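A small sketch of the core idea, assuming NetworkX and a hypothetical maximum path length of three hops: the simple-path subgraph for a candidate link is the union of all short simple paths between the two endpoints, which is typically much smaller than the hop-based enclosing subgraph.

```python
import networkx as nx

def simple_path_graph(G, u, v, max_len=3):
    """Build the simple-path subgraph for candidate link (u, v): the union of all
    simple paths between u and v of at most max_len hops."""
    nodes = {u, v}
    for path in nx.all_simple_paths(G, source=u, target=v, cutoff=max_len):
        nodes.update(path)
    return G.subgraph(nodes).copy()

# Compare against a 1-hop enclosing subgraph on a toy graph.
G = nx.karate_club_graph()
sub = simple_path_graph(G, 0, 33, max_len=3)
enc_nodes = (set(nx.single_source_shortest_path_length(G, 0, cutoff=1))
             | set(nx.single_source_shortest_path_length(G, 33, cutoff=1)))
enc = G.subgraph(enc_nodes).copy()
print(f"simple-path subgraph: {sub.number_of_nodes()} nodes, "
      f"1-hop enclosing subgraph: {enc.number_of_nodes()} nodes")
```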
Speech recognition technology enables machines to understand human speech using advanced algorithms and signal processing technologies, thereby making communication between humans and machines more convenient. Most existing studies on end-to-end speech recognition focus on optimizing the Conformer model, whose encoder suffers from insufficient extraction of fine-grained local speech features. To resolve this issue, this study proposes a Chinese speech recognition method based on Max Pooling (MP). First, the output of the gated linear unit in the convolutional module of the encoder is max-pooled along the time dimension to extract fine-grained local features corresponding to the characteristics of multiple speech signal frames. Second, these pooled features are fused with the coarse-grained local features extracted via Depthwise Convolution (DWC) using element-wise summation to increase the amount of information on local speech features and improve the speech recognition accuracy of the Conformer model. The experimental results on the public Chinese dataset Aishell-1 show that the improved model reduces the Character Error Rate (CER) of the baseline model from 5.58% to 5.32% and from 5.06% to 4.92% when decoding with greedy search and attention rescoring, respectively.
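A hedged PyTorch sketch of such a convolution module is given below: the GLU output is max-pooled along the time dimension and fused with the depthwise-convolution output by element-wise summation. Kernel sizes, normalization, and activation choices here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MPConvModule(nn.Module):
    """Sketch of a Conformer-style convolution module with an extra max-pooling branch."""
    def __init__(self, d_model=256, dw_kernel=15, pool_kernel=3):
        super().__init__()
        self.pointwise_in = nn.Conv1d(d_model, 2 * d_model, kernel_size=1)
        self.glu = nn.GLU(dim=1)
        # Coarse-grained local features via depthwise convolution.
        self.dwconv = nn.Conv1d(d_model, d_model, kernel_size=dw_kernel,
                                padding=dw_kernel // 2, groups=d_model)
        # Fine-grained local features: max pooling along the time dimension.
        self.maxpool = nn.MaxPool1d(pool_kernel, stride=1, padding=pool_kernel // 2)
        self.bn = nn.BatchNorm1d(d_model)
        self.act = nn.SiLU()
        self.pointwise_out = nn.Conv1d(d_model, d_model, kernel_size=1)

    def forward(self, x):                # x: (batch, time, d_model)
        y = x.transpose(1, 2)            # (batch, d_model, time)
        y = self.glu(self.pointwise_in(y))
        fused = self.dwconv(y) + self.maxpool(y)   # element-wise sum fusion
        y = self.pointwise_out(self.act(self.bn(fused)))
        return x + y.transpose(1, 2)     # residual connection

feats = torch.randn(4, 100, 256)         # (batch, frames, feature dim)
print(MPConvModule()(feats).shape)        # torch.Size([4, 100, 256])
```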
Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a source domain with labeled samples to a target domain without labeled samples. UDA assumes that the source and target domains share the same categories, which is often difficult to guarantee in real-world scenarios. The target domain usually contains samples from new categories that are not found in the source domain; this setting is called Open-Set Domain Adaptation (OSDA). In OSDA, the abundance of domain-specific features makes learning domain-invariant representations a significant challenge. Existing OSDA methods tend to ignore domain-specific features and directly minimize domain differences, which may lead to unclear boundaries between categories and weaken the generalization ability of the model. To address this problem, an OSDA method based on a Transition Bridge Mechanism (OSTBM) is proposed. Specifically, OSTBM adds a transition bridge mechanism to the feature extractor and domain discriminator to reduce the interference of domain-specific features in the overall transfer process and to improve the discriminative ability of the domain discriminator. This enables better alignment of the source distribution with the known target distribution during feature alignment and pushes the unknown target distribution away from the decision boundary. The experimental results show that the proposed method outperforms existing OSDA methods on multiple benchmark datasets, demonstrating its superior performance.
Graph Neural Network (GNN) excels in node classification tasks, but its message-passing mechanism causes neighbor-fetching latency, limiting deployment in latency-sensitive applications. Despite being less accurate than GNN in node classification tasks, Multi-Layer Perceptron (MLP) is preferred in practical industrial applications owing to its efficient inference. Given the complementary advantages and disadvantages of GNN and MLP, this paper proposes an optimized inference method for GNN based on adversarial training and contrastive representation distillation. This method aims to transfer the knowledge learned from a GNN teacher model to a more efficient MLP student model. This method uses the Fast Gradient Sign Method (FGSM) to generate feature perturbations and combines them with node content features as input for the student model. Adversarial training is conducted under the guidance of real labels and the teacher model's Softmax probability distribution to reduce the student model's sensitivity to node feature noise. The contrastive representation distillation module treats embeddings of the student and teacher models on the same node's output as positive sample pairs and embeddings of different nodes' outputs as negative sample pairs. By minimizing the distance between positive sample pairs and maximizing the distance between negative sample pairs, the student model can capture the relationships between node embeddings output by the teacher model, thereby preserving the global topological structure of GNN. Experimental results on public datasets demonstrate that, when using GraphSAGE as the teacher model, an MLP student model trained by this method achieves an inference speed that is 89 times that of GraphSAGE. Additionally, its accuracy improves by 14.12 and 2.02 percentage points on average compared to those of vanilla MLP and GraphSAGE, respectively, outperforming the two baseline methods.
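The following PyTorch sketch illustrates the three training signals described above under simplifying assumptions: the student is a small MLP that returns both logits and an embedding, the teacher outputs are stand-in tensors, and the FGSM step size, temperature, and (uniform) loss weights are made up rather than taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPStudent(nn.Module):
    """A small MLP student that returns both logits and its hidden embedding."""
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        emb = self.body(x)
        return self.head(emb), emb

def fgsm_perturb(features, labels, student, epsilon=0.01):
    """FGSM perturbation of node content features against the student."""
    x = features.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(student(x)[0], labels)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + epsilon * grad.sign()).detach()

def contrastive_rd(student_emb, teacher_emb, tau=0.5):
    """Same-node student/teacher embeddings are positives; other nodes are negatives."""
    s = F.normalize(student_emb, dim=-1)
    t = F.normalize(teacher_emb, dim=-1)
    logits = s @ t.t() / tau
    return F.cross_entropy(logits, torch.arange(s.size(0)))

def training_loss(features, labels, teacher_logits, teacher_emb, student, T=2.0):
    """Real labels + softened teacher probabilities + contrastive distillation."""
    x_adv = fgsm_perturb(features, labels, student)
    logits, emb = student(x_adv)
    hard = F.cross_entropy(logits, labels)
    soft = F.kl_div(F.log_softmax(logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T ** 2
    return hard + soft + contrastive_rd(emb, teacher_emb)

# Toy usage with random "teacher" outputs standing in for a trained GraphSAGE teacher.
n_nodes, in_dim, hidden, n_classes = 32, 64, 16, 7
student = MLPStudent(in_dim, hidden, n_classes)
x = torch.randn(n_nodes, in_dim)
y = torch.randint(0, n_classes, (n_nodes,))
loss = training_loss(x, y, torch.randn(n_nodes, n_classes), torch.randn(n_nodes, hidden), student)
loss.backward()
print(float(loss))
```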
With the rapid development of information technology, link prediction has been widely applied in various fields. Current link prediction methods are based on subgraph extraction. Models based on Line Graph Transformation (LGT) and Graph Convolutional Network (GCN) achieve excellent results in link prediction. However, two problems remain: 1) the high time complexity of the LGT and the large size of the line graph hinder its widespread application; 2) GCN ignores the high-order relationships and local clustering structure between nodes, thereby affecting prediction accuracy. To solve the above issues, this paper proposes a link prediction method based on Hypergraph Convolutional Network (HGCN), called HGLP. This method replaces LGT with Dual Hypergraph Transformation (DHT) to improve efficiency without losing structural information and applies HGCN to learn the higher-order features of the hypernodes and hyperedges in the hypergraph to obtain higher prediction accuracy. Experimental results show that the proposed method outperforms state-of-the-art link prediction methods on seven real-world datasets from different domains in terms of Area Under the Curve (AUC) and Average Precision (AP). Furthermore, the proposed method achieves shorter runtimes and lower memory usage.
To address the slow convergence and susceptibility of the traditional Firefly Algorithm (FA) to local optima in solving optimization problems, this paper proposes a dynamic firefly algorithm. The proposed algorithm is integrated with neighborhood rough set theory for feature selection, effectively processing continuous values and enhancing the performance of feature selection. The algorithm improves the FA search strategy by incorporating the Precedence Operation Crossover (POX) mutation strategy and threshold settings to control the probability of firefly crossover and mutation, thereby enabling individuals trapped in local optima to escape. Furthermore, it introduces a new information entropy model, the neighborhood granular conditional entropy, by combining neighborhood knowledge granularity with conditional entropy to balance knowledge completeness and granularity. The feature selection algorithm FS_NGHFAPOX, which is based on neighborhood granular conditional entropy and the dynamic firefly algorithm, constructs the fitness function to improve the evaluation of feature subsets. Experiments conducted on several datasets from the UCI repository and built-in datasets of the scikit-learn machine learning library demonstrate that the FS_NGHFAPOX algorithm achieves optimal classification performance with a smaller number of selected features. Specifically, the FS_NGHFAPOX algorithm achieves an average accuracy of 0.83 on the experimental datasets, which is up to 15% higher than those of the other feature selection algorithms.
Current U-Net-based pavement crack detection methods do not fully consider the interaction between the features of each level of the encoder, causing incomplete detection results or missed detections because of information loss during the downsampling process. To address this issue, this study proposes a pavement crack detection method based on multi-level feature fusion. In the encoding stage, the features of cracks at different levels are extracted to form crack feature representations from shallow to deep layers. In the skip connection section, a cross-level fusion strategy based on an improved Channel Cross Transformer (CCT) is adopted to enhance the complementarity between features at each level and enrich the expression of crack features. In the decoding stage, the feature fusion module is used to optimize the decoder's utilization of encoder features, promote the transmission of crack features, and improve the perception ability of crack features. In a series of comparative and ablation experiments on two public datasets, DeepCrack and CRACK500, the proposed method outperforms six other methods, including DeepCrack and Swin-UNet. On DeepCrack, the proposed method increases the F1 value by 2.30 and 2.51 percentage points, respectively, compared to those of DeepCrack and Swin-UNet, while on CRACK500, it increases by 1.65 and 1.00 percentage points, respectively.
Detection of fabric defects is an indispensable step in textile production. However, obtaining large amounts of annotated data in practical situations is difficult. Unsupervised domain adaptation provides an effective solution to this problem, as it can improve model performance without annotating target domain data. However, although existing methods perform well on classical datasets, their performance decreases significantly when applied to more complex, heavily textured fabric defect detection tasks. To address this issue, a Texture Knowledge Guided (TKG) cross-domain fabric defect detection method is proposed to enhance the detection performance of the object detection transfer model on fabric images. The TKG method comprises three key components: texture enhancement, joint attention, and consistency-adversarial modules. The texture enhancement module enhances the texture information in the input image via Fourier transform, enabling the model to better capture complex texture features. The joint attention module introduces an attention mechanism that can capture more comprehensive texture and structural information. By adaptively adjusting the weights of different regions and channels, it enhances the attention of the model to key textures and defect areas. The consistency-adversarial module enhances the adaptability of the model to target domain data via consistency training and adversarial training, improving the detection performance of the model in the target domain. The experimental results show that, compared with the comparative methods, the TKG method exhibits significant superiority in fabric defect detection tasks. In the cross-domain detection experiment from twill to plain weave, the TKG method achieves a performance improvement of up to 3.1 percentage points in mAP@0.5, reflecting the excellent cross-domain defect detection capability of this method on data from an actual fabric production environment.
The spatial information of a Universal Adversarial Perturbation (UAP) intuitively represents the visual characteristics of perturbations, whereas the frequency domain information includes the structure and texture of perturbations. Joint analysis of the spatial and frequency domain information of perturbations helps understand the generation mechanism of UAP and its impact on the robustness of image classification models. Most existing studies have focused on the distribution and changes of perturbations in the spatial domain, neglecting the role of frequency components and limiting the generalization ability of the UAP. To address this issue, a joint optimization method for image UAP generation in the spatial and frequency domains is proposed. This method utilizes the adversarial sample confidence loss, perturbation spatial distance loss, and perturbation frequency guidance loss to train the model from both spatial and frequency perspectives, generating a UAP with high attack capability and transferability. The adversarial sample confidence loss enhances the aggressiveness of perturbations, the perturbation spatial distance loss optimizes the spatial magnitude of perturbations, and the perturbation frequency guidance loss controls the proportion of frequency components in perturbations. The experimental results indicate that the low-frequency components of the UAP have a significant impact on attack effectiveness. Within the same perturbation space, the more low-frequency components a perturbation contains, the higher its attack success rate. Compared with the baseline method, the UAP generated by jointly optimizing the spatial and frequency domains exhibits strong aggressiveness and transferability. Moreover, it has significant advantages in terms of generation speed.
Remote sensing images usually contain multiple land features and semantic information. Using multi-label learning methods to classify remote sensing images can improve the understanding of image semantics. However, owing to the subjectivity of manual annotation and the complexity of remote sensing image targets, inaccurate image annotation can lead to the introduction of noise (additive noise) or label loss (subtractive noise), collectively referred to as mixed noise. Such noise can mislead the algorithm training process and reduce classification performance. A multi-label classification algorithm for remote sensing images is proposed to address the issue of mixed noise. First, the images are subjected to strong and weak enhancement transformations, and the two types of enhanced images are fed into two networks with the same structure for collaborative learning. Second, by constraining the consistency between the two enhanced images and the structural consistency between images and their corresponding labels during training and then combining the two constraints with the Binary Cross Entropy (BCE) loss, the final loss function is formed. Finally, based on the predicted labels, a sorting error is defined to identify and correct noisy labels in the loss function, thereby improving the robustness of the model. To verify the performance of the proposed method, mixed noise multi-label classification experiments are conducted on the remote sensing image multi-label datasets AID, UCM, and DFC15. The proposed method is compared with various multi-label classification methods in terms of image classification indicators and multi-label classification indicators. The results indicate that the overall performance of the proposed method is optimal under different label-to-noise ratios.
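As a rough sketch of how such a training objective can be assembled (not the paper's exact formulation, and omitting the ranking-based noisy-label correction), the loss below combines BCE on both augmented views with a consistency term between their predictions.

```python
import torch
import torch.nn.functional as F

def noisy_multilabel_loss(logits_weak, logits_strong, labels, lam_cons=1.0):
    """Illustrative combined objective: BCE on both augmented views plus a
    consistency term between their predicted label probabilities."""
    bce = (F.binary_cross_entropy_with_logits(logits_weak, labels)
           + F.binary_cross_entropy_with_logits(logits_strong, labels))
    consistency = F.mse_loss(torch.sigmoid(logits_weak), torch.sigmoid(logits_strong))
    return bce + lam_cons * consistency

# Toy batch: 4 images with 17 candidate labels (numbers are arbitrary).
weak, strong = torch.randn(4, 17), torch.randn(4, 17)
noisy_labels = torch.randint(0, 2, (4, 17)).float()
print(float(noisy_multilabel_loss(weak, strong, noisy_labels)))
```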
The chip industry is critical for national security and economic development, and Integrated Circuit (IC) Reverse Engineering (RE), as a means of analyzing the internal performance of chips, is an important link in the chip industry chain. RE includes steps such as layer-by-layer acquisition of chip images using Scanning Electron Microscopy (SEM), identification of devices, extraction of gate netlists, and inference of their functions. Segmentation of electrical components and metal lines from the IC image background is a prerequisite for identifying devices and other steps. However, traditional image segmentation methods cannot adapt to the complex and ever-changing circuit conditions of IC images owing to the lack of expert experience in learning. To this end, the HE-UNet method is proposed for extracting metal lines and vias from IC images. HE-UNet consists of three steps: first, the U-M2 network is used to extract noisy features from chip images; second, the Hough circle detection algorithm is used to remove noise around the via holes; and third, edge detection pooling is used to remove noise from the via holes. Experiments conducted on IC images with a size of 1 024×1 024 pixels reveal that HE-UNet can effectively segment metal lines and vias, with a mean Intersection over Union (mIoU) of 98.24% and Mean Pixel Accuracy (MPA) of 99.11%, both of which are superior to those achieved by other methods.
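The Hough-circle step can be illustrated with OpenCV as follows; the radii and accumulator parameters are placeholders that would need tuning for real SEM-derived IC images.

```python
import cv2
import numpy as np

def detect_vias(mask, min_r=5, max_r=30):
    """Detect circular vias in a segmentation mask with the Hough circle transform."""
    blurred = cv2.GaussianBlur(mask, (5, 5), 0)
    circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1.2, minDist=20,
                               param1=100, param2=20, minRadius=min_r, maxRadius=max_r)
    return [] if circles is None else np.round(circles[0]).astype(int)  # rows of (x, y, r)

# Toy example: a synthetic 1 024x1 024 mask with two "vias".
img = np.zeros((1024, 1024), dtype=np.uint8)
cv2.circle(img, (200, 300), 12, 255, -1)
cv2.circle(img, (600, 700), 15, 255, -1)
for x, y, r in detect_vias(img):
    print(f"via at ({x}, {y}), radius {r}")
```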
The Transformer, a star model widely used across fields, has achieved good results in Human-Object Interaction (HOI) detection. This study proposes a new Transformer network, the Knowledge Distillation-based Transformer (KDT), for HOI detection. Because the overall HOI features modeled by a single Transformer network are coarse, a basic multi-branch structure is designed for the three tasks of HOI detection: prediction of human boxes, prediction of object boxes and object categories, and prediction of interaction categories. The basic multi-branch structure comprises a human instance branch, an object instance branch, and an interaction branch. The human and object branch decoders provide the interaction branch decoder with region cues of humans and objects. To provide key semantic and spatial information for the Transformer structure, the semantic features of object categories and interaction verbs, as well as the spatial features of human and object boxes, are generated to provide semantic and spatial cues for the different Transformer branches, which further improves the feature extraction capability of the decoders. Next, the study proposes another multi-branch Transformer structure as a teacher network. The teacher network decoders output accurate HOI predictions using the generated features as decoder queries. During training, the basic multi-branch network is made to imitate the output of the teacher network. Finally, the study presents an additional category similarity loss to measure the intra- and inter-category similarities between the output predictions of the two networks, thereby improving the performance of the basic network decoders. Experimental results show that the mean Average Precision (mAP) values for all categories, rare categories, and non-rare categories on the HOI benchmark dataset HICO-DET are 32.13%, 28.57%, and 33.19%, respectively, achieving a maximum increase of 4.65% compared with the baseline.
Unsupervised person Re-Identification (Re-ID) aims to mine discriminative representations from unlabeled data for person retrieval. Currently, unsupervised person Re-ID methods based on pseudo-labels have achieved remarkable progress. However, the noise introduced during the training process and the incomplete utilization of information limit their further development. This paper proposes a multi-grained teacher-student network that integrates shallow spatial and frequency information. First, it simultaneously considers global and local features and integrates them into clustering-based contrastive learning, enriching feature representation. A well-trained teacher model is used to guide the student model to converge quickly, thereby reducing the interference of noisy pseudo-labels. Second, a novel spatial-frequency interaction module is proposed to exploit useful shallow spatial- and frequency-domain information that is otherwise lost as the network deepens. Additionally, a recycling strategy is adopted in the training of the student network, in which some unclustered instances that are directly discarded in previous methods are recycled as hard samples. The mean Average Precision (mAP) results on three large datasets, Market1501, DukeMTMC-reID, and MSMT17, reach 87.5%, 74.8%, and 41.9%, respectively, proving the superiority of the proposed method.
With the continuous development of large deep models, the backbones of Siamese-based visual object trackers have become increasingly powerful and their parameter counts keep growing. This has multiplied model training time and cost, making deployment of such models on edge devices challenging. This paper focuses on improving the ability of lightweight models to extract target location and semantic information and proposes a lightweight visual object tracking algorithm based on a location and semantic separation attention mechanism. First, the normalized attention mechanism is improved by combining horizontal and vertical convolutions to construct position attention, which is embedded into the shallow features of the backbone network to extract target position information. Subsequently, the squeeze-and-excitation network and channel-wise normalized attention are fused with the deep features of the backbone network to extract semantic information. In contrast to previous studies on attention mechanisms, this study exploits the property that shallow features are conducive to spatial information extraction while deep features are conducive to semantic feature extraction, separating location attention from semantic attention and improving the algorithm's ability to extract target location and semantic information without significantly increasing the number of parameters. Experimental results on general tracking datasets demonstrate that the proposed algorithm can improve the precision and success rate of a tracking algorithm based on a lightweight Siamese network.
The abuse of DeepFake face forgery technology has given rise to considerable security risks for society and individuals; therefore, DeepFake detection has become a hot research topic. Current deep learning-based forgery detection techniques exhibit good results on High-Quality (HQ) datasets but show poor performance on Low-Quality (LQ) datasets and across different datasets. To improve the generalization of DeepFake detection, this paper proposes a Multi-Scale Dual-Stream Network (MSDSnet). The network input is divided into a spatial-domain feature stream and a high-frequency noise feature stream. First, the Multi-Scale Fusion (MSF) module is used to capture coarse-grained tampered facial features from images and fine-grained high-frequency noise information from forged images in different situations. The network fully integrates the dual-stream features of the spatial-domain feature stream and the high-frequency noise feature stream through the MSF module. The Multi-modal Interaction Attention (MIA) module further enables interactive learning of the dual-stream information. Finally, a Frequency Channel Attention Network (FcaNet) is used to obtain the global information of the forged face features to complete detection and classification. Experimental results show that the proposed method achieves 98.54% accuracy on the HQ dataset Celeb-DF v2 and 93.11% on the LQ dataset FaceForensics++. Moreover, the experimental results are better than those obtained using other methods in cross-dataset experiments.
Cross-chain is an important technology that breaks the "information silos" of blockchain networks and facilitates interoperability between different blockchain networks. Cross-chain bridges have become an important technique for asset and information transfer between heterogeneous blockchains. In recent years, attacks against cross-chain bridge vulnerabilities have occurred frequently, and the cross-chain transaction anomalies caused by these attacks have resulted in economic losses running into billions. However, research on the problem of anomalous transactions in cross-chain bridges is lacking, and detection efforts are highly dependent on manually summarized anomalous patterns of transaction sequences. In this study, a cross-chain anomalous transaction detection method based on the Bidirectional Encoder Representations from Transformers (BERT) model is proposed, which overcomes the limitations of existing detection methods that rely on manual experience by providing two detection modes based on feature extraction. The first mode aims to extract features more accurately by automatically extracting cross-chain transaction sequences with key features from cross-chain native transaction data based on the transaction status and then fine-tuning the BERT-Base-Uncased pretrained model on the cross-chain transaction sequence text data to adapt it to the anomalous transaction detection task. The second mode aims to compensate for the possible feature inadequacies that may arise from considering only key cross-chain transaction sequences and solves the anomaly detection task by directly fine-tuning the BERT-Base-Uncased pretrained model on the original transaction text data with comprehensive features. The experiments use real cross-chain data from existing studies to evaluate the proposed detection methods. The results show that both detection modes can effectively detect anomalous cross-chain transactions, with the precision, recall, and F1 value all reaching 100%.
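A minimal sketch of the second detection mode, assuming Hugging Face Transformers and a made-up serialization of cross-chain transaction records into text (the field names and example strings below are purely illustrative): BERT-Base-Uncased is fine-tuned as a binary sequence classifier.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # 0: normal, 1: anomalous cross-chain transaction

# Hypothetical textual serialization of cross-chain transaction records.
texts = [
    "deposit src_chain=ethereum dst_chain=bsc token=USDC amount=1200 status=completed",
    "withdraw dst_chain=bsc token=ETH amount=500000 status=completed",  # no matching deposit
]
labels = torch.tensor([0, 1])

batch = tokenizer(texts, padding=True, truncation=True, max_length=128,
                  return_tensors="pt")
outputs = model(**batch, labels=labels)   # loss for fine-tuning, logits for prediction
outputs.loss.backward()                   # one illustrative gradient step
print(outputs.logits.softmax(dim=-1))
```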
Deep neural networks have achieved significant success in remote sensing image scene classification. However, because of the strong transferability of adversarial samples, the vulnerability of remote sensing image scene classification networks cannot be ignored. To enhance the robustness of remote sensing image scene classification networks, ensure their reliability and security in various environments and conditions, and effectively improve their practical application value, this study proposes a Frequency-Domain Quantization (FDQ) adversarial attack method. First, the input image is subjected to a Discrete Cosine Transform (DCT), and a quantization filter is applied in the frequency domain to effectively capture the prominent regions of the key features that enable the image to be correctly classified. Then, a class-based attention loss is proposed, which gradually causes the quantization filter to suppress these key features, so that the model's attention gradually shifts toward features and regions that are completely unrelated to the original category. The proposed method uses the attention distribution of a model to implement black-box attacks at the feature level. Universal adversarial samples for remote sensing images are generated by identifying common defense vulnerabilities across different networks. Experimental results demonstrate that the FDQ method can successfully attack most advanced deep neural networks in remote sensing image scene classification tasks. Compared with the current state-of-the-art attack methods for remote sensing image scene classification, FDQ's attack success rate based on the RegNetX-400MF architecture on the UCM and AID benchmark datasets increases by 35.43% and 23.63%, respectively. The experiments show that FDQ offers good attack performance and transferability, making it more difficult for defense systems to resist.
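The DCT-plus-quantization-filter idea can be sketched as follows; the uniform quantization step used here is an illustrative stand-in for the learned filter described above, not the paper's actual filter.

```python
import numpy as np
from scipy.fft import dctn, idctn

def frequency_quantize(image, q_step=32.0):
    """Illustrative frequency-domain quantization: 2D DCT, uniform quantization of the
    coefficients, and inverse DCT. A coarser q_step discards more detail, which is the
    kind of key-feature suppression the FDQ attack exploits."""
    coeffs = dctn(image.astype(np.float64), norm="ortho")
    quantized = np.round(coeffs / q_step) * q_step     # quantization filter
    return idctn(quantized, norm="ortho")

img = np.random.rand(224, 224) * 255                   # stand-in for one image band
print("reconstruction error:", float(np.abs(img - frequency_quantize(img)).mean()))
```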
Lattice-based post-quantum cryptography algorithms demonstrate significant potential in public-key cryptography. A key performance bottleneck in hardware implementation is the computational complexity of polynomial multiplication. To address the problems of low area efficiency and memory mapping conflicts encountered in polynomial multiplication, this study proposes a polynomial multiplication structure based on Partial Number Theoretic Transform (PNTT) and a Coefficient Crossover Operation (CCO). First, the last round of the Number Theoretic Transform (NTT), the coefficient multiplication, and the first round of the Inverse Number Theoretic Transform (INTT) are merged into a CCO, eliminating two rounds of butterfly operations and 50% of the twiddle factor storage space; consequently, memory access overhead is lowered. Second, lightweight hardware is employed to implement modular addition, modular subtraction, division by two, and enhanced Barrett-based modular multiplication, effectively reducing the logic resource overhead. Simultaneously, the study designs a reconfigurable Processing Element (PE) array using pipelining and time-sharing multiplexing techniques, allowing each operation unit to be efficiently reconnected under different transformations. In addition, the study introduces coefficient grouping storage and special memory mapping methods in the memory mapping scheme. Efficient scheduling of data and twiddle factors is achieved by leveraging address-mapping rules, avoiding memory mapping conflicts, and achieving low-cost memory access. Finally, a First In First Out (FIFO) structure is employed for data reorganization, which enhances data access efficiency. Experimental results show that the proposed polynomial multiplication structure reduces the Area-Time Product (ATP) of Slices and Digital Signal Processors (DSPs) by over 21.7% and 61.1%, respectively, compared with existing designs, and achieves higher area efficiency.
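For reference, the following Python sketch shows plain NTT-based cyclic polynomial multiplication (forward transforms, pointwise products, inverse transform), verified against schoolbook convolution. It is a functional model only and does not reflect the PNTT/CCO merging, the reconfigurable PE array, or the memory mapping scheme of the proposed hardware; the prime and degree are toy values.

```python
def find_root_of_unity(n, q):
    """Find a primitive n-th root of unity mod prime q (n a power of two, n | q-1)."""
    assert (q - 1) % n == 0
    for g in range(2, q):
        w = pow(g, (q - 1) // n, q)
        if pow(w, n // 2, q) != 1:   # order is exactly n
            return w
    raise ValueError("no primitive root found")

def ntt(a, w, q):
    """Definition-based transform, O(n^2); adequate for an illustrative n."""
    n = len(a)
    return [sum(a[j] * pow(w, i * j, q) for j in range(n)) % q for i in range(n)]

def poly_mul_ntt(a, b, q):
    """Cyclic polynomial multiplication (mod x^n - 1, mod q) via the NTT."""
    n = len(a)
    w = find_root_of_unity(n, q)
    A, B = ntt(a, w, q), ntt(b, w, q)
    C = [x * y % q for x, y in zip(A, B)]
    n_inv, w_inv = pow(n, -1, q), pow(w, -1, q)
    return [c * n_inv % q for c in ntt(C, w_inv, q)]

def poly_mul_schoolbook(a, b, q):
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % q
    return c

q, n = 257, 8                          # toy NTT-friendly prime: n divides q-1
a, b = [3, 1, 4, 1, 5, 9, 2, 6], [2, 7, 1, 8, 2, 8, 1, 8]
assert poly_mul_ntt(a, b, q) == poly_mul_schoolbook(a, b, q)
print(poly_mul_ntt(a, b, q))
```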
The Oblivious Transfer (OT) protocol is a privacy-preserving two-party communication protocol and an important building block of secure multiparty computation. It is typically built on the Rivest-Shamir-Adleman (RSA) or Diffie-Hellman (DH) cryptosystems, which are used to exchange the symmetric keys applied during the message-encryption phase. However, in existing OT protocols, the generation of multiple pairs of public and private keys and the associated computations are time-consuming. Using bijective functions to transform ciphertexts within the same ciphertext domain can ensure indistinguishability after decryption and reduce computational complexity. In the semi-honest model, a ciphertext obfuscation-based OT protocol framework is proposed, and OT protocols are instantiated based on RSA and DH within this framework. Compared with RSA-based encryption schemes, the proposed protocol requires only one pair of public and private keys. When the number of key pairs is reduced to one, the receiver can use the public key in the sender's digital certificate to implement an OT protocol with an identity authentication function. Compared with the OT protocol based on the DH key exchange, this protocol has a small data transmission volume and low computational complexity. Experimental results show that, compared with existing OT protocols, the efficiency of the instantiated protocols in the key exchange stage can be improved by at least 30%. Moreover, this protocol can serve as a basic protocol for private set intersection, garbled circuits, and OT extension protocols.
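For background, the sketch below implements the classic Even-Goldreich-Lempel 1-out-of-2 OT over textbook RSA with toy parameters. It is not the proposed ciphertext-obfuscation protocol and is not secure for real use; it only illustrates the single-key-pair structure on which such RSA-based OT constructions rest.

```python
import hashlib
import secrets

# Toy Even-Goldreich-Lempel 1-out-of-2 OT over textbook RSA (illustrative only).
def h(x: int) -> int:
    """Hash an integer to a 64-bit pad."""
    return int.from_bytes(hashlib.sha256(str(x).encode()).digest()[:8], "big")

# Sender key setup: a single RSA key pair with tiny demo primes.
p, q = 61, 53
N, e = p * q, 17
d = pow(e, -1, (p - 1) * (q - 1))

m0, m1 = 4242, 1717                                    # sender's two messages
x0, x1 = secrets.randbelow(N), secrets.randbelow(N)    # public random values

# Receiver: choose bit b and blind x_b with a random k.
b = 1
k = secrets.randbelow(N)
v = ([x0, x1][b] + pow(k, e, N)) % N

# Sender: derive one pad per message; only the chosen pad equals k.
k0 = pow((v - x0) % N, d, N)
k1 = pow((v - x1) % N, d, N)
c0, c1 = m0 ^ h(k0), m1 ^ h(k1)

# Receiver: unblind the chosen ciphertext; the other pad remains unknown.
recovered = [c0, c1][b] ^ h(k)
print(recovered == [m0, m1][b])                        # True
```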
With the accumulation of massive amounts of data and continuous improvements in computing power, Deep Neural Networks (DNNs) have been widely used in various tasks such as image recognition and text classification. However, studies have shown that DNN-based text classification models are often subjected to adversarial sample attacks that are maliciously constructed by attackers. Attackers can alter the classification results of a model by deleting or modifying the original text, inserting obfuscated statements, or adding punctuation marks. Most existing adversarial sample generation methods sacrifice concealment and adopt a hybrid approach involving a variety of replacement pools to improve attack accuracy, and therefore cannot balance the attack success rate against the concealment of adversarial samples. To solve this problem, this study proposes a Chinese adversarial sample generation method called WordReproduction, which is designed to improve the concealment of adversarial samples. The saliency scores of Chinese characters are calculated by combining the part-of-speech of the characters themselves with the word-level dimension. In the keyword replacement module, three glyph-based replacement methods are used to replace keywords: near-word vector space, glyph-splitting candidate pool, and word inversion. Based on the morphological characteristics of Chinese characters, the study also designs a glyph similarity evaluation algorithm to better quantify the similarity between adversarial samples and the original text. Experimental results show that the adversarial samples generated by WordReproduction are superior to those generated by the baseline method in terms of attack success rate and glyph similarity. When using the Transformer model for sentiment classification, compared with the WordHandling method, the attack success rate and glyph similarity score of WordReproduction increase by 51.64 percentage points and 0.53, respectively. The generated adversarial samples not only mislead the classification results of the model but also have high concealment, making them difficult for human readers to detect.
Unlike natural carrier-based coverless information hiding, non-natural carrier-based coverless information hiding does not attempt to generate or utilize natural carriers to hide secret information, which can fundamentally avoid the discrimination problem of natural carriers. However, in existing non-natural-based coverless information hiding methods, marbling painting based information hiding cannot conceal secret bits effectively, whereas fractal-based information hiding requires strict mathematical constraints on global and local self-similarities. To effectively conceal secret information and further improve visual quality by employing natural image vectorization, this paper proposes a stone painting generative information hiding method through triangulation and Bezier curves. First, the foreground region of a cover image is triangulated by removing triangles that are unsuitable for embedding and do not preserve foreground characteristics. Second, stone contours that represent the secret bits and are tangential to the given triangular region are generated, and random colors consistent with the triangular region, user key, and embedded secret bits are filled in to generate the stone painting. Finally, the secret bits are extracted, and color authentication is performed based on the consistent triangular region generated by the key. Theoretical and experimental results confirm that the proposed method can generate stone paintings with semantically rich features consistent with the contours of the cover image. It effectively conceals secret bits without exposing the embedded features, and the extraction relies strictly on the user key, thereby allowing high-precision color authentication. Without the correct key, the secret bits cannot be obtained.
To address the issues that arise during system state estimation in the presence of node failures or anomalies in Wireless Sensor Networks (WSNs), a k-medoids-trust-based distributed H∞ fusion filtering method is proposed to improve the robustness and accuracy of system state estimation in the event of sensor failures. The method has the following primary steps. First, each sensor node independently collects local measurement information and performs distributed H∞ filtering to update its local state estimates. Subsequently, after the local state estimates are exchanged between neighboring sensor nodes, a k-medoids trust mechanism is established to divide the obtained local state estimates into trusted and untrusted estimates; untrusted estimates are discarded, whereas trusted estimates are retained. A distributed diffusion fusion strategy is then designed that calculates adaptive weights for the trusted estimates and fuses and updates the local state estimate in real time. The effectiveness and superiority of the proposed state estimation method are demonstrated using a target tracking simulation example. The results show that the proposed method is more resilient to sensor node faults or anomalies than the trust-based distributed Kalman filtering algorithm under measurement interference, data replay, and erroneous data injection faults.
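A simplified numeric sketch of the trust-partition step is shown below: a k-medoids clustering with k = 2 over neighbors' local estimates, with the larger cluster treated as trusted and fused with uniform weights as a stand-in for the adaptive weights; this is an illustration of the idea, not the paper's exact mechanism.

```python
import numpy as np

def two_medoid_trust_split(estimates, n_iter=20):
    """Partition neighbors' local state estimates into trusted/untrusted groups via
    k-medoids (k = 2); the larger cluster is treated as trusted."""
    X = np.asarray(estimates, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    medoids = np.array([0, 1])
    labels = np.argmin(D[:, medoids], axis=1)
    for _ in range(n_iter):
        new_medoids = medoids.copy()
        for c in range(2):
            members = np.where(labels == c)[0]
            if len(members) > 0:
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[int(np.argmin(within))]
        labels = np.argmin(D[:, new_medoids], axis=1)
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    trusted_cluster = int(np.bincount(labels, minlength=2).argmax())
    return labels == trusted_cluster

# Example: five neighbors' estimates of a 2D state, one of them faulty.
est = [[1.00, 2.00], [1.10, 2.10], [0.90, 1.90], [1.05, 2.05], [9.00, -3.00]]
trusted_mask = two_medoid_trust_split(est)
fused = np.asarray(est)[trusted_mask].mean(axis=0)   # uniform weights as a stand-in
print(trusted_mask, fused)
```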
Rail is an important infrastructure of the railway transportation system, and its safety is critical to train operation. Regular inspection of rail conditions can help detect potential defects and damage in a timely manner. In recent years, machine vision has gradually been applied to rail inspection. However, owing to the limitations of network and computing resources on railway cars, detection can only be carried out during the non-running time of ordinary trains, and real-time detection cannot be performed. To solve these problems, a terminal-edge-cloud architecture is adopted. This study proposes mounting high-speed cameras at certain positions on a train; the image detection tasks collected by these cameras are distributed to the onboard terminal (where the pretrained detection model is cached in advance), the trackside edge server, and the cloud server for processing. Based on the discrete composition of the detection tasks and considering the constraints of the detection task distribution ratio, CPU computing power, and task-priority delay bounds, the detection task delay is used as the optimization objective to construct the objective function, and the task offloading problem is formulated as a min-max model that minimizes the maximum task delay. Finally, a Genetic Algorithm (GA) is used to obtain the optimal task allocation ratio, CPU computing power, task allocation, and minimum task delay. The experimental results show that, for a single detection task with a train capturing frequency of 200 Hz, the response delay of GA-based collaborative offloading is reduced by 1 287, 515, and 875 ms compared with cloud-only, edge-only, and local-only processing, respectively. For 10 detection tasks, the response delay of GA-based collaborative offloading is reduced by 2.440 and 3.520 s compared with particle swarm optimization and ant colony optimization, respectively. This method achieves significant delay optimization in different offloading scenarios.
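A toy genetic algorithm for the offloading decision is sketched below; the processing and transmission delays, population size, and operators are invented for illustration, and the full problem additionally optimizes the split ratio and CPU computing power.

```python
import random

# Toy GA: assign detection tasks to {terminal, edge, cloud} to minimize the maximum delay.
PROC = {"terminal": 0.9, "edge": 0.35, "cloud": 0.15}   # s per task (hypothetical)
TRANS = {"terminal": 0.0, "edge": 0.25, "cloud": 0.60}  # transmission delay (hypothetical)
SITES = list(PROC)
N_TASKS, POP, GENS = 10, 40, 100

def delay(assign):
    """Max completion time: tasks on the same site are processed sequentially."""
    load = {s: 0.0 for s in SITES}
    finish = []
    for s in assign:
        load[s] += PROC[s]
        finish.append(TRANS[s] + load[s])
    return max(finish)

def evolve():
    pop = [[random.choice(SITES) for _ in range(N_TASKS)] for _ in range(POP)]
    for _ in range(GENS):
        pop.sort(key=delay)
        survivors = pop[: POP // 2]                     # elitist selection
        children = []
        while len(children) < POP - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N_TASKS)          # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:                   # mutation
                child[random.randrange(N_TASKS)] = random.choice(SITES)
            children.append(child)
        pop = survivors + children
    best = min(pop, key=delay)
    return best, delay(best)

print(evolve())
```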
Via coordinated management of ground networks, satellite networks, and near-earth unmanned aerial vehicle networks, computing-empowered space-air-ground integrated networks can achieve global connectivity and universal intelligence, providing strong support for the development of China's digital economy. Low Earth Orbit (LEO) satellites offer ubiquitous connectivity and edge computing capabilities, which provide the basis for an efficient space-air-ground integrated computing system. By extending Mobile Edge Computing (MEC) to LEO satellite networks to form a service-oriented end-edge-cloud three-level computing architecture, latency-sensitive tasks can be offloaded from terminals to LEO satellites, improving the task completion rate. However, methods for making efficient offloading decisions and allocating computing power in LEO satellite edge networks urgently need to be developed. To handle the high dynamics of the satellite network environment and the discrete-continuous hybrid action space, this study proposes a Hybrid Proximal Policy Optimization (H-PPO) method based on a generative diffusion model. First, a wireless channel with time-varying characteristics is modeled, and service latency, communication, and computation models under different offloading decisions are constructed. Second, under the multiple constraints of offloading decisions, remaining computing resources, and power control, a long-term optimization problem for maximizing the average task completion rate is formulated. Subsequently, a Markov decision process with parameterized actions is established, and the generative diffusion model is introduced as the discrete action policy to improve the sampling efficiency and exploration ability of traditional Deep Reinforcement Learning (DRL) methods. Finally, the proposed method is used to jointly optimize computation offloading, computing power allocation, and power control. The simulation results show that the proposed method has better convergence performance and is superior to the three comparison methods in terms of task completion rate.
This study considers a wireless sensor state update system consisting of energy-harvesting dual sensors and a destination, wherein sensors with finite-size batteries harvest energy to transmit sensed state updates to the destination. A two-stage communication strategy that integrates the transmission of sensing and control data is proposed to ensure the freshness of data at the receiving end. Specifically, the destination transmits control data to the sensor using Hybrid Automatic Repeat Request (HARQ) with selective combining, and the sensor is triggered to sense and send the corresponding state update to the destination using truncated HARQ with maximal ratio combining. The system is concerned not only with the freshness of sensed data but also with steadily harvested energy. First, the stochastic energy arrival model is investigated, and the corresponding energy transfer matrix and its steady-state probability distribution are obtained. Second, maximal ratio combining and selective combining retransmission techniques are adopted for the sensing data and control data, respectively. Both preemptive and non-preemptive transmission schemes are considered, and explicit expressions of the Age of Information (AoI) with respect to the sensing probability, the energy arrival probability, and the maximum number of transmissions are derived. The numerical simulation results demonstrate the effects of different network parameters on the AoI of the system. Additionally, the performance of the preemptive and non-preemptive transmission schemes is compared; the results reveal that the preemptive transmission scheme can achieve a lower AoI to some extent than the non-preemptive transmission scheme.
A traction inverter is the core device in a train power system, and its power semiconductor, the Insulated Gate Bipolar Transistor (IGBT), is prone to random intermittent open-circuit faults under long-term vibrations and complex operating conditions. Such faults often disappear after shutdown, making timely detection difficult. To investigate the fault mechanism, this study establishes a simulation model incorporating a traction power supply system, inverter, and motor. Considering the coupling characteristics under multi-motor synchronous control, this study analyzes the current waveforms of different transistors with intermittent open-circuit faults. Simulation results indicate that low-probability faults cause relatively small current fluctuations and thus exhibit concealment, whereas high-probability faults result in significant waveform distortion and may induce abnormalities in adjacent inverters, showing obvious propagation. Furthermore, to address the concealment and propagation of IGBT intermittent open-circuit faults in metro train traction inverters, this study proposes a Causal-Res fault diagnosis method based on causal analysis. This method employs the causal convolution mechanism of Temporal Convolutional Networks (TCNs) to extract causal feature vectors from output current signals and combines the deep feature learning capabilities of Residual Neural Networks (ResNets) to classify these feature vectors, thereby achieving effective fault diagnosis and localization. Validations are conducted on a low-power test platform built according to the topology of a metro train distributed traction system. The results demonstrate that the proposed method achieves fault localization accuracies of 99.99% and 99.95% under low- and high-probability intermittent open-circuit scenarios, respectively. Comparative experiments confirm that the introduction of causal relationships effectively enhances the accuracy and stability of the diagnostic method.
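The causal-convolution building block of a TCN, on which the Causal-Res feature extractor relies, can be sketched in PyTorch as follows; the channel counts, kernel size, and dilation are arbitrary illustrative values.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """A causal (left-padded) dilated convolution: the output at time t depends only
    on inputs up to time t, which is the core of a TCN's causal feature extraction."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                          # x: (batch, channels, time)
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

currents = torch.randn(8, 3, 512)                  # toy three-phase current signals
features = CausalConv1d(3, 16, dilation=2)(currents)
print(features.shape)                              # torch.Size([8, 16, 512])
```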
Radiotherapy is an important treatment modality for liver cancer. Deep learning-based image semantic segmentation technology can assist physicians in demarcating radiation target areas and enhancing the accuracy of radiotherapy. However, existing medical image semantic segmentation models are relatively intricate and possess a substantial number of parameters, rendering them challenging to deploy on devices with constrained resources. An analysis of the significance of the parameters of the vision transformer model reveals that the crucial parameters of the different layers of the model exhibit a distinctive distribution pattern. Based on this finding, this study proposes a cross-layer channel pruning method based on variable sequences. According to the distribution pattern of the significant parameters, the significance weights of the Multihead Self-Attention (MSA) and Feed-Forward Network (FFN) layers are measured and these values are adjusted to form a hierarchical sequence of significance weight values. Subsequently, the corresponding pruning rate is set for the sequence to form a variable pruning rate sequence that varies with the depth of the network, thereby achieving fine pruning of the MSA and FFN layers. This new method introduces a cyclic pruning strategy that iteratively updates the variable pruning rate sequence during each round of model pruning to reduce the redundant structures in the MSA and FFN layers adequately. The model is trained and tested using the public liver segmentation dataset, 3D-IRCADb-01. After pruning the vision transformer, the accuracy of the image segmentation does not decline and the Floating-Point Operations (FLOPs) and number of parameters are reduced by 60.26% and 66.07%, respectively. Experimental results indicate that the new method attains a higher pruning rate while guaranteeing segmentation accuracy and is more advantageous than the fixed pruning rate method.
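A highly simplified sketch of the variable pruning-rate idea follows: deeper layers receive a larger pruning rate, and the output channels with the smallest L1 significance are masked. The rate schedule and importance measure here are illustrative and do not reproduce the paper's significance weighting or cyclic updating strategy.

```python
import torch

def layerwise_prune_masks(weights, base_rate=0.3, step=0.05, max_rate=0.9):
    """Build per-layer channel masks with a pruning rate that grows with depth."""
    masks = []
    for depth, w in enumerate(weights):                 # w: (out_features, in_features)
        rate = min(base_rate + step * depth, max_rate)
        importance = w.abs().sum(dim=1)                 # per-channel L1 significance
        keep = max(1, int((1.0 - rate) * importance.numel()))
        kept_idx = importance.topk(keep).indices
        mask = torch.zeros_like(importance, dtype=torch.bool)
        mask[kept_idx] = True
        masks.append(mask)
    return masks

# Six stand-in projection matrices (e.g., MSA/FFN layers) of 64 output channels each.
layers = [torch.randn(64, 64) for _ in range(6)]
print([int(m.sum()) for m in layerwise_prune_masks(layers)])  # kept channels per depth
```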
To address the issues of low target pixels, complex backgrounds, and limited hardware resources in infrared image target detection, a target detection model that incorporates a multihead cross-attention mechanism with position coding and a dual-feature interaction refinement structure is proposed. In the backbone network, a position coding-based cross-attention module called Criss-Cross Attention (CCA) and a Spatial Pyramid Pooling Cross Stage Partial (SPCP) module are introduced. The CCA module aggregates contextual information in the horizontal and vertical directions via row and column transformations of the correlation matrix and enhances feature extraction by sharing the parameters of the recursive criss-cross module, which reduces the number of parameters required by the self-attention mechanism. The SPCP module reduces the number of parameters and computations by unifying feature mappings of different sizes and scales, adopting a Cross Stage Partial (CSP) structure, and introducing a squeeze-and-excitation attention mechanism that selects channels more favorable for target detection. In the neck network, frequency-domain information and a Dual Feature Interaction Refinement (DIR) module are introduced to further extract the refined features of small target ships and enhance the feature fusion capability of the model. The improved model achieves 89.5% precision, 97% recall, and a 93.1% F1 score on an Infrared Ship Detection Dataset (ISDD), significantly improving detection performance compared with the benchmark model. Additionally, the proposed model requires fewer parameters and computations than other detection models. The experimental results show that the multihead cross-attention mechanism with fused position coding and the dual-feature interaction refinement structure effectively improve the accuracy of infrared ship target detection.
To address the high false alarm and missed detection rates of existing target detection algorithms caused by multi-scale variations of cotton in complex fields, as well as the large computational volume that makes their deployment on edge devices challenging, a lightweight field cotton grade detection algorithm, YOLOv8-Cotton, is proposed. This algorithm optimizes feature extraction and fusion and combines model pruning and knowledge distillation techniques. First, a Multi-Scale Convolution (MSConv) module is designed in the feature extraction network, which contains convolutional kernels of different scales and can enhance the feature extraction capability of the network. Second, an Efficient Local Feature Selection (ELS) mechanism is constructed in the neck network to capture horizontal and vertical features in the spatial dimension and suppress irrelevant regions from affecting the prediction results. Third, a novel hierarchical feature fusion network, HL-PAN, is constructed using the ELS mechanism, in which the complementary information generated by its Upsampling Selection Feature Fusion (U-SFF) and Downsampling Selection Feature Fusion (D-SFF) guides the feature fusion and enhances the ability of the model to detect multi-scale changes in cotton. Fourth, the model is compressed using the Layer-Adaptive Magnitude-based Pruning (LAMP) algorithm to reduce its size. Finally, feature distillation is performed using the CWD loss function to enhance the detection performance of the lightweight model. Experimental results show that YOLOv8-Cotton achieves mAP@0.5 and mAP@0.5:0.95 values of 75.4% and 53.1%, respectively, on the self-constructed dataset, which are 5.1 and 2.1 percentage point improvements over the baseline algorithm. Furthermore, the model size decreases by 4.83 MB and the computation is reduced by 5.8×10⁹ operations. Additionally, the results show that the model generalizes well on a publicly available dataset.
To enhance the accuracy of steel plate welding and improve the quality and construction efficiency of ship hulls, this study proposes an Adaptive Golden Sine Crayfish Optimization Algorithm (AGSCOA)-stacking feature-weighted surrogate modeling approach to solve the problem of welding margin prediction for marine steel plates. First, based on the stacking ensemble learning strategy, base learners with high predictive accuracy and diversity are selected from multiple machine learning models according to the proposed PC metrics. Second, a feature weighting method is proposed to improve the generalizability of the model by performing adaptive feature weighting according to the prediction performance of the selected base learners. Finally, the traditional crayfish optimization algorithm is improved in several respects: an orthogonal refractive inverse learning mechanism is proposed to improve population initialization and ensure initial population quality, an adaptive Lévy flight strategy is proposed to optimize the exploration phase and avoid entrapment in local optima, and the golden sine algorithm is introduced to improve the exploitation phase and balance global search with local exploitation capability. The improved AGSCOA is used to optimize the multiple parameters of the surrogate model to enhance its prediction accuracy. Experimental results show that AGSCOA demonstrates excellent optimization performance and convergence speed. The proposed surrogate model has higher prediction accuracy than the linear weighted ensemble learning surrogate model, AGSCOA-SVR, AGSCOA-ET, and AGSCOA-RF, with the Root Mean Square Error (RMSE) reduced by 14.29%, 35.78%, 17.48%, and 22.31%, respectively.