Acoustic Scene Classification (ASC) aims to enable computers to emulate the human auditory system in recognizing acoustic environments, a challenging task in computer audition. With rapid advances in intelligent audio processing and neural network learning algorithms, many new ASC algorithms and technologies have emerged in recent years. To trace the technological development of the field, this review systematically examines both early work and recent developments in ASC. It first describes the application scenarios of ASC and the challenges it faces, and then details the mainstream ASC frameworks, focusing on the application of deep learning. It then summarizes frontier explorations, extension tasks, and publicly available datasets in ASC, and finally discusses future development trends.
This study investigates the consensus problem, a fundamental issue in distributed systems and network control. Consensus studies have traditionally focused on unweighted networks, overlooking the impact of edge weights in real-world networks. However, networks such as transportation systems, social networks, and power networks have significant weighted properties, and unweighted models fail to fully capture their complex interactions. To address this issue, this study examines a family of pseudo-fractal weighted networks to determine how edge weights affect consensus. The Laplacian matrix is used to establish a relationship between the Kirchhoff indices and network consensus, providing an in-depth analysis of consensus behavior in weighted networks. By calculating the recursive relations of these indices across iterations, exact formulas are derived for key quantities, including the multiplicative Kirchhoff index, the additive Kirchhoff index, the Kirchhoff index, and network coherence. A numerical analysis shows that as the network size increases, the network coherence of the weighted networks converges to a constant, indicating greater robustness of consensus to external noise.
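The link between Kirchhoff indices and coherence can be checked numerically on small graphs. The sketch below is not the recursive derivation from the study; it computes the indices directly from the pseudoinverse of a weighted Laplacian, assuming edge weights act as conductances and using the standard identities r_ij = L⁺_ii + L⁺_jj − 2L⁺_ij and first-order coherence H = Kf / (2N²). The function name `kirchhoff_indices` is illustrative.

```python
import numpy as np

def kirchhoff_indices(W):
    """Resistance-based indices of a connected graph with symmetric weight matrix W."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    d = W.sum(axis=1)                          # weighted degrees (node strengths)
    L = np.diag(d) - W                         # weighted Laplacian
    Lp = np.linalg.pinv(L)                     # Moore-Penrose pseudoinverse
    diag = np.diag(Lp)
    R = diag[:, None] + diag[None, :] - 2 * Lp # effective resistance r_ij
    iu = np.triu_indices(n, k=1)               # each unordered pair once
    kf = R[iu].sum()                                      # Kirchhoff index
    kf_mult = (np.outer(d, d) * R)[iu].sum()              # sum of d_i d_j r_ij
    kf_add = ((d[:, None] + d[None, :]) * R)[iu].sum()    # sum of (d_i + d_j) r_ij
    coherence = kf / (2 * n ** 2)              # first-order coherence H = Kf / (2 N^2)
    return kf, kf_mult, kf_add, coherence
```

For the unweighted 3-node path, the pairwise resistances are 1, 1, and 2, so Kf = 4 and H = 2/9.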
As integrated circuit design grows rapidly in complexity, the industry has become globalized and divided into specialized roles, involving an increasing number of third-party Intellectual Property (IP) core providers. The widespread use of third-party IP cores introduces the risk of hardware trojans. To detect hardware trojans in third-party IP cores and evaluate their potential functionality, feasible hardware security evaluation methods for IP cores are urgently needed. Functional identification of digital circuit modules has attracted significant attention as a fundamental research area in hardware trojan analysis. In this study, circuit function detection is formulated as a multi-class classification problem. Exploiting the natural correspondence between circuits and graph data structures, a gate-level circuit function classification and detection method based on Graph Attention Networks (GAT) is proposed. First, to address the lack of functional identification datasets for gate-level netlists, a representative set of Register Transfer Level (RTL) designs is collected and synthesized into gate-level netlists, yielding a gate-level circuit dataset of appropriate scale and diversity. Second, to extract and process circuit feature information, a text-recognition-based software tool is developed that maps the complex interconnections of a circuit into a structured and concise JavaScript Object Notation (JSON) format suitable for neural network processing. Finally, a graph attention network is trained as a multi-class classifier on the constructed gate-level netlist dataset. Once trained, the classifier can classify and identify unknown gate-level circuits.
Experimental results show that, after training on more than 3 000 netlists from the self-built dataset, the classifier achieves a classification accuracy of 90% on 645 netlists across six categories.
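The abstract does not specify the tool's input syntax or JSON schema. As a minimal sketch, assuming single-line structural Verilog instances of the form `CELL INST ( .PIN(net), ... );` and assuming output pins are named `Y`, `Q`, or `Z`, a regex pass can turn a netlist into a node/edge graph in JSON. All names here (`netlist_to_json`, `OUTPUT_PINS`, the `nodes`/`edges` keys) are hypothetical.

```python
import json
import re

# Hypothetical single-line instance syntax: CELLTYPE INSTNAME ( .PIN(net), ... );
INSTANCE_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s*\((.*)\)\s*;\s*$")
PIN_RE = re.compile(r"\.(\w+)\s*\(\s*([\w\[\]]+)\s*\)")
OUTPUT_PINS = {"Y", "Q", "Z"}  # assumption about the cell library's output pin names

def netlist_to_json(lines):
    """Map gate instances to graph nodes and net connections to directed edges."""
    gates, drivers, loads = [], {}, {}
    for line in lines:
        m = INSTANCE_RE.match(line)
        if not m:
            continue
        cell, name, body = m.groups()
        pins = PIN_RE.findall(body)
        if not pins:
            continue  # e.g. the "module top (a, b, y);" header has no .PIN(net) pairs
        gates.append({"id": name, "type": cell})
        for pin, net in pins:
            if pin in OUTPUT_PINS:
                drivers[net] = name          # this gate drives the net
            else:
                loads.setdefault(net, []).append(name)  # this gate reads the net
    # An edge runs from the gate driving a net to every gate reading that net.
    edges = [[drivers[net], dst] for net, dsts in loads.items()
             if net in drivers for dst in dsts]
    return json.dumps({"nodes": gates, "edges": edges})
```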
The key idea behind Graph Neural Networks (GNNs) is to learn the representation of a target node by aggregating neighborhood information through the topology of a graph; however, edges that are irrelevant to the downstream task, or nodes with few neighbors, can limit the representational power of the network. Existing graph data enhancement methods seldom enhance both structure and features simultaneously. Among them, existing local enhancement methods use generative models to generate features from first-order neighborhoods and therefore cannot obtain more relevant higher-order neighborhood information. To address these limitations, this study presents an effective data enhancement strategy. First, an edge prediction model is used to adjust the graph topology, improving the Signal-to-Noise Ratio (SNR) and facilitating message passing between nodes. Second, a Personalized PageRank (PPR) algorithm aggregates useful information from multi-order neighborhoods from a global perspective, providing global feature enhancement. Finally, a generative model generates additional features for local enhancement, enriching node representations, especially for low-degree nodes. Experiments show that, with this strategy, the accuracies of Graph Convolutional Network (GCN) and Graph Attention Network (GAT) models improve by 3.1 and 1.3 percentage points on average, respectively, on the Cora, CiteSeer, and PubMed datasets. These results indicate that the strategy improves performance across different neural network architectures and benchmark datasets.
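One common way to aggregate multi-order neighborhood information with Personalized PageRank is APPNP-style power iteration, Z ← (1 − α)ŜZ + αX, where Ŝ is the symmetrically normalized adjacency matrix with self-loops. The sketch below assumes this formulation; the study's exact PPR variant, teleport probability α, and iteration count are not given in the abstract.

```python
import numpy as np

def ppr_propagate(A, X, alpha=0.1, num_iters=10):
    """Approximate personalized-PageRank feature propagation (APPNP-style power iteration)."""
    A = np.asarray(A, dtype=float)
    A_hat = A + np.eye(A.shape[0])                          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    S = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]   # D^-1/2 (A + I) D^-1/2
    Z = np.array(X, dtype=float)
    for _ in range(num_iters):
        # Spread features one hop further, then teleport back to each node's own features.
        Z = (1 - alpha) * S @ Z + alpha * X
    return Z
```

As the number of iterations grows, Z approaches the closed-form PPR result α(I − (1 − α)Ŝ)⁻¹X, which mixes information from neighborhoods of every order with geometrically decaying weights.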
Medical Visual Question Answering (Med-VQA) requires understanding both medical images and text-based questions. Therefore, designing effective modal representations and cross-modal fusion methods is crucial for Med-VQA performance. Current Med-VQA methods focus mainly on the global features of medical images and on the distribution of attention within a single modality, ignoring the medical information in local image features and cross-modal interactions, which limits their understanding of image content. This study proposes the Cross-Modal Attention-Guided Medical VQA (CMAG-MVQA) model. First, U-Net-based encoding is used to enhance the local features of an image. Second, from the perspective of cross-modal collaboration, a selective guided attention method is proposed to introduce interactive information from the other modality. In addition, a self-attention mechanism further enhances the image representation obtained through selective guided attention. Ablation and comparative experiments on the VQA-RAD medical question-answering dataset show that the proposed method performs well on Med-VQA tasks and improves feature representation compared with similar methods.
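Selective guided attention is not specified in detail here. A minimal sketch, assuming plain scaled dot-product attention without learned projections: image-region features act as queries over question-token features (cross-modal guidance), and the result is then refined by self-attention. Shapes, dimensions, and function names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def guided_attention(queries, context):
    """Scaled dot-product attention: each query vector attends over `context`."""
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d))  # (n_queries, n_context)
    return weights @ context, weights

rng = np.random.default_rng(0)
image_regions = rng.standard_normal((49, 64))    # e.g. a 7x7 grid of local image features
question_tokens = rng.standard_normal((12, 64))  # word features of the question

# Image regions gather question information, then self-attention refines the result.
guided, _ = guided_attention(image_regions, question_tokens)
refined, _ = guided_attention(guided, guided)
```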
Continual learning ability is an important aspect of human intelligent behavior, enabling humans to acquire new knowledge continuously. However, several studies have shown that conventional deep neural networks lack this capability: after learning new tasks in sequence, they often suffer catastrophic forgetting of previously learned tasks, which hinders the accumulation of new knowledge and limits further gains in intelligence. Enabling deep neural networks to learn continually is therefore important for achieving strong artificial intelligence. This study proposes a continual learning algorithm based on block averaging and orthogonal weight modification, named B-OWM. B-OWM represents the input space with a set of block-average vectors of the input samples, using an optimal number of blocks, and combines them with Orthogonal Weight Modification (OWM) to update network parameters. In this way, deep neural network models can overcome catastrophic forgetting of learned knowledge when learning new tasks. Incremental continual learning experiments on multiple datasets with nonoverlapping tasks show that B-OWM significantly outperforms OWM in continual learning performance, with accuracy improvements of up to 80% in scenarios with a large number of batches.
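The OWM update and the block-average input representation can be sketched as follows, assuming the standard recursive OWM projector P ← P − Px xᵀP / (α + xᵀPx) and a layer that computes x @ W. Here each finished task contributes its block-average vectors rather than every sample; the block count, α, and function names are illustrative, and the study's rule for choosing the optimal number of blocks is not reproduced.

```python
import numpy as np

def block_average(X, num_blocks):
    """Represent an (n, d) batch of inputs by the means of num_blocks contiguous blocks."""
    return np.stack([b.mean(axis=0) for b in np.array_split(X, num_blocks)])

def update_projector(P, X_task, num_blocks, alpha=1e-3):
    """Recursive OWM projector update using the block-average vectors of a finished task."""
    for x in block_average(X_task, num_blocks):
        x = x[:, None]
        Px = P @ x
        P = P - (Px @ Px.T) / (alpha + (x.T @ Px).item())
    return P

def owm_step(W, grad, P, lr=0.1):
    """Gradient step projected away from the stored input directions.
    W has shape (d_in, d_out) and the layer computes x @ W."""
    return W - lr * P @ grad
```

Starting from P = I, this recursion yields exactly P = I − A(αI + AᵀA)⁻¹Aᵀ, where the columns of A are the block-average vectors, so gradient steps passed through `owm_step` barely change the layer's outputs on the span of earlier tasks' representative inputs.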
To address the challenges that current single-stage object detection algorithms based on convolutional neural networks (such as the YOLO series and VFNet) face in high-altitude aerial scenarios, including complex backgrounds, low detection accuracy, and feature overlap, this study proposes an end-to-end object detection algorithm called CSPENet. First, CSPNeXt, a deep convolutional network with large kernels, is used as the model's backbone, enhancing its ability to capture global context. Second, a Feature Refinement Module (FRM) operating in both the spatial and channel dimensions generates adaptive weights that effectively suppress overlapping features. A Receptive Field Attention (RFA) mechanism based on mobile networks is added in the feature fusion stage to solve the parameter-sharing problem of large kernels. Finally, the Efficient Intersection over Union (EIoU) loss is used as the model's regression loss; it separates the aspect-ratio factor into independent width and height differences between the predicted and ground-truth boxes, leading to faster convergence and better localization accuracy. Experimental results show that CSPENet improves average accuracy by 4.4 percentage points over the DINO algorithm on the VisDrone-DET dataset, offering a new solution for research on and applications of small object detection.
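The EIoU regression loss adds to 1 − IoU a center-distance penalty and separate width and height penalties, each normalized by the smallest enclosing box; this is the sense in which it separates the aspect-ratio factors. A minimal single-box sketch in plain Python, assuming (x1, y1, x2, y2) coordinates:

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    # Intersection and union
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1
    iou = inter / (pw * ph + tw * th - inter + eps)
    # Width and height of the smallest enclosing box
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    # Squared distance between box centers, over the squared enclosing-box diagonal
    dx = (px1 + px2 - tx1 - tx2) / 2
    dy = (py1 + py2 - ty1 - ty2) / 2
    center = (dx ** 2 + dy ** 2) / (cw ** 2 + ch ** 2 + eps)
    # Separate width and height penalties (instead of a single aspect-ratio term)
    width = (pw - tw) ** 2 / (cw ** 2 + eps)
    height = (ph - th) ** 2 / (ch ** 2 + eps)
    return 1 - iou + center + width + height
```

For pred = (0, 0, 2, 2) and target = (0, 0, 1, 1), the terms are IoU = 0.25, center = 0.0625, width = height = 0.25, giving a loss of 1.3125.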
To improve the efficiency of convolution calculations, this paper proposes a convolution optimization method called OAC, motivated by the growing demand for fast convolution in fields such as deep learning. The OAC method is based on vector conversion and optimizes convolution in several steps. First, the input matrix is concatenated row by row into a vector. Next, the convolution kernel is stretched and transformed: zeros are padded at appropriate positions, determined by the width of the input matrix and the size of the kernel, to form a second vector. This transformation allows the convolution to be computed correctly from the two vectors while minimizing redundant operations, thereby improving efficiency. Finally, other optimization methods are combined to accelerate the vector calculations. Experimental results show that OAC is 58.9% and 90.1% faster than the traditional MEC method and the im2col method, respectively, and that it reduces memory usage by 53.7% compared with MEC. OAC thus achieves significant gains in computational efficiency and provides an efficient, feasible solution for computing tasks such as deep learning.
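The vector conversion described above can be illustrated as follows. This is a sketch under one reading of the method, not the OAC implementation: the input is flattened row by row, each kernel row is placed at a stride equal to the input width with zeros in between, and every output element becomes one dot product between a contiguous slice of the input vector and the stretched kernel. It assumes a square kernel, stride 1, and no padding; the additional speed-up techniques the paper combines with this step are omitted.

```python
import numpy as np

def stretched_kernel(kernel, width):
    """Lay the kernel rows out with stride `width`, filling the gaps with zeros."""
    k = kernel.shape[0]
    vec = np.zeros((k - 1) * width + k)
    for r in range(k):
        vec[r * width : r * width + k] = kernel[r]
    return vec

def conv2d_by_vectors(image, kernel):
    """Valid 2-D cross-correlation computed from the row-concatenated input vector."""
    h, w = image.shape
    k = kernel.shape[0]
    x = image.reshape(-1)                 # rows concatenated into one vector
    kv = stretched_kernel(kernel, w)
    out = np.empty((h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            start = i * w + j             # top-left corner of the window in x
            out[i, j] = x[start : start + kv.size] @ kv
    return out
```

The zeros in the stretched kernel line up exactly with the input elements that fall outside the current window, so each window needs only one contiguous slice of the input instead of a copied im2col patch.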
Aspect sentiment triplet extraction is an important subtask of aspect-level sentiment analysis that aims to extract aspect words, opinion words, and sentiment polarity from sentences. In recent years, combining syntactic dependency trees with Graph Convolutional Networks (GCN) has achieved satisfactory results in triplet extraction. However, most of these methods do not fully exploit or enhance linguistic features and ignore core information in the global context. Therefore, an aspect sentiment triplet extraction model named Linguistic Feature Enhancement (LFE) is proposed. First, the part-of-speech features of keywords are introduced to make full use of semantic information. Then, syntactic dependency types are considered and the relative syntactic dependency distance between words is calculated, so that each word focuses on the syntactic features of words closer to it in the dependency tree. Subsequently, a biaffine attention mechanism combined with GCN is used to enhance semantic and syntactic features; together, they effectively exploit the structural information of syntactic dependency trees and integrate it into the model. Finally, global features and linguistic features are fused so that key information in the global context is not ignored, improving the model's robustness. Experimental results show that, compared with the GCN-EGTS-BERT model, LFE improves the F1 score by 3.52, 5.32, 1.97, and 2.63 percentage points on the Res14, Lap14, Res15, and Res16 datasets, respectively, demonstrating its feasibility and effectiveness.
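The relative syntactic dependency distance between two words can be taken as the length of the path between them in the dependency tree, treated as undirected. A minimal sketch, assuming the parse is given as a head-index array (`-1` for the root); how LFE converts these distances into attention weights is not shown.

```python
from collections import deque

def syntactic_distances(heads):
    """Pairwise path lengths between words in an undirected dependency tree.

    heads[i] is the index of word i's head, or -1 for the root."""
    n = len(heads)
    adj = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)
    dist = [[-1] * n for _ in range(n)]
    for src in range(n):                  # breadth-first search from every word
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] < 0:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist
```

For "The food was great" with heads `[1, 3, 3, -1]`, the distance from "The" to "great" is 2 and to "was" is 3.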
Action detection comprises action classification and boundary localization, and existing work focuses predominantly on action and boundary features. Current methods neglect the significance of spatial features in this task and suffer from ambiguous action boundary predictions, which affects the performance and applicability of action detection models. To address these challenges, this paper proposes a Salient Object Tracking-based Action Detection (SOT-AD) method. First, to learn salient spatial information at different scales, a hierarchical attention network is introduced to capture salient objects associated with actions while reducing interference from action-irrelevant information. Second, to keep attention on salient objects consistent across adjacent temporal positions, a salient object tracking loss is proposed. Neutral samples are introduced to construct a "target-sub-target-background" feature pool that learns temporal contextual information for feature sequences, facilitating salient object tracking. Experimental results on two widely used datasets, THUMOS14 and ActivityNet1.3, show that SOT-AD outperforms mainstream methods, improving mean Average Precision (mAP) by 0.9 and 0.6 percentage points, respectively. Notably, on THUMOS14, SOT-AD achieves an mAP@0.5 of 72.7%.
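mAP@0.5 counts a predicted segment as a true positive only if its temporal IoU (tIoU) with a not-yet-matched ground-truth segment is at least 0.5. A minimal sketch of the tIoU computation and the greedy matching step that precedes the precision-recall computation, assuming segments given as (start, end) pairs:

```python
def temporal_iou(pred, gt):
    """tIoU of two segments given as (start, end)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def match_at_threshold(preds, gts, thr=0.5):
    """Greedily match score-sorted predictions to ground truth at a tIoU threshold.

    preds: list of (start, end, score); returns a true-positive flag per prediction,
    in descending score order."""
    used, flags = set(), []
    for start, end, _ in sorted(preds, key=lambda p: -p[2]):
        best, best_iou = None, thr
        for g, seg in enumerate(gts):
            iou = temporal_iou((start, end), seg)
            if g not in used and iou >= best_iou:
                best, best_iou = g, iou
        if best is None:
            flags.append(False)           # no unmatched ground truth overlaps enough
        else:
            used.add(best)                # each ground-truth segment is matched once
            flags.append(True)
    return flags
```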
Multi-Agent Reinforcement Learning (MARL) plays a crucial role in solving complex cooperative tasks. However, traditional methods face significant limitations in dynamic environments and under information nonstationarity. To address these challenges, this paper proposes RoMAC, a role learning-based multi-agent reinforcement learning framework. The framework divides roles based on action attributes and uses a role assignment network to dynamically allocate roles to agents, improving the efficiency of multi-agent collaboration. It adopts a hierarchical communication design consisting of attention-based inter-role communication and mutual-information-guided inter-agent communication. In inter-role communication, attention mechanisms generate efficient messages for coordination between role delegates. In inter-agent communication, mutual information is used to generate targeted messages that improve decision quality within role groups. Experiments in the StarCraft Multi-Agent Challenge (SMAC) environment show that RoMAC improves the average win rate by approximately 8.62 percentage points, reduces convergence time by 0.92×10⁶ timesteps, and reduces communication load by 28.18 percentage points on average. Ablation studies further confirm the contribution of each module to performance and demonstrate the robustness and flexibility of the model. Overall, the results indicate that RoMAC offers significant advantages in MARL and cooperative tasks, providing reliable support for efficiently addressing complex challenges.
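The abstract does not describe the role assignment network's architecture. As a hypothetical sketch only, roles can be assigned by a softmax over similarities between agent embeddings and learned role embeddings, after which agents are grouped by role and one delegate per group takes part in inter-role communication. Choosing the first agent of each group as its delegate is an arbitrary assumption here.

```python
import numpy as np

def assign_roles(agent_emb, role_emb):
    """Soft role probabilities and a hard role per agent from embedding similarity."""
    logits = agent_emb @ role_emb.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs, probs.argmax(axis=1)

def role_groups(roles):
    """Group agent indices by role; hypothetically, the first member is the delegate."""
    groups = {}
    for agent, role in enumerate(roles):
        groups.setdefault(int(role), []).append(agent)
    delegates = {role: members[0] for role, members in groups.items()}
    return groups, delegates
```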
Aspect Term Extraction (ATE) is a critical task in aspect-level sentiment analysis, and its manual annotation costs are extremely high. When training and test samples come from different domains, the performance of traditional methods often degrades significantly owing to the differences between the domains. Existing methods apply domain adaptation techniques based on rich semantic information within local contexts to achieve cross-domain ATE. However, they overlook potential global long-range dependencies of aspect terms within the text, limiting model performance, scalability, and robustness. To address these issues, this study proposes a cross-domain ATE model called CBiLSTM, which integrates global and local semantic information and requires no additional manual labeling. The model uses semantic information as a pivot: it first incorporates external semantic information into word embeddings to construct pivot information for both the source and target domains. It then encodes the global and local contextual semantic information in parallel, better capturing comprehensive semantic features and bridging the gap between the source and target domains. On three benchmark datasets, CBiLSTM achieves an average F1-score of 53.87%, outperforming the current state-of-the-art model by 0.49 percentage points. Experimental results demonstrate the superior performance and lower computational cost of CBiLSTM.
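ATE is commonly cast as sequence labeling with B/I/O tags and scored by exact-match span F1. Assuming that setup (the abstract does not state CBiLSTM's tagging scheme), the helpers below decode aspect-term spans and compute the F1-score:

```python
def bio_spans(tags):
    """Collect (start, end) aspect-term spans, end exclusive, from B/I/O tags."""
    spans, start = [], None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        if tag in ("B", "O"):
            if start is not None:
                spans.append((start, i))
            start = i if tag == "B" else None
        elif tag == "I" and start is None:
            start = i                              # tolerate an I without a preceding B
    return spans

def span_f1(pred_tags, gold_tags):
    """Exact-match span F1 between predicted and gold tag sequences."""
    pred, gold = set(bio_spans(pred_tags)), set(bio_spans(gold_tags))
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```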
In recent years, Graph Neural Networks (GNNs) have been widely used for text classification. Current GNN-based models first model the text as a graph and then use GNNs to propagate and aggregate features over the text graph. However, these methods have two notable limitations. First, because the edges of an ordinary graph connect only pairs of nodes, existing models cannot capture high-order semantic relationships among words. Second, existing models cannot capture key semantic information in the text. To address these issues, this paper proposes a text classification model based on the feature fusion of dual hypergraph convolutional networks. On the one hand, the original text is used to construct a text hypergraph; on the other hand, external knowledge is introduced for short texts: the text is semantically enhanced using the SenticNet lexicon, and a semantic hypergraph is constructed. After hypergraph convolution, an attention mechanism fuses the features of the two hypergraphs for short-text classification. Experimental results on four text classification datasets show that the proposed model outperforms the baseline methods.
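A widely used form of hypergraph convolution (the HGNN layer) is X' = σ(D_v^{-1/2} H W D_e^{-1} Hᵀ D_v^{-1/2} X Θ), where H is the node-hyperedge incidence matrix, W holds hyperedge weights, and D_v and D_e are the node and hyperedge degree matrices. The sketch below assumes this form with ReLU as σ; the paper's exact layer, hyperedge construction, and attention-based fusion are not shown. Every node must belong to at least one hyperedge.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, edge_weights=None):
    """One HGNN-style layer: ReLU(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta).

    H is the (num_nodes, num_hyperedges) incidence matrix, e.g. words x sentences."""
    H = np.asarray(H, dtype=float)
    w = np.ones(H.shape[1]) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    dv = H @ w                                  # weighted node degrees
    de = H.sum(axis=0)                          # hyperedge degrees (nodes per hyperedge)
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    left = dv_inv_sqrt[:, None] * H * (w / de)[None, :]   # Dv^-1/2 H W De^-1
    G = (left @ H.T) * dv_inv_sqrt[None, :]               # ... H^T Dv^-1/2
    return np.maximum(G @ X @ Theta, 0.0)
```

Because a single hyperedge can join any number of words, one propagation step already mixes information among all words that co-occur in it, which is the high-order relationship that pairwise edges cannot express directly.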
Research interest has grown in understanding how artificial intelligence represents perception and action planning hierarchically across multiple levels of abstraction and timescales. Owing to technological constraints, most studies have been limited to artificial decomposition of tasks, such as the 3D Bin Packing Problem (3DBPP). In this setting, heuristic rules guide the neural network in analyzing packing points during task decomposition, helping the agent decompose the state space. This transforms the originally vast and complex space into individual subspaces, providing the neural network with better candidate solutions. However, these rules limit performance: if they cannot decompose the problem perfectly, fixed-rule assistance may restrict the neural network by excluding better solutions that the rules overlook. To address this problem, this study improves the original Packing Configuration Tree (PCT) model with a heuristic rule fusion strategy. Based on hierarchical reinforcement learning, the strategy divides the problem into layers and introduces a graph attention classification model to select the best spatial point expansion scheme for the current situation. This allows more combinations and arrangements when splitting internal space points and exploring feasible positions. Experiments show that the improved model outperforms the original model on multiple datasets. On datasets containing additional density information, the average packing utilization reaches 77.2%, a 1.7 percentage point improvement over the original model. The proposed model provides better solutions within a reasonable time.
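To make concrete the kind of fixed expansion rule discussed above, the sketch below implements one simple corner-point rule: after an item is placed, the used point is removed and the three corners the item exposes become new candidates; utilization is total item volume over bin volume. This rule is an illustrative assumption, not the PCT expansion schemes or the learned selection from the study, and it does not check bin bounds or overlaps.

```python
def place(item_pos, item_size, candidates):
    """Hypothetical corner-point rule: replace the used point with the three exposed corners."""
    x, y, z = item_pos
    w, d, h = item_size
    candidates = [p for p in candidates if p != item_pos]
    candidates += [(x + w, y, z), (x, y + d, z), (x, y, z + h)]
    return candidates

def utilization(placed_sizes, bin_size):
    """Fraction of the bin volume occupied by the placed items."""
    used = sum(w * d * h for w, d, h in placed_sizes)
    bin_w, bin_d, bin_h = bin_size
    return used / (bin_w * bin_d * bin_h)
```

A fixed rule like this always proposes the same three points; the study's approach instead lets a learned classifier decide which expansion scheme to apply in each situation.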
Owing to the unordered and discrete nature of point cloud data, traditional dynamic graph convolution methods face significant challenges in processing it and struggle to accurately represent feature correspondences between 3D points. To address this issue, a network called DKSA-Net is proposed that incorporates deformable kernels and self-attention. It consists of two main modules: Deformable Kernels edge Convolution (DKConv) and Self-Attention edge Convolution (SAConv). DKConv integrates deformable kernels with edge convolution, allowing the network to learn point features dynamically, generate deformable kernels, and maintain feature correspondences. SAConv combines the self-attention mechanism with edge convolution, enabling finer-grained feature extraction, fully capturing important point cloud features, and enhancing the discriminative ability of the model. Experimental results show that DKSA-Net performs well on the ModelNet40 and ShapeNet datasets, with an Overall Accuracy (OA) of 93.4%, a mean class Accuracy (mAcc) of 90.7%, and a mean Intersection over Union (mIoU) of 86.1%. It also has relatively low model complexity and high robustness.
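Both DKConv and SAConv build on edge convolution. The base EdgeConv operation, as in dynamic graph CNNs, builds a k-nearest-neighbor graph on the current features and max-pools a shared function of [x_i, x_j − x_i] over each neighborhood. A minimal NumPy sketch using a single linear layer plus ReLU as that function; the deformable kernels and self-attention of DKSA-Net are not included, and `weight` stands in for learned parameters.

```python
import numpy as np

def knn(points, k):
    """Indices of the k nearest neighbours of every point, excluding the point itself."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def edge_conv(features, k, weight):
    """EdgeConv: h_i = max_j ReLU([x_i, x_j - x_i] @ weight) over the k nearest neighbours."""
    idx = knn(features, k)                              # graph rebuilt from current features
    xi = np.repeat(features[:, None, :], k, axis=1)     # (N, k, C)
    xj = features[idx]                                  # (N, k, C)
    edge = np.concatenate([xi, xj - xi], axis=-1)       # (N, k, 2C) edge features
    return np.maximum(edge @ weight, 0.0).max(axis=1)   # (N, C_out)
```

Rebuilding the k-NN graph from each layer's features, rather than from the raw coordinates, is what makes the graph "dynamic".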
Because the Imperialist Competitive Algorithm (ICA) converges rapidly, it easily falls into local optima on high-dimensional complex problems, resulting in insufficient global optimization ability. To address this issue, this paper proposes an Improved ICA based on Lens Opposition-based Learning and Differential Evolution (LODE-IICA). First, a dynamic lens opposition-based learning differential evolution mechanism is introduced that periodically provides new evolutionary directions for the population and balances the strength of the empires, helping the algorithm escape local optima. Second, an elite preservation strategy is introduced during evolution to redistribute colonies and maintain population diversity. Finally, dynamic assimilation coefficients coordinate exploration across the different stages of the algorithm and improve its stability. In simulation experiments, the standard function test set and the CEC2017 and CEC2020 test sets are used to examine the ability of LODE-IICA to find the optima of different types of functions in multiple dimensions. Fifteen representative improved algorithms are selected, and their results on these test sets are compared with those of LODE-IICA. The results show that the mechanisms introduced in LODE-IICA improve the performance of the algorithm in most cases, yielding faster convergence and better optimization ability.
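Lens opposition-based learning maps a candidate x in [a, b] to x* = (a + b)/2 + (a + b)/(2k) − x/k, where the scale factor k controls how far the opposite point lies (k = 1 reduces to ordinary opposition, a + b − x). The sketch clips the result back into the bounds and pairs it with the classic DE/rand/1 mutation; how LODE-IICA schedules k dynamically and combines the two with the empire mechanics is not shown.

```python
import numpy as np

def lens_opposition(x, lower, upper, k):
    """Lens-imaging opposite point, clipped back into [lower, upper]."""
    opposite = (lower + upper) / 2 + (lower + upper) / (2 * k) - x / k
    return np.clip(opposite, lower, upper)

def de_rand_1(pop, F, rng):
    """DE/rand/1 mutant for every individual: v_i = x_r1 + F (x_r2 - x_r3), r1, r2, r3 != i."""
    n = len(pop)
    mutants = np.empty_like(pop)
    for i in range(n):
        r1, r2, r3 = rng.choice([j for j in range(n) if j != i], 3, replace=False)
        mutants[i] = pop[r1] + F * (pop[r2] - pop[r3])
    return mutants
```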
Tibetan text summarization helps users understand the content of a text quickly and effectively. However, the scarcity of public, multi-domain, large-scale Tibetan summarization datasets hinders the further development of Tibetan text summarization techniques. Furthermore, most studies on Tibetan text summarization adopt models built on Chinese and English summarization techniques that use words as basic units. Owing to the limitations of Tibetan word segmentation technology, however, directly using words as basic units significantly degrades summarization performance. Therefore, this study constructs TB-SUM, a multi-domain Tibetan short-text summarization dataset containing 10 523 text-summary pairs. Based on an analysis of the constituent units of Tibetan text, a method for fusing different basic units suited to Tibetan text summarization is proposed. Finally, a Tibetan text summarization model, Fusion_GloVe_GRU_Atten, that integrates different basic units is proposed. It uses the Global Vectors for Word Representation (GloVe) module to vectorize Tibetan text and encodes the input vectors with a Bidirectional Gated Recurrent Unit (Bi-GRU) module. An attention mechanism captures the complete semantic information of the input, allowing the decoder to focus on the encoder outputs most relevant to the current word, and a GRU decoder generates the Tibetan summary. Experiments are conducted on the TB-SUM and Ti-SUM datasets. The results show that when the fusion of syllables and words is used as the basic unit for training and syllables are used as the basic unit for testing, Fusion_GloVe_GRU_Atten generates good summaries and achieves high Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores.
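Syllable-level evaluation is straightforward for Tibetan because syllables are delimited by the tsheg (U+0F0B). A minimal sketch, assuming syllables are obtained by splitting on tsheg, the shad (U+0F0D), and whitespace, and computing ROUGE-N recall as clipped n-gram overlap divided by the number of reference n-grams; options of official ROUGE toolkits are not reproduced.

```python
import re
from collections import Counter

def syllables(text):
    """Split Tibetan text on tsheg (U+0F0B), shad (U+0F0D), and whitespace."""
    return [s for s in re.split(r"[\u0f0b\u0f0d\s]+", text) if s]

def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    return sum((cand & ref).values()) / total if total else 0.0
```

Applying `rouge_n_recall` to the output of `syllables` gives syllable-based scores that do not depend on any Tibetan word segmenter.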
This study presents an improved You Only Look Once version 3 (YOLOv3) algorithm for small object detection to address low detection precision, missed detections, and false detections for small objects. First, in terms of network structure, the original Darknet-53 backbone is replaced with DenseNet-121, a Densely Connected Network (DenseNet), to improve the feature extraction capability of the backbone. The convolution kernel size is also modified to further reduce the loss of feature map information and enhance the robustness of the detection model for small objects, and a fourth detection layer with a size of 104×104 pixels is added. Second, bilinear interpolation replaces the original nearest-neighbor interpolation for upsampling, to alleviate the feature loss problem common to most detection algorithms. Finally, in terms of the loss function, Generalized Intersection over Union (GIoU) replaces Intersection over Union (IoU) in computing the bounding box loss, and the Focal Loss function is introduced as the bounding box confidence loss. Experimental results show that the improved algorithm achieves a mean Average Precision (mAP) of 63.3% on the VisDrone2019 dataset, 13.2 percentage points higher than the original YOLOv3 model, while running at 52 frame/s on a GTX 1080 Ti device. The improved algorithm thus detects small objects well.
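The two loss changes can be sketched in plain Python, assuming (x1, y1, x2, y2) boxes: GIoU subtracts from IoU the fraction of the smallest enclosing box not covered by the union (the loss is 1 − GIoU), and Focal Loss scales cross-entropy by (1 − p_t)^γ to down-weight easy examples. α = 0.25 and γ = 2 are common defaults, not values reported in the study.

```python
import math

def giou(box_a, box_b):
    """Generalized IoU for (x1, y1, x2, y2) boxes; the GIoU loss is 1 - giou."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (hull - union) / hull

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for predicted positive-class probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1 - p
    a_t = alpha if y == 1 else 1 - alpha
    return -a_t * (1 - p_t) ** gamma * math.log(p_t)
```

Unlike IoU, GIoU stays informative for non-overlapping boxes: two disjoint unit boxes one unit apart give GIoU = −1/3 rather than 0, so the loss still provides a gradient toward overlap.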
Accurate recognition of fake news is an important research topic in online environments, where the explosion of information makes authenticity difficult to judge. Existing studies mostly use multiple deep learning models to extract multiple semantic features that capture different levels of semantic information in the text; however, simply concatenating these features introduces information redundancy and noise, limiting detection accuracy and generalization, and effective deep fusion methods are lacking. In addition, existing studies tend to ignore how the dual emotions jointly expressed by news content and its comments can reveal news authenticity. This paper proposes a Dual Emotion and Multi-feature Fusion based Fake News Detection (DEMF-FND) model to address these problems. First, emotional features of the news and its comments are extracted through emotion analysis. Emotional difference features reflecting the correlation between the two are then computed using similarity measures, and a dual emotion feature set is constructed. Subsequently, a multihead-attention-based fusion mechanism deeply fuses the global and local semantic features of the news text captured by a Bidirectional Long Short-Term Memory (BiLSTM) network and a purpose-designed Integrated Static-Dynamic Embedded Convolutional Neural Network (ISDE-CNN). Finally, the dual emotion feature set is concatenated with the fused semantic features and fed into a fully connected classification layer to determine news authenticity. Experimental results on three real-world datasets, Weibo20, Twitter15, and Twitter16, show that the proposed method outperforms the baseline methods on benchmark metrics, improving accuracy by 2.5, 2.3, and 5.5 percentage points, respectively, highlighting the importance of dual emotion and deep semantic feature fusion for fake news detection.
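The abstract does not give the exact emotion features or similarity measure. A hypothetical sketch, assuming each text already has an emotion vector from some emotion analyzer: comments are mean-pooled, and the dual emotion set concatenates the news emotion, the pooled comment emotion, their difference, and their cosine similarity.

```python
import numpy as np

def dual_emotion_features(news_emotion, comment_emotions):
    """Hypothetical dual emotion feature layout:
    [news, mean(comments), news - mean(comments), cosine(news, mean(comments))]."""
    news = np.asarray(news_emotion, dtype=float)
    comments = np.mean(np.asarray(comment_emotions, dtype=float), axis=0)  # pool all comments
    cos = news @ comments / (np.linalg.norm(news) * np.linalg.norm(comments) + 1e-12)
    return np.concatenate([news, comments, news - comments, [cos]])
```

The resulting vector would be concatenated with the fused semantic features before the classification layer; the fixed layout here only illustrates how a difference feature and a similarity score can sit side by side.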
The SAP-AKA protocol is a secondary authentication protocol, based on the Extensible Authentication Protocol (EAP) framework defined in the 3rd Generation Partnership Project (3GPP) standard, that provides key security services for vertical users. In real 5G environments, authentication and key agreement protocols are subject to various network attacks, resulting in illegal access, authentication failure, and identity information leakage. Whether the security attributes of SAP-AKA still meet the requirements under major security challenges remains uncertain. This study establishes a formal model of SAP-AKA using probabilistic model checking. An attack rate is introduced into the state transitions of the protocol interaction to quantify how strongly attacks affect the protocol. Protocol properties are specified in probabilistic computation tree logic, and the security properties are analyzed quantitatively with the PRISM probabilistic model checker. Experimental results show that the timeliness, authentication, and integrity of SAP-AKA are affected to varying degrees by attack rates between different entities, and that the protocol's security attributes no longer meet the requirements as the attack rate increases. Finally, based on these results, the causes of the security defects are analyzed, and an improvement scheme is proposed and formally modeled, which effectively improves the protocol's security attributes.
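A PCTL query such as P=? [ F "authenticated" ] asks for the probability of eventually reaching a target state. The sketch below computes that quantity by value iteration on a toy discrete-time Markov chain in which each of several message exchanges is hijacked with a fixed per-step probability. The chain and its function names are hypothetical stand-ins for the paper's PRISM model, which uses attack rates between specific entities.

```python
import numpy as np

def handshake_chain(steps, attack_rate):
    """Toy DTMC: states 0..steps-1 are exchanges, `steps` = authenticated,
    `steps + 1` = compromised (both absorbing)."""
    n = steps + 2
    P = np.zeros((n, n))
    for s in range(steps):
        P[s, s + 1] = 1 - attack_rate    # exchange succeeds (last one reaches "authenticated")
        P[s, steps + 1] = attack_rate    # attacker hijacks the exchange
    P[steps, steps] = 1.0
    P[steps + 1, steps + 1] = 1.0
    return P

def reach_probability(P, targets, iters=1000):
    """Probability of eventually reaching `targets` from each state, by value iteration."""
    targets = list(targets)
    x = np.zeros(P.shape[0])
    x[targets] = 1.0
    for _ in range(iters):
        x = P @ x
        x[targets] = 1.0
    return x
```

For this chain the result is (1 − p)^steps, so four exchanges at p = 0.1 give 0.9⁴ = 0.6561, which shows how the success probability falls as the attack rate rises.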
This study analyzes secure fuzzy control of time-delay systems under network attacks, investigating time-delay fuzzy systems subject to sensor saturation, dynamic network scheduling, Randomly Occurring Deception Attacks (RODA), and infinite-distributed delays. A measurement output model is established that describes RODA and sensor saturation in a unified way. To reduce the network communication burden, the Try-Once-Discard (TOD) protocol is adopted to schedule network nodes. The objective is to design a resilient fuzzy secure controller that enables the control system to satisfy the specified security requirements in the simultaneous presence of RODA, sensor saturation, and the TOD protocol. Using stability theory, matrix theory, and stochastic analysis techniques, this study first derives sufficient conditions for the control system to meet the security requirements and then obtains the controller gain by solving Linear Matrix Inequalities (LMIs). Finally, a numerical example demonstrates the feasibility of the proposed design scheme, and the simulation results show that the designed secure and resilient fuzzy controller is robust against network attacks.
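Under the Try-Once-Discard protocol, at each transmission instant only the node whose measurement differs most, in a weighted sense, from its last transmitted value gets network access, and the other nodes' values are held. A minimal sketch with sensor saturation modeled as symmetric clipping; the diagonal weights, the saturation limit, and the function names are illustrative assumptions.

```python
import numpy as np

def saturate(y, limit):
    """Sensor saturation: clip each measurement to [-limit, limit]."""
    return np.clip(y, -limit, limit)

def tod_select(y, last_sent, weights):
    """TOD: the node with the largest weighted error since its last transmission wins."""
    err = weights * (y - last_sent) ** 2
    return int(np.argmax(err))

def tod_update(y, last_sent, weights):
    """Transmit only the selected node's value; all other nodes keep their held values."""
    node = tod_select(y, last_sent, weights)
    new_sent = last_sent.copy()
    new_sent[node] = y[node]
    return node, new_sent
```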
Federated learning enables participants to train models collaboratively without revealing their raw data, effectively addressing the privacy problem of distributed data. However, federated learning still faces security threats such as privacy inference attacks and poisoning attacks by malicious clients. Existing improvements to federated learning focus on either privacy protection or defense against poisoning attacks, without addressing both simultaneously. To counter both inference and poisoning attacks, a privacy-preserving, poisoning-resistant federated learning scheme called APFL is proposed. APFL includes a model detection algorithm that uses Differential Privacy (DP) techniques and assigns an aggregation weight to each client based on the cosine similarity between models. Homomorphic encryption is used for the weighted aggregation of local models. Experimental evaluations on the MNIST and CIFAR10 datasets show that APFL effectively filters malicious models and defends against poisoning attacks while ensuring data privacy. When the poisoning ratio is no more than 50%, APFL achieves model performance consistent with that of the Federated Averaging (FedAvg) scheme in a non-poisoned environment. Compared with the Krum and FLTrust schemes, APFL reduces the model test error rate by 19% and 9% on average, respectively.
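The abstract states that aggregation weights come from cosine similarity between models under differential privacy, with homomorphic encryption protecting the aggregation. A plaintext sketch of the weighting idea only, assuming the coordinate-wise median update as the reference direction and Laplace noise on the scores as a stand-in for the DP step; encryption is omitted, and the reference choice and noise placement are assumptions, not APFL's design.

```python
import numpy as np

def similarity_weights(updates, noise_scale=0.0, rng=None):
    """Aggregation weights from clipped cosine similarity to the coordinate-wise median.
    noise_scale > 0 adds Laplace noise to the scores (hypothetical stand-in for DP)."""
    U = np.asarray(updates, dtype=float)
    ref = np.median(U, axis=0)
    cos = U @ ref / (np.linalg.norm(U, axis=1) * np.linalg.norm(ref) + 1e-12)
    if noise_scale > 0:
        rng = rng if rng is not None else np.random.default_rng()
        cos = cos + rng.laplace(0.0, noise_scale, size=cos.shape)
    scores = np.maximum(cos, 0.0)        # updates pointing away from the reference get zero weight
    total = scores.sum()
    return scores / total if total > 0 else np.full(len(U), 1.0 / len(U))

def aggregate(updates, weights):
    """Weighted average of client updates (done under encryption in the actual scheme)."""
    return weights @ np.asarray(updates, dtype=float)
```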
Although federated learning preserves the privacy of local data, it is vulnerable to data poisoning attacks from adversaries, which degrade the performance of the global model. Mainstream federated learning frameworks assume that client-side local data are clean; in reality, however, attackers can use data pollution strategies to degrade model accuracy. To address this issue, this study proposes a federated aggregation algorithm based on natural nearest neighbors. Unlike traditional federated defense algorithms, this algorithm is designed for federated learning under non-independent and identically distributed (non-IID) conditions and can defend against targeted attacks. The algorithm introduces a natural nearest neighbor search process and uses it to assign anomaly scores to models, effectively distinguishing abnormal models. Furthermore, the algorithm selects nodes with smaller anomaly scores to participate in training, so that the number of normal nodes participating in training far exceeds that of malicious nodes. Experimental results demonstrate that, under non-IID conditions, the algorithm maintains model accuracy in scenarios involving targeted attacks such as label flipping and backdoor attacks, thereby enhancing the robustness of the federated learning framework: the performance and reliability of the global model are maintained despite malicious attacks. The proposed algorithm effectively addresses client data pollution and offers new insights into the security and stability of federated learning frameworks.
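A minimal sketch of natural nearest neighbor search applied to flattened client models. The search grows the neighborhood size r until every model has at least one reverse neighbor or the number of models without one stops changing; the final r (the natural eigenvalue λ) then sets an adaptive neighborhood size with no hand-tuned k. The anomaly score used here (mean distance to the λ nearest neighbors) and the helper names are assumptions for illustration, not the paper's exact scoring rule; the sketch also assumes distinct model vectors.

```python
import numpy as np

def natural_neighbor_search(X):
    """Grow r until no point lacks a reverse neighbor or the count of such
    points stops changing. Returns lambda, the distance matrix, and neighbor order."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)              # exclude each point from its own neighbors
    order = np.argsort(masked, axis=1)[:, :-1]    # neighbors sorted by distance
    reverse_count = np.zeros(n, dtype=int)
    prev_orphans, r = -1, 0
    while r < n - 1:
        r += 1
        for i in range(n):
            reverse_count[order[i, r - 1]] += 1   # i's r-th neighbor gains a reverse neighbor
        orphans = int(np.sum(reverse_count == 0))
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    return r, dist, order

def anomaly_scores(models):
    """Hypothetical score: mean distance to the lambda nearest neighbors."""
    X = np.stack([m.ravel() for m in models])
    lam, dist, order = natural_neighbor_search(X)
    return np.take_along_axis(dist, order[:, :lam], axis=1).mean(axis=1)

def select_clients(models, k):
    # keep the k models with the smallest anomaly scores for aggregation
    return np.argsort(anomaly_scores(models))[:k]
```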
Roadside Units (RSU) detect and sense the road environment to form Cooperative Perception Messages (CPM), which are transmitted from the RSU to vehicles on the road. This process expands the sensing range of vehicles and effectively improves driving safety. However, in urban scenarios with many vehicles and pedestrians, the volume of CPM data that an RSU generates after sensing road objects (including vehicles and pedestrians) is excessive and contains a large amount of Perceived Object Information (POI) with low safety value. Sharing this information frequently with vehicles is likely to congest the network, increasing transmission delay and reducing the quality of service of the CPM. To address this issue, this paper proposes a CPM transmission control scheme called TRAC, which identifies the collision risk of road objects through an Improved Time-to-Collision (Improved-TTC) metric to determine the traffic safety value of each POI. The RSU uses this value to selectively transmit POI. The adaptive CPM transmission control algorithm in TRAC determines the CPM generation frequency based on the real-time network state, and the CPM high-value content selection algorithm determines which POI should be transmitted based on their traffic safety values. The scheme effectively reduces transmission delay and improves the quality of service of the CPM. Compared with existing schemes, TRAC reduces the CPM transmission delay by up to 89% and increases the quality of service of the CPM by up to 8.1 times.
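The selection pipeline can be sketched with the classic TTC (distance divided by closing speed along the line of sight). The paper's Improved-TTC is not reproduced. The mapping from TTC to a safety value (`horizon`) and the rule that shrinks the POI budget as the channel busy ratio rises are hypothetical choices for illustration.

```python
import math

def time_to_collision(p1, v1, p2, v2):
    """Classic TTC: distance / closing speed; math.inf if the objects are not approaching."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    dvx, dvy = v2[0] - v1[0], v2[1] - v1[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return 0.0
    closing_speed = -(dx * dvx + dy * dvy) / dist
    return dist / closing_speed if closing_speed > 0 else math.inf

def safety_value(ttc, horizon=10.0):
    # hypothetical mapping: 1 at imminent collision, falling linearly to 0 at the horizon
    return 0.0 if ttc >= horizon else 1.0 - ttc / horizon

def select_pois(values, channel_busy_ratio, max_objects):
    """Hypothetical budget: send fewer POI as the channel gets busier,
    always keeping the highest-value ones."""
    budget = max(1, int(max_objects * (1.0 - channel_busy_ratio)))
    ranked = sorted(values, key=values.get, reverse=True)
    return ranked[:budget]
```

For example, two vehicles 100 m apart driving toward each other at 10 m/s each have a TTC of 5 s, which this mapping turns into a safety value of 0.5.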
In recent years, Mobile Edge Computing (MEC) has attracted significant attention from academia and industry because it provides users with low-latency, high-reliability services. The deployment of edge servers plays a crucial role in implementing MEC applications and is therefore of considerable research value. Appropriate placement locations not only meet computational requirements but also improve system resource utilization and reduce deployment costs. This paper investigates the edge server placement problem under time-varying network conditions. First, edge servers are divided into two categories: static and dynamic. Subsequently, an Improved Snake Optimization (ISO) algorithm is proposed to determine the number and placement locations of servers at each moment so that the transmission latency requirements for data offloading are met within a certain range. Finally, an interior point method is employed to further reduce service costs. Experimental results demonstrate that the proposed approach can dynamically deploy edge servers while reducing service costs by approximately 20%-43% compared with classical algorithms under the same experimental conditions.
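A population-based optimizer such as ISO needs a fitness function to score each candidate placement at each time step. A minimal sketch, assuming that latency grows linearly with Euclidean distance to the nearest server (`delay_per_km`), that each server has a fixed `server_cost`, and that each user whose latency bound is violated adds a `penalty`. These names and this cost model are hypothetical, not the paper's formulation; the snake optimization update rules and the interior point refinement are not shown.

```python
import numpy as np

def placement_cost(servers, users, max_delay, delay_per_km, server_cost, penalty):
    """Hypothetical fitness: deployment cost plus a penalty for every user whose
    nearest server violates the latency bound (lower is better)."""
    servers = np.atleast_2d(np.asarray(servers, dtype=float))
    users = np.asarray(users, dtype=float)
    d = np.linalg.norm(users[:, None, :] - servers[None, :, :], axis=-1)
    delay = d.min(axis=1) * delay_per_km          # latency to the nearest server
    violations = int(np.sum(delay > max_delay))
    return server_cost * len(servers) + penalty * violations
```

Re-evaluating this cost with the current user positions at each moment is what allows the dynamic servers to move while the static ones stay fixed.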
The convenient mobility and flexible deployment of Unmanned Aerial Vehicle (UAV) communication make it an effective technology for meeting the needs of next-generation cellular users. Using UAVs to capture videos and process them through edge computing can provide high-quality video services for multiple users. This paper considers a multi-antenna fixed-wing UAV video transmission system. By planning the UAV trajectory appropriately and using beamforming to steer each transmitted signal toward its intended user, the system reduces energy consumption and improves received signal strength. These requirements are modeled as a nonconvex problem that jointly optimizes the UAV trajectory, flight time, transmit beamforming, and computing resource allocation to minimize the total energy consumption of the UAV while meeting user Quality of Service (QoS) requirements. To solve this problem, a two-stage algorithm is proposed. In the first stage, path discretization, alternating optimization, and Successive Convex Approximation (SCA) are used to minimize the propulsion energy consumption of the UAV. In the second stage, path discretization and SCA are used to minimize the communication and computing energy consumption of the UAV. Simulation results show that, compared with the benchmark scheme, the proposed algorithm significantly reduces the energy consumption of the UAV while ensuring video quality, and it is efficient and practical.
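Path discretization turns the continuous trajectory into straight segments whose energies can be summed. A minimal sketch, assuming a common fixed-wing model for straight-and-level flight at constant speed, P(v) = c1·v³ + c2/v, which ignores acceleration terms. The constants `C1` and `C2` are illustrative assumptions, not values from this paper.

```python
import math

C1, C2 = 9.26e-4, 2250.0   # illustrative aerodynamic constants (assumed)

def propulsion_power(speed):
    """Fixed-wing straight-and-level flight: P(v) = c1*v^3 + c2/v (watts)."""
    return C1 * speed ** 3 + C2 / speed

def path_energy(waypoints, speed):
    """Path discretization: sum power * duration over straight segments
    flown at a constant speed (joules)."""
    energy = 0.0
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        duration = math.hypot(x1 - x0, y1 - y0) / speed
        energy += propulsion_power(speed) * duration
    return energy

# power-minimizing speed, from dP/dv = 3*c1*v^2 - c2/v^2 = 0
BEST_SPEED = (C2 / (3 * C1)) ** 0.25
```

The power-minimizing speed is not the energy-per-distance minimizing speed, which is why flight time appears as a separate optimization variable alongside the trajectory.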
Current networks suffer from over-provisioning, redundancy, and congestion, leading to high energy consumption and reduced user satisfaction. The multicast routing problem that jointly optimizes energy consumption and delay is NP-complete. A multi-objective multicast routing algorithm based on multi-step Q-Learning is proposed to solve this energy- and delay-aware multicast routing problem in a Software Defined Network (SDN) architecture. The algorithm aims to reduce network energy consumption and delay while satisfying network performance and Quality of Service (QoS) requirements. Multi-step Q-Learning estimates the long-term reward of each path more accurately: by updating the Q-value at each step, the algorithm selects optimal actions for nodes and ultimately finds the best path, and by combining the rewards and value functions of multiple time steps, it converges faster to the optimal policy. In addition, when setting the reward values, each objective is assigned a weight that balances its relative importance against the others. Simulation results show that, compared with existing representative algorithms, the algorithm effectively reduces network energy consumption and delay and improves network performance.
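The multi-step update can be sketched as follows, with nodes as states and next hops as actions. The n-step return sums the discounted hop rewards and then bootstraps from the best Q-value at the last node reached. The weighted reward (negative weighted sum of link energy and delay) and the default weights, learning rate, and discount factor are assumptions for illustration.

```python
from collections import defaultdict

def weighted_reward(energy, delay, w_energy=0.5, w_delay=0.5):
    """Hypothetical scalarization: negative weighted sum of link energy and delay."""
    return -(w_energy * energy + w_delay * delay)

def n_step_update(Q, path, rewards, neighbors, alpha=0.5, gamma=0.9):
    """Update Q(path[0], path[1]) with an n-step return.
    path: visited nodes s_0..s_n; rewards: r_0..r_{n-1}, one per hop;
    neighbors: adjacency dict used to bootstrap from max_a Q(s_n, a)."""
    n = len(rewards)
    G = sum(gamma ** i * r for i, r in enumerate(rewards))
    last = path[n]
    if neighbors.get(last):
        G += gamma ** n * max(Q[(last, a)] for a in neighbors[last])
    key = (path[0], path[1])
    Q[key] += alpha * (G - Q[key])
    return Q[key]
```

Using several real rewards before bootstrapping propagates delay and energy information back along the path faster than one-step Q-Learning, which matches the faster convergence described above.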
The surface quality of strip steel is an important indicator of the quality of steel products. Classifying surface defects throughout the production process can help reduce the occurrence of surface defects and improve the accuracy of capturing surface defect information. In actual production, obtaining accurate category labels for strip steel defect samples is often difficult; therefore, unsupervised classification methods that do not rely on labeled data have gradually become a research hotspot. Existing unsupervised classification methods based on traditional machine learning are not robust to noisy data, whereas those based on deep learning depend on large volumes of data. This study combines traditional machine learning and deep learning algorithms to propose an unsupervised Dynamic Weight Joint Classification (DWJC) method for surface defects in strip steel. First, initial category labels for the defect images are obtained using a texture feature clustering algorithm; then, deep features of the images are extracted through a Convolutional Neural Network (CNN). This study also proposes a dynamic weighted re-labeling method based on KL divergence, which combines the initial class labels, the Softmax output, and constrained clustering to continuously revise the initial class labels during model training and thereby obtain more stable and accurate defect classification results. In extensive experiments on the NEU public dataset and the Baosteel defect dataset, DWJC achieves average accuracies of 99.5% and 94.3%, respectively.
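One way to picture KL-divergence-based dynamic relabeling is sketched below. The weighting rule is a hypothetical illustration, not the DWJC formula. Each sample's trust in its initial (smoothed one-hot) label is α = exp(-KL(softmax ‖ initial)), so α stays near 1 when the network agrees with the initial label and falls toward 0 when it disagrees. The remaining mass goes to an even mix of the Softmax output and a soft clustering assignment. The names `relabel`, `smooth`, and the 50/50 evidence mix are assumptions.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    # row-wise KL(p || q)
    return np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)

def relabel(init_labels, softmax_probs, cluster_probs, num_classes, smooth=0.1):
    """Hypothetical dynamic weighting: trust the initial label in proportion to
    exp(-KL(model prediction || smoothed initial label))."""
    onehot = np.eye(num_classes)[init_labels]
    init = (1 - smooth) * onehot + smooth / num_classes
    alpha = np.exp(-kl(softmax_probs, init))[:, None]    # close to 1 when the model agrees
    evidence = 0.5 * softmax_probs + 0.5 * cluster_probs
    combined = alpha * init + (1 - alpha) * evidence
    return combined.argmax(axis=1), alpha.ravel()
```

Calling this once per epoch would let labels that the network and the clustering consistently contradict drift away from their texture-clustering initialization.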
As basic personal protective equipment, masks play an increasingly significant role in public health. Existing mask detection algorithms suffer from low precision in complex scenes. To improve precision and training stability, this study proposes an improved mask detection algorithm, Mask-YOLO, based on YOLOv5n. Specifically, the Softplus activation function is applied in the convolutional blocks of the backbone network, which improves the model's ability to represent non-linear data and helps it converge faster during training. Coordinate Attention is added to the deep layers of the backbone; by embedding the position information of objects into the channel dimension, it helps the model capture more target features and channel information without high memory usage. In addition, the Spatial Pyramid Pooling Fast (SPPF) module in the deep network is replaced with the Receptive Field Block (RFB) module, which enlarges the receptive field of the convolutional blocks through multiple dilation rates and captures rich semantic features of objects. Building on the original PANet multi-scale feature fusion process, BiFPN-style weighted fusion is introduced to fuse and exchange object features of different scales both semantically and spatially, further improving the precision of small object detection. The Distance Intersection over Union (DIoU) regression loss function is used to address unstable bounding box regression and missed detections. Finally, Soft-NMS is employed to further improve detection performance by decaying the confidence scores of overlapping predicted bounding boxes rather than discarding them. Experimental results show that Mask-YOLO improves mAP@0.95 by 8.58% compared with the baseline YOLOv5n, addressing the problems of low detection precision, unstable bounding box regression, and slow convergence during model training, and achieves efficient mask detection.
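The two box-level components, the DIoU loss and Soft-NMS, are short enough to sketch. DIoU subtracts the normalized squared distance between box centers from IoU, so the loss still gives a useful signal when boxes do not overlap. Gaussian Soft-NMS decays the scores of boxes that overlap the current best box by exp(-IoU²/σ) instead of removing them. Boxes are assumed to be (x1, y1, x2, y2), and σ = 0.5 and the score threshold are assumed defaults, not values from the paper.

```python
import numpy as np

def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def diou_loss(pred, gt):
    """1 - DIoU, with DIoU = IoU - rho^2 / c^2 (rho: center distance,
    c: diagonal of the smallest enclosing box)."""
    rho2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2) ** 2 + \
           (((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2) ** 2
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    return 1.0 - (iou(pred, gt) - rho2 / (cw ** 2 + ch ** 2))

def soft_nms(boxes, scores, sigma=0.5, score_thresh=1e-3):
    """Gaussian Soft-NMS: decay overlapping scores by exp(-IoU^2 / sigma)
    instead of discarding the boxes. Returns kept indices in selection order."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_thresh:
            break
        keep.append(best)
        for i in remaining:
            scores[i] *= np.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
    return keep, scores
```

Because a heavily overlapped box is down-weighted rather than deleted, a genuinely separate but adjacent face (common in crowded mask scenes) can still survive with a reduced score.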
When human emotions change, Electroencephalogram (EEG) signals from different channels interact, and distinct brain regions exhibit characteristic interaction features in different frequency bands. To extract global interactive features and comprehensively capture the interdependence of features across brain regions and frequency bands, this study proposes a principal diagonal nonzero Granger Causality (GC) feature extraction method and a region-specific frequency division Transformer model. First, to address the issue that GC values are zero when self-causality is calculated, this study enhances the Granger causality algorithm to extract nonzero self-causal information for each EEG channel. Second, to overcome the common limitation of emotion recognition models that focus on local characteristics and lack a global perspective, this study leverages the observed associations between different brain regions within the same frequency band: the method partitions the causality features by brain region and frequency band and employs the region-specific frequency division Transformer model to capture the interdependence and contribution of features across brain regions and frequency bands. Experimental results on the TYUT3.0 dataset show that, when the proposed Transformer model is used for classification, the principal diagonal nonzero GC matrix improves average recognition accuracy by approximately 1.59 percentage points compared with commonly used GC matrices, demonstrating the superiority of the proposed features. When the principal diagonal nonzero GC matrix is used as the feature, the proposed region-specific frequency division Transformer model achieves an average accuracy of 94.50%, surpassing existing models by more than 1.89 percentage points in average recognition accuracy. This indicates the effectiveness of the approach in globally integrating interdependent features under brain region-specific frequency divisions.
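The standard pairwise GC measure behind these features can be sketched with two least-squares autoregressive fits. GC from x to y is ln(var of the residual when y is predicted from its own past / var of the residual when x's past is also used). For a channel with itself, both models are identical and the value is zero, which is the limitation the paper's modification addresses. This sketch implements only the standard measure; the nonzero self-causality variant is not reproduced, and the model order `p` is an assumed parameter.

```python
import numpy as np

def lagged(series, p):
    # column k-1 holds series[t-k] for t = p .. T-1
    T = len(series)
    return np.column_stack([series[p - k:T - k] for k in range(1, p + 1)])

def residual_var(target, design):
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.var(target - design @ coef)

def granger(x, y, p=2):
    """Standard GC from x to y: ln(restricted residual var / full residual var)."""
    target = y[p:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(y, p)])
    full = np.hstack([restricted, lagged(x, p)])
    return float(np.log(residual_var(target, restricted) / residual_var(target, full)))

def gc_matrix(eeg, p=2):
    """eeg: (channels, samples); entry [i, j] is GC from channel i to channel j.
    The diagonal stays zero, as in the standard formulation."""
    C = eeg.shape[0]
    G = np.zeros((C, C))
    for i in range(C):
        for j in range(C):
            if i != j:
                G[i, j] = granger(eeg[i], eeg[j], p)
    return G
```

In the paper's setting, such a matrix would be computed per frequency band after band-pass filtering, and then grouped by brain region before being passed to the Transformer.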
Existing palmprint recognition methods based on direction patterns use predefined filters to obtain line responses in palmprint images. However, these methods rely heavily on rich prior knowledge, often ignore important direction information, and produce features of excessively high dimensionality. To solve these problems, this paper proposes a palmprint recognition method based on Gabor filters and improved linear discriminant analysis. First, a two-dimensional Gabor filter is used to extract robust convolution differential features from palmprint images; these features more fully describe the changes in local orientation at each pixel. Then, a discriminative feature learning model is proposed that learns discriminative features from the local directional features by maximizing the inter-class distance and minimizing the intra-class distance, thereby reducing the impact of noise while reducing data dimensionality. Experiments are conducted on four public palmprint databases: PolyU, M_Blue, GPDS, and IITD. The recognition rates on the two contactless palmprint databases, GPDS and IITD, reach 96.80% and 99.29%, respectively. The experimental results show that the proposed algorithm extracts the discriminative features of palmprint images more effectively and significantly improves the accuracy of palmprint recognition.
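A minimal sketch of the Gabor stage: filter the image with a bank of real, zero-mean Gabor kernels at several orientations and record, per pixel, which orientation responds most negatively (palm lines are darker than the surrounding skin). The kernel parameters (`size`, `sigma`, `wavelength`, `gamma`) and the argmin coding rule are assumptions in the spirit of orientation-coding palmprint methods. The paper's convolution differential features and its improved discriminant analysis are not reproduced.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(theta, size=17, sigma=3.0, wavelength=8.0, gamma=0.5):
    """Real (even) Gabor kernel, made zero-mean so flat regions give no response."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    k = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xr / wavelength)
    return k - k.mean()

def orientation_code(image, n_orient=4, **kw):
    """Filter with an orientation bank ('valid' region only) and return, per pixel,
    the index of the orientation with the most negative response."""
    thetas = np.arange(n_orient) * np.pi / n_orient
    size = kw.get("size", 17)
    windows = sliding_window_view(image, (size, size))
    responses = np.stack([np.einsum("ijkl,kl->ij", windows, gabor_kernel(t, **kw))
                          for t in thetas])
    return responses.argmin(axis=0), responses
```

The per-pixel responses (or codes) would then be pooled into a feature vector and projected by a discriminant model to reduce dimensionality.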
In electronic laryngoscopy, the variable morphology of lesions and organs, together with unclear boundaries between lesions, organs, and mucosal tissues, leads to unsatisfactory accuracy in segmenting lesions and major laryngeal organs. To address this problem, a CNN-Transformer two-stream hybrid network is proposed, in which the Convolutional Neural Network (CNN) branch extracts fine-grained features and the Transformer branch extracts global semantic features. Specifically, the hybrid network first extracts fine-grained features at multiple scales through the CNN branch and then fuses them with the global semantic features from the Transformer branch, effectively capturing both shallow, local fine-grained representations and deep, global information. Before multilevel feature fusion, a dark feature enhancement module enhances the feature details in darker regions of the image. To validate the effectiveness of the method, experiments are conducted on 2 425 laryngoscopic surgical images from various medical institutions. Comparison with nine recently proposed methods demonstrates the superiority of the proposed approach.
Surface defect detection in metal production and manufacturing suffers from low detection accuracy and slow processing speed. To address these problems, this study proposes a metal defect detection method based on an improved You Only Look Once version 8 (YOLOv8) network, named TCM-YOLO. The method extends the coordinate attention mechanism into a Three-Channel Coordinate Attention (TCCA) mechanism and combines it with a variant of the second-version deformable convolutional network, the Three-channel Deformable Convolution Network (TDCN), thereby enhancing the feature extraction ability of the network. In the feature fusion network, a bidirectional feature pyramid and Dynamic Snake Convolution (DSC) are combined to reduce the missed detection rate in strip steel defect detection and to better retain tiny textures and complex defect structure information. The Minimum Point Distance Intersection over Union (MPDIoU) loss function replaces the original loss function to accelerate convergence and improve regression accuracy. Finally, a global attention mechanism is embedded to continuously capture important information about the global shape of defects. Experimental results show that on the Northeastern University strip steel defect dataset, TCM-YOLO achieves an average accuracy of 81.8%, 7.4 percentage points higher than that of the original YOLOv8 algorithm, and an accuracy of 78.3%, 8.9 percentage points higher than that of the original model, at a detection speed of 61.73 frame/s. On the Tianchi aluminum profile defect dataset, the average accuracy and the accuracy are 4.1 and 8.7 percentage points higher, respectively, than those of the original YOLOv8 algorithm. These results show that TCM-YOLO achieves high detection accuracy and fast detection speed, improving the detection capability for metal surfaces.
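The MPDIoU loss is compact enough to write out. It subtracts from IoU the squared distances between the top-left corners and between the bottom-right corners of the predicted and ground-truth boxes, each normalized by w² + h² of the input image. A minimal sketch, assuming (x1, y1, x2, y2) boxes and a plain-Python scalar implementation rather than the batched tensor version a detector would use.

```python
def mpdiou_loss(pred, gt, img_w, img_h):
    """1 - MPDIoU, with MPDIoU = IoU - d1^2/(w^2+h^2) - d2^2/(w^2+h^2), where d1 and
    d2 are the top-left and bottom-right corner distances and w, h are the
    input image width and height. Boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((pred[2] - pred[0]) * (pred[3] - pred[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter)
    norm = img_w ** 2 + img_h ** 2
    d1_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2   # top-left corners
    d2_sq = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2   # bottom-right corners
    return 1.0 - (inter / union - d1_sq / norm - d2_sq / norm)
```

Because the two corner distances jointly encode center offset and size mismatch, a single term replaces the separate center, width, and height penalties used by other IoU variants.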
To address the challenges of vehicle detection accuracy and efficiency with roadside cameras, this study presents a vehicle detection framework that combines a Convolutional Neural Network (CNN) with the Transformer architecture. Given the complexity of traffic scenarios, we design an adaptive spatial Transformer and combine it with ResNet50 to form a robust backbone network capable of handling diverse vehicle orientations and scales. We further refine the Transformer's input using position encodings based on angles and distances to make better use of spatial information. A channel-spatial attention mechanism is incorporated to enhance global contextual understanding of the images. In the decoding phase, autoregressive decoding is avoided, enabling multiple targets to be decoded in parallel, and target query embeddings are integrated for the vehicle detection task. Empirical evaluations on the UA-DETRAC and IITM-hetra datasets and a proprietary dataset yield mAP@0.5 scores of 96.42%, 87.82%, and 98.64%, respectively, surpassing the benchmark models across various scales. Ablation experiments confirm the contribution of each component to the overall performance.
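An angle-and-distance position encoding can be sketched as follows. Each feature-map cell is described by its distance and angle from the map center. The distance is encoded with sinusoids at geometrically spaced frequencies, as in standard Transformer encodings, and the angle is encoded with sine and cosine of integer multiples so that it wraps correctly at 2π. The function name, the choice of the center as origin, and the frequency schedule are hypothetical; the paper's exact encoding is not given in the abstract.

```python
import numpy as np

def polar_position_encoding(height, width, dim):
    """Hypothetical polar encoding of shape (height, width, dim):
    [sin(r*f), cos(r*f), sin(k*phi), cos(k*phi)] for dim/4 frequencies each."""
    assert dim % 4 == 0, "dim must be divisible by 4"
    ys, xs = np.mgrid[0:height, 0:width].astype(float)
    dy, dx = ys - (height - 1) / 2, xs - (width - 1) / 2
    dist = np.hypot(dx, dy)                       # distance from center, in cells
    angle = np.arctan2(dy, dx)                    # angle in (-pi, pi]
    q = dim // 4
    freqs = 10000.0 ** (-np.arange(q) / q)        # geometric frequencies for distance
    harmonics = np.arange(1, q + 1)               # integer multiples keep angle periodic
    radial = dist[..., None] * freqs
    angular = angle[..., None] * harmonics
    return np.concatenate([np.sin(radial), np.cos(radial),
                           np.sin(angular), np.cos(angular)], axis=-1)
```

Flattened to (height·width, dim), such an encoding would be added to the backbone feature tokens before they enter the Transformer encoder.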
Sign prediction for links in signed directed networks is a core problem in network science and can be used to model many real-life problems. The main theoretical support for link sign prediction algorithms in signed directed networks is structural balance theory, which has profound research significance. However, real-world networks are complicated: they do not precisely follow structural balance theory, and different networks have their own unique characteristics. This study first analyzes the basic mechanisms that affect link signs and explores the network features that reflect sign formation. Next, it defines a balance index of each node with respect to every other node and integrates these features according to Chiang's prediction method, increasing the amount of feature information available for link sign prediction without increasing computational complexity. The network features are divided into three categories, and a logistic regression model is trained and tested on different combinations of these features. Experimental results on several real network datasets demonstrate that the model generalizes well and that including the node balance index feature significantly improves its predictive accuracy. Finally, a logistic regression model is trained and tested on all of the network features, and the proposed algorithm is compared with current state-of-the-art link sign prediction algorithms to validate its effectiveness.
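A minimal sketch of the feature-plus-logistic-regression pipeline. The balance features are entries of powers of the symmetrized signed adjacency matrix: entry (u, v) of S^k sums the sign products over all length-k walks between u and v, in the spirit of the cycle-based features associated with Chiang's method. Degree features count positive and negative out-links of u and in-links of v. The target edge is hidden before computing features so its sign does not leak. The paper's node balance index is not reproduced, and all function names are hypothetical.

```python
import numpy as np

def path_features(A, u, v, max_len=3):
    """A[i, j] = +1/-1 for a signed edge i->j, 0 otherwise. Returns entries (u, v)
    of S^k for k = 2..max_len, where S is the symmetrized sign matrix."""
    S = np.sign(A + A.T).astype(float)
    S[u, v] = S[v, u] = 0.0                  # hide the edge whose sign is predicted
    feats, P = [], S.copy()
    for _ in range(2, max_len + 1):
        P = P @ S
        feats.append(float(P[u, v]))
    return feats

def degree_features(A, u, v):
    # positive/negative out-degree of u and in-degree of v, target edge hidden
    B = A.copy()
    B[u, v] = 0
    return [float(np.sum(B[u] > 0)), float(np.sum(B[u] < 0)),
            float(np.sum(B[:, v] > 0)), float(np.sum(B[:, v] < 0))]

def train_logreg(X, y, lr=0.5, epochs=500):
    # batch gradient descent on the logistic loss, with a bias column
    X = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def predict(w, X):
    X = np.hstack([np.ones((len(X), 1)), X])
    return (X @ w > 0).astype(int)           # 1 = positive sign
```

Matrix powers are computed once for the whole graph in practice; recomputing them per edge, as here, keeps the sketch short but is only suitable for small networks.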
Traffic object detection is a crucial component of intelligent transportation systems. However, existing traffic object detection algorithms can detect only predefined objects and cannot handle open-set scenarios. To address this, an open-set traffic object detection algorithm based on a Visual-Language Pre-trained (VLP) model is proposed. First, using Faster R-CNN as the base detector, the prediction network is modified to handle the localization challenges of open-set objects, and the loss function is replaced with an Intersection over Union (IoU) loss, effectively enhancing localization accuracy. Second, a VLP-based Label Matching Network (VLP-LMN) is constructed to perform label matching on the predicted bounding boxes. The VLP model serves as a rich knowledge repository that effectively matches region images with label text. In addition, prompt engineering and fine-tuning of network modules help exploit the VLP model's capabilities, significantly improving label matching accuracy. The algorithm achieves an average detection accuracy of 60.3% for new classes on the PASCAL VOC07+12 dataset, demonstrating strong performance in open-set object detection. On a traffic dataset, the average detection accuracy for new classes reaches 58.9%, only 14.5% lower than that for the base classes in zero-shot detection, which underscores the strong generalization ability of the algorithm in traffic object detection.
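The label matching step can be sketched as CLIP-style zero-shot classification of each predicted region. Class names are wrapped in a prompt template, encoded by the VLP text encoder, and each region embedding is assigned the label with the highest cosine similarity. A minimal sketch, assuming that region and text embeddings are already produced by some VLP image and text encoders, which are not included here. The template string, the temperature, and the function names are assumptions.

```python
import numpy as np

def build_prompts(labels, template="a photo of a {}."):
    # prompt engineering: wrap each class name in a natural-language template
    return [template.format(label) for label in labels]

def match_labels(region_emb, text_emb, labels, temperature=0.01):
    """Assign each region the label whose text embedding has the highest cosine
    similarity; also return softmax scores over the labels."""
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = r @ t.T / temperature
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return [labels[i] for i in probs.argmax(axis=1)], probs
```

Because new classes only require new prompts, the detector's box head can stay class-agnostic while the label vocabulary grows.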
The remote locations of precast beam yards, complex scenes, and difficult data collection caused by poor lighting, background interference, and degraded image quality all pose challenges for detection in precast beam processing. This study proposes a dynamic-static mutual learning detection method for precast beam processing. It establishes a mutual learning framework on a single-stage object detection model, uses data augmentation to improve the model's ability to cope with spatial and temporal interference in samples, and constructs a dual-branch subnetwork that combines dynamic and static features. In addition, a normalization-based channel attention submodule is introduced into the network to dynamically adjust channel weights. Through these techniques, the model adapts better to complex environmental lighting and random noise interference in real scenes. To fully leverage the respective advantages of the two subnetworks, the study also proposes a positive sample alignment strategy that exploits the fact that a single ground-truth box is typically matched by multiple, non-unique predicted bounding boxes in the object detection model. This achieves a dual alignment of both the number and the distribution of bounding boxes. A precast beam process dataset based on real scenarios is constructed to validate the effectiveness of the proposed method. The precision and mean average precision reach 97.2% and 97.7%, respectively, at an inference speed of 78 frame/s, which meets industrial application demands and offers an effective solution for precast beam process detection and recognition.
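A normalization-based channel attention submodule can be sketched following our reading of the NAM formulation. Each channel is batch-normalized and then scaled by its share of the absolute BN scale factors, |γ_i| / Σ_j |γ_j|, so that channels with larger learned scale count as more important. The result is passed through a sigmoid and multiplies the input. This is a NumPy forward pass with batch statistics only. The learned `gamma` and `beta` are passed in as assumptions, and the exact submodule used in the paper may differ.

```python
import numpy as np

def nam_channel_attention(x, gamma, beta, eps=1e-5):
    """x: (N, C, H, W). Batch-normalize each channel, weight the normalized map by
    |gamma_i| / sum_j |gamma_j|, apply a sigmoid, and reweight the input."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    bn = (gamma[None, :, None, None] * (x - mean) / np.sqrt(var + eps)
          + beta[None, :, None, None])
    w = np.abs(gamma) / np.abs(gamma).sum()       # per-channel importance weights
    attn = 1.0 / (1.0 + np.exp(-(w[None, :, None, None] * bn)))
    return attn * x
```

Since the weights come from BN parameters that the network already learns, this attention adds almost no extra parameters, which suits a real-time detector running at the reported frame rates.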