Crowd intelligence has been highlighted as one of the five major trend directions in China's New Generation Artificial Intelligence Development Plan, and the question "How does crowd intelligence emerge?" was listed by Science in 2021 among 125 key scientific challenges for the future. Existing studies on intelligence emergence have largely focused on collective intelligence in biological groups (e.g., flocks of birds and schools of fish), emphasizing how simple individuals following local interaction rules can give rise to globally ordered behaviors. By contrast, crowd intelligence in human societies involves not only behavioral coordination and organization but also higher-order intelligence and richer connotations manifested in knowledge, culture, and innovation. To address this research gap, this study, for the first time, systematically reviews and clarifies the conceptual framework and core connotations of crowd intelligence in human societies. It conducts an in-depth analysis of seven representative emergent phenomena—crowd behavior, wisdom of crowds, consensus formation, social cooperation, social learning, knowledge and culture, and collective intelligence innovation—by summarizing their key mechanisms, major models, driving factors, and evolutionary regularities, thereby constructing a theoretical system for the emergence of crowd intelligence in human societies. Building on this foundation, the study further investigates the mapping pathways and mechanisms from human crowd intelligence to artificial crowd intelligence; extracts representative paradigms and key implementation elements of mechanism-driven artificial crowd intelligence systems; and provides fundamental theoretical and methodological support for the design, construction, and future development of artificial crowd intelligence systems oriented toward complex tasks.
Artificial intelligence has made remarkable progress across many fields, encouraging countries to attach great importance to its research and development. However, the rapid development of artificial intelligence has also brought about a series of problems and threats, and overreliance on and blind trust in such models can lead to serious risks. Therefore, explainable artificial intelligence has become a key element in building trusted and transparent intelligent systems, and its research and development demand immediate attention. This survey comprehensively summarizes research progress on explainable artificial intelligence in China and abroad from multiple dimensions and levels. Based on current research results in the industry, this survey subdivides the key technologies of explainable artificial intelligence into four categories: interpretation models, interpretation methods, safety testing, and experimental verification, with the aim of clarifying the technical focus and development direction of each field. Furthermore, the survey explores specific application examples of explainable artificial intelligence across key industry sectors, including but not limited to education, healthcare, finance, autonomous driving, and justice, demonstrating the significant role it plays in enhancing decision-making transparency. Finally, this survey provides an in-depth analysis of the major technical challenges of explainable artificial intelligence and presents future development trends, in addition to a special investigation and in-depth analysis of the interpretability of large models, which has attracted considerable attention recently.
The in-depth exploration of backdoor attacks in the field of deep learning is important for the security and robustness of deep learning models. With the widespread application of deep learning technology, the use of third-party data and pre-trained models has become common; however, this poses potential security threats. Researchers have found that malicious codes or hidden backdoors may be introduced into a model via unverified third-party resources and may be activated under specific conditions, leading to abnormal model behavior. Currently, backdoor attack methods in the field of imaging are constantly being developed; however, systematic reviews that comprehensively introduce backdoor attack techniques in the field of imaging are rare. To this end, the concepts and basic attack processes of backdoor attacks are introduced in this study. Subsequently, the differences between backdoor and adversarial attacks, as well as data poisoning attacks, are analyzed. Additionally, backdoor attack techniques in the imaging field are classified based on seven aspects: triggers, fusion strategies, target categories, model structure modifications, model weight modifications, code poisoning, and data sorting. The evolution of backdoor attack techniques is discussed, and the characteristics, performance, advantages, and disadvantages of the different techniques are analyzed. On this basis, the results of the present study are summarized and possible future research directions are analyzed from multiple perspectives, emphasizing the importance of building safe and reliable deep learning models.
Perceiving and detecting crowd congestion in public spaces is an extremely challenging task in computer vision. Research on this issue, such as analyzing the motion characteristics of crowds and constructing behavior detection models, can provide valuable insights into the motion traits and behavioral essence of crowd activities in dense scenarios. Additionally, it can assist relevant public safety departments in formulating management strategies and emergency response measures, thereby effectively preventing the occurrence and escalation of crowd-related disasters. To this end, this paper summarizes the research efforts on dense crowd congestion detection. First, an overview of the qualitative characteristics of crowd congestion from the perspectives of crowd dynamics, social force models, and fluid mechanics theory is presented. Second, existing crowd congestion detection algorithms and related computational models are investigated. Next, the public datasets and model performance evaluation methods relevant to this research are presented. Finally, the application scenarios and future research directions for crowd congestion detection are explored. A review of the current research status on the qualitative and quantitative analyses of dense crowd congestion behaviors in public spaces offers valuable references for crowd activity perception, behavior analysis and understanding, and anomaly detection in fields such as computer vision, intelligent surveillance, and artificial intelligence.
To address the missed and false detections caused by numerous small target instances and occlusions among targets in drone images, this paper proposes a lightweight small target detection algorithm for Unmanned Aerial Vehicle (UAV) images based on an improved YOLOv8. The Triple Feature Encoder (TFE) and Scale Sequence Feature Fusion (SSFF) modules are introduced in the neck to enhance the ability of the network to extract features at different scales. Furthermore, a Small Object Detection Head (SMOH) is designed and fused with the improved neck feature extraction network, and an additional detection head is also introduced to reduce the loss of small target features and enhance the recognition ability of the network for small targets. Additionally, considering the defects of Complete Intersection over Union (CIoU), a regression loss function, Wise-Inner-MPDIoU, is proposed by combining Wise-IoU, Inner-IoU, and Minimum Point Distance based IoU (MPDIoU). Finally, to meet the lightweight application requirements of the algorithm in mobile and embedded systems, magnitude-based layer-adaptive sparse pruning is performed to further reduce the model size while ensuring model accuracy. Experimental results demonstrate that, compared to the original YOLOv8s model, the improved model proposed in this paper improves mAP@0.5 by 6.8 percentage points, while reducing the number of parameters, amount of computation, and model size by 76.4%, 17.1%, and 73.5%, respectively. The proposed model is lightweight, improves detection accuracy, and has strong practical significance.
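As a point of reference for the loss design described above, the following minimal Python sketch illustrates an MPDIoU-style regression loss using the commonly cited definition (IoU penalized by the normalized squared distances between corresponding box corners); the Wise-IoU focusing weights and Inner-IoU auxiliary boxes that the paper combines with it are omitted, and all values are illustrative:

```python
import numpy as np

def mpdiou_loss(pred, gt, img_w, img_h, eps=1e-7):
    """Illustrative MPDIoU-style loss for axis-aligned boxes (x1, y1, x2, y2).

    Follows the commonly cited definition: IoU minus the squared distances
    between the two pairs of corresponding corners, normalized by the squared
    image diagonal. The Wise-IoU and Inner-IoU components are not included.
    """
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    # Intersection area
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + eps)
    # Squared distances between top-left and bottom-right corners
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    diag2 = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1 / diag2 - d2 / diag2
    return 1.0 - mpdiou

print(mpdiou_loss([10, 10, 50, 50], [12, 8, 48, 52], img_w=640, img_h=640))
```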
In Image Quality Assessment (IQA), no-reference quality assessment methods have demonstrated significant application value and development potential for managing distorted images in real-world scenarios. However, real-world distorted images exhibit high diversity and complexity, making the design of relevant evaluation algorithms more difficult. In recent years, deep learning technology has achieved remarkable success in various subfields of image processing, such as image classification, object detection, and image segmentation. These advancements have motivated researchers to introduce Deep Neural Network (DNN) technology into IQA. Owing to their outstanding feature extraction and learning capabilities, DNNs have provided innovative solutions and made significant progress in the quality assessment of distorted images in real-world environments. Despite these advancements, existing methods still have certain limitations in describing image quality in real-world scenes, particularly when handling diverse image content. Additionally, many DNN-based IQA methods require the input images to be scaled or cropped to a fixed resolution, which often compromises the original structure and content of the images, thereby affecting the accuracy and generalizability of the quality assessment. To address these issues, this paper proposes an adaptive No-Reference IQA (NR-IQA) method based on Multi-Scale Pyramid Pooling (MSPP-IQA). This method does not require preprocessing and can assess the quality of an image at its original size. Furthermore, by introducing content understanding and attention modules, MSPP-IQA can mimic the working principles of the Human Visual System (HVS), simultaneously perceiving global high-level and local low-level features. Experimental results demonstrate that, compared to current mainstream methods, MSPP-IQA performs well on both real-world and synthetic distortion datasets. These results validate the effectiveness and superiority of MSPP-IQA in addressing the challenges in assessing the quality of real-world distorted images.
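The abstract does not detail the MSPP-IQA architecture; the sketch below only illustrates the general idea that pyramid pooling turns feature maps of arbitrary resolution into a fixed-length descriptor, using PyTorch adaptive pooling with illustrative bin sizes:

```python
import torch
import torch.nn as nn

class MultiScalePyramidPool(nn.Module):
    """Pools a feature map at several grid sizes and concatenates the results,
    so images of any resolution map to a fixed-length vector (bin sizes here
    are illustrative, not the ones used by MSPP-IQA)."""
    def __init__(self, bins=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveAvgPool2d(b) for b in bins])

    def forward(self, feat):                       # feat: (N, C, H, W), any H, W
        parts = [p(feat).flatten(start_dim=1) for p in self.pools]
        return torch.cat(parts, dim=1)             # (N, C * sum(b*b for b in bins))

feat = torch.randn(2, 256, 37, 53)                 # arbitrary spatial size
print(MultiScalePyramidPool()(feat).shape)         # torch.Size([2, 5376])
```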
In recent years, there has been significant progress in terms of the accuracy and robustness of deep-learning-based object detection algorithms, which have been widely applied in industry. However, in the field of small object detection, currently used object detection algorithms suffer from high rates of missed detections and false positives. Therefore, in this study, a YOLO small object detection algorithm, BS-YOLO, based on SCConv and the BSAM attention mechanism, is developed. First, in response to the large amount of redundant information generated in the feature extraction network, a new module, C3SC, is proposed to reconstruct the backbone network using SCConv. This module reduces redundant information in both the spatial and channel aspects of the extracted feature maps, thereby improving the quality of the feature maps extracted by the backbone network and in turn enhancing detection accuracy. Second, a new attention mechanism, BSAM, is proposed by combining CBAM and the BiFormer self-attention mechanism, by which weights are allocated reasonably in both the spatial and channel aspects, making the feature map more focused on effective information and suppressing background interference. Finally, to solve the problem of the uneven distribution of difficult and easy samples in small object detection, Slideloss is used to optimize the loss function, thereby improving the effectiveness of the algorithm for small object detection. The experimental results obtained using the RSOD dataset show that the BS-YOLO algorithm has a precision of 94.2%, a recall rate of 91.6%, and a mAP@0.5 of 95.9%, corresponding to improvements of 3.3, 0.1, and 3.6 percentage points, respectively, compared to the original YOLOv5 algorithm. This indicates that the BS-YOLO algorithm can effectively improve the accuracy of small object detection and reduce the missed detection rate.
Existing deep learning models find it difficult to capture cloud motion patterns, resulting in long-term cloud prediction results that are fuzzy and low in accuracy. To address this problem, this study proposes a remote sensing cloud image prediction method based on a Multi-Scale Motion Memory Network (MSMM_Net). This model adopts a dual-branch memory-flow architecture that combines spatial multi-scale and motion-differential memory flows. It extracts high- and low-frequency spatial features and sequence motion features hidden in the input image sequence, thereby simultaneously obtaining global, detail, and motion information of the image. In the prediction stage, dual-branch memory is fused to alleviate the problem of feature loss and enhance the ability of the model to predict the trajectory of cloud clusters. On this basis, a fusion loss function combining pixel and edge losses is used to guide model training, enhance the model's attention to image edge details, and promote the generation of clear predicted images. Experimental results show that, compared with the benchmark model PredRNN, MSMM_Net reduces the Mean Square Error (MSE) by 31.71% on the Moving MNIST dataset and the Learned Perceptual Image Patch Similarity (LPIPS) by 64.7%. On the remote sensing satellite cloud image dataset, the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) indicators improve by 5.51% and 5.38%, respectively, indicating that the predicted image sequence generated by the model is more similar to the real image sequence and can effectively improve long-term prediction accuracy.
This study proposes a Few-Shot Object Detection (FSOD) method based on a query-guided strategy and semantic enhancement mechanism to address the following concerns: the lack of prototypical key information, insufficient adaptation to query images in the meta-learning paradigm, and the detector's sensitivity to the variance of the novel class leading to misclassification. The Query Guidance Module (QGM) conditionally couples query-aware information into support features by learning the correlation between the query and support features, aiming to generate specific and representative prototypes for each query image. The Visual Semantic Enhancement Module (VSEM) distils the knowledge from textual semantic information that matches the novel class of visual features and adaptively enhances these features to improve their discriminability and mitigate variance sensitivity for better classification. In addition, the classification and regression tasks are decoupled, and semantic enhancement is performed on the classification branch to facilitate the model's understanding of the target semantics. The experimental results demonstrate that, compared to the currently known state-of-the-art SMPCCNet method, the proposed approach achieves an average improvement of 2.2 percentage points in novel Average Precision (nAP) on the PASCAL VOC dataset and an average improvement of 1.0 percentage points in Average Precision (AP) on the MS COCO dataset, validating its effectiveness.
This study presents a finger-vein recognition method based on a Graph Convolutional Neural Network (GCNN) to overcome the low recognition rates and high computational cost of traditional methods. The study aims to address issues of graph structure instability and degraded matching efficiency in current finger-vein graph models. For this purpose, a Simple Linear Iterative Clustering (SLIC) superpixel segmentation algorithm is utilized to construct a weighted graph, based on which the GCNN is adapted for graph-level feature extraction. A dual-branch multi-interaction deep Graph Convolutional Network (GCN) is proposed to enhance the node's capability to represent higher-order features, to effectively capture these features in the graph data while avoiding oversmoothing. This study first adjusts the graph structure based on node features. Subsequently, by integrating the original and reconstructed graph structures, a dual-branch network architecture is built to fully explore higher-order features. Furthermore, a feature channel interaction mechanism is designed to facilitate information exchange between different branches, thereby improving feature diversity. Experimental results on multiple standard datasets for finger-vein recognition show that the proposed network reduces recognition time per image, improves efficiency, and effectively alleviates oversmoothing. Compared with the single-branch GCN, it improves recognition accuracy by an average of over 1.5 percentage points.
In social networks, users' mobility behavior is jointly driven by temporal periodicity, geographical proximity, and semantic category preferences. However, interaction data are often highly sparse. Existing methods mostly focus on modeling user sequences and often fail to uniformly capture and ensure the complex consistency associations among the above spatiotemporal semantic factors, resulting in insufficient robustness of the patterns learned from sparse data. Therefore, this paper proposes the concept of overall spatiotemporal consistency, which comprehensively considers temporal and spatial consistency at each stage of the joint user and Point-of-Interest (POI) prediction task to achieve collaborative geography-wise and category-wise prediction. Specifically, this study considers the three-dimensional feature space of time, geographical coordinates, and semantic categories, as well as the temporal consistency between the geography-time and category-time spaces and the spatial consistency of the geography-category space. Corresponding consistency constraints are introduced in the feature space embedding, influence representation, influence decoupling, and influence-based fusion inference stages to construct an improved disentangled graph embedding prediction model. The model first introduces a spatial consistency constraint based on the aggregation dependency between geography-category embeddings. Then, it uses a graph neural network to extract five types of influence factors and achieves disentangled learning based on temporal consistency through a time-space dual-domain parallel influence decoupling method. Finally, it obtains the semantic category prediction from the geographical coordinate prediction results and the category aggregation dependencies, leveraging the spatial consistency between the geographical and categorical dimensions. Experimental results demonstrate that the proposed method is superior to baseline models on the Foursquare dataset. Removing the embedding layer aggregation module reduces the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) and log loss of the prediction task by 6.13% and 36.29%, respectively, compared with the best baseline. This is a highly efficient spatiotemporal semantic multi-consistency modeling approach. The gain of the inference layer aggregation module is related to the data scale and can provide fine-grained adjustments to the prediction results. The temporal feature module can provide important behavioral prior information for the model under the condition of sparse check-in data.
In Linux servers, kernel Rootkits can remain concealed in the operating system for long periods, causing serious kernel damage. In particular, unknown Rootkits, whose attacks occur at random times and locations, pose a significant challenge to discovering the source of an attack. Because their source code is unknown, conventional methods face difficulties in analyzing their behavioral characteristics and cannot pre-set detection points at appropriate locations. To address this threat, this study proposes a kernel Rootkit detection method based on multidimensional view tracing. By cross-comparing multiple views in both the spatial and temporal dimensions, the malicious behavior of unknown kernel Rootkits is detected and hidden data are restored. Experiments and analyses show that the proposed method is efficient in detecting kernel Rootkits, with a CPU overhead of only 0.38% for a security response cycle of 0.1 s.
Android is currently the most widely used operating system for mobile smart terminals; however, the constant emergence of Android malware poses a significant threat to users. Some methods process features extracted through static analysis to detect Android malware. These methods can reflect some attributes of the software but cannot capture the characteristics of the potential intentions behind malicious behavior; therefore, achieving good detection performance against Android malware with evasion capabilities is a challenge. To address this issue, this study proposes an Android malware detection method based on static feature combination in a Graph Neural Network (GNN). The function call graph is extracted from the decompiled file. node2vec is used to construct the local structural features of each node, the functions of each node are analyzed, opcodes are extracted and classified, the Katz algorithm is used to calculate node importance, and the importance coefficient of each Application Programming Interface (API) node in the graph is calculated for the Android malware and its malicious family according to the TF-IDF algorithm. These features are combined into node features, and feature self-looping is performed on important nodes to enhance the feature differences between nodes. On this basis, a classifier, DAg_MAL, based on a Directed Graph Convolutional Network (DGCN) and a Graph Attention Network (GAT) is designed. The classifier adopts a gPool layer, which can effectively capture the key call relationships in software behavior and exclude unimportant nodes. Experimental results show that the proposed method achieves good performance in both binary and multi-class classification tasks, outperforming other similar methods.
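As a simplified illustration of one part of this feature-construction pipeline, the following sketch computes Katz centrality over a toy function-call graph with networkx and concatenates it with made-up opcode-class counts into per-node features; the node2vec structural embeddings and TF-IDF family-importance terms used in the paper are omitted:

```python
import networkx as nx
import numpy as np

# Toy function-call graph; in the paper the graph is extracted from the
# decompiled APK, and node2vec / TF-IDF terms are added, which are omitted here.
G = nx.DiGraph([("onCreate", "sendSMS"), ("onCreate", "readContacts"),
                ("sendSMS", "getDefault"), ("readContacts", "query")])

katz = nx.katz_centrality(G, alpha=0.1)            # node-importance term

# Illustrative opcode-class histogram per function (e.g., counts of
# invoke / move / const instructions); the values here are made up.
rng = np.random.default_rng(0)
opcode_hist = {n: rng.integers(0, 10, size=3) for n in G.nodes}

node_features = {n: np.concatenate(([katz[n]], opcode_hist[n])) for n in G.nodes}
print(node_features["onCreate"])
```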
Conditional Privacy-Preserving Authentication (CPPA) effectively supports secure communication in Vehicular Ad-hoc Networks (VANETs). To address the issues of inefficient message authentication and adversarial threats to data privacy in VANETs, a lightweight blockchain-based CPPA scheme is proposed. The proposed scheme utilizes an improved noninteractive Schnorr signature algorithm based on elliptic curve cryptography in the message verification phase, which avoids time-consuming cryptographic operations. The scheme employs entity-generated short pseudonyms for anonymous communication and introduces a consortium blockchain into the system architecture. The sender's identity is automatically verified during the message verification process using smart contract technology, thereby reducing the communication overhead of the scheme. Security analysis shows that the proposed scheme is secure against forgery under an adaptive chosen-message attack in the Random Oracle Model (ROM) while satisfying the requirements of conditional privacy preservation. Performance analysis measures the execution times of the cryptographic operations in the proposed scheme and existing CPPA schemes using the MIRACL cryptographic library, and Hyperledger Caliper is used to evaluate the transaction latency and throughput of the blockchain test network. The results demonstrate that the proposed scheme achieves lightweight computation and communication overhead in the signature generation and single-message verification phases and effectively supports batch message verification.
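For orientation, the following toy sketch shows the basic non-interactive Schnorr sign/verify flow over a small prime-order subgroup; the actual scheme operates on an elliptic-curve group with pseudonymous identities and smart-contract-assisted verification, none of which is reproduced here:

```python
import hashlib
import secrets

# Toy Schnorr signature over a small prime-order subgroup (p = 23, q = 11,
# g = 4 has order 11 mod 23). Real deployments use large elliptic-curve groups.
p, q, g = 23, 11, 4

def H(*parts):
    h = hashlib.sha256("|".join(map(str, parts)).encode()).digest()
    return int.from_bytes(h, "big") % q

x = secrets.randbelow(q - 1) + 1      # private key
y = pow(g, x, p)                      # public key

def sign(msg):
    k = secrets.randbelow(q - 1) + 1
    R = pow(g, k, p)                  # commitment
    e = H(R, msg)                     # challenge from hash (non-interactive)
    s = (k + e * x) % q               # response
    return R, s

def verify(msg, R, s):
    e = H(R, msg)
    return pow(g, s, p) == (R * pow(y, e, p)) % p

R, s = sign("beacon: speed=60")
print(verify("beacon: speed=60", R, s))    # True
print(verify("tampered message", R, s))    # False
```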
There is a large amount of hidden information about cyber attacks and cybercrime on the dark web. Previous studies have mainly focused on analyzing general open-source threat intelligence or on a single aspect of dark web threat intelligence, lacking a systematic method for processing and analyzing dark web information and ignoring its characteristics. To analyze, screen, and extract threat-relevant content from the vast volume of the dark web, a high-quality dark web threat intelligence acquisition technique targeting network-security-related intelligence is proposed. It consists of four modules: information crawling, topic clustering, entity recognition, and novelty detection. Taking dark web forums as an example, data from multiple forums are crawled by a crawler targeting dark web forums. Top2Vec is used to embed forum titles and posts into the same vector space as words and documents, respectively. The discussion topics of the posts are analyzed, and threat intelligence-related content is coarsely filtered to remove noise from the crawled information. Named entity recognition is then used for fine-grained filtering to extract threat intelligence entity words from the posts. On this basis, the information content of the entity words on the open web is calculated to evaluate the importance of the extracted information and ultimately select high-quality, network security-related dark web threat intelligence. The experimental results show that this method is effective and can extract network threat intelligence from the collected dark web information.
With the increasing popularity of social media, Multimodal Sentiment Classification (MSC) has received widespread attention in recent years. Target-oriented Multimodal Sentiment Classification (TMSC) is an important task in the field of multimodal sentiment analysis, which aims to predict the sentiment polarity of a referred entity by combining multiple modal information, such as text and images. Although many scholars have proposed numerous modeling methods for this task, these methods are still unable to achieve accurate entity alignment between text and images, which directly affects model accuracy on a target task. To address this problem, this study proposes a model for target-entity sentiment classification with Image-Text Multimodal Entity Alignment (ITMEA). The model first adopts Adjective-Noun Pairs (ANPs) extracted from an image to design sentiment auxiliary information such that the key sentiment information of the target entity in an image can be expressed more intuitively. Simultaneously, feature description information is designed by adopting the multimodal Large Language Model (LLM), LLaMA-Adapter V2, achieving accurate intermodal target entity alignment. Moreover, the model constructs a gating mechanism in the intermodal feature fusion stage to prevent irrelevant information from introducing additional interference, by dynamically controlling the input of information other than text. Experimental results on two Twitter benchmark datasets, Twitter-2015 and Twitter-2017, show that ITMEA improves accuracy by approximately 1.00 and 0.57 percentage points, respectively, in comparison with the optimal method among compared baselines, thus validating the effectiveness and superiority of the methods designed in this study.
Intent recognition is important in natural language understanding. Previous research on intent recognition has primarily focused on single-modal intent recognition for specific tasks. However, in real-world scenarios, human intentions are complex and must be inferred by integrating information such as language, tone, expressions, and actions. Therefore, a novel attention-based multimodal fusion method is proposed to address intent recognition in real-world multimodal scenarios. To capture and integrate the long-range dependencies between different modalities, adaptively adjust the importance of information from each modality, and provide richer representations, a separate self-attention mechanism is used for each modality feature. By adding explicit modality identifiers to the data of each modality, the model can distinguish and effectively fuse information from different modalities, thereby enhancing overall understanding and decision-making capabilities. Given the importance of textual information in cross-modal interactions, a multimodal fusion method based on a cross-attention mechanism is employed, with text as the primary modality and other modalities assisting and guiding the interactions. This approach aims to facilitate interactions among textual, visual, and auditory modalities. Finally, experiments were conducted on the MIntRec and MIntRec2.0 benchmark datasets for multimodal intent recognition. The results show that the model outperforms existing multimodal learning methods in terms of accuracy, precision, recall, and F1 score, with an improvement of 0.1 to 0.5 percentage points over the current best baseline model.
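The following minimal sketch illustrates the kind of text-as-query cross-attention described above, with scaled dot-product attention in NumPy; the token counts, dimensions, and additive fusion are illustrative assumptions rather than the paper's configuration:

```python
import numpy as np

def cross_attention(query, keys, values):
    """Scaled dot-product attention: text features attend over another modality."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)                 # (Lt, Lm)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the other modality
    return weights @ values                              # (Lt, d)

rng = np.random.default_rng(0)
text  = rng.normal(size=(8, 64))     # 8 text tokens (primary modality)
video = rng.normal(size=(20, 64))    # 20 visual frames (auxiliary modality)
audio = rng.normal(size=(30, 64))    # 30 audio frames (auxiliary modality)

# Text acts as the query; vision and audio supply keys/values, and the attended
# results are fused back into the text stream (additive fusion is an
# illustrative choice, not necessarily the paper's).
fused = text + cross_attention(text, video, video) + cross_attention(text, audio, audio)
print(fused.shape)                   # (8, 64)
```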
Owing to the heterogeneity and complexity of Android malware, traditional static analysis methods that rely on single features such as permissions or APIs often struggle to accurately differentiate between benign and malicious applications. To address this limitation, this study proposes a novel feature construction method based on multimodal feature fusion, built on an in-depth study of Android software features such as permissions, APIs, bytecode, and opcodes. The bytecode is transformed into RGB images, and visual representations are extracted using the pretrained EfficientNetV2B3 model to capture the high-level characteristics of Android applications. Additionally, Locality-Sensitive Hashing (LSH) is employed to extract opcode sequence features that represent the low-level, detailed characteristics of the application. These heterogeneous features are then fused using a Multimodal Factorized Bilinear pooling (MFB) algorithm to create a more discriminative representation of the malware. Building on this enhanced feature representation, a Transformer Encoder-based Android Anomaly Detection (TEAAD) model is introduced. By leveraging the Transformer architecture, TEAAD effectively learns to detect Android malware anomalies. The experimental results demonstrate that the TEAAD model based on fused features outperforms other deep-learning models, achieving a detection accuracy of 96.87%. The MFB feature fusion method exhibits superior malware identification capabilities compared with other research methods.
To achieve faster and wider dissemination, cyberbullying comments on social media platforms are often published using multimodal information such as text, voice, and images. Multimodal information can express the emotions of information publishers in greater detail and provides multidimensional information sources for the automatic detection of cyberbullying. Current multimodal cyberbullying speech detection models primarily focus on complex fusion over large interaction spaces and lack an analysis of the potential commonalities and differences between modalities. Consequently, multimodal cyberbullying detection based on simple feature fusion does not achieve ideal performance, and model training is significantly time-consuming and difficult to converge. This study proposes a multimodal detection model based on spatial features to address this issue. First, features are extracted for each single modality, and then the features are fused using a hierarchical attention mechanism based on the Hadamard product by constructing shared and specific feature spaces. The fusion process does not simply rely on output attention scores for weighting but independently reassigns attention weights so that modalities do not interfere with each other and the feature integrity of the shared and specific spaces is preserved. Finally, a dual-layer perceptron structure is used to detect cyberbullying speech. Results show that the model achieves good detection performance and convergence on both the CMCAD and CMU-MOSI datasets.
Deep learning-based recommendation systems are commonly used to provide personalized recommendations. In a common storage-compute disaggregated inference architecture, the inference speed of the recommendation system is limited by the internode network transmission bottleneck caused by embedding queries. The emerging SmartNIC technology enables complex traffic control without contending for host Central Processing Unit (CPU) resources, offering new possibilities for optimizing the embedding layer in disaggregated recommendation systems. This study proposes SmartNIC-offloaded Worker Node (SmartWN), a disaggregated recommendation system worker node optimized via SmartNIC. By leveraging the independent computing and communication capabilities of SmartNICs, SmartWN implements embedding query reordering and preparation, along with traffic-aware dynamic cache management for multiple embedding tables, without impacting host resources. This significantly improves communication efficiency and cache utilization during recommendation inference, reduces embedding query latency, and enhances overall system performance. This study implements SmartWN on an NVIDIA BlueField-2 SmartNIC and demonstrates its performance improvements. Compared to existing technologies, using SmartWN as a compute node in a disaggregated recommendation system increases embedding layer query throughput by a factor of 2.13 and reduces query latency by approximately 50.6%.
The Posit format, a novel floating-point representation, offers significant advantages over the IEEE 754 standard in terms of dynamic range and rounding error management. However, its hardware implementation, particularly the design of the mantissa multiplier, poses challenges. Therefore, this paper introduces an enhanced Wallace tree algorithm named 3L-Wallace tree, which reduces the number of stages in partial product summation, thereby decreasing both hardware resource consumption and overall latency. This improvement is achieved by adding specific counters, redesigning the layout of the partial product summation stage counters, and enhancing the adders used in the final summation stage. Furthermore, the paper implements the 3L-Wallace tree in the optimization of the Posit multiplication unit. Additionally, a modular design approach is introduced, dividing large bit-width multipliers into smaller, more manageable modules, thereby simplifying the design process and easing implementation difficulties. A dynamic selection algorithm is also designed, which dynamically selects multipliers of appropriate bit-width based on runtime mantissa width to avoid hardware resource waste. Experimental results show that the 3L-Wallace tree algorithm reduces hardware resource consumption by an average of 9.5%, power consumption by an average of 8.1%, and latency by an average of 10.4%, outperforming traditional methods, particularly in the implementation of large bit-width multipliers.
With the continuous increase in the scale of global data, improving data access performance effectively and inexpensively is an important challenge faced by storage systems. An effective solution is to build cache systems using low-latency, high-bandwidth Solid-State Drives (SSDs) and low-cost, high-storage-density Shingled Magnetic Recording (SMR) drives. However, the inherent mechanical motion and multitrack stacking characteristics of SMR drives result in poor write performance, and the frequent write-back of dirty data from the SSD to the SMR drive may cause severe long-tail latency owing to the large number of Read-Merge-Write (RMW) operations. To this end, a cache replacement optimization strategy based on the reinforcement learning Q-Learning algorithm is proposed for the SSD-SMR hybrid storage architecture. By learning the empirical relationship between the I/O load status and the latency of the SMR devices, write operations to the SMR can be controlled. When the SMR load is high, controlling the eviction of dirty data in the cache can reduce the number of RMW operations caused by SMR write-backs, thereby optimizing the tail latency overhead of the system under different loads. The Q-Learning algorithm is combined with the data-popularity-based caching algorithm LRU and the SMR-aware caching algorithm SAC and tested using real enterprise traces and simulated traces generated by YCSB. The experimental results show that the proposed method can effectively improve the performance of existing caching algorithms, reducing the average latency by 57.06% and the tail latency by 87.49%.
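As a rough illustration of how tabular Q-Learning can drive dirty-data eviction decisions, the sketch below discretizes the SMR load into states and uses the standard Q-learning update with a toy latency-based reward; the states, actions, and reward model are assumptions, not the paper's design:

```python
import random
from collections import defaultdict

# Tabular Q-learning controller deciding whether to evict dirty pages to SMR
# now or defer, given a discretized SMR load level. In the paper this is
# coupled with the LRU and SAC caching algorithms, which are omitted here.
ACTIONS = ["evict_now", "defer"]
Q = defaultdict(float)                       # Q[(load_level, action)]
alpha, gamma, eps = 0.1, 0.9, 0.1

def choose(load_level):
    if random.random() < eps:                # epsilon-greedy exploration
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(load_level, a)])

def update(s, a, reward, s_next):
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])

# One simulated step: under high load, deferring write-back avoids an RMW penalty.
s = "high"
a = choose(s)
latency_ms = 2.0 if a == "defer" else 12.0   # toy latency model
update(s, a, reward=-latency_ms, s_next="medium")
print(a, dict(Q))
```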
In high-energy physics experiments, data processing typically involves a compute-storage separated computing model. During the computation process, data must be transferred between the computing and storage nodes. The continuous growth in experimental data and data analysis demands has led to data transfer bottlenecks, reducing the overall processing efficiency of these systems. This paper proposes a computational storage system for high-energy physics. First, the storage software EOS is extended, and computational storage plugins are introduced on top of the original architecture. After parsing user commands, the storage server executes local computations based on the file I/O, thereby reducing data movement, alleviating network pressure, and enhancing data processing efficiency. Second, a computational storage server based on a Central Processing Unit-Field Programmable Gate Array (CPU-FPGA) heterogeneous computing architecture is constructed. Considering the lower computational complexity of I/O-intensive tasks, tasks suitable for parallel computing are offloaded to the FPGA via the PCIe bus, thereby extending the computational capabilities of the storage server. Experimental evaluations show that the computational storage system eliminates queuing time and network latency, thereby shortening the overall execution time of computational tasks. Moreover, leveraging FPGA-based hardware acceleration effectively compensates for the weak computing performance of CPUs in storage servers, thereby enhancing the algorithmic versatility of computational storage devices. In tests based on the decoding of LHAASO experimental data, the computational storage system achieves a speedup of approximately six times.
In recent years, with the development of Unmanned Aerial Vehicle (UAV) technology and its widespread application in military, logistics, agriculture, and other fields, the problem of UAV swarm trajectory planning has received extensive attention. Traditional optimization algorithms such as simulated annealing, genetic algorithms, and particle swarm optimization can achieve good results in some cases. However, when dealing with larger and more complex UAV swarm tasks, they often face issues such as low computational efficiency and getting stuck in local optima. Quantum annealing, which has the unique advantage of quantum tunneling, can effectively avoid local optima. Therefore, this study proposes a UAV swarm trajectory planning algorithm based on quantum annealing. The trajectory planning problem is converted into a Quadratic Unconstrained Binary Optimization (QUBO) problem. Using a two-stage processing strategy, the quantum annealing method clusters the task points and simulates the trajectory for each category, effectively reducing time complexity. Results show that quantum annealing has a higher probability of finding better paths than simulated annealing, demonstrating a better ability to escape the local optima problem. Additionally, the study considers four common scenarios that UAV swarms encounter during missions, designs corresponding dynamic task allocation schemes and modifies the objective function and constraints of quantum annealing. Results indicate that the UAV swarm trajectory planning algorithm can handle common scenarios effectively, ensuring that the UAV swarm can flexibly respond and efficiently complete tasks collaboratively.
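To make the QUBO formulation concrete, the toy sketch below encodes the assignment of four task points to two clusters as binary variables with a one-hot penalty and solves it by brute force; a quantum or simulated annealer would sample the same energy function, and the full trajectory-planning model in the paper is far larger:

```python
import itertools
import numpy as np

# Toy QUBO: assign 4 task points to 2 UAV clusters so that points close to
# each other share a cluster. Variables x[i, k] = 1 if point i is in cluster k;
# the one-hot penalty enforces exactly one cluster per point.
pts = np.array([[0, 0], [1, 0], [10, 10], [11, 9]], float)
N, K, PENALTY = len(pts), 2, 50.0
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

def energy(x):                                   # x has shape (N, K)
    intra = sum(dist[i, j] * x[i, k] * x[j, k]
                for i in range(N) for j in range(i + 1, N) for k in range(K))
    onehot = PENALTY * sum((1 - x[i].sum()) ** 2 for i in range(N))
    return intra + onehot

best = min((np.array(bits).reshape(N, K) for bits in
            itertools.product([0, 1], repeat=N * K)), key=energy)
print(best, energy(best))    # points {0,1} and {2,3} end up in separate clusters
```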
The significance of big data computation frameworks such as Apache Spark for large-scale data analysis is becoming increasingly prominent. However, handling data-intensive jobs by relying solely on local computing resources is difficult. A feasible solution is therefore to rent cloud resources from public cloud service providers and fully deploy Spark clusters in the cloud; however, this incurs high deployment costs. To reduce costs, an increasing number of users choose to combine local and cloud resources to build hybrid cloud computing clusters. However, in Spark clusters deployed in hybrid clouds, scheduling jobs while simultaneously meeting multiple service-level agreement requirements (such as minimizing costs and ensuring job deadlines at the same time) is challenging. Existing research mainly focuses on ways to reduce cluster usage costs or improve job deadline satisfaction rates, without considering the balance between these two goals. This paper proposes a Deadline-Cost Aware Ant Colony Optimization (DC-ACO) algorithm to solve the job scheduling problem in hybrid clouds. DC-ACO exploits the pricing differences among Virtual Machine (VM) instances in the hybrid cloud cluster to reduce usage costs while maximizing the percentage of job deadlines met. In extensive simulation experiments, DC-ACO is compared with baseline methods. The results demonstrate that the proposed algorithm exhibits robust scalability, achieving an approximately 20% increase in the job deadline fulfillment percentage, coupled with a notable 10% reduction in VM usage costs for hybrid clusters.
Sparse Matrix-Vector multiplication (SpMV) is the computational core and bottleneck of sparse linear systems, and its computational efficiency affects the overall performance of iterative solvers. Its optimization has long been a research hotspot in the fields of scientific computing and engineering applications. The discretization of partial differential equations produces sparse diagonal matrices, and because of their diverse distributions of nonzero elements, no single method can achieve optimal time performance across all matrices. To solve these problems, a Graphics Processing Unit (GPU)-based sparse diagonal matrix adaptive SpMV optimization method called Adaptive SpMV Tuning (AST) is proposed. This method designs a feature space and constructs a feature extractor to extract fine-grained features of the matrix structure. By analyzing the correlation between these features and SpMV methods, it establishes a scalable set of candidate methods and forms a mapping relationship between the features and optimal methods. Subsequently, a performance prediction tool is built to efficiently predict the optimal method for the matrix. The experimental results show that AST can achieve a prediction accuracy of 85.8%, with an average time performance loss of 0.09. Compared to Diagonal (DIA), Hacked DIA (HDIA), Hybrid of DIA and Compressed Sparse Row (HDC), DIA-Adaptive, and Divide-Rearrange and Merge (DRM), AST can achieve an average speedup in kernel runtime of 20.19, 1.86, 3.06, 3.72, and 1.53 times, respectively, and a speedup in floating-point performance of 1.05, 1.28, 12.45, 1.94, and 0.97 times, respectively.
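For context, the reference (CPU) formulation of SpMV for a diagonally stored matrix is sketched below; the layout convention is one common DIA-style storage, and real GPU kernels such as those AST selects among parallelize the row loop:

```python
import numpy as np

def dia_spmv(offsets, data, x):
    """y = A @ x for a matrix stored diagonally.

    In this layout, data[d, i] holds A[i, i + offsets[d]]; entries that fall
    outside the matrix are ignored. This is only a reference formulation,
    not one of the tuned kernels among which AST chooses.
    """
    n = len(x)
    y = np.zeros(n)
    for d, k in enumerate(offsets):
        lo, hi = max(0, -k), min(n, n - k)        # rows i with 0 <= i + k < n
        for i in range(lo, hi):
            y[i] += data[d, i] * x[i + k]
    return y

# 4x4 tridiagonal example: offsets -1, 0, +1
offsets = [-1, 0, 1]
data = np.array([[0, 1, 1, 1],          # sub-diagonal  (first entry unused)
                 [2, 2, 2, 2],          # main diagonal
                 [3, 3, 3, 0]], float)  # super-diagonal (last entry unused)
x = np.array([1., 1., 1., 1.])
print(dia_spmv(offsets, data, x))       # [5. 6. 6. 3.]
```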
Researchers have introduced the theory of tolerance relation rough sets to address the data filtering problem in distributed Incomplete Information Systems (IIS). With the continuous growth in data volume, it is necessary to achieve scalable parallel computing through distributed computing. Consequently, distributed tolerance relation rough sets have emerged, among which the Block Set is the core method for computing approximate sets. However, the Block Set only uses set operations during computation, with no structure between the data, and the process involves a large number of repetitive calculations, resulting in low computational efficiency. This study proposes a Tolerance Relation rough sets Distributed (TRDG) algorithm, which is based on graph optimization, to address this issue. Using the existing concepts of reliable and disputed elements, a hierarchical directed acyclic graph is constructed with data in the IIS as nodes and asymmetric tolerance relationships as edges. The graph structure is used to organize the data. To improve the computational efficiency of the Block Set in distributed environments, the study proposes a strategy that uses the nearest tolerance relationship instead of the general asymmetric tolerance relationship to remove redundant edges, simplify the graph structure, and obtain a Block Set based on the path from reliable elements to zero-degree disputed elements. Distributed graph optimization and path search algorithms are then implemented on the Spark platform, which ultimately completes the design of the TRDG algorithm. Experimental results show that the TRDG algorithm exhibits good parallel acceleration performance. Compared to traditional general tolerance rough approximation set solving algorithms, TRDG can save computing resources, increase the average computing speed by approximately 40 times, and increase the amount of data that can be processed by more than 50 times.
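As background for the approximation computations described above, the sketch below shows the classical (symmetric) tolerance relation and the resulting lower and upper approximations for a tiny incomplete information table; the paper works with an asymmetric variant organized as a directed graph on Spark, which is not reproduced here:

```python
# Classical tolerance relation and rough approximations for an incomplete
# information table ('*' marks a missing value). Objects and attribute values
# are illustrative.
MISSING = "*"
objects = {
    "o1": ("high", "yes"),
    "o2": ("high", MISSING),
    "o3": ("low",  "no"),
    "o4": (MISSING, "no"),
}

def tolerant(u, v):
    # u and v are tolerant if every attribute either matches or is missing.
    return all(a == b or MISSING in (a, b) for a, b in zip(objects[u], objects[v]))

def tolerance_class(u):
    return {v for v in objects if tolerant(u, v)}

def approximations(X):
    lower = {u for u in objects if tolerance_class(u) <= X}   # class contained in X
    upper = {u for u in objects if tolerance_class(u) & X}    # class intersects X
    return lower, upper

X = {"o1", "o2"}                       # target concept
print(tolerance_class("o2"))           # {'o1', 'o2', 'o4'}
print(approximations(X))               # ({'o1'}, {'o1', 'o2', 'o4'})
```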
Cloud database technology has been widely adopted because of its flexible expansion, ease of management, and on-demand billing. Businesses usually select cloud database products based on their specific application scenarios and requirements, and service providers determine the usage of different types of resources, such as computing and storage, to satisfy service requirements. Accurate prediction of cloud product usage is critical for improving resource usage efficiency, reducing operational costs, and ensuring Quality of Service (QoS). However, predicting cloud database product usage is complex. A usage sequence typically comprises multiple interrelated components with complex entanglements. Additionally, the behavioral characteristics of different businesses vary across cloud products and billing items, which poses a significant challenge for accurate usage prediction. To solve this problem, this study proposes a cloud database product usage prediction model based on component decomposition and multimodal fusion. This model effectively decomposes a time series with complex entanglement, fuses multimodal demand data, builds a mapping relationship between demand and usage trends, and automatically adjusts the weight parameters of its components to obtain accurate prediction results. In this study, real production data from four major cloud database products of the Alibaba Cloud service provider are used to evaluate the prediction effect, and the performance is compared with that of five other prediction algorithms. Analyses of evaluation metrics, such as the Mean Absolute Percentage Error (MAPE), reveal that the proposed model improves prediction accuracy on the four cloud database products by approximately 18.6% to 51.8%. Therefore, this model can be applied to cloud database product usage prediction scenarios and can help cloud service providers improve the accuracy of resource capacity planning.
Accurate multitarget fish counting is crucial for the intelligent monitoring of water ecology and intensive aquaculture, playing a significant role in the protection of aquatic ecological environments and the modernization of fish farming. Existing methods for accurately tracking and counting multiple fish are primarily suitable for ideal situations such as clear fish appearance, slow swimming speed, and stable direction. However, they often prove ineffective in complex real-life situations such as mutual occlusion, rapid swimming, and changeable direction of fish. Therefore, combined with the lightweight target detection model YOLOv5n, a method for tracking and counting fish based on a horizontal-similarity matching mechanism is proposed. This method regards the fish counting problem as a multitarget detection and tracking problem, proposes a horizontal-similarity matching mechanism, and optimizes the Simple Online and Realtime Tracking (SORT) algorithm. The horizontal distance between the center points of detection boxes is limited using the positional relationship between individual fish in high-speed water flow, which effectively solves the problem of target matching confusion in the SORT algorithm and significantly improves tracking performance. The results show that the performance of the proposed method is significantly better than that of existing methods on a multitarget tracking dataset. Additionally, the target tracking performance improves significantly under conditions of target occlusion and direction change. The proposed method has the advantages of a simple structure and easy application.
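A minimal sketch of detection-to-track association with a horizontal-distance gate is given below; it uses an IoU cost plus a hard gate on the horizontal center distance and the Hungarian solver from SciPy, with an illustrative threshold rather than the paper's exact similarity measure:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def match(tracks, dets, max_dx=40.0):
    """Associate track boxes with detection boxes by IoU, forbidding pairs whose
    centers are farther apart horizontally than max_dx (a stand-in for the
    horizontal-similarity gate; the threshold is illustrative)."""
    cost = np.zeros((len(tracks), len(dets)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(dets):
            dx = abs((t[0] + t[2]) / 2 - (d[0] + d[2]) / 2)
            cost[i, j] = 1e6 if dx > max_dx else 1.0 - iou(t, d)
    rows, cols = linear_sum_assignment(cost)
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] < 1e6]

tracks = [[100, 50, 140, 90], [300, 60, 340, 100]]
dets   = [[310, 58, 350, 98], [105, 52, 145, 92]]
print(match(tracks, dets))    # [(0, 1), (1, 0)]
```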
The large-scale grid integration of new energy sources such as wind power is an important measure for achieving the "dual carbon" goals, and reliable wind power prediction is important for ensuring the safe operation of the power grid. Therefore, an ultra-short-term wind power combination prediction model is proposed. First, the original wind power sequence is screened for outliers and corrected, so that the corrected data better conform to the objective patterns of wind power variation. Second, a two-layer decomposition algorithm is used to decompose the original sequence; the modal decomposition yields sub-sequences with more predictable trends, which reduces the difficulty of wind power prediction. Subsequently, a Multi-Objective Improved Slime Mould Algorithm-Support Vector Machine (MOISMA-SVM) model is constructed to accurately predict the sub-sequences and perform additive reconstruction. MOISMA optimizes multiple objective functions while optimizing the SVM parameters to obtain the wind power prediction results. Finally, the MOISMA-SVM model is applied to further correct the absolute error of these predictions, with the error correction results added to the initial wind power forecasts to produce the final point predictions. Experimental results demonstrate that the proposed model achieves the best error metric performance across both datasets, with Mean Absolute Errors (MAE) of 0.505 7 MW and 0.672 6 MW, representing improvements of 98.79% and 98.50% over the baseline SVM model, respectively. This highlights the high accuracy and robustness of the proposed approach. Based on the point prediction results, an improved kernel density estimation interval prediction model is also established, which generates prediction intervals with high reliability and narrow bandwidth. The Coverage Width-based Criterion (CWC) values for the two datasets are 0.002 4 and 0.002 8, respectively, enabling a more precise characterization of wind power fluctuations and enhancing the overall practicality of the model.
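The interval-prediction step can be illustrated with plain kernel density estimation over point-forecast errors, as in the sketch below; the error data and quantile levels are simulated placeholders, and the paper's improved KDE variant is not reproduced:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Historical point-forecast errors (MW); in practice these would come from the
# point-prediction model, here they are simulated for illustration only.
errors = rng.normal(loc=0.0, scale=0.8, size=500)

kde = gaussian_kde(errors)                       # estimate the error density
samples = kde.resample(20000).ravel()            # draw from the fitted density
lo, hi = np.quantile(samples, [0.05, 0.95])      # 90% error quantiles

point_forecast = 12.4                            # MW, illustrative value
print(f"90% prediction interval: [{point_forecast + lo:.2f}, {point_forecast + hi:.2f}] MW")
```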
Traffic flow prediction is crucial for intelligent transportation systems; however, existing methods cannot accurately capture the temporal and spatial correlations of traffic data. To further explore the complex spatiotemporal correlations of road networks and improve prediction performance, a spatiotemporal graph attention network model, GL-STAGGN, that incorporates global-local spatiotemporal perception is proposed. First, the spatiotemporal heterogeneity of traffic flow is represented by embedding the spatiotemporal locations of the input data to enhance the feature representation of spatiotemporal data; global-local time-aware multi-head self-attention is then used to synchronously mine the global and local spatiotemporal dynamic correlations. Second, a graph attention network and a dynamic graph convolutional network based on the attention mechanism are introduced to aggregate local node features and dynamically adjust the spatial correlation intensity, capturing in depth the internal correlation between global and local spatial correlations. Finally, the GL-STAGGN model is constructed using an encoder-decoder architecture to fuse the spatiotemporal components. Experimental results on the real-world highway traffic datasets PEMS04 and PEMS08 show that, compared with the advanced method DSTAGNN, which does not consider the global-local spatiotemporal relationship and spatial heterogeneity, the average Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE) decrease by 2.8%, 2.3%, and 3.3%, respectively. Furthermore, GL-STAGGN outperforms most existing baseline models, better supporting intelligent transportation systems.
Distributed satellite formation mission planning can simultaneously manage multiple Earth observation missions experiencing time and resource conflicts. However, as the number of satellites and missions increases, these conflicts severely reduce observation benefits and the quality of mission completion. To address this issue, this study proposes a partitioned Space Projection Particle Swarm Optimization (SPPSO) algorithm to adapt the constructed hybrid integer programming model for mission planning. First, the algorithm partitions the population into different search spaces based on fitness levels. Subsequently, it uses a projection strategy based on Fast Fourier Transform (FFT) to reconstruct the population within the search space and employs a perception operator to guide particles with lower fitness toward the optimal space. This approach enhances the convergence speed and effectively reduces the risk of becoming trapped in the local optima. To validate the effectiveness of the SPPSO algorithm, it was compared with state-of-the-art PSO variants and other well-known scheduling algorithms for similar planning problems using international standard test functions. According to the Wilcoxon rank-sum and Friedman test results, the SPPSO algorithm achieved the highest average ranking for both the unimodal and multimodal functions. Furthermore, in the simulation test cases of four different scales (25-100), the SPPSO algorithm consistently achieves the highest observation benefit values and mission completion rates. Compared with the suboptimal algorithm, the SPPSO algorithm improved the observation benefit value and mission completion rate by 6.8% and 7.5%, respectively, for the largest-scale tasks, thus validating its effectiveness in increasing the convergence speed and mitigating the risk of local optima.
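For reference, the canonical particle swarm update that SPPSO builds on is sketched below on a toy objective; the partitioning of the population, the FFT-based projection strategy, and the perception operator described in the abstract are not included:

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):                       # stand-in objective (minimize)
    return float(np.sum(x ** 2))

dim, n_particles, iters = 5, 20, 100
w, c1, c2 = 0.7, 1.5, 1.5            # inertia and acceleration coefficients (illustrative)

pos = rng.uniform(-5, 5, (n_particles, dim))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([sphere(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([sphere(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print(sphere(gbest))                 # close to 0 on this toy problem
```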
Maize is a vital economic crop that is widely used in industries, animal husbandry, and grain-oil processing. Timely identification of maize diseases is crucial for ensuring a stable yield. Currently, deep learning methods such as Convolutional Neural Networks (CNNs) have been widely applied to disease recognition. However, most existing methods rely solely on image information, overlooking the features of other modalities. Moreover, their large parameter sizes and high deployment costs hinder their practical applications. To address these challenges, we propose a lightweight image-text multimodal cache model, MF-cache, that contains only 61 000 parameters, ensuring both low computational cost and high recognition accuracy. The model leverages the multimodal pre-trained model CLIP to extract image and text features, which are fused in parallel to form a key-value cache structure enriched with domain knowledge. Additionally, a weighted two-stage fusion mechanism is introduced to dynamically adjust the contribution of each modality to the classification outcome, thereby enhancing both stability and interpretability. To improve robustness, various data augmentation strategies have been employed to increase sample diversity and mitigate overfitting in low-data scenarios. Experimental results on a self-constructed dataset, CornI&T, and the public PlantVillage dataset demonstrate the effectiveness of the proposed method, achieving 99.72% and 98.80% accuracy, respectively. These results indicate that the method achieves an excellent recognition performance while maintaining a low computational overhead, thus offering an efficient and practical solution for crop disease detection. Furthermore, it highlights the potential of combining multimodal pretrained models with few-shot learning in intelligent agricultural applications.
This study proposes a skin melanoma segmentation algorithm, YOLOv8-Skin, designed to address the issue of imprecise results in existing algorithms caused by diverse shapes and blurred edges. YOLOv8-Skin combines multiscale feature extraction and enhanced edge segmentation based on YOLOv8. First, the backbone network CSPDarkNet53 of YOLOv8 is replaced with U-Net v2, which is more suitable for medical image segmentation. This change introduces rich semantic information into low-level features and refines high-level features, enabling precise delineation of lesion boundaries and effective extraction of small structures in melanoma images. Second, a Deformable-Large Kernel Attention (D-LKA) mechanism is introduced into the neck's C2f, enhancing the model's ability to capture irregular image structures through deformable convolutions and improving multilevel feature fusion using large kernel convolutions. Finally, a Diverse Branch Block (DBB) is incorporated into the head, forming a new segmentation head that enhances the representation capability of single convolutions by combining diverse branches of different scales and complexities. This enriches the feature space and improves feature extraction. Experiments conducted on the ISIC2017, ISIC2018, and PH2 datasets verify the algorithm's effectiveness. On the ISIC2017 dataset, the Dice coefficient, Specificity, Sensitivity, and Accuracy reach 88.86%, 91.34%, 97.24%, and 96.29%, respectively. On the ISIC2018 dataset, they reach 91.64%, 95.42%, 96.69%, and 95.83%, respectively. On the PH2 dataset, they reach 95.92%, 95.43%, 97.02%, and 96.13%, respectively. The algorithm demonstrates stronger segmentation performance and is better suited for melanoma segmentation tasks compared to existing methods.
As important infrastructure, bridges may face considerable safety hazards owing to the long-term influence of the natural environment and daily loads. Therefore, the health status of bridge structures must be monitored and predicted in real time. Existing studies suffer from issues such as large prediction errors, poor stability, and a lack of real-time performance, which hinder the prediction of the health status of complex bridge structures. To resolve these issues, this study proposes a Stacked Gated Recurrent Unit (GRU) with Attention and Auto-Cycle (SGRUA) model based on a stacked GRU encoder-decoder. It improves the accuracy and stability of prediction by better capturing long-term dependencies and important features in time-series data and uses a smaller number of parameters to increase prediction speed, enabling real-time predictions. First, missing values are filled, and outliers are detected and processed in actual bridge monitoring data to ensure that the data meet the integrity and availability requirements for time-series prediction. Subsequently, the SGRUA model is used to predict the bridge dynamic strain index in the time series, and the effectiveness of the model is verified through comparative tests and ablation experiments. The experimental results show that, compared with the TSMixer time-series prediction model, the SGRUA reduces the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Symmetric Mean Absolute Percentage Error (SMAPE) indicators by 11.07%, 11.02%, 11.00%, and 10.96%, respectively, on the Bridge B dataset. The SGRUA provides a new and effective method for bridge structural health monitoring and prediction and also offers a useful solution for health monitoring problems of other similar structures.
To support the stable operation of the power system and meet its demand for short-term power load forecasting accuracy, a short-term power load forecasting method based on an improved Convolutional Neural Network and Gated Recurrent Unit (CNN-GRU) model is proposed. Kernel Principal Component Analysis (KPCA) is used to process the multidimensional input data, and the primary influencing factors are effectively extracted as inputs for the subsequent prediction model. A CNN-GRU combination model with an improved Osprey Optimization Algorithm (OOA) is constructed for training and prediction, and an attention mechanism is introduced to strengthen the influence of important information and enhance the prediction performance of the model. Finally, the eXtreme Gradient Boosting (XGBoost) algorithm optimized by Bayesian Hyperparameter (BH) theory is used to correct the prediction error, a simulation model is constructed for comparison with multiple models, and the effectiveness of the proposed method is verified based on the obtained prediction effect curves and various performance indexes. The experimental results show that the Mean Absolute Percentage Error (MAPE) values of the proposed CNN-GRU model during training and testing are 1.56% and 1.99%, respectively, indicating that the proposed model has improved prediction accuracy.