In the context of ongoing advancements in educational informatization, constructing precise and efficient curriculum knowledge graphs has become key to promoting the development of personalized education. As a structured knowledge representation model, a curriculum knowledge graph reveals the complex relations between curriculum content and learning objectives, optimizing the allocation of educational resources and tailoring personalized learning paths for learners. This survey discusses the techniques used to construct curriculum knowledge graphs, starting with the basic concepts, intrinsic connections, and significant differences among general, educational, and curriculum knowledge graphs. It then delves into the key technologies for building curriculum knowledge graphs, covering curriculum ontology design, entity extraction, and relation extraction, and provides a detailed analysis and summary of their evolution, key features, and limitations. Furthermore, it explores the application value of curriculum knowledge graphs in scenarios such as learning resource recommendation, learner behavior profiling and modeling, and multimodal curriculum knowledge graph construction. Finally, it focuses on the challenges in constructing curriculum knowledge graphs, such as data diversity and heterogeneity, difficulties in quality evaluation, and the lack of cross-curriculum integration, and provides future-oriented insights based on cutting-edge technologies such as deep learning and Large Language Models (LLMs).
Password leakage incidents often expose both user passwords and identity information. Because users are accustomed to reusing passwords across multiple network services, attackers can tweak leaked passwords to accurately attack user accounts; this is called a credential tweaking attack. By analyzing large-scale leaked passwords and the corresponding user identity information, this study finds that the strategies users adopt to create passwords are often associated with their identity information. However, current research on credential tweaking attacks relies only on leaked password structures and ignores leaked user identity information when predicting password tweaking strategies. To improve the accuracy of credential tweaking attacks, this study designs a credential tweaking attack optimization method based on user identity information. In the preprocessing phase, username and regional information are extracted from the user identity information, and the probabilities of users' different password creation strategies in different regions are statistically calculated. In the training phase, regional information is combined to learn users' character-level editing operations on leaked passwords. In the password generation phase, a password generation method that integrates character-level editing operations, structure-level editing operations, and username information is designed. The experimental results show that in an attack with 10³ guesses, the cracking rate of this method improves by up to 41.8% over the existing best method (PassBERT), highlighting the threat that credential tweaking attacks based on user identity information pose to password security.
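Conceptually, the character-level tweaking step can be illustrated as follows. This is a minimal sketch with a toy edit vocabulary (the `sub`/`ins`/`del` operation names are assumptions), whereas the paper's trained model predicts such operations from the leaked password, region, and username information.

```python
# A minimal sketch of character-level password tweaking; the trained model
# in the paper would predict these edit operations (and their probabilities)
# conditioned on the leaked password and the user's regional information.
def apply_edits(password: str, edits: list[tuple]) -> str:
    """Apply (op, position, char) edit operations to a leaked password."""
    chars = list(password)
    for op, pos, ch in edits:
        if op == "sub" and pos < len(chars):      # substitute one character
            chars[pos] = ch
        elif op == "ins" and pos <= len(chars):   # insert a character
            chars.insert(pos, ch)
        elif op == "del" and pos < len(chars):    # delete a character
            del chars[pos]
    return "".join(chars)

# Example: "password" leaked elsewhere; generate a guess for this site.
print(apply_edits("password", [("sub", 0, "P"), ("ins", 8, "1")]))  # Password1
```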
As multivariate time series data become increasingly prevalent across various industries, anomaly detection methods that can ensure the stable and secure operation of systems have become crucial. Owing to the inherent complexity and dynamic nature of multivariate time series data, higher demands are placed on anomaly detection algorithms. To address the inefficiency of existing anomaly detection methods in processing high-dimensional data with complex inter-variable relations, this study proposes an anomaly detection algorithm for multivariate time series data based on Graph Neural Networks (GNNs) and a diffusion model, named GRD. By leveraging node embedding and graph structure learning, the GRD algorithm captures the relations between variables and refines features through a Gated Recurrent Unit (GRU) and a Denoising Diffusion Probabilistic Model (DDPM), thereby facilitating precise anomaly detection. Traditional assessments often apply a Point-Adjustment (PA) protocol before scoring, which substantially overestimates an algorithm's capability. To reflect model performance realistically, this work adopts a new evaluation protocol along with new metrics. The GRD algorithm achieves F1@k scores of 0.7414, 0.8017, and 0.7671 on three public datasets. These results indicate that the GRD algorithm consistently outperforms existing methods, with notable advantages in processing high-dimensional data, underscoring its practicality and robustness in real-world anomaly detection applications.
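As an illustration of the graph structure learning component, the following hedged sketch (not the paper's exact code) builds a top-k neighbor graph over variables from learned node embeddings, a construction commonly used in GNN-based multivariate anomaly detectors.

```python
# A minimal sketch of learning a graph over variables from node embeddings:
# cosine similarity followed by top-k neighbor selection.
import torch

def build_graph(node_emb: torch.Tensor, k: int = 5) -> torch.Tensor:
    """node_emb: (N, d) learned embedding per variable; returns (N, N) 0/1 adjacency."""
    emb = torch.nn.functional.normalize(node_emb, dim=1)
    sim = emb @ emb.t()                       # pairwise cosine similarity
    sim.fill_diagonal_(float("-inf"))         # exclude self-loops
    idx = sim.topk(k, dim=1).indices          # k most similar variables
    return torch.zeros_like(sim).scatter_(1, idx, 1.0)

adj = build_graph(torch.randn(16, 32), k=5)  # 16 variables, 32-dim embeddings
```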
With the rapid development of electric vehicles, the large charging demand will cause problems such as an increased peak-valley difference in distribution network load and uncertainty in the charging load. To this end, an adaptive discrete charging scheduling strategy is proposed for power grid load stability. A discrete charging scheduling model is constructed to minimize the peak-valley difference in distribution network load by jointly optimizing the state decision variables and charging power during the charging process of electric vehicles. To meet the real-time charging demand of electric vehicles, an adaptive adjustment method for the vehicle charging interval is designed: according to the arrival and departure times of different electric vehicles, the charging scheduling interval is adjusted in real time. However, the state decision variables and charging power in the discrete charging scheduling model are highly coupled, making the model a Mixed-Integer Nonlinear Programming (MINLP) problem. To solve it, first, the charging load allocation rate of each time slot is obtained by calculating the charging load margin; then, based on the dynamic allocation of loads across time slots, the charging state decision variables are iteratively updated; finally, based on the updated state decision variables, the time-discrete charging power is optimized. Simulation results show that the proposed scheduling strategy can effectively reduce the peak-valley difference of distribution network load, improve power grid stability, and flexibly meet the real-time charging needs of electric vehicles.
This study presents a facial emotion recognition network based on UniRepLKNet to address the difficulty of effectively capturing feature information and of giving key facial information a more prominent role in facial emotion recognition. To extract facial emotional features more accurately, the study designs a masked polarized self-attention module that combines U-Net and a polarized self-attention mechanism. This module deeply mines the dependencies between channels and spatial positions and strengthens the influence of local key facial information on emotion recognition through a multi-scale feature fusion strategy. The study optimizes UniRepLKNet, a universal large-kernel Convolutional Neural Network (CNN), and proposes the EmoRepLKNet network structure. In EmoRepLKNet, the masked polarized self-attention module enables the network to extract information that is key to facial emotion recognition; combined with the wide receptive field of the large-kernel CNN, facial emotions can be recognized effectively. Experimental results show that on the facial emotion recognition dataset FER2013, EmoRepLKNet achieves an accuracy of 76.20%, outperforming existing comparison models and significantly improving on the accuracy of UniRepLKNet. Additionally, on the single-label portion of the RAF-DB dataset, the proposed method achieves an accuracy of 89.67%.
The core challenge of Knowledge Distillation (KD) lies in extracting generic and sufficient knowledge from the Teacher model to effectively guide the learning of the Student model. Recent studies have found that, beyond learning soft labels, further exploiting inter-instance relations in the deep feature space enhances the performance of Student models. Existing inter-instance relation-based KD methods widely adopt global Euclidean distance metrics to measure the affinity between instances. However, these methods overlook the intrinsic high-dimensional embedding characteristics of the deep feature space, where data lie on a low-dimensional manifold that is locally Euclidean-like but globally complex. To address this issue, a novel instance spectral relation-based KD method is proposed. This method avoids the limitations of the global Euclidean distance and instead constructs and analyzes similarity matrices between each instance and its k nearest neighbors in the Teacher model's feature space to reveal latent spectral graph structure information. An innovative loss function is designed to guide the Student model not only to learn the probability distribution output by the Teacher model but also to mimic the inter-instance relations represented by this spectral graph structure. The experimental results demonstrate that the proposed method significantly improves the performance of the Student model, with an average classification accuracy improvement of 2.33 percentage points over baseline methods. These findings strongly indicate the importance and effectiveness of incorporating the spectral graph structure relations between samples into the KD process.
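The core idea can be sketched as follows. This is a hedged illustration of matching k-NN similarity structure between Teacher and Student features; the paper's exact loss, masking, and normalization may differ.

```python
# A sketch of a k-NN relational distillation loss: keep only each instance's
# k nearest neighbors in the Teacher's feature space and match the masked
# similarity matrices between Student and Teacher.
import torch
import torch.nn.functional as F

def knn_relation_loss(f_s: torch.Tensor, f_t: torch.Tensor, k: int = 8):
    """f_s, f_t: (B, d) student/teacher features for one batch."""
    f_s, f_t = F.normalize(f_s, dim=1), F.normalize(f_t, dim=1)
    sim_t = f_t @ f_t.t()                    # teacher pairwise similarity
    sim_s = f_s @ f_s.t()                    # student pairwise similarity
    mask = torch.zeros_like(sim_t)
    idx = sim_t.topk(k + 1, dim=1).indices   # +1: the top match is the instance itself
    mask.scatter_(1, idx, 1.0)               # keep the local k-NN neighborhood
    return F.mse_loss(sim_s * mask, sim_t * mask)

loss = knn_relation_loss(torch.randn(32, 128), torch.randn(32, 256)[:, :128])
```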
The high computational and storage requirements of Convolutional Neural Networks (CNNs) limit their application in resource-limited mobile edge devices. Model compression techniques can significantly reduce the computational effort and parameters of CNNs without degrading network performance, and channel pruning has proven effective for model compression. However, most existing channel pruning methods rely on assessed channel importance or manually set evaluation criteria as the pruning criterion; such methods require additional hyperparameters and lack automation. To address these limitations, a novel automatic channel pruning method based on the Zebra Optimization Algorithm (ZOA) is proposed. The method first performs cluster pruning with k-medoids to form an initial compressed network structure; the ZOA then iteratively optimizes this initial structure to search for the best compact network. Experimental results show that on the CIFAR-10 dataset, the Top-1 accuracy of this method improves by 0.24 percentage points over the baseline while achieving Floating-Point Operations (FLOPs) and parameter pruning rates of 59.3% and 56.7%, respectively, on ResNet-56.
In recent years, knowledge graphs have gradually become the cornerstone of downstream tasks such as question answering, information retrieval, and recommendation systems. Knowledge graph reasoning is a key research topic in knowledge graph technology, and the accuracy of its reasoning results determines the quality of the knowledge graph and the effectiveness of its services. Recent research on knowledge graph reasoning has mainly focused on using knowledge embeddings as carriers of knowledge and on learning, through powerful neural network models, entity and relation embeddings that represent the implicit semantics of factual knowledge. The emergence of massive heterogeneous knowledge and its continuous growth have brought challenges such as missing knowledge structures, a long-tail (significantly skewed) distribution of knowledge, and weak interpretability to knowledge graph reasoning. This study proposes TSNet, a novel knowledge graph reasoning model based on textual and multi-perspective local structural features. By effectively fusing the textual features of entities and relations with multi-perspective local structural features of the knowledge graph, TSNet mitigates the problems of missing structure and the long-tail distribution of data. Experimental results demonstrate that TSNet achieves competitive results on four common knowledge graph reasoning datasets: FB15k, WN18, FB15k-237, and WN18RR.
As core data for maritime traffic, ship trajectory data can be used for trajectory prediction, early warning, and other tasks with pronounced temporal characteristics. However, owing to factors such as harsh marine environments and poor communication reliability, missing ship trajectory data is a common problem, and learning from time series containing missing data can significantly affect the accuracy of time series analysis. The current mainstream solution is to impute the missing data approximately, mainly with convolutional models that reshape the time series along the timeline to capture its local features; however, their ability to capture the global features of long time series is limited. The Transformer captures the relationships between time points through its core self-attention mechanism, enhancing a model's ability to capture the global features of a time series. However, because its attention is calculated through matrix multiplication, it ignores the temporal nature of the series, and the resulting global feature weights carry no time-span dependency. Therefore, to address the issue of capturing global features in long time series, this study proposes GANet, a variant network based on the self-attention mechanism. GANet first obtains a basic global feature weight matrix from the time series points through self-attention and then uses gated recurrent units to forget and update this weight matrix along the timeline, yielding a global feature weight matrix with time-span dependency that is then used for data reconstruction to impute the missing values. By combining the self-attention and gating mechanisms, GANet captures global features while accounting for the impact of the time span on different time points. Experimental results show that, compared with existing models such as Autoformer and Informer, GANet achieves better imputation performance on the Trajectory, ETT, and Electricity datasets.
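A rough PyTorch sketch of this idea follows. The module name and dimensions are assumptions; the point is that self-attention produces a (T, T) weight matrix whose rows a GRU then forgets and updates along the timeline before reconstruction.

```python
# A hedged sketch: self-attention yields base (T, T) weights; a GRU walks the
# rows along the time axis so the refined weights carry time-span dependency.
import torch
import torch.nn as nn

class GatedAttention(nn.Module):
    def __init__(self, d_model: int, seq_len: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.gru = nn.GRU(seq_len, seq_len, batch_first=True)  # one row = T weights

    def forward(self, x):                     # x: (B, T, d_model)
        scores = self.q(x) @ self.k(x).transpose(1, 2)          # (B, T, T)
        base = torch.softmax(scores / x.size(-1) ** 0.5, dim=-1)
        refined, _ = self.gru(base)           # forget/update rows over time
        return refined @ x                    # reconstruct with refined weights

out = GatedAttention(32, 50)(torch.randn(4, 50, 32))  # 50 time points
```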
Extracting entities and relations with precision from the copper-based composite material literature is imperative for constructing knowledge graphs and propelling research in materials science. The complex nature of entities in this domain, such as nested and discontinuous entities, along with the prevalence of Single Entity Overlap (SEO) relations, renders existing entity and relation extraction techniques inadequate. To address this issue, this study presents a dedicated dataset for entity relation extraction from copper-based composite materials and introduces a novel two-stage extraction method. The first stage combines inter-word relation classification with a Bidirectional Gated Recurrent Unit (BiGRU) and multi-scale dilated convolutional networks, augmenting the model's capacity to discern entity boundaries. The second stage annotates entity spans within text sequences and incorporates an entity type attention mechanism into a relation classification model, leveraging multifaceted feature representations to classify relations. On three established public datasets—Matscholar, SOFC, and MSP—as well as the CBCM-IE dataset curated for this research, the proposed method outperforms baseline methods with improvements of 5.91 (precision), 3.56 (recall), and 3.63 (F1 score) percentage points, demonstrating its efficacy for entity relation extraction in the context of copper-based composite materials.
Research on abnormal human behavior is an important safeguard against potential dangers and emergencies. In view of the fuzzy definition of abnormal human behavior and the lack of standard datasets, this study defines six high-frequency abnormal human behaviors based on life scenarios—namely Headache, Fall, Twitch, Lumbago, Punch, and Kick—and independently constructs a dataset known as HABDataset-6. The attention mechanism in TimeSformer can be used to process this self-built dataset; however, it suffers from high loss and incomplete time series modeling, making it difficult to extract features from complex samples. Therefore, this study first uses the Accelerating Stochastic Gradient Descent (ASGD) optimization algorithm to improve the cross-entropy loss; that is, a CAS module is proposed that reduces the loss value of the original algorithm. Second, a Temporal Shift Module (TSM) is embedded in the backbone network of the original algorithm to improve its temporal perception and extract better features for model training. The study then integrates CAS and TSM into the proposed TS-AT algorithm, achieving good results on the self-built dataset with an inference accuracy of more than 80% for each behavior category. The usability of the TS-AT algorithm is tested on the public UCF-10 dataset and on elderly abnormal behavior data, achieving average test accuracies of 99% and 84%, respectively, exceeding those of advanced algorithms. These results show that the TS-AT algorithm has higher accuracy and good robustness in identifying abnormal human behavior and is expected to improve the ability to respond to potential dangers and emergencies and further ensure people's safety and health.
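For reference, a minimal sketch of the standard temporal shift operation that TSM applies is given below; this is the textbook formulation, not necessarily the authors' exact embedded variant.

```python
# The standard TSM idea: shift a fraction of channels forward/backward along
# the time axis so each frame's features mix information from its neighbors.
import torch

def temporal_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """x: (B, T, C, H, W) video features."""
    b, t, c, h, w = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # rest unchanged
    return out

y = temporal_shift(torch.randn(2, 8, 64, 14, 14))  # 8 frames, 64 channels
```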
Knowledge graph embedding technologies aim to convert complex semantic information into computationally efficient, low-dimensional vector representations. This process not only reveals potential similarities between entities and relationships but also helps computers understand the content of knowledge graphs and facilitates further processing. However, current knowledge graph embedding models face challenges in effectively capturing complex relationship patterns and exhibit limitations in handling properties such as symmetry, antisymmetry, composition, and hierarchical structures. The hierarchy-aware model HAKE addresses this by mapping entities to a polar coordinate system and using concentric circles to capture relationships between entities at the same level; nevertheless, it remains constrained in modeling other intricate relationships. To overcome these challenges, this study proposes a knowledge graph embedding model called ComHA. Building on the principles of HAKE, ComHA integrates geometric transformation techniques to enhance the vector space representations of entities and relationships through translation, rotation, and scaling operations. Link prediction experiments conducted on publicly available datasets, including WN18, WN18RR, FB15k, FB15k-237, and YAGO3-10, demonstrate significant performance improvements achieved by ComHA. This underscores the effectiveness of ComHA in capturing complex relationships and hierarchical structures within knowledge graphs while providing new research directions and methodological insights for future knowledge graph embedding model design.
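For orientation, a hedged sketch of the HAKE-style polar score that ComHA builds on is shown below; ComHA's additional translation, rotation, and scaling transformations are not reproduced here, and the mixing weight is illustrative.

```python
# HAKE-style polar scoring: the modulus part models hierarchy levels
# (different radii), the phase part distinguishes entities at the same level.
import torch

def hake_score(h_m, r_m, t_m, h_p, r_p, t_p, lam: float = 0.5):
    """(h_m, r_m, t_m): modulus embeddings; (h_p, r_p, t_p): phase embeddings."""
    modulus = torch.norm(h_m * r_m - t_m, p=2, dim=-1)             # radial distance
    phase = torch.norm(torch.sin((h_p + r_p - t_p) / 2.0), p=1, dim=-1)
    return -(modulus + lam * phase)   # higher score = more plausible triple

scores = hake_score(*(torch.rand(5, 16) for _ in range(6)))  # 5 triples
```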
Heterogeneous hypernetworks can model various high-order tuple relations found in the real world, which represent heterogeneous high-order information within the hypernetwork. However, heterogeneous hypernetworks have different degrees of indecomposability, and existing research methods do not fully consider the indecomposability of high-order tuple relations regarded as hyperedges. To address this issue, a heterogeneous hypernetwork representation learning method based on importance sampling, called HRIS, is proposed, which incorporates close high-order tuple relations into hypernetwork representation learning. First, it proposes judgment nodes, and incorporates indecomposable factors and tuple similarity to improve the sampling of important nodes through random walks to capture tight high-order tuple relations within the hypernetwork. Second, to make the sequences more global and diverse, the random swap method in data augmentation is introduced for solving overfitting problems, and a random deletion method based on node degree is proposed to improve robustness. Finally, a skip-gram model with negative sampling enhancement, called NSE-skip-gram, is proposed to obtain high-quality node representation vectors. Experiments conducted on four real hypernetwork datasets reveal that for the link prediction task, the HRIS demonstrates a significant improvement over other baseline methods; for the hypernetwork reconstruction task, the HRIS exhibits an average improvement of 3.75 and 9.79 percentage points compared to the optimal baseline method on the Global Positioning System (GPS) and drug datasets at all reconstruction ratios, respectively.
With the development of social networks, people increasingly express their emotions through multimodal data such as audio, text, and video. Traditional sentiment analysis methods struggle to process the emotional expressions in short videos effectively, and existing multimodal sentiment analysis techniques face issues such as low accuracy and insufficient interaction between modalities. To address these problems, this study proposes a Multimodal Sentiment Analysis method based on Dense Co-Attention (DCA-MSA). First, it utilizes the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, the OpenFace 2.0 model, and the COVAREP tool to extract features from text, video, and audio, respectively. It then employs a Bidirectional Long Short-Term Memory (BiLSTM) network to separately model the temporal correlations within the different features. Finally, it integrates the different features through a dense co-attention mechanism. The experimental results show that the proposed model is competitive with baseline models in multimodal sentiment analysis tasks: on the CMU-MOSEI dataset, the highest increase in binary classification accuracy is 3.7 percentage points and the highest increase in F1 score is 3.1 percentage points; on the CH-SIMS dataset, the highest increases in binary classification accuracy, three-class classification accuracy, and F1 score are 4.1, 2.8, and 3.9 percentage points, respectively.
As the Internet continues to evolve, the performance requirements of various applications have become increasingly diverse. For example, applications such as cloud storage and file-sharing services depend on high throughput to achieve high-speed data transmission, while multiplayer online games emphasize low latency to ensure real-time interaction and a high-quality user experience. However, most existing network switching devices rely on fixed hardware architectures and predefined forwarding rules to handle data flows. With the increasing diversity of Internet application scenarios, these rigid architectures and limited functionalities hinder their adaptability to different types of network traffic, making it difficult to provide differentiated services for various applications. To address these limitations, this paper proposes a traffic scheduling algorithm that employs multiple priority queues. By monitoring flow characteristics in real time through the data plane of switches and dynamically allocating different types of traffic to queues with distinct forwarding priorities, the algorithm enables more effective handling of diverse traffic types and facilitates flexible resource allocation tailored to different application scenarios. Simulation results demonstrate that the proposed approach of isolating traffic using differentiated priority queues achieves multiple objectives, including low latency for small flows, high throughput for large flows, and reduced packet loss rates. These findings provide strong support for meeting the performance demands of various applications.
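As a conceptual illustration only (the paper implements this logic in the switch data plane, not in Python), the flow-to-queue mapping might look like the following sketch; the thresholds and queue indices are assumptions.

```python
# An illustrative sketch of mapping flows to priority queues based on
# observed flow characteristics (size and packet rate).
def assign_queue(bytes_sent: int, pkt_rate: float,
                 small_flow_bytes: int = 100_000) -> int:
    """Return a queue index: 0 = highest forwarding priority."""
    if bytes_sent < small_flow_bytes:   # small flows: keep latency low
        return 0
    if pkt_rate > 10_000.0:             # heavy flows: throughput-oriented queue
        return 2
    return 1                            # everything else: default queue

queue = assign_queue(bytes_sent=4_096, pkt_rate=120.0)  # -> 0 (small flow)
```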
Adversarial examples can be used to perform transferable attacks on black-box models via surrogate models, without knowledge of the black-box model's internal structure and parameters. Previous studies have reported relatively low transferability for targeted attacks on black-box models. This study proposes a feature fusion-based method for enhancing the transferability of targeted attacks on images. First, adversarial examples are generated via ensemble attacks. Subsequently, using the gradient direction of the existing adversarial examples as a baseline, clean features extracted from the original image are used as perturbations to fine-tune the adversarial examples and improve the transferability of targeted attacks. For model ensembling, a gradient adaptive module is introduced based on each model's contribution to the overall adversarial objective, and a gradient filter is proposed to synchronously control the gradient direction and reduce the gradient differences among models. Through the feature fusion module, the clean features of the original image are mixed in to fine-tune the gradient direction of the adversarial examples, mitigating over-focusing on specific features. Experiments on the ImageNet-Compatible dataset reveal that, compared with the Clean Feature Mixup (CFM) method, the proposed method improves the average attack success rate by 7.7 percentage points for non-robustly trained models and by 5.3 percentage points for robustly trained and Transformer models, demonstrating its effectiveness.
The widespread use and diversification of embedded devices have introduced unparalleled convenience as well as formidable security vulnerabilities, particularly in firmware security. The intricate nature of embedded device firmware, coupled with its sheer volume and the adoption of encryption and obfuscation techniques, makes it difficult for security analysts to uncover hidden vulnerabilities efficiently. In response, this study proposes an innovative targeted analysis technique customized for heterogeneous firmware. First, the study explores multi-granularity analysis methods, automatic document categorization, key information extraction, and target delineation techniques to enable nuanced, depth-controllable firmware analysis. Next, it establishes a comprehensive file system feature library and introduces a novel target recognition approach based on eigenvalue matching, enhancing the discernment of obscure firmware and expanding the breadth of file system identification. Furthermore, the study develops a specialized crawler to procure firmware from diverse vendors, leading to the construction of a firmware library on the order of 10 000 images that is crucial for targeted decryption based on neighboring versions. An automated firmware parsing system, FTA, is designed and empirically validated, showing significant enhancements over mainstream firmware analysis tools such as Binwalk. Specifically, FTA's multi-granularity analysis method raises the firmware parsing speed by an average of 42.59%, whereas the optimized output mode facilitates targeted file extraction and extends recognition across multiple file system feature values. FTA provides robust support for firmware parsing within the domain of embedded system security.
Insecure data transmission is one of the most significant security threats faced by Wireless Sensor Networks (WSNs), for which the Authentication Key Agreement (AKA) protocol is an effective solution. However, existing protocols are not resistant to offline key guessing and replay attacks. Therefore, this paper proposes an enhanced AKA protocol for WSNs based on Elliptic Curve Cryptography (ECC) to realize secure session key negotiation between sensor nodes and the server. Through Burrows-Abadi-Needham (BAN) logic and informal security analysis, the protocol is proven to achieve mutual authentication and perfect forward secrecy while effectively resisting offline key guessing and replay attacks. Comparing the protocol with existing lightweight AKA protocols in terms of security attributes, computational overhead, and communication overhead, the performance analysis shows that it not only meets the lightweight requirements of WSNs but also offers stronger security attributes and communication advantages.
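As background, the ECC primitive underlying such protocols can be sketched as an ECDH exchange plus key derivation using the `cryptography` package; the paper's full protocol additionally includes authentication messages, nonces, and the BAN-verified handshake, which are omitted here.

```python
# A hedged sketch of ECDH key agreement with HKDF session-key derivation;
# the `info` label is an assumed placeholder, not the paper's protocol field.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

node_priv = ec.generate_private_key(ec.SECP256R1())     # sensor node key pair
server_priv = ec.generate_private_key(ec.SECP256R1())   # server key pair

# Each side combines its private key with the peer's public key.
shared_node = node_priv.exchange(ec.ECDH(), server_priv.public_key())
shared_server = server_priv.exchange(ec.ECDH(), node_priv.public_key())
assert shared_node == shared_server                     # same shared secret

session_key = HKDF(algorithm=hashes.SHA256(), length=32,
                   salt=None, info=b"wsn-aka-session").derive(shared_node)
```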
To minimize data transmission latency in the link layer and Physical Coding Sublayer (PCS) of interconnect networks in High-Performance Computing (HPC), a configurable bypass error correction method is typically used at the PCS. However, adapting to the variability of the physical medium's Bit Error Rate (BER) and to the granularity differences between link layer packets and PCS Forward Error Correction (FEC) blocks is challenging. Therefore, a new method called FEC-ABP is proposed for an adaptive bypass FEC decoding process. FEC-ABP optimizes receive-side data processing by replicating the locked and reordered data into two paths. Subprocess A enters the link layer via complete FEC decoding and the other data processing mechanisms (i.e., deleting alignment markers and checksums, descrambling, 257/264 decoding, 66/64 decoding, and rate matching). By contrast, subprocess B completely bypasses FEC decoding and enters the link layer via the other data processing mechanisms only. The link layer determines which path a packet is accepted from based on its Cyclic Redundancy Check (CRC) code and sequence number, and uses the Go-back-N mechanism to ensure reliable retransmission of uncorrectable packets. With FEC-ABP, error-free packets achieve lower transmission latency by bypassing FEC decoding, while correctable packets are transmitted reliably through FEC error correction. The experimental results indicate that the FEC-ABP method optimizes the average transmission latency with low resource consumption, helping achieve lower-latency data transmission for HPC interconnect networks.
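A simplified control-flow sketch of the receive-side path selection is shown below; the names are assumptions and the real logic is implemented in hardware, not software.

```python
# Illustrative selection between the FEC-bypass path (low latency) and the
# FEC-decoded path (error-corrected), with Go-back-N as the fallback.
from dataclasses import dataclass

@dataclass
class Packet:
    seq: int
    payload: bytes
    crc_ok: bool     # result of the CRC check on this path

def select_packet(bypass: Packet, decoded: Packet, expected_seq: int):
    if bypass.crc_ok and bypass.seq == expected_seq:
        return bypass                # subprocess B: FEC decoding bypassed
    if decoded.crc_ok and decoded.seq == expected_seq:
        return decoded               # subprocess A: errors corrected by FEC
    return None                      # uncorrectable: trigger Go-back-N retransmission

pkt = select_packet(Packet(7, b"data", True), Packet(7, b"data", True), 7)
```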
The via pillar is a new technology at advanced technology nodes that significantly optimizes via delay in routing solutions; however, it requires additional routing resources. Reasonably assigning the limited routing resources without affecting routability is therefore challenging, and fully exploiting the delay optimization ability of via pillar technology in layer assignment algorithms at advanced technology nodes is difficult. This study focuses on the high-performance layer assignment problem under advanced via pillar technology and proposes three improvement strategies. First, an initial routing order priority definition strategy is proposed, which comprehensively considers the number of sinks and the total length of the nets to determine net priority and establish a good routing foundation for the subsequent stage. Second, a non-overflow cost calculation strategy that considers historical cost is proposed, which effectively prioritizes usable edges and reduces the use of congested edges. Third, a redistribution order strategy for illegal nets is proposed, which comprehensively considers the total length of the nets, the number of sinks, and the delay of the last iteration to make the routing order at this stage more reliable. The experimental results demonstrate that, through these strategy improvements, the proposed algorithm effectively optimizes the number of vias, the overflow, and the delay of timing-critical nets.
Owing to the characteristics of the instrument, the surrounding environment, and the scanned target, some noise in point cloud data is inevitable, the most common being Gaussian noise. To address the large normal estimation errors that arise under Gaussian noise in point cloud models, this study proposes a point cloud normal estimation method based on statistical jump regression analysis. First, a regression model is established from the point cloud data, and the curvature value of the current point is estimated through local linear kernel smoothing. Second, to determine whether the current point lies on a surface edge, the local neighborhood of the current point is divided into two parts along the direction perpendicular to the gradient, and the surface value of the point is re-estimated from the two parts of the neighborhood. Finally, the Weighted Root Mean Square (WRMS) residual of the current point is analyzed and calculated to determine the surface value and normal vector of the point. Results of numerous experiments, including simulations and public dataset experiments, show that the proposed method is more accurate and robust than conventional point cloud normal vector estimation methods in the presence of Gaussian noise.
To address incomplete feature extraction caused by the varying sizes and irregular shapes of potholes, and the problem that captured images often do not match the perspective of road inspection vehicles during pothole detection, we collected and created a pothole dataset covering diverse sources, perspectives, and pixel resolutions, and further improved the YOLOv8 model. First, we introduced DCNv3 into the C2f structure of the Backbone to capture richer and more complete pothole features. Second, we integrated the Squeeze-and-Excitation (SE) attention mechanism to enhance pothole feature extraction. Third, we fused the BiFPN structure into the Neck to reduce the computational complexity of the network. Finally, we used Focal-EIoU as the loss function of the improved model to minimize the impact of complex backgrounds on detection performance. Compared with the unimproved network, the enhanced YOLOv8-master network achieved a 4.06% improvement in pothole detection accuracy, a detection speed of 85 frames per second, and a 19.54% reduction in floating-point operations. The results demonstrate that the proposed improvements effectively enhance the original network's pothole detection performance, with certain advantages over current mainstream object detection algorithms.
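For reference, a minimal PyTorch sketch of the standard SE block the abstract refers to is given below; its exact placement inside YOLOv8 follows the authors' design and is not reproduced here.

```python
# The standard Squeeze-and-Excitation block: global average pooling (squeeze)
# followed by a two-layer bottleneck that reweights channels (excitation).
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                         # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                    # squeeze: per-channel statistic
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                              # excitation: channel reweighting

y = SEBlock(64)(torch.randn(1, 64, 40, 40))
```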
Because the compression process of the network channel is unknown, attacks on an image after social network transmission are difficult to predict, making it challenging to design steganography algorithms resistant to channel compression distortion. Moreover, existing JPEG compression-resistant algorithms use a fixed quality factor during training, whereas in practice the quality factor of JPEG compression is not fixed but depends on the characteristics of the original image. Therefore, to design a method resistant to JPEG distortion in social networks, the quality factor distribution of social network compression channels must be examined. This study aims to design a robust steganography algorithm resistant to the JPEG compression distortion introduced by social networks. It models the distribution of the JPEG compression quality factor in social network transmission using a Gaussian Mixture Model (GMM). During the training of the steganography model, the quality factor is no longer fixed as in traditional steganography but is sampled smoothly from the GMM. The experimental results show that the proposed steganography model significantly improves the visual quality of the image and reduces the error rate of information extraction. Moreover, compared with the other models, the steganographic security of this model is improved.
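A hedged sketch of the quality factor sampling idea follows; the stand-in data and the two-component mixture are assumptions for illustration, not the paper's measured distribution.

```python
# Fit a Gaussian mixture to observed post-recompression quality factors, then
# sample a QF per training step instead of fixing it.
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in observations: two clusters of quality factors around 75 and 90.
observed_qf = np.random.normal(loc=[75, 90], scale=[5, 3],
                               size=(500, 2)).reshape(-1, 1)
gmm = GaussianMixture(n_components=2).fit(observed_qf)

def sample_quality_factor() -> int:
    qf = gmm.sample(1)[0][0, 0]              # draw one QF from the mixture
    return int(np.clip(round(qf), 1, 100))   # JPEG QF is limited to 1..100

print(sample_quality_factor())
```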
Deep learning has significant application value in the auxiliary diagnosis of myocardial ischemia. However, traditional deep learning networks for medical image classification suffer from limitations such as the inability to capture subtle inter-class differences in myocardial Computed Tomography (CT) scans and the loss of three-dimensional (3D) structural information from CT data. To address these issues, this study proposes DBTMed3D, a network that improves the convolutional modules of the conventional Med3D architecture through 3D bilinear fine-grained pooling, enabling the processing of multimodal medical imaging data, including both CT and Magnetic Resonance Imaging (MRI). Emulating the ResNet design, skip connections are introduced within the modules to fuse fine-grained second-order image features with those extracted by the convolutional blocks, allowing the network to preserve global characteristics while focusing on local details. Additionally, 3D class activation maps are used to overlay heat maps onto the original myocardial CT slices, highlighting the regions of primary interest identified by the model. Furthermore, the study designs a 3D hierarchical multi-head self-attention module that captures localized image features to resolve fine-grained classification challenges in 3D medical images. Experimental results demonstrate that DBTMed3D achieves 86.4% classification accuracy on the myocardial CT dataset, a 6.7-percentage-point improvement over the baseline 3D ResNet-50 model, validating its superior classification performance.
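As an illustration, a compact sketch of bilinear (second-order) pooling over a 3D feature map is given below; the signed-square-root and L2 normalization steps are a common convention in bilinear pooling and may differ from the paper's exact formulation.

```python
# Bilinear pooling: channel-by-channel outer products averaged over all
# spatial positions, capturing fine-grained feature co-occurrences.
import torch

def bilinear_pool_3d(x: torch.Tensor) -> torch.Tensor:
    """x: (B, C, D, H, W) -> (B, C, C) second-order feature."""
    b, c = x.shape[:2]
    flat = x.reshape(b, c, -1)                        # flatten D*H*W positions
    z = flat @ flat.transpose(1, 2) / flat.size(-1)   # averaged outer product
    z = torch.sign(z) * torch.sqrt(z.abs() + 1e-8)    # signed sqrt normalization
    return torch.nn.functional.normalize(z.reshape(b, -1), dim=1).reshape(b, c, c)

feat = bilinear_pool_3d(torch.randn(2, 32, 8, 16, 16))  # (2, 32, 32)
```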
In real-world environments, the accuracy of facial expression recognition is typically low because of factors such as varying illumination intensities, facial occlusions, and pose variations in the captured facial images. To address this robustness issue, this study proposes a facial expression recognition method that integrates a key region attention mechanism. Drawing inspiration from the facial perception mechanism of the human visual system, the method combines key facial regions with the overall facial region to enhance the recognition of complex and subtle expressions. During the key region extraction phase, the MTCNN algorithm sequentially feeds facial data through three cascaded networks to obtain the positional information of facial keypoints. Based on anatomical studies of the face, the study introduces a Local Region Cropping (LRC) method that processes this positional information and crops key facial region images. Subsequently, both the overall facial image and the cropped key facial region images are separately input into a ResNet-50 network, followed by feature fusion. A Coordinate Attention (CA) mechanism, which encodes precise positional information, channel relationships, and long-range dependencies, is incorporated to direct the model's focus toward the facial regions that contribute most to expression classification. Experimental results on the publicly available CK+ and FER2013 datasets demonstrate that the proposed method achieves recognition accuracies of 96.9% and 73.22%, respectively. Compared with existing state-of-the-art methods, it achieves significant improvements in accuracy, offering valuable insights into network architecture design.
To address the challenges of subtle and complex structures, blurred boundaries, and high computational costs associated with retinal vasculature, this study proposes a retinal vessel segmentation model named GAC-UNet, based on a multi-attention mechanism. First, a Channel Attention Spatial Pooling (CASP) attention module, designed to extract interchannel relationships and spatial position information, is embedded into skip connections. By integrating this module with residual connections, an Attention Residual Unit (ARU) is formed to optimize feature processing between the encoder and decoder, thereby highlighting important features. Subsequently, a New Graph Attention Network (NGAT) is introduced into the encoder architecture for rationally allocating attention. This NGAT is combined with the CASP attention module to construct the GACA integrated attention module, which enables multi-faceted attention to vessel details and edges. Multiple GACA modules are stacked within the encoder to internally accumulate graph attention information within the NGAT modules, thereby enhancing the ability of GAC-UNet to model global information and enrich the edge feature information. Finally, the feature information extracted by the different attention modules is aggregated at the corresponding levels in the decoder architecture and the final segmentation result is obtained using upsampling operations. Experimental evaluations conducted on three public retinal datasets—DRIVE, CHASE_DB1, and STARE—demonstrate that the proposed model achieves specificities of 97.76%, 99.16%, and 98.66%, and accuracies of 96.80%, 96.81%, and 96.34%, respectively. These results indicate that GAC-UNet effectively identifies subtle and complex vessel structures with blurred boundaries while maintaining a relatively small model parameter size.
Traditional methods for human behavior recognition based on RGB videos face numerous challenges when dealing with complex backgrounds, lighting effects, and variations in appearance. By contrast, methods that leverage human skeletal information for behavior recognition are less affected by these factors. However, current mainstream skeleton-based behavior recognition methods struggle to balance accuracy and complexity. To maintain high recognition accuracy while addressing issues such as large model parameter sizes and high computational complexity, a lightweight network structure comprising three novel encoding blocks is proposed. First, efficient multi-scale attention modules are incorporated into the self-attention graph convolutional module for spatial modeling and into the multi-scale temporal convolutional module for temporal modeling, enhancing the model's ability to recognize and utilize temporal and spatial feature information and thereby enriching skeletal data features. Second, a multi-feature fusion adaptive module is employed to strengthen feature fusion and generalization. Finally, an iterative feature fusion enhancement module is utilized to further improve the understanding of complex feature relationships. Experimental results demonstrate that, on the large-scale NTU-RGB+D60 dataset, the proposed method achieves accuracy rates of 91.1% and 95.4% under Cross-Subject (CS) and Cross-View (CV) evaluations, respectively. On the NTU-RGB+D120 dataset, it attains accuracy rates of 87.3% and 88.8% under Cross-Subject and Cross-Setup (SS) evaluations, respectively, with a parameter count of 0.72×10⁶ and a floating-point operation count of 0.6×10⁹. Comparative experiments indicate that the proposed algorithm outperforms several recent mainstream algorithms in terms of parameter size, floating-point operation count, and recognition accuracy, effectively balancing these metrics and providing a lightweight network model for precise human behavior recognition.
The rapid increase in video data volume poses severe challenges when available bandwidth is limited, necessitating improvements in video coding efficiency. Video pre-coding processing techniques can reduce the video data volume without altering the core algorithms and parameter settings of the encoder, enhancing coding efficiency while maintaining good compatibility. This paper proposes a Degradation Compensation and Multi-dimensional Reconstruction (DCMR) pre-processing method, which extracts features closely related to the subsequent coding process from video images across multiple dimensions and reconstructs these features into video images. First, a degradation compensation model is designed to remove coding noise while compensating for the image degradation introduced during transmission. Second, a lightweight multi-dimensional feature reconstruction network is constructed that combines the principles of residual learning and feature distillation to extract coding-related features from both the spatial and channel dimensions and reconstruct them. Finally, to restore the high-frequency details lost during denoising, an auxiliary branch incorporating a weighted guided filter-based detail enhancement convolution module is added to DCMR. For the loss function, a combination of the Mean Absolute Error (MAE) loss and the Multi-Scale Structural Similarity Index Measure (MS-SSIM) loss with different weights is selected to achieve multi-objective optimization. During deployment, DCMR can be directly integrated into any existing standard video encoder without modifying the coding, streaming media, or decoding settings. Experimental results demonstrate that the DCMR method achieves average performance gains of 21.6% and 6.98% in terms of BD-rate (VMAF) and BD-rate (MOS), respectively, under H.266/VVC.
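The weighted multi-objective loss can be sketched as follows, assuming the third-party `pytorch_msssim` package for the MS-SSIM term; the weight `alpha` is illustrative, not the paper's tuned value.

```python
# A sketch of a weighted MAE + MS-SSIM loss for multi-objective optimization.
import torch
from pytorch_msssim import ms_ssim

def dcmr_loss(pred: torch.Tensor, target: torch.Tensor, alpha: float = 0.84):
    """pred, target: (B, C, H, W) images scaled to [0, 1]."""
    mae = torch.mean(torch.abs(pred - target))
    msssim = 1.0 - ms_ssim(pred, target, data_range=1.0)  # 1 - similarity
    return alpha * msssim + (1.0 - alpha) * mae           # weighted combination

loss = dcmr_loss(torch.rand(2, 3, 256, 256), torch.rand(2, 3, 256, 256))
```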
Unsupervised Domain Adaptation (UDA) person Re-Identification (Re-ID) aims to transfer labeled source domain knowledge to an unlabeled target domain, which is very challenging owing to problems such as pseudo-label noise and domain gaps. Therefore, a Heterogeneous Teacher-Student network with Attention mechanisms (HTSA) is proposed to effectively reduce the influence of pseudo-label noise and focus on the key information of pedestrians while filtering out irrelevant background information. This study adopts Domain-Specific Batch Normalization (DSBN) to attenuate the performance degradation caused by domain gaps. Additionally, a novel data augmentation method is adopted that splits the input image along its width and independently processes the two equally sized parts, enhancing generalization ability. The experimental results reveal that the mean Average Precision (mAP) and Rank-1 on DukeMTMC-reID→MSMT17 reach 40.3% and 71.0%, respectively, whereas those on Market-1501→MSMT17 reach 37.7% and 67.7%, respectively, demonstrating the effectiveness of the proposed method.
Current mainstream video super-resolution algorithms are primarily applied in business scenarios such as server-side or offline video conversion; when deployed on mobile devices, they face challenges such as complex computations and slow inference. Although these mainstream super-resolution algorithms can satisfy the accuracy requirements of image quality, meeting the performance requirements on processing time is difficult, which limits their practical application, particularly in Real-Time audio and video Communication (RTC) business scenarios. This paper proposes a real-time video Super-Resolution technology based on OpenGL ES (OGSR) with Convolutional Neural Network (CNN) improvement and optimization. First, using grouped convolution and channel shuffle, the neural network model is optimized without significantly reducing super-resolution image quality, cutting the computational cost of forward inference several-fold. Subsequently, the OpenGL ES graphics acceleration interface is used to pack the model parameters and channel data into texture data, the fastest format to sample, which is uploaded to graphics memory for parallel computing on the GPU. Finally, in the GPU shader, the channel and model parameter indices are computed backward from the rendered pixel coordinates to implement the core module of the super-resolution algorithm, achieving fine-grained concurrency at the pixel level. The experimental results show that triple super-resolution of QVGA (320×240 pixels) and nHD (640×360 pixels) video frames can reach a frame rate of 15-30 frames/s on mobile phones of various models. Moreover, the quality error of the enlarged image is within 2% of that of the standard CNN model, which meets the requirements of real-time business scenarios while significantly improving performance.
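For reference, a minimal sketch of the channel shuffle operation that pairs with grouped convolution is shown below; the texture packing and shader-side index computation described above are GPU-specific and not reproduced here.

```python
# Channel shuffle: interleave channels across convolution groups so that
# information flows between groups in the next grouped convolution.
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """x: (B, C, H, W); C must be divisible by groups."""
    b, c, h, w = x.shape
    return (x.reshape(b, groups, c // groups, h, w)
             .transpose(1, 2)            # swap group and per-group channel axes
             .reshape(b, c, h, w))

y = channel_shuffle(torch.randn(1, 8, 4, 4), groups=2)
```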
Today, fingerprint recognition technology is widely used and holds the largest market share in personal identity authentication. Gender is one of the most fundamental characteristics distinguishing individuals, and gender classification is crucial for investigating criminal offenses and gender impersonation. Many current fingerprint gender recognition methods use physical features such as fingerprint ridges; however, traditional, manual feature-based recognition methods are difficult to apply in complex and changing scenarios. To address this issue, this paper proposes a fingerprint gender recognition method, FGRNet, based on a multi-scale attention mechanism and a multi-model fusion strategy. First, depthwise separable convolution and the Convolutional Block Attention Module (CBAM) are introduced into dense blocks, improving the depth and breadth of the network without increasing the number of parameters. Second, a multi-scale structure is introduced into CBAM to learn attention weights at lower model complexity and effectively integrate local and global attention, establishing long-range channel dependencies and enabling the network to extract more discriminative features. Finally, exploiting the complementarity between different models, a multi-model fusion strategy based on evidence theory is designed to further improve recognition accuracy. Experimental results show that FGRNet achieves accuracies of 82.6558% and 91.1490% on the public SOCOFing dataset and a self-built dataset, respectively. The proposed model is robust and achieves good recognition performance even on fingerprint images containing a large amount of irrelevant noise.
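As an illustration of evidence-theory fusion, the following sketch applies Dempster's combination rule to two classifiers' probability outputs over the two gender classes; the paper's mass assignment and number of fused models may differ.

```python
# Dempster's rule for two mass functions over mutually exclusive singleton
# hypotheses: agreements are multiplied, conflict is discarded and renormalized.
import numpy as np

def dempster_combine(m1: np.ndarray, m2: np.ndarray) -> np.ndarray:
    """m1, m2: mass over the same classes, each summing to 1."""
    joint = np.outer(m1, m2)
    conflict = joint.sum() - np.trace(joint)     # mass on empty intersections
    return np.diag(joint) / (1.0 - conflict)     # renormalized agreement mass

# Two models leaning the same way reinforce each other after fusion.
print(dempster_combine(np.array([0.7, 0.3]), np.array([0.6, 0.4])))  # ~[0.78, 0.22]
```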
Common operations shared between different processes in a multi-process layout workshop lead to resource waste. To address this problem, this study establishes a multi-objective optimization model for the collaborative scheduling of multi-workshop job tasks with the objectives of minimizing the makespan, total processing cost, and total processing energy consumption, thereby improving the utilization of workshop resources and achieving cost reduction and efficiency gains. The study also proposes a new genetic fusion algorithm, the TSNSGA-II algorithm, which combines tabu search with fast non-dominated sorting. After the crossover step of the genetic algorithm, the chromosomes generate new individuals through a tabu search mutation strategy, enhancing the exploration of the search space. Finally, an Analytic Hierarchy Process (AHP)-based method weighs the three objectives from the factory perspective to select the optimal scheduling solution. The effectiveness of the TSNSGA-II algorithm is verified on a simulated dataset, and its performance is compared with the MOGWO and ENSGA-II metaheuristic algorithms on standard datasets of different sizes, followed by ablation comparisons against NSGA-II alone and the TS module alone. The results show that when the total processing cost has the highest priority, the algorithm obtains the lowest total processing cost on 90% of the mk instances in the Brandimarte dataset with a solution time shorter than that of the ENSGA-II algorithm, a 1.6% improvement over the unimproved NSGA-II algorithm. When the makespan has the highest priority, the proposed algorithm obtains the minimum makespan on 80% of the datasets, a 2.2% improvement over the unimproved NSGA-II algorithm.
Accurately predicting the future trajectories of surrounding vehicles is crucial for Autonomous Driving Vehicles (ADVs) to understand complex dynamic environments. However, existing pooling strategies rely solely on historical position feature encoding in the Euclidean coordinate system and fail to effectively capture latent variables such as vehicle maneuver intentions. To address this issue, this study proposes a vehicle trajectory prediction method that considers maneuver intentions. First, a pooling mechanism based on polar coordinate feature representation and high-order feature encoding is constructed to capture inter-vehicle dependencies. Next, a position and acceleration maneuver type discrimination strategy based on a Gaussian probability distribution is designed to accurately model the expected maneuvers in structured road scenarios. Furthermore, the study develops a trajectory planning and historical trajectory coupling encoding module based on random sampling, which enhances the model's ability to capture interaction features while avoiding redundant encoding. Finally, a trajectory prediction model, StructNet, is built on an encoder-decoder framework, and the effectiveness of the algorithm is validated on real-world road datasets from NGSIM. Multiple comparative and ablation experiments demonstrate that the proposed model achieves a root mean square error of less than 3.5 m at 5 s, a 15.3% improvement over the baseline model, significantly enhancing prediction accuracy.
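The polar-coordinate feature representation can be sketched as follows; this is an assumed form in which neighbor positions relative to the target vehicle are encoded as radius and angle before pooling.

```python
# Convert neighbors' positions relative to the target vehicle into polar
# (radius, angle) features, an encoding better aligned with maneuver geometry.
import numpy as np

def to_polar(target_xy: np.ndarray, neighbor_xy: np.ndarray) -> np.ndarray:
    """target_xy: (2,); neighbor_xy: (N, 2); returns (N, 2) = (r, theta)."""
    delta = neighbor_xy - target_xy
    r = np.linalg.norm(delta, axis=1)                 # distance to target
    theta = np.arctan2(delta[:, 1], delta[:, 0])      # bearing from target
    return np.stack([r, theta], axis=1)

feats = to_polar(np.array([0.0, 0.0]), np.array([[5.0, 3.0], [-2.0, 8.0]]))
```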
Autonomous mobile robots employ intelligent algorithms for path planning in complex environments. However, the "memory wall" problem in traditional computers substantially increases the running time of these algorithms. To address this problem, this study proposes path planning for intelligent inspection robots in open-pit coal mines based on a Memristor Array and the Physarum Polycephalum Algorithm (MA-PPA). A memristor device can reduce the running time of an algorithm because it integrates memory and computation, and the Physarum polycephalum algorithm can self-organize and efficiently locate the shortest path. Leveraging the advantages of both, and based on the positive feedback property that memristor resistance varies with current, the Physarum polycephalum algorithm is implemented for path planning in a two-dimensional global environment using a memristor array. Applying parallel computing to the Physarum polycephalum algorithm on a memristor array significantly reduces the algorithm's running time. Experimental results show that, compared with traditional bio-inspired algorithms, the proposed algorithm reduces time complexity and finds the shortest path with fewer turns.
To solve the problems of poor detection performance, high false detection and missed detection rates, and weak generalization in urban vehicle target detection algorithms, this study proposes an improved YOLOv8 urban vehicle target detection algorithm. First, an Efficient Multi-scale Attention (EMA) mechanism is incorporated at the tail of the backbone network, helping the model better capture the detailed features of target vehicles; combined with a 160×160 pixel small-target detection layer, it enhances small-target detection and aggregates pixel-level features through dimensional interaction to strengthen feature mining for target vehicles. Second, the study designs a new Multi-scale Lightweight Convolution (MLConv) module for the lightweight network and reconstructs the C2f module based on MLConv, significantly improving the feature extraction capability of the model. Finally, to suppress the harmful gradients generated by low-quality images, the study uses the Wise-Intersection over Union (WIoU) loss function instead of Complete Intersection over Union (CIoU) to optimize the network's bounding box loss and improve the model's convergence speed and regression accuracy. On the Streets vehicle dataset, the algorithm improves mAP@0.5, mAP@0.5:0.95, and recall by 1.9, 1.4, and 2.4 percentage points, respectively, compared with the YOLOv8n benchmark model. In validations on a domestic vehicle dataset and the VisDrone2019 small-target dataset, these performance indexes improve to different degrees, proving that the improved algorithm has good generalization and robustness. Compared with other mainstream algorithms, the improved algorithm exhibits higher accuracy and detection rate, indicating better performance in urban vehicle target detection.
Distribution network planning is important in power systems because it directly affects the reliability, efficiency, and economy of the power supply. Good planning ensures that power resources are allocated efficiently while reducing operating costs and power losses. However, as power demand and system complexity increase, traditional decision-making methods are no longer adequate. To improve the efficiency and reliability of equipment selection, connection configuration, and grid layout, this study proposes an intelligent distribution network planning method based on Knowledge Graphs (KGs) and Graph Convolutional Neural Networks (GCNNs), named KG-GCNN. This method combines the advantages of KGs, Graph Neural Networks (GNNs), and Convolutional Neural Networks (CNNs), helping power system planners better understand, analyze, and optimize the equipment configuration, connections, and physical layout of power systems. The study first establishes a KG of the power network covering the equipment, its properties, and their interrelationships, providing the basis for subsequent analysis and optimization. It then uses a GNN to analyze the structural data of the power network, capturing the relationships and influences between devices and providing important information for equipment configuration and connection decisions. Finally, it introduces a CNN to improve the physical layout of the grid, determining the best locations and connections for devices and thereby improving grid performance and reliability. The experimental results show that, compared with decision trees, Support Vector Machines (SVMs), and Recurrent Neural Networks (RNNs), the proposed method effectively matches the complex topologies of power grids and is suitable for optimizing their physical layout.