The rapid development of Artificial Intelligence (AI) has empowered numerous fields and significantly impacted society, establishing a solid technological foundation for university informatization services. This study explores the historical development of both AI and university informatization by analyzing their respective trajectories and interconnections. Although universities worldwide may focus on different aspects of AI in their digital transformation efforts, they universally demonstrate the vast potential of AI in enhancing education quality and streamlining management processes. Thus, this study focuses on five core areas: teaching, learning, administration, assessment, and examination. It comprehensively summarizes typical AI-empowered application cases to demonstrate how AI effectively improves educational quality and management efficiency. In addition, this study highlights the potential challenges associated with AI applications in university informatization, such as data privacy protection, algorithmic bias, and technology dependence. Furthermore, common strategies for addressing these issues, such as enhancing data security, optimizing algorithm transparency and fairness, and fostering digital literacy among both teachers and students, are elaborated upon in this study. Based on these analyses, the study explores future research directions for AI in university informatization, emphasizing the balance between technological innovation and ethical standards. It advocates for the establishment of interdisciplinary collaboration mechanisms to promote the healthy and sustainable development of AI in the field of university informatization.
Smart contracts are the core of the second-generation blockchain Ethereum. They handle large cash flows but are vulnerable to hacking because they are deployed on a public chain. Therefore, potential vulnerabilities in contracts must be detected to ensure their security. However, existing detection methods have difficulty coping with the structural deception of attack code, performing in-depth analysis of program logic, and mitigating state-space explosion. To address these issues, this study first proposes a Petri-net-based attacker modeling and detection framework for smart contracts; the framework uses abstract semantic rules and the dynamic operation characteristics of the nets to capture attack behaviors accurately and ensure high adaptability and accuracy of smart contract detection. Second, the study presents a unified detection method for multilevel vulnerabilities that combines the key features of vulnerabilities at each level to derive attack likelihoods and their potential impacts. Finally, the study presents an on-demand state-space generation mechanism to mitigate the state-space explosion problem; this mechanism significantly improves detection efficiency and resource utilization. Experimental results demonstrate that the proposed method is feasible and practical.
With the rapid advancement of multimedia and data collection technologies, multi-view data is becoming increasingly prevalent. Unlike single-view data, multi-view data offers richer descriptive information and enhances the efficiency of structural information mining. In response to the multi-view clustering challenge, this study proposes a multi-view subspace clustering algorithm based on dual cross-view correlation detection. Considering the effects of noise disturbance and high-dimensional data redundancy on multi-view clustering, the proposed algorithm employs a linear projection transformation to derive a low-redundancy latent representation of the original data. An accurate view-specific subspace representation is then learned from this latent feature representation based on the self-representation property. To fully leverage the complementary information present in multi-view data, the proposed algorithm simultaneously detects cross-view correlations in both the feature and subspace representations. Specifically, latent features are treated as low-level representations, and their diversity is explored and retained using the Hilbert-Schmidt Independence Criterion (HSIC). For high-level clustering structures, the proposed algorithm ensures consistency among multi-view subspace representations by imposing a low-rank tensor constraint, which facilitates the exploration of high-order correlations and complementary information. The study employs an alternating direction minimization strategy with an augmented Lagrange multiplier to solve the optimization problem. Experimental results on real datasets demonstrate that the proposed algorithm significantly outperforms the second-best methods, achieving improvements in clustering accuracy of 3.00, 3.60, 1.90, 2.00, 7.50, and 1.90 percentage points across six benchmark datasets, respectively. These results validate the superiority and effectiveness of the algorithm.
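As a minimal illustration of the diversity term, the numpy sketch below computes an empirical HSIC between the latent features of two views; in this setting, driving HSIC toward zero pushes the views toward statistical independence, i.e., complementary representations. The Gaussian kernel and its bandwidth are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def hsic(X, Y, sigma=1.0):
    """Empirical HSIC between two views' latent features.

    X, Y: (n_samples, d) feature matrices for the same n samples.
    Gaussian kernels are assumed here; a smaller HSIC value indicates
    weaker statistical dependence, i.e., more diverse views.
    """
    n = X.shape[0]

    def gram(Z):
        sq = np.sum(Z ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
        return np.exp(-d2 / (2.0 * sigma ** 2))

    K, L = gram(X), gram(Y)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy usage: a diversity regularizer would add +hsic(V1, V2) to the
# objective so that minimizing it decorrelates the two views.
V1, V2 = np.random.randn(50, 8), np.random.randn(50, 8)
print(hsic(V1, V2))
```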
Cerebral vessels in brain CT Angiography (CTA) images exhibit diverse morphologies and distributions with significant variations among patients. The standard U-Net often struggles to adapt to local vessel morphology, leading to the loss of small target information during down-sampling and neglecting the correlations among scattered objects. To address these challenges, this study enhances the U-Net architecture and introduces BVU-Net, a cerebral vessel segmentation network that utilizes multi-scale aggregation and high-resolution enhancement. BVU-Net incorporates a Multi-Scale Feature Aggregation (MSFA) module in its bottleneck layer, which captures local vessel features at various scales as well as global correlation features. This module integrates a Dilated Deformable Pyramid (DDP) path and a Global Attention (GA) path. In addition, a High-Resolution Feature Enhancement (HRFE) module is incorporated into the skip connection paths, allowing for the effective use of advanced features with richer semantic information. This enhancement improves the representation of high-resolution features and supplements the information on small vessels. The performance of BVU-Net is evaluated on the public dataset 3D-IRCADb and the private dataset GLCTA, achieving Dice scores of 0.7872 and 0.9248 and Mean Intersection over Union (MIoU) scores of 0.8322 and 0.9321, respectively. These results demonstrate that BVU-Net outperforms other improved U-Net segmentation models and exhibits notable generalization capabilities, providing valuable insights for future clinical treatment and prognosis analysis.
Network anomaly detection aims to promptly identify and respond to malicious activities and potential threats within networks. Most existing graph-embedding-based methods are designed for static graphs and neglect fine-grained temporal information, thus failing to capture the continuity of dynamic network behaviors and diminishing the effectiveness of network anomaly detection. To enhance the efficiency and accuracy of dynamic network anomaly detection, this study proposes a novel method integrating dynamic graph embedding and Transformer autoencoders. The method leverages temporal-walk-based graph embedding to capture the topological structure and detailed temporal information of the network. It incorporates a Transformer autoencoder with a contrastive loss to optimize node embeddings and effectively capture long-term dependencies and global information. This integration enhances the model's ability to perceive dynamic networks, facilitating better detection of time-evolving events and the identification of malicious behaviors. The effectiveness of the method is validated through extensive experiments on two publicly available network security datasets. On the LANL-2015 dataset, it achieves a True Positive Rate (TPR) of 94.3%, a False Positive Rate (FPR) of 5.7%, and an Area Under the Curve (AUC) of 98.3%. Further, on the OpTC dataset, the method achieves a TPR of 99.9%, an FPR of 0.01%, and an AUC of 99.9%. These results demonstrate that the proposed method effectively learns the topology and temporal dependencies of dynamic networks, thereby accurately identifying network anomalies.
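The temporal-walk idea underlying the embedding can be sketched minimally, assuming a simple timestamped edge list: a walk only traverses edges with non-decreasing timestamps, so the sampled sequences respect the causal ordering of network events. Function and variable names here are illustrative, not from the paper.

```python
import random
from collections import defaultdict

def temporal_walk(edges, start, length):
    """Sample one temporal walk of up to `length` hops.

    edges: list of (src, dst, t) tuples; each hop must use an edge
    whose timestamp is >= the timestamp of the previous hop, which
    preserves the temporal continuity of network behavior.
    """
    adj = defaultdict(list)                 # node -> [(neighbor, t), ...]
    for u, v, t in edges:
        adj[u].append((v, t))

    walk, node, t_now = [start], start, float("-inf")
    for _ in range(length):
        candidates = [(v, t) for v, t in adj[node] if t >= t_now]
        if not candidates:
            break
        node, t_now = random.choice(candidates)
        walk.append(node)
    return walk

edges = [("a", "b", 1), ("b", "c", 2), ("b", "d", 3), ("c", "a", 4)]
print(temporal_walk(edges, "a", 3))   # e.g. ['a', 'b', 'c', 'a']
```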
Unmanned Aerial Vehicle (UAV) Multi-Object Tracking (MOT) technology is widely used in various fields such as traffic operation, safety monitoring, and water area inspection. However, existing MOT algorithms are primarily designed for single-UAV scenarios. The perspective of a single UAV has inherent limitations, which can lead to tracking failures when objects are occluded, thereby causing ID switching. To address this issue, this paper proposes a Multi-UAV Multi-Object Tracking (MUMTTrack) algorithm. The MUMTTrack network adopts an MOT paradigm based on Tracking By Detection (TBD), utilizing multiple UAVs to track objects simultaneously and compensating for the perspective limitations of any single UAV. Additionally, to effectively integrate the tracking results from multiple UAVs, an ID assignment strategy and an image matching strategy based on the Speeded Up Robust Feature (SURF) algorithm are designed for MUMTTrack. Finally, the performance of MUMTTrack is compared with that of widely used single-UAV MOT algorithms on the MDMT dataset. According to the comparative analysis, MUMTTrack demonstrates significant advantages in terms of MOT performance metrics such as the Identity F1 (IDF1) value and Multi-Object Tracking Accuracy (MOTA).
Reinforcement Learning (RL) has become an important solution to sequential continuous decision-making problems, such as root cause localization of fault alarms; however, existing methods suffer from low sample efficiency and high exploration costs, which hinder their wide application. Studies have shown that introducing causal knowledge offers great potential for improving the decision interpretability and sampling efficiency of RL agents. However, most existing methods model causal relationships only implicitly and fail to directly utilize knowledge of the causal structure. Therefore, this study proposes a two-stage causal RL algorithm: the first stage explicitly models environmental variables using causal models learned from observational data, and the second stage constructs causal masks based on the learned causal structure to augment the policy, which helps narrow the decision space and reduce exploration risks. Considering the lack of public benchmark environments that allow direct causal reasoning, this study designs a root cause localization task in a simulated fault alarm environment and demonstrates the effectiveness and robustness of the proposed algorithm through comparative experiments in environments of different dimensions. The experimental results show that, relative to the mainstream Soft Actor-Critic (SAC) RL algorithm, the proposed algorithm improves the cumulative reward indicator by 13% in a low-dimensional environment and by 79% in a high-dimensional environment, requiring only a few explorations for the policy to converge. Sample efficiency increases by 27% and 52% in the low- and high-dimensional environments, respectively.
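A minimal sketch of the second-stage causal masking, under the assumption that the mask simply rules causally irrelevant actions out of the softmax policy; the names and values are illustrative, not the paper's implementation.

```python
import numpy as np

def mask_policy(logits, causal_mask):
    """Apply a causal mask to a policy's action logits.

    causal_mask: boolean vector; True marks actions that the learned
    causal structure links to the current fault state. Masked-out
    actions receive probability ~0, shrinking the exploration space.
    """
    masked = np.where(causal_mask, logits, -1e9)
    p = np.exp(masked - masked.max())       # numerically stable softmax
    return p / p.sum()

logits = np.array([0.2, 1.5, -0.3, 0.8])
causal_mask = np.array([True, False, True, True])   # action 1 ruled out
print(mask_policy(logits, causal_mask))
```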
To address the security issues arising from degraded image quality and to monitor the effectiveness of surveillance cameras in low-light campus environments, a low-light Salient Object Detection (SOD) method is proposed to enhance target detection capability under low-light conditions. Given the challenges of weakened salient features and the lack of large-scale annotated data in low-light images, a Source-Free Domain Adaptation (SFDA) method for low-light SOD is proposed to transfer model knowledge trained on normal-light images (source domain) to low-light images (target domain). The proposed method employs a two-stage strategy. In the first stage, pseudo-labels for low-light images are generated using the source domain model. To improve the quality of pseudo-label generation, an ensemble entropy minimization loss is proposed to suppress high-entropy regions. In addition, a selective voting method is introduced to enhance pseudo-label generation. In the second stage, a teacher-student network self-training method based on enhancement-guided consistency is employed to refine the saliency maps, further improving the accuracy of the detection results. Experimental results on the SOD-LL dataset show that the proposed method outperforms other image saliency detection methods in low-light scenarios. Compared to normal-light SOD methods, the Mean Absolute Error (MAE) is reduced by 15.15%, and the weighted F-measure (wFm) is increased by 4.73%.
In indoor environments, the same object may have completely different uses depending on the room category. Thus, designing target-driven navigation tasks with room category constraints has important applications in robot navigation, smart homes, and other fields. To improve the success rate of room-category-constrained target navigation tasks, a modular navigation algorithm is designed that combines search and motion control strategies with mapping and room classification modules. Given a navigation task as input, the mapping module combines RGB-D camera data and pose information to construct an online semantic map that remembers the environments already explored. The concept of a boundary point cluster is proposed to quickly locate the most likely coordinates of the target object on the map when implementing the search strategy based on the proximal policy optimization algorithm framework. The central coordinates of these clusters are used as relay points. According to the number of boundary points contained in each cluster, the exploration value of each central point is evaluated and ranked and is used to constrain the global target points. Furthermore, the concept of boundary points is introduced into the reward function of the search policy to improve search efficiency when the target points fall within the explored area. To address the robot's inability to recognize room categories, YOLOv8_cls is trained to build a room classification module on top of the motion control strategy, guiding the robot toward the global target point to assist in decision-making and thereby better fulfill navigation requirements. The feasibility of the navigation task and the effectiveness of the algorithm are verified in both simulated and real environments. Experimental results demonstrate that, compared to the Semantic Exploration (SemExp) algorithm, which employs Deep Reinforcement Learning (DRL) for its search strategy, the proposed algorithm achieves faster map exploration and increases the navigation success rates of the two types of navigation tasks, with and without room category constraints, by 2.0% and 4.0%, respectively. It demonstrates a better understanding of the semantic information in the environment, enabling the completion of navigation tasks such as target object search in unknown environments.
The attention mechanism has been widely employed in the field of Speech Emotion Recognition (SER). However, traditional attention modules, while enhancing model performance, also significantly increase the model parameter count. Although the Efficient Channel Attention (ECA) mechanism has a small number of parameters, it can only generate attention weights for the channel dimension. To address this limitation, an Improved ECA (IECA) module is proposed. The IECA module generates corresponding weights for the various dimensions of the input feature maps with a relatively small number of parameters, enabling the model to focus on and utilize crucial information within the feature maps more effectively. Additionally, to further improve recognition rates, spectrogram and IS10 features are extracted separately from the speech data, and a fusion network combines the predictions from the different branches to yield the final prediction. The proposed model achieves Weighted Accuracy (WA) of 91.63% and 92.46% and Unweighted Average Recall (UAR) of 91.25% and 92.33% on the EMODB and CASIA datasets, respectively, which are higher by 2.69-8.43 and 4.16-10.69 percentage points, respectively, than the results reported in previous research.
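For reference, the standard ECA block that IECA builds on can be written in a few lines of PyTorch; the kernel size and tensor shapes here are illustrative, and the paper's IECA additionally produces weights for the non-channel dimensions.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Standard ECA: channel attention from a 1D convolution over the
    globally pooled channel descriptor (very few parameters)."""
    def __init__(self, channels, k_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                    # x: (B, C, H, W)
        y = self.pool(x)                     # (B, C, 1, 1)
        y = y.squeeze(-1).transpose(1, 2)    # (B, 1, C)
        y = self.conv(y)                     # local cross-channel interaction
        y = self.sigmoid(y).transpose(1, 2).unsqueeze(-1)
        return x * y                         # reweights channels only

x = torch.randn(2, 64, 32, 32)
print(ECA(64)(x).shape)                      # torch.Size([2, 64, 32, 32])
```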
In Internet of Things (IoT) scenarios, data are susceptible to noise during collection and transmission, resulting in outliers and missing data. Existing temporal regularized matrix factorization models typically adopt the squared loss as the measure of reconstruction error, ignoring the fact that the quality of the matrix factorization itself is also a key factor affecting a model's prediction performance when dealing with multidimensional time series containing anomalous data. Therefore, this paper proposes a Time Aware Robust Non-negative Matrix Factorization multidimensional temporal prediction framework (TARNMF) based on the L2,log norm. TARNMF establishes the spatiotemporal correlation of multidimensional time series data through Nonnegative Matrix Factorization (NMF) and autoregressive temporal regularization terms with learnable parameters. Under the assumption that data containing outliers obey a Laplace distribution, the L2,log norm is used to estimate the error between the original data and the reconstructed matrices in the nonnegative robust matrix factorization, minimizing the interference of anomalous data with the prediction model. The L2,log norm is as robust as existing metric functions, approximates the L1 loss, and reduces the effect of outliers on the objective function by compressing their residuals. The paper also proposes a projected-gradient-descent-based method to optimize the model. Experiments on a high-dimensional solar dataset show that TARNMF is scalable and robust, and its relative mean absolute error is 8.64% lower than that of the second-best method. Meanwhile, results on noisy data verify that TARNMF can efficiently process and predict IoT time series data in the presence of anomalous data.
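A minimal sketch of the robust loss, assuming one common form of the L2,log (pseudo) norm, namely the column-wise sum of log(1 + L2 norm); the paper's exact definition may differ.

```python
import numpy as np

def l2_log_norm(E, eps=1e-12):
    """One common form of the L2,log (pseudo) norm of a residual
    matrix E: sum over columns of log(1 + ||e_i||_2).

    The logarithm compresses large column residuals, so grossly
    corrupted samples contribute far less to the objective than
    under a squared loss. (Illustrative assumption, not necessarily
    the paper's exact definition.)
    """
    col_norms = np.linalg.norm(E, axis=0)
    return np.sum(np.log(1.0 + col_norms + eps))

E = np.random.randn(10, 100)
E[:, 0] *= 50.0                          # one grossly corrupted sample
print(l2_log_norm(E), np.sum(E ** 2))    # log form grows mildly vs. squared loss
```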
This paper presents a residual behavior recognition model based on a Spatio-temporal Shuffle Attention (SAT) mechanism to improve the effectiveness of 3D-convolutional spatio-temporal feature extraction in deep learning models. The SAT mechanism is a lightweight multidimensional hybrid attention mechanism composed of a submodule that combines channel and temporal attention and a spatial attention submodule. The former adds a combined time dimension to capture temporal and channel information within channel attention. The spatial attention submodule compresses redundant temporal information, improves attention to spatial features, and performs channel shuffling and reorganization on the extracted features, improving the data representation ability of the model and reducing the parameter count. In this model, a ResNeXt residual network is used to extract spatio-temporal features, the spatio-temporal shuffle attention module is embedded into the residual module, and the attention module independently learns the weight parameters of different feature maps. The extracted features are weighted in the channel, time, and space domains to enhance the network's ability to represent human behavior, and Focal Loss, an improved cross-entropy function, is used as the loss function to address the uneven sample distribution in the datasets. Experimental results show that the model achieves recognition accuracies of 96.3% and 71.6% on the UCF101 and HMDB51 datasets, respectively, a significant improvement over other models.
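The Focal Loss mentioned above is standard and can be sketched directly; the alpha and gamma values below are the usual illustrative defaults, not necessarily the paper's settings.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Focal Loss (scalar-alpha variant) for multi-class classification:
    the (1 - pt)^gamma factor down-weights easy, high-confidence samples
    so training focuses on hard, under-represented classes."""
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()

logits = torch.randn(8, 101)              # e.g. UCF101 has 101 classes
targets = torch.randint(0, 101, (8,))
print(focal_loss(logits, targets))
```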
Few-Shot Relation Classification (FSRC) refers to the use of a small number of labeled instances to classify various relations within a task, and it can be quickly applied to categorize completely new classes. However, existing few-shot classification algorithms show limited generalization capabilities when distributional differences exist between the test and training domains, leading to significant performance degradation. Therefore, a knowledge-enhanced adaptive prototyping network is proposed for domain adaptation tasks; it improves the robustness of the model by exploring connections between instances while learning a priori knowledge about relations and intrinsic semantics to obtain interpretable prototypes. Specifically, the correlations between supporting and query instances are captured by introducing an interactive attention mechanism to highlight key instances and generate interactive instances. The adaptive prototype fusion mechanism generates adaptive hybrid coefficients using relational information as anchors and combines instances with relational information through feature fusion to generate hybrid prototypes. Experiments conducted on the FewRel 1.0 and FewRel 2.0 datasets demonstrate the effectiveness of the method. The experimental results show that the classification accuracy of the proposed network model is significantly higher than that of the baseline model, and the proposed model has better classification performance and stability.
The Density Peak Clustering (DPC) algorithm excels in diverse fields, is adept at identifying clusters of any shape, and is noise-resistant. However, the algorithm requires manual cluster center selection and underperforms on datasets with uneven densities. This paper introduces a novel Gaussian-distribution-based adaptive DPC algorithm to overcome these challenges. The approach multiplies the local density by the relative distance to obtain the decision value θi and maps θi into a two-dimensional Gaussian space using Z-score standardization. Uniquely, the algorithm adaptively selects cluster centers based on the standard deviation of the Gaussian distribution and assigns data points to their nearest centers for initial clustering. This paper also introduces a suture factor model to facilitate the merging of similar sub-clusters: when the suture coefficient exceeds the threshold, the most similar clusters in the preliminary partition are merged and the similarity matrix is updated, until the merging process completes and the final result is obtained. Experimental results on artificial and real datasets indicate that, compared with the DBSCAN, DPC, and ICKDC algorithms, the proposed algorithm achieves higher clustering accuracy and better clustering performance.
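A minimal sketch of the adaptive center selection, under one plausible reading of the mapping into a two-dimensional Gaussian space (Z-scoring density and distance separately) and with an illustrative threshold; the paper's exact selection rule may differ.

```python
import numpy as np

def select_centers(rho, delta, n_std=2.0):
    """Adaptive center selection sketch for Gaussian-based DPC:
    local density rho and relative distance delta are Z-score
    standardized, and points lying beyond n_std standard deviations
    in both coordinates are taken as cluster centers.
    (n_std is an illustrative threshold, not the paper's rule.)
    """
    z_rho = (rho - rho.mean()) / rho.std()
    z_delta = (delta - delta.mean()) / delta.std()
    return np.where((z_rho > n_std) & (z_delta > n_std))[0]

# Two points (indices 0 and 2) are both dense and far from denser
# points, so they emerge as centers.
rho = np.array([9.0, 1.2, 8.5, 1.0, 1.1, 0.9])
delta = np.array([5.0, 0.3, 4.8, 0.2, 0.4, 0.1])
print(select_centers(rho, delta, n_std=1.0))   # -> [0 2]
```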
Training deep neural networks without skip connection structures is challenging when the network depth is high. Thus, to address optimization issues and enhance generalization performance, skip connection structures have been integrated into most recent deep neural network models. However, the effect of skip connection structures on feature extraction in deep neural networks has not yet been clarified; in most cases, these models are treated as black boxes. Toward elucidating this effect, this study focuses on perturbation-based methods and introduces a method called Grid-Shuffled Blurring (GSB), which aims to reduce the fine-grained details within an image while maintaining its overall color distribution and contour characteristics. This study employs the Activation Maximization (AM) method for feature visualization and the GSB perturbation method to analyze classic deep neural network models with different levels of skip connection structures, such as VGG-19, ResNet-50, and DenseNet-201, in image classification tasks. Experimental results show that neural networks without skip connection structures extract fewer but stronger features from images, whereas those with skip connection structures extract more but weaker features. Moreover, skip connection structures cause the models to focus more on the local color distribution and global contours of images, rather than on detailed features. The more skip connection structures a model contains, the stronger this trend becomes.
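One plausible implementation consistent with the GSB description is to shuffle pixels within grid cells (which preserves each cell's color statistics and hence the coarse color layout) and then blur; the cell size, blur strength, and exact operator below are assumptions, not the paper's code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def grid_shuffled_blur(img, cell=8, sigma=2.0):
    """Illustrative GSB sketch: pixels are shuffled *within* each grid
    cell, destroying fine-grained texture while keeping each cell's
    color statistics, then a Gaussian blur smooths the result. Coarse
    color layout and global contours survive; details do not."""
    out = img.copy()
    h, w, c = img.shape
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            block = out[y:y + cell, x:x + cell].reshape(-1, c)
            np.random.shuffle(block)          # shuffle pixels inside the cell
            out[y:y + cell, x:x + cell] = block.reshape(
                out[y:y + cell, x:x + cell].shape)
    return gaussian_filter(out, sigma=(sigma, sigma, 0))  # blur spatially only

img = np.random.rand(64, 64, 3)
print(grid_shuffled_blur(img).shape)          # (64, 64, 3)
```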
Transformer-based object tracking methods are widely used in the field of computer vision and have achieved excellent results. However, object deformations, occlusions, illumination changes, and rapid motion can change object information during actual tracking tasks, and the underutilization of object template change information in existing methods prevents tracking performance from improving. To solve this problem, this paper presents TransTRDT, a Transformer object tracking method based on real-time dynamic template updating. A dynamic template updating branch is attached to reflect the latest appearance and motion state of an object. The branch determines whether the template should be updated through a template quality scoring head; when it identifies the possibility of an update, it passes the initial template, the dynamic template of the previous frame, and the cropped latest prediction into the dynamic template updating network to update the dynamic template. As a result, the object can be tracked more accurately with a more reliable template. The tracking performance of TransTRDT on GOT-10k, LaSOT, and TrackingNet is superior to that of algorithms such as SwinTrack and STARK. It achieves a tracking success rate of 71.9% on the OTB100 dataset at a tracking speed of 36.82 frames per second, reaching the current leading level in the industry.
In current speech emotion recognition systems, insufficient extraction of emotional features and inadequate modeling of complex emotional expressions result in decreased recognition accuracy. This paper proposes a speech emotion recognition method based on memory capsules and attention to improve recognition accuracy. First, five speech features are extracted: the Mel Frequency Cepstrum Coefficient (MFCC), Root Mean Square (RMS) energy, Mel-spectrogram, Zero-Crossing Rate (ZCR), and chromaticity distribution (CHROMA). Next, the first-, second-, and third-order differential dynamics of the MFCC are extracted on the basis of the MFCC features and stitched together. Finally, these features are stacked into one-dimensional vectors, and speech emotion classification is completed by the model constructed from the memory capsule and attention mechanism. The experimental results show that the proposed model exhibits enhanced generalization and robustness, effectively improving the accuracy of speech emotion recognition. The accuracies achieved on the RAVDESS, EMODB, and IEMOCAP datasets reach 95.87%, 98.82%, and 98.23%, respectively, an effective improvement over existing methods.
Developing exploits for vulnerabilities is the primary method of evaluating the exploitability of kernel vulnerabilities. Heap spraying objects are widely used in the exploitation process to execute malicious behaviors, such as malicious content injection and memory layout manipulation. Currently, the basic types of heap spraying objects have received limited attention, and code that can edit the content of heap spraying objects has not been generated automatically. Therefore, this paper proposes an automated technique for generating heap-spraying-object manipulation code for kernel vulnerability exploitation. The technique comprises heap spraying object recognition based on use-definition chain analysis and heap spraying object control code generation based on guided fuzz testing. Use-definition chain analysis statically identifies heap spraying objects within the target kernel and the key code positions that can manipulate these objects. Using the identified key code positions as target points, guided fuzz testing then dynamically generates control code for the target heap spraying objects to assist in vulnerability exploitation. Experimental results show that the technique can identify 28 heap spraying objects in Linux 5.15 and generate their control code, covering all heap spraying objects identified in existing works. Twenty-three of the generated code samples can control the heap spraying object to achieve the expected target, a success rate of 82.1%. A case analysis shows that the manipulation code generated by this technique can be used to exploit real-world kernel vulnerabilities.
Individuals and businesses are increasingly inclined to store encrypted data in the cloud because cloud servers offer powerful storage and computing capabilities. Ciphertext retrieval using homomorphic encryption has become a research hotspot for addressing the difficulty of retrieving ciphertext data. However, existing schemes mainly focus on single-keyword retrieval, which results in high communication and computation overheads owing to fewer retrieval constraints and lower search accuracy. In addition, because data are hosted on untrustworthy cloud hosts provided by third parties, malicious situations may occur, such as the deletion or modification of data or the return of untrue and incomplete search results. Therefore, a novel ciphertext retrieval scheme is proposed based on fully homomorphic encryption and an Oblivious Pseudo-Random Function (OPRF). By constructing an encrypted keyword index and a hash table, the scheme supports multi-keyword conjunctive retrieval. The identification and size of each file are used to generate authentication tags that enable the data receiver to verify the correctness and integrity of the retrieval results. Theoretical analysis and experimental results show that, compared to a single-keyword retrieval scheme based on fully homomorphic encryption, the efficiency of searching ciphertexts is improved by 36.2%-45.9% when retrieving 2-3 keywords, and the proposed scheme exhibits better overall performance when retrieving more keywords.
Existing Base Station (BS) location privacy protection schemes in Wireless Sensor Networks (WSNs) suffer from weak privacy protection and imbalanced network energy consumption. To address these problems, this paper proposes a Ring-based Base-station location privacy protection Routing Protocol (RBRP) to protect BS location privacy effectively. RBRP solves the energy imbalance problem by establishing a network topology model with a multi-ring structure and designing data routing based on the ring structure to extend the network lifecycle. The proposed protocol injects a Fake Base Station (FBS) into the network and generates traffic at locations far from the BS, which prevents attackers from determining the BS location through traffic analysis. Simulation experiments show that, compared with existing schemes, RBRP further improves the location privacy of the BS in WSNs and provides an advantage in terms of transmission delay when the source node and BS are in the same or adjacent rings. It can also effectively balance energy consumption and extend the network lifecycle.
This paper discusses the significance of crime news topic analysis and identifies the limitations of existing methods. The paper presents a novel topic analysis model, the Bidirectional Encoder Representations from Transformers-based Embedded Crime Topic Model (BERT-ECTM), to address the identified issues. The model leverages crime charges from legal documents as supervision signals and combines them with crime news text as input to make the extracted topic information more accurate and more crime-oriented. The model adopts a BERT-based embedded topic analysis approach to capture contextual semantic features effectively. This paper also introduces a variational inference method that approximates the posterior distribution for improved distribution estimates, addressing the challenge of complex marginal distribution estimation during model training. The proposed model is significantly more effective and accurate than traditional methods in analyzing specific crime news topics.
This paper introduces RHotStuff, a reputation-based consensus algorithm for the Internet of Vehicles (IoV), to address the high communication overhead and arbitrary master node selection of traditional consensus algorithms for the IoV. The algorithm treats vehicles and Road Side Units (RSUs) in the IoV as nodes, forming a consensus network. Indicators such as voting activity, historical influence, and a reputation punishment factor are introduced to implement the reputation mechanism, which evaluates the reputation scores of nodes and measures their credibility. Based on their reputation scores, the nodes are divided into master, slave, and candidate nodes. Before consensus begins, only a subset of nodes with higher reputation scores is selected as master and slave nodes to participate in the consensus, which reduces communication overhead and improves consensus performance. The master node is selected from among the nodes with the highest reputation scores, reducing the predictability of the master node. After consensus is reached, the reputation scores are recalculated, and the nodes participating in the next round of consensus are selected accordingly. Additionally, the master node sends the consensus result to all other nodes during the Reply phase to synchronize the reputation scores and blocks. Experimental results demonstrate that RHotStuff has O(N) communication complexity, and its consensus success rate is approximately 30% higher than that of C-HotStuff, which helps improve consensus performance. With 93 nodes, the consensus throughput of RHotStuff is 11.68% higher than that of R-PBFT, whereas its consensus delay is reduced by 11.74%. Overall, RHotStuff optimizes the master node selection method and achieves low communication overhead and consensus delay together with a high consensus success rate and throughput, which is of great significance for improving the communication efficiency of the IoV and the development of intelligent transportation.
Removing reflections from a single image is an important task in computer vision. However, existing image reflection removal models are built on the premise that reflection-polluted areas are of the fuzzy type, meaning the reflection areas retain the original image content information. When spot reflections contaminate an image, the original content information is completely lost, causing existing models to fail at extracting the transmission-layer information from the spot region. To address this problem, this study proposes a new model that can simultaneously remove spot and fuzzy reflections. Using a self-defined reflection classifier and structure restorer, the model predicts the gradient map of the image transmission layer and uses it as an auxiliary condition to generate an ultimately pure transmission-layer image. Experiments show that the model generalizes well across different categories of reflected images. Experiments on art images, specifically Tangka, demonstrate that the model outperforms the state-of-the-art removal model in terms of Structure Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR), which increase by 1.6% and 5.5%, respectively. Experiments on public natural scene datasets also indicate that the model is comparable to state-of-the-art models.
Trams, owing to their operation in a shared right-of-way and reliance on driver visual observation, are more prone to collisions with intruding obstacles than urban rail transit systems, such as subways and maglev trains. Therefore, to ensure the operational safety of trams, a method for calculating the space-time distance of obstacles based on instance segmentation and monocular vision is proposed. First, the contour points of the obstacles and track area are extracted using an instance segmentation model. Subsequently, a monocular vision ranging model is established based on the principles of monocular vision. By incorporating a standard gauge length of 1.435 m as prior knowledge, the longitudinal distance between the obstacles and tram is calculated without camera calibration. Finally, the lateral distance between the obstacle and track area is calculated based on the point on the obstacle closest to the track area and the corresponding track endpoint. This method fills a research gap in the field of rail transit by calculating the space-time distance of obstacles using the standard gauge length of trams as prior knowledge. Additionally, by introducing an instance segmentation model, the key points for obstacle distance measurement are determined with pixel-level accuracy, enabling the precise calculation of the space-time distance of obstacles. The feasibility of the proposed method is verified using experimental data captured in real-world scenarios. The results show that the maximum positive and negative errors of longitudinal distance calculation are 1.60 m and 1.05 m, respectively, indicating a high level of accuracy in the distance calculation results.
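The core ranging step can be illustrated with a similar-triangles sketch under a pinhole-camera assumption: because the 1.435 m gauge has a known physical width, the apparent pixel width of the track at a given image row is inversely proportional to its distance, so no explicit focal length or calibration is required. The reference-measurement formulation below is an illustrative simplification of the paper's model.

```python
def longitudinal_distance(w_pixels, w_ref_pixels, z_ref_m):
    """Similar-triangles ranging sketch under a pinhole model.

    The standard gauge (1.435 m) spans w_ref_pixels at a known
    reference distance z_ref_m; since apparent width scales as 1/Z,
    the distance at another image row follows directly from the
    ratio of pixel widths. (Illustrative simplification; the paper's
    full model is richer.)
    """
    return z_ref_m * w_ref_pixels / w_pixels

# If the gauge appears 200 px wide at 10 m, a row where the track
# spans 50 px is ~40 m ahead.
print(longitudinal_distance(50, 200, 10.0))   # 40.0
```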
Factors such as irregular shapes, different sizes, and complex backgrounds significantly increase the difficulty of detecting steel surface defects. To overcome the drawbacks of existing methods, such as low detection accuracy, low detection speed, and difficulty in detecting small target defects, an improved steel defect detection method, RFB-YOLOv5-E, based on the enhanced fusion of Receptive Field Block (RFB) and YOLOv5 features, is proposed to improve the recognition rate of steel surface defects. First, the C3 module in YOLOv5 is upgraded to the C3s module, which obtains more gradient information by adding more gradient flow branches and thus improves the accuracy of the model. Second, the shallow feature extraction network is improved and feature enhancement functions are added, which increase the difference between background and defects. A downsampling layer and a detection head are added to expand the depth and receptive field of the network, improving its feature extraction and detection capabilities. In addition, the RFB module is improved and replaces the Spatial Pyramid Pooling-Fast (SPPF) module in the YOLOv5 backbone network. By simulating human vision, the receptive field is further enlarged, and the feature extraction capability of the network is further enhanced. The experimental results show that the mean Average Precision (mAP) of the RFB-YOLOv5-E algorithm on the NEU-DET dataset reaches 79.2%, which is 8.5% higher than that of the original YOLOv5 algorithm. Further, the detection speed is 122 frames per second, indicating that a better balance between detection speed and detection accuracy is achieved.
The effective detection of road surface cracks is key to maintaining road safety and prolonging road life. To address the difficulty of identifying small cracks, segmentation fractures, and the low segmentation accuracy of traditional road surface crack detection methods, an improved DeepLabv3+ road surface crack detection method is proposed that simultaneously reduces the number of model parameters and improves the accuracy of crack detection. First, the backbone of the DeepLabv3+ model is replaced with an optimized MobileNetv2 network to reduce the number of parameters and the complexity of the model, which speeds up operation. Second, a Strip Pooling Module (SPM) is integrated into the Atrous Spatial Pyramid Pooling (ASPP) module to enable the network to capture more crack context information and preserve the characteristics of small parts of cracks. Finally, a Convolutional Block Attention Module (CBAM) is introduced to make the network focus on the pixel regions that play a decisive role in crack detection, enhancing the feature expression ability for crack images. According to the experimental results, the improved DeepLabv3+ model achieves a Mean Pixel Accuracy (MPA) of 87.85%, a Mean Intersection over Union (MIoU) of 80.53%, an accuracy of 97.51%, a precision of 88.65%, and an F1-Score of 88.24%; compared with the basic DeepLabv3+ model, the improvements are 1.77%, 2.03%, 0.30%, 2.25%, and 1.51%, respectively. These indices of the proposed model are higher than those of the U-Net, HR-Net, and PSP-Net models. In addition, the number of parameters of the improved model is 6.382×10⁶, which is 88.3% of that of the base model, and its real-time performance is better, making it more suitable for road surface crack detection.
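The SPM idea can be sketched compactly; the block below follows the general strip pooling design (after Hou et al., CVPR 2020) with illustrative channel counts, and is not the paper's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripPooling(nn.Module):
    """Strip pooling sketch: long, thin pooling windows capture
    elongated structures such as cracks that square pooling
    kernels tend to fragment."""
    def __init__(self, c):
        super().__init__()
        self.conv_h = nn.Conv2d(c, c, (3, 1), padding=(1, 0), bias=False)
        self.conv_w = nn.Conv2d(c, c, (1, 3), padding=(0, 1), bias=False)
        self.fuse = nn.Conv2d(c, c, 1, bias=False)

    def forward(self, x):                                # x: (B, C, H, W)
        h, w = x.shape[2:]
        xh = self.conv_h(F.adaptive_avg_pool2d(x, (h, 1)))  # vertical strip
        xw = self.conv_w(F.adaptive_avg_pool2d(x, (1, w)))  # horizontal strip
        y = self.fuse(F.relu(xh.expand_as(x) + xw.expand_as(x)))
        return x * torch.sigmoid(y)                      # reweight with strip context

x = torch.randn(2, 32, 64, 64)
print(StripPooling(32)(x).shape)                         # torch.Size([2, 32, 64, 64])
```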
The accuracy of stereo matching directly determines the precision of subsequent 3D scene information recovery, and enhancing the accuracy of disparity maps has attracted considerable attention among researchers. Traditional stereo matching methods inadequately represent the local structure of images, particularly within regions of similar structure, at the junctions of the foreground and background, and in areas containing erroneous cost points. To address these issues, this paper proposes a stereo matching method based on quad-gradient multi-feature costs and quad-weight filtering. This method constructs a multi-feature space composed of image intensity and gradients in four directions. It employs quadratic encoding to calculate the multi-feature census transform cost of images and then combines it with the multi-feature Absolute Difference (AD) cost to enhance the accuracy of the local structural representation. A filter kernel constructed from four weights, namely spatial proximity, pixel intensity similarity, regional similarity, and cost similarity, is used for cost aggregation to mitigate the aggregation weight of abnormal costs. The initial disparity is calculated using the Winner-Take-All (WTA) algorithm and preliminarily corrected through a left-right consistency check, followed by disparity optimization using an adaptive window and a disparity threshold. Results of experiments on the Middlebury V3 stereo platform indicate that the algorithm significantly outperforms existing traditional stereo matching algorithms, yielding a weighted average bad4.0 value (percentage of "bad" pixels with an error greater than 4.0 pixels) of 14.7% in non-occluded regions and 20.6% in all regions.
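A minimal numpy sketch of the census-transform cost used as one ingredient above; the window radius and the toy disparity are illustrative, and the paper's multi-feature, quadratically encoded variant is richer.

```python
import numpy as np

def census(img, r=1):
    """Census transform: each pixel becomes a bit string encoding
    whether each neighbor in a (2r+1)^2 window is darker than the
    center; the matching cost is then the Hamming distance between
    bit strings, which is robust to radiometric differences."""
    bits = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            bits.append((shifted < img).astype(np.uint8))
    return np.stack(bits, axis=-1)            # (H, W, 8) for r=1

def hamming_cost(c1, c2):
    return np.sum(c1 != c2, axis=-1)          # per-pixel matching cost

left = np.random.randint(0, 255, (10, 12))
right = np.roll(left, 2, axis=1)              # toy image pair, 2-px disparity
# At the correct disparity the census codes realign and the cost is 0.
print(hamming_cost(census(left), census(np.roll(right, -2, axis=1))).sum())
```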
In recent years, significant progress has been made in skeleton-based human behavior recognition using Graph Convolutional Networks (GCNs). However, most existing GCNs concatenate temporal and spatial convolutions in a straightforward manner, which leads to suboptimal spatiotemporal feature fusion. In addition, existing models face challenges in efficiently extracting temporal features. To address these issues, this paper proposes an Extended Temporal and spatiotemporal Feature Fusion Graph Convolutional Network (ETFF-GCN). This network employs channel aggregation to fuse dynamic spatial topology and temporal features in a two-stage fusion process, followed by attention mechanisms for further enhancement. In addition, multiple convolutional kernels of varying sizes are utilized to construct temporal graph convolutions that capture multiscale and multigranular temporal characteristics. Furthermore, an efficient squeeze-and-excitation module is used for feature enhancement, leading to improved feature representation capabilities. Experiments on three large datasets demonstrate that the proposed approach outperforms existing methods.
Cloth-Changing person Re-Identification (CC-ReID) aims to identify target pedestrians wearing different outfits. Existing methods incorporate additional information (such as contours, gait, and 3D information) to help the model learn clothing-agnostic pedestrian features. However, owing to factors such as lighting and pose variations, the extracted biometric features may contain errors. To enhance accuracy, this paper explores the application of Contrastive Language-Image Pre-training (CLIP) and proposes CLIP-driven Fine-grained Feature Enhancement (CFFE) for CC-ReID. The method first models the potential intrinsic relationship between the class text features and image features extracted by CLIP. It then applies a salient feature retention module and a salient feature guidance module. The salient feature retention module utilizes attention masks to locate clothing-related foreground regions and erases these features to ensure that the network focuses on effective non-clothing features. The salient feature guidance module then attends to the important local and global features of pedestrians through attention mechanisms. The CFFE method achieves detection accuracies of 42.1%, 71.1%, and 89.9% on the LTCC, PRCC, and VC-Clothes datasets, respectively. Compared with algorithms such as AIM and CAL, CFFE extracts more robust features, showing significant improvements across multiple metrics.
Pedestrian detection in intelligent community scenarios must accurately recognize pedestrians in various situations. However, for persons who are occluded or at long distances, existing detectors exhibit problems such as missed detections, detection errors, and large model sizes. To address these problems, this paper proposes a pedestrian detection algorithm, Multiscale Efficient-YOLO (ME-YOLO), based on YOLOv8. An efficient feature extraction module (EM) is designed to improve network learning and capture pedestrian features, which reduces the number of network parameters and improves detection accuracy. A reconstructed detection head module reintegrates the detection layers to enhance the network's ability to recognize small targets and effectively detect small target pedestrians. A Bidirectional Feature Pyramid Network (BiFPN) is introduced to design a new neck network, the Bidirectional Dilated Residual-Feature Pyramid Network (BDR-FPN); its dilated residual module and weighted attention mechanism expand the receptive field and emphasize pedestrian features during learning, alleviating the network's insensitivity to occluded pedestrians. After training and validation on the CityPersons dataset, ME-YOLO increases AP50 by 5.6 percentage points, reduces the number of model parameters by 41%, and compresses the model size by 40% compared with the original YOLOv8 algorithm. ME-YOLO also increases AP50 by 4.1 percentage points and AP50:95 by 1.7 percentage points on the TinyPerson dataset. The algorithm significantly reduces the number of model parameters and the model size while effectively improving detection accuracy, giving it considerable application value in intelligent community scenarios.
In flexible job shops where manufacturing cells are no longer the only processing option and processing times are uncertain, multiple Automated Guided Vehicles (AGVs) play an important role. However, charging becomes a crucial factor when an AGV consumes excessive power or takes too long to complete its tasks. This study aims to solve the Flexible Job Shop Scheduling Problem (FJSP) involving multiple AGVs under battery constraints. It comprehensively considers constraints such as manufacturing cell processing times, AGV transportation times, and AGV charging status, with the goal of minimizing the maximum completion time. A mathematical model is established for this problem, and a hybrid algorithm combining a Memetic Algorithm (MA) with an adaptive variable neighborhood search is proposed. The algorithm uses the memetic (cultural genetic) algorithm as its framework and introduces a critical path method based on a disjunctive graph to address the high idle rates of manufacturing cells and AGVs. Additionally, to improve the algorithm's search capability and avoid becoming trapped in local optima, an adaptive variable neighborhood search is used to enhance the best solution of the current iteration, and multiple break-and-recombine neighborhood structures are designed to find the optimal value. The simulation results show that the algorithm can find the optimal solution and has better overall performance than other algorithms, verifying its effectiveness.
Deep-learning-based fish fry detection in aquaculture offers potential for automated and precise management. To address the challenges of low device performance and high real-time requirements in fish fry detection, this paper proposes an improved lightweight fish fry detection algorithm called FD-YOLO. The algorithm replaces the original CSPDarkNet feature extraction network in YOLOv8 with a FasterNet variant and introduces Partial Convolutions (PConv) to reduce redundant computation and memory access. In the feature fusion stage, Depthwise Separable Convolutions (DWConv) are adopted, decomposing the standard convolution into two relatively simple operations, a depthwise convolution followed by a pointwise convolution, thereby further reducing model complexity and computational resource demands. The model employs the Focal-EIoU loss function, enhancing detection accuracy and robustness. The experimental results demonstrate that the improved detection model reduces the number of parameters and the computational load by 91% and 85%, respectively. Moreover, the inference speed on the CPU is tripled compared with the baseline. The optimized fish fry detection algorithm effectively balances high precision with real-time performance, making it suitable for deployment on hardware platforms with limited resources and demonstrating superior adaptability and practicality for real-world aquaculture applications.
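A depthwise separable convolution is easy to sketch in PyTorch, and the parameter arithmetic makes the savings concrete; the channel counts here are illustrative.

```python
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise)
    spatial conv followed by a 1x1 (pointwise) conv. Parameters drop
    from k*k*Cin*Cout to k*k*Cin + Cin*Cout."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2,
                                   groups=c_in, bias=False)
        self.pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

m = DepthwiseSeparable(64, 128)
print(sum(p.numel() for p in m.parameters()))   # 64*9 + 64*128 = 8768
# vs. a standard 3x3 conv: 64*128*9 = 73728 parameters
```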
Traditional forecasting models cannot accurately determine the spatial correlation and temporal dependence of time-series load data, resulting in low forecasting accuracy. To address this issue and the non-stationarity of power load data, a time-series power load forecasting method based on Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and a spectral temporal graph convolutional network is designed and implemented in this study. First, the target load sequence is decomposed into multiple Intrinsic Mode Function (IMF) components using CEEMDAN, and the IMFs are reconstructed by calculating their fuzzy entropy. Subsequently, the temporal correlation and spatial dependence of the reconstructed components are mined using the spectral temporal graph convolutional network, and prediction results are obtained for each component. Finally, these prediction results are summed linearly to obtain the final prediction. The experimental results show that the proposed method achieves a Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE) of 0.72 kW, 0.89 kW, and 0.92%, respectively. The prediction accuracy of the proposed method is 37.9%, 17.2%, 20.8%, 22.5%, and 12.1% greater than that of StemGNN, TCN, LSTM, Informer, and FEDformer, respectively. The results demonstrate that the proposed method effectively reduces the influence of non-stationarity on the prediction results and accurately captures the spatial correlation and temporal dependence of the time-series load data, leading to improved prediction accuracy.
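The decomposition-and-grouping step can be sketched as follows, assuming the PyEMD package (installed as EMD-signal) for CEEMDAN and a compact fuzzy entropy implementation; the embedding dimension, tolerance, and grouping rule are illustrative, not the paper's exact settings.

```python
import numpy as np
from PyEMD import CEEMDAN          # pip install EMD-signal

def fuzzy_entropy(x, m=2, r=0.2):
    """Compact fuzzy entropy: exponential similarity of embedded
    subsequences; lower values indicate a more regular component."""
    x = (x - x.mean()) / (x.std() + 1e-12)

    def phi(dim):
        emb = np.array([x[i:i + dim] for i in range(len(x) - dim)])
        emb -= emb.mean(axis=1, keepdims=True)        # remove local baseline
        d = np.abs(emb[:, None, :] - emb[None, :, :]).max(axis=2)
        sim = np.exp(-(d ** 2) / r)
        np.fill_diagonal(sim, 0.0)
        return sim.sum() / (len(emb) * (len(emb) - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))

signal = np.sin(np.linspace(0, 20 * np.pi, 600)) + 0.3 * np.random.randn(600)
imfs = CEEMDAN()(signal)            # rows are IMF components
entropies = [fuzzy_entropy(imf) for imf in imfs]
# IMFs with similar fuzzy entropy would then be summed into
# reconstructed components before per-component forecasting.
print(len(imfs), np.round(entropies, 3))
```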
Traffic prediction faces three primary challenges: traditional spatiotemporal modeling methods struggle to capture long-range dependencies effectively, fixed time-window mechanisms cannot adapt to dynamic temporal patterns, and conventional statistics-based models are limited in modeling complex topological relationships. To address these issues, this study proposes a Temporal-enhanced Efficient Graph Attention network (T-EGAT). First, an Efficient Multi-head Self-Attention (EMSA) mechanism is designed, employing parameter sharing and sparse computation strategies to reduce the computational complexity of the attention heads from O(N²) to O(N log N). Second, a linear temporal extension module is developed, extending the temporal perception range from a fixed K steps to an elastic window of K+Δ through learnable temporal convolution kernels, where Δ serves as an adaptive adjustment parameter. Finally, a dynamic graph inference architecture is constructed that utilizes the neighborhood aggregation characteristics of Graph Neural Networks (GNNs) to automatically generate topological relationship matrices covering 83 traffic elements at each time step. Experiments on five benchmark datasets, including PeMSD4 and METR-LA, demonstrate that T-EGAT significantly outperforms 16 baseline models (including the Diffusion Convolutional Recurrent Neural Network (DCRNN), GraphWaveNet, and the Attention Based Spatial-Temporal Graph Convolutional Network (ASTGCN)), achieving a 2.77%-5.97% reduction in Mean Absolute Error (MAE), a 3.12%-6.44% improvement in Root Mean Square Error (RMSE), and a 1.41%-2.3% decrease in single-step prediction time. Ablation studies quantify the module contributions: EMSA accounts for a 42% accuracy improvement, the temporal extension module reduces long-term prediction errors by 17%, and the dynamic graph generation mechanism enhances topological modeling accuracy by 29%. The model demonstrates enhanced robustness in sudden traffic accident scenarios, achieving an anomaly detection F1 value of 0.873, a 21.5% improvement over conventional methods. These findings provide a new technical framework for real-time traffic management systems, whose elastic temporal modeling mechanism and efficient attention architecture offer a general solution for spatiotemporal prediction tasks.
Most existing selective crowdsourcing models assume that goods are collected from a centralized distribution center or transfer station and then delivered to their final destinations. However, this approach fails to meet the actual demands of industrial Internet platforms, which require collecting goods from distributed manufacturing enterprises and delivering them to industrial users. To address the multi-vehicle, multi-start-point pickup-and-delivery path planning problem in selective crowdsourcing for logistics services, we propose an integer linear programming model. This model incorporates constraints on the start and end points of social and dedicated vehicles, the correspondence between pickup and delivery points, and other relevant parameters. The primary objective is to minimize the total cost, which comprises the logistics service fees of social vehicles and the delivery costs of dedicated vehicles. An Improved Memetic Algorithm (IMA) is designed that includes probability-based Mixed Positive and Negative Crossover (MPNC) operators, hybrid strategies combining inter-Vehicle Neighborhood Search (VNS) and inter-Path Neighborhood Search (PNS), and corresponding two-stage path repair methods. Experimental results indicate that the newly developed MPNC operators achieve higher population diversity in less time than traditional partial crossover operators, whereas the VNS-PNS hybrid strategy generates more feasible solutions than a single neighborhood search. Results on artificial examples of different scales show that the IMA outperforms traditional algorithms such as the Genetic Algorithm (GA), Simulated Annealing (SA), and improved Particle Swarm Optimization (PSO) in terms of optimization performance and local problem-solving ability. The selective crowdsourcing model adopted by the IMA reduces actual logistics service costs compared to using either purely social or purely dedicated vehicles.
This paper proposes a path-planning method based on the fusion of hybrid A* and modified Reeds-Shepp (RS) curves to address the issue of unmanned transfer vehicles in confined scenarios being unable to maintain a safe distance from surrounding obstacles during path planning, resulting in collisions between vehicles and obstacles. First, a distance cost function based on the KD-tree algorithm is proposed and added to the cost function of the hybrid A* algorithm. Second, the expansion strategy of the hybrid A* algorithm is changed by dynamically adjusting the node expansion distance according to the vehicle's surrounding environment, achieving dynamic node expansion and improving the algorithm's node search efficiency. Finally, the RS curve generation mechanism of the hybrid A* algorithm is improved so that the straight part of the generated RS curve is parallel to the boundary of the surrounding obstacles, meeting the requirements of road driving in the plant area. Subsequently, the local path is smoothed to ensure curvature continuity under vehicle kinematics constraints and improve the quality of the generated path. The experimental results show that, compared with traditional algorithms, the proposed algorithm reduces the search time by 38.06%, reduces the maximum curvature by 25.2%, and increases the closest distance from the path to the obstacles by 51.3%. Thus, the proposed method effectively improves the path generation quality of the hybrid A* algorithm and can operate well in confined scenarios.
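A minimal sketch of the KD-tree distance cost term, using scipy's cKDTree for nearest-obstacle queries; the safety distance, weight, and linear penalty shape are illustrative assumptions, and in practice the tree would be built once per obstacle map rather than per query.

```python
import numpy as np
from scipy.spatial import cKDTree

def obstacle_distance_cost(node_xy, obstacles, d_safe=2.0, w=10.0):
    """Distance cost term sketch for the hybrid A* cost function:
    a KD-tree finds the nearest obstacle, and the cost rises linearly
    as the node approaches closer than the safety distance d_safe.
    (d_safe and w are illustrative parameters.)"""
    tree = cKDTree(obstacles)                 # build once per map in practice
    d, _ = tree.query(node_xy)                # nearest-obstacle distance
    return w * max(0.0, d_safe - d) / d_safe

obstacles = np.array([[5.0, 5.0], [8.0, 2.0], [1.0, 7.0]])
print(obstacle_distance_cost(np.array([4.0, 4.5]), obstacles))    # inside d_safe
print(obstacle_distance_cost(np.array([20.0, 20.0]), obstacles))  # 0.0, far away
```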