Vehicle Recognition Based on Spectral-Temporal Resolution Optimization and Residual Spatial Pyramid Network

doi:10.19678/j.issn.1000-3428.0068662

Abstract

Vehicle classification is a key technology in intelligent transportation systems and an important research area in road traffic monitoring. Because acoustic sensors are efficient, inexpensive, able to operate day and night, and easy to conceal, vehicle classification based on vehicle sound features has attracted wide attention from researchers. However, the vehicle sound signals used in existing studies contain only a single vehicle, and the classification of mixed two-vehicle sound signals has received little discussion. To address this gap, a network model is designed to classify noise signals from single vehicles and vehicle pairs, 12 categories in total. Because a fixed resolution for sound spectral features is not optimal, a spectrogram time-resolution optimization model is designed that controls the temporal size of the noise spectrogram using the attention scores and the time transformation (frame-warping) matrix obtained from network training. The classification network follows the Convolutional Recurrent Neural Network (CRNN) architecture. Its convolutional part, the multiscale signal analysis module, draws on the Efficient Spatial Pyramid (ESP) module and fuses features from two branches. Because Recurrent Neural Networks (RNNs) and similar recurrent networks parallelize poorly and run slowly, the causal Temporal Convolutional Network (TCN) is converted into a non-causal cyclic TCN. In experiments on a self-made dataset, the model reaches a mean Average Precision (mAP) of 0.98, far higher than that of a CRNN with a comparable parameter count and on par with MobileNetV3, while using 1.7×10⁶ fewer parameters than MobileNetV3. The analysis indicates that the proposed model is suited to processing long-duration sound signals and can extract deep features.

Key words: vehicle recognition, sound signal reconstruction, Convolutional Recurrent Neural Network (CRNN), Efficient Spatial Pyramid (ESP) module, Temporal Convolutional Network (TCN), time resolution optimization
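The abstract does not spell out how the causal TCN is made non-causal and cyclic. The following is a minimal sketch, assuming that "cyclic" refers to circular (wrap-around) padding and that the non-causal kernel is centred on the current frame. The function name `dilated_conv1d` and its parameters are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                   dilation: int = 1, mode: str = "causal") -> List[float]:
    """Length-preserving dilated 1-D convolution (cross-correlation form).

    mode="causal":   output[t] depends only on x[t], x[t-d], x[t-2d], ...
                     (positions before the start count as zero).
    mode="circular": taps are centred on t and wrap around both ends, so
                     output[t] also sees future frames (assumed reading of
                     "non-causal cyclic"; requires an odd kernel length).
    """
    k, n = len(kernel), len(x)
    if mode not in ("causal", "circular"):
        raise ValueError("mode must be 'causal' or 'circular'")
    if mode == "circular" and k % 2 == 0:
        raise ValueError("circular mode assumes an odd kernel length")

    out = []
    for t in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            if mode == "causal":
                # The last tap sits on t; earlier taps look into the past.
                idx = t - (k - 1 - j) * dilation
                if idx >= 0:
                    acc += w * x[idx]
            else:
                # The middle tap sits on t; indices wrap modulo n.
                idx = (t + (j - k // 2) * dilation) % n
                acc += w * x[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    frames = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(dilated_conv1d(frames, [1.0, 1.0, 1.0], mode="causal"))
    print(dilated_conv1d(frames, [1.0, 1.0, 1.0], mode="circular"))
```

In the causal mode the first output sees only the first frame, whereas in the circular mode every output sees the same number of real frames. This is one plausible motivation for such a conversion when the whole recording is available before classification.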

LIU Weina, ZHAO Hongdong, SHI Jianfeng, ZHANG Xuezhi, ZHAO Yiming. Vehicle Recognition Based on Spectral-Temporal Resolution Optimization and Residual Spatial Pyramid Network[J]. Computer Engineering, 2024, 50(12): 376-385.

https://www.ecice06.com/CN/Y2024/V50/I12/376

Figures/Tables (12)

Fig.1 Vehicle classification network structure

Fig.2 Structure of the signal multiscale analysis module

Fig.3 One-dimensional dilated convolutional layer in the middle of the module

Fig.4 Non-causal convolution

Fig.5 Initial log-Mel spectrogram of a vehicle noise signal

Fig.6 Average precision curves

Fig.7 Confusion matrices of the models in the comparative experiment

Fig.8 P-R curves of the models in the comparative experiment

