
Computer Engineering ›› 2023, Vol. 49 ›› Issue (9): 295-302, 312. doi: 10.19678/j.issn.1000-3428.0065690

• Development Research and Engineering Application •

Lung Sound Signal Recognition Based on Dual-Source Domain Transfer Learning

Shanshu BAO1, Bo CHE2, Linhong DENG2

  1. School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, Jiangsu, China
  2. Changzhou Key Laboratory of Respiratory Medical Engineering, Institute of Biomedical Engineering and Health Sciences, Changzhou University, Changzhou 213164, Jiangsu, China
  • Received: 2022-09-05  Online: 2023-09-15  Published: 2022-12-13
  • About the authors:

    BAO Shanshu (b. 1997), male, M.S. candidate; research interests include machine learning and deep learning

    CHE Bo, Ph.D. candidate

    DENG Linhong, professor, Ph.D.

  • Funding:
    National Natural Science Foundation of China (12272063, 11532003)


Abstract:

To address model overfitting and low classification accuracy caused by the small size of available lung sound datasets, a lung sound recognition method based on dual-source domain transfer learning is proposed. On one hand, the VGGish network pre-trained on the Audio Set dataset is transferred to lung sound recognition and fused with the Efficient Channel Attention module (ECA-Net) to strengthen its recognition ability. Logarithmic Mel (Log-Mel) spectrogram features are extracted from the lung sounds, and the VGGish network learns the spectrogram information in temporal order. The feature vectors output by VGGish are then enhanced by one-dimensional convolution kernels of different sizes and dilation rates, and the enhanced feature maps are fed into a bidirectional gated recurrent unit to capture the temporal information of the lung sounds. On the other hand, the VGG19 model pre-trained on the ImageNet dataset is transferred to lung sound recognition, with the lung sound waveforms converted into spectrograms for input and training. After training, the two models serve as feature extractors; their high-level semantic feature vectors are fused and fed into the ensemble learning algorithm CatBoost for the final classification. Experimental results show that the method achieves a specificity of 80.66%, a sensitivity of 77.69%, and an accuracy of 79.18% on the Coswara COVID-19 dataset, and a specificity of 88.75%, a sensitivity of 72.04%, and an ICBHI score of 80.39% on the ICBHI-2017 dataset, outperforming the common recognition methods used for comparison.
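The Log-Mel front end described above can be sketched as a minimal NumPy implementation. The sampling rate, frame length, hop size, and 64 Mel bands below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=4000, n_fft=256, hop=128, n_mels=64):
    """Compute a log-Mel spectrogram (dB) from a 1-D audio signal."""
    # Frame the signal and apply a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular Mel filterbank, equally spaced on the Mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    mel_power = power @ fb.T
    # Log compression with a floor to avoid log(0)
    return 10.0 * np.log10(np.maximum(mel_power, 1e-10))
```

In practice a library such as librosa performs this step; the resulting (frames × Mel bands) matrix is what the VGGish branch consumes in temporal order.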

Key words: lung sound recognition, transfer learning, channel attention, Logarithmic Mel spectrum (Log-Mel), ensemble learning

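The ECA-Net channel attention fused into the VGGish branch works by global average pooling each channel, applying a small 1-D convolution across the pooled channel descriptors, and gating the channels with a sigmoid. A minimal NumPy sketch, in which the uniform kernel stands in for the learned convolution weights and k=3 is an illustrative choice:

```python
import numpy as np

def eca_attention(feature_map, k=3):
    """Efficient Channel Attention (ECA) sketch.
    feature_map: array of shape (C, H, W); returns a recalibrated
    map of the same shape."""
    C, H, W = feature_map.shape
    # 1) Squeeze: global average pooling per channel -> (C,)
    y = feature_map.mean(axis=(1, 2))
    # 2) Local cross-channel interaction: 1-D conv of size k over channels
    pad = k // 2
    y_pad = np.pad(y, pad, mode="edge")
    kernel = np.full(k, 1.0 / k)            # stand-in for learned weights
    z = np.convolve(y_pad, kernel, mode="valid")   # length C again
    # 3) Sigmoid gate, broadcast back over the spatial dimensions
    gate = 1.0 / (1.0 + np.exp(-z))
    return feature_map * gate[:, None, None]
```

Unlike SE-style attention, this avoids fully connected bottleneck layers, which keeps the parameter overhead negligible.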

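The feature-enhancement step — parallel one-dimensional convolutions with different kernel sizes and dilation rates applied to the VGGish output vectors — can be illustrated as follows. The branch configuration and uniform kernels are assumptions for the sketch, not the paper's learned parameters:

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D convolution with a dilation rate."""
    k = len(kernel)
    span = (k - 1) * dilation                 # receptive-field width - 1
    x_pad = np.pad(x, (span // 2, span - span // 2))
    out = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            out[t] += kernel[i] * x_pad[t + i * dilation]
    return out

def multi_scale_enhance(feat, specs=((3, 1), (3, 2), (5, 1))):
    """Run parallel (kernel_size, dilation) branches over a feature
    vector and concatenate the results, widening the receptive field
    at several scales before the BiGRU stage."""
    branches = [dilated_conv1d(feat, np.full(k, 1.0 / k), d)
                for k, d in specs]
    return np.concatenate(branches)
```

A larger dilation rate widens the receptive field without adding parameters, which is why mixing kernel sizes and dilation rates captures context at several temporal scales.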