
Computer Engineering ›› 2023, Vol. 49 ›› Issue (9): 295-302, 312. doi: 10.19678/j.issn.1000-3428.0065690

• Development Research and Engineering Application •

Lung Sound Signal Recognition Based on Dual-Source Domain Transfer Learning

Shanshu BAO1, Bo CHE2, Linhong DENG2

  1. School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, Jiangsu, China
  2. Changzhou Key Laboratory of Respiratory Medical Engineering, Institute of Biomedical Engineering and Health Sciences, Changzhou University, Changzhou 213164, Jiangsu, China
  • Received: 2022-09-05  Online: 2023-09-15  Published: 2022-12-13
  • About the authors:

    BAO Shanshu (b. 1997), male, M.S. candidate; research interests include machine learning and deep learning

    CHE Bo, Ph.D. candidate

    DENG Linhong, professor, Ph.D.

  • Funding:
    National Natural Science Foundation of China (12272063, 11532003)


Abstract:

To address model overfitting and low classification accuracy caused by the small size of available lung sound datasets, a lung sound recognition method based on dual-source domain transfer learning is proposed. On one hand, the VGGish network pre-trained on the Audio Set dataset is transferred to lung sound recognition and fused with the Efficient Channel Attention module (ECA-Net) to strengthen its recognition ability. Logarithmic Mel (Log-Mel) spectrogram features are extracted from the lung sounds, and the VGGish network learns the spectrogram information in temporal order. The feature vectors output by VGGish are then enhanced by one-dimensional convolution kernels of different sizes and dilation rates, and the enhanced feature maps are fed into a bidirectional gated recurrent unit to capture the temporal information of the lung sounds. On the other hand, the VGG19 model pre-trained on the ImageNet dataset is transferred to lung sound recognition, with the lung sound waveforms converted into spectrograms for input and training. After training, the two models serve as feature extractors; their high-level semantic feature vectors are fused and fed into the ensemble learning algorithm CatBoost for the final classification. Experimental results show that the method achieves a specificity of 80.66%, a sensitivity of 77.69%, and an accuracy of 79.18% on the Coswara COVID-19 dataset, and a specificity of 88.75%, a sensitivity of 72.04%, and an ICBHI score of 80.39% on the ICBHI-2017 dataset, outperforming the common recognition methods used for comparison.
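The Log-Mel front end described above can be sketched as a minimal NumPy implementation. The sampling rate, frame length, hop size, and 64 Mel bands below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=4000, n_fft=256, hop=128, n_mels=64):
    """Compute a log-Mel spectrogram (dB) from a 1-D audio signal."""
    # Frame the signal and apply a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular Mel filterbank, equally spaced on the Mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    mel_power = power @ fb.T
    # Log compression with a floor to avoid log(0)
    return 10.0 * np.log10(np.maximum(mel_power, 1e-10))
```

In practice a library such as librosa performs this step; the resulting (frames × Mel bands) matrix is what the VGGish branch consumes in temporal order.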

Key words: lung sound recognition, transfer learning, channel attention, Logarithmic Mel spectrum (Log-Mel), ensemble learning

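The ECA-Net channel attention fused into the VGGish branch works by global average pooling each channel, applying a small 1-D convolution across the pooled channel descriptors, and gating the channels with a sigmoid. A minimal NumPy sketch, in which the uniform kernel stands in for the learned convolution weights and k=3 is an illustrative choice:

```python
import numpy as np

def eca_attention(feature_map, k=3):
    """Efficient Channel Attention (ECA) sketch.
    feature_map: array of shape (C, H, W); returns a recalibrated
    map of the same shape."""
    C, H, W = feature_map.shape
    # 1) Squeeze: global average pooling per channel -> (C,)
    y = feature_map.mean(axis=(1, 2))
    # 2) Local cross-channel interaction: 1-D conv of size k over channels
    pad = k // 2
    y_pad = np.pad(y, pad, mode="edge")
    kernel = np.full(k, 1.0 / k)            # stand-in for learned weights
    z = np.convolve(y_pad, kernel, mode="valid")   # length C again
    # 3) Sigmoid gate, broadcast back over the spatial dimensions
    gate = 1.0 / (1.0 + np.exp(-z))
    return feature_map * gate[:, None, None]
```

Unlike SE-style attention, this avoids fully connected bottleneck layers, which keeps the parameter overhead negligible.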

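The feature-enhancement step — parallel one-dimensional convolutions with different kernel sizes and dilation rates applied to the VGGish output vectors — can be illustrated as follows. The branch configuration and uniform kernels are assumptions for the sketch, not the paper's learned parameters:

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D convolution with a dilation rate."""
    k = len(kernel)
    span = (k - 1) * dilation                 # receptive-field width - 1
    x_pad = np.pad(x, (span // 2, span - span // 2))
    out = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            out[t] += kernel[i] * x_pad[t + i * dilation]
    return out

def multi_scale_enhance(feat, specs=((3, 1), (3, 2), (5, 1))):
    """Run parallel (kernel_size, dilation) branches over a feature
    vector and concatenate the results, widening the receptive field
    at several scales before the BiGRU stage."""
    branches = [dilated_conv1d(feat, np.full(k, 1.0 / k), d)
                for k, d in specs]
    return np.concatenate(branches)
```

A larger dilation rate widens the receptive field without adding parameters, which is why mixing kernel sizes and dilation rates captures context at several temporal scales.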