Computer Engineering ›› 2024, Vol. 50 ›› Issue (10): 174-184. doi: 10.19678/j.issn.1000-3428.0067959

• Artificial Intelligence and Pattern Recognition •

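Birdsong Recognition Based on LPDMR-NET

WANG Yaru (王娅茹)1, TANG Lu (唐璐)1, CHEN Aibin (陈爱斌)1,*, PENG Weixiong (彭伟雄)2, SHEN Ping (沈平)3

  1. School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
    2. Hunan Zixing Smart Medical Technology Co., Ltd., Changsha 410004, Hunan, China
    3. Changsha Echo Technology Co., Ltd., Changsha 410004, Hunan, China
  • Received: 2023-06-28 Publication date: 2024-10-15 Release date: 2024-10-11
  • Corresponding author: CHEN Aibin
  • Funding:
    National Natural Science Foundation of China (62276276); Hunan Provincial Postgraduate Scientific Research Innovation Project (CX20210879); Hunan Key Laboratory of Intelligent Logistics Technology Project, Central South University of Forestry and Technology (2019TP1015)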

Birdsong Recognition Based on LPDMR-NET

WANG Yaru1, TANG Lu1, CHEN Aibin1,*, PENG Weixiong2, SHEN Ping3

  1. School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
    2. Hunan Zixing Smart Medical Technology Co., Ltd., Changsha 410004, Hunan, China
    3. Changsha Echo Technology Co., Ltd., Changsha 410004, Hunan, China
  • Received: 2023-06-28 Online: 2024-10-15 Published: 2024-10-11
  • Contact: CHEN Aibin

Abstract:

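To recognize birdsong in natural environments efficiently and quickly, a Lightweight Pointwise-Depthwise Multi-Receptive-field attention Residual NETwork (LPDMR-NET) model is proposed. First, Mel spectrograms are generated with Mel filters. Next, a basicblock and a downblock are connected to form a two-layer residual network, DBNet, and stacked DBNets serve as the backbone network for birdsong recognition to speed up training. Then, a Pointwise-Depthwise convolution Network (PDNet) is used to extract feature information from the spectrograms and replaces the downsampling module of the backbone; the 3×3 convolutions in the basicblocks of the two residual modules are replaced with Diverse Branch Blocks (DBB), which introduce different receptive fields and significantly improve recognition performance under a complex multi-branch structure. Finally, a lightweight and efficient Shuffle Attention (SA) module is embedded between the two residual modules to pass useful information between them and enhance the ripple features of the spectrogram, further improving recognition performance. Experimental results on Birdselfdata, a self-built dataset of 30 birdsong classes, show that the model achieves a recognition accuracy of 96.82% and an F1 score of 96.73%, surpassing the comparison models in both recognition efficiency and accuracy.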
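The first step of the pipeline, building Mel filters and applying them to a spectrum, can be sketched as follows. This is a minimal illustration and not the authors' code: the HTK-style Mel formula, the triangular filter shape, and the parameter values used in the usage note (8 filters, a 64-point FFT, 16 kHz sampling) are all assumptions, since the abstract does not specify them.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style Mel scale (an assumed choice): mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> list:
    """Triangular Mel filters over the n_fft // 2 + 1 bins of a one-sided spectrum."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, equally spaced on the Mel scale.
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    # Centre frequency of every FFT bin in Hz.
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)    # slope up from lo to mid
            falling = (hi - f) / (hi - mid)   # slope down from mid to hi
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def apply_filterbank(power_spectrum: list, bank: list) -> list:
    """Mel-band energies for one frame: weighted sum of the power spectrum per filter."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
```

Calling `mel_filterbank(8, 64, 16000)` gives 8 rows of 33 weights each; applying it frame by frame to a short-time power spectrum and taking the logarithm yields one column of a Mel spectrogram per frame. With so coarse an FFT, the narrowest low-frequency filters may contain no bin at all, which is one reason practical settings use far larger FFT sizes.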

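Key words: convolutional neural network, birdsong classification, deep learning, Mel spectrogram, residual network, depthwise separable convolution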

Abstract:

A Lightweight Pointwise-Depthwise Multi-Receptive-field attention Residual NETwork (LPDMR-NET) model is proposed to recognize birdsong in natural environments efficiently and quickly. First, Mel spectrograms are generated using Mel filters. Second, a two-layer residual network, DBNet, is built by connecting a basic block and a down block, and stacked DBNets are used as the backbone network for birdsong recognition to improve training speed. Subsequently, a Pointwise-Depthwise convolution Network (PDNet) is used to extract feature information from the spectrograms and replaces the downsampling module of the backbone network. In addition, the 3×3 convolutions in the basic blocks of the two residual modules are replaced with Diverse Branch Blocks (DBB), which introduce different receptive fields. These changes significantly improve the recognition performance of the network under a complex multi-branch structure. Finally, a lightweight and efficient Shuffle Attention (SA) module is embedded between the two residual modules to pass useful information between them. This addition enhances the ripple features of the spectrogram and further improves the recognition performance of the network. Experimental results on a self-built 30-class birdsong dataset, Birdselfdata, show that the model achieves a recognition accuracy of 96.82% and an F1 score of 96.73%. The proposed model thus outperforms the comparison models in terms of both recognition efficiency and accuracy.
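The abstract describes the SA module only at a high level. Assuming it follows the usual shuffle-style design, in which channels are processed in groups and then interleaved so that information can flow between groups, the interleaving step alone can be sketched as below. The function name and the flat-list representation of channels are hypothetical simplifications; the attention computation itself is not shown.

```python
def channel_shuffle(channels: list, groups: int) -> list:
    """Interleave channels across groups.

    Equivalent to reshaping to (groups, per_group), transposing, and flattening.
    """
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    # channels[g * per_group + i] is item i of group g; emit item i of every
    # group before moving on to item i + 1.
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


print(channel_shuffle(list(range(8)), groups=2))  # -> [0, 4, 1, 5, 2, 6, 3, 7]
```

After the shuffle, each consecutive pair of channels contains one channel from each original group, so a following grouped operation sees features from both groups.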

Key words: Convolutional Neural Network (CNN), birdsong classification, deep learning, Mel spectrogram, Residual Network (ResNet), Depthwise Separable Convolution (DSC)
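To make the "lightweight" claim behind separable convolutions concrete, the sketch below compares weight counts for a standard convolution and a depthwise separable one. It assumes the common depthwise-then-pointwise ordering and ignores biases; the ordering and layer sizes inside the paper's PDNet are not given in the abstract and may differ, and the channel counts in the example are illustrative only.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative sizes only (not taken from the paper): 64 -> 128 channels, 3 x 3 kernel.
full = standard_conv_params(64, 128, 3)             # 73728
separable = depthwise_separable_params(64, 128, 3)  # 8768
print(full, separable, round(separable / full, 4))  # -> 73728 8768 0.1189
```

The ratio equals 1/c_out + 1/k², so for 3×3 kernels the separable form needs roughly one ninth of the weights once the output channel count is large.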