Computer Engineering ›› 2019, Vol. 45 ›› Issue (1): 199-205. doi: 10.19678/j.issn.1000-3428.0049401

• Artificial Intelligence and Recognition Technology •

Musical Instrument Identification Based on Improved Convolutional Neural Network and Auditory Spectrum

WANG Fei, YU Fengqin

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214100, China
  • Received: 2017-11-21 Online: 2019-01-15 Published: 2019-01-15
  • About the authors: WANG Fei (born 1991), male, master's student, whose main research interests are speech signal analysis and processing and deep learning; YU Fengqin, professor, Ph.D.
  • Supported by:

    National Natural Science Foundation of China (61703185)

Abstract:

Traditional musical instrument identification relies on low-level acoustic features of the music, and its performance depends on feature selection. To address this problem, an auditory spectrum, which contains harmonic information, is close to human auditory perception, and has low redundancy, is used as the input to a 5-layer deep Convolutional Neural Network (CNN) that abstracts high-level time-frequency representations of timbre layer by layer for instrument identification. To capture the time-frequency information in the auditory spectrum effectively, the single rectangular convolution kernel in the first layer is replaced with multi-scale kernels along the frequency and time axes. Simulation experiments on the IOWA instrument database show that the network with the improved multi-scale kernels achieves a recognition accuracy of 96.95%, better than a network that uses a single convolution kernel. With the same network structure, the recognition accuracy obtained with the auditory spectrum is 9.11% and 3.54% higher than that obtained with the Mel-Frequency Cepstral Coefficient (MFCC) and the spectrogram, respectively, and the misclassification rates for percussion instruments and for instruments of the same family are 2% and 3.1%, lower than those obtained with MFCC and the spectrogram.
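The multi-scale first layer described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give kernel sizes, input dimensions, or trained weights, so the kernel shapes, the 128 x 64 input, the random weights, and the helper names `conv2d_same` and `multi_scale_first_layer` are all assumptions chosen for illustration. The sketch only shows the idea of applying kernels elongated along the frequency axis and along the time axis to the same time-frequency input and stacking the resulting feature maps.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Cross-correlate a 2-D array with a 2-D kernel, zero-padded to keep the input size."""
    kh, kw = kernel.shape
    pad_top, pad_left = (kh - 1) // 2, (kw - 1) // 2
    pad_bottom, pad_right = kh - 1 - pad_top, kw - 1 - pad_left
    xp = np.pad(x, ((pad_top, pad_bottom), (pad_left, pad_right)))
    # windows has shape (H, W, kh, kw): one kernel-sized patch per output position
    windows = np.lib.stride_tricks.sliding_window_view(xp, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)

def multi_scale_first_layer(spec, kernel_shapes, rng):
    """Apply one randomly initialised kernel per shape, then ReLU; stack the maps on axis 0."""
    maps = []
    for shape in kernel_shapes:
        kernel = rng.standard_normal(shape) / np.sqrt(shape[0] * shape[1])
        maps.append(np.maximum(conv2d_same(spec, kernel), 0.0))
    return np.stack(maps)

rng = np.random.default_rng(0)
# Hypothetical auditory spectrum: 128 frequency channels x 64 time frames
spec = rng.random((128, 64))
# Hypothetical shapes: tall (frequency-oriented), wide (time-oriented), square
KERNEL_SHAPES = [(9, 3), (3, 9), (5, 5)]
features = multi_scale_first_layer(spec, KERNEL_SHAPES, rng)
print(features.shape)  # (3, 128, 64)
```

In a trained network the kernel weights would be learned, and each scale would typically have many kernels rather than one; the stacked maps would then feed the remaining convolutional layers.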

Key words: auditory spectrum, Convolutional Neural Network (CNN), convolution kernel, time-frequency feature, musical instrument identification

CLC Number: