Musical Instrument Identification Based on Improved Convolutional Neural Network and Auditory Spectrum

Computer Engineering, 2019, Vol. 45, Issue (1): 199-205. doi: 10.19678/j.issn.1000-3428.0049401

WANG Fei, YU Fengqin

School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214100, China

Received: 2017-11-21 Online: 2019-01-15 Published: 2019-01-15
About the authors: WANG Fei (born 1991), male, master's student, whose research interests are speech signal analysis and processing and deep learning; YU Fengqin, professor, Ph.D.
Funding:
National Natural Science Foundation of China (61703185)

Abstract

Traditional musical instrument identification relies on low-level acoustic features, and its performance depends heavily on feature selection. To address this, a low-redundancy auditory spectrum that retains harmonic information and is close to human auditory perception is used as the input to a 5-layer deep Convolutional Neural Network (CNN), which abstracts a high-level time-frequency representation of timbre layer by layer for instrument identification. To capture the time-frequency information in the auditory spectrum effectively, the rectangular convolution kernel of the first layer is replaced with multi-scale kernels along the frequency and time axes. Experimental results on the IOWA instrument database show that the improved network achieves a recognition accuracy of 96.95%, which is better than a network using a single convolution kernel. Under the same network structure, the accuracy obtained with the auditory spectrum is 9.11% and 3.54% higher than that obtained with Mel-Frequency Cepstral Coefficients (MFCC) and the spectrogram, respectively, and the misclassification rates for percussion instruments and instruments of the same family are 2% and 3.1%, lower than those obtained with MFCC and the spectrogram.
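The idea of a multi-scale first layer can be illustrated with a minimal sketch. It is not the authors' implementation: the kernel shapes, kernel values, and the toy input below are hypothetical, and the helper names `conv2d_same` and `multi_scale_first_layer` are invented for this example. It only shows how a tall kernel (spanning frequency) and a wide kernel (spanning time) can be applied to the same input and stacked as separate feature maps.

```python
# Illustrative sketch only: kernel shapes, values, and the toy input are
# hypothetical and are not taken from the paper.

def conv2d_same(image, kernel):
    """Cross-correlate a 2-D list (frequency x time) with an odd-sized kernel.

    Zero padding keeps the output the same shape as the input, so feature
    maps produced by differently shaped kernels can be stacked.
    """
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    pad_r, pad_c = k_rows // 2, k_cols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    rr, cc = r + i - pad_r, c + j - pad_c
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def multi_scale_first_layer(spectrum, kernels):
    """Apply every kernel to the same input and return one feature map per kernel."""
    return [conv2d_same(spectrum, k) for k in kernels]


# Hypothetical kernels: one tall (spans frequency), one wide (spans time).
FREQ_KERNEL = [[1.0], [1.0], [1.0]]  # 3 frequency channels x 1 frame
TIME_KERNEL = [[1.0, 1.0, 1.0]]      # 1 frequency channel x 3 frames

# Toy 4 x 5 input (rows = frequency channels, columns = time frames)
# containing a single sustained partial in the second channel.
spectrum = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

feature_maps = multi_scale_first_layer(spectrum, [FREQ_KERNEL, TIME_KERNEL])
```

In this toy case the tall kernel spreads the partial across neighbouring frequency channels, while the wide kernel responds most strongly at the middle of the sustained note. A real network would learn the kernel values and use many kernels per scale.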

Keywords: auditory spectrum, Convolutional Neural Network (CNN), convolution kernel, time-frequency feature, musical instrument identification

CLC number: TP391

WANG Fei, YU Fengqin. Musical Instrument Identification Based on Improved Convolutional Neural Network and Auditory Spectrum[J]. Computer Engineering, 2019, 45(1): 199-205.

http://www.ecice06.com/CN/Y2019/V45/I1/199


