[1] YONG K H,WON Y J,JUN C S,et al.A multi-resolution approach to GAN-based speech enhancement[J].Applied Sciences,2021,11(2):721.
[2] XU Y,DU J,DAI L R,et al.An experimental study on speech enhancement based on deep neural networks[J].IEEE Signal Processing Letters,2014,21(1):65-68.
[3] WANG Z J,ZHANG X L.Single channel speech enhancement based on dual-path recurrent neural network[J].Journal of Signal Processing,2021,37(10):1872-1879.(in Chinese)
[4] WANG D L,CHEN J T.Supervised speech separation based on deep learning:an overview[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2018,26:1702-1726.
[5] TAN K,WANG D L.Learning complex spectral mapping with gated convolutional recurrent networks for monaural speech enhancement[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2020,28:380-390.
[6] ZHANG K,HE S,LI H,et al.DBNet:a dual-branch network architecture processing on spectrum and waveform for single-channel speech enhancement[EB/OL].[2022-05-05].https://arxiv.org/abs/2105.02436.
[7] PANDEY A,WANG D L.Self-attending RNN for speech enhancement to improve cross-corpus generalization[EB/OL].[2022-05-05].https://arxiv.org/abs/2105.12831.
[8] PALIWAL K,WOJCICKI K,SHANNON B.The importance of phase in speech enhancement[J].Speech Communication,2011,53(4):465-494.
[9] PASCUAL S,BONAFONTE A,SERRÀ J.SEGAN:speech enhancement generative adversarial network[EB/OL].[2022-05-05].https://arxiv.org/abs/1703.09452.
[10] RETHAGE D,PONS J,SERRA X.A wavenet for speech denoising[C]//Proceedings of IEEE International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2018:5069-5073.
[11] PANDEY A,WANG D L.Densely connected neural network with dilated convolutions for real-time speech enhancement in the time domain[C]//Proceedings of 2020 IEEE International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2020:6629-6633.
[12] KISHORE V,TIWARI N,PARAMASIVAM P.Improved speech enhancement using TCN with multiple encoder-decoder layers[EB/OL].[2022-05-05].http://www.interspeech2020.org/uploadfile/pdf/Thu-2-11-8.pdf.
[13] YUAN W H,SUN W Z,XIA B,et al.Improving speech enhancement in unseen noise using deep convolutional neural network[J].Acta Automatica Sinica,2018,44(4):751-759.(in Chinese)
[14] CHOI H S,PARK S,LEE J H,et al.Real-time denoising and dereverberation with tiny recurrent U-net[C]//Proceedings of 2021 IEEE International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2021:5789-5793.
[15] VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[EB/OL].[2022-05-05].https://arxiv.org/abs/1706.03762.
[16] KIM J,EL-KHAMY M,LEE J.T-GSA:Transformer with Gaussian-weighted self-attention for speech enhancement[C]//Proceedings of 2020 IEEE International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2020:6649-6653.
[17] WANG K,HE B B,ZHU W P.CAUNet:context-aware U-net for speech enhancement in time domain[C]//Proceedings of IEEE International Symposium on Circuits and Systems.Washington D.C.,USA:IEEE Press,2021:1-5.
[18] YU W,ZHOU J,WANG H,et al.SETransformer:speech enhancement Transformer[J].Cognitive Computation,2022,14(3):1152-1158.
[19] HU G,WANG K J,LIU L L.Underwater acoustic target recognition based on depthwise separable convolution neural networks[J].Sensors,2021,21(4):1429.
[20] REN X,ZHANG X,CHEN L,et al.A causal U-Net based neural beamforming network for real-time multi-channel speech enhancement[EB/OL].[2022-05-05].https://www.isca-speech.org/archive/pdfs/interspeech_2021/ren21_interspeech.pdf.
[21] VEAUX C,YAMAGISHI J,KING S.The voice bank corpus:design,collection and data analysis of a large regional accent speech database[EB/OL].[2022-05-05].https://www.researchgate.net/publication/261462711_The_voice_bank_corpus_Design_collection_and_data_analysis_of_a_large_regional_accent_speech_database.
[22] THIEMANN J,ITO N,VINCENT E.The Diverse Environments Multi-Channel Acoustic Noise Database (DEMAND):a database of multichannel environmental noise recordings[J].Proceedings of Meetings on Acoustics,2013,19:3591-3591.
[23] WEN S X,DU J,LEE C H.On generating mixing noise signals with basis functions for simulating noisy speech and learning DNN-based speech enhancement models[C]//Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing.Washington D.C.,USA:IEEE Press,2017:1-6.
[24] RIX A W,BEERENDS J G,HOLLIER M P,et al.Perceptual Evaluation of Speech Quality (PESQ):a new method for speech quality assessment of telephone networks and codecs[C]//Proceedings of IEEE International Conference on Acoustics,Speech,and Signal Processing.Washington D.C.,USA:IEEE Press,2001:749-752.
[25] TAAL C H,HENDRIKS R C,HEUSDENS R,et al.An algorithm for intelligibility prediction of time-frequency weighted noisy speech[J].IEEE Transactions on Audio,Speech,and Language Processing,2011,19(7):2125-2136.
[26] HU Y,LOIZOU P C.Evaluation of objective quality measures for speech enhancement[J].IEEE Transactions on Audio,Speech,and Language Processing,2008,16(1):229-238.
[27] MACARTNEY C,WEYDE T.Improved speech enhancement with the Wave-U-Net[EB/OL].[2022-05-05].https://arxiv.org/abs/1811.11307.
[28] DEFOSSEZ A,SYNNAEVE G,ADI Y.Real time speech enhancement in the waveform domain[EB/OL].[2022-05-05].https://arxiv.org/abs/2006.12847.
[29] LI A,ZHENG C,ZHANG L,et al.Glance and gaze:a collaborative learning framework for single-channel speech enhancement[J].Applied Acoustics,2022,187:108499.
[30] PANAYOTOV V,CHEN G G,POVEY D,et al.Librispeech:an ASR corpus based on public domain audio books[C]//Proceedings of IEEE International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2015:5206-5210.
[31] VARGA A,STEENEKEN H J M.Assessment for automatic speech recognition II:NOISEX-92:a database and an experiment to study the effect of additive noise on speech recognition systems[J].Speech Communication,1993,12(3):247-251.