[1] LIU W J, NIE S, LIANG S, et al. Deep learning based speech separation technology and its developments[J]. Acta Automatica Sinica, 2016, 42(6):819-833. (in Chinese)
[2] WANG D, CHEN J. Supervised speech separation based on deep learning: an overview[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(10):1702-1726.
[3] WANG Y X, NARAYANAN A, WANG D L. On training targets for supervised speech separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(12):1849-1858.
[4] XU Y, DU J, DAI L R, et al. An experimental study on speech enhancement based on deep neural networks[J]. IEEE Signal Processing Letters, 2014, 21(1):65-68.
[5] XU Y, DU J, DAI L R, et al. A regression approach to speech enhancement based on deep neural networks[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(1):7-19.
[6] HUANG P S, KIM M, HASEGAWA-JOHNSON M, et al. Joint optimization of masks and deep recurrent neural networks for monaural source separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(12):2136-2147.
[7] WENINGER F, ERDOGAN H, WATANABE S, et al. Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR[C]//Proceedings of the 12th International Conference on Latent Variable Analysis and Signal Separation. Washington D.C., USA: IEEE Press, 2015:91-99.
[8] PARK S R, LEE J. A fully convolutional neural network for speech enhancement[EB/OL]. [2020-07-15]. http://export.arxiv.org/abs/1609.07132.
[9] FU S W, TSAO Y, LU X G. SNR-aware convolutional neural network modeling for speech enhancement[EB/OL]. [2020-07-15]. https://www.researchgate.net/publication/307889660_SNR-Aware_Convolutional_Neural_Network_Modeling_for_Speech_Enhancement.
[10] TAN K, CHEN J T, WANG D L. Gated residual networks with dilated convolutions for supervised speech separation[C]//Proceedings of 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D.C., USA: IEEE Press, 2018:21-25.
[11] TAN K, CHEN J T, WANG D L. Gated residual networks with dilated convolutions for monaural speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(1):189-198.
[12] LI Y, LI X, DONG Y, et al. Densely connected network with time-frequency dilated convolution for speech enhancement[C]//Proceedings of 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing. Washington D.C., USA: IEEE Press, 2019:6860-6864.
[13] ZHAO H, ZARAR S, TASHEV I, et al. Convolutional-recurrent neural networks for speech enhancement[C]//Proceedings of 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D.C., USA: IEEE Press, 2018:2401-2405.
[14] TAN K, WANG D L. A convolutional recurrent neural network for real-time speech enhancement[EB/OL]. [2020-07-15]. http://web.cse.ohio-state.edu/~wang.77/papers/Tan-Wang1.interspeech18.pdf.
[15] FU S W, TSAO Y, LU X G, et al. Raw waveform-based speech enhancement by fully convolutional networks[C]//Proceedings of 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. Washington D.C., USA: IEEE Press, 2017:6-12.
[16] FU S W, WANG T W, TSAO Y, et al. End-to-end waveform utterance enhancement for direct evaluation metrics optimization by fully convolutional neural networks[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(9):1570-1584.
[17] PANDEY A, WANG D L. TCNN: temporal convolutional neural network for real-time speech enhancement in the time domain[C]//Proceedings of 2019 IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D.C., USA: IEEE Press, 2019:6875-6879.
[18] PANDEY A, WANG D L. A new framework for supervised speech enhancement in the time domain[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(7):1179-1188.
[19] PANDEY A, WANG D L. A new framework for CNN-based speech enhancement in the time domain[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(7):1179-1188.
[20] GAROFOLO J S, LAMEL L F, FISHER W M, et al. TIMIT acoustic-phonetic continuous speech corpus[EB/OL]. [2020-07-15]. https://www.researchgate.net/publication/283617660_TIMIT_Acoustic-Phonetic_Continuous_Speech_Corpus.
[21] HU G. 100 nonspeech environmental sounds[EB/OL]. [2020-07-15]. http://web.cse.ohio-state.edu/pnl/corpus/HuNonspeech/HuCorpus.html.
[22] VARGA A, STEENEKEN H J M. Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems[J]. Speech Communication, 1993, 12(3):247-251.
[23] RIX A W, BEERENDS J G, HOLLIER M P, et al. Perceptual Evaluation of Speech Quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs[C]//Proceedings of 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Washington D.C., USA: IEEE Press, 2001:749-752.
[24] TAAL C H, HENDRIKS R C, HEUSDENS R, et al. An algorithm for intelligibility prediction of time-frequency weighted noisy speech[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(7):2125-2136.