[1] BOLL S.Suppression of acoustic noise in speech using spectral subtraction[J].IEEE Transactions on Acoustics,Speech,and Signal Processing,1979,27(2):113-120.
[2] KAMATH S,LOIZOU P.A multi-band spectral subtraction method for enhancing speech corrupted by colored noise[C]//Proceedings of International Conference on Acoustics,Speech,and Signal Processing.Washington D.C.,USA:IEEE Press,2002:1-10.
[3] SCALART P,FILHO J V.Speech enhancement based on a priori signal to noise estimation[C]//Proceedings of International Conference on Acoustics,Speech,and Signal Processing.Washington D.C.,USA:IEEE Press,1996:629-632.
[4] DENDRINOS M,BAKAMIDIS S,CARAYANNIS G.Speech enhancement from noise:a regenerative approach[J].Speech Communication,1991,10(1):45-57.
[5] WANG D L.On ideal binary mask as the computational goal of auditory scene analysis[M]//Speech separation by humans and machines.Boston:Kluwer Academic Publishers,2005:181-197.
[6] LUO Y,MESGARANI N.Conv-TasNet:surpassing ideal time-frequency magnitude masking for speech separation[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2019,27(8):1256-1266.
[7] GOODFELLOW I,POUGET-ABADIE J,MIRZA M,et al.Generative adversarial networks[J].Communications of the ACM,2020,63(11):139-144.
[8] PASCUAL S,BONAFONTE A,SERRÀ J.SEGAN:speech enhancement generative adversarial network[EB/OL].[2022-06-10].https://arxiv.org/abs/1703.09452.
[9] FU S W,LIAO C F,TSAO Y,et al.MetricGAN:generative adversarial networks based black-box metric scores optimization for speech enhancement[EB/OL].[2022-06-10].https://arxiv.org/abs/1905.04874.
[10] FU S W,YU C,HSIEH T A,et al.MetricGAN+:an improved version of MetricGAN for speech enhancement[EB/OL].[2022-06-10].https://arxiv.org/abs/2104.03538.
[11] YUAN W H,SHI Y L,HU S D,et al.A speech enhancement approach based on fusion of time-domain and frequency-domain features[J].Computer Engineering,2021,47(10):75-81.(in Chinese)
[12] STOLLER D,EWERT S,DIXON S.Wave-U-Net:a multi-scale neural network for end-to-end audio source separation[EB/OL].[2022-06-10].https://arxiv.org/abs/1806.03185.
[13] WU R Q,CHEN X Q,YU J,et al.Application of improved U-Net network with attention mechanism in end-to-end speech enhancement[J].Acta Acustica,2022,47(2):266-275.(in Chinese)
[14] KIM J,EL-KHAMY M,LEE J.T-GSA:transformer with Gaussian-weighted self-attention for speech enhancement[C]//Proceedings of International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2020:6649-6653.
[15] YU W W,ZHOU J,WANG H B,et al.SETransformer:speech enhancement transformer[J].Cognitive Computation,2022,14(3):1152-1158.
[16] WANG K,HE B B,ZHU W P.TSTNN:two-stage transformer based neural network for speech enhancement in the time domain[C]//Proceedings of International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2021:7098-7102.
[17] DANG F,CHEN H T,ZHANG P Y.DPT-FSNet:dual-path transformer based full-band and sub-band fusion network for speech enhancement[C]//Proceedings of International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2022:6857-6861.
[18] GULATI A,QIN J,CHIU C C,et al.Conformer:convolution-augmented transformer for speech recognition[EB/OL].[2022-06-10].https://arxiv.org/abs/2005.08100.
[19] LI B,GULATI A,YU J H,et al.A better and faster end-to-end model for streaming ASR[C]//Proceedings of International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2021:5634-5638.
[20] KIM E,SEO H.SE-Conformer:time-domain speech enhancement using Conformer[C]//Proceedings of Interspeech 2021.[S.l.]:ISCA,2021:2736-2740.
[21] CAO R Z,ABDULATIF S,YANG B.CMGAN:conformer-based Metric GAN for speech enhancement[EB/OL].[2022-06-10].https://arxiv.org/abs/2203.15149v1.
[22] CHEN S Y,WU Y,CHEN Z,et al.Continuous speech separation with Conformer[C]//Proceedings of International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2021:5749-5753.
[23] RETHAGE D,PONS J,SERRA X.A Wavenet for speech denoising[C]//Proceedings of International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2018:5069-5073.
[24] VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems.Red Hook,USA:Curran Associates Inc.,2017:6000-6010.
[25] VALENTINI-BOTINHAO C,WANG X,TAKAKI S,et al.Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech[EB/OL].[2022-06-10].https://www.cstr.ed.ac.uk/downloads/publications/2016/SSW9_Cassia_1.pdf.
[26] VEAUX C,YAMAGISHI J,KING S.The Voice Bank corpus:design,collection and data analysis of a large regional accent speech database[C]//Proceedings of International Conference Oriental COCOSDA Held Jointly with Conference on Asian Spoken Language Research and Evaluation.Washington D.C.,USA:IEEE Press,2013:1-4.
[27] THIEMANN J,ITO N,VINCENT E.DEMAND:a collection of multi-channel recordings of acoustic noise in diverse environments[C]//Proceedings of the 21st International Congress on Acoustics.Montreal,Canada:[s.n.],2013:1-6.
[28] BU H,DU J Y,NA X Y,et al.AISHELL-1:an open-source Mandarin speech corpus and a speech recognition baseline[C]//Proceedings of the 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment.Washington D.C.,USA:IEEE Press,2017:1-5.
[29] VARGA A,STEENEKEN H J M.Assessment for automatic speech recognition:II.NOISEX-92:a database and an experiment to study the effect of additive noise on speech recognition systems[J].Speech Communication,1993,12(3):247-251.
[30] MACARTNEY C,WEYDE T.Improved speech enhancement with the Wave-U-Net[EB/OL].[2022-06-10].https://arxiv.org/abs/1811.11307.
[31] SONI M H,SHAH N,PATIL H A.Time-frequency masking-based speech enhancement using generative adversarial network[C]//Proceedings of International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2018:5039-5043.
[32] WANG K,HE B B,ZHU W P.CAUNet:context-aware U-Net for speech enhancement in time domain[C]//Proceedings of International Symposium on Circuits and Systems.Washington D.C.,USA:IEEE Press,2021:1-5.