[1] LOIZOU P C. Speech enhancement:theory and practice[M].[S. l.]:CRC Press, 2013. [2] 张雄伟, 李毅豪, 孙蒙, 等. 单通道语音增强中深度学习方法研究现状与展望[J]. 陆军工程大学学报, 2022(5):1-12. ZHANG X W, LI Y H, SUN M, et al. Methods of deep learning in monaural speech enhancement:state of art and prospects[J]. Journal of Army Engineering University of PLA, 2022(5):1-12.(in Chinese) [3] BOLL S. Suppression of acoustic noise in speech using spectral subtraction[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1979, 27(2):113-120. [4] ZALEVSKY Z, MENDLOVIC D. Fractional Wiener filter[J]. Applied Optics, 1996, 35(20):3930-3936. [5] EPHRAIM Y, MALAH D. Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984, 32(6):1109-1121. [6] EPHRAIM Y, VAN TREES H L. A signal subspace approach for speech enhancement[J]. IEEE Transactions on Speech and Audio Processing, 1995, 3(4):251-266. [7] XU Y, DU J, DAI L R, et al. A regression approach to speech enhancement based on deep neural networks[J]. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2014,23(1):7-19. [8] HU Y, LIU Y, LV S, et al. DCCRN:deep complex convolution recurrent network for phase-aware speech enhancement[EB/OL].[2023-06-11]. https://arxiv.org/abs/2008.00264. [9] TAN K, WANG D L. Learning complex spectral mapping with gated convolutional recurrent networks for monaural speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28:380-390. [10] DEFOSSEZ A, SYNNAEVE G, ADI Y. Real time speech enhancement in the waveform domain[EB/OL]. IEEE Access, 2020, 8:48464-48476. [11] PASCUAL S, BONAFONTE A, SERRA J. SEGAN:speech enhancement generative adversarial network[EB/OL].[2023-06-11]. https://arxiv.org/pdf/1703.09452.pdf. [12] FU S W, LIAO C F, TSAO Y, et al. MetricGAN:generative adversarial networks based black-box metric scores optimization for speech enhancement[C]//Proceedings of International Conference on Machine Learning. Washington D.C., USA:IEEE Press, 2019:2031-2041. [13] 沈梦强, 于文年, 易黎, 等. 基于GAN的全时间尺度语音增强方法[J]. 计算机工程, 2023, 49(6):115-122, 130. SHEN M Q, YU W N, YI L, et al. Full-time scale speech enhancement method based on GAN[J]. Computer Engineering, 2023, 49(6):115-122, 130.(in Chinese) [14] WANG D L. On ideal binary mask as the computational goal of auditory scene analysis[M]. Berlin, Germany:Springer, 2005. [15] NARAYANAN A, WANG D L. Ideal ratio mask estimation using deep neural networks for robust speech recognition[C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D.C., USA:IEEE Press, 2013:7092-7096. [16] WILLIAMSON D S, WANG Y X, WANG D L. Complex ratio masking for monaural speech separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24(3):483-492. [17] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York, USA:ACM Press, 2017:6000-6010. [18] PARK H J, KANG B H, SHIN W, et al. MANNER:multi-view attention network for noise erasure[C]//Proceedings of 2022 IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP). Washington D.C., USA:IEEE Press, 2022:7842-7846. [19] WANG K, HE B B, ZHU W P. TSTNN:two-stage Transformer based neural network for speech enhancement in the time domain[C]//Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP). Washington D.C., USA:IEEE Press, 2021:7098-7102. [20] 沈学利, 田桂源, 姜彦吉, 等. 基于双阶段Conv-Transformer的时频域语音增强算法[J]. 计算机工程, 2023, 49(6):123-130. SHEN X L, TIAN G Y, JIANG Y J, et al. Time-frequency domain speech enhancement algorithm based on dual-stage Conv-Transformer[J]. Computer Engineering, 2023, 49(6):123-130.(in Chinese) [21] ZHAO S K, MA B, WATCHARASUPAT K N, et al. FRCRN:boosting feature representation using frequency recurrence for monaural speech enhancement[EB/OL].[2023-06-11]. https://arxiv.org/abs/2206.07293. [22] WANG K P, LU W J, LIU P, et al. Multi-stage attention network for monaural speech enhancement[J]. IET Signal Processing, 2023, 17(3):e12182. [23] WOO S, PARK J, LEE J Y, et al. CBAM:convolutional block attention module[C]//Proceedings of European Conference on Computer Vision. Berlin, Germany:Springer, 2018:3-19. [24] GULATI A, QIN J, CHIU C C, et al. Conformer:convolution-augmented Transformer for speech recognition[EB/OL].[2023-06-11]. https://arxiv.org/abs/2005.08100. [25] BRAUN S, TASHEV I. A consolidated view of loss functions for supervised deep learning-based speech enhancement[EB/OL].[2023-06-11]. https://arxiv.org/abs/2009.12286. [26] DING L, TANG H, BRUZZONE L. LANet:local attention embedding to improve the semantic segmentation of remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(1):426-435. [27] DAI Y M, GIESEKE F, OEHMCKE S, et al. Attentional feature fusion[C]//Proceedings of IEEE Winter Conference on Applications of Computer Vision(WACV). Washington D.C., USA:IEEE Press, 2021:3560-3569. [28] LU Y P, LI Z H, HE D, et al. Understanding and improving Transformer from a multi-particle dynamic system point of view[EB/OL].[2023-06-11]. https://arxiv.org/abs/1906.02762. [29] SHAZEER N. GLU variants improve Transformer[EB/OL].[2023-06-11]. https://arxiv.org/abs/2002.05202. [30] BRAUN S, GAMPER H, REDDY C K A, et al. Towards efficient models for real-time deep noise suppression[C]//Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP). Washington D.C., USA:IEEE Press, 2021:656-660. [31] VALENTINI-BOTINHAO C. Noisy speech database for training speech enhancement algorithms and TTS models[J]. Edinburgh, UK:University of Edinburgh, 2017. [32] HU Y, LOIZOU P C. Subjective comparison and evaluation of speech enhancement algorithms[J]. Speech Communication, 2007, 49(7):588-601. [33] HU Y, LOIZOU P C. Evaluation of objective quality measures for speech enhancement[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16(1):229-238. [34] RIX A W, BEERENDS J G, HOLLIER M P, et al. Perceptual Evaluation of Speech Quality(PESQ):a new method for speech quality assessment of telephone networks and codecs[C]//Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing. Washington D.C., USA:IEEE Press, 2001:749-752. [35] TAAL C H, HENDRIKS R C, HEUSDENS R, et al. An algorithm for intelligibility prediction of time-frequency weighted noisy speech[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(7):2125-2136. [36] HANSEN J H L, PELLOM B L. An effective quality evaluation protocol for speech enhancement algorithms[EB/OL].[2023-06-11].https://www.semanticscholar.org/paper/An-effective-quality-evaluation-protocol-for-speech-Hansen-Pellom/497418c70971c8d990e2edf989d6f05675b7c23a. [37] YIN D C, LUO C, XIONG Z W, et al. PHASEN:a phase-and-harmonics-aware speech enhancement network[C]//Proceedings of AAAI Conference on Artificial Intelligence. Palo Alto, USA:AAAI Press, 2020:9458-9465. |