[1] BOLL S. Suppression of acoustic noise in speech using spectral subtraction[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1979, 27(2): 113-120.
[2] YANG Y, LIU P P, ZHOU H L, et al. A speech enhancement algorithm combining spectral subtraction and wavelet transform[C]//Proceedings of the 4th International Conference on Automation, Electronics and Electrical Engineering. Washington D. C., USA: IEEE Press, 2021: 268-273.
[3] JABLOUN F, CHAMPAGNE B. A multi-microphone signal subspace approach for speech enhancement[C]//Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. Washington D. C., USA: IEEE Press, 2002: 205-208.
[4] CHEN J D, BENESTY J, HUANG Y T, et al. New insights into the noise reduction Wiener filter[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(4): 1218-1234.
[5] GERKMANN T, HENDRIKS R C. Unbiased MMSE-based noise power estimation with low complexity and low tracking delay[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(4): 1383-1393.
[6] ISLAM M S, ZHU Y Y, HOSSAIN M I, et al. Supervised single channel dual domains speech enhancement using sparse non-negative matrix factorization[J]. Digital Signal Processing, 2020, 100: 102697.
[7] LI J H, WANG M. A gated recurrent neural network for causal speech enhancement[J]. Computer Engineering, 2022, 48(11): 77-82. (in Chinese)
[8] DONG H Y, MA J F, ZHANG Z X. Waveform mapping in time domain and harmonic loss in frequency domain based speech enhancement[J]. Computer Engineering and Design, 2021, 42(6): 1677-1683. (in Chinese)
[9] YUAN W H, SHI Y L, HU S D, et al. A speech enhancement approach based on fusion of time-domain and frequency-domain features[J]. Computer Engineering, 2021, 47(10): 75-81. (in Chinese)
[10] TAN K, WANG D L. Learning complex spectral mapping with gated convolutional recurrent networks for monaural speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 380-390.
[11] ZHANG T Q, LUO Q Y, FANG R, et al. Single-channel speech enhancement method based on hierarchical refinement and residual feature aggregation network[J]. Journal of Signal Processing, 2023, 39(7): 1285-1298. (in Chinese)
[12] ZEZARIO R E, FU S W, CHEN F, et al. Deep learning-based non-intrusive multi-objective speech assessment model with cross-domain features[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022, 31: 54-70.
[13] TESCH K, GERKMANN T. Insights into deep non-linear filters for improved multi-channel speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, 31: 563-575.
[14] BORGSTRÖM B J, BRANDSTEIN M S. Speech enhancement via attention masking network (SEAMNET): an end-to-end system for joint suppression of noise and reverberation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 515-526.
[15] AKTER K, MAMUN N, HOSSAIN M A. A T-F masking based monaural speech enhancement using U-Net architecture[C]//Proceedings of the International Conference on Electrical, Computer and Communication Engineering. Washington D. C., USA: IEEE Press, 2023: 1-5.
[16] MARTÍN-DOÑAS J M, JENSEN J, TAN Z H, et al. Online multichannel speech enhancement based on recursive EM and DNN-based speech presence estimation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 3080-3094.
[17] WILLIAMSON D S, WANG Y X, WANG D L. Complex ratio masking for monaural speech separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24(3): 483-492.
[18] ZHAO S K, MA B. D2Former: a fully complex dual-path dual-decoder conformer network using joint complex masking and complex spectral mapping for monaural speech enhancement[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D. C., USA: IEEE Press, 2023: 1-5.
[19] WANG K, HE B B, ZHU W P. TSTNN: two-stage transformer based neural network for speech enhancement in the time domain[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D. C., USA: IEEE Press, 2021: 7098-7102.
[20] XIANG X X, ZHANG X J, CHEN H Z. A convolutional network with multi-scale and attention mechanisms for end-to-end single-channel speech enhancement[J]. IEEE Signal Processing Letters, 2021, 28: 1455-1459.
[21] XIANG X X, ZHANG X J, CHEN H Z. A nested U-Net with self-attention and dense connectivity for monaural speech enhancement[J]. IEEE Signal Processing Letters, 2022, 29: 105-109.
[22] JIN Y T, WANG Y S, WANG L H, et al. Speech enhancement algorithm based on multi-scale ladder-type time-frequency Conformer GAN[J]. Journal of Computer Applications, 2023, 43(11): 3607-3615. (in Chinese)
[23] YU G C, LI A D, WANG H, et al. DBT-Net: dual-branch federative magnitude and phase estimation with attention-in-attention transformer for monaural speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022, 30: 2629-2644.
[24] DANG F, CHEN H T, ZHANG P Y. DPT-FSNet: dual-path transformer based full-band and sub-band fusion network for speech enhancement[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D. C., USA: IEEE Press, 2022: 6857-6861.
[25] PANDEY A, WANG D L. Dense CNN with self-attention for time-domain speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 1270-1279.
[26] ZHANG Q Q, QIAN X Y, NI Z H, et al. A time-frequency attention module for neural speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, 31: 462-475.
[27] PANDEY A, WANG D L. A new framework for CNN-based speech enhancement in the time domain[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(7): 1179-1188.
[28] TAN K, WANG D L. Complex spectral mapping with a convolutional recurrent network for monaural speech enhancement[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D. C., USA: IEEE Press, 2019: 6865-6869.
[29] FAN C H, YI J Y, TAO J H, et al. Gated recurrent fusion with joint training framework for robust end-to-end speech recognition[EB/OL]. [2023-05-25]. https://arxiv.org/abs/2011.04249.
[30] TAO F, BUSSO C. Gating neural network for large vocabulary audiovisual speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(7): 1290-1302.