[1] O’SHAUGHNESSY D. Speech Enhancement-A Review of Modern Methods[J]. IEEE Transactions on Human-Machine Systems, 2024, 54(1): 110–120.
[2] WALI A, ALAMGIR Z, KARIM S, et al. Generative adversarial networks for speech processing: A review[J]. Computer Speech & Language, 2022, 72: 101308.
[3] SURESHKUMAR N, SYED A, FAISUL A, et al. Deep neural networks for speech enhancement and speech recognition: A systematic review[J]. Ain Shams Engineering Journal, 2025, 16(7): 103405.
[4] LUO Y, MESGARANI N. Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(8): 1256-1266.
[5] PANDEY A, WANG D. A New Framework for CNN-Based Speech Enhancement in the Time Domain[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27(7): 1179-1188.
[6] RETHAGE D, PONS J, SERRA X. A wavenet for speech denoising[C]//2018 IEEE International Conference on Acoustics, Speech and Signal Processing. 2018: 5069-5073.
[7] KIM J, EL-KHAMY M, LEE J. T-GSA: Transformer with Gaussian-Weighted Self-Attention for Speech Enhancement[C]//2020 IEEE International Conference on Acoustics, Speech and Signal Processing. 2020: 6649-6653.
[8] LI A, LIU W, ZHENG C, et al. Two Heads are Better Than One: A Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 1829-1843.
[9] HAO X, SU X, HORAUD R, et al. FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement[C]//2021 IEEE International Conference on Acoustics, Speech and Signal Processing. 2021: 6633-6637.
[10] 吕景刚,彭绍睿,高硕,等.复频域注意力和多尺度频域增强驱动的语音增强网络[J].计算机应用,2025,45(09):2957-2965.
LV J G, PENG S R, GAO S, et al. Speech Enhancement Network Driven by Complex Frequency-Domain Attention and Multi-Scale Frequency-Domain Enhancement [J]. Journal of Computer Applications, 2025, 45(09): 2957-2965.
[11] 张池,王忠,姜添豪,等.基于并行多注意力的语音增强网络[J].计算机工程,2024,50(04):68-77.
ZHANG C, WANG Z, JIANG T H, et al. Speech Enhancement Network Based on Parallel Multi-Attention[J]. Computer Engineering, 2024, 50(04): 68-77.
[12] SHI H, MIMURA M, KAWAHARA T. Waveform-Domain Speech Enhancement Using Spectrogram Encoding for Robust Speech Recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024, 32: 3049-3060.
[13] KONG Z, PING W, DANTREY A, et al. Speech Denoising in the Waveform Domain With Self-Attention[C]//ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. 2022: 7867-7871.
[14] DANG F, CHEN H, ZHANG P. DPT-FSNet: Dual-path transformer based full-band and sub-band fusion network for speech enhancement[C]//ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. 2022: 6857-6861.
[15] ZHAO S, NGUYEN T H, MA B. Monaural speech enhancement with complex convolutional block attention module and joint time-frequency losses[C]//2021 IEEE International Conference on Acoustics, Speech and Signal Processing. 2021: 6648-6652.
[16] XU S, CAO Y, ZHANG Z, et al. Two-stage Unet with channel and temporal-frequency attention for multi-channel speech enhancement[J]. Speech Communication, 2025, 166: 103154.
[17] 李现国,李滨.基于Transformer和多尺度CNN的图像去模糊[J].计算机工程,2023,49(09):226-233.
LI X G, LI B. Image Deblurring Based on Transformer and Multi-Scale CNN[J]. Computer Engineering, 2023, 49(09): 226-233.
[18] SONI S, YADAV R N, GUPTA L. State-of-the-Art Analysis of Deep Learning-Based Monaural Speech Source Separation Techniques[J]. IEEE Access, 2023, 11: 4242-4269.
[19] HU P J, LI X, TIAN Y, et al. Automatic Pancreas Segmentation in CT Images With Distance-Based Saliency-Aware DenseASPP Network[J]. IEEE Journal of Biomedical and Health Informatics, 2021, 25(5): 1601-1611.
[20] LATIF S, ZAIDI S A M, CUAYAHUITL H, et al. Transformers in speech processing: Overcoming challenges and paving the future[J]. Computer Science Review, 2025, 58: 100768.
[21] XIAO H, LI L, LIU Q, et al. Transformers in medical image segmentation: A review[J]. Biomedical Signal Processing and Control, 2023, 84: 104791.
[22] CHEN H, XU Y, KE D, et al. DDP-Unet: A mapping neural network for single-channel speech enhancement[J]. Computer Speech & Language, 2025, 93: 101795.
[23] OPENSLR. LibriSpeech dataset[DB/OL]. [2025-03-15]. http://www.openslr.org/12/.
[24] PICZAK K J. ESC-50: Dataset for Environmental Sound Classification[DB/OL]. https://github.com/karolpiczak/ESC-50.
[25] PICZAK K J. ESC: Dataset for Environmental Sound Classification[C]//Proceedings of the 23rd Annual ACM Conference on Multimedia. 2015: 1015-1018.
[26] VEAUX C, YAMAGISHI J, KING S. The voice bank corpus: Design, collection and data analysis of a large regional accent speech database[C]//2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation. 2013: 1-4.
[27] DUBEY H, AAZAMI A, GOPAL V, et al. ICASSP 2023 Deep Noise Suppression Challenge[J]. IEEE Open Journal of Signal Processing, 2024, 5: 725-737.
[28] YU G, LI A, ZHENG C, et al. Dual-branch attention-in-attention transformer for single-channel speech enhancement[C]//ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. 2022: 7847-7851.
[29] HASANNEZHAD M, YU H, ZHU W, et al. PACDNN: A phase-aware composite deep neural network for speech enhancement[J]. Speech Communication, 2022, 136: 1-13.
[30] KOLBAEK M, TAN Z H, JENSEN S H, et al. On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 825-838.
[31] FU S W, YU C, HSIEH T A, et al. MetricGAN+: An improved version of MetricGAN for speech enhancement[C]//Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2021: 2778-2782.
[32] ABDULATIF S, CAO R, YANG B. CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024, 32: 2477-2493.
[33] WANG L, WEI W, CHAN Y. D²Net: A Denoising and Dereverberation Network Based on Two-branch Encoder and Dual-path Transformer[C]//Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). 2022: 1649-1654.
[34] ZHANG Z, DONG Z, XU W, et al. Reparameterization of Lightweight Transformer for On-Device Speech Emotion Recognition[J]. IEEE Internet of Things Journal, 2025, 12(4): 4169-4182.
[35] CHEN H, ZHANG J, FU Y, et al. TFDense-GAN: a generative adversarial network for single-channel speech enhancement[J]. EURASIP Journal on Advances in Signal Processing, 2025, 10(2025): 1-24.