1 |
LOIZOU P C. Speech enhancement: theory and practice[M]. [S. l.]: CRC Press, 2013.
|
2 |
张雄伟, 李毅豪, 孙蒙, 等. 单通道语音增强中深度学习方法研究现状与展望. 陆军工程大学学报, 2022, (5): 1- 12.
URL
|
|
ZHANG X W, LI Y H, SUN M, et al. Methods of deep learning in monaural speech enhancement: state of art and prospects. Journal of Army Engineering University of PLA, 2022, (5): 1- 12.
URL
|
3 |
BOLL S. Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1979, 27 (2): 113- 120.
doi: 10.1109/TASSP.1979.1163209
|
4 |
ZALEVSKY Z, MENDLOVIC D. Fractional Wiener filter. Applied Optics, 1996, 35 (20): 3930- 3936.
doi: 10.1364/AO.35.003930
|
5 |
EPHRAIM Y, MALAH D. Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984, 32 (6): 1109- 1121.
doi: 10.1109/TASSP.1984.1164453
|
6 |
EPHRAIM Y, VAN TREES H L. A signal subspace approach for speech enhancement. IEEE Transactions on Speech and Audio Processing, 1995, 3 (4): 251- 266.
doi: 10.1109/89.397090
|
7 |
XU Y, DU J, DAI L R, et al. A regression approach to speech enhancement based on deep neural networks. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2014, 23 (1): 7- 19.
doi: 10.1109/TASLP.2014.2364452
|
8 |
HU Y, LIU Y, LÜ S, et al. DCCRN: deep complex convolution recurrent network for phase-aware speech enhancement[EB/OL]. [2023-06-11]. https://arxiv.org/abs/2008.00264.
|
9 |
TAN K, WANG D L. Learning complex spectral mapping with gated convolutional recurrent networks for monaural speech enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28, 380- 390.
doi: 10.1109/TASLP.2019.2955276
|
10 |
DEFOSSEZ A, SYNNAEVE G, ADI Y. Real time speech enhancement in the waveform domain[EB/OL]. IEEE Access, 2020, 8: 48464-48476.
|
11 |
|
12 |
FU S W, LIAO C F, TSAO Y, et al. MetricGAN: generative adversarial networks based black-box metric scores optimization for speech enhancement[C]∥Proceedings of International Conference on Machine Learning. Washington D. C., USA: IEEE Press, 2019: 2031-2041.
|
13 |
沈梦强, 于文年, 易黎, 等. 基于GAN的全时间尺度语音增强方法. 计算机工程, 2023, 49 (6): 115-122, 130.
URL
|
|
SHEN M Q, YU W N, YI L, et al. Full-time scale speech enhancement method based on GAN. Computer Engineering, 2023, 49 (6): 115-122, 130.
URL
|
14 |
WANG D L. On ideal binary mask as the computational goal of auditory scene analysis. Berlin, Germany: Springer, 2005.
|
15 |
NARAYANAN A, WANG D L. Ideal ratio mask estimation using deep neural networks for robust speech recognition[C]∥Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D. C., USA: IEEE Press, 2013: 7092-7096.
|
16 |
WILLIAMSON D S, WANG Y X, WANG D L. Complex ratio masking for monaural speech separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24 (3): 483- 492.
doi: 10.1109/TASLP.2015.2512042
|
17 |
VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]∥Proceedings of the 31st International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2017: 6000-6010.
|
18 |
PARK H J, KANG B H, SHIN W, et al. MANNER: multi-view attention network for noise erasure[C]∥Proceedings of 2022 IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP). Washington D. C., USA: IEEE Press, 2022: 7842-7846.
|
19 |
WANG K, HE B B, ZHU W P. TSTNN: two-stage Transformer based neural network for speech enhancement in the time domain[C]∥Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP). Washington D. C., USA: IEEE Press, 2021: 7098-7102.
|
20 |
沈学利, 田桂源, 姜彦吉, 等. 基于双阶段Conv-Transformer的时频域语音增强算法. 计算机工程, 2023, 49 (6): 123- 130.
doi: 10.19678/j.issn.1000-3428.0064966
|
|
SHEN X L, TIAN G Y, JIANG Y J, et al. Time-frequency domain speech enhancement algorithm based on dual-stage Conv-Transformer. Computer Engineering, 2023, 49 (6): 123- 130.
doi: 10.19678/j.issn.1000-3428.0064966
|
21 |
ZHAO S K, MA B, WATCHARASUPAT K N, et al. FRCRN: boosting feature representation using frequency recurrence for monaural speech enhancement[EB/OL]. [2023-06-11]. https://arxiv.org/abs/2206.07293.
|
22 |
WANG K P, LU W J, LIU P, et al. Multi-stage attention network for monaural speech enhancement. IET Signal Processing, 2023, 17 (3): e12182.
doi: 10.1049/sil2.12182
|
23 |
WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]∥Proceedings of European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 3-19.
|
24 |
|
25 |
BRAUN S, TASHEV I. A consolidated view of loss functions for supervised deep learning-based speech enhancement[EB/OL]. [2023-06-11]. https://arxiv.org/abs/2009.12286.
|
26 |
DING L, TANG H, BRUZZONE L. LANet: local attention embedding to improve the semantic segmentation of remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59 (1): 426- 435.
doi: 10.1109/TGRS.2020.2994150
|
27 |
DAI Y M, GIESEKE F, OEHMCKE S, et al. Attentional feature fusion[C]∥Proceedings of IEEE Winter Conference on Applications of Computer Vision(WACV). Washington D. C., USA: IEEE Press, 2021: 3560-3569.
|
28 |
LU Y P, LI Z H, HE D, et al. Understanding and improving Transformer from a multi-particle dynamic system point of view[EB/OL]. [2023-06-11]. https://arxiv.org/abs/1906.02762.
|
29 |
|
30 |
BRAUN S, GAMPER H, REDDY C K A, et al. Towards efficient models for real-time deep noise suppression[C]∥Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP). Washington D. C., USA: IEEE Press, 2021: 656-660.
|
31 |
VALENTINI-BOTINHAO C. Noisy speech database for training speech enhancement algorithms and TTS models[J]. Edinburgh, UK: University of Edinburgh, 2017.
|
32 |
HU Y, LOIZOU P C. Subjective comparison and evaluation of speech enhancement algorithms. Speech Communication, 2007, 49 (7): 588- 601.
URL
|
33 |
HU Y, LOIZOU P C. Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 2008, 16 (1): 229- 238.
doi: 10.1109/TASL.2007.911054
|
34 |
RIX A W, BEERENDS J G, HOLLIER M P, et al. Perceptual Evaluation of Speech Quality(PESQ): a new method for speech quality assessment of telephone networks and codecs[C]∥Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing. Washington D. C., USA: IEEE Press, 2001: 749-752.
|
35 |
TAAL C H, HENDRIKS R C, HEUSDENS R, et al. An algorithm for intelligibility prediction of time-frequency weighted noisy speech. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19 (7): 2125- 2136.
doi: 10.1109/TASL.2011.2114881
|
36 |
|
37 |
YIN D C, LUO C, XIONG Z W, et al. PHASEN: a phase-and-harmonics-aware speech enhancement network[C]∥Proceedings of AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2020: 9458-9465.
|