[1] HONG Q Y, LI L. Principle and application of speech recognition[M]. Beijing: Publishing House of Electronics Industry, 2020.
[2] DAHL G E, YU D, DENG L, et al. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(1): 30-42.
doi: 10.1109/TASL.2011.2134090
[3] GRAVES A, MOHAMED A R, HINTON G. Speech recognition with deep recurrent neural networks[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2013: 6645-6649.
[4] GRAVES A, FERNÁNDEZ S, GOMEZ F, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks[C]//Proceedings of the 23rd International Conference on Machine Learning. New York, USA: ACM Press, 2006: 369-376.
[5]
[6] CHAN W, JAITLY N, LE Q, et al. Listen, attend and spell: a neural network for large vocabulary conversational speech recognition[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2016: 4960-4964.
[7] WANG D, WANG X, LÜ S. An overview of end-to-end automatic speech recognition[J]. Symmetry, 2019, 11(8): 1018.
doi: 10.3390/sym11081018
[8] DONG L H, XU S, XU B. Speech-Transformer: a no-recurrence sequence-to-sequence model for speech recognition[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2018: 5884-5888.
[9] ZHOU S Y, DONG L H, XU S, et al. A comparison of modeling units in sequence-to-sequence speech recognition with the Transformer on Mandarin Chinese. Berlin, Germany: Springer International Publishing, 2018.
[10] ZHAO D C, SHU Y, LI L, et al. Speech recognition Transformer decoding acceleration method by discarding redundant blocks[J]. Computer Engineering, 2023, 49(10): 105-111, 119.
doi: 10.19678/j.issn.1000-3428.0065685
[11]
[12] BURCHI M, VIELZEUF V. Efficient Conformer: progressive downsampling and grouped attention for automatic speech recognition[C]//Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). Washington D.C., USA: IEEE Press, 2021: 8-15.
[13] GAO Z, ZHANG S, MCLOUGHLIN I, et al. Paraformer: fast and accurate parallel Transformer for non-autoregressive end-to-end speech recognition[EB/OL]. [2024-06-08]. https://arxiv.org/abs/2206.08317.
[14] PENG Y, DALMIA S, LANE I, et al. Branchformer: parallel MLP-attention architectures to capture local and global context for speech recognition and understanding[C]//Proceedings of the International Conference on Machine Learning. [S.l.]: PMLR, 2022: 17627-17643.
[15] LI Y T, QU D, YANG X K, et al. Efficient Conformer model based on factorized gated attention unit[J]. Computer Engineering, 2023, 49(5): 73-80.
doi: 10.19678/j.issn.1000-3428.0064687
[16]
[17] HU C G, SHEN Y X, SUN Y Q, et al. End-to-end speech recognition method based on Conformer[J]. Application Research of Computers, 2024, 41(7): 2018-2024.
[18]
[19] PARK D S, CHAN W, ZHANG Y, et al. SpecAugment: a simple data augmentation method for automatic speech recognition[EB/OL]. [2024-06-08]. https://arxiv.org/abs/1904.08779.
[20] LU Y, LI Z, HE D, et al. Understanding and improving Transformer from a multi-particle dynamic system point of view[EB/OL]. [2024-06-08]. https://arxiv.org/abs/1906.02762.
[21] WATANABE S, BOYER F, CHANG X K, et al. The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans[C]//Proceedings of the IEEE Data Science and Learning Workshop (DSLW). Washington D.C., USA: IEEE Press, 2021: 1-6.
[22] YAO Z, WU D, WANG X, et al. WeNet: production oriented streaming and non-streaming end-to-end speech recognition toolkit[EB/OL]. [2024-06-08]. https://arxiv.org/abs/2102.01547.
[23] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2016: 770-778.
[24] BU H, DU J Y, NA X Y, et al. AISHELL-1: an open-source Mandarin speech corpus and a speech recognition baseline[C]//Proceedings of the 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment. Washington D.C., USA: IEEE Press, 2017: 1-5.
[25] REHR R, GERKMANN T. Cepstral noise subtraction for robust automatic speech recognition[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Washington D.C., USA: IEEE Press, 2015: 375-378.