[1] GRAVES A, FERNÁNDEZ S, GOMEZ F, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks[C]//Proceedings of the 23rd International Conference on Machine Learning. New York, USA: ACM Press, 2006: 369-376.
[2] LÜ H T, MA Z Q, WANG H B, et al. CNN-CTC based layer transfer model for Mongolian speech recognition[J]. Journal of Chinese Information Processing, 2022, 36(6): 52-60. (in Chinese)
[3] GRAVES A, MOHAMED A R, HINTON G. Speech recognition with deep recurrent neural networks[C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D. C., USA: IEEE Press, 2013: 6645-6649.
[4] LI J Y, ZHAO R, HU H, et al. Improving RNN transducer modeling for end-to-end speech recognition[C]//Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop. Washington D. C., USA: IEEE Press, 2020: 114-121.
[5] CHAN W, JAITLY N, LE Q, et al. Listen, attend and spell: a neural network for large vocabulary conversational speech recognition[C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D. C., USA: IEEE Press, 2016: 4960-4964.
[6] WATANABE S, HORI T, KIM S, et al. Hybrid CTC/Attention architecture for end-to-end speech recognition[J]. IEEE Journal of Selected Topics in Signal Processing, 2017, 11(8): 1240-1253. doi: 10.1109/JSTSP.2017.2763455.
[7] MA D. Research on Chinese-English mixed speech recognition based on LAS model[D]. Lanzhou: Northwest University for Nationalities, 2020. (in Chinese)
[8] LI J Y. Recent advances in end-to-end automatic speech recognition[J]. APSIPA Transactions on Signal and Information Processing, 2022, 11(1): 1-27.
[9] ZHANG B, WU D, YAO Z, et al. Unified streaming and non-streaming two-pass end-to-end model for speech recognition[EB/OL]. [2022-08-05]. https://arxiv.org/abs/2012.05481.
[10] MIAO H R, CHENG G F, ZHANG P Y, et al. Online hybrid CTC/Attention end-to-end automatic speech recognition architecture[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 1452-1465. doi: 10.1109/TASLP.2020.2987752.
[11] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2017: 6000-6010.
[12] DONG L H, XU S, XU B. Speech-Transformer: a no-recurrence sequence-to-sequence model for speech recognition[C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D. C., USA: IEEE Press, 2018: 5884-5888.
[13] SONG X C, WU Z Y, HUANG Y H, et al. Non-autoregressive Transformer ASR with CTC-enhanced decoder input[C]//Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D. C., USA: IEEE Press, 2021: 5894-5898.
[14] WANG N J C, QUAN Z, WANG S, et al. Adding connectionist temporal summarization into Conformer to improve its decoder efficiency for speech recognition[EB/OL]. [2022-08-05]. https://arxiv.org/abs/2204.03889.
[15] WANG Y H, LEE H Y, LEE L S. Segmental audio Word2Vec: representing utterances as sequences of vectors with applications in spoken term detection[C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D. C., USA: IEEE Press, 2018: 6269-6273.
[16] YAO Z Y, WU D, WANG X, et al. WeNet: production oriented streaming and non-streaming end-to-end speech recognition toolkit[EB/OL]. [2022-08-05]. https://arxiv.org/pdf/2102.01547.pdf.
[17]
[18] LEE J, WATANABE S. Intermediate loss regularization for CTC-based speech recognition[C]//Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D. C., USA: IEEE Press, 2021: 6224-6228.
[19] CHEN G, XIE X K, SUN J, et al. Hybrid CTC/Attention end-to-end Chinese speech recognition enhanced by Conformer[J]. Computer Engineering and Applications, 2023, 59(4): 97-103. (in Chinese)
[20]
[21] O'MALLEY T, NARAYANAN A, WANG Q, et al. A Conformer-based ASR frontend for joint acoustic echo cancellation, speech enhancement and speech separation[C]//Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop. Washington D. C., USA: IEEE Press, 2022: 304-311.
[22] POVEY D, GHOSHAL A, BOULIANNE G, et al. The Kaldi speech recognition toolkit[C]//Proceedings of 2011 IEEE Workshop on Automatic Speech Recognition and Understanding. Washington D. C., USA: IEEE Press, 2011: 15-22.
[23] WATANABE S, HORI T, KARITA S, et al. ESPnet: end-to-end speech processing toolkit[C]//Proceedings of ISCA'18. Washington D. C., USA: IEEE Press, 2018: 2207-2211.
[24] PARK D S, CHAN W, ZHANG Y, et al. SpecAugment: a simple data augmentation method for automatic speech recognition[C]//Proceedings of ISCA'19. Washington D. C., USA: IEEE Press, 2019: 2613-2617.
[25] FAN R C, CHU W, CHANG P, et al. An improved single step non-autoregressive Transformer for automatic speech recognition[C]//Proceedings of ISCA'21. Washington D. C., USA: IEEE Press, 2021: 3715-3719.
[26] BAI Y, YI J Y, TAO J H, et al. Fast end-to-end speech recognition via non-autoregressive models and cross-modal knowledge transferring from BERT[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 1897-1911. doi: 10.1109/TASLP.2021.3082299.
[27] GAO Z F, ZHANG S L, MCLOUGHLIN I, et al. Paraformer: fast and accurate parallel Transformer for non-autoregressive end-to-end speech recognition[C]//Proceedings of ISCA'22. Washington D. C., USA: IEEE Press, 2022: 2063-2067.
[28] WANG Y, LIU R, BAO F, et al. Alignment-learning based single-step decoding for accurate and fast non-autoregressive speech recognition[C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D. C., USA: IEEE Press, 2022: 8292-8296.