[1] VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems.New York,USA:ACM Press,2017:6000-6010.
[2] JAIN A,ROUHE A K,GRÖNROOS S A,et al.Finnish ASR with deep Transformer models[EB/OL].[2022-04-12].https://arxiv.org/abs/2003.11562v1.
[3] DONG L H,XU S,XU B.Speech-Transformer:a no-recurrence sequence-to-sequence model for speech recognition[C]//Proceedings of 2018 IEEE International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2018:5884-5888.
[4] GULATI A,QIN J,CHIU C C,et al.Conformer:convolution-augmented transformer for speech recognition[C]//Proceedings of Annual Conference of the International Speech Communication Association.Washington D.C.,USA:IEEE Press,2020:3015-3025.
[5] WINATA G I,CAHYAWIJAYA S,LIN Z J,et al.Lightweight and efficient end-to-end speech recognition using low-rank Transformer[C]//Proceedings of 2020 IEEE International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2020:6144-6148.
[6] WANG X,SUN S N,XIE L,et al.Efficient Conformer with prob-sparse attention mechanism for end-to-end speech recognition[C]//Proceedings of Annual Conference of the International Speech Communication Association.Washington D.C.,USA:IEEE Press,2021:1898-1902.
[7] BURCHI M,VIELZEUF V.Efficient Conformer:progressive downsampling and grouped attention for automatic speech recognition[EB/OL].[2022-04-12].https://arxiv.org/abs/2109.01163.
[8] CHANG H J,YANG S,LEE H Y.DistilHuBERT:speech representation learning by layer-wise distillation of hidden-unit BERT[EB/OL].[2022-04-12].https://arxiv.org/abs/2110.01900.
[9] LÜ Y J,WANG L B,GE M,et al.Compressing Transformer-based ASR model by task-driven loss and attention-based multi-level feature distillation[C]//Proceedings of IEEE International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2022:7992-7996.
[10] LIN Z,LIU J,YANG Z,et al.Pruning redundant mappings in transformer models via spectral-normalized identity prior[C]//Proceedings of EMNLP'20.Stroudsburg,USA:Association for Computational Linguistics,2020:719-730.
[11] QIN Z,SUN W,DENG H,et al.cosFormer:rethinking Softmax in attention[EB/OL].[2022-04-12].https://arxiv.org/abs/2202.08791.
[12] ZHOU S Y,DONG L H,XU S,et al.A comparison of modeling units in sequence-to-sequence speech recognition with the transformer on mandarin Chinese[M].Berlin,Germany:Springer,2018.
[13] ZHOU S Y,DONG L H,XU S,et al.Syllable-based sequence-to-sequence speech recognition with the transformer in mandarin Chinese[C]//Proceedings of Annual Conference of the International Speech Communication Association.Washington D.C.,USA:IEEE Press,2018:791-795.
[14] LI S,XU M,ZHANG X.Conformer-based end-to-end speech recognition with rotary position embedding[EB/OL].[2022-04-12].https://arxiv.org/abs/2107.05907.
[15] LIN X F,ZHAO C,PAN W.Towards accurate binary convolutional neural network[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems.New York,USA:ACM Press,2017:344-352.
[16] TULLOCH A,JIA Y.High performance ultra-low-precision convolutions on mobile devices[EB/OL].[2022-04-12].https://arxiv.org/abs/1712.02427.
[17] LUO P,ZHU Z Y,LIU Z W,et al.Face model compression by distilling knowledge from neurons[C]//Proceedings of the 13th AAAI Conference on Artificial Intelligence.New York,USA:ACM Press,2016:3560-3566.
[18] NOVIKOV A,PODOPRIKHIN D,OSOKIN A,et al.Tensorizing neural networks[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems.New York,USA:ACM Press,2015:442-450.
[19] KRIMAN S,BELIAEV S,GINSBURG B,et al.QuartzNet:deep automatic speech recognition with 1D time-channel separable convolutions[C]//Proceedings of 2020 IEEE International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2020:6124-6128.
[20] MEHROTRA A,DUDZIAK Ł,YEO J,et al.Iterative compression of end-to-end ASR model using AutoML[C]//Proceedings of Annual Conference of the International Speech Communication Association.Washington D.C.,USA:IEEE Press,2020:3361-3365.
[21] LI S,RAJ D,LU X G,et al.Improving Transformer-based speech recognition systems with compressed structure and speech attributes augmentation[C]//Proceedings of Annual Conference of the International Speech Communication Association.Washington D.C.,USA:IEEE Press,2019:4400-4404.
[22] MORI T,TJANDRA A,SAKTI S,et al.Compressing end-to-end ASR networks by tensor-train decomposition[C]//Proceedings of Annual Conference of the International Speech Communication Association.Washington D.C.,USA:IEEE Press,2018:806-810.
[23] KHODAK M,TENENHOLTZ N,MACKEY L,et al.Initialization and regularization of factorized neural layers[EB/OL].[2022-04-12].https://arxiv.org/abs/2105.01029.
[24] HUA W,DAI Z,LIU H,et al.Transformer quality in linear time[EB/OL].[2022-04-12].https://arxiv.org/abs/2202.10447.
[25] ZAHEER M,GURUGANESH G,DUBEY A,et al.Big bird:Transformers for longer sequences[EB/OL].[2022-04-12].https://arxiv.org/abs/2007.14062.
[26] KITAEV N,KAISER Ł,LEVSKAYA A.Reformer:the efficient Transformer[EB/OL].[2022-04-12].https://arxiv.org/abs/2001.04451.
[27] TAY Y,BAHRI D,METZLER D,et al.Synthesizer:rethinking self-attention in Transformer models[EB/OL].[2022-04-12].https://arxiv.org/abs/2005.00743.
[28] XU M L,LI S Q,ZHANG X L.Transformer-based end-to-end speech recognition with local dense synthesizer attention[C]//Proceedings of 2021 IEEE International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2021:5899-5903.
[29] WU Z,LIU Z,LIN J,et al.Lite transformer with long-short range attention[EB/OL].[2022-04-12].https://arxiv.org/abs/2004.11886.
[30] TITSIAS M K.One-vs-each approximation to softmax for scalable estimation of probabilities[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems.New York,USA:ACM Press,2016:4168-4176.
[31] PARK D S,CHAN W,ZHANG Y,et al.SpecAugment:a simple data augmentation method for automatic speech recognition[C]//Proceedings of Annual Conference of the International Speech Communication Association.Washington D.C.,USA:IEEE Press,2019:2613-2617.
[32] LIU Y,LI T,ZHANG P,et al.Improved Conformer-based end-to-end speech recognition using neural architecture search[EB/OL].[2022-04-12].https://arxiv.org/abs/2104.05390.
[33] YAO Z Y,WU D,WANG X,et al.WeNet:production oriented streaming and non-streaming end-to-end speech recognition toolkit[C]//Proceedings of Annual Conference of the International Speech Communication Association.Washington D.C.,USA:IEEE Press,2021:2093-2097.