[1] WANG X, TAKAKI S, YAMAGISHI J.An RNN-based quantized F0 model with multi-tier feedback links for text-to-speech synthesis[C]//Proceedings of INTERSPEECH'17.Washington D.C., USA:IEEE Press, 2017:1059-1063. [2] GHAHREMANI P, BABAALI B, POVEY D, et al.A pitch extraction algorithm tuned for automatic speech recognition[C]//Proceedings of 2014 IEEE International Conference on Acoustics, Speech and Signal Processing.Washington D.C., USA:IEEE Press, 2014:2494-2498. [3] ATAL B S.Automatic speaker recognition based on pitch contours[J].The Journal of the Acoustical Society of America, 1972, 52(6):1687-1697. [4] KATO A, MILNER B.Using hidden Markov models for speech enhancement[C]//Proceedings of INTERSPEECH'14.Washington D.C., USA:IEEE Press, 2014:5695-5699. [5] NOLL A M.Cepstrum pitch determination[J].Journal of Clinical Sleep Medicine, 1967, 41(2):293-309. [6] DUBNOWSKI J, SCHAFER R, RABINER L.Real-time digital hardware pitch detector[J].IEEE Transactions on Acoustics, Speech, and Signal Processing, 1976, 24(1):2-8. [7] ROSS M, SHAFFER H, COHEN A, et al.Average magnitude difference function pitch extractor[J].IEEE Transactions on Acoustics, Speech, and Signal Processing, 1974, 22(5):353-362. [8] TALKIN D.A robust algorithm for pitch tracking[J].Speech Coding and Synthesis, 1995, 44:495-518. [9] BOERSMA P.Praat, a system for doing phonetics by computer[J].Glot International, 2002, 5(9/10):341-345. [10] DE CHEVEIGNÉ A, KAWAHARA H.YIN:a fundamental frequency estimator for speech and music[J].The Journal of the Acoustical Society of America, 2002, 111(4):1917-1930. [11] GONZALEZ S, BROOKES M.A pitch estimation filter robust to high levels of noise[C]//Proceedings of the 19th European Signal Processing Conference.Berlin, Germany:Springer, 2011:451-455. [12] CAMACHO A, HARRIS J G.A sawtooth waveform inspired pitch estimator for speech and music[J].The Journal of the Acoustical Society of America, 2008, 124(3):1638-1652. [13] MAUCH M, DIXON S.PYIN:a fundamental frequency estimator using probabilistic threshold distributions[C]//Proceedings of 2014 IEEE International Conference on Acoustics, Speech and Signal Processing.Washington D.C., USA:IEEE Press, 2014:659-663. [14] GU Y H.HMM-based noisy-speech pitch contour estimation[C]//Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing.Washington D.C., USA:IEEE Press, 1992:21-24. [15] NISHIMOTO T, SAGAYAMA S, KAMEOKA H.Multi-pitch trajectory estimation of concurrent speech based on harmonic GMM and nonlinear Kalman filtering[C]//Proceedings of INTERSPEECH'04.Washington D.C., USA:IEEE Press, 2004:2433-2436. [16] WALMSLEY P J, GODSILL S J, RAYNER P J W.Polyphonic pitch tracking using joint Bayesian estimation of multiple frame parameters[C]//Proceedings of IEEE Workshop on Applications of Signal Processing to Audio & Acoustics.Washington D.C., USA:IEEE Press, 1999:119-122. [17] HAN K, WANG D L.Neural network based pitch tracking in very noisy speech[J].IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(12):2158-2168. [18] KATO A, KINNUNEN T.Waveform to single sinusoid regression to estimate the F0 contour from noisy speech using recurrent deep neural networks[EB/OL].[2022-01-10].https://arxiv.org/abs/1807.00752. [19] KIM J W, SALAMON J, LI P, et al.Crepe:a convolutional representation for pitch estimation[C]//Proceedings of 2018 IEEE International Conference on Acoustics, Speech and Signal Processing.Washington D.C., USA:IEEE Press, 2018:161-165. [20] ARDAILlON L, ROEBEL A.Fully-convolutional network for pitch estimation of speech signals[C]//Proceedings of INTERSPEECH'19.Washington D.C., USA:IEEE Press, 2019:2005-2009. [21] WANG X, GIRSHICK R, GUPTA A, et al.Non-local neural networks[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.Washington D.C., USA:IEEE Press, 2018:7794-7803. [22] KINGMA D, BA J.Adam:a method for stochastic optimization[C]//Proceedings of International Conference on Learning Representations.Washington D.C., USA:IEEE Press, 2014:564-578. [23] PIRKER G, WOHLMAYR M, PETRIK S, et al.A pitch tracking corpus with evaluation on multipitch tracking scenario[C]//Proceedings of INTERSPEECH'11.Washington D.C., USA:IEEE Press, 2011:1509-1512. [24] LAMEL L F, KASSEL R H, SENEFF S.Speech database development:design and analysis of the acoustic-phonetic corpus[J].Speech Input/Output Assessment and Speech Databases, 1989(2):2161-2170. [25] WANG W J, LU Y M.Analysis of the mean absolute error and the root mean square error in assessing rounding model[J].Materials Science and Engineering, 2018, 324:012049. [26] RABINER L, CHENG M, ROSENBERG A, et al.A comparative performance study of several pitch detection algorithms[J].IEEE Transactions on Acoustics, Speech, and Signal Processing, 1976, 24(5):399-418. [27] VASWANI A, SHAZEER N, PARMAR N, et al.Attention is all you need[C]//Proceedings of Advances in Neural Information Processing System.Cambridge, USA:MIT Press, 2017:30. |