[1] ROSE R C, REYNOLDS D A.Text independent speaker identification using automatic acoustic segmentation[C]//Proceedings of International Conference on Acoustics, Speech, and Signal Processing.Washington D.C., USA:IEEE Press, 2002:293-296. [2] REYNOLDS D A, QUATIERI T F, DUNN R B.Speaker verification using adapted Gaussian mixture models[J].Digital Signal Processing, 2000, 10(1/2/3):19-41. [3] CORTES C, VAPNIK V.Support-vector networks[J].Machine Learning, 1995, 20(3):273-297. [4] KENNY P, BOULIANNE G, OUELLET P, et al.Joint factor analysis versus eigenchannels in speaker recognition[J].IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(4):1435-1447. [5] DEHAK N, KENNY P J, DEHAK R, et al.Front-end factor analysis for speaker verification[J].IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(4):788-798. [6] PRINCE S J D, ELDER J H.Probabilistic linear discriminant analysis for inferences about identity[C]//Proceedings of the 11th International Conference on Computer Vision.Washington D.C., USA:IEEE Press, 2007:1-8. [7] AWAD M, KHANNA R.Efficient learning machines[M].Berkeley, USA:Apress, 2015. [8] LECUN Y, BOSER B, DENKER J S, et al.Backpropagation applied to handwritten zip code recognition[J].Neural Computation, 1989, 1(4):541-551. [9] GRAVES A, MOHAMED A R, HINTON G.Speech recognition with deep recurrent neural networks[C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing.Washington D.C., USA:IEEE Press, 2013:6645-6649. [10] LI C, MA X K, JIANG B, et al.Deep speaker:an end-to-end neural speaker embedding system[EB/OL].[2022-03-11].https://arxiv.org/abs/1705.02304. [11] HE K M, ZHANG X Y, REN S Q, et al.Deep residual learning for image recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C., USA:IEEE Press, 2016:770-778. [12] 吴震东, 潘树诚, 章坚武.基于CNN的连续语音说话人声纹识别[J].电信科学, 2017, 33(3):59-66. WU Z D, PAN S C, ZHANG J W.Continuous speech speaker recognition based on CNN[J].Telecommunications Science, 2017, 33(3):59-66.(in Chinese) [13] TORFI A, DAWSON J, NASRABADI N M.Text-independent speaker verification using 3D convolutional neural networks[C]//Proceedings of IEEE International Conference on Multimedia and Expo.Washington D.C., USA:IEEE Press, 2018:1-6. [14] YADAV S, RAI A.Frequency and temporal convolutional attention for text-independent speaker recognition[C]//Proceedings of 2020 IEEE International Conference on Acoustics, Speech and Signal Processing.Washington D.C., USA:IEEE Press, 2020:6794-6798. [15] WOO S, PARK J, LEE J Y, et al.CBAM:convolutional block attention module[M].Berlin, Germany:Springer, 2018. [16] 王鹏程, 崔敏, 李剑, 等.基于深度学习的小样本声目标识别方法[J].计算机测量与控制, 2021, 29(4):217-221. WANG P C, CUI M, LI J, et al.Small sample acoustic target recognition method based on deep learning[J].Computer Measurement & Control, 2021, 29(4):217-221.(in Chinese) [17] WANG P Q, CHEN P F, YUAN Y, et al.Understanding convolution for semantic segmentation[C]//Proceedings of IEEE Winter Conference on Applications of Computer Vision.Washington D.C., USA:IEEE Press, 2018:1451-1460. [18] SCHUSTER M, PALIWAL K K.Bidirectional recurrent neural networks[J].IEEE Transactions on Signal Processing, 1997, 45(11):2673-2681. [19] DAVIS S, MERMELSTEIN P.Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences[J].IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980, 28(4):357-366. [20] WANG J, LI L T, WANG D, et al.Research on generalization property of time-varying FBank-weighted MFCC for i-vector based speaker verification[C]//Proceedings of the 9th International Symposium on Chinese Spoken Language Processing.Washington D.C., USA:IEEE Press, 2014:423. [21] SZEGEDY C, LIU W, JIA Y Q, et al.Going deeper with convolutions[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C., USA:IEEE Press, 2015:1-9. [22] YIN W, SCHÜTZE H.Multichannel variable-size convolution for sentence classification[EB/OL].[2022-03-11].https://arxiv.org/abs/1603.04513. [23] 李昊轩.基于深度学习的音频事件分类研究[D].北京:北京邮电大学, 2020. LI H X.Research on audio event classification based on deep learning[D].Beijing:Beijing University of Posts and Telecommunications, 2020.(in Chinese) [24] HOCHREITER S, SCHMIDHUBER J.Long short-term memory[J].Neural Computation, 1997, 9(8):1735-1780. [25] DEHAK N, DEHAK R, GLASS J, et al.Cosine similarity scoring without score normalization techniques[C]//Proceedings of Speaker and Language Recognition Workshop.Berlin, Germany:Springer, 2010:71-75. [26] PANAYOTOV V, CHEN G G, POVEY D, et al.Librispeech:an ASR corpus based on public domain audio books[C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing.Washington D.C., USA:IEEE Press, 2015:5206-5210. [27] BU H, DU J Y, NA X Y, et al.AISHELL-1:an open-source Mandarin speech corpus and a speech recognition baseline[C]//Proceedings of the 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment.Washington D.C., USA:IEEE Press, 2018:1-5. [28] SCHROFF F, KALENICHENKO D, PHILBIN J.FaceNet:a unified embedding for face recognition and clustering[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C., USA:IEEE Press, 2015:815-823. [29] DELGADO H, EVANS N, KINNUNEN T, et al.ASVspoof 2021:automatic speaker verification spoofing and countermeasures challenge evaluation plan[EB/OL].[2022-03-11].https://arxiv.org/abs/2109.00535. [30] 殷兵.NIST说话人识别评测进展综述[C]//第一届全国声像资料检验鉴定技术交流会议论文集.北京:中国感光学会, 2011. YIN B.Overview of NIST speaker recognition evaluation progress[C]//Proceedings of the 1st National Conference on Inspection and Identification of Audiovisual Data.Beijing:China Photographic Society, 2011.(in Chinese) [31] MARTIN A, PRZYBOCKI M.The NIST 1999 speaker recognition evaluation-an overview[J].Digital Signal Processing, 2000, 10(1/2/3):1-18. [32] 李富强, 万红, 黄俊杰.基于MATLAB的语谱图显示与分析[J].微计算机信息, 2005, 21(10X):172-176. LI F Q, WAN H, HUANG J J.The display and analysis of sonogram based on MATLAB[J].Microcomputer Information, 2005, 21(10X):172-176.(in Chinese). |