Comparative Research on Application of Adversarial Samples for End-to-End Speaker Identification

Computer Engineering, 2021, Vol. 47, Issue (6): 132-141. doi: 10.19678/j.issn.1000-3428.0058239

LIAO Junfan¹, GU Yijun¹, ZHANG Peijing², LIAO Qian¹

1. College of Information Network Security, People's Public Security University of China, Beijing 102600, China;
2. Network Information Center, People's Public Security University of China, Beijing 100038, China

Received: 2020-05-03 Revised: 2020-06-17 Published: 2020-06-28
About the authors: LIAO Junfan (born 1995), male, master's student, main research interest: adversarial sample attack and defense; GU Yijun (corresponding author), professor, Ph.D.; ZHANG Peijing, associate research fellow, M.S.; LIAO Qian, master's student.
Funding: Competitive Selection Project of the Technology Research Program of the Ministry of Public Security (2019JZX009); Special Project for Public Security Behavioral Science Research and Technological Innovation of People's Public Security University of China.
E-mail: 754605668@qq.com

Abstract

To explore the security threat that adversarial samples pose to end-to-end speaker identification systems and how effective such attacks are, this paper compares the strengths and weaknesses of existing adversarial sample generation algorithms in the speech domain. It analyzes five white-box algorithms (FGSM, JSMA, BIM, C&W, PGD) and two black-box algorithms (ZOO, HSJA). Each of the seven algorithms is used to mount targeted and non-targeted attacks on end-to-end speaker identification models built on two network structures, ResCNN and GRU, and to produce audio adversarial samples. The attacks are evaluated with performance indicators such as Attack Success Rate (ASR) and Signal-to-Noise Ratio (SNR), and a human listening test of concealment is also performed. Experimental results show that the existing adversarial sample generation algorithms can be implemented against end-to-end speaker identification models. Among the white-box algorithms, BIM and PGD perform comparatively well. The non-targeted attacks of the black-box algorithms match the attack effect of the white-box algorithms, while their targeted attacks still need improvement.
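As a rough illustration of the gradient-sign idea behind two of the white-box algorithms named in the abstract, the sketch below applies FGSM and BIM to a toy linear softmax classifier in NumPy. The model, the 16-dimensional feature vector, the four speaker classes, and the values of eps, alpha, and steps are all assumptions made for this example; the paper's ResCNN and GRU models and its actual attack settings are not reproduced here.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                     # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def loss(W, b, x, y):
    """Cross-entropy loss of a linear softmax model for true class y."""
    return -np.log(softmax(W @ x + b)[y])

def loss_grad(W, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = softmax(W @ x + b)
    p[y] -= 1.0                         # dL/dlogits = p - onehot(y)
    return W.T @ p

def fgsm(W, b, x, y, eps):
    """Single-step non-targeted FGSM: move eps along the gradient sign."""
    return x + eps * np.sign(loss_grad(W, b, x, y))

def bim(W, b, x, y, eps, alpha, steps):
    """Iterative FGSM (BIM): small steps, clipped to an L-inf ball of radius eps."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad(W, b, x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

# Toy setup (all values hypothetical): 4 speakers, 16-dimensional feature.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))
b = np.zeros(4)
x = rng.normal(size=16)
y = int(np.argmax(W @ x + b))           # use the model's own prediction as the label

x_fgsm = fgsm(W, b, x, y, eps=0.1)
x_bim = bim(W, b, x, y, eps=0.1, alpha=0.02, steps=10)
print("loss clean/FGSM/BIM:", loss(W, b, x, y), loss(W, b, x_fgsm, y), loss(W, b, x_bim, y))
```

Both attacks keep the perturbation inside the same eps budget; BIM spends it over several smaller steps and re-evaluates the gradient each time. A targeted variant would instead step against the gradient of the loss computed for the chosen target speaker.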

Keywords: speaker identification, adversarial sample, robustness, adversarial attack, signal-to-noise ratio (SNR)
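The two evaluation indicators named in the abstract can be sketched as follows. This page does not give the paper's exact formulas, so the definitions here are assumptions based on common usage: SNR treats the adversarial perturbation as noise relative to the clean waveform, and ASR is the fraction of samples whose prediction leaves the true speaker (non-targeted) or lands on the chosen target speaker (targeted). The function names and demo values are hypothetical.

```python
import numpy as np

def snr_db(clean, adversarial):
    """SNR in dB, treating (adversarial - clean) as the noise signal."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(adversarial, dtype=float) - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def attack_success_rate(pred, true_labels, target_labels=None):
    """Fraction of successful attacks.

    Non-targeted: success when the prediction differs from the true speaker.
    Targeted: success when the prediction equals the target speaker.
    """
    pred = np.asarray(pred)
    if target_labels is None:
        return float(np.mean(pred != np.asarray(true_labels)))
    return float(np.mean(pred == np.asarray(target_labels)))

# Hypothetical demo values, not data from the paper.
clean_wave = np.ones(100)
adv_wave = clean_wave + 0.1
demo_snr = snr_db(clean_wave, adv_wave)

preds = [1, 2, 0, 0]
truth = [0, 0, 0, 0]
targets = [1, 1, 1, 1]
asr_untargeted = attack_success_rate(preds, truth)
asr_targeted = attack_success_rate(preds, truth, targets)
```

A higher SNR means a smaller perturbation relative to the speech signal, so an attack is usually judged on both numbers together: a high success rate at a high SNR.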

CLC number: TP391

LIAO Junfan, GU Yijun, ZHANG Peijing, LIAO Qian. Comparative Research on Application of Adversarial Samples for End-to-End Speaker Identification[J]. Computer Engineering, 2021, 47(6): 132-141.

https://www.ecice06.com/CN/Y2021/V47/I6/132

Figures/Tables: 11
