Computer Engineering ›› 2021, Vol. 47 ›› Issue (3): 291-297,303. doi: 10.19678/j.issn.1000-3428.0058025

• Development Research and Engineering Application •

Research on Recording Playback Attack Detection Based on Mixed Features of Gaussian Filter Bank

CHEN Xu, JIANG Ye   

  1. School of Information Engineering, Nanjing University of Finance & Economics, Nanjing 210023, China
  • Received:2020-04-10 Revised:2020-05-12 Published:2021-03-15

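  • About the authors: CHEN Xu (born 1996), male, master's student, main research interest: speaker recognition; JIANG Ye (corresponding author), associate professor, Ph.D.
  • Funding:
    Natural Science Foundation of Jiangsu Province, Youth Fund (BK20150987).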

Abstract: Recording playback is a prevalent means of counterfeit speech attack faced by existing voiceprint recognition techniques. To address the problem that traditional speech features fail to distinguish genuine speech from replayed speech, this paper proposes a hybrid cepstral feature extraction algorithm that applies the Fisher ratio to features derived from a Gaussian filter bank. The Gaussian filter bank replaces the traditional triangular filter bank, and the linear frequency scale and the inverse ERB frequency scale each replace the Mel frequency scale, yielding two new features: Gaussian-Linear Frequency Cepstral Coefficients (G-LFCC) and Gaussian-Inverse ERB Frequency Cepstral Coefficients (G-IEFCC). The Fisher criterion is then used to fuse G-LFCC and G-IEFCC into a new mixed feature parameter. The mixed feature increases the separation between genuine and replayed speech in the high-frequency band, while reducing the interference that different recording and playback devices introduce in the low-frequency band of replayed speech. Experimental results on the ASVspoof 2017 evaluation data show that the proposed mixed feature discriminates well between genuine and replayed speech, reducing the equal error rate (EER) by 21.8%, 38.8%, 58.3%, and 43.7% compared with the IMFCC, LFCC, CQCC, and GSV algorithms, respectively.
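
The abstract does not spell out how the Fisher criterion fuses the two feature sets, so the following is only a minimal sketch under stated assumptions. It computes the standard two-class Fisher ratio per cepstral dimension (squared difference of class means over the sum of class variances) and, as a hypothetical fusion rule that is not confirmed by the paper, keeps the highest-scoring dimensions of the concatenated features. The function names `fisher_ratio` and `select_by_fisher`, the array shapes, and the label convention (1 for genuine, 0 for replay) are all illustrative.

```python
import numpy as np


def fisher_ratio(genuine, replay):
    """Per-dimension Fisher ratio for two classes.

    genuine, replay: arrays of shape (n_frames, n_coeffs).
    Returns an array of shape (n_coeffs,).
    """
    mu_g, mu_r = genuine.mean(axis=0), replay.mean(axis=0)
    var_g, var_r = genuine.var(axis=0), replay.var(axis=0)
    # Between-class separation divided by within-class scatter.
    # The small constant guards against division by zero.
    return (mu_g - mu_r) ** 2 / (var_g + var_r + 1e-12)


def select_by_fisher(feat_a, feat_b, labels, n_keep):
    """Hypothetical fusion rule (an assumption, not the paper's method).

    Concatenates two feature sets column-wise and keeps the n_keep
    dimensions with the largest Fisher ratio.
    labels: 1 for genuine frames, 0 for replayed frames.
    Returns the reduced feature matrix and the kept column indices.
    """
    stacked = np.hstack([feat_a, feat_b])
    ratios = fisher_ratio(stacked[labels == 1], stacked[labels == 0])
    # Indices of the n_keep largest ratios, returned in ascending order.
    keep = np.sort(np.argsort(ratios)[::-1][:n_keep])
    return stacked[:, keep], keep


if __name__ == "__main__":
    # Synthetic stand-ins for G-LFCC and G-IEFCC frames (not real data).
    rng = np.random.default_rng(0)
    g_lfcc = rng.normal(size=(400, 3))
    g_iefcc = rng.normal(size=(400, 3))
    labels = np.array([1] * 200 + [0] * 200)
    g_lfcc[labels == 1, 0] += 5.0  # make one dimension discriminative
    fused, kept = select_by_fisher(g_lfcc, g_iefcc, labels, n_keep=2)
    print(fused.shape, kept)
```

A per-dimension ratio is used here because it needs no matrix inversion and makes it easy to see which coefficients carry the class separation; a weighting scheme based on the same ratios would be an equally plausible reading of the abstract.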

Key words: recording playback, Fisher criterion, Gaussian filter bank, inverse ERB frequency, linear frequency
