Computer Engineering, 2019, Vol. 45, Issue (8): 255-259. doi: 10.19678/j.issn.1000-3428.0051780

• Multimedia Technology and Application •

Causal Speech Enhancement Model Based on Deep Neural Network

YUAN Wenhao, LIANG Chunyan, XIA Bin

  1. School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255000, China
  • Received: 2018-06-11 Revised: 2018-07-16 Online: 2019-08-15 Published: 2019-08-08
  • About the authors: YUAN Wenhao (born 1985), male, lecturer, Ph.D., main research interest: speech signal processing; LIANG Chunyan, lecturer, Ph.D.; XIA Bin, associate professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61701286, 11704229); Natural Science Foundation of Shandong Province (ZR2015FL003, ZR2017MF047, ZR2017LA011).

Abstract: Traditional speech enhancement methods based on Deep Neural Networks (DNN) take non-causal input, which introduces a fixed processing delay and makes them unsuitable for applications with strict real-time requirements. To address this problem, this paper approaches it from the perspective of network structure: experiments compare the speech enhancement performance of different network structures under different input forms in order to identify a structure suited to causal input. On this basis, a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network are combined to build a causal speech enhancement model that makes full use of the information in previous frames. Experimental results show that the model improves the real-time performance of DNN-based speech enhancement while maintaining enhancement performance, with PESQ and STOI scores of 2.25 and 0.76, respectively.

Keywords: speech enhancement, causal input, delay, Deep Neural Network (DNN), Convolutional Neural Network (CNN)
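The difference between non-causal and causal input can be illustrated with a minimal sketch. It is not the paper's implementation: the helper names `context_window` and `lookahead_delay_ms`, the window sizes (3 past + 3 future frames versus 6 past frames) and the 16 ms frame hop are all assumptions chosen only for illustration. The sketch shows that a window containing future frames cannot be formed until those frames arrive, which is the source of the fixed delay, while a causal window of the same size uses only the current and earlier frames.

```python
def context_window(frames, t, past, future):
    """Stack `past` earlier frames, frame t, and `future` later frames.

    Indices that fall outside the utterance are clamped to the nearest
    valid frame (simple edge padding, an assumption of this sketch).
    """
    last = len(frames) - 1
    return [frames[min(max(i, 0), last)] for i in range(t - past, t + future + 1)]


def lookahead_delay_ms(future, hop_ms):
    """Extra delay from waiting for `future` look-ahead frames."""
    return future * hop_ms


# Integers stand in for per-frame feature vectors (hypothetical data).
frames = list(range(12))
HOP_MS = 16  # hypothetical frame hop, not taken from the paper

# Non-causal input: needs frames 8, 9, 10 before frame 7 can be enhanced.
non_causal = context_window(frames, t=7, past=3, future=3)

# Causal input: same window length, but only current and previous frames.
causal = context_window(frames, t=7, past=6, future=0)

print(non_causal, lookahead_delay_ms(3, HOP_MS))
print(causal, lookahead_delay_ms(0, HOP_MS))
```

With these assumed settings the non-causal window adds 48 ms of look-ahead delay, while the causal window adds none. The causal model therefore has to recover the missing future context from previous frames, which is the role the abstract assigns to the combined CNN and LSTM structure.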

CLC Number: