
Computer Engineering, 2022, Vol. 48, Issue (11): 77-82. doi: 10.19678/j.issn.1000-3428.0063047

• Artificial Intelligence and Pattern Recognition •

A Gated Recurrent Neural Network for Causal Speech Enhancement

LI Jianghe, WANG Mei

  1. College of Information Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541006, China
  • Received: 2021-10-26 Revised: 2021-12-16 Published: 2021-12-30
  • About the authors: LI Jianghe (born 1996), male, master's student, main research interests: deep learning and speech enhancement; WANG Mei (corresponding author), professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (62071135); Guangxi Natural Science Foundation (2020GXNSFAA159004).

Abstract: Traditional deep-learning-based speech enhancement methods typically feed the network noncausal input to improve its ability to model noisy speech. This input introduces a fixed delay and degrades the real-time performance of the speech enhancement system. A gated recurrent neural network for causal speech enhancement, called CGRU, is proposed to address the fixed-delay problem in real-time speech enhancement systems and to improve enhancement performance. To better model the correlation in noisy speech signals, the network unit fuses the input and output of the previous time step when computing the output of the current time step. In addition, a linear gating mechanism controls information flow to alleviate overfitting during network training. Because causal speech enhancement systems have strict real-time requirements, the CGRU adopts a single-gate structure to reduce the structural complexity of the network and improve the real-time performance of the system. Experimental results show that the CGRU network outperforms traditional network structures such as the Gated Recurrent Unit (GRU), Simple Recurrent Neural Network (SRNN), and Simple Recurrent Unit (SRU) in terms of the perceptual quality, objective intelligibility, and Segmental Signal-to-Noise Ratio (SSNR) of the enhanced speech. At a Signal-to-Noise Ratio (SNR) of 0 dB, the average perceptual speech quality and average objective speech intelligibility of the CGRU reach 2.4 and 0.786, respectively.

Key words: gated recurrent neural network, fixed delay, causal speech enhancement, speech quality, speech intelligibility
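
The abstract does not give the CGRU equations, so the following is only a minimal sketch of the general idea it describes: a recurrent cell with a single gate that, at each time step, combines the current frame with the previous frame and the previous output, and that never looks at future frames. The class name `SingleGateCell`, the concatenation order, the linear (untransformed) candidate, and the random weights are all assumptions made for illustration, not the authors' definition.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    # Plain-Python matrix-vector product.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class SingleGateCell:
    """Hypothetical single-gate recurrent cell (NOT the paper's CGRU equations)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        n_cat = 2 * n_in + n_hidden  # size of [x_t, x_{t-1}, h_{t-1}]
        scale = 1.0 / math.sqrt(n_cat)

        def init():
            return [[rng.gauss(0.0, scale) for _ in range(n_cat)]
                    for _ in range(n_hidden)]

        self.W_gate = init()  # weights of the one and only gate
        self.W_cand = init()  # weights of the linear candidate (assumption)
        self.n_in = n_in
        self.n_hidden = n_hidden

    def step(self, x_t, x_prev, h_prev):
        # Fuse the current input with the previous input and previous output.
        z = x_t + x_prev + h_prev  # list concatenation
        gate = [sigmoid(a) for a in matvec(self.W_gate, z)]
        cand = matvec(self.W_cand, z)  # kept linear: no tanh applied
        # The single gate blends the candidate with the previous output.
        return [g * c + (1.0 - g) * h for g, c, h in zip(gate, cand, h_prev)]


def run_causal(cell, frames):
    """Process frames strictly left to right; output t uses frames 0..t only."""
    h = [0.0] * cell.n_hidden
    x_prev = [0.0] * cell.n_in
    outputs = []
    for x_t in frames:
        h = cell.step(x_t, x_prev, h)
        outputs.append(h)
        x_prev = x_t
    return outputs
```

Because `run_causal` only reads frames up to the current index, changing a later frame cannot alter an earlier output. A noncausal model that reads k future frames, by contrast, has to wait for those k frames before it can emit anything, which is the fixed delay the abstract refers to.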

CLC number: