
Computer Engineering, 2021, Vol. 47, Issue (10): 75-81. doi: 10.19678/j.issn.1000-3428.0059354

• Artificial Intelligence and Pattern Recognition •

A Speech Enhancement Approach Based on Fusion of Time-Domain and Frequency-Domain Features

YUAN Wenhao, SHI Yunlong, HU Shaodong, LOU Yingxi

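  1. College of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255000, China
  • Received: 2020-08-25 Revised: 2020-09-28 Published: 2020-10-15
  • About the authors: YUAN Wenhao (born 1985), male, associate professor, Ph.D., main research interest: speech signal processing; SHI Yunlong, HU Shaodong, and LOU Yingxi are master's students.
  • Funding:
    National Natural Science Foundation of China (61701286).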

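Abstract: The proposed speech enhancement method fuses time-domain and frequency-domain features so that deep neural networks (DNNs) make fuller use of the features of noisy speech and enhance it more effectively. First, the waveform of noisy speech serves as the training feature and the log power spectrum (LPS) of clean speech as the training target. This setup learns a mapping from the time-domain features of noisy speech to the frequency-domain features of clean speech. On this basis, the waveform and the log power spectrum of noisy speech are used jointly as training features. They feed a DNN that fuses the time-domain and frequency-domain features of noisy speech for enhancement. Experimental results show that the proposed method noticeably improves the quality and intelligibility of the enhanced speech compared with speech enhancement methods that use only frequency-domain features, and achieves better overall enhancement performance.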
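The abstract does not state the frame length, window, or how the two feature streams are combined inside the network. The following is therefore a minimal sketch under assumed settings: a periodic Hann window, toy frame and hop sizes, and "fusion" as simple concatenation of the raw frame with its log power spectrum. The authors' network-level fusion may differ. It shows how the training inputs and targets described above could be prepared. All function names (`frames`, `log_power_spectrum`, `fused_features`, `training_pair`) are hypothetical.

```python
import cmath
import math
from typing import List, Tuple


def hann(n: int) -> List[float]:
    """Periodic Hann window of length n (the window choice is an assumption)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def frames(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a waveform into overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def log_power_spectrum(frame: List[float], eps: float = 1e-8) -> List[float]:
    """Log power spectrum log(|X_k|^2 + eps) for bins k = 0..N/2 of a windowed frame.

    Uses a naive O(N^2) DFT so the sketch needs only the standard library;
    a real system would use an FFT.
    """
    n = len(frame)
    x = [s * w for s, w in zip(frame, hann(n))]
    lps = []
    for k in range(n // 2 + 1):
        x_k = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        lps.append(math.log(abs(x_k) ** 2 + eps))
    return lps


def fused_features(frame: List[float]) -> List[float]:
    """Hypothetical fusion: raw waveform samples followed by the frame's LPS."""
    return list(frame) + log_power_spectrum(frame)


def training_pair(noisy: List[float], clean: List[float],
                  fuse: bool = True) -> Tuple[List[float], List[float]]:
    """(input, target) for one frame pair.

    fuse=False: noisy waveform only (time-domain input -> clean LPS mapping).
    fuse=True:  noisy waveform + noisy LPS (fused time/frequency input).
    The target is always the clean-speech LPS.
    """
    x = fused_features(noisy) if fuse else list(noisy)
    return x, log_power_spectrum(clean)


if __name__ == "__main__":
    # Toy signals standing in for real recordings (values are illustrative only).
    clean = [math.sin(2 * math.pi * i / 8) for i in range(32)]
    noisy = [c + 0.1 * ((-1) ** i) for i, c in enumerate(clean)]
    for nf, cf in zip(frames(noisy, 8, 4), frames(clean, 8, 4)):
        x, y = training_pair(nf, cf)
        print(len(x), len(y))  # 13 input values (8 samples + 5 LPS bins), 5 target bins
```

With `fuse=False` the pair corresponds to the first stage (noisy waveform mapped to clean LPS), and with `fuse=True` to the fused input. In practice one would use an FFT (for example `numpy.fft.rfft`) and much longer frames (e.g. 512 samples). The naive DFT and 8-sample frames here only keep the sketch dependency-free and easy to check.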

Keywords: speech enhancement, deep neural network, feature fusion, time-domain feature, frequency-domain feature