Computer Engineering ›› 2017, Vol. 43 ›› Issue (12): 278-282, 291. doi: 10.3969/j.issn.1000-3428.2017.12.050

• Development Research and Engineering Application •

Emotional Speech Synthesis Method Based on PSOLA and DCT

LI Yong, WEI Dang, WANG Liuyu

  1. (School of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)
  • Received: 2016-12-06 Online: 2017-12-15 Published: 2017-12-15
  • About the authors: LI Yong (born 1976), male, associate professor, Ph.D.; main research interests are emotional speech synthesis and cognitive networks. WEI Dang and WANG Liuyu, master's students.

Abstract: Emotional speech synthesis can make synthesized speech more expressive. To make synthesized emotional speech more natural, this paper proposes an emotional speech synthesis method that combines time-domain Pitch Synchronous Overlap Add (PSOLA) with the Discrete Cosine Transform (DCT). Prosodic parameters of happy, sad, and neutral speech from an emotional speech database are analyzed to derive emotional rules, which are then used to adjust the fundamental frequency, energy, and duration of each syllable of neutral speech. The DCT is used to adjust the pitch frequency of pitch-marked speech segments, and the PSOLA algorithm then modifies the pitch frequency so that it approaches the fundamental frequency of the target emotional speech. Experimental results show that, compared with using the PSOLA algorithm alone, the proposed method synthesizes speech that is more emotionally expressive, achieves a higher subjective emotion recognition rate, and yields better synthesized speech quality.

Key words: emotional speech synthesis, Discrete Cosine Transform (DCT), Pitch Synchronous Overlap Add (PSOLA), fundamental frequency, duration, energy
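
The DCT-based pitch adjustment mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two scale factors, and the example contour are hypothetical, and the PSOLA resynthesis step is omitted. The sketch assumes that a syllable's fundamental-frequency contour is transformed with an orthonormal DCT, that coefficient 0 (the mean level) and the remaining coefficients (the contour shape) are scaled separately, and that the modified contour is converted to target pitch periods that a PSOLA stage could use to space its pitch marks.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a sequence of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def reshape_f0(f0_hz, level_scale, shape_scale):
    """Scale the mean F0 (coefficient 0) and the excursion around it (other coefficients).

    Both scale factors are hypothetical stand-ins for emotion rules.
    """
    c = dct2(f0_hz)
    c = [c[0] * level_scale] + [ck * shape_scale for ck in c[1:]]
    return idct2(c)


def target_pitch_periods(f0_hz, sample_rate):
    """Pitch period in samples for each F0 value (spacing of synthesis pitch marks)."""
    return [round(sample_rate / f) for f in f0_hz]


# Hypothetical neutral syllable contour in Hz, raised and widened.
neutral_f0 = [200.0, 210.0, 220.0, 210.0, 200.0]
modified_f0 = reshape_f0(neutral_f0, level_scale=1.2, shape_scale=1.5)
periods = target_pitch_periods(modified_f0, sample_rate=16000)
```

Because the DCT is linear and orthonormal, scaling coefficient 0 by `level_scale` multiplies the mean F0 by that factor, while scaling the other coefficients by `shape_scale` multiplies each sample's deviation from the mean. A real system would choose these factors per emotion and per syllable from measured prosody statistics.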
