Emotional Speech Synthesis Method Based on PSOLA and DCT

doi:10.3969/j.issn.1000-3428.2017.12.050

Computer Engineering, 2017, Vol. 43, Issue (12): 278-282, 291.

LI Yong, WEI Dang, WANG Liuyu

(School of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)

Received: 2016-12-06 Online: 2017-12-15 Published: 2017-12-15
About the authors: LI Yong (born 1976), male, associate professor, Ph.D.; main research interests: emotional speech synthesis and cognitive networks. WEI Dang and WANG Liuyu: master's students.

Abstract

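Emotional speech synthesis can make synthesized speech more expressive. To make the synthesized emotional speech more natural, this paper proposes an emotional speech synthesis method that combines time-domain Pitch Synchronous Overlap-Add (PSOLA) with the Discrete Cosine Transform (DCT). The prosodic parameters of happy, sad, and neutral utterances in an emotional speech database are analyzed to derive emotion rules, which are then used to adjust the fundamental frequency (F0), energy, and duration of each syllable of neutral speech. The DCT method first adjusts the pitch of the pitch-marked speech segments, and the PSOLA algorithm then further modifies the pitch so that it approaches the F0 of the target emotional speech. Experimental results show that, compared with emotional speech synthesized by PSOLA alone, the speech produced by the proposed method carries stronger emotional coloring, achieves a higher subjective emotion recognition rate, and is of better quality.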

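Key words: emotional speech synthesis, Discrete Cosine Transform (DCT), Pitch Synchronous Overlap-Add (PSOLA), fundamental frequency, duration, energy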


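The emotion-rule step described in the abstract amounts to rescaling each neutral syllable's prosodic targets. The Python sketch below shows that bookkeeping under stated assumptions: the `Syllable` and `EmotionRule` types, the function names and, above all, the ratio values in `RULES` are hypothetical placeholders, not the rules the authors derived from their database. It also applies one set of ratios to every syllable, which is a simplification. One detail is worth making explicit: if a rule scales energy by a factor r, the waveform amplitude has to be scaled by √r, because energy grows with the square of amplitude.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Syllable:
    f0_hz: float       # mean fundamental frequency of the syllable
    energy: float      # short-time energy, arbitrary units
    duration_s: float  # syllable duration in seconds


@dataclass(frozen=True)
class EmotionRule:
    f0_ratio: float        # multiplier applied to F0
    energy_ratio: float    # multiplier applied to energy
    duration_ratio: float  # multiplier applied to duration


# HYPOTHETICAL placeholder ratios for illustration only; they are NOT
# the emotion rules derived in the paper.
RULES = {
    "happy": EmotionRule(f0_ratio=1.2, energy_ratio=1.3, duration_ratio=0.9),
    "sad": EmotionRule(f0_ratio=0.9, energy_ratio=0.8, duration_ratio=1.2),
}


def apply_emotion_rule(syllables, emotion):
    """Return new prosodic targets for each neutral syllable.

    Simplification: the same ratios are applied to every syllable.
    """
    rule = RULES[emotion]
    return [
        replace(
            s,
            f0_hz=s.f0_hz * rule.f0_ratio,
            energy=s.energy * rule.energy_ratio,
            duration_s=s.duration_s * rule.duration_ratio,
        )
        for s in syllables
    ]


def amplitude_gain(rule):
    """Energy scales with amplitude squared, so the amplitude gain is sqrt(ratio)."""
    return math.sqrt(rule.energy_ratio)


def scale_samples(samples, rule):
    """Apply the energy part of a rule to a syllable's waveform samples."""
    g = amplitude_gain(rule)
    return [g * x for x in samples]
```

Under these placeholder ratios, a neutral syllable with a 200 Hz mean F0 and a duration of 0.25 s would be given a 240 Hz, 0.225 s target for happy speech; the F0 and duration targets are then handed to the pitch- and time-modification stages.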
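For the DCT pitch-modification step, a common way to change the length of one pitch period in the DCT domain, and roughly the idea behind the source-domain method of Muralishankar, Ramakrishnan and Prathibha (Speech Communication, 2004), is this: take the DCT of a period of N samples, keep only the first M coefficients (M < N, raising the pitch) or append zeros (M > N, lowering it), and take an M-point inverse DCT. The pure-Python sketch below uses an orthonormal DCT-II, so multiplying by √(M/N) keeps the signal level unchanged. It is an illustration under assumptions, not the authors' implementation, and the function names are hypothetical. In particular, it works directly on waveform samples, whereas the source-domain approach operates on the linear-prediction residual so that the LP synthesis filter restores the spectral envelope; applied to the raw waveform, as here, the formants would shift along with the pitch.

```python
import math


def dct_ortho(x):
    """Orthonormal DCT-II of a list of samples (O(N^2), fine for one period)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct_ortho(coeffs):
    """Inverse of dct_ortho (an orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        s += sum(
            math.sqrt(2.0 / n) * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(1, n)
        )
        out.append(s)
    return out


def resize_period(period, new_len):
    """Stretch or shrink one pitch period to new_len samples via the DCT."""
    n = len(period)
    coeffs = dct_ortho(period)
    if new_len <= n:
        coeffs = coeffs[:new_len]  # shorter period, higher pitch: drop top coefficients
    else:
        coeffs = coeffs + [0.0] * (new_len - n)  # longer period, lower pitch: zero-pad
    gain = math.sqrt(new_len / n)  # keeps amplitude constant with the orthonormal DCT
    return [gain * v for v in idct_ortho(coeffs)]


def modify_pitch(periods, ratio):
    """Resize each period so that F0 is multiplied by ratio (ratio > 1 raises pitch)."""
    return [resize_period(p, max(1, round(len(p) / ratio))) for p in periods]
```

Because a period of N samples at sampling rate fs corresponds to F0 = fs/N, raising the pitch by 25% shortens a 10-sample period to 8 samples; concatenating the resized periods yields the pitch-modified segment.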
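The PSOLA step can likewise be sketched. The function below is a textbook TD-PSOLA pitch shifter in the sense of Moulines and Charpentier (1990), not a reproduction of the paper's code. It assumes the pitch marks are already known (for example glottal closure instants from a pitch detector), uses a Hann window spanning two local periods centred on each analysis mark, places synthesis marks every P/α samples (α > 1 raises the pitch), and overlap-adds the segment taken from the nearest analysis mark at each synthesis mark. Duration modification, which PSOLA performs by repeating or skipping segments, is left out. `td_psola` and `hann` are hypothetical names.

```python
import math


def hann(length):
    """Symmetric Hann window; two of them offset by half their span sum to one."""
    if length == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def td_psola(signal, marks, alpha):
    """Shift pitch by factor alpha while keeping duration.

    signal: list of samples; marks: sorted analysis pitch-mark indices.
    """
    if len(marks) < 2:
        return list(signal)
    out = [0.0] * len(signal)
    # Local period at each analysis mark: distance to the previous mark.
    periods = [marks[1] - marks[0]] + [marks[i] - marks[i - 1] for i in range(1, len(marks))]
    t = float(marks[0])
    while t <= marks[-1]:
        # Nearest analysis mark to the current synthesis instant (ties: earlier mark).
        i = min(range(len(marks)), key=lambda j: abs(marks[j] - t))
        p = periods[i]
        win = hann(2 * p + 1)
        centre = int(round(t))
        # Overlap-add the two-period windowed segment at the synthesis mark.
        for n in range(-p, p + 1):
            src, dst = marks[i] + n, centre + n
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                out[dst] += win[n + p] * signal[src]
        t += p / alpha  # synthesis spacing sets the new pitch period
    return out
```

With α = 1 and evenly spaced marks the windows sum to one, so the interior of the signal is reproduced unchanged; with α = 2, a pulse train with a 20-sample period comes out with a 10-sample period over the same span.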

CLC number: TP391.42

Cite this article: LI Yong, WEI Dang, WANG Liuyu. Emotional Speech Synthesis Method Based on PSOLA and DCT[J]. Computer Engineering, 2017, 43(12): 278-282, 291.

https://www.ecice06.com/CN/Y2017/V43/I12/278


