Research on F0 Conversion of Emotional Voice Based on Non-negative Matrix Factorization

Computer Engineering, 2018, Vol. 44, Issue (5): 256-261. doi: 10.19678/j.issn.1000-3428.0046903

DENG Yexun, ZHAO Hui

College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China

Received: 2017-04-20 Online: 2018-05-15 Published: 2018-05-15
About the authors: DENG Yexun (born 1991), male, master's student, whose main research interests are emotional speech analysis and conversion and artificial intelligence; ZHAO Hui (corresponding author), professor and doctoral supervisor.
Funding: National Natural Science Foundation of China (61561047).

Abstract

Abstract: To address the discontinuity of F0 modeling in emotional voice conversion and to improve the emotional naturalness of the generated speech, this paper proposes a parameter-controlled F0 conversion method for emotional voice based on Non-negative Matrix Factorization (NMF). The Continuous Wavelet Transform (CWT) is used to parameterize F0 and to model each level of the prosodic structure of speech independently. NMF then decomposes the F0 feature data into base exemplars and their corresponding weights, and the F0 of the target speech is reconstructed by replacing the base exemplars of the speech to be converted with the target base exemplars. In addition, an activation adjustment factor is introduced as a control parameter to optimize the existing model. Experimental results show that, on a small-database corpus, the proposed method has advantages in both F0 reconstruction error and emotional intensity, and can effectively convert neutral speech into emotional speech.

Key words: emotional voice conversion, Continuous Wavelet Transform (CWT), Non-negative Matrix Factorization (NMF), F0 conversion, prosody level
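
The exemplar-replacement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the cost function nor the exact form of the activation adjustment factor. The sketch assumes paired source and target dictionaries whose columns are corresponding base exemplars, assumes the F0 features (for example, CWT coefficients) have already been mapped to non-negative values, uses the Lee-Seung multiplicative update for the Euclidean cost, and treats the adjustment factor as a plain scalar gain `alpha` on the activations. The names `estimate_activations`, `convert_f0_features`, and `alpha` are hypothetical.

```python
import numpy as np


def estimate_activations(X, A, n_iter=200, eps=1e-9, seed=0):
    """Estimate non-negative weights H so that X ~= A @ H, keeping A fixed.

    X: (n_features, n_frames) non-negative feature matrix.
    A: (n_features, n_exemplars) non-negative dictionary of base exemplars.
    Uses the Lee-Seung multiplicative update for the Euclidean cost
    (an assumption; the paper may use a different cost).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[1], X.shape[1])) + eps  # strictly positive start
    AtX = A.T @ X
    AtA = A.T @ A
    for _ in range(n_iter):
        # Multiplicative update keeps every entry of H non-negative.
        H *= AtX / (AtA @ H + eps)
    return H


def convert_f0_features(X_src, A_src, A_tgt, alpha=1.0, n_iter=200):
    """Swap the source dictionary for the target one and rebuild the features.

    alpha is a hypothetical scalar stand-in for the activation adjustment
    factor; its actual definition is not given in the abstract.
    """
    H = estimate_activations(X_src, A_src, n_iter=n_iter)
    Y = A_tgt @ (alpha * H)  # reconstruct with target exemplars
    return Y, H


# Toy data only: random matrices stand in for real F0 features.
rng = np.random.default_rng(42)
A_src = rng.random((10, 6))          # 6 source exemplars, 10 feature dims
A_tgt = rng.random((10, 6))          # paired target exemplars
X_src = A_src @ rng.random((6, 20))  # 20 frames of synthetic source features
Y, H = convert_f0_features(X_src, A_src, A_tgt, alpha=1.2)
print(Y.shape, H.shape)
```

Keeping the dictionaries fixed and estimating only the weights is what allows the same weights to be reused with the target exemplars; in this simplified form, setting `alpha` above 1 only scales the reconstructed features.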

CLC Number: TP391

DENG Yexun, ZHAO Hui. Research on F0 Conversion of Emotional Voice Based on Non-negative Matrix Factorization[J]. Computer Engineering, 2018, 44(5): 256-261.

http://www.ecice06.com/CN/Y2018/V44/I5/256


