
Computer Engineering, 2011, Vol. 37, Issue (5): 256-257,260. doi: 10.3969/j.issn.1000-3428.2011.05.087

• Development Research and Design Technology •

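Automatic Selection Algorithm of Mandarin Corpus for Voice Conversion

SHEN Ya-min (沈亚敏), ZHAO Hui (赵晖), ZHANG Quan (张权), TANG Chao-jing (唐朝京)

  1. (College of Electronic Science and Engineering, National University of Defence Technology, Changsha 410073)
  • Publication date: 2011-03-05 Release date: 2012-10-31
  • About the authors: SHEN Ya-min (born 1983), female, Ph.D. candidate, main research interests: multimedia communication, voice conversion, and network voice security; ZHAO Hui, Ph.D.; ZHANG Quan, associate professor; TANG Chao-jing, professor and doctoral supervisor
  • Funding:
    Supported by a national ministry-level fund project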

Automatic Selection Algorithm of Mandarin Corpus for Voice Conversion

SHEN Ya-min, ZHAO Hui, ZHANG Quan, TANG Chao-jing   

  1. (College of Electronic Science and Engineering, National University of Defence Technology, Changsha 410073, China)
  • Online:2011-03-05 Published:2012-10-31

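Abstract: To realize voice conversion and build a Mandarin voice conversion corpus that meets its requirements, an automatic corpus selection algorithm based on the semi-syllable model is proposed. Because voice conversion training requires relatively little corpus material, the semi-syllable is chosen as the basic unit of the corpus. On this basis, sentences are selected automatically from the original corpus: since voice conversion is rather sensitive to speaker characteristics, an evaluation function scores the sentences in the original corpus according to the number of occurrences of each semi-syllable. Experimental results show that, compared with traditional algorithms, when the algorithm automatically selects 615 Mandarin sentences for the speech database, they cover 97.8% of the toned semi-syllables, and coverage efficiency, coverage rate, and sparsity are considerably improved.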

Keywords: Chinese information processing, speech database, voice conversion, coverage rate

Abstract: Voice conversion requires a corpus built to suit its needs. This paper proposes an automatic corpus selection algorithm based on the semi-syllable model. Because voice conversion training needs only a small number of corpus sentences, the semi-syllable is chosen as the basic unit of the corpus. The algorithm automatically selects sentences from an original corpus, using an evaluation function to score each sentence according to the number and kind of semi-syllables it contains. When 615 sentences are chosen, the selected text covers 97.8% of the toned semi-syllables. The covering rate, coverage efficiency and sparse rate are clearly better than those of conventional algorithms.
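
The selection loop summarized in the abstract can be sketched as a greedy procedure: score every remaining sentence, take the best one, update the set of covered semi-syllables, and repeat. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The abstract does not give the evaluation function, so the `score` used here (each not-yet-covered unit weighted by the inverse of its occurrence count, normalized by sentence length) is a hypothetical stand-in. The function names, the toy corpus, and the decomposition of each syllable into an initial plus a toned final are likewise assumptions.

```python
from collections import Counter


def score(units, covered, freq):
    """Hypothetical evaluation function (not taken from the paper).

    Each unit type not yet covered contributes 1/freq, so rare
    semi-syllables weigh more; the sum is divided by sentence length
    so that long sentences are not favoured automatically.
    """
    if not units:
        return 0.0
    new_units = set(units) - covered
    return sum(1.0 / freq[u] for u in new_units) / len(units)


def select_sentences(corpus, max_sentences):
    """Greedily select sentence ids from `corpus`.

    `corpus` maps a sentence id to its list of semi-syllable labels.
    Returns the selected ids (in selection order) and the coverage
    rate, i.e. covered unit types / unit types in the whole corpus.
    """
    # Occurrence count of every semi-syllable in the original corpus.
    freq = Counter(u for units in corpus.values() for u in units)
    covered, selected = set(), []
    remaining = dict(corpus)
    while remaining and len(selected) < max_sentences:
        best = max(remaining, key=lambda s: score(remaining[s], covered, freq))
        if score(remaining[best], covered, freq) == 0.0:
            break  # no remaining sentence adds a new unit
        selected.append(best)
        covered |= set(remaining.pop(best))
    coverage = len(covered) / len(freq) if freq else 0.0
    return selected, coverage


# Toy corpus: each syllable is split into an initial and a toned final
# (an assumed labelling, chosen only for illustration).
toy_corpus = {
    "s1": ["n", "i3", "h", "ao3"],
    "s2": ["n", "i3", "h", "ao3", "m", "a5"],
    "s3": ["zh", "ong1", "g", "uo2"],
    "s4": ["h", "ao3"],
}

chosen, rate = select_sentences(toy_corpus, max_sentences=3)
print(chosen, rate)
```

On this toy input the loop stops after two sentences, because the remaining ones add no new semi-syllable. A real run would replace `toy_corpus` with semi-syllable transcriptions of the original text corpus and cap the selection at the desired corpus size.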

Key words: Chinese information processing, speech database, voice conversion, covering rate

CLC number: