Computer Engineering ›› 2009, Vol. 35 ›› Issue (17): 1-3. doi: 10.3969/j.issn.1000-3428.2009.17.001

• Doctoral Dissertation •

Automatic Selecting Algorithm of Bimodal Corpus Based on Visual Triphone

ZHAO Hui, LIN Cheng-long, TANG Chao-jing

  1. (College of Electronic Science and Engineering, National University of Defence Technology, Changsha 410073)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-09-05 Published: 2009-09-05

Abstract: Visual speech synthesis requires a suitable bimodal (audio-visual) corpus. This paper proposes an algorithm that selects such a corpus automatically. Existing triphone models are grouped according to the lip articulation features observed in video, yielding visual triphones. On this basis the algorithm selects sentences from the original corpus, using an evaluation function to score each candidate sentence. Compared with other bimodal corpora, the resulting corpus shows considerable improvement in coverage rate, coverage efficiency and the distribution of high-frequency words, laying a foundation for realistic visual speech synthesis.

Key words: visual speech synthesis, bimodal corpus, visual triphone, evaluation function
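
The selection procedure summarized in the abstract can be sketched roughly as follows. The abstract gives neither the viseme classes nor the evaluation function, so everything concrete here is an assumption made for illustration: the `VISEME` table, the frequency-weighted `score`, and the names `visual_triphones` and `select_corpus` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical phoneme-to-viseme table (an assumption, not the paper's classes):
# phonemes that look alike on the lips share one viseme class.
VISEME = {
    "b": "BMP", "p": "BMP", "m": "BMP",
    "f": "F",
    "d": "DT", "t": "DT", "n": "DT", "l": "DT",
    "a": "A", "o": "O", "u": "U", "i": "I", "e": "E",
}


def visual_triphones(phonemes):
    """Map a phoneme sequence to its list of visual triphones.

    Every phoneme is replaced by its viseme class (unknown phonemes are kept
    as-is), so triphones that look the same on the lips collapse together.
    """
    visemes = [VISEME.get(p, p) for p in phonemes]
    return [tuple(visemes[i:i + 3]) for i in range(len(visemes) - 2)]


def score(units, covered, freq):
    """Assumed evaluation function: summed corpus frequency of the visual
    triphones a sentence would newly cover, divided by its length in units."""
    if not units:
        return 0.0
    new_units = set(units) - covered
    return sum(freq[u] for u in new_units) / len(units)


def select_corpus(sentences, max_sentences):
    """Greedily pick the sentences that add the most uncovered visual triphones.

    `sentences` maps a sentence id to its phoneme list. Returns the chosen ids
    and the fraction of distinct visual triphones they cover.
    """
    units = {sid: visual_triphones(ph) for sid, ph in sentences.items()}
    freq = Counter(u for us in units.values() for u in us)
    covered, chosen = set(), []
    remaining = list(units)
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda sid: score(units[sid], covered, freq))
        if score(units[best], covered, freq) == 0:
            break  # no remaining sentence adds a new visual triphone
        chosen.append(best)
        covered |= set(units[best])
        remaining.remove(best)
    coverage = len(covered) / len(freq) if freq else 0.0
    return chosen, coverage
```

In this sketch, two sentences whose phonemes differ but whose lip shapes match (for example "b a t a" and "p a d a" under the assumed table) yield identical visual triphones, so the greedy loop selects only one of them. Dividing by sentence length is one simple way to favor coverage efficiency; the paper's actual evaluation function may weigh these factors differently.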
