Computer Engineering

Research on Coefficient of Neighboring Dialect Differences Based on Hidden Markov Model

WANG Xuefei a, LIU Jun b

  1. (a. Modern Education Technology Center; b. Department of Computer Science and Technology, College of Information Engineering, Huangshan University, Huangshan, Anhui 245011, China)
  • Received: 2015-01-20 Online: 2016-04-15 Published: 2016-04-15

基于隐马可夫模型的邻近方言差异系数研究

王雪飞 a,刘珺 b   

  1. (黄山学院 a.现代教育技术中心; b.信息工程学院计算机科学技术系,安徽 黄山245011)
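  • About the authors: WANG Xuefei (b. 1963), male, laboratory technician; main research interests: WSN energy consumption, integrated sensing, and data plotting. LIU Jun, undergraduate student.
  • Funding:
    Supported by the Science and Technology Research Fund for the Cultural Heritage Protection Field of the State Administration of Cultural Heritage (2013-YB-HT-015).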

Abstract: To quantify dialect differences between neighboring areas, speakers read a text A of independent words aloud in their dialect to produce a sound file M. The HTK tool is used to build an acoustic feature parameter set S_M from M, from which the dialect diversity factor is calculated. Applying this method to i consecutive neighboring regions yields the corresponding parameter sets Si_Mi; each sound file Mi is matched against the sound-character (word) mapping table of the sample area (i=0) to obtain the text Ai for village i. The diversity factor ξ is defined as the ratio of text content differences between Ai and A0 (the sample area or village). Analysis of ξ across consecutive ancient villages shows that dialects differ little within 3 neighboring villages (by geographic location), where ξ lies between 0.88 and 1, whereas at a distance of 9 villages the overall ξ falls below 0.6 and the ξ for phrases falls below 0.2, indicating that the difference grows rapidly. On this basis, the paper establishes a dialect distance and proposes the concept of dialect radius, confirming that the radius of the tested dialect is 8 (eight villages).

Key words: dialect sound, coefficient of dialect differences, Hidden Markov Toolkit (HTK) software, Hidden Markov Model (HMM), dialect radius
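
The relationship between ξ and the dialect radius described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on two assumptions. First, because ξ close to 1 is reported as a small difference, ξ is computed here as the share of word positions where Ai agrees with A0. Second, the radius is taken as the largest village distance at which ξ has not yet fallen below a threshold, with 0.6 chosen from the abstract's overall figure. The names `diversity_factor` and `dialect_radius` and the ξ series are hypothetical and are not data from the paper.

```python
from typing import Sequence


def diversity_factor(reference: Sequence[str], recognized: Sequence[str]) -> float:
    """Share of word positions where the recognized text Ai agrees with A0.

    Assumption: xi near 1 means little difference, so it is modeled here
    as an agreement ratio over aligned word positions.
    """
    if not reference:
        raise ValueError("reference text must not be empty")
    if len(reference) != len(recognized):
        raise ValueError("texts must contain the same number of words")
    matches = sum(ref == rec for ref, rec in zip(reference, recognized))
    return matches / len(reference)


def dialect_radius(xi_by_distance: Sequence[float], threshold: float = 0.6) -> int:
    """Largest distance d such that xi >= threshold for every distance 1..d.

    xi_by_distance[0] is the value for the village at distance 1.
    The threshold of 0.6 is an assumption, not a rule stated in the paper.
    """
    radius = 0
    for distance, xi in enumerate(xi_by_distance, start=1):
        if xi < threshold:
            break
        radius = distance
    return radius


# Illustrative values only: four read words, one of which is recognized differently.
a0 = ["water", "house", "rice", "mountain"]
a3 = ["water", "house", "tea", "mountain"]
xi_example = diversity_factor(a0, a3)

# Illustrative xi series for villages at distances 1 to 9 from the sample village.
xi_series = [1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.55]
radius_example = dialect_radius(xi_series)
```

With this made-up series, ξ first drops below 0.6 at the ninth village, so the sketch returns a radius of 8, which matches the shape of the result reported in the abstract without reproducing its measurements.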

摘要: 量化邻近地域的方言差异性研究,运用方言朗读独立字词文本A形成声音文件M,使用HTK工具将M文件构造为声学特征参数集S_M,计算方言差异系数。在邻近连续i个地域基础上得到相应的Si_Mi,同时使声音Mi结合对比样本区域(i=0)音-字(词)映射表,形成i村落并对应文本Ai。差异系数ξ定义为Ai与A0(样本区域或村落)之间的文本内容差异之比。分析连续古村落ξ值特征结果表明,方言在邻近3个村落(地理位置)的ξ值介于0.88~1时,差异较小,而当邻近9个村落的ξ值(综合)小于0.6及词组ξ值小于0.2时,差异快速变大,建立方言距离并提出方言半径概念,确认所测试方言的半径为8(8个村落)。

关键词: 方言语音, 方言差异系数, HTK软件, 隐马可夫模型, 方言半径
