
Computer Engineering, 2020, Vol. 46, Issue (2): 286-291. doi: 10.19678/j.issn.1000-3428.0053836

• Development Research and Engineering Application •

Semantic Segmentation Method of Tibetan Sentences

ROU Te (a,b,c), SE Chajia (a), CAI Rangjia (a,b,c)

  1. a. Computer College; b. Provincial Key Laboratory of Tibetan Intelligent Information Processing and Machine Translation; c. Key Laboratory of Tibetan Information Processing, Ministry of Education, Qinghai Normal University, Xining 810008, China
  • Received: 2019-01-28 Revised: 2019-04-03 Published: 2019-04-29
  • About the authors: ROU Te (born 1975), male, associate professor and Ph.D. candidate, whose main research interest is natural language understanding; SE Chajia, Ph.D. candidate; CAI Rangjia, professor and doctoral supervisor.
  • Funding:
    National Key Research and Development Program of China (2017YFB1402200); National Natural Science Foundation of China (61662061); National Social Science Foundation of China (14BYY132).

Abstract: A sentence is an encoding of characters or words combined according to grammatical rules, and sentence semantic segmentation is the corresponding decoding problem: recovering how the sentence was put together in order to parse its meaning. Performing semantic analysis directly on the output of Tibetan word segmentation works at too fine a granularity and is prone to word ambiguity, whereas taking the whole sentence as the unit of analysis is too coarse to reveal the sentence's semantics well. To address this, this paper proposes a semantic segmentation method for Tibetan sentences that works with semantic chunks, units whose length lies between that of a word and that of a sentence. After the sentence has been segmented into words and labeled, the word segmentation results are recombined to divide the sentence into several semantic chunks, and a dilated convolutional neural network model is used to identify the chunks. Experimental results show that the method achieves an accuracy of 94.68% on semantic segmentation of Tibetan sentences.

Key words: semantic segmentation of sentences, semantic chunk, semantic analysis, dilated convolutional neural network, Tibetan
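
The two ideas named in the abstract, dilated convolution and the recombination of labeled words into semantic chunks, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the network architecture, the kernel weights, or the tag set, so the function names `dilated_conv1d` and `group_chunks`, the B-/I-/O tagging scheme, and the chunk labels `NP` and `VP` are all assumptions made for illustration, and the tokens are placeholders rather than real Tibetan words.

```python
from typing import List, Tuple


def dilated_conv1d(xs: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Zero-padded ('same' length) 1D dilated convolution over a sequence.

    Uses the cross-correlation form common in CNN libraries. With a
    3-tap kernel, dilation d makes each output see positions i-d, i, i+d,
    so the receptive field widens without adding weights.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(xs)):
        total = 0.0
        for j, w in enumerate(kernel):
            pos = i + (j - center) * dilation  # taps are spaced `dilation` apart
            if 0 <= pos < len(xs):             # out-of-range positions count as zero
                total += w * xs[pos]
        out.append(total)
    return out


def group_chunks(words: List[str], tags: List[str]) -> List[Tuple[str, List[str]]]:
    """Recombine words into chunks from hypothetical B-X / I-X / O tags."""
    chunks: List[Tuple[str, List[str]]] = []
    for word, tag in zip(words, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and chunks and chunks[-1][0] == label:
            chunks[-1][1].append(word)         # continue the current chunk
        else:
            chunks.append((label or "O", [word]))  # start a new chunk
    return chunks


# Placeholder tokens and assumed tags, for illustration only.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
print(dilated_conv1d(signal, [1.0, 1.0, 1.0], dilation=1))
print(dilated_conv1d(signal, [1.0, 1.0, 1.0], dilation=2))
print(group_chunks(["w1", "w2", "w3", "w4"], ["B-NP", "I-NP", "O", "B-VP"]))
```

In a real tagger the per-word tags would come from the trained network rather than being supplied by hand; the sketch only shows how a larger dilation lets each position draw on more distant context, and how per-word labels can be merged into units longer than a word but shorter than a sentence.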
