Abstract: Existing Tibetan syntactic frameworks are complex, which hinders their use in Tibetan natural language processing. To address this, this paper proposes a discriminative approach to Tibetan dependency parsing: the parsing model is trained with the perceptron algorithm, and a bottom-up CYK algorithm decodes the maximum spanning tree. Experimental results show that the method reaches a parsing accuracy of 81.2% on a manually annotated test set, and that it can be applied in practice to building a Tibetan dependency treebank and to other natural language processing tasks.
Key words:
Tibetan dependency syntax,
syntax tagging specification,
maximum spanning tree,
feature template,
dependency syntax,
perceptron
CLC number:
HUA Que-Cai-Rang (华却才让), ZHAO Hai-Xing (赵海兴). Tibetan Text Dependency Syntactic Analysis Based on Discriminant[J]. Computer Engineering (计算机工程), 2013, 39(4): 300-304.