Abstract:
Based on a statistical analysis of a Tibetan corpus and a study of Tibetan grammar and the structure of modern Tibetan characters, this paper presents a model of a property analysis system for Tibetan characters. It designs four character databases for the system (basic-component, combined-component, coarse-grained structure, and fine-grained structure) together with an algorithm for analyzing character properties. The system supports in-depth study of the properties of modern Tibetan characters, such as usage frequency, structure, character length, component decomposition, and the position and frequency of each component. It provides a theoretical foundation for Tibetan keyboard layout, Tibetan input methods, Tibetan search engines, Tibetan-related machine translation, and network information security, and thereby supports the further development of Tibetan information processing.
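As a rough illustration of the kind of corpus-based property analysis described here, the following Python sketch counts character frequency, character length, and per-component position and frequency. It is not the paper's algorithm. It assumes that a Tibetan character (syllable) is delimited by the tsheg (U+0F0B) or shad (U+0F0D) and that each Unicode code point counts as one component, which stands in for the paper's component and structure databases. The names `split_characters` and `analyze` are hypothetical.

```python
from collections import Counter

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (assumed character delimiter)
SHAD = "\u0f0d"   # Tibetan clause/sentence-ending mark


def split_characters(text):
    """Split text into Tibetan characters (syllables) at tsheg, shad, and whitespace."""
    for mark in (TSHEG, SHAD):
        text = text.replace(mark, " ")
    return text.split()


def analyze(text):
    """Return character frequency, character length, component frequency,
    and (component, position) frequency for a piece of corpus text.

    Assumption: one Unicode code point = one component. The paper's own
    component tables are not reproduced here.
    """
    chars = split_characters(text)
    char_freq = Counter(chars)
    component_freq = Counter()
    position_freq = Counter()
    for ch in chars:
        for pos, comp in enumerate(ch):
            component_freq[comp] += 1
            position_freq[(comp, pos)] += 1
    lengths = {ch: len(ch) for ch in char_freq}
    return char_freq, lengths, component_freq, position_freq


# Tiny sample: the syllables U+0F56 U+0F7C U+0F51 and U+0F61 U+0F72 U+0F42,
# written with escapes so the code points are unambiguous.
sample = "\u0f56\u0f7c\u0f51\u0f0b\u0f61\u0f72\u0f42\u0f0b\u0f56\u0f7c\u0f51\u0f0d"
char_freq, lengths, component_freq, position_freq = analyze(sample)
```

Counting code points is a deliberate simplification: a real system would map stacked letters to combined components and classify each character's structure before tallying.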
Key words:
Chinese information processing, property, component, character structure
摘要: 通过对藏语语料库的统计和现代藏文字结构的分析,研究现代藏文字属性分析系统的模型,设计基本构件字表库、组合构件字表库、粗粒度结构字表库及细粒度结构字表库,并阐述各字表库的结构特征,介绍藏文字属性分析算法。运用该算法及藏文字属性分析系统模型,解析现代藏文字的使用频度、结构、字长、构件分解、各构件的位置及频度等属性,从而为藏文键盘布局、藏文输入法研究、藏文搜索引擎、机器翻译和网络信息安全等提供理论依据。
关键词:
中文信息处理, 属性, 构件, 字结构
CLC Number:
CAI Zhi-Jie, CAI Rang-Zhuo-Ma. Design of Tibetan Character Property Analysis System Based on Corpora[J]. Computer Engineering, 2011, 37(22): 270-272.
才智杰, 才让卓玛. 基于语料库的藏文字属性分析系统设计[J]. 计算机工程, 2011, 37(22): 270-272.