摘要: 为减小语料库对中文指代消解的影响,设计一个基于无监督聚类的中文名词短语指代消解平台并给出其预处理、特征选择及聚类过程。采用3种评测工具对中文新闻语料进行评测,在自动情况下,平均F值为59.43%。实验结果表明,该中文指代消解平台能够较好地解决中文缺少语料库的问题。
关键词:
无监督,
名词短语,
指代消解,
聚类,
自然语言,
语料
Abstract: The lack of publicly available corpora is a major obstacle in Chinese NLP research. To reduce the dependence of Chinese coreference resolution on annotated corpora, this paper presents a Chinese noun phrase coreference resolution platform based on unsupervised clustering and describes its preprocessing, feature selection, and clustering stages. The platform is evaluated on a Chinese news corpus with three evaluation tools; in the automatic setting, the average F-measure is 59.43%. Experimental results show that the platform copes well with the shortage of Chinese corpora.
Key words:
unsupervised,
noun phrase,
coreference resolution,
clustering,
natural language,
corpus
高俊伟, 孔芳, 朱巧明, 李培峰, 华秀丽. 无监督中文名词短语指代消解研究[J]. 计算机工程, 2012, 38(17): 189-191.
GAO Jun-Wei, KONG Fang, ZHU Qiao-Ming, LI Pei-Feng, HUA Xiu-Li. Research of Unsupervised Chinese Noun Phrase Coreference Resolution[J]. Computer Engineering, 2012, 38(17): 189-191.