Computer Engineering ›› 2007, Vol. 33 ›› Issue (21): 199-201. doi: 10.3969/j.issn.1000-3428.2007.21.071

• Artificial Intelligence and Recognition Technology •

Extraction of Entity Relation Templates from Text Collections

CHEN Xiao-ying, HU Yi, LU Ru-zhan   

  1. (Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200240)
  • Online: 2007-11-05 Published: 2007-11-05

Abstract: Identifying the relations between entities helps in understanding text and improves the accuracy of information retrieval. This paper studies the acquisition of Chinese entity relation templates from text collections and proposes a bootstrapping training method called STG. The method uses sequence alignment techniques from bioinformatics to compute semantic templates from the contexts of entities, applies an evaluation mechanism to select templates, and expands the set of tuples to improve the quality of the next training iteration. Experimental results show that the templates generated by STG cover a large number of tuples and reach an accuracy of up to 99%.

Key words: information extraction, machine learning, bootstrapping
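
The abstract does not give the details of STG, so the following is only a minimal sketch of the general idea of deriving a template by aligning two entity contexts. It assumes the text is already segmented into tokens and that entity mentions have been replaced by the placeholders `<E1>` and `<E2>`. It uses a longest-common-subsequence alignment as a simple stand-in for the sequence alignment technique, which may differ from what the paper actually uses. The names `align_to_template`, `matches`, and `WILDCARD` are hypothetical.

```python
from typing import List, Sequence

WILDCARD = "*"  # hypothetical marker for a stretch where the two contexts disagree


def align_to_template(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Generalize two token sequences into one template via LCS alignment.

    Tokens shared by both sequences (in order) are kept; each unmatched
    stretch between them collapses into a single wildcard.
    """
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = 1 + lcs[i + 1][j + 1]
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    template: List[str] = []
    i = j = 0
    gap = False  # True while we are skipping unmatched tokens
    while i < n and j < m:
        if a[i] == b[j]:
            if gap:
                template.append(WILDCARD)
                gap = False
            template.append(a[i])
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1
            gap = True
        else:
            j += 1
            gap = True
    # Leftover tokens on either side form a trailing unmatched stretch.
    if gap or i < n or j < m:
        template.append(WILDCARD)
    return template


def matches(template: Sequence[str], tokens: Sequence[str]) -> bool:
    """Return True if tokens fit the template; a wildcard absorbs zero or more tokens."""
    # reachable[j] is True when the template slots processed so far can consume tokens[:j]
    reachable = [True] + [False] * len(tokens)
    for slot in template:
        if slot == WILDCARD:
            # From any reachable position, every later position is reachable too.
            seen = False
            for j in range(len(tokens) + 1):
                seen = seen or reachable[j]
                reachable[j] = seen
        else:
            nxt = [False] * (len(tokens) + 1)
            for j, tok in enumerate(tokens):
                if reachable[j] and tok == slot:
                    nxt[j + 1] = True
            reachable = nxt
    return reachable[len(tokens)]
```

In a bootstrapping loop of the kind the abstract describes, a template produced this way would be scored by some evaluation mechanism, and sentences accepted by `matches` would supply new tuples for the next iteration. The scoring step is not sketched here because the abstract does not specify it.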
