Computer Engineering, 2007, Vol. 33, Issue 21, pp. 199-201. doi: 10.3969/j.issn.1000-3428.2007.21.071

Artificial Intelligence and Recognition Technology

Extraction of Entity Relation Templates from Text Collections

CHEN Xiao-ying, HU Yi, LU Ru-zhan   

(Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200240)
Online: 2007-11-05   Published: 2007-11-05

Abstract: Identifying the relations between entities helps in understanding the meaning of text and thereby improves the accuracy of information retrieval. This paper studies the extraction of Chinese entity relation templates from text collections and proposes a bootstrapping method called STG. The method applies sequence alignment techniques from bioinformatics to generate semantic templates from the contexts of Chinese entities. A new evaluation model is used to select high-quality templates, and the set of tuples is expanded so that the next training iteration starts from higher-quality data. Experimental results show that the templates generated by STG not only cover a large number of tuples but also reach an accuracy of 99%.
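The abstract does not spell out how sequence alignment produces a template. One plausible approach is sketched below. It assumes the contexts of two seed tuples have already been segmented into tokens, with the entities replaced by the placeholders `<E1>` and `<E2>`. A Needleman-Wunsch global alignment then keeps the tokens both contexts share and collapses everything else into a `*` wildcard. The scoring values (match +2, mismatch and gap -1), the function names `align` and `make_template`, the placeholders, and the English example tokens (the paper works on Chinese text) are all assumptions for illustration. This is not STG's actual procedure.

```python
from typing import List, Optional, Tuple

# Assumption: one aligned position is (token from a or None, token from b or None).
Pair = Tuple[Optional[str], Optional[str]]


def align(a: List[str], b: List[str], match: int = 2,
          mismatch: int = -1, gap: int = -1) -> List[Pair]:
    """Needleman-Wunsch global alignment of two token sequences."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))  # gap in b
            i -= 1
        else:
            pairs.append((None, b[j - 1]))  # gap in a
            j -= 1
    pairs.reverse()
    return pairs


def make_template(a: List[str], b: List[str]) -> List[str]:
    """Keep tokens shared by both contexts; merge every differing run into one '*'."""
    template: List[str] = []
    for x, y in align(a, b):
        token = x if x == y else "*"
        if token == "*" and template and template[-1] == "*":
            continue  # collapse adjacent wildcards into one
        template.append(token)
    return template


# Hypothetical seed contexts for a (company, founder) relation.
ctx1 = ["<E1>", "was", "founded", "by", "<E2>"]
ctx2 = ["<E1>", "was", "originally", "founded", "by", "<E2>"]
print(make_template(ctx1, ctx2))
# ['<E1>', 'was', '*', 'founded', 'by', '<E2>']
```

Global alignment is used here so the whole context, including both entity placeholders, is anchored at each end. A local alignment (Smith-Waterman) would be an equally reasonable choice if templates should capture only the most similar core of the contexts.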

Key words: information extraction, machine learning, bootstrapping
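The template-selection and tuple-expansion step of the bootstrapping cycle might look like the sketch below. The abstract does not define STG's evaluation model, so this sketch swaps in a Snowball-style confidence score as a stand-in. That score is the number of matches that agree with known tuples divided by all matches whose first entity is already known. Several details are hypothetical:

- the data format, an entity pair plus the tokens between the two entities;
- the `*` wildcard convention;
- the 0.8 threshold;
- the assumption that each first entity has exactly one correct second entity;
- all function names.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical data format: (entity1, entity2, tokens between the two entities).
Context = Tuple[str, str, Tuple[str, ...]]
Template = Tuple[str, ...]  # "*" matches one or more tokens


def matches(template: Sequence[str], tokens: Sequence[str]) -> bool:
    """True if the template matches the whole token sequence."""
    if not template:
        return not tokens
    head, rest = template[0], template[1:]
    if head == "*":
        # Let the wildcard absorb every possible non-empty prefix.
        return any(matches(rest, tokens[k:]) for k in range(1, len(tokens) + 1))
    return bool(tokens) and tokens[0] == head and matches(rest, tokens[1:])


def confidence(template: Template, corpus: List[Context],
               known: Dict[str, str]) -> float:
    """Snowball-style score: agreeing hits / hits whose entity1 is already known."""
    pos = neg = 0
    for e1, e2, tokens in corpus:
        if e1 in known and matches(template, tokens):
            if known[e1] == e2:
                pos += 1
            else:
                neg += 1
    return pos / (pos + neg) if pos + neg else 0.0


def bootstrap_step(templates: List[Template], corpus: List[Context],
                   known: Dict[str, str], threshold: float = 0.8):
    """Keep templates scoring >= threshold, then use them to add unseen tuples."""
    kept = [t for t in templates if confidence(t, corpus, known) >= threshold]
    expanded = dict(known)
    for e1, e2, tokens in corpus:
        if e1 not in expanded and any(matches(t, tokens) for t in kept):
            expanded[e1] = e2
    return kept, expanded


# Hypothetical corpus for a (company, headquarters) relation.
corpus: List[Context] = [
    ("Microsoft", "Redmond", ("is", "headquartered", "in")),
    ("Google", "Mountain View", ("is", "headquartered", "in")),
    ("IBM", "Armonk", ("is", "based", "in")),
    ("Google", "Larry Page", ("was", "founded", "by")),
    ("Intel", "Santa Clara", ("is", "headquartered", "in")),
    ("Apple", "Cupertino", ("is", "based", "in")),
    ("Apple", "Steve Jobs", ("was", "founded", "by")),
]
known = {"Microsoft": "Redmond", "IBM": "Armonk", "Google": "Mountain View"}
templates: List[Template] = [("is", "headquartered", "in"), ("is", "*", "in"), ("was", "*", "by")]

kept, expanded = bootstrap_step(templates, corpus, known)
print(kept)  # [('is', 'headquartered', 'in'), ('is', '*', 'in')]
print({k: v for k, v in expanded.items() if k not in known})
# {'Intel': 'Santa Clara', 'Apple': 'Cupertino'}
```

The founder template scores 0.0. For the already-known entity Google, it points to a different second entity than the known tuple. Only the location templates survive, and they add Intel and Apple to the tuple set used in the next iteration.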
