Abstract: To cope with training data drawn from diverse sources, this paper proposes a text information extraction algorithm based on a multiple-template hidden Markov model. The algorithm first clusters the training data by format into several classes, each representing one template, and then applies hidden Markov models on top of this clustering to extract information from text. Experimental results show that the new algorithm achieves higher precision and recall than the original algorithm, which does not cluster the training data into multiple templates.
Key words:
Information extraction; Hidden Markov model; Multiple templates; Clustering
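The two-stage procedure summarized in the abstract (group training records by format, then run a hidden Markov model per group) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the format features, the clustering method, or the HMM parameterization. Here, grouping records by an exact punctuation signature stands in for format-based clustering, each template gets a count-based HMM with add-one smoothing, and the names `format_key`, `TinyHMM`, `train_multi_template`, and `extract`, as well as the toy records, are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def format_key(tokens):
    """Assumed format feature: the sequence of punctuation tokens in a record."""
    return "".join(t for t in tokens if not t.isalnum())


class TinyHMM:
    """Supervised HMM estimated from counts, with add-one smoothing."""

    def __init__(self, sequences):
        # sequences: list of records, each a list of (token, state) pairs
        self.states = sorted({s for seq in sequences for _, s in seq})
        self.vocab = {t for seq in sequences for t, _ in seq}
        self.start = Counter()
        self.trans = defaultdict(Counter)
        self.emit = defaultdict(Counter)
        for seq in sequences:
            self.start[seq[0][1]] += 1
            for tok, st in seq:
                self.emit[st][tok] += 1
            for (_, a), (_, b) in zip(seq, seq[1:]):
                self.trans[a][b] += 1

    @staticmethod
    def _logp(counter, key, n_outcomes):
        # Add-one smoothed log-probability of `key` under `counter`.
        return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

    def viterbi(self, tokens):
        n = len(self.states)
        v = len(self.vocab) + 1  # one extra outcome reserved for unseen tokens
        score = {s: self._logp(self.start, s, n) + self._logp(self.emit[s], tokens[0], v)
                 for s in self.states}
        back = []
        for tok in tokens[1:]:
            prev, score, ptr = score, {}, {}
            for s in self.states:
                best = max(self.states,
                           key=lambda p: prev[p] + self._logp(self.trans[p], s, n))
                score[s] = (prev[best] + self._logp(self.trans[best], s, n)
                            + self._logp(self.emit[s], tok, v))
                ptr[s] = best
            back.append(ptr)
        # Follow back-pointers from the best final state.
        state = max(self.states, key=score.get)
        path = [state]
        for ptr in reversed(back):
            state = ptr[state]
            path.append(state)
        return path[::-1]


def train_multi_template(labeled):
    """Group labeled records by format key, then train one HMM per group."""
    groups = defaultdict(list)
    for seq in labeled:
        groups[format_key([t for t, _ in seq])].append(seq)
    return {key: TinyHMM(seqs) for key, seqs in groups.items()}


def extract(models, tokens):
    """Label tokens with the HMM of the matching template, or None if no template fits."""
    model = models.get(format_key(tokens))
    if model is None:
        return None
    return list(zip(tokens, model.viterbi(tokens)))


# Toy training data in two invented formats: "Author , Year ." and "Year : Author".
training = [
    [("Smith", "author"), (",", "sep"), ("2006", "year"), (".", "sep")],
    [("Jones", "author"), (",", "sep"), ("2005", "year"), (".", "sep")],
    [("2006", "year"), (":", "sep"), ("Smith", "author")],
    [("2004", "year"), (":", "sep"), ("Lee", "author")],
]
models = train_multi_template(training)
print(extract(models, ["Lee", ",", "2004", "."]))
```

Training a separate model per template keeps the transition statistics of one layout from diluting those of another, which is the intuition behind comparing against a single HMM trained on unclustered data. A real system would likely need a softer clustering criterion than exact signature matching, plus a fallback for records that fit no known template.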
ZHONG Minjuan (钟敏娟), HAO Qian (郝谦), LIU Yunzhong (刘云中). Information Extraction Algorithm Based on Multiple Templates Using Hidden Markov Model[J]. Computer Engineering (计算机工程), 2006, 32(2): 203-205.