Computer Engineering

• Artificial Intelligence and Recognition Technology •

Biological Evidence Sentence Extraction with Combination of LS-SVM and Conditional Random Field

ZHANG Liyuan, JI Donghong

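  1. (School of Computer, Wuhan University, Wuhan 430072)
  • Received: 2014-06-20 Published: 2015-05-15 Online: 2015-05-15
  • About the authors: ZHANG Liyuan (born 1990), male, master's student, main research interests: natural language processing and machine learning; JI Donghong, professor, Ph.D., doctoral supervisor.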

Biological Evidence Sentence Extraction with Combination of LS-SVM and Conditional Random Field

ZHANG Liyuan,JI Donghong   

  1. (School of Computer, Wuhan University, Wuhan 430072, China)
  • Received:2014-06-20 Online:2015-05-15 Published:2015-05-15

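Abstract: For the biological evidence sentence extraction problem, extraction systems built on traditional features and Bayesian classification models are inefficient, which leads to low recall in the extraction results. To address this, two systems are built: one for the single-sentence extraction problem and one for the mixed multi-sentence extraction problem. System 1 combines a Least Squares Support Vector Machine (LS-SVM) model with a new feature combination and a sentence-filtering module, addressing the incomplete coverage of traditional features. System 2 incorporates a Conditional Random Field (CRF) model into System 1, together with rules for identifying candidate sentences, addressing the problem of merging consecutive sentences. Experimental results show that on the single-sentence extraction problem, System 1 improves recall and F-value by 39.7% and 12.9%, respectively, over the Bayesian-model baseline system; on the mixed multi-sentence extraction problem, System 2 improves recall by 37.1% over the system based on learning from positive and unlabeled examples.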
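The abstract names LS-SVM as the classifier behind System 1 but does not give its formulation. As a hedged illustration only, the standard LS-SVM classifier replaces the SVM's quadratic program with a single linear system: with labels y in {-1, +1} and Omega_ij = y_i y_j K(x_i, x_j), it solves [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1] and predicts sign(sum_i alpha_i y_i K(x, x_i) + b). The RBF kernel, the values of `gamma` and `sigma`, the function names, and the toy feature vectors are assumptions for illustration, not the authors' features or configuration.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of A and rows of B,
    # turned into Gaussian (RBF) kernel values. Kernel choice is an assumption.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM KKT linear system for bias b and multipliers alpha.

    y must hold labels in {-1, +1}; gamma is the regularization constant.
    Hypothetical helper for illustration, not the paper's implementation.
    """
    n = X.shape[0]
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y                              # first row: sum_i alpha_i y_i = 0
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma     # ridge term from the squared-error loss
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, y, b, alpha, X_new, sigma=1.0):
    # Decision value sum_i alpha_i y_i K(x, x_i) + b, then its sign.
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (alpha * y) + b)

# Toy two-dimensional "sentence feature" vectors (made up for illustration).
X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
y = np.array([-1.0, -1.0, 1.0, 1.0])   # +1 = evidence sentence, -1 = not
b, alpha = lssvm_train(X, y)
print(lssvm_predict(X, y, b, alpha, np.array([[2.9, 3.2], [0.1, 0.0]])))
```

Because training reduces to `np.linalg.solve` on an (n+1) x (n+1) system, LS-SVM trades the sparsity of a standard SVM for a simpler solver; with many training sentences, the dense kernel matrix becomes the main cost.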

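Key words: biological evidence sentence, feature combination, Support Vector Machine, Least Squares Support Vector Machine, Conditional Random Field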

Abstract: For the Gene Ontology Evidence Sentence (GOES) extraction problem, systems built on traditional features and a Bayesian classification model are inefficient, which leads to low recall. To address this, two systems are built, one for single-sentence extraction and one for joined-sentence extraction. System 1 combines a Least Squares Support Vector Machine (LS-SVM) model with a new combination of features and a sentence-filtering module, which addresses the incomplete feature coverage of traditional systems. System 2 adds a Conditional Random Field (CRF) model and rules for identifying candidate sentences to System 1, which solves the problem of merging consecutive sentences. Experimental results show that in the single-sentence extraction problem, System 1 improves recall and F-value by 39.7% and 12.9%, respectively, over the Bayesian-model baseline. In the joined-sentence extraction problem, System 2 improves recall by 37.1% over the Learning from Positive and Unlabeled Documents for Retrieval (LPU) system.
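The abstract says System 2 uses a CRF together with candidate-sentence rules to merge consecutive sentences, but it does not describe the model. One common way to apply a linear-chain CRF to this kind of task is to tag each sentence of a passage and decode the best tag sequence with the Viterbi algorithm, then join adjacent tagged sentences into one evidence passage. The sketch below shows only that decoding-and-merging step under loudly stated assumptions: the O/B/I tag set, the hand-set emission and transition log-potentials, and the names `viterbi` and `merge_spans` are hypothetical, and CRF training as well as the authors' actual features and rules are not shown.

```python
import numpy as np

LABELS = ["O", "B", "I"]  # hypothetical tags: outside, begins, inside an evidence passage

def viterbi(emissions, transitions):
    """Highest-scoring label path for a linear-chain CRF.

    emissions:   (n_sentences, n_labels) log-potentials for each sentence.
    transitions: (n_labels, n_labels) log-potential for moving from label i to label j.
    """
    n, k = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        # cand[i, j]: best path ending in label i at t-1, then label j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = np.argmax(cand, axis=0)
        score = np.max(cand, axis=0)
    path = [int(np.argmax(score))]
    for t in range(n - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return [LABELS[i] for i in reversed(path)]

def merge_spans(tags):
    """Group runs like B, I, I into inclusive (start, end) sentence-index spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.append((start, i - 1))  # a new B closes the previous span
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(tags) - 1))
    return spans

NEG = -1e9  # effectively forbids a transition
transitions = np.array([
    [0.0, 0.0, NEG],   # from O: O-to-I forbidden
    [0.0, 0.0, 1.0],   # from B: continuing with I is favored
    [0.0, 0.0, 0.5],   # from I
])
# Made-up per-sentence label probabilities for a four-sentence passage.
emissions = np.log(np.array([
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.3, 0.2, 0.5],
    [0.8, 0.1, 0.1],
]))
tags = viterbi(emissions, transitions)
print(tags)               # ['O', 'B', 'I', 'O']
print(merge_spans(tags))  # [(1, 2)]
```

Setting the O-to-I transition to a large negative value is one simple way to keep a passage from starting mid-span; in a trained CRF these scores would be learned from annotated passages rather than set by hand.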

Key words: biological evidence sentence, feature combination, Support Vector Machine (SVM), Least Squares Support Vector Machine (LS-SVM), Conditional Random Field (CRF)
