Computer Engineering (计算机工程), 2011, Vol. 37, Issue (10): 184-186. doi: 10.3969/j.issn.1000-3428.2011.10.063

• Artificial Intelligence and Recognition Technology •

Protein-protein Interaction Extraction from Medical Literature Based on Multiple Kernels Learning

TANG Nan (唐楠), YANG Zhi-hao (杨志豪), LIN Hong-fei (林鸿飞), LI Yan-peng (李彦鹏)

  1. (College of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China)
  • Online: 2011-05-20 Published: 2011-05-20
  • About the authors: TANG Nan (born 1986), female, master's student, main research interest: text mining; YANG Zhi-hao, associate professor, Ph.D.; LIN Hong-fei, professor, Ph.D., doctoral supervisor; LI Yan-peng, Ph.D. candidate
  • Funding:
    National Natural Science Foundation of China (60373095, 60673039); National "863" Program of China (2006AA01Z151)

Abstract: Automatically extracting protein-protein interaction information from biomedical literature is important for building protein knowledge networks and developing new drugs. This paper presents a multiple kernel learning approach that extracts protein-protein interactions from the literature automatically. The approach combines a feature-based kernel, a tree kernel, and a graph kernel. In particular, it extends the shortest path-enclosed tree and the dependency path tree to capture richer contextual information. Experiments on the AImed corpus show that the method achieves state-of-the-art performance among comparable evaluations, with an F-score of 63.9% and an AUC of 87.83%.

Key words: text mining, information extraction, protein-protein interaction extraction, kernel method, multiple kernel learning
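The abstract describes combining a feature-based kernel, a tree kernel, and a graph kernel, but does not state how the three are weighted. As an illustration only, the sketch below shows one common way to fuse several kernels: cosine-normalize each Gram matrix, then take a convex (non-negative, sum-to-one) weighted sum. The function names, the toy matrices, and the weights are all hypothetical and are not taken from the paper.

```python
import math


def normalize_gram(gram):
    """Cosine-normalize a Gram matrix so that every diagonal entry becomes 1.

    Assumes all diagonal entries are strictly positive.
    """
    n = len(gram)
    return [
        [gram[i][j] / math.sqrt(gram[i][i] * gram[j][j]) for j in range(n)]
        for i in range(n)
    ]


def combine_kernels(grams, weights):
    """Return the convex combination of normalized Gram matrices.

    `weights` are hypothetical mixing coefficients; they are rescaled to sum to 1.
    """
    if len(grams) != len(weights):
        raise ValueError("need exactly one weight per Gram matrix")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")

    n = len(grams[0])
    combined = [[0.0] * n for _ in range(n)]
    for gram, weight in zip(grams, weights):
        normed = normalize_gram(gram)
        for i in range(n):
            for j in range(n):
                combined[i][j] += (weight / total) * normed[i][j]
    return combined


# Toy Gram matrices over three candidate protein pairs (made-up values).
feature_gram = [[4.0, 2.0, 0.0], [2.0, 2.0, 1.0], [0.0, 1.0, 1.0]]
tree_gram = [[3.0, 1.0, 1.0], [1.0, 2.0, 0.0], [1.0, 0.0, 5.0]]
graph_gram = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.1], [0.2, 0.1, 1.0]]

# Hypothetical weights; in practice they would be tuned or learned.
combined_gram = combine_kernels(
    [feature_gram, tree_gram, graph_gram], [0.5, 0.3, 0.2]
)
```

Because a non-negative weighted sum of positive semidefinite matrices is itself positive semidefinite, the combined matrix remains a valid kernel and could be passed to any classifier that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.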
