Abstract:
Automatically extracting protein-protein interaction information from biomedical literature can help build protein relation networks and design new drugs. This paper presents an approach based on multiple kernel learning that automatically extracts protein-protein interactions from biomedical literature. The approach combines a feature-based kernel, a tree kernel and a graph kernel. In particular, it extends the shortest path-enclosed tree and the dependency path tree to capture richer contextual information. Experimental evaluations show that the method achieves state-of-the-art performance relative to comparable evaluations, with an F-score of 63.9% and an AUC of 87.83% on the AImed corpus.
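The kernel-combination idea can be illustrated with a minimal sketch. It assumes each base kernel (feature-based, tree, graph) has already been computed as a Gram matrix over the same candidate protein pairs. Each matrix is cosine-normalized, and the results are summed with fixed non-negative weights. The function names, toy matrices and weights are hypothetical. A multiple kernel learning solver would learn the weights from training data, which this sketch does not attempt, so it should not be read as the paper's method.

```python
import math

# A precomputed Gram matrix over the same n candidate protein pairs.
Matrix = list[list[float]]


def normalize(k: Matrix) -> Matrix:
    """Cosine-normalize: K'(x, y) = K(x, y) / sqrt(K(x, x) * K(y, y))."""
    n = len(k)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = math.sqrt(k[i][i] * k[j][j])
            # Guard against empty instances whose self-similarity is zero.
            out[i][j] = k[i][j] / denom if denom > 0 else 0.0
    return out


def combine_kernels(kernels: list[Matrix], weights: list[float]) -> Matrix:
    """Weighted sum of normalized kernels (fixed weights, not learned)."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to keep the sum a valid kernel")
    normed = [normalize(k) for k in kernels]
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, normed)) for j in range(n)]
        for i in range(n)
    ]


# Toy 2x2 Gram matrices standing in for the three base kernels (hypothetical values).
k_feature = [[4.0, 2.0], [2.0, 4.0]]
k_tree = [[9.0, 0.0], [0.0, 1.0]]
k_graph = [[1.0, 1.0], [1.0, 1.0]]

combined = combine_kernels([k_feature, k_tree, k_graph], [0.5, 0.25, 0.25])
print(combined)  # [[1.0, 0.5], [0.5, 1.0]]
```

A non-negative weighted sum of valid (positive semidefinite) kernels is itself a valid kernel. The combined matrix could therefore be passed to any SVM implementation that accepts a precomputed kernel.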
Key words:
text mining,
information extraction,
protein-protein interaction extraction,
kernel method,
multiple kernel learning
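Both tree structures mentioned in the abstract are built around the part of a parse that links the two protein mentions. A minimal sketch of one such ingredient, the shortest path between two protein tokens in a dependency parse, follows. It assumes the parse is given as (head, dependent) token-index pairs and treats edges as undirected. The function name and the example parse are hypothetical. The sketch does not reproduce the paper's extended tree construction.

```python
from collections import deque
from typing import Optional


def shortest_dependency_path(
    edges: list[tuple[int, int]], source: int, target: int
) -> Optional[list[int]]:
    """Breadth-first search over an undirected view of the dependency edges.

    Returns the token indices from source to target, or None if unconnected.
    """
    graph: dict[int, list[int]] = {}
    for head, dep in edges:
        graph.setdefault(head, []).append(dep)
        graph.setdefault(dep, []).append(head)

    parent: dict[int, Optional[int]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent links back to the source, then reverse.
            path = []
            cur: Optional[int] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical parse of "PROT1 interacts with PROT2" as (head, dependent) pairs.
tokens = ["PROT1", "interacts", "with", "PROT2"]
edges = [(1, 0), (1, 2), (2, 3)]
path = shortest_dependency_path(edges, 0, 3)
print([tokens[i] for i in path])  # ['PROT1', 'interacts', 'with', 'PROT2']
```

Breadth-first search suits this task because dependency edges are unweighted. The first time the target token is dequeued, the recovered path is a shortest one.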
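Chinese abstract (translated): Extracting protein-protein interaction relations from biomedical literature is important for building protein knowledge networks and developing new drugs. To this end, a method based on multiple kernel learning is proposed for automatically extracting protein relation information from the literature. The method fuses a feature-based kernel, a tree kernel and a graph kernel, and it extends the shortest-path dependency tree and the dependency path to exploit more contextual relation information. Experiments on the AImed corpus yield an F-score of 63.9% and an AUC of 87.83%, showing that the method performs well.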
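Chinese keywords (translated):
text mining,
information extraction,
protein relation extraction,
kernel method,
multiple kernel learning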
CLC Number:
TANG Nan, YANG Zhi-Hao, LIN Hong-Fei, LI Pan-Feng. Protein-protein Interaction Extraction from Medical Literature Based on Multiple Kernels Learning[J]. Computer Engineering, 2011, 37(10): 184-186.
唐楠, 杨志豪, 林鸿飞, 李彦鹏. 基于多核学习的医学文献蛋白质关系抽取[J]. 计算机工程, 2011, 37(10): 184-186.