Abstract:
A method is proposed for recognizing bio-entities (proteins, genes, etc.) in the biomedical literature. The method is based on Conditional Random Fields (CRFs) and selects appropriate features for entity recognition. Contextual cues are then exploited to further improve performance. Experimental results show that introducing contextual cues improves the F-score by nearly 3 percent over the CRF-based method alone, yielding fairly good overall performance.
Key words:
text mining, bio-entity recognition, Conditional Random Fields (CRFs), contextual cues
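To make the idea of feature selection for CRF-based tagging concrete, the following is a minimal sketch of per-token feature extraction of the kind commonly fed to a linear-chain CRF. The abstract does not list the features or define the contextual cues actually used, so everything here is an illustrative assumption: the function names (`word_shape`, `token_features`, `sentence_features`), the feature names, and the use of neighbouring tokens as a simple stand-in for context.

```python
import re


def word_shape(token):
    """Collapse a token to a coarse shape: uppercase -> A, lowercase -> a, digit -> 0."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[0-9]", "0", shape)
    return shape


def token_features(tokens, i, window=1):
    """Build a feature dict for tokens[i] (hypothetical feature set, not the paper's)."""
    token = tokens[i]
    feats = {
        "word.lower": token.lower(),
        "word.shape": word_shape(token),
        "suffix3": token[-3:].lower(),
        "has_digit": any(c.isdigit() for c in token),
        "has_hyphen": "-" in token,
    }
    # Neighbouring tokens as a simple form of context; positions outside
    # the sentence are padded with a sentinel value.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"ctx[{offset:+d}].lower"
        feats[key] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats


def sentence_features(tokens, window=1):
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i, window) for i in range(len(tokens))]
```

Called as `sentence_features(["IL-2", "gene", "expression"])`, this returns one dictionary per token. A list of such dictionaries per sentence, paired with a label sequence, is a common input format for CRF toolkits.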
YANG Zhi-hao; LIN Hong-fei; LI Yan-peng. Bio-entity Recognition Based on Combination of Conditional Random Fields and Contextual Cues[J]. Computer Engineering, 2008, 34(7): 203-204.
杨志豪;林鸿飞;李彦鹏. 条件随机域与上下文线索结合的生物实体识别[J]. 计算机工程, 2008, 34(7): 203-204.