Abstract: This paper presents an ontology-driven approach to extracting information from semi-structured Web documents. The approach uses document structure to locate the required data blocks and then applies ontology-based feature matching to extract the data items they contain. A user-guided, interactive information extraction prototype system has also been implemented. The approach effectively addresses semantic problems that arise in information extraction, such as synonymy and polysemy, and copes with data items that are missing or that appear in no fixed order.
Key words:
Information extraction; Ontology; Resource Description Framework (RDF(S))/Web Ontology Language (OWL); Biological data
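
To make the idea of ontology-based matching concrete, the following is a minimal sketch, not the authors' implementation. It assumes a data block has already been located and consists of "label: value" lines. The concept names, synonym lists, and the function `extract_record` are all hypothetical and chosen only for illustration. The sketch shows how a synonym table lets differently labelled items map to the same concept, how matching by label rather than position tolerates arbitrary item order, and how missing items are left empty instead of causing a failure. It does not attempt to resolve polysemy.

```python
import re

# Hypothetical mini-ontology (an assumption, not taken from the paper):
# each concept lists the labels that may introduce it on a Web page.
ONTOLOGY = {
    "organism": ["organism", "species", "source organism"],
    "gene_name": ["gene", "gene name", "symbol"],
    "sequence_length": ["length", "sequence length"],
}

# Reverse index from a lower-cased label to its concept.
LABEL_TO_CONCEPT = {
    label: concept
    for concept, labels in ONTOLOGY.items()
    for label in labels
}

# One "label: value" pair per line; surrounding whitespace is ignored.
LINE_PATTERN = re.compile(r"\s*([^:]+?)\s*:\s*(.+?)\s*$")


def extract_record(block: str) -> dict:
    """Map the labelled lines of one data block onto ontology concepts."""
    # Start with every concept empty so missing items stay None.
    record = {concept: None for concept in ONTOLOGY}
    for line in block.splitlines():
        match = LINE_PATTERN.match(line)
        if not match:
            continue  # not a "label: value" line
        label = match.group(1).lower()
        concept = LABEL_TO_CONCEPT.get(label)
        # Keep the first value seen for each concept; ignore unknown labels.
        if concept is not None and record[concept] is None:
            record[concept] = match.group(2)
    return record


# Items use synonyms, arrive in arbitrary order, and one item is missing.
example_block = "Symbol: BRCA1\nSpecies: Homo sapiens\nComment: not in ontology"
example_record = extract_record(example_block)
```

In this sketch `example_record` maps `gene_name` and `organism` to the values found under their synonyms and leaves `sequence_length` as `None`. A real system would draw the synonym lists from an RDF(S)/OWL ontology rather than a hard-coded dictionary.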
成 瑜,何洁月. 本体驱动的半结构化 Web 生物数据抽取[J]. 计算机工程, 2006, 32(5): 192-194.
CHENG Yu, HE Jieyue. Ontology-driven Extracting of Semi-structure Web Biological Data[J]. Computer Engineering, 2006, 32(5): 192-194.