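Abstract: When building a topic-oriented information extraction system, the extraction requirements are the prerequisite for defining the extraction task. Because extraction requirements described in natural language waste computing resources and lower extraction efficiency, this paper proposes a formal definition of topic-oriented information extraction requirements and studies the relations among extraction requirements. An equivalent reduced requirement set is constructed by splitting the reduced requirement set, which eliminates the redundancy among multiple extraction requirements. Experimental results show that the equivalent reduced requirement set improves the running efficiency of multi-requirement extraction tasks.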
Keywords: Web information extraction, topic, extraction requirement, redundancy analysis, description model
Abstract: In constructing a topic-oriented information extraction system, extraction requirements are the core of all extraction tasks. Because describing extraction requirements in natural language wastes computational resources and reduces extraction efficiency, a formal definition of topic-oriented information extraction requirements is proposed. On this basis, the relations between extraction requirements are studied. A reduced requirement set splitting method is used to construct an equivalent reduced requirement set, eliminating the redundancy among multiple extraction requirements. Experimental results show that the equivalent reduced requirement set improves the running efficiency of multi-requirement extraction tasks.
Key words: Web information extraction, topic, extraction requirement, redundancy analysis, description model
于龙, 蹇强. 面向主题的信息抽取需求描述与分析[J]. 计算机工程, 2012, 38(23): 57-59.
YU Long, JIAN Qiang. Description and Analysis of Topic-oriented Information Extraction Requirement[J]. Computer Engineering, 2012, 38(23): 57-59.