
Computer Engineering ›› 2012, Vol. 38 ›› Issue (23): 57-59. doi: 10.3969/j.issn.1000-3428.2012.23.013

• Software Technology and Database •

Description and Analysis of Topic-oriented Information Extraction Requirement

YU Long, JIAN Qiang

  1. (Institute of Communication Engineering, PLA University of Science and Technology, Nanjing 210007, China)
  • Received: 2012-03-27 Online: 2012-12-05 Published: 2012-12-03
  • About the authors: YU Long (b. 1976), male, Ph.D. candidate; main research interests: Web information extraction, data mining, and Internet of Things technology. JIAN Qiang, Ph.D.
  • Supported by:
    National "863" Program of China (2010AA012404)

Abstract: When building a topic-oriented information extraction system, extraction requirements are the prerequisite for defining the extraction tasks. Because extraction requirements described in natural language waste computational resources and reduce extraction efficiency, this paper proposes a formal definition of topic-oriented information extraction requirements and studies the relations among extraction requirements. An equivalent reduced requirement set is constructed by splitting the reduced requirement set, which eliminates the redundancy among multiple extraction requirements. Experimental results show that the equivalent reduced requirement set improves the running efficiency of multi-requirement extraction tasks.

Key words: Web information extraction, topic, extraction requirement, redundancy analysis, description model
