
Computer Engineering, 2011, Vol. 37, Issue (20): 261-263. doi: 10.3969/j.issn.1000-3428.2011.20.089

• Development Research and Design Technology •

Research on Forest Products Trade Text Messages Structuring Based on Semantic

CHEN Zhao, LI Jia

  1. (Department of Information Management, School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China)
  • Received: 2011-03-18 Online: 2011-10-20 Published: 2011-10-20
  • About the authors: CHEN Zhao (born 1970), male, associate professor, main research interest: information push; LI Jia, master's student
  • Funding:
    Fundamental Research Funds for the Central Universities, project "Research and Practice of a Multi-Source Heterogeneous Forest Products Information Push Platform" (BLYX200928)

Abstract: To meet the need for structured storage of information when pushing forest products trade text messages, this paper combines the basic principles of semantic recognition with rule-based information extraction and proposes a rule-based method for extracting information from forest products trade texts. Drawing on the characteristics of such texts, it defines hierarchical text recognition rules, creates a database and data tables against which the recognition rules are matched, and gives the regular expressions used for rule matching together with the rules for truncating text content, so that the specific facts of interest can be extracted and stored in the database in structured form. Structured extraction of text from real forest products trade websites shows that the approach has good practical value for forest products trade information push.

Key words: semantics, forest products, trade text messages, structuring, information extraction, recognition rules
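
The pipeline summarized in the abstract (recognition rules kept in database tables, regular-expression matching, and structured storage of the extracted facts) can be illustrated with a minimal sketch. The abstract does not give the paper's actual rules, schema, or sample texts, so everything below is an assumption made for illustration: the table names `rules` and `trade_info`, the four fields, the patterns, and the sample message are all hypothetical, and SQLite stands in for whatever database the authors used.

```python
import re
import sqlite3

# Hypothetical recognition rules: (field name, regular expression).
# These are illustrative only; the paper's real rules are not in the abstract.
RULES = [
    ("product", r"Product:\s*(?P<value>[^;]+)"),
    ("price", r"Price:\s*(?P<value>\d+(?:\.\d+)?)"),
    ("unit", r"Unit:\s*(?P<value>[^;]+)"),
    ("origin", r"Origin:\s*(?P<value>[^;]+)"),
]


def build_database(conn):
    """Create one table for the rules and one for the extracted records."""
    conn.execute(
        "CREATE TABLE rules (field TEXT PRIMARY KEY, pattern TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE trade_info "
        "(product TEXT, price REAL, unit TEXT, origin TEXT)"
    )
    conn.executemany("INSERT INTO rules (field, pattern) VALUES (?, ?)", RULES)


def extract(conn, text):
    """Apply every stored rule to the text; unmatched fields become None."""
    record = {}
    for field, pattern in conn.execute("SELECT field, pattern FROM rules"):
        match = re.search(pattern, text)
        # The named group cuts out only the content of interest.
        record[field] = match.group("value").strip() if match else None
    return record


def store(conn, record):
    """Save one extracted record as a structured row."""
    conn.execute(
        "INSERT INTO trade_info (product, price, unit, origin) "
        "VALUES (:product, :price, :unit, :origin)",
        record,
    )


conn = sqlite3.connect(":memory:")
build_database(conn)

# Hypothetical trade message, not taken from any real website.
sample = (
    "Product: pine sawn timber; Price: 1850.50; "
    "Unit: CNY/m3; Origin: Heilongjiang"
)
record = extract(conn, sample)
store(conn, record)
rows = conn.execute(
    "SELECT product, price, unit, origin FROM trade_info"
).fetchall()
print(rows)
```

Keeping the patterns in a table rather than in code means a rule can be added or adjusted without touching the extraction routine, which is one plausible reading of the abstract's "database and data tables" step.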

CLC number: