Computer Engineering

• Artificial Intelligence and Recognition Technology •

Comment Object Extraction Based on Pattern Matching and Semi-supervised Learning

SONG Hui, SHI Nan-sheng

  1. (School of Computer Science & Technology, Donghua University, Shanghai 201620, China)
  • Received: 2012-10-17 Online: 2013-10-15 Published: 2013-10-14
  • About the authors: SONG Hui (1971-), female, professor; research interests: Web information mining and intelligent information processing. SHI Nan-sheng, master's student.

Abstract: To extract comment objects (opinion targets) from product reviews, this paper proposes a method based on pattern matching and semi-supervised learning. A seed rule set, obtained from statistics over a large number of samples, is used to extract effective evaluation sentences. Opinion targets are then extracted from these sentences by combining syntactic structures with the Part of Speech (POS)-distance Correlation Algorithm (PCA). The seed rules and the extracted opinion targets are stored in their corresponding pattern libraries, and semi-supervised learning with dynamic rule replacement is then used to learn new rules and expand the set of opinion targets. Experimental results show that the method achieves good extraction performance, demonstrating its feasibility.

Key words: comment object, opinion mining, Part of Speech (POS) combination, Part of Speech (POS)-distance Correlation Algorithm (PCA), pattern matching, effective evaluation sentence

CLC number:
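The abstract describes a loop in which rules select effective evaluation sentences, opinion targets are extracted from those sentences, and the targets in turn yield new rules. The Python sketch below illustrates such a bootstrapping loop under explicit assumptions that do not come from the paper: sentences are already segmented and POS-tagged as (word, tag) pairs using a toy tag set; a "rule" is a contiguous POS-tag pattern; the target in a sentence is the noun with the highest summed inverse distance to the adjectives, which is a stand-in for the paper's PCA and not a reproduction of it; and "dynamic replacement" is modeled as keeping the `max_rules` rules that match the most sentences. The functions `matches`, `is_effective`, `extract_target`, `propose_rules` and `bootstrap`, their parameters, and the toy corpus are all hypothetical.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple

Token = Tuple[str, str]   # (word, POS tag); the toy tag set is an assumption
Sentence = List[Token]
Rule = Tuple[str, ...]    # hypothetical rule format: a contiguous POS-tag pattern


def matches(sentence: Sentence, rule: Rule) -> bool:
    """True if the rule's POS pattern occurs contiguously in the sentence."""
    tags = [pos for _, pos in sentence]
    n = len(rule)
    return any(tuple(tags[i:i + n]) == rule for i in range(len(tags) - n + 1))


def is_effective(sentence: Sentence, rules: Set[Rule]) -> bool:
    """Treat a sentence as an effective evaluation sentence if any rule matches."""
    return any(matches(sentence, r) for r in rules)


def extract_target(sentence: Sentence) -> Optional[str]:
    """Stand-in distance heuristic (NOT the paper's PCA): score each noun by
    the sum of 1/distance to every adjective and return the best-scoring noun."""
    adjectives = [j for j, (_, pos) in enumerate(sentence) if pos == "a"]
    nouns = [i for i, (_, pos) in enumerate(sentence) if pos == "n"]
    if not adjectives or not nouns:
        return None
    score = {i: sum(1.0 / abs(i - j) for j in adjectives) for i in nouns}
    best = max(nouns, key=lambda i: (score[i], -i))  # ties: earlier noun wins
    return sentence[best][0]


def propose_rules(sentence: Sentence, targets: Set[str], window: int = 3) -> List[Rule]:
    """Hypothetical rule induction: from each known target, take the POS span
    up to the first adjective within `window` tokens after it."""
    proposals = []
    for i, (word, _) in enumerate(sentence):
        if word not in targets:
            continue
        for j in range(i + 1, min(i + 1 + window, len(sentence))):
            if sentence[j][1] == "a":
                proposals.append(tuple(pos for _, pos in sentence[i:j + 1]))
                break
    return proposals


def bootstrap(corpus: List[Sentence], seed_rules: Set[Rule], max_rules: int = 5,
              min_support: int = 2, max_iter: int = 10) -> Tuple[Set[Rule], Set[str]]:
    rules, targets = set(seed_rules), set()
    for _ in range(max_iter):
        # 1. Rules select effective sentences; targets are extracted from them.
        found = {extract_target(s) for s in corpus if is_effective(s, rules)}
        found.discard(None)
        new_targets = targets | found
        # 2. Known targets propose new rules; keep those with enough support.
        counts = Counter(r for s in corpus for r in propose_rules(s, new_targets))
        accepted = {r for r, c in counts.items() if c >= min_support}
        # 3. Replacement step: rank old and new rules by how many sentences
        #    they match and keep only the top `max_rules`.
        pool = rules | accepted
        ranked = sorted(pool, key=lambda r: (-sum(matches(s, r) for s in corpus), r))
        new_rules = set(ranked[:max_rules])
        if new_rules == rules and new_targets == targets:
            break  # converged: no new rules or targets
        rules, targets = new_rules, new_targets
    return rules, targets


# Toy hand-tagged corpus (n=noun, d=adverb, a=adjective, r=pronoun, v=verb, u=particle).
CORPUS: List[Sentence] = [
    [("屏幕", "n"), ("很", "d"), ("清晰", "a")],            # screen very clear
    [("电池", "n"), ("非常", "d"), ("耐用", "a")],          # battery very durable
    [("电池", "n"), ("耐用", "a")],                         # battery durable
    [("屏幕", "n"), ("清晰", "a")],                         # screen clear
    [("我", "r"), ("买", "v"), ("了", "u"), ("手机", "n")],  # I bought a phone
    [("外观", "n"), ("漂亮", "a")],                         # appearance beautiful
]

if __name__ == "__main__":
    rules, targets = bootstrap(CORPUS, {("n", "d", "a")})
    print("rules:", sorted(rules))
    print("targets:", sorted(targets))
```

With only the seed rule (n, d, a), the first round finds 屏幕 and 电池; the rule (n, a) learned from those targets then picks up 外观 in a later round, while the sentence without an adjective is never treated as effective. Passing `max_rules=1` shows the replacement step in this sketch: the seed rule is displaced by the better-supported (n, a).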