Research of Attribute Word Extraction Method in Chinese Product Comment

doi:10.3969/j.issn.1000-3428.2011.12.009

Computer Engineering (计算机工程), 2011, Vol. 37, Issue 12: 26-28.

LI Chun-liang, ZHU Yan-hui, XU Ye-qiang

(Institute of Computer & Communication, Hunan University of Technology, Zhuzhou 412008, China)

Received: 2011-01-04  Online: 2011-06-20  Published: 2011-06-20
About the authors: LI Chun-liang (b. 1984), male, master's student, main research interest: text classification; ZHU Yan-hui, professor; XU Ye-qiang, master's student
Funding:
Youth Fund Project for Humanities and Social Sciences Research of the Ministry of Education (09YJCZH 019); Natural Science Foundation of Hunan Province (10JJ3002); Research Fund of China National Packaging Corporation (2008-XK13)

Abstract

To address the relatively low precision and coverage of existing attribute word extraction methods, this paper uses Baidu Baike together with the co-occurrence proportion of adjacent words after word segmentation to identify new domain-specific words, which reduces the impact of segmentation errors on attribute word recognition. Candidate attribute words are obtained from a corpus of Chinese product comments by means of part-of-speech sequence templates, which comprise noun and noun-phrase templates and verb and verb-phrase templates. Statistical and natural language processing techniques are then used to filter the candidates. Experimental results show that, on 3,623 mobile phone comment articles, the method obtains 1,732 attribute words with a precision of 0.565, a recall of 0.726, and an F-measure (harmonic mean) of 0.636, which the authors regard as good extraction performance.

Keywords: product comment, new word recognition, sequence template, attribute word
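The abstract names three steps without giving details: new-word identification from adjacent co-occurrence, candidate extraction by part-of-speech sequence templates, and statistical filtering. The Python sketch below is only an illustration of how such steps might look and is not the authors' implementation. The template list, the simplified tag set (`n`, `v`, `d`, `a`), the frequency threshold `min_count`, the toy sentences, and all function names are assumptions introduced here. The Baidu Baike lookup is omitted.

```python
from collections import Counter

# Hypothetical POS-sequence templates over a simplified tag set
# ("n" = noun, "v" = verb). The paper's actual templates are not given here.
TEMPLATES = [("n",), ("n", "n"), ("v",), ("v", "n")]


def adjacent_cooccurrence_ratio(sentences, left, right):
    """Fraction of occurrences of `left` immediately followed by `right`."""
    left_count = pair_count = 0
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token == left:
                left_count += 1
                if i + 1 < len(tokens) and tokens[i + 1] == right:
                    pair_count += 1
    return pair_count / left_count if left_count else 0.0


def extract_candidates(tagged_sentence, templates=TEMPLATES):
    """Return every word span whose POS tags match one of the templates."""
    candidates = []
    for start in range(len(tagged_sentence)):
        for template in templates:
            window = tagged_sentence[start:start + len(template)]
            if tuple(tag for _, tag in window) == template:
                candidates.append("".join(word for word, _ in window))
    return candidates


def filter_by_frequency(candidates, min_count=2):
    """Keep candidates seen at least `min_count` times (an assumed filter)."""
    counts = Counter(candidates)
    return {word for word, count in counts.items() if count >= min_count}


def precision_recall_f(extracted, gold):
    """Precision, recall, and their harmonic mean for two sets of words."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data: the segmenter split one word into two tokens that always co-occur.
segmented = [["蓝", "牙", "很", "稳定"], ["蓝", "牙", "连接", "快"]]
print(adjacent_cooccurrence_ratio(segmented, "蓝", "牙"))

tagged = [("屏幕", "n"), ("分辨率", "n"), ("很", "d"), ("高", "a")]
print(extract_candidates(tagged))
```

A high adjacent co-occurrence ratio suggests that two tokens were wrongly split and may form a single domain word. As a consistency check on the reported figures, the harmonic mean 2PR/(P+R) of the rounded values P = 0.565 and R = 0.726 is about 0.635, in line with the reported 0.636 once rounding of the inputs is allowed for.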

CLC number: TP393

LI Chun-liang, ZHU Yan-hui, XU Ye-qiang. Research of Attribute Word Extraction Method in Chinese Product Comment[J]. Computer Engineering, 2011, 37(12): 26-28.

http://www.ecice06.com/CN/Y2011/V37/I12/26
