
Computer Engineering

Special topic: Mobile Social Networking


Research on Microblog Advertisement Filtering Model Based on Text Content Analysis

GAO Jun-bo, MEI Bo

  1. (College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China)
  • Received: 2013-12-19 Online: 2014-05-15 Published: 2014-05-14
  • About the authors: GAO Jun-bo (born 1972), male, associate professor, Ph.D.; main research interests: computational intelligence and data mining. MEI Bo, master's student.
  • Funding:
    Supported by the Scientific Research Fund of Shanghai Maritime University (No. 20100093).

Abstract: To address the large volume of advertisements on microblog platforms such as Sina and Tencent, this paper proposes a microblog advertisement filtering model. Data preprocessing converts the collected raw microblog data into clean data that a computer can process easily. In the preprocessing stage, the stop word list is improved according to the characteristics of microblog text in order to raise precision. A classifier based on the support vector machine (SVM) is then built and trained on the data, and through continuous learning and feedback it achieves good classification results. Experimental results show that the model filters advertisements with an accuracy of more than 90%, outperforming the keyword-based method.

Key words: microblog, text processing, vector space model, Support Vector Machine (SVM), text classification, advertisement filtering
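
The pipeline summarized in the abstract (stop word filtering, a vector space representation, then an SVM classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop word list, the toy English training texts, the whitespace tokenizer, and every function name and hyperparameter below are assumptions made for illustration. Real microblog text would need Chinese word segmentation, and the paper's improved stop word list is not given in the abstract. The classifier is a plain linear SVM trained by sub-gradient descent on the hinge loss, written with the standard library only.

```python
from collections import Counter

# Hypothetical stop word list (assumption; not the paper's improved list).
STOP_WORDS = {"the", "a", "to", "is", "of", "and", "on", "my",
              "now", "for", "in", "with", "at"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def build_vocab(docs):
    """Map every token seen in the training documents to a vector index."""
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Sparse term-frequency vector {index: count}; unknown tokens are ignored."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: float(c) for tok, c in counts.items()}

def train_linear_svm(vectors, labels, dim, epochs=200, lam=0.01, lr=0.1):
    """Minimise the L2-regularised hinge loss by sub-gradient descent."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            margin = y * (sum(w[i] * v for i, v in x.items()) + b)
            # The regularisation term shrinks every weight at each step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:  # sample is inside the margin or misclassified
                for i, v in x.items():
                    w[i] += lr * y * v
                b += lr * y
    return w, b

def predict(text, vocab, w, b):
    """Return 1 for advertisement, -1 for ordinary post."""
    x = vectorize(text, vocab)
    score = sum(w[i] * v for i, v in x.items()) + b
    return 1 if score >= 0 else -1

# Toy labelled data (invented for illustration): 1 = advertisement, -1 = ordinary.
TRAIN = [
    ("buy cheap phones now free shipping", 1),
    ("big discount sale click link to order", 1),
    ("limited offer buy two get one free", 1),
    ("had lunch with friends at the park", -1),
    ("watching the football match tonight", -1),
    ("my cat is sleeping on the sofa", -1),
]

vocab = build_vocab([text for text, _ in TRAIN])
vectors = [vectorize(text, vocab) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
w, b = train_linear_svm(vectors, labels, len(vocab))

print(predict("free discount buy now", vocab, w, b))
print(predict("lunch with my cat", vocab, w, b))
```

In practice a library SVM and a TF-IDF weighting would likely replace the hand-written trainer and raw term counts; the sketch only shows how stop word removal feeds the vector space model that the classifier consumes.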

CLC number: