
Computer Engineering ›› 2010, Vol. 36 ›› Issue (18): 194-196. doi: 10.3969/j.issn.1000-3428.2010.18.067

• Artificial Intelligence and Recognition Technology •

Spam Filter Based on Multiple Classifier Combinational Model

LIU Jiu-xin (刘菊新), XU Cong-fu (徐从富)

  1. (College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China)
  • Online: 2010-09-20 Published: 2010-09-30
  • About the authors: LIU Jiu-xin (born 1982), male, master's student, main research interests: artificial intelligence, machine learning, text classification; XU Cong-fu, associate professor
  • Supported by:
    National "863" Program of China (2007AA01Z197)

Abstract: Spam filtering has unequal misclassification costs: misjudging a legitimate email (ham) as spam costs far more than misjudging spam as ham. To address this problem, this paper builds a combinational classifier framework with a two-layer structure. Sample emails are preprocessed so that text features and behavioral features are used together. Building on improved single-classifier performance, the framework optimizes the combination of different classifiers and adjusts the model promptly through feedback, giving the filter an efficient self-learning function.

Key words: spam filter, combinational classifier, two-layer structure, bit entropy, false positive rate
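
The abstract does not spell out how the two layers are built, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a first layer of toy scorers (one text-based, one behavior-based), a second layer that takes a weighted average of their scores, a decision threshold derived from hypothetical misclassification costs (`cost_fp` for ham judged as spam, `cost_fn` for spam judged as ham), and a simple multiplicative weight update as the feedback step. All names, feature choices, and constants are illustrative.

```python
from typing import Callable, List, Sequence

# Hypothetical first-layer scorer: maps an email (a dict) to a spam probability in [0, 1].
Scorer = Callable[[dict], float]


def text_scorer(email: dict) -> float:
    """Toy text-feature scorer: share of words found in a tiny spam word list."""
    spam_words = {"free", "winner", "prize"}
    words = email["body"].lower().split()
    if not words:
        return 0.5  # no evidence either way
    hits = sum(w in spam_words for w in words)
    return min(1.0, 3.0 * hits / len(words))


def behavior_scorer(email: dict) -> float:
    """Toy behavioral-feature scorer: many recipients looks more like bulk mail."""
    return min(1.0, email["n_recipients"] / 50)


class TwoLayerFilter:
    """Second layer: weighted average of first-layer scores plus a cost-based threshold."""

    def __init__(self, scorers: Sequence[Scorer],
                 cost_fp: float = 9.0, cost_fn: float = 1.0, lr: float = 0.5) -> None:
        self.scorers: List[Scorer] = list(scorers)
        self.weights: List[float] = [1.0 / len(self.scorers)] * len(self.scorers)
        # Flag as spam only when the expected cost of doing so is lower:
        # (1 - p) * cost_fp < p * cost_fn  <=>  p > cost_fp / (cost_fp + cost_fn)
        self.threshold = cost_fp / (cost_fp + cost_fn)
        self.lr = lr  # assumed to lie in (0, 1) so weights stay positive

    def score(self, email: dict) -> float:
        return sum(w * s(email) for w, s in zip(self.weights, self.scorers))

    def is_spam(self, email: dict) -> bool:
        return self.score(email) > self.threshold

    def feedback(self, email: dict, label_is_spam: bool) -> None:
        """Shrink the weight of each scorer in proportion to its error, then renormalize."""
        target = 1.0 if label_is_spam else 0.0
        updated = [w * (1.0 - self.lr * abs(s(email) - target))
                   for w, s in zip(self.weights, self.scorers)]
        total = sum(updated)
        self.weights = [w / total for w in updated]


spam_filter = TwoLayerFilter([text_scorer, behavior_scorer])
ham = {"body": "meeting notes attached for tomorrow", "n_recipients": 2}
spam = {"body": "free prize winner", "n_recipients": 100}
print(spam_filter.is_spam(ham), spam_filter.is_spam(spam))
spam_filter.feedback(ham, label_is_spam=False)  # user confirms the message was ham
print(spam_filter.weights)
```

With the assumed 9:1 cost ratio the threshold is 0.9, so a message is flagged only when the combined score is very high, which is one simple way to keep the false positive rate low at the price of letting more spam through.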
