Computer Engineering ›› 2007, Vol. 33 ›› Issue (08): 80-82. doi: 10.3969/j.issn.1000-3428.2007.08.027

• Software Technology and Database •

Approach to Feature Selection of Spam Filtering Based on Contribution Difference

ZHANG Wenliang (张文良), HUANG Yalou (黄亚楼), NI Weijian (倪维健)

  1. (College of Software, Nankai University, Tianjin 300071)
  • Online: 2007-04-20 Published: 2007-04-20

Abstract: Spam filtering is essentially a two-class text classification problem, and feature selection is one of its key components. To address the particular characteristics of spam filtering, two traditional feature selection methods, document frequency and mutual information, are improved using the idea of "contribution difference", yielding a new feature selection method for spam filtering. Experimental results show that feature selection based on contribution difference effectively improves the precision of spam filtering.

Key words: Spam filtering, Feature selection, Document frequency, Mutual information
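
For orientation, the two traditional baselines named in the abstract can be sketched as follows. This is a minimal illustration under assumptions, not the authors' method: the abstract does not define the contribution-difference modification, so it is not reproduced here. The function names (`document_frequency`, `mutual_information`, `top_k`) and the toy corpus in the usage note are hypothetical, and mutual information is taken in its common pointwise form, MI(t, c) = log(P(t, c) / (P(t) P(c))), estimated from document counts.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Return, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    return df


def mutual_information(docs, labels, target):
    """Pointwise mutual information between each term and class `target`.

    Uses MI(t, c) = log(P(t, c) / (P(t) * P(c))) with probabilities
    estimated as document-count ratios (an assumed, standard formulation).
    Terms that never occur in the target class are skipped, since their
    score would be log(0).
    """
    n = len(docs)
    df_all = document_frequency(docs)
    df_cls = document_frequency(
        [d for d, y in zip(docs, labels) if y == target]
    )
    n_cls = sum(1 for y in labels if y == target)
    return {
        term: math.log((df_cls[term] * n) / (df_all[term] * n_cls))
        for term in df_cls
    }


def top_k(scores, k):
    """Select the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Usage on a hypothetical four-message corpus: with `docs = [["free", "money", "now"], ["free", "offer"], ["meeting", "now"], ["project", "meeting", "notes"]]` and `labels = ["spam", "spam", "ham", "ham"]`, calling `top_k(document_frequency(docs), 3)` keeps the three most widespread terms, while `mutual_information(docs, labels, "spam")` scores a term such as "now", which appears equally in both classes, at zero.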