Abstract: Filters based on support vector machines (SVMs) achieve high classification accuracy in spam filtering. In practice, however, misclassifying a legitimate email as spam is the more costly error for users. This paper proposes a spam filtering method based on a cost-sensitive SVM. The standard SVM is reformulated as a cost-sensitive learner by assigning different error penalty coefficients (trade-off factors) to positive and negative training examples. The aim is to reduce the rate at which legitimate emails are misclassified (the false positive rate) as far as possible while maintaining a high spam recall. Experimental results show that the method effectively improves the overall performance of the filter and better meets the practical requirements of spam filtering.
Key words:
Support Vector Machine (SVM),
spam filtering,
cost sensitive,
false positive
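The idea of class-dependent penalty coefficients can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear SVM in primal form, trained by plain subgradient descent, with spam labelled +1 and legitimate email labelled -1. The names `c_pos` and `c_neg`, the two toy features, and the data values are all hypothetical and chosen only for illustration. Setting `c_neg` larger than `c_pos` makes margin violations on legitimate email more expensive, which should push the decision boundary away from the legitimate class.

```python
from typing import List, Sequence, Tuple


def decision(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b; a positive score is read as 'spam'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def objective(w, b, X, y, c_pos: float, c_neg: float) -> float:
    """0.5*||w||^2 + sum_i C(y_i) * max(0, 1 - y_i * (w.x_i + b))."""
    reg = 0.5 * sum(wi * wi for wi in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        cost = c_pos if yi == 1 else c_neg  # class-dependent penalty
        loss += cost * max(0.0, 1.0 - yi * decision(w, b, xi))
    return reg + loss


def train(X, y, c_pos: float = 1.0, c_neg: float = 1.0,
          lr: float = 0.01, epochs: int = 1000) -> Tuple[List[float], float]:
    """Full-batch subgradient descent; returns the best iterate seen."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    best_w, best_b = list(w), b
    best_obj = objective(w, b, X, y, c_pos, c_neg)
    for epoch in range(1, epochs + 1):
        grad_w = list(w)  # gradient of the regulariser 0.5*||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1.0:  # margin violated
                cost = c_pos if yi == 1 else c_neg
                for j in range(dim):
                    grad_w[j] -= cost * yi * xi[j]
                grad_b -= cost * yi
        step = lr / epoch ** 0.5  # diminishing step size
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
        obj = objective(w, b, X, y, c_pos, c_neg)
        if obj < best_obj:  # subgradient steps are not monotone
            best_w, best_b, best_obj = list(w), b, obj
    return best_w, best_b


def false_positives(w, b, X, y) -> int:
    """Number of legitimate emails (label -1) flagged as spam."""
    return sum(1 for xi, yi in zip(X, y)
               if yi == -1 and decision(w, b, xi) > 0)


def spam_recall(w, b, X, y) -> float:
    """Fraction of spam emails (label +1) that are flagged as spam."""
    spam = [xi for xi, yi in zip(X, y) if yi == 1]
    if not spam:
        return 0.0
    return sum(1 for xi in spam if decision(w, b, xi) > 0) / len(spam)


# Invented toy data: [suspicious-word count, link count] per email.
X_demo = [[5.0, 3.0], [4.0, 4.0], [3.0, 1.0], [6.0, 2.0], [2.0, 2.0],
          [0.0, 1.0], [1.0, 0.0], [3.0, 2.0], [1.0, 1.0], [2.0, 0.0]]
y_demo = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1]

for c_neg in (1.0, 5.0):  # equal costs vs. costlier false positives
    w, b = train(X_demo, y_demo, c_pos=1.0, c_neg=c_neg)
    print(c_neg, false_positives(w, b, X_demo, y_demo),
          spam_recall(w, b, X_demo, y_demo))
```

Comparing the two printed rows shows the trade-off the abstract describes: a larger `c_neg` is expected to lower the false positive count, possibly at some cost in spam recall. In a real filter the ratio of the two coefficients would be tuned on held-out mail.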
董建设;袁占亭;张秋余. 代价敏感支持向量机在垃圾邮件过滤中的应用[J]. 计算机工程, 2008, 34(10): 131-132.
DONG Jian-she; YUAN Zhan-ting; ZHANG Qiu-yu. Application of Cost Sensitive SVM in Spam Filtering[J]. Computer Engineering, 2008, 34(10): 131-132.