
Computer Engineering, 2018, Vol. 44, Issue (5): 194-200. doi: 10.19678/j.issn.1000-3428.0046434

• Artificial Intelligence and Recognition Technology •

Spam Detection Based on an Adaptive Classifier

CHEN Long, LIANG Yiwen, TAN Chengyu

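  1. School of Computer Science, Wuhan University, Wuhan 430072
  • Received: 2017-03-20 Published: 2018-05-15 Online: 2018-05-15
  • About the authors: CHEN Long (born 1992), male, master's student, research interests: artificial immunology and network security; LIANG Yiwen, professor and doctoral supervisor; TAN Chengyu, associate professor.
  • Funding:
    National Natural Science Foundation of China (61170306); National High Technology Research and Development Program of China (2012AA09A410).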

Spam Detection Based on Adaptive Classifier

CHEN Long, LIANG Yiwen, TAN Chengyu

  1. Computer School, Wuhan University, Wuhan 430072, China
  • Received: 2017-03-20 Online: 2018-05-15 Published: 2018-05-15

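Abstract: Spam varies widely in form and content and easily disguises itself as normal mail to bypass detection; new types of spam in particular have a high miss (false-negative) rate. To address this, a new adaptive classifier combining the ideas of negative selection and the Support Vector Machine (SVM) is designed and applied to spam detection. The optimal hyperplane of the SVM first pre-classifies the emails into "normal emails" and spam that match the prediction model. The Negative Selection Algorithm (NSA) then filters the resulting "normal email" set a second time to detect new spam, and the labeled sets of normal emails and spam are used to adaptively update the original optimal hyperplane. This detection process is repeated until the spam recognition rate stabilizes, so that the final hyperplane is optimal for the current detection task. Experimental results show that, compared with SVM and NSA, the method improves the recognition rate of new spam while maintaining a high recognition rate for normal mail.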
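The second-stage filter relies on negative selection. As a rough illustration only, the sketch below uses a real-valued variant of NSA: emails are assumed to be already encoded as feature vectors in [0, 1]^d, the "self" set is the mail accepted as normal, and a detector matches a sample when their Euclidean distance is within a radius. The encoding, the distance, the radius, the detector count, and the function names (`generate_detectors`, `is_nonself`) are hypothetical choices for illustration, not settings taken from the paper.

```python
import math
import random
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def generate_detectors(
    self_set: List[Sequence[float]],
    n_detectors: int,
    radius: float,
    dim: int,
    seed: int = 0,
    max_attempts: int = 100_000,
) -> List[List[float]]:
    """Censoring phase: keep random candidates that match no self (normal) sample.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    detectors: List[List[float]] = []
    for _ in range(max_attempts):
        if len(detectors) >= n_detectors:
            break
        candidate = [rng.random() for _ in range(dim)]
        # A candidate within `radius` of any normal mail would raise false
        # alarms on legitimate email, so it is censored (discarded).
        if all(euclidean(candidate, s) > radius for s in self_set):
            detectors.append(candidate)
    return detectors


def is_nonself(sample: Sequence[float], detectors: List[List[float]], radius: float) -> bool:
    """Monitoring phase: a sample matched by any detector is treated as spam."""
    return any(euclidean(sample, d) <= radius for d in detectors)
```

Because every candidate is censored against every self sample, no sample in `self_set` can ever be flagged; a message that `is_nonself` flags lies away from all known normal mail and is treated as suspected new spam. The detector count trades coverage of the non-self space against generation time.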

Key words: new spam, negative selection algorithm, support vector machine, adaptive, classifier

Abstract: Spam is changeable in form and content and can easily disguise itself as normal mail to bypass detection; new types of spam in particular have a high miss rate. To solve this problem, an adaptive classifier based on the ideas of negative selection and the Support Vector Machine (SVM) is designed and applied to spam detection. The optimal hyperplane of the SVM is used to pre-classify the mail into so-called normal emails and spam that match the prediction model. The Negative Selection Algorithm (NSA) then performs a second filtering pass on the so-called normal emails to separate the final normal emails from new spam, and the labeled normal emails and spam are used to adaptively update the initial optimal hyperplane. This cycle repeats until the spam detection rate stabilizes. Experimental results show that, compared with SVM and NSA, the proposed method improves the recognition rate of new spam while maintaining a high recognition rate for normal mail.
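The full cycle in the abstract (SVM pre-classification, an NSA second pass over the mail the hyperplane accepts as normal, then retraining on the newly labeled set until the spam rate stabilizes) can be sketched as below. This is a minimal sketch under several assumptions the abstract does not specify: a linear SVM trained with Pegasos-style stochastic subgradient descent stands in for whatever SVM solver the authors used, the fraction of the batch flagged as spam serves as the proxy for the recognition rate in the stopping test, and every function and parameter name (`adaptive_classify`, `radius`, `tol`, and so on) is hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def train_linear_svm(X: List[Vec], y: List[int], lam: float = 0.01,
                     epochs: int = 300, seed: int = 0) -> List[float]:
    """Pegasos-style subgradient training of a linear SVM (stand-in solver, assumption).

    Labels are +1 (spam) / -1 (normal). A constant 1.0 feature is appended,
    so the last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (L2 regularizer)
            if margin < 1.0:  # hinge loss active: step toward the sample
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w: List[float], x: Vec) -> int:
    """Side of the hyperplane: +1 (spam) or -1 (normal)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def nsa_detectors(self_set: List[Vec], n: int, radius: float, dim: int,
                  rng: random.Random, max_attempts: int = 100_000) -> List[List[float]]:
    """Real-valued negative selection: keep random points far from all self samples."""
    dets: List[List[float]] = []
    for _ in range(max_attempts):
        if len(dets) >= n:
            break
        c = [rng.random() for _ in range(dim)]
        if all(math.dist(c, s) > radius for s in self_set):
            dets.append(c)
    return dets


def adaptive_classify(X_train: List[Vec], y_train: List[int], batch: List[Vec],
                      radius: float = 0.15, n_det: int = 300, tol: float = 0.01,
                      max_rounds: int = 10, seed: int = 0) -> Tuple[List[float], List[int]]:
    """SVM pre-classification, NSA second pass, retrain; stop when the spam rate settles."""
    rng = random.Random(seed)
    X, y = list(X_train), list(y_train)
    prev_rate = None
    w: List[float] = []
    labels: List[int] = []
    for _ in range(max_rounds):
        w = train_linear_svm(X, y, seed=seed)
        # Self set for the NSA: everything currently labeled normal.
        self_set = [x for x, t in zip(X, y) if t == -1]
        dets = nsa_detectors(self_set, n_det, radius, len(batch[0]), rng)
        labels = []
        for x in batch:
            label = svm_predict(w, x)
            # Second pass: re-check mail the hyperplane accepted as normal.
            if label == -1 and any(math.dist(x, d) <= radius for d in dets):
                label = 1
            labels.append(label)
        # Proxy for the recognition rate (assumption): share of the batch flagged as spam.
        rate = labels.count(1) / len(batch)
        if prev_rate is not None and abs(rate - prev_rate) < tol:
            break
        prev_rate = rate
        # Adaptive update: retrain on the original data plus the newly labeled batch.
        X, y = list(X_train) + list(batch), list(y_train) + labels
    return w, labels
```

In this toy setup, feeding the NSA's labels back into the training set is what lets the hyperplane absorb a new spam pattern: once a message the SVM missed is relabeled as spam, the next round's hyperplane is trained with it on the spam side.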

Key words: new spam, Negative Selection Algorithm(NSA), Support Vector Machine(SVM), adaptation, classifier
