Computer Engineering, 2009, Vol. 35, Issue 13: 188-189. doi: 10.3969/j.issn.1000-3428.2009.13.065

• Artificial Intelligence and Recognition Technology •

Spam Filter Approach Based on Support Vector Machine

WANG Zu-hui, JIANG Wei   

  1. (Research Center of Information Management and Information System, Harbin Institute of Technology, Harbin 150001)
  • Online: 2009-07-05 Published: 2009-07-05

Abstract: This paper presents a spam filtering approach based on the Support Vector Machine (SVM) for mixed Chinese-English e-mail, together with a framework that integrates multiple kinds of classification features. It optimizes the representation of the linear kernel to reduce storage and computation costs, and it adopts automatic domain term extraction to strengthen the recognition of semantic units and improve spam classification performance. Experiments on large-scale cross-language corpora show that the SVM-based approach improves precision by 6.13% over a Naïve Bayes model smoothed with Good-Turing estimation, and improves classification accuracy by 8.18% over a maximum entropy model.

Key words: spam filtering, Support Vector Machine (SVM), domain term extraction
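
The abstract does not say how the linear kernel representation is optimized. One common way to get such savings, shown here only as an assumption and not as the authors' method, is to fold the support vectors into a single sparse weight vector w = Σ αᵢ yᵢ xᵢ. Scoring a message then takes one sparse dot product, not one per support vector. The toy model, feature names, and helper functions below are hypothetical.

```python
from collections import defaultdict


def dot(u, v):
    """Sparse dot product of two {feature: value} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(val * v.get(feat, 0.0) for feat, val in u.items())


def decision_expanded(support, bias, x):
    """Kernel-expansion form: sum_i alpha_i * y_i * <x_i, x> + b."""
    return sum(alpha * y * dot(sv, x) for alpha, y, sv in support) + bias


def collapse(support):
    """Fold all support vectors into w = sum_i alpha_i * y_i * x_i."""
    w = defaultdict(float)
    for alpha, y, sv in support:
        for feat, val in sv.items():
            w[feat] += alpha * y * val
    return dict(w)


def decision_collapsed(w, bias, x):
    """Collapsed form: <w, x> + b, independent of the number of support vectors."""
    return dot(w, x) + bias


# Hypothetical toy model: (alpha, label, sparse features); +1 = spam, -1 = ham.
support = [
    (0.5, +1, {"free": 1.0, "winner": 1.0}),
    (0.25, +1, {"\u514d\u8d39": 1.0, "free": 1.0}),  # Chinese token meaning "free"
    (0.75, -1, {"meeting": 1.0, "report": 1.0}),
]
bias = -0.1

w = collapse(support)
msg = {"free": 1.0, "\u514d\u8d39": 1.0, "report": 1.0}

score_expanded = decision_expanded(support, bias, msg)
score_collapsed = decision_collapsed(w, bias, msg)
print(score_expanded, score_collapsed)
print("spam" if score_collapsed > 0 else "ham")
```

Both functions compute the same decision value. The collapsed form stores one entry per distinct feature instead of every support vector, which is where the storage and time savings would come from under this assumption.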
