
Computer Engineering ›› 2007, Vol. 33 ›› Issue (01): 267-269. doi: 10.3969/j.issn.1000-3428.2007.01.093

• Development Research and Design Technology •

Chinese Spam Filter System Based on Analysis Using Milter Interface

YANG Jie, ZHANG Jianzhong, SHEN Qingyong, HE Yun

  1. (Department of Computer Science and Technology, Nankai University, Tianjin 300071)
  • Online: 2007-01-05 Published: 2007-01-05

Abstract: This paper presents an implementation scheme for a content-based, real-time Chinese spam filtering system. The system runs on a Sendmail mail server under Linux. It uses the Milter interface to extract e-mail content in real time, then classifies and filters each message by combining Chinese word segmentation with text categorization algorithms. Because multiple text categorization algorithms can be embedded, the system is highly extensible. The different classification models embedded in the system are analyzed and compared through testing.

Key words: mail classification, Chinese word segmentation, Bayesian algorithm, K-nearest neighbor (KNN) algorithm
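
The pipeline outlined in the abstract (segment the message text, then hand the tokens to an interchangeable classifier) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption rather than the authors' implementation: the abstract does not say which segmentation method or classifier variants were used, so the sketch substitutes forward maximum matching over a tiny hypothetical lexicon, a Laplace-smoothed multinomial naive Bayes model, and a cosine-similarity KNN. The Milter hookup is omitted; the message body is simply passed in as a string, and all names (`segment`, `NaiveBayes`, `KNearest`, `filter_message`) are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; a real system would load a full dictionary.
LEXICON = {"免费", "发票", "优惠", "代开", "会议", "通知", "项目", "报告", "明天"}
MAX_WORD_LEN = 2


def segment(text):
    """Forward maximum matching: take the longest lexicon word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in LEXICON:  # fall back to a single character
                tokens.append(piece)
                i += size
                break
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.doc_counts}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class KNearest:
    """K-nearest-neighbor vote over term-frequency vectors."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, docs, labels):
        self.samples = [(Counter(tokens), label) for tokens, label in zip(docs, labels)]
        return self

    def predict(self, tokens):
        query = Counter(tokens)
        ranked = sorted(self.samples, key=lambda s: cosine(query, s[0]), reverse=True)
        votes = Counter(label for _, label in ranked[:self.k])
        return votes.most_common(1)[0][0]


def filter_message(body, classifier):
    """Return 'reject' for mail classified as spam, 'accept' otherwise."""
    return "reject" if classifier.predict(segment(body)) == "spam" else "accept"


# Made-up training samples, for illustration only.
TRAIN = [("免费代开发票优惠", "spam"), ("代开发票免费优惠", "spam"),
         ("明天会议通知", "ham"), ("项目报告会议", "ham")]
DOCS = [segment(text) for text, _ in TRAIN]
LABELS = [label for _, label in TRAIN]

for model in (NaiveBayes().fit(DOCS, LABELS), KNearest(k=3).fit(DOCS, LABELS)):
    print(type(model).__name__, filter_message("免费发票", model))
```

Because both classifiers expose the same `fit`/`predict` pair, swapping one for the other leaves `filter_message` unchanged, which is one plausible way to obtain the extensibility the abstract describes.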