Abstract: Spam messages are typically short and are sent repeatedly and in bulk across the network within a limited period of time. Exploiting these characteristics, this paper presents a signature-based approximate spam detection (ASD) algorithm. ASD takes the sentence as its basic unit and computes a digest for every sentence in an e-mail, so that detecting near-duplicate spam reduces to comparing the similarity of two sets of digests. Compared with the near-duplicate text detection algorithms DSC, DSC-SS, and I-Match, ASD shows a moderate storage requirement for the sample set, short computing time, high robustness, and high precision and recall in approximate spam detection.
Key words:
Approximate spam detection; Spam filtering; Signature; Similarity of documents; Query
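
The idea summarized in the abstract (one digest per sentence, then a comparison of two digest sets) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the hash function, the sentence-splitting rule, the similarity measure, or any threshold, so the use of MD5, a punctuation-based splitter, Jaccard similarity, and a threshold of 0.5 below are all assumptions made for illustration, and the function names are hypothetical.

```python
import hashlib
import re


def sentence_digests(text: str) -> set[str]:
    """Return the set of digests of all sentences in an e-mail body."""
    # Assumption: a sentence ends at ASCII or full-width terminal
    # punctuation, or at a line break. The paper's rule may differ.
    parts = re.split(r"[.!?。!?\n]+", text)
    digests = set()
    for part in parts:
        # Normalize case and whitespace so trivial edits keep the digest stable.
        normalized = " ".join(part.lower().split())
        if normalized:
            # Assumption: MD5 is used here only as a convenient digest.
            digests.add(hashlib.md5(normalized.encode("utf-8")).hexdigest())
    return digests


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two digest sets (an assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_near_duplicate(mail: str, known_spam: str, threshold: float = 0.5) -> bool:
    """Flag a mail whose digest set is close enough to a known spam sample."""
    score = similarity(sentence_digests(mail), sentence_digests(known_spam))
    return score >= threshold


spam = "Buy cheap watches now! Visit our site today. Limited offer."
variant = "Buy cheap watches now! Visit our site today. Offer ends soon."
ham = "Hi Bob. Lunch tomorrow?"

print(similarity(sentence_digests(spam), sentence_digests(variant)))
print(is_near_duplicate(variant, spam))
print(is_near_duplicate(ham, spam))
```

In this sketch the variant shares two of its three sentences with the known sample, so it is flagged, while the unrelated message shares none. Because only the digests are stored, the sample set holds one fixed-size value per sentence rather than the full message text.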
ZHAN Chuan, LU Xianliang, HOU Mengshu, LIU Zhihui. A Signature-based Approximate Spam Detection Algorithm[J]. Computer Engineering (计算机工程), 2006, 32(5): 122-124.