Computer Engineering, 2008, Vol. 34, Issue (11): 270-272. doi: 10.3969/j.issn.1000-3428.2008.11.097

• Development Research and Design Technology •

Email Remove-duplicate Algorithm Based on SHA-1

ZHANG Man, LI Bi-cheng, LIN Chen

  (Information Engineering Institute, PLA Information Engineering University, Zhengzhou 450002)
  • Online: 2008-06-05 Published: 2008-06-05

Abstract: Duplicate emails waste a large amount of resources on both mail servers and mail clients. This paper proposes an email deduplication algorithm based on Secure Hash Algorithm 1 (SHA-1). The algorithm processes emails separately according to their size and detects similarity between emails by comparing the sets of hash values computed over all paragraphs or all sentences of each email, so that emails whose bodies are identical or similar can be removed quickly. Experimental results demonstrate the effectiveness of the algorithm and show that it runs faster than traditional methods.

Key words: duplicate email, similarity, deduplication algorithm

中图分类号: