Computer Engineering ›› 2008, Vol. 34 ›› Issue (11): 270-272.

• Developmental Research •

Email Remove-duplicate Algorithm Based on SHA-1

ZHANG Man, LI Bi-cheng, LIN Chen   

  1. (Information Engineering Institute, PLA Information Engineering University, Zhengzhou 450002)
  • Received:1900-01-01 Revised:1900-01-01 Online:2008-06-05 Published:2008-06-05

基于SHA-1的邮件去重算法

张 曼,李弼程,林 琛   

  1. (解放军信息工程大学信息工程学院,郑州 450002)

Abstract: Duplicate emails on mail servers and mail clients waste a large amount of resources. This paper presents an email deduplication algorithm based on Secure Hash Algorithm 1 (SHA-1). Depending on the size of an email, the algorithm detects similarity between emails by comparing the sets of hash values computed over all paragraphs or all sentences of each email. The experimental results show that the algorithm performs well in terms of computing time.

Key words: duplicate email, similarity, deduplication algorithm
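
The abstract describes comparing sets of SHA-1 hash values taken over the paragraphs or sentences of each email. The following is a minimal Python sketch of that idea, not the authors' implementation. Several details are assumptions that the abstract does not specify: the size cutoff `SIZE_THRESHOLD`, the choice to split large emails into paragraphs and small ones into sentences, the use of Jaccard overlap as the similarity measure, and the duplicate threshold of 0.8. All function names are hypothetical.

```python
import hashlib
import re

# Assumed cutoff (in characters) between "small" and "large" emails.
SIZE_THRESHOLD = 2048


def split_units(body, size_threshold=SIZE_THRESHOLD):
    """Split a body into paragraphs (large emails) or sentences (small ones)."""
    if len(body) >= size_threshold:
        parts = re.split(r"\n\s*\n", body)        # blank-line separated paragraphs
    else:
        parts = re.split(r"(?<=[.!?])\s+", body)  # naive sentence boundaries
    # Collapse whitespace so trivial formatting differences do not change hashes.
    return [" ".join(p.split()) for p in parts if p.strip()]


def hash_set(body, size_threshold=SIZE_THRESHOLD):
    """Return the set of SHA-1 hex digests of the units of a body."""
    units = split_units(body, size_threshold)
    return {hashlib.sha1(u.encode("utf-8")).hexdigest() for u in units}


def similarity(a, b):
    """Jaccard overlap of two hash sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_duplicate(body_a, body_b, threshold=0.8, size_threshold=SIZE_THRESHOLD):
    """Treat two emails as duplicates when their hash sets overlap enough."""
    sim = similarity(hash_set(body_a, size_threshold),
                     hash_set(body_b, size_threshold))
    return sim >= threshold
```

For example, two bodies that differ only in whitespace produce identical hash sets and a similarity of 1.0, while bodies sharing two of three sentences score 0.5 under this measure and fall below the assumed threshold.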

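Abstract (translated from the Chinese original): Duplicate emails waste a large amount of resources on both mail servers and mail clients. This paper proposes a SHA-1-based email deduplication algorithm that processes emails separately according to their size and uses hash values to quickly remove duplicate emails whose bodies are identical or similar. The experimental results demonstrate the effectiveness of the algorithm, which runs faster than traditional methods.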

Key words (translated from the Chinese original): duplicate email, similarity, deduplication algorithm
