References
[1]Campbell D M,Chen W R,Smith R D.Copy Detection Systems for Digital Documents[C]//Proceedings of IEEE Advances in Digital Libraries.Washington D.C.,USA:IEEE Press,2000:78-88.
[2]Si A,Leong H V,Lau R W H.Check:A Document Plagiarism Detection System[C]//Proceedings of 1997 ACM Symposium on Applied Computing.New York,USA:ACM Press,1997:70-77.
[3]Phan X H,Nguyen L M,Horiguchi S.Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections[C]//Proceedings of the 17th International Conference on World Wide Web.New York,USA:ACM Press,2008:91-100.
[4]Charikar M S.Similarity Estimation Techniques from Rounding Algorithms[C]//Proceedings of the 34th Annual ACM Symposium on Theory of Computing.New York,USA:ACM Press,2002:380-388.
[5]Bernstein Y,Zobel J.Accurate Discovery of Co-derivative Documents via Duplicate Text Detection[J].Information Systems,2006,31(7):595-609.
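[6]Dong Bo,Zheng Qinghua,Song Kailei,et al.Near-duplicate Text Detection Based on Multiple SimHash Fingerprints[J].Journal of Chinese Computer Systems,2011,32(11):2152-2157.(in Chinese)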
[7]Wang Meng,Lin Lanfen,Wang Jing,et al.Improving Short Text Classification Using Public Search Engines[M].Berlin,Germany:Springer-Verlag,2013.
[8]Ni Xingliang,Quan Xiaojun,Lu Zhi,et al.Short Text Clustering by Finding Core Terms[J].Knowledge and Information Systems,2011,27(3):345-365.
[9]Gong Caichun,Huang Yulan,Cheng Xueqi,et al.Detecting Near-duplicates in Large-scale Short Text Databases[M].Berlin,Germany:Springer-Verlag,2008.
[10]Coskun B,Giura P.Mitigating SMS Spam by Online Detection of Repetitive Near-duplicate Messages[C]//Proceedings of IEEE International Conference on Communications.Washington D.C.,USA:IEEE Press,2012:999-1004.
[11]Datar M,Immorlica N,Indyk P,et al.Locality-sensitive Hashing Scheme Based on P-stable Distributions[C]//Proceedings of the 20th Annual Symposium on Computational Geometry.New York,USA:ACM Press,2004:253-262.
[12]Patidar A K,Agrawal J,Mishra N.Analysis of Different Similarity Measure Functions and Their Impacts on Shared Nearest Neighbor Clustering Approach[J].International Journal of Computer Applications,2012,40(16).
[13]Li Liangyi.ik-analyzer:An Open-source Chinese Word Segmenter for Java[EB/OL].(2014-11-20).http://code.google.com/p/ik-analyzer/.
[14]Uddin M S,Roy C K,Schneider K A,et al.On the Effectiveness of Simhash for Detecting Near-miss Clones in Large Scale Software Systems[C]//Proceedings of the 18th Working Conference on Reverse Engineering.Washington D.C.,USA:IEEE Press,2011:13-22.
Editor: Suo Shuzhi