
Computer Engineering ›› 2011, Vol. 37 ›› Issue (6): 193-194. doi: 10.3969/j.issn.1000-3428.2011.06.066

• Artificial Intelligence and Recognition Technology •

Plagiarism-detection Algorithm for Scientific Papers Based on Local Word-frequency Fingerprint

QIN Yu-ping 1, LENG Qiang-kui 1, WANG Xiu-kun 2, WANG Chun-li 3

  (1. College of Information Science and Engineering, Bohai University, Jinzhou 121000, China; 2. School of Electronic and Information Engineering, Dalian University of Technology, Dalian 116024, China; 3. College of Information Science and Technology, Dalian Maritime University, Dalian 116026, China)
  • Online: 2011-03-20 Published: 2011-03-29
  • About the authors: QIN Yu-ping (b. 1965), male, professor, Ph.D., main research interests: detection algorithms and machine learning; LENG Qiang-kui, master's student; WANG Xiu-kun, professor and doctoral supervisor; WANG Chun-li, professor, Ph.D.
  • Supported by:

    National Natural Science Foundation of China (60603023); National "973" Program of China (2001CCA00700)

Abstract:

This paper presents a plagiarism-detection algorithm for scientific papers based on local word-frequency fingerprints. The sentence is treated as the basic building block of a document: its effective keywords are extracted, sorted, and recombined, and a sentence fingerprint is derived by combining the keywords' codes with their word frequencies. Similarity between texts is then computed from these fingerprints. Experiments on SOGOU-T, a reduced collection of news web pages, show that the algorithm partly overcomes the low detection precision of existing plagiarism-detection algorithms for papers and achieves a relatively fast detection speed.

Key words: plagiarism-detection, digital fingerprint, local word-frequency, similarity
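
The abstract does not spell out the keyword encoding or the similarity measure, so the following is only a minimal sketch of the general idea under loudly stated assumptions: keywords are regex tokens that are not in a small hypothetical stop-word list, the "code" of a keyword is a truncated MD5 digest, a sentence fingerprint hashes the sorted (code, in-sentence frequency) pairs, and text similarity is the Jaccard overlap of the two fingerprint sets. None of these choices is confirmed by the paper, and names such as `STOP_WORDS`, `keyword_code`, and `sentence_fingerprint` are illustrative only.

```python
import hashlib
import re
from collections import Counter

# Hypothetical stop-word list; the paper's keyword-selection rule is not given.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "on"}


def split_sentences(text):
    """Split text into sentences on common Western and CJK terminators."""
    parts = re.split(r"[.!?。!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def keyword_code(word):
    """Assumed keyword code: first 8 hex digits of the word's MD5 digest."""
    return hashlib.md5(word.encode("utf-8")).hexdigest()[:8]


def sentence_fingerprint(sentence):
    """Hash the sorted keywords, each paired with its frequency in the sentence."""
    words = [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOP_WORDS]
    if not words:
        return None  # no keywords survive, so there is nothing to fingerprint
    freq = Counter(words)
    # Sorting makes the fingerprint insensitive to word reordering.
    parts = [f"{keyword_code(w)}:{freq[w]}" for w in sorted(freq)]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


def document_fingerprints(text):
    """Collect the set of sentence fingerprints for a whole text."""
    fps = (sentence_fingerprint(s) for s in split_sentences(text))
    return {fp for fp in fps if fp is not None}


def similarity(text_a, text_b):
    """Jaccard overlap of sentence fingerprints (an assumed measure)."""
    a, b = document_fingerprints(text_a), document_fingerprints(text_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Because keywords are sorted before hashing, a sentence whose words have merely been reordered yields the same fingerprint. For Chinese text such as SOGOU-T, a word segmenter would be needed in place of the regex tokenizer; that step is omitted here.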

CLC number: