Quick Full-text Identification Algorithm for Document Set Plagiarism

doi:10.3969/j.issn.1000-3428.2010.18.068

Computer Engineering (计算机工程), 2010, Vol. 36, Issue 18: 197-199.

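HU Ming-xiao (胡明晓)

(College of Physics & Electronic Information Engineering, Wenzhou University, Wenzhou, Zhejiang 325035, China)

Online: 2010-09-20 Published: 2010-09-30
About the author: HU Ming-xiao (born 1965), male, lecturer, M.S.; main research interests: artificial intelligence and computer graphics.
Funding: Wenzhou Science and Technology Program (H20090049)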

Abstract

To identify plagiarism within a local document set, this paper defines the document plagiarism distance as an approximation of a generalized edit distance based on the returning number (backward moves) and the skipping number (forward jumps). It analyzes sufficient conditions under which this distance satisfies the triangle inequality or a weak triangle inequality, and on that basis proposes a fast full-text identification algorithm that finds all ordered pairs of documents in the set that are suspected of plagiarism. Experimental results show that, compared with other algorithms, the proposed algorithm improves identification efficiency by a factor of 3 to 5 without lowering recall.

Key words: plagiarism identification, document set, triangle inequality, electronic document management
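The abstract does not give the definition of the plagiarism distance or the exact pruning rule, so the following is only a minimal sketch of the general idea of using the triangle inequality to avoid full-text comparisons. It assumes a symmetric metric and substitutes plain Levenshtein distance for the paper's distance; the names `levenshtein`, `close_pairs`, the single-pivot scheme, and the threshold are illustrative assumptions, not taken from the paper. The paper's distance is directed (ordered pairs) and may satisfy only a weak triangle inequality, which this sketch does not model.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance. Stand-in metric, NOT the paper's plagiarism distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def close_pairs(docs, threshold, dist=levenshtein):
    """Return (pairs, calls): index pairs (i, j), i < j, with dist <= threshold,
    and the number of full distance computations performed.

    Hypothetical single-pivot scheme: document 0 is compared with every other
    document once. For any other pair, the triangle inequality gives the lower
    bound |d(pivot, i) - d(pivot, j)| <= d(i, j); if that bound already exceeds
    the threshold, the expensive comparison is skipped.
    """
    if not docs:
        return [], 0
    to_pivot = [0] + [dist(docs[0], d) for d in docs[1:]]
    calls = len(docs) - 1
    pairs = []
    for i, j in combinations(range(len(docs)), 2):
        if i == 0:
            d = to_pivot[j]                  # already computed
        elif abs(to_pivot[i] - to_pivot[j]) > threshold:
            continue                         # pruned by the lower bound
        else:
            d = dist(docs[i], docs[j])
            calls += 1
        if d <= threshold:
            pairs.append((i, j))
    return pairs, calls


if __name__ == "__main__":
    docs = ["abcdef", "abcdeg", "xyzxyzxyzxyz", "abcxef"]
    print(close_pairs(docs, threshold=2))
```

Because the bound is a true lower bound for any metric, pruning in this sketch never discards a pair that is within the threshold; how much work it saves depends on how well the chosen pivot separates the documents.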

CLC number: TP393

Cite this article: HU Ming-xiao. Quick Full-text Identification Algorithm for Document Set Plagiarism[J]. Computer Engineering, 2010, 36(18): 197-199.

http://www.ecice06.com/CN/Y2010/V36/I18/197
