Abstract: This paper proposes a new method for assessing writing-style similarity. It exploits the characteristic way each author controls sentence rhythm to distinguish writing styles and thereby identify the author of a text. The method constructs a rhythm feature matrix to describe the Sentence Rhythm Feature (SRF) of a text, and uses an inner product similarity algorithm and an improved Kullback-Leibler (KL) divergence algorithm to measure the difference between rhythm feature matrices. Experiments show that the method achieves high accuracy in identifying the authors of literary works.
Key words: text mining, authorship identification, text similarity, rhythm feature, multi-dimensional matrix
CLC number:
王少康, 董科军, 阎保平. 基于语句节奏特征的作者身份识别研究[J]. 计算机工程, 2011, 37(9): 4-5,8.
WANG Shao-Kang, DONG Ke-Jun, YAN Bao-Ping. Research on Authorship Identification Based on Sentence Rhythm Feature[J]. Computer Engineering, 2011, 37(9): 4-5,8.
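
The abstract names three components: a rhythm feature matrix, an inner product similarity, and a KL-based distance. The abstract does not define any of them, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes that rhythm is captured by how often a clause of one length is followed by a clause of another length, that the inner product is normalized to a cosine value, and that a smoothed, symmetrized KL divergence stands in for the paper's improved KL algorithm. The names `clause_lengths`, `rhythm_matrix`, `dot_similarity`, `symmetric_kl`, and the parameters `max_len` and `eps` are all hypothetical.

```python
import math
import re
from collections import Counter

# Assumed clause delimiters (full-width and ASCII punctuation);
# the paper's own segmentation rule is not given in the abstract.
_DELIMS = r"[,。!?;:,.!?;:]+"


def clause_lengths(text, max_len=5):
    """Split text at punctuation; return each clause's character count, capped at max_len."""
    clauses = [c.strip() for c in re.split(_DELIMS, text)]
    return [min(len(c), max_len) for c in clauses if c]


def rhythm_matrix(lengths, max_len=5):
    """Relative frequency of consecutive clause-length pairs (i followed by j)."""
    counts = Counter(zip(lengths, lengths[1:]))
    total = sum(counts.values())
    return [[counts[(i, j)] / total if total else 0.0
             for j in range(1, max_len + 1)]
            for i in range(1, max_len + 1)]


def dot_similarity(a, b):
    """Normalized inner product (cosine) of two flattened matrices."""
    x = [v for row in a for v in row]
    y = [v for row in b for v in row]
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def symmetric_kl(a, b, eps=1e-6):
    """Symmetrized KL divergence with additive smoothing.

    A stand-in only: the paper's improved KL variant is not specified here.
    """
    x = [v + eps for row in a for v in row]
    y = [v + eps for row in b for v in row]
    sx, sy = sum(x), sum(y)
    x = [v / sx for v in x]
    y = [v / sy for v in y]
    return sum(p * math.log(p / q) + q * math.log(q / p) for p, q in zip(x, y))
```

Under these assumptions, two texts by the same author would be expected to give a `dot_similarity` near 1 and a `symmetric_kl` near 0, while texts with different rhythm habits would score lower on the first measure and higher on the second.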