Abstract: Real-word errors in Chinese text from online learning communities make the semantics of that text harder to interpret, which degrades learning analytics built on it. To address this, the paper proposes a method for detecting and repairing real-word errors in short texts from online learning communities. First, a confusion word set and a knowledge base of fixed collocations for each confusion word are constructed automatically. Then, for each confusion word, an n-gram score, a context score, and a fixed-collocation score are computed from an n-gram probability statistical model, a context model, and the fixed collocation knowledge base, respectively. Finally, the weighted sum of the three scores serves as the basis for judging whether the original text is erroneous, and the confusion word with the highest score is offered as the repair suggestion. Experimental results show that the method achieves a recall of 85.6%, a precision of 86.3%, and a correction rate of 92.9%, so it detects and repairs Chinese real-word errors in learning communities accurately and effectively.
Key words:
real-word error,
confusion word set,
n-gram probability statistical model,
context,
Chinese fixed collocation
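The scoring step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three scorers, the toy tables (`BIGRAM`, `CONTEXT`, `COLLOCATIONS`), the confusion set, and the weights are all hypothetical placeholders chosen only to show how a weighted sum over candidate confusion words could flag an error and propose a repair.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# A scorer maps (tokens, position, candidate word) to a score.
Scorer = Callable[[List[str], int, str], float]

# Hypothetical toy resources; real ones would be learned from a corpus.
BIGRAM: Dict[Tuple[str, str], float] = {("再", "来"): 0.6, ("在", "来"): 0.1}
CONTEXT: Dict[str, Set[str]] = {"再": {"来", "一次"}, "在": {"这里"}}
COLLOCATIONS: Set[Tuple[str, str]] = {("再", "来")}


def ngram_score(tokens: List[str], idx: int, cand: str) -> float:
    """Toy bigram probability of the candidate followed by the next token."""
    if idx + 1 >= len(tokens):
        return 0.0
    return BIGRAM.get((cand, tokens[idx + 1]), 0.0)


def context_score(tokens: List[str], idx: int, cand: str) -> float:
    """Fraction of the other tokens that co-occur with the candidate."""
    others = tokens[:idx] + tokens[idx + 1:]
    if not others:
        return 0.0
    related = CONTEXT.get(cand, set())
    return sum(tok in related for tok in others) / len(others)


def collocation_score(tokens: List[str], idx: int, cand: str) -> float:
    """1.0 if the candidate forms a known fixed collocation with a neighbor."""
    if idx + 1 < len(tokens) and (cand, tokens[idx + 1]) in COLLOCATIONS:
        return 1.0
    if idx > 0 and (tokens[idx - 1], cand) in COLLOCATIONS:
        return 1.0
    return 0.0


def check_word(
    tokens: List[str],
    idx: int,
    confusion_sets: Dict[str, Set[str]],
    scorers: Sequence[Scorer],
    weights: Sequence[float],
) -> Tuple[bool, str]:
    """Return (is_error, suggestion) for the word at position idx."""
    original = tokens[idx]
    candidates = {original} | set(confusion_sets.get(original, ()))
    scores = {
        c: sum(w * s(tokens, idx, c) for s, w in zip(scorers, weights))
        for c in candidates
    }
    # Highest weighted sum wins; the original word wins any tie.
    best = max(sorted(candidates), key=lambda c: (scores[c], c == original))
    return best != original, best


CONFUSION = {"在": {"再"}, "再": {"在"}}          # hypothetical confusion set
SCORERS = (ngram_score, context_score, collocation_score)
WEIGHTS = (0.4, 0.3, 0.3)                          # hypothetical weights

flagged, suggestion = check_word(["我们", "在", "来", "一次"], 1,
                                 CONFUSION, SCORERS, WEIGHTS)
```

In this sketch the original word is always included among the candidates, so a word is flagged only when some other member of its confusion set scores strictly higher. Whether the paper applies an additional threshold before reporting an error is not stated in the abstract.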
YE Junmin, XU Song, LUO Daxiong, WANG Zhifeng, CHEN Shu. A Chinese Real-word Error Detection and Repairing Method[J]. Computer Engineering (计算机工程), 2019, 45(8): 178-183.