Archaic Chinese Punctuating Sentences Based on Context N-gram Model

Original title: 基于前后文n-gram模型的古汉语句子切分

CHEN Tianying1, CHEN Rong1, PAN Lulu1, LI Hongjun1,2, YU Zhonghua1

(1. Dept. of Computer Science, Sichuan University, Chengdu 610064; 2. Dept. of Computer Science, Southwest University of Science and Technology, Mianyang 621002)

Computer Engineering (计算机工程), 2007, Vol. 33, Issue 03: 192-193. doi: 10.3969/j.issn.1000-3428.2007.03.069

Received: 1900-01-01  Revised: 1900-01-01  Online: 2007-02-05  Published: 2007-02-05

Abstract

This paper proposes an algorithm for segmenting archaic Chinese text into sentences, based on a context n-gram model that uses both the preceding and the following context. By collecting context information, the algorithm can predict segmentation positions fairly accurately even when data is sparse, so it handles small training corpora better and reduces the impact of data sparsity on segmentation accuracy. A sentence segmentation experiment on the Analects of Confucius (Lunyu) achieved a recall of 81% and a precision of 52%.

Key words: n-gram model, data sparsity, smoothing techniques, context-based n-gram model
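The abstract does not give the model's formulas, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a character-level model in which each gap between two characters is scored by two smoothed estimates: how often the preceding character ends a sentence, and how often the following character starts one. The function names, the add-one smoothing, the averaging of the two estimates, and the 0.5 threshold are all assumptions made for this example. The toy corpus is the opening of the Analects, and scoring on the training text is just a sanity check, not a reproduction of the reported 81% recall and 52% precision.

```python
from collections import Counter


def join(sentences):
    """Return the unpunctuated text and the set of true break positions.

    Position k means "a break falls between text[k-1] and text[k]".
    """
    text = "".join(sentences)
    breaks, pos = set(), 0
    for sent in sentences[:-1]:
        pos += len(sent)
        breaks.add(pos)
    return text, breaks


def train(sentences):
    """Count, per character, how often a gap next to it is a break."""
    text, breaks = join(sentences)
    left_total, left_break = Counter(), Counter()
    right_total, right_break = Counter(), Counter()
    for k in range(1, len(text)):
        left, right = text[k - 1], text[k]
        left_total[left] += 1    # character before the gap
        right_total[right] += 1  # character after the gap
        if k in breaks:
            left_break[left] += 1
            right_break[right] += 1
    return left_total, left_break, right_total, right_break


def break_prob(count_break, count_total, ch):
    """Add-one smoothed estimate; unseen characters get 0.5 (assumption)."""
    return (count_break[ch] + 1) / (count_total[ch] + 2)


def segment(text, model, threshold=0.5):
    """Predict break positions by averaging the left and right estimates."""
    left_total, left_break, right_total, right_break = model
    predicted = set()
    for k in range(1, len(text)):
        p_left = break_prob(left_break, left_total, text[k - 1])
        p_right = break_prob(right_break, right_total, text[k])
        if (p_left + p_right) / 2 > threshold:
            predicted.add(k)
    return predicted


def precision_recall(predicted, gold):
    """Precision and recall of predicted break positions against gold ones."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Toy corpus: the opening of the Analects, already split into sentences.
corpus = ["学而时习之", "不亦说乎", "有朋自远方来",
          "不亦乐乎", "人不知而不愠", "不亦君子乎"]
model = train(corpus)
text, gold = join(corpus)
predicted = segment(text, model)
print(sorted(predicted))
print(precision_recall(predicted, gold))
```

Using both sides of the gap means that a character seen rarely on one side can still be backed up by evidence from the other side, which is one plausible way a context model could soften data sparsity on a small corpus.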

CHEN Tianying, CHEN Rong, PAN Lulu, LI Hongjun, YU Zhonghua. Archaic Chinese Punctuating Sentences Based on Context N-gram Model[J]. Computer Engineering, 2007, 33(03): 192-193.

http://www.ecice06.com/CN/Y2007/V33/I03/192
