Reactionary Text Filtering Method Based on K-Nearest Neighbor

WANG Hong-bin, LIU Xiao-jie

(School of Computer, Sichuan University, Chengdu 610065)

Computer Engineering, 2009, Vol. 35, Issue (24): 69-71. doi: 10.3969/j.issn.1000-3428.2009.24.023

Received: 1900-01-01  Revised: 1900-01-01  Online: 2009-12-20  Published: 2009-12-20

Abstract

Filtering undesirable (reactionary) text is an active research topic. Through a detailed analysis of the χ² statistic, this paper shows that the χ² statistic has particular advantages for feature-term extraction when the texts fall into two classes. It introduces a threshold δ for positive texts and derives the value of δ theoretically. On this basis the K-Nearest Neighbor (KNN) algorithm is improved: the uncertainty of the parameter N in KNN is eliminated, the algorithm becomes fully parameter-free, and the time spent on classification is greatly reduced. Experimental results show that the algorithm meets the requirements of real-time online Web classification.

Key words: K-Nearest Neighbor (KNN) algorithm, reactionary text filtering, χ² statistic
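The abstract does not give the paper's formulas, so the following is only a minimal sketch of the standard χ² term statistic as commonly used for feature selection in text categorization, not the authors' exact procedure. The function names, the class labels "bad" and "good", and the toy counts are all hypothetical. With exactly two classes, the statistic of a term is the same for either class, so a single score per term is enough to rank features.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square statistic of a term for one class (2x2 table).

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def rank_terms(term_counts: dict, n_bad: int, n_good: int) -> list:
    """Rank terms by chi-square score, highest first.

    term_counts maps a term to a pair:
    (number of 'bad' documents containing it,
     number of 'good' documents containing it).
    The class names are hypothetical labels for the two classes.
    """
    scores = {}
    for term, (in_bad, in_good) in term_counts.items():
        scores[term] = chi_square(
            in_bad, in_good, n_bad - in_bad, n_good - in_good
        )
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy corpus: 100 documents in each class.
toy_counts = {"alpha": (80, 5), "beta": (50, 50), "gamma": (10, 30)}
ranking = rank_terms(toy_counts, n_bad=100, n_good=100)
```

In this toy data, "beta" appears equally often in both classes and scores 0, so it carries no discriminative information, while "alpha" is concentrated in one class and ranks first. Swapping the roles of the two classes leaves every score unchanged.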

CLC Number: TP391

WANG Hong-bin; LIU Xiao-jie. Reactionary Text Filtering Method Based on K-Nearest Neighbor[J]. Computer Engineering, 2009, 35(24): 69-71.

http://www.ecice06.com/CN/Y2009/V35/I24/69
