Method of Filtering Reactionary Text Based on Vector Space Model

doi:10.3969/j.issn.1000-3428.2006.10.002

Computer Engineering, 2006, Vol. 32, Issue (10): 4-5, 8.

LI Qiang, LI Jianhua

College of Information Security Engineering, Shanghai Jiaotong University, Shanghai 200030

Online: 2006-05-20 Published: 2006-05-20

Abstract

This paper studies how vector space model (VSM) text representation and normalization techniques affect the performance of filtering reactionary (undesirable) text. Experiments are run on two data sets, one balanced and one imbalanced. The results show that methods that represent text in the vector space model, such as the centroid vector method (which the paper labels VSM) and support vector machines (SVM), achieve good precision on both data sets; these methods have been applied to text-content security supervision for filtering such text. Naive Bayes, which represents text with a probabilistic model, performs poorly on the imbalanced data set, especially when filtering unseen reactionary Web text. The paper concludes that term weighting and normalization are important techniques for improving precision.

Keywords: text representation; text normalization; vector space model; support vector machine; Naive Bayes model
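The abstract attributes the good precision of the centroid vector method to vector space representation with term weighting and normalization. The sketch below illustrates that pipeline under stated assumptions: TF-IDF weighting computed as tf·log(N/df), L2 (cosine) normalization, and classification by cosine similarity to per-class centroids. The weighting formula, the function names (`tfidf_vectors`, `train_centroids`, `classify`, and so on), and the toy corpus are hypothetical illustrations rather than the authors' implementation; SVM and Naive Bayes are not shown.

```python
import math
from collections import Counter, defaultdict

def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (term -> weight) and the IDF table.

    Assumed scheme: raw term frequency * log(N / df). The abstract does not
    say which weighting the authors used.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def l2_normalize(vec):
    """Scale a sparse vector to unit Euclidean length (cosine normalization)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)

def train_centroids(vectors, labels):
    """Sum the length-normalized vectors of each class, then renormalize."""
    sums = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for t, w in l2_normalize(vec).items():
            sums[label][t] += w
    return {label: l2_normalize(dict(s)) for label, s in sums.items()}

def vectorize(tokens, idf):
    """Weight a new document with the training IDF; unseen terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return l2_normalize({t: tf * idf[t] for t, tf in counts.items()})

def cosine(a, b):
    """Dot product; for unit-length vectors this equals cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def classify(tokens, centroids, idf):
    """Pick the most similar centroid; ties go to the first class seen in training."""
    vec = vectorize(tokens, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Hypothetical, deliberately imbalanced toy corpus (1 "blocked" vs 4 "allowed"),
# tokenized on whitespace; real Chinese text would first need word segmentation.
train = [
    ("banned slogan rally banned", "blocked"),
    ("weather forecast rain today", "allowed"),
    ("football match score today", "allowed"),
    ("recipe rice noodles soup", "allowed"),
    ("stock market rally today", "allowed"),
]
docs = [text.split() for text, _ in train]
labels = [label for _, label in train]
vectors, idf = tfidf_vectors(docs)
centroids = train_centroids(vectors, labels)

print(classify("banned slogan posted online".split(), centroids, idf))  # prints blocked
print(classify("rain forecast for today".split(), centroids, idf))      # prints allowed
```

Because each document vector is scaled to unit length before it is added to a centroid, and each centroid is renormalized afterwards, a long document cannot dominate its class simply by repeating words, and the similarity score does not grow with the number of training documents in a class. This is one plausible reason such a method holds up on an imbalanced set; the abstract itself does not state the authors' explanation.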

LI Qiang, LI Jianhua. Method of Filtering Reactionary Text Based on Vector Space Model[J]. Computer Engineering, 2006, 32(10): 4-5, 8.

https://www.ecice06.com/CN/Y2006/V32/I10/4
