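Abstract (translated from Chinese): This paper studies a junk SMS filtering method based on minimum-risk Bayesian decision. For short messages that consist mainly of text, information gain is used for feature selection and a minimum-risk Bayesian decision method is used for classification. The method was evaluated on a self-built SMS corpus. The experimental results show that it classifies messages accurately and lowers the misclassification rate for legitimate messages; the classification accuracy reaches 99.3%, which meets the requirements of SMS classification.
Keywords (translated from Chinese): junk SMS; SMS filtering; text classification; naive Bayes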
Abstract: This paper studies a junk message filtering method based on the minimum-risk Bayesian decision rule. Information gain (IG) is used for feature selection, and the minimum-risk Bayesian algorithm is used for classification. Experiments on a data set built from real SMS messages from a mobile company show that the method classifies well and has a low error rate on legitimate messages. Recall on legitimate messages reaches 99.3%. The method is suitable for SMS classification.
Key words: junk message; SMS filtering; text classification; naïve Bayesian
CLC number:
李辉; 张琦; 卢湖川. 基于内容的垃圾短信过滤[J]. 计算机工程, 2008, 34(12): 154-156.
LI Hui; ZHANG Qi; LU Hu-chuan. Junk SMS Filtering Based on Context[J]. Computer Engineering, 2008, 34(12): 154-156.
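
The abstract names two techniques, information gain for feature selection and minimum-risk Bayesian decision for classification. The sketch below illustrates both on toy data and is not the authors' implementation. The function names, the four-message toy corpus, the posterior probabilities, and the loss values (a legitimate message flagged as junk is assumed to cost nine times as much as a missed junk message) are all hypothetical; the paper's actual features, posteriors, and loss matrix are not given in this record.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - P(t) H(C | t present) - P(not t) H(C | t absent)."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = 0.0
    for part in (present, absent):
        if part:  # skip an empty split to avoid dividing by zero
            conditional += len(part) / n * entropy(Counter(part).values())
    return entropy(Counter(labels).values()) - conditional


def min_risk_decision(posteriors, loss):
    """Pick the decision with the smallest expected loss.

    posteriors: {true_class: P(true_class | message)}
    loss: {(decision, true_class): cost of that decision}
    """
    risks = {
        decision: sum(loss[(decision, true)] * p
                      for true, p in posteriors.items())
        for decision in posteriors
    }
    return min(risks, key=risks.get), risks


# Hypothetical toy corpus: each message is a set of tokens.
docs = [{"free", "win"}, {"free", "call"},
        {"meeting", "tomorrow"}, {"lunch", "tomorrow"}]
labels = ["junk", "junk", "legit", "legit"]
ig_free = information_gain(docs, labels, "free")  # perfectly separates classes
ig_call = information_gain(docs, labels, "call")  # weaker feature

# Hypothetical posteriors (e.g. from a naive Bayes model) and loss matrix.
posteriors = {"junk": 0.7, "legit": 0.3}
loss = {("junk", "junk"): 0.0, ("junk", "legit"): 9.0,   # false alarm is costly
        ("legit", "junk"): 1.0, ("legit", "legit"): 0.0}
decision, risks = min_risk_decision(posteriors, loss)
```

With these assumed numbers the message is delivered as legitimate even though its junk posterior is 0.7, because the expected loss of flagging it (9 × 0.3) exceeds the expected loss of letting it through (1 × 0.7). This asymmetry is how a minimum-risk rule can lower the error rate on legitimate messages relative to a plain maximum-posterior rule.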