
Computer Engineering (计算机工程)

• Development Research and Engineering Application •

Information Filtering Algorithm of Text Content-based Sensitive Words Decision Tree

DENG Yi-gui a, WU Yu-ying b

  (a. Information and Campus Network Management Center; b. School of Computer Science, Chongqing University, Chongqing 400030, China)
  • Received: 2013-08-21 Published: 2014-09-15 Online: 2014-09-12
  • About the authors: DENG Yi-gui (born 1971), male, senior engineer, Ph.D.; main research interest: information security. WU Yu-ying, master's student.

Abstract: With the rapid development of the Internet, information resources of all kinds are growing exponentially, and this growth has brought many negative effects, so a secure and healthy network environment needs to be built. To this end, this paper proposes a Sensitive Word Decision Tree Information Filtering Algorithm (SWDT-IFA) for the text content of Web pages. The algorithm relies on neither a dictionary nor word segmentation. It builds a sensitive-word decision tree and feeds the Web page text through the tree as a data stream, recording the frequency, regional information and sensitivity level of each sensitive word; it then computes the overall sensitivity of the text and filters out sensitive texts. Experimental results show that SWDT-IFA achieves high precision and recall, and that its execution time meets the real-time requirements of current network environments.

Key words: text filtering, sensitive level, decision tree, distributary, word frequency
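The abstract describes SWDT-IFA only at a high level. As a rough illustration, here is a minimal Python sketch under stated assumptions: the "sensitive word decision tree" is modeled as a character-level prefix tree (trie) whose word-ending nodes store a sensitivity level, and the overall text sensitivity is modeled as a weighted sum of level × region weight compared against a threshold. The paper's actual tree construction and scoring formula are not given in the abstract, so `REGION_WEIGHTS`, `DEFAULT_THRESHOLD`, the `title`/`body` regions, and all class and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical parameters: the abstract does not give SWDT-IFA's actual
# region weights or filtering threshold, so these values are illustrative.
REGION_WEIGHTS = {"title": 3.0, "body": 1.0}
DEFAULT_THRESHOLD = 5.0


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # next character -> child Node
    level: int = 0  # > 0 means a sensitive word ends here with this level


class SensitiveWordTree:
    """Character-level prefix tree: each input character selects the next
    branch, so matching needs no dictionary lookup or word segmentation."""

    def __init__(self, words):
        self.root = Node()
        for word, level in words.items():  # levels must be positive integers
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, Node())
            node.level = level

    def scan(self, text):
        """Stream the text through the tree, yielding (word, level) for every
        sensitive word that starts at each character position."""
        for start in range(len(text)):
            node = self.root
            for end in range(start, len(text)):
                node = node.children.get(text[end])
                if node is None:
                    break  # no sensitive word continues along this branch
                if node.level:
                    yield text[start:end + 1], node.level


def sensitivity(tree, regions):
    """Return the overall score (sum of level * region weight over all
    matches) and a frequency count keyed by (word, region)."""
    freq = Counter()
    score = 0.0
    for region, text in regions.items():
        weight = REGION_WEIGHTS.get(region, 1.0)  # unknown regions weigh 1.0
        for word, level in tree.scan(text):
            freq[(word, region)] += 1
            score += level * weight
    return score, freq


def is_sensitive(tree, regions, threshold=DEFAULT_THRESHOLD):
    """Filter decision: flag the text when its overall score reaches the threshold."""
    score, _ = sensitivity(tree, regions)
    return score >= threshold


if __name__ == "__main__":
    tree = SensitiveWordTree({"scam": 1, "scammer": 2, "drug": 3})
    page = {"title": "scam alert", "body": "a scammer sold a drug"}
    print(sensitivity(tree, page))
    print(is_sensitive(tree, page))
```

Restarting the walk at every character keeps overlapping matches (both `scam` and `scammer` are counted in "a scammer"), at a worst-case cost of O(n·m) for n characters and a longest sensitive word of m characters. For the sample page, the title contributes 1 × 3.0 and the body 1 + 2 + 3, a score of 9.0, so the page is flagged. A single-pass Aho-Corasick automaton is a common alternative for this kind of matching; it is a different technique, not one the abstract attributes to SWDT-IFA.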
