
Computer Engineering, 2012, Vol. 38, Issue (20): 179-183. doi: 10.3969/j.issn.1000-3428.2012.20.046

• Artificial Intelligence and Recognition Technology •

Research on Web Page Categorization Technology Under Behavior Characteristic Analysis Pattern

TANG Ya-ling 1, CUI Zhi-ming 2

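  (1. School of Computer, Anhui University of Technology, Maanshan 243002, Anhui, China; 2. Institute of Intelligent Information Processing and Application, Soochow University, Suzhou 215006, Jiangsu, China)
  • Received: 2011-12-12  Revised: 2012-02-10  Online: 2012-10-20  Published: 2012-10-17
  • About the authors: TANG Ya-ling (1974-), male, associate professor; main research interests: data mining and network database systems. CUI Zhi-ming, professor and doctoral supervisor.
  • Funding:
    National Natural Science Foundation of China (60473142); Key Project of the Natural Science Research Foundation of Higher Education Institutions of Anhui Province (KJ2010A051, KJ2011A039); Outstanding Young Talent Foundation of Higher Education Institutions of Anhui Province (2009SQRZ076)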

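Abstract: Existing Web page categorization techniques ignore differences in individual user behavior. To address this, this paper proposes a Web page categorization technique that incorporates analysis of users' behavior characteristics. Using methods such as knowledge-rule discovery and page feature extraction, it analyzes Web users' access histories and personalized customization information to learn their behavior and interests. It then provides a Web page categorization pattern suited to each user's cognitive characteristics, which partly compensates for the inability of purely statistical Web page categorization methods to understand natural language. Experimental data show that combining this method with each of several statistical methods effectively improves categorization accuracy and brings the results closer to the actual categories and users' requirements.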
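The abstract does not state how behavioral evidence is combined with a statistical classifier. The following Python sketch is one illustration only, under stated assumptions, and is not the authors' algorithm. It assumes that each past session is a list of category labels and that a statistical classifier (for example Naive Bayes) has already produced a score per category. It also assumes the two sources are blended linearly with a weight `alpha`. The function names, category labels, and `alpha` are all hypothetical. A simple association-rule confidence measure is included to show what "knowledge rule discovery" over access history could look like.

```python
from collections import Counter
from typing import Dict, List


def interest_profile(history: List[List[str]]) -> Dict[str, float]:
    """Hypothetical user-interest profile: the share of all past visits
    that fall into each category (each session is a list of category labels)."""
    counts = Counter(cat for session in history for cat in session)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


def rule_confidence(history: List[List[str]], a: str, b: str) -> float:
    """Confidence of the association rule a -> b over sessions:
    the fraction of sessions containing a that also contain b."""
    with_a = [set(s) for s in history if a in s]
    if not with_a:
        return 0.0
    return sum(1 for s in with_a if b in s) / len(with_a)


def classify(stat_scores: Dict[str, float],
             profile: Dict[str, float],
             alpha: float = 0.3) -> str:
    """Blend statistical classifier scores with the user's interest profile.
    alpha (a hypothetical weight) controls how much behavioral evidence counts;
    alpha = 0 reduces to the purely statistical decision."""
    blended = {
        cat: (1 - alpha) * p + alpha * profile.get(cat, 0.0)
        for cat, p in stat_scores.items()
    }
    return max(blended, key=blended.get)


if __name__ == "__main__":
    # Hypothetical access history of one user: three sessions.
    history = [["sports", "news"], ["sports", "finance"], ["sports"]]
    profile = interest_profile(history)            # sports 0.6, news 0.2, finance 0.2
    print(rule_confidence(history, "sports", "news"))
    # Hypothetical statistical classifier output for an ambiguous page.
    stat_scores = {"news": 0.45, "sports": 0.40, "finance": 0.15}
    print(classify(stat_scores, profile, alpha=0.0))  # news (statistics only)
    print(classify(stat_scores, profile))             # sports (behavior tips the balance)
```

In this toy case the statistical scores alone favor "news". The user's history, however, is dominated by "sports", so the blended score selects "sports". A linear blend is chosen here only because it is the simplest way to show behavioral evidence shifting a statistical decision. The paper may use a different combination.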

Key words: Web page categorization, behavior characteristic, data mining, reverse reasoning, association rule, sequential pattern
