Computer Engineering (计算机工程)

• Artificial Intelligence and Recognition Technology •

Secondary Clustering Recommendation Algorithm Based on Information Entropy

LI Hui, SHI Zhao, YI Junkai

  1. (College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China)
  • Received: 2015-04-24 Online: 2016-05-15 Published: 2016-05-13
  • About the authors: LI Hui (born 1975), male, associate professor, Ph.D., main research interests: recommender systems, cryptography, and information security; SHI Zhao, master's student; YI Junkai, professor.
  • Funding:
    National Key Technology R&D Program of China, 2015 (2015BAK39B02); Beijing University of Chemical Technology Discipline Construction Fund, 2015 (XK1520).

Abstract: Users rarely provide explicit ratings for the webpage texts they read, and this lack of active feedback reduces the accuracy of the final recommendations. To address this, a new secondary clustering recommendation algorithm is proposed. Feature words are extracted from the webpage texts a user has browsed and their weights are calculated, yielding the text information entropy value and the nearest neighbor entropy difference of each webpage. The threshold of the nearest neighbor entropy difference is computed using the uniform distribution of a continuous random variable; the initial number of clusters and the cluster centers for the secondary clustering are determined by average entropy value approximation; and the number of recommendations is calculated by logarithmic function fitting. After two rounds of text clustering, the recommended content is determined using Euclidean distance and information entropy values. Experimental results show that the algorithm runs stably in a real system and improves recommendation accuracy compared with a recommendation algorithm that performs only the two clustering passes without information entropy.

Key words: threshold value of the nearest neighbor entropy difference, average entropy value approximation, secondary clustering, logarithmic fitting, recommendation area, recommendation algorithm
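The abstract does not give the formulas behind the text information entropy value or the nearest neighbor entropy difference. As a minimal sketch only, the Python below assumes that a page's entropy is the Shannon entropy of its normalized feature-word weights, and that the nearest neighbor entropy difference is the smallest absolute gap between a page's entropy and that of any other page. The function names, the sample weights, and both definitions are illustrative assumptions, not the authors' method.

```python
import math


def text_entropy(weights):
    """Shannon entropy (in bits) of a page's feature-word weights.

    Assumption: weights are non-negative and are normalized to sum to 1
    before the entropy is taken; the paper's exact formula may differ.
    """
    total = sum(weights.values())
    if total <= 0:
        return 0.0
    probs = [w / total for w in weights.values() if w > 0]
    return abs(-sum(p * math.log2(p) for p in probs))


def nearest_entropy_differences(entropies):
    """For each page, the smallest absolute gap to any other page's entropy.

    Assumption: this is one plausible reading of the "nearest neighbor
    entropy difference". Sorting makes each page's nearest neighbor adjacent.
    """
    order = sorted(range(len(entropies)), key=lambda i: entropies[i])
    diffs = [math.inf] * len(entropies)
    for a, b in zip(order, order[1:]):
        gap = entropies[b] - entropies[a]
        diffs[a] = min(diffs[a], gap)
        diffs[b] = min(diffs[b], gap)
    return diffs


# Hypothetical feature-word weights for three browsed pages.
pages = [
    {"entropy": 1.0, "cluster": 1.0},
    {"entropy": 1.0, "cluster": 1.0, "webpage": 1.0, "user": 1.0},
    {"recommendation": 3.0},
]
page_entropies = [text_entropy(p) for p in pages]
page_gaps = nearest_entropy_differences(page_entropies)
```

In this sketch a page whose weight is spread evenly over many feature words gets a high entropy, while a page dominated by a single word gets an entropy of zero; the per-page gaps are the values a threshold would then be applied to.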

CLC number: