Computer Engineering ›› 2020, Vol. 46 ›› Issue (10): 81-87. doi: 10.19678/j.issn.1000-3428.0058973

• Artificial Intelligence and Pattern Recognition •

An Automatic Label Generation Method for Clustering of Network User Behavior

BI Meng1, SHAO Zhong1, XU Jian2   

  1. Software College, Shenyang University of Technology, Shenyang 110023, China;
  2. Software College, Northeastern University, Shenyang 110169, China
  • Received: 2020-07-17 Revised: 2020-08-17 Published: 2020-08-24

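  • About the authors: BI Meng (born 1982), male, engineer, Ph.D.; main research interests include network and information security, machine learning, and data clustering analysis. SHAO Zhong, associate professor. XU Jian, associate professor and doctoral supervisor.
  • Supported by:
    National Natural Science Foundation of China (61872069); Fundamental Research Funds for the Central Universities (N2017012).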

Abstract: Most existing user behavior clustering methods require the size of the user behavior data to be determined in advance, and the cluster labels they generate lack explicit semantics. To address these problems, this paper proposes an automatic cluster label generation method for clustering analysis of network user behavior. The method applies the Latent Factor Model (LFM) and matrix decomposition to the raw network user behavior data to fill in missing values. User behaviors are then clustered according to the attribute features of the data, with behavior features added during clustering. At the same time, cluster labels are generated from the behavior feature information to improve the accuracy of user behavior clustering. Experimental results on the Last.fm, MovieLens and CiteULike datasets show that the proposed method does not require the data size to be determined in advance, and can automatically generate cluster labels with more explicit semantics while maintaining high clustering accuracy.

Key words: user behavior, clustering, Latent Factor Model (LFM), matrix decomposition, label
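
The missing-value step mentioned in the abstract can be illustrated with a minimal sketch of a latent factor model fitted by stochastic gradient descent. This is not the authors' implementation: the toy matrix, the function names `factorize` and `fill_missing`, and all hyperparameters (`n_factors`, `lr`, `reg`, `epochs`) are assumptions chosen only for illustration. The sketch factorizes a user-by-item behavior matrix into user and item factors using the observed entries alone, then fills each missing entry with the corresponding inner product.

```python
import random


def factorize(ratings, n_factors=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Fit user factors P and item factors Q on observed entries (None = missing)."""
    rng = random.Random(seed)
    n_users, n_items = len(ratings), len(ratings[0])
    P = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    observed = [(u, i, r)
                for u, row in enumerate(ratings)
                for i, r in enumerate(row) if r is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for u, i, r in observed:
            pred = sum(P[u][k] * Q[i][k] for k in range(n_factors))
            err = r - pred
            for k in range(n_factors):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


def fill_missing(ratings, P, Q):
    """Keep observed values; replace each None with the model's prediction."""
    return [[r if r is not None else sum(p * q for p, q in zip(P[u], Q[i]))
             for i, r in enumerate(row)]
            for u, row in enumerate(ratings)]


# Hypothetical toy data: rows are users, columns are items, None is unobserved.
toy = [
    [5.0, 3.0, None, 1.0],
    [4.0, None, None, 1.0],
    [1.0, 1.0, None, 5.0],
    [None, 1.0, 5.0, 4.0],
]
P, Q = factorize(toy)
filled = fill_missing(toy, P, Q)
```

The completed matrix `filled` has no missing entries, so it could be passed to a downstream clustering step; how the paper actually clusters users and derives labels from behavior features is not specified in the abstract and is not modeled here.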

