Computer Engineering (计算机工程)

• Development Research and Engineering Application •

User Interest Modeling Approach Based on Short Text of Micro-blog

QIU Yun-fei (邱云飞) 1, WANG Lin-ying (王琳颍) 1, SHAO Liang-shan (邵良杉) 2, GUO Hong-mei (郭红梅) 3

  (1. School of Software, Liaoning Technical University, Huludao 125100, China; 2. System Engineering Institute, Liaoning Technical University, Fuxin 123000, China; 3. Experimental High School of Fuxin, Fuxin 123000, China)
  • Received: 2013-01-05 Online: 2014-02-15 Published: 2014-02-13
  • About the authors: QIU Yun-fei (born 1976), male, professor, Ph.D., main research direction: data mining; WANG Lin-ying, master's student; SHAO Liang-shan, professor and doctoral supervisor; GUO Hong-mei, master's degree
  • Supported by:
    National Natural Science Foundation of China (70971059); Liaoning Province Innovation Team Fund (2009T045); Liaoning Province Growth Program for Outstanding Young Scholars in Higher Education Institutions (JQ2012027)

Abstract: To address the problem of modeling the interests of micro-blog users, this paper proposes a method for building a user interest model from a data set of micro-blog short texts. To alleviate the data sparsity caused by short texts, it analyzes the structure and content of micro-blog short texts and introduces the concept of micro-blog short-text reconstruction: the content of a post is extended using other micro-blog short texts related to it and the three kinds of special symbols contained in the text, which enriches the feature information of the original post. The HowNet2000 concept dictionary is then used to map the feature-word set of the reconstructed text to a concept set. The text vectors, abstracted to the concept level, are clustered to partition the user's interests into sets, and a representation mechanism for the user interest model is given. Experimental results show that short-text reconstruction and concept mapping improve the clustering results. Compared with a micro-blog user interest modeling method based on collaborative filtering, the F-Measure increases by 29.1%, which indicates that the constructed micro-blog user interest model performs well.

Key words: micro-blog, short-text reconstruction, concept mapping, short-text clustering, user interest model
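
The pipeline summarized in the abstract (reconstruction, concept mapping, clustering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `CONCEPTS` dictionary stands in for HowNet2000, the related posts are assumed to be already known, the three kinds of special symbols are not modeled because the abstract does not name them, and a simple single-pass threshold clustering stands in for whatever clustering algorithm the authors use. The `f_measure` helper is the standard balanced F-measure (harmonic mean of precision and recall), which may differ from the exact variant used in the experiments.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy concept dictionary; a stand-in for HowNet2000.
CONCEPTS = {
    "football": "sport", "basketball": "sport", "match": "sport",
    "guitar": "music", "concert": "music", "song": "music",
}

def reconstruct(post, related_posts):
    """Extend a short post with the words of its related posts (assumed given)."""
    words = post.split()
    for other in related_posts:
        words.extend(other.split())
    return words

def to_concept_vector(words):
    """Map feature words to concepts; words without a known concept are dropped."""
    return Counter(CONCEPTS[w] for w in words if w in CONCEPTS)

def cosine(a, b):
    """Cosine similarity between two sparse concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(vectors, threshold=0.5):
    """Single-pass clustering: a vector joins the first cluster whose
    accumulated centroid is similar enough, otherwise it starts a new one."""
    clusters = []
    for i, vec in enumerate(vectors):
        for c in clusters:
            if cosine(vec, c["centroid"]) >= threshold:
                c["centroid"].update(vec)  # add the counts to the centroid
                c["members"].append(i)
                break
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return clusters

def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Toy data: (post, related posts) pairs for a single hypothetical user.
posts = [
    ("football match tonight", ["great basketball match"]),
    ("new song", ["concert guitar solo"]),
    ("basketball", []),
]
vectors = [to_concept_vector(reconstruct(p, rel)) for p, rel in posts]
clusters = cluster(vectors)

# One possible interest representation: dominant concept and cluster size.
interest_model = [
    (max(c["centroid"], key=c["centroid"].get), len(c["members"]))
    for c in clusters
]
```

In this toy run the third post consists of a single word, yet it lands in the same cluster as the first post because both map to the same concept. This is the kind of sparsity relief that concept-level abstraction is meant to provide.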

CLC Number: