
Computer Engineering

• Artificial Intelligence and Recognition Technology •

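An Improved OPTICS Algorithm Based on Grid and Weighted Information Entropy

AN Jianrui, ZHANG Longbo, WANG Lei, JIN Chao, HUAI Hao, WANG Xiaodan

  1. (College of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255049, China)
  • Received: 2016-01-18 Online: 2017-02-15 Published: 2017-02-15
  • About the authors: AN Jianrui (born 1990), male, master's student, main research interest: data mining; ZHANG Longbo (corresponding author), professor, Ph.D.; WANG Lei, lecturer, Ph.D.; JIN Chao, HUAI Hao and WANG Xiaodan, master's students.
  • Funding:
    Young Scientists Fund of the National Natural Science Foundation of China (61502282); Young Scientists Fund of the Natural Science Foundation of Shandong Province (ZR2015FQ005); Doctoral Research Start-up Fund of Shandong University of Technology (414023).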


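Abstract: The existing OPTICS algorithm has high time complexity and is ill-suited to data-intensive environments. To address these problems, an improved algorithm based on grids and weighted information entropy is proposed. The data set is partitioned into a number of grid cells, and weighted information entropy is introduced to compute the minimum density threshold of each grid cell adaptively. Grid cells that satisfy the minimum density threshold are defined as dense cells, and the data points are compressed by replacing the point set of a grid cell with its centroid. The algorithm is evaluated on the GeoLife Trajectories dataset, and both theoretical analysis and experimental results demonstrate the effectiveness of the improved algorithm.

Key words: data-intensive environment, weighted information entropy, OPTICS algorithm, density threshold, centroid point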
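
The grid-and-centroid compression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `compress_by_grid`, the uniform square grid, the 2-D points, and the choice to keep points from sparse cells unchanged are all assumptions made for illustration. In particular, `min_density` is a fixed stand-in: the paper derives a per-cell threshold from weighted information entropy, and that formula is not given in the abstract.

```python
from collections import defaultdict
from math import floor


def compress_by_grid(points, cell_size, min_density):
    """Replace the points of each dense grid cell with their centroid.

    points      : iterable of (x, y) tuples
    cell_size   : side length of a square grid cell (assumed uniform grid)
    min_density : fixed stand-in threshold (hypothetical); the paper computes
                  a per-cell threshold from weighted information entropy
    Returns (centroids, leftovers):
      centroids : list of (cx, cy, count), one entry per dense cell
      leftovers : points from cells below the threshold, kept unchanged
                  (an assumption; the abstract does not say how they are handled)
    """
    # Assign every point to the grid cell that contains it.
    cells = defaultdict(list)
    for x, y in points:
        cells[(floor(x / cell_size), floor(y / cell_size))].append((x, y))

    centroids, leftovers = [], []
    for members in cells.values():
        n = len(members)
        if n >= min_density:
            # Dense cell: one centroid stands in for all of its points.
            cx = sum(p[0] for p in members) / n
            cy = sum(p[1] for p in members) / n
            centroids.append((cx, cy, n))
        else:
            leftovers.extend(members)
    return centroids, leftovers
```

Under these assumptions, the reduced point set (centroids plus leftovers) would then be passed to OPTICS in place of the raw data, which is where the reduction in running time would come from.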


CLC number: