
Computer Engineering ›› 2020, Vol. 46 ›› Issue (11): 132-138, 147. doi: 10.19678/j.issn.1000-3428.0056453

• Advanced Computing and Data Processing •

Fast Outlier Detection Algorithm in Data Stream with Local Density of Vector Dot Product

MAO Yaqiong1, TIAN Liqin1,2, WANG Yan1, MAO Yaping3, WANG Zhigang1

  1. School of Computer, Qinghai Normal University, Xining 810008, China;
  2. School of Computer, North China Institute of Science and Technology, Beijing 065201, China;
  3. Qinghai Basic Surveying and Mapping Institute, Xining 810000, China
  • Received: 2019-10-30 Revised: 2020-01-04 Published: 2020-02-10
  • About the authors: MAO Yaqiong (born 1991), female, master's student, main research interest: data mining; TIAN Liqin (corresponding author), professor and doctoral supervisor; WANG Yan, Ph.D.; MAO Yaping, M.S.; WANG Zhigang, Ph.D.
  • Funding:
    National Key Research and Development Program of China (2017YFC0804108, 2018YFC0808306); Fundamental Research Funds for the Central Universities (3142019043); Key Research and Development Program of Hebei Province (19270318D); Qinghai Provincial Key Laboratory of Internet of Things project (2017-ZJ-Y21); Applied Basic Research Project of Qinghai Province (2017-ZJ-752); Hebei Engineering Technology Research Center for Internet of Things Monitoring project (3142016020).

Abstract: Existing outlier detection algorithms for data streams generally take too long to run on massive high-dimensional data streams. To address this problem, this paper proposes a Fast outlier detection algorithm in data stream with Local Density of Vector dot Product (FASTLDVP). The algorithm keeps a small number of intermediate results so that incremental calculation is carried out only for the affected data points in the window. In addition, two optimization strategies and one pruning rule are designed to reduce the number of distance calculations between points during detection, which lowers the time and space overhead of the algorithm and improves detection efficiency. Theoretical analysis and experimental results show that the algorithm effectively improves the efficiency of outlier detection in data streams while maintaining detection accuracy, and that it can be extended to parallel environments for parallel acceleration.

Key words: outlier detection, high-dimensional data stream, Local Density of Vector dot Product (LDVP), incremental calculation, pruning rule
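
The abstract does not define LDVP, the two optimization strategies, or the pruning rule, so none of them can be reproduced here. The general idea of updating only the affected points in a sliding window while caching a few intermediate results can still be sketched with a stand-in criterion. The following minimal Python sketch assumes a plain distance-based rule (a point is an outlier if it has fewer than `k` neighbours within `radius`) and a norm-difference bound as an illustrative pruning test. The class name, parameters, and pruning test are all hypothetical and are not taken from the paper.

```python
from collections import deque
import math


class SlidingWindowOutlierDetector:
    """Illustrative stand-in, NOT the paper's FASTLDVP algorithm.

    Assumed rule: a point is an outlier if fewer than `k` points in the
    current window lie within `radius` of it.
    """

    def __init__(self, window_size, radius, k):
        self.window_size = window_size
        self.radius = radius
        self.k = k
        self.window = deque()    # (point_id, vector, norm), oldest first
        self.neighbours = {}     # cached intermediate result: id -> neighbour ids
        self.distance_calls = 0  # number of full distance computations
        self._next_id = 0

    def _expire_oldest(self):
        old_id, _, _ = self.window.popleft()
        # Only the neighbours of the expired point are affected.
        for nid in self.neighbours.pop(old_id):
            self.neighbours[nid].discard(old_id)

    def insert(self, vector):
        vector = tuple(vector)
        if len(self.window) == self.window_size:
            self._expire_oldest()
        pid = self._next_id
        self._next_id += 1
        norm = math.sqrt(sum(x * x for x in vector))
        self.neighbours[pid] = set()
        for qid, qvec, qnorm in self.window:
            # Hypothetical pruning test: | ||p|| - ||q|| | <= dist(p, q),
            # so a norm gap larger than radius rules q out without a
            # full distance computation.
            if abs(norm - qnorm) > self.radius:
                continue
            self.distance_calls += 1
            if math.dist(vector, qvec) <= self.radius:
                self.neighbours[pid].add(qid)
                self.neighbours[qid].add(pid)
        self.window.append((pid, vector, norm))
        return pid

    def outliers(self):
        # Reads cached neighbour sets; no distances are recomputed.
        return [pid for pid, _, _ in self.window
                if len(self.neighbours[pid]) < self.k]


detector = SlidingWindowOutlierDetector(window_size=4, radius=1.0, k=1)
for point in [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (0.0, 0.5), (0.2, 0.2)]:
    detector.insert(point)
print(detector.outliers(), detector.distance_calls)
```

In this sketch each arrival touches only the new point and its neighbours, and each expiry touches only the neighbours of the departing point. The cached norms and neighbour sets play the role of the "small number of intermediate results" in this simplified setting.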

CLC number: