Computer Engineering, 2013, Vol. 39, Issue (5): 174-177, 182. doi: 10.3969/j.issn.1000-3428.2013.05.038

• Artificial Intelligence and Recognition Technology •

Lightning Data Sampling Algorithm Based on Huffman Tree

PENG Yong-gong, QIU Tao-rong, LIN Yu-yuan, HUANG Hai-quan

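  1. (School of Information Engineering, Nanchang University, Nanchang 330031, China)
  • Received: 2012-06-05 Publication date: 2013-05-15 Release date: 2013-05-14
  • About the authors: PENG Yong-gong (1974-), male, experimentalist, M.S.; research interest: intelligent information processing; QIU Tao-rong, professor, Ph.D.; LIN Yu-yuan and HUANG Hai-quan, master's students
  • Funding:
    National Natural Science Foundation of China (61070139); Natural Science Foundation of Jiangxi Province (20114BAB201039); Jiangxi Province Science and Technology Support Program (20112BBG70087); Science and Technology Program of the Jiangxi Provincial Department of Education (GJJ11286)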

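Abstract: For massive lightning meteorological datasets with an imbalanced class distribution, an undersampling algorithm based on Euclidean distance is inefficient. To solve this problem, a lightning data sampling algorithm based on the Huffman tree is proposed. The Huffman tree construction procedure is used to estimate the cluster centers of the lightning samples and the number of samples in each cluster; these estimates are then combined with the Euclidean-distance-based undersampling algorithm to sample the non-lightning samples. In a sampling experiment on 27 552 real records, the sampling time of the algorithm was about 16 min. The results show that the algorithm not only reduces the volume of data but also improves the algorithm's time performance.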
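The abstract only outlines the method, so the following is a minimal sketch under stated assumptions, not the authors' algorithm. It borrows the Huffman construction pattern (live nodes in a min-heap keyed by weight, lightest node extracted first) to group the minority (lightning) samples, then uses the resulting cluster centres and per-cluster counts to pick non-lightning samples by Euclidean distance. The function names `huffman_style_clusters` and `undersample_majority`, the use of cluster size as the node weight, the "merge into the nearest centre" rule, and the `min_size` stopping threshold are all hypothetical choices made for illustration.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def huffman_style_clusters(points: Sequence[Point], min_size: int) -> List[Tuple[Point, int]]:
    """Hypothetical Huffman-style grouping of minority (lightning) samples.

    As in Huffman tree construction, live nodes sit in a min-heap keyed by
    weight (assumed here to be cluster size) and the lightest node is taken
    first. Unlike Huffman coding, it is merged into the cluster with the
    nearest centre (Euclidean). Merging stops once the smallest cluster has
    at least `min_size` members, so the number of clusters comes from the data.
    Returns (centre, member_count) pairs.
    """
    centers: Dict[int, Point] = {i: tuple(map(float, p)) for i, p in enumerate(points)}
    sizes: Dict[int, int] = {i: 1 for i in centers}
    heap = [(1, i) for i in centers]
    heapq.heapify(heap)

    while len(sizes) > 1:
        size, a = heapq.heappop(heap)
        if a not in sizes or sizes[a] != size:
            continue  # stale heap entry left behind by an earlier merge
        if size >= min_size:
            break  # the lightest live cluster is already large enough
        # Nearest other cluster by Euclidean distance between centres.
        b = min((c for c in sizes if c != a),
                key=lambda c: math.dist(centers[c], centers[a]))
        na, nb = sizes.pop(a), sizes[b]
        ca = centers.pop(a)
        # Size-weighted mean keeps centers[b] equal to the mean of all its members.
        centers[b] = tuple((nb * xb + na * xa) / (na + nb) for xb, xa in zip(centers[b], ca))
        sizes[b] = na + nb
        heapq.heappush(heap, (sizes[b], b))

    return [(centers[i], sizes[i]) for i in sizes]


def undersample_majority(majority: Sequence[Point],
                         clusters: List[Tuple[Point, int]]) -> List[Point]:
    """Assumed form of Euclidean-distance undersampling: for each minority
    cluster (centre, n), keep the n not-yet-chosen majority (non-lightning)
    samples closest to that centre."""
    remaining = list(range(len(majority)))
    kept: List[Point] = []
    for center, n in clusters:
        remaining.sort(key=lambda i: math.dist(majority[i], center))
        kept.extend(majority[i] for i in remaining[:n])
        remaining = remaining[n:]
    return kept


if __name__ == "__main__":
    # Toy 2-D data; real records would carry many meteorological attributes.
    lightning = [(0, 0), (0, 1), (10, 10), (10, 11), (11, 10)]
    non_lightning = [(0, 2), (1, 0), (5, 5), (9, 9), (10, 12), (20, 20), (12, 10)]
    clusters = huffman_style_clusters(lightning, min_size=2)
    print(clusters)
    print(undersample_majority(non_lightning, clusters))
```

Each merge in this toy version scans every live cluster, so it is quadratic in the number of lightning samples and says nothing about the 16 min timing reported in the abstract; it only shows how cluster centres and per-cluster counts could steer the selection of a balanced set of non-lightning samples.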

Key words: lightning forecasting, imbalanced data, undersampling algorithm, Huffman tree, support vector machine (SVM)
