Computer Engineering, 2021, Vol. 47, Issue (8): 45-53, 61. doi: 10.19678/j.issn.1000-3428.0058601

• Artificial Intelligence and Pattern Recognition •

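Independent Exact Permutation Testing Algorithm for Distinguishing Sequential Pattern Discovery

WU Jun, OUYANG Aijia, ZHANG Lin

  1. School of Information Engineering, Zunyi Normal University, Zunyi, Guizhou 563000, China
  • Received: 2020-06-10 Revised: 2020-07-14 Published: 2020-07-16
  • About the authors: WU Jun (born 1990), male, lecturer, master's degree; main research interests are data mining, deep learning, and bioinformatics. OUYANG Aijia, professor, Ph.D. ZHANG Lin, associate professor, master's degree.
  • Funding:
    National Natural Science Foundation of China (61662090); Youth Science and Technology Talent Growth Project of the Guizhou Provincial Department of Education (黔教合KY字[2017]250); Joint Fund of the Guizhou Provincial Department of Science and Technology (黔科合LH字[2017]7069); Engineering Research Center Project of the Guizhou Provincial Department of Education (黔教合KY字[2016]018).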

Abstract: Traditional distinguishing sequential pattern mining algorithms return a number of false-positive patterns, and the erroneous information these patterns carry can interfere with decisions in downstream tasks. To address this problem, an algorithm named IEP-DSP is proposed for filtering out false-positive distinguishing sequential patterns. The algorithm uses the SPADE method and the WRAcc contrast measure to find both the candidate distinguishing sequential patterns to be tested and the distinguishing sequential patterns present in all permuted data sets. By simulating the permutation process, the independent exact permutation testing method builds a separate exact null distribution for each pattern length, and the exact p-value of each candidate pattern is computed from the null distribution for its length. The False Discovery Rate (FDR) measure is then used to control the number of false-positive patterns of each length at a significance level α. Experimental results on real and simulated data sets show that IEP-DSP filters out a large number of false-positive distinguishing sequential patterns and, compared with methods based on statistical significance testing, retains more true distinguishing sequential patterns. The results also demonstrate the advantage of independent exact permutation testing over standard permutation testing.
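The pipeline summarized in the abstract (score patterns with WRAcc, build one null distribution per pattern length by permutation, compute p-values, then control FDR within each length) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several choices are assumptions made only for illustration: all function names are hypothetical; candidate patterns are passed in directly instead of being mined with SPADE; the null distribution is built by re-scoring the same candidates under shuffled class labels and pooling the scores by length; the absolute value of WRAcc is used as the test statistic; and Benjamini-Hochberg is used as the FDR procedure, which the abstract does not name.

```python
import random
from collections import defaultdict


def contains(seq, pattern):
    """True if `pattern` occurs in `seq` as a (possibly gapped) subsequence."""
    it = iter(seq)
    return all(item in it for item in pattern)


def wracc(covered, labels):
    """Weighted relative accuracy: P(pattern) * (P(pos | pattern) - P(pos))."""
    n = len(labels)
    n_cov = sum(covered)
    if n_cov == 0:
        return 0.0
    pos_cov = sum(l for c, l in zip(covered, labels) if c)
    return (n_cov / n) * (pos_cov / n_cov - sum(labels) / n)


def per_length_p_values(seqs, labels, patterns, n_perm=200, seed=0):
    """Permutation p-values using one pooled null distribution per length."""
    rng = random.Random(seed)
    cover = {p: [contains(s, p) for s in seqs] for p in patterns}
    observed = {p: abs(wracc(cover[p], labels)) for p in patterns}
    null = defaultdict(list)  # pattern length -> null scores (assumed pooling)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute class labels only
        for p in patterns:
            null[len(p)].append(abs(wracc(cover[p], shuffled)))
    p_values = {}
    for p in patterns:
        scores = null[len(p)]
        exceed = sum(s >= observed[p] for s in scores)
        # The +1 correction keeps the p-value strictly above zero.
        p_values[p] = (exceed + 1) / (len(scores) + 1)
    return p_values


def benjamini_hochberg(p_values, alpha):
    """Keys rejected by the Benjamini-Hochberg step-up procedure."""
    ranked = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= alpha * rank / m:
            cutoff = rank
    return {key for key, _ in ranked[:cutoff]}


def significant_patterns(seqs, labels, patterns, alpha=0.05, n_perm=200, seed=0):
    """Apply FDR control separately to the patterns of each length."""
    p_values = per_length_p_values(seqs, labels, patterns, n_perm, seed)
    by_length = defaultdict(dict)
    for p, value in p_values.items():
        by_length[len(p)][p] = value
    kept = set()
    for group in by_length.values():
        kept |= benjamini_hochberg(group, alpha)
    return kept
```

Grouping by length before the FDR step reflects the abstract's point that patterns of different lengths are tested against separate null distributions; a standard permutation test would instead compare every pattern against a single shared null.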

Key words: data mining, pattern discovery, distinguishing sequential pattern mining, statistical significance testing, independent exact permutation testing

CLC number: