Computer Engineering, 2008, Vol. 34, Issue 14: 47-48. doi: 10.3969/j.issn.1000-3428.2008.14.017

• Software Technology and Database •

Motif Pattern Mining Algorithm Based on Longest Common Subsequence Distance

FENG Lin, YU Xiao-hang, SUN Tao, SHEN Xiao, PAN Xiao-wen   

  1. (Institute of University Students’ Innovation, Dalian University of Technology, Dalian 116024)
  • Received:1900-01-01 Revised:1900-01-01 Online:2008-07-20 Published:2008-07-20

Abstract: Existing motif mining algorithms are susceptible to noise interference. To address this problem, a motif mining algorithm based on the Longest Common Subsequence (LCSS) distance is proposed. During the search, the algorithm prunes candidates effectively using a strategy based on subsequence-distance discrimination. For candidate patterns of unequal length, the Minimum Description Length (MDL) principle is used to compute their relative weights, and the motif that occurs most frequently and best reflects the characteristics of the original time series is selected accordingly. Experimental results show that the algorithm is at least 60% faster than naive search.

Key words: motif pattern, noise interference, clustering analysis, Minimum Description Length (MDL) principle
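
The abstract does not state how the LCSS distance is defined, so the following is only a minimal sketch of one common formulation for real-valued time series, not the authors' method. Two points are treated as matching when they differ by at most a tolerance `eps`, and the distance is one minus the LCSS length divided by the shorter sequence's length. The names `lcss_length`, `lcss_distance`, and `eps`, as well as the normalization, are assumptions made for illustration.

```python
from typing import Sequence


def lcss_length(a: Sequence[float], b: Sequence[float], eps: float) -> int:
    """Length of the longest common subsequence of a and b,
    where two points match if they differ by at most eps (assumed rule)."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCSS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps:
                # The last points match: extend the alignment of the shorter prefixes.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # No match: skip the last point of one sequence or the other.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def lcss_distance(a: Sequence[float], b: Sequence[float], eps: float) -> float:
    """Assumed normalization: 0.0 when the shorter sequence is fully matched,
    1.0 when nothing matches (or when either sequence is empty)."""
    if not a or not b:
        return 1.0
    return 1.0 - lcss_length(a, b, eps) / min(len(a), len(b))


clean = [1.0, 2.0, 3.0, 4.0, 5.0]
noisy = [1.0, 2.0, 100.0, 3.0, 4.0, 5.0]  # same shape with one noise spike inserted
spike_distance = lcss_distance(clean, noisy, eps=0.1)
```

Because unmatched points are simply skipped, the inserted spike does not change the result for `clean` and `noisy`, which illustrates why an LCSS-based measure tolerates noise better than a point-by-point distance. It also handles sequences of different lengths directly.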
