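Abstract (translated from Chinese): Traditional pattern mining methods, when applied to protein sequences, generate large numbers of candidate patterns or construct projected databases repeatedly, which lowers efficiency, and they produce unnecessary short patterns or erroneous patterns during mining. To address these problems, the MBioPM algorithm, based on pattern division, is proposed. Theoretical analysis and experiments show that MBioPM outperforms other related algorithms.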
Keywords (translated from Chinese):
protein sequence,
pattern mining,
data mining,
bioinformatics
Abstract: Traditional pattern mining algorithms are inefficient on protein sequences because they generate a huge number of candidate patterns or construct projected databases many times. They also produce unnecessary short patterns, or even wrong patterns, during mining. To address these problems, a novel mining algorithm called Motif-divide based Biology sequence Pattern Mining (MBioPM) is presented, based on the “motif-divide” method. MBioPM improves efficiency and avoids the problems mentioned above. Theoretical analysis and experimental results show that MBioPM outperforms other related algorithms.
Keywords:
protein sequence,
pattern mining,
data mining,
bioinformatics
郭 顺;姜青山;王备战;史 亮. 一种新的蛋白质序列模式挖掘算法[J]. 计算机工程, 2009, 35(8): 208-210.
GUO Shun; JIANG Qing-shan; WANG Bei-zhan; SHI Liang. Mining Algorithm for Protein Sequence Pattern[J]. Computer Engineering, 2009, 35(8): 208-210.