Abstract:
Borrowing the idea of Gibbs sampling, this paper uses the candidate motifs corresponding to sequence peaks as the initial population of a genetic algorithm and proposes an improved motif identification algorithm. The fitness function takes the number of occurrences of a motif in the sequences as an additional variable, which brings it more in line with the characteristics of biological data. To maintain population diversity, the mutation operation uses IUPAC degenerate codes. Tests on real data from the DBTSS database show that the algorithm achieves relatively high identification precision and relatively fast search speed.
Key words:
motif identification,
Genetic Algorithm (GA),
Gibbs sampling,
IUPAC degenerate code
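
The abstract names two ingredients, counting motif occurrences and mutating with IUPAC degenerate codes, without giving formulas. The Python sketch below illustrates them under stated assumptions: the IUPAC table is the standard nucleotide code, but the function names (`matches`, `count_occurrences`, `mutate`) and the single-position mutation scheme are hypothetical illustrations, not the paper's implementation, and the paper's actual fitness function is not reproduced here.

```python
import random

# Standard IUPAC nucleotide codes: each symbol stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def matches(motif, window):
    """True if every base in window is allowed by the motif symbol at that position."""
    return len(motif) == len(window) and all(
        base in IUPAC[symbol] for symbol, base in zip(motif, window)
    )

def count_occurrences(motif, sequences):
    """Count all (possibly overlapping) windows in all sequences that match the motif."""
    k = len(motif)
    return sum(
        matches(motif, seq[i:i + k])
        for seq in sequences
        for i in range(len(seq) - k + 1)
    )

def mutate(motif, rng):
    """Assumed mutation: replace one random position with a different IUPAC symbol."""
    pos = rng.randrange(len(motif))
    choices = [s for s in IUPAC if s != motif[pos]]
    return motif[:pos] + rng.choice(choices) + motif[pos + 1:]

if __name__ == "__main__":
    seqs = ["GGTATAAT", "CCTGTACC"]
    print(count_occurrences("TATA", seqs))  # exact motif
    print(count_occurrences("TRTA", seqs))  # R matches A or G
    print(mutate("TATA", random.Random(0)))
```

An occurrence count of this kind could feed into a fitness score as one term; how the paper weights it against other terms is not stated in the abstract.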
LIU Wen-Yuan, TIAN Liu-Fang, WANG Chang-Wu, WANG Bao-Wen. Motif Identification Based on Gibbs Sampling and Genetic Algorithm[J]. Computer Engineering, 2011, 37(14): 180-182.