Computer Engineering ›› 2011, Vol. 37 ›› Issue (14): 180-182. doi: 10.3969/j.issn.1000-3428.2011.14.060

• Artificial Intelligence and Recognition Technology •

Motif Identification Based on Gibbs Sampling and Genetic Algorithm

LIU Wen-yuan, TIAN Lu-fang, WANG Chang-wu, WANG Bao-wen

  1. (College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China)
  • Received: 2011-02-18 Online: 2011-07-20 Published: 2011-07-20
  • About the authors: LIU Wen-yuan (born 1968), male, professor and doctoral supervisor; main research areas: soft computing, database technology, and bioinformatics. TIAN Lu-fang, master's student. WANG Chang-wu, professor. WANG Bao-wen, associate professor.
  • Funding:
    Natural Science Research Program of the Hebei Provincial Department of Education (2009339)

Abstract: Drawing on the idea of Gibbs sampling, this paper proposes an improved motif identification algorithm that uses the candidate motifs corresponding to sequence peaks as the initial population of a genetic algorithm. The number of occurrences of a motif in the sequences is added as a variable to the fitness function, so that the function better reflects the characteristics of biological data. IUPAC degenerate codes are introduced into the mutation operation to maintain population diversity. Tests on real data from the DBTSS database show that the algorithm achieves relatively high identification precision and fast search speed.

Keywords: motif identification, genetic algorithm (GA), Gibbs sampling, IUPAC degenerate code
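
The abstract names two ingredients that are easy to illustrate in isolation: counting how often a (possibly degenerate) motif occurs in a set of sequences, and mutating a motif by introducing an IUPAC degenerate code. The Python sketch below is only an illustration under stated assumptions. The IUPAC table is the standard nucleotide code, but the function names `matches`, `count_occurrences`, and `mutate`, and the single-position mutation rule, are hypothetical and are not taken from the paper. The abstract does not give the fitness function, so none is shown here.

```python
import random

# Standard IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(motif, window):
    """Return True if every base in `window` is allowed by the motif's code."""
    return len(motif) == len(window) and all(
        base in IUPAC[code] for code, base in zip(motif, window)
    )


def count_occurrences(motif, sequences):
    """Count motif matches (overlaps included) across all sequences."""
    width = len(motif)
    return sum(
        matches(motif, seq[i:i + width])
        for seq in sequences
        for i in range(len(seq) - width + 1)
    )


def mutate(motif, rng):
    """Hypothetical mutation: swap one random position for another IUPAC code."""
    pos = rng.randrange(len(motif))
    choices = [code for code in IUPAC if code != motif[pos]]
    return motif[:pos] + rng.choice(choices) + motif[pos + 1:]


if __name__ == "__main__":
    seqs = ["TATT", "TATA", "TATC"]
    print(count_occurrences("TATA", seqs))  # exact motif
    print(count_occurrences("TATW", seqs))  # W stands for A or T
    print(mutate("TATA", random.Random(0)))
```

A degenerate position such as `W` lets one individual in the population cover several concrete sequence variants, which is one plausible way such codes could help keep a population diverse.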

CLC number: