作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2011, Vol. 37 ›› Issue (11): 184-186,189. doi: 10.3969/j.issn.1000-3428.2011.11.063

• 人工智能及识别技术 • 上一篇    下一篇

基于FSA的DNA重复体频率统计算法

陈 聪a,韩建民a,贾 泂a,辛德东b   

  1. (浙江师范大学 a. 数理与信息工程学院;b. 化学与生命科学学院,浙江 金华 321000)
  • 收稿日期:2010-10-12 出版日期:2011-06-05 发布日期:2011-06-05
  • 作者简介:陈 聪(1990-),男,本科生,主研方向:生物信息学;韩建民,副教授、博士;贾 泂,教授;辛德东,讲师
  • 基金资助:
    2010年浙江省新苗人才计划基金资助项目(2010R404017)

Statistical Algorithm for DNA Repeats Frequency Based on Finite State Automaton

CHEN Cong a, HAN Jian-min  a, JIA Jiong  a, XIN De-dong  b   

  1. (a. College of Mathematics Physics and Information Engineering; b. College of Chemistry and Life Sciences, Zhejiang Normal University, Jinhua 321000, China)
  • Received:2010-10-12 Online:2011-06-05 Published:2011-06-05

摘要: 针对现有DNA重复体频率统计算法效率低、灵活性差等不足,基于字符串多模式匹配的有限状态自动机,构造DNA子序列比对自动机,利用KMP算法对自动机进行状态转移优化,由此提出一种高效的重复体频率统计算法。该算法通过对DNA数据库的线性扫描,得到每个DNA子序列在全局数据库中重叠与非重叠的重复体频率统计信息以及指定DNA序列集合的最长公共子序列信息。实验结果表明,该算法具有效率高、匹配精确、信息获取方式灵活、支持在线操作等优势。

关键词: 有限状态自动机, DNA子序列, 重复体频率, 频率统计算法, 最长公共子序列

Abstract: The existing statistical algorithms for DNA repeats frequency have some defects on efficiency, flexibility and so on. Based on Finite State Automaton(FSA) for string multiple patterns matching, this paper constructs a DNA subsequence comparative automaton, and optimizes the automaton’s state transition based on the idea of Knuth-Morris-Pratt(KMP). By on-line scanning DNA database, it can achieve all DNA subsequence’s statistics in the whole database, including the frequency statistics of overlapped or nonoverlapped repeats and the longest common subsequence in a designated DNA sequence set. Experimental results show that the algorithm has advantages of efficiency, precise matching, flexible information acquisition, supporting on-line operation and so on.

Key words: Finite State Automaton(FSA), DNA subsequence, repeats frequency, frequency statistical algorithm, longest common subsequence

中图分类号: