
Computer Engineering

• Advanced Computing and Data Processing •

Research on Multi-genome Index and Its Improved Sequence Alignment Algorithm

HE Zhongqiao (1,2), XU Yun (1,2,3)

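  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
  2. Key Laboratory of High Performance Computing of Anhui Province, Hefei 230027, China
  3. Collaborative Innovation Center of High Performance Computing, National University of Defense Technology, Changsha 410073, China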
  • Received: 2017-02-20; Online: 2018-03-15; Published: 2018-03-15
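  • About the authors: HE Zhongqiao (born 1992), male, master's student, main research interest: biological big data; XU Yun, professor and doctoral supervisor.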
  • Funding:
    General Program of the National Natural Science Foundation of China (61672480).


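Abstract: Current multi-genome alignment algorithms incur large time and memory overheads. The Multi-genome Index (MuGI) alignment algorithm is faster, but it does not exploit the information repeated across multiple genomes. To address this, an improved MuGI-based alignment algorithm is proposed. It speeds up alignment by combining a dynamic seed-extension algorithm with Single Nucleotide Polymorphism (SNP) pruning and the repeated information shared among the genomes. It also improves space efficiency through a memory-management strategy that reads the index on demand. Experimental results show that the improved algorithm needs only 6 GB of working memory to align reads against 1,092 human genomes. With a mismatch threshold of 5, it is about 3 times as fast as the MuGI algorithm.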

Key words: sequence alignment, Personal Genome Project (PGP), 1000 Genomes Project, Next-Generation Sequencing (NGS), multi-genome algorithm
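The abstract does not describe how the SNP-pruned seed extension works internally. The Python sketch below is a hedged, generic illustration of the idea and is not the authors' implementation. It extends a read without gaps from a seed position on a reference that carries per-genome SNP alleles. At each variant site it branches on the alleles and prunes a branch in two cases: its mismatch count exceeds the threshold, or no genome carries that combination of alleles.

The sketch rests on several assumptions. Genome sets are stored as integer bitmasks. Only substitutions are modelled, with no indels. Each genome carries exactly one base per site. The names `Variants`, `alleles_at` and `extend_seed` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical variant table: ref position -> {alternate base: bitmask of genomes}.
# Assumption: genomes not listed at a position carry the reference base there,
# and the alternate masks at one position do not overlap.
Variants = Dict[int, Dict[str, int]]


def alleles_at(ref: str, variants: Variants, pos: int, all_mask: int) -> List[Tuple[str, int]]:
    """Return (base, genome_mask) pairs that partition the genomes at `pos`."""
    alts = variants.get(pos, {})
    alt_union = 0
    for mask in alts.values():
        alt_union |= mask
    pairs = [(base, mask & all_mask) for base, mask in alts.items()]
    pairs.append((ref[pos], all_mask & ~alt_union))  # genomes keeping the reference base
    return [(base, mask) for base, mask in pairs if mask]  # drop alleles no genome carries


def extend_seed(read: str, ref: str, variants: Variants, start: int,
                max_mm: int, n_genomes: int) -> List[Tuple[int, int]]:
    """Ungapped extension of `read` placed at reference offset `start`.

    Returns (genome_mask, mismatches) for every surviving branch. A branch is
    pruned as soon as its mismatches exceed `max_mm` or its genome set is empty.
    """
    if start < 0 or start + len(read) > len(ref):
        return []  # read does not fit on the reference at this offset
    all_mask = (1 << n_genomes) - 1
    hits: List[Tuple[int, int]] = []

    def dfs(i: int, mask: int, mm: int) -> None:
        if i == len(read):
            hits.append((mask, mm))
            return
        for base, allele_mask in alleles_at(ref, variants, start + i, all_mask):
            new_mask = mask & allele_mask          # genomes still consistent with this path
            new_mm = mm + (base != read[i])        # count a substitution if bases differ
            if new_mask and new_mm <= max_mm:      # SNP pruning + mismatch-threshold pruning
                dfs(i + 1, new_mask, new_mm)

    dfs(0, all_mask, 0)
    return hits
```

For example, take two genomes on `ref = "ACGTACGT"`, where genome 0 carries `T` at position 2 (`variants = {2: {"T": 0b01}}`). Then `extend_seed("ACTTAC", ref, variants, 0, 1, 2)` returns `[(1, 0), (2, 1)]`: genome 0 matches exactly and genome 1 has one mismatch. With `max_mm=0`, the genome-1 branch is pruned at the SNP. The alleles at each site partition the genomes, so every genome ends up in at most one returned branch. This is one reason a bitmask representation suits this kind of sketch.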

