Computer Engineering, 2012, Vol. 38, Issue (13): 48-50. doi: 10.3969/j.issn.1000-3428.2012.13.013

• Software Technology and Database •

Compounds Biological Active Analysis System Based on Hadoop

LI Jie-hui a, ZHANG Liang a, CHEN Jian b, NAN Peng b

  (a. School of Computer Science; b. School of Life Sciences, Fudan University, Shanghai 200433, China)
  • Received: 2011-12-19  Online: 2012-07-05  Published: 2012-07-05
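  • About the authors: LI Jie-hui (1985-), male, master's student, research interest: cloud computing; ZHANG Liang, professor and doctoral supervisor; CHEN Jian, master's student; NAN Peng, associate professor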
  • Funding: National High Technology Research and Development Program of China ("863" Program) (2009AA02Z308)


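Abstract: Predicting biologically active compounds is increasingly important in drug discovery and chemical genomics. In drug virtual screening, finding such compounds requires comparing the structural similarity of small molecules, which is computationally heavy. To address this, a Hadoop-based parallel computing method built on the MapReduce model is proposed and implemented with the Hadoop framework. The method is then improved to address two problems in Hadoop: its partition algorithm distributes data unevenly, and its fault-tolerance mechanism causes redundant recomputation. Experimental results show that the method achieves an average acceleration (parallel speedup) efficiency of 0.91 and offers an effective, reliable, scalable and simple way to handle the heavy computation.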

Key words: biological activity, bioinformatics, parallel computing, scalability, database, Hadoop framework
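The abstract does not say how the similarity comparison is mapped onto MapReduce. The following is a minimal sketch, assuming each compound is encoded as a structural fingerprint (a set of on-bit indices) and that similarity is measured with the Tanimoto coefficient; the function names, the threshold and the top-k reduce step are hypothetical and are not taken from the paper. The map step scores one library compound against every query, the shuffle groups results by query, and the reduce step keeps the best hits. A real Hadoop job would run these steps as Mapper and Reducer tasks over input stored in HDFS.

```python
from collections import defaultdict
import heapq


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two fingerprint bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def map_compound(compound_id, fingerprint, queries, threshold):
    """Mapper (hypothetical): score one library compound against every query."""
    for query_id, query_fp in queries.items():
        score = tanimoto(fingerprint, query_fp)
        if score >= threshold:
            # Key by query so that each reducer collects one query's hits.
            yield query_id, (score, compound_id)


def reduce_query(query_id, hits, k):
    """Reducer (hypothetical): keep the k most similar compounds for one query."""
    return query_id, heapq.nlargest(k, hits)


def run_job(library, queries, threshold=0.5, k=3):
    """Run map, shuffle and reduce in-process to mimic a MapReduce job."""
    groups = defaultdict(list)
    for compound_id, fingerprint in library.items():
        for key, value in map_compound(compound_id, fingerprint, queries, threshold):
            groups[key].append(value)  # shuffle: group mapper output by key
    return dict(reduce_query(q, hits, k) for q, hits in sorted(groups.items()))


if __name__ == "__main__":
    library = {
        "c1": frozenset({1, 2, 3, 4}),
        "c2": frozenset({1, 2, 5}),
        "c3": frozenset({7, 8, 9}),
    }
    queries = {"q1": frozenset({1, 2, 3})}
    print(run_job(library, queries))
```

Keying the mapper output by query keeps each query's candidate list on a single reducer, so the top-k selection needs no further merging; the query set is assumed small enough to ship to every mapper.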
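The uneven partitioning mentioned in the abstract can be illustrated by comparing a hash-mod rule in the style of Hadoop's default HashPartitioner, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`, which ignores how much work each key carries, with a load-aware assignment. The paper's own improved partitioner is not described here, so the greedy longest-processing-time assignment below is a swapped-in technique shown only for illustration. It assumes that per-key workloads (for example, the number of compounds per key) can be estimated in advance. Java's `String.hashCode` serves as the example hash; Hadoop's actual hash depends on the key's Writable type.

```python
import heapq


def java_string_hash(s: str) -> int:
    """Java's String.hashCode(), used here only as an example key hash."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h & 0x80000000 else h


def hash_partition(keys, num_reducers):
    """HashPartitioner-style rule: (hash & Integer.MAX_VALUE) % reducers."""
    return {key: (java_string_hash(key) & 0x7FFFFFFF) % num_reducers for key in keys}


def balanced_partition(loads, num_reducers):
    """Greedy LPT (swapped-in technique): heaviest key to least-loaded reducer."""
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    assignment = {}
    for key, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (total + load, reducer))
    return assignment


def reducer_totals(assignment, loads, num_reducers):
    """Sum the estimated load that ends up on each reducer."""
    totals = [0] * num_reducers
    for key, reducer in assignment.items():
        totals[reducer] += loads[key]
    return totals


if __name__ == "__main__":
    loads = {"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}  # hypothetical per-key work
    print(reducer_totals(hash_partition(loads, 2), loads, 2))
    print(reducer_totals(balanced_partition(loads, 2), loads, 2))
```

For the five sample keys with loads 5, 4, 3, 3 and 1 on two reducers, the hash rule gives reducer totals of 7 and 9, whereas the greedy assignment gives 8 and 8. Because the job finishes only when the slowest reducer does, a lower maximum load shortens the overall run time.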
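The abstract reports an average acceleration efficiency of 0.91 without stating the formula. Assuming the standard definitions of parallel speedup $S_p$ and efficiency $E_p$ on $p$ nodes, where $T_1$ is the serial run time and $T_p$ the run time on $p$ nodes:

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
```

Under this definition, an efficiency of 0.91 on a hypothetical 8-node cluster would correspond to a speedup of about $0.91 \times 8 \approx 7.3$, that is, close to linear scaling.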

