Computer Engineering ›› 2012, Vol. 38 ›› Issue (13): 48-50. doi: 10.3969/j.issn.1000-3428.2012.13.013

• Networks and Communications •

Compounds Biological Active Analysis System Based on Hadoop

LI Jie-hui a, ZHANG Liang a, CHEN Jian b, NAN Peng b   

  1. (a. School of Computer Science; b. School of Life Sciences, Fudan University, Shanghai 200433, China)
  • Received: 2011-12-19 Online: 2012-07-05 Published: 2012-07-05

Hadoop-Based Compound Bioactivity Analysis System

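LI Jie-hui a, ZHANG Liang a, CHEN Jian b, NAN Peng b

  1. (Fudan University, a. School of Computer Science and Technology; b. School of Life Sciences, Shanghai 200433, China)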
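  • About the authors: LI Jie-hui (born 1985), male, master's student, main research interest: cloud computing; ZHANG Liang, professor and doctoral supervisor; CHEN Jian, master's student; NAN Peng, associate professor
  • Supported by:
    National High Technology Research and Development Program of China ("863" Program) (2009AA02Z308)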

Abstract: Predicting biologically active compounds is increasingly important in drug discovery and chemical genomics, but it requires heavy computation in practice. This paper proposes a parallel method, based on the MapReduce model and implemented with Hadoop, for searching for and predicting drug-like compounds. The method is further improved to address two problems: the uneven distribution produced by Hadoop's partitioning algorithm, and the repeated computation incurred by its fault-tolerance mechanism. Experimental results show that the method offers an effective, scalable, and simple way to handle the heavy computation, with an average speedup efficiency of 0.91.
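The abstract reports an efficiency figure without defining it. As a rough illustration only, the sketch below assumes the standard definition of parallel efficiency, speedup divided by the number of nodes. The function names and the timings are hypothetical and are not taken from the paper.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Speedup S = T1 / Tp, where T1 is the single-node run time."""
    return serial_time / parallel_time


def parallel_efficiency(serial_time: float, parallel_time: float, nodes: int) -> float:
    """Efficiency E = S / p; a value of 1.0 means perfectly linear scaling."""
    return speedup(serial_time, parallel_time) / nodes


# Hypothetical timings (seconds), chosen only to illustrate the formula.
timings = {2: 540.0, 4: 275.0, 8: 140.0}   # nodes -> measured parallel run time
serial = 1000.0                            # assumed single-node run time

efficiencies = [parallel_efficiency(serial, t, p) for p, t in timings.items()]
average_efficiency = sum(efficiencies) / len(efficiencies)
```

Under this assumed definition, an average efficiency of 0.91 would mean that, averaged over the tested cluster sizes, each added node contributes about 91% of the ideal linear gain.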

Key words: biological activity, bioinformatics, parallel computing, scalability, database, Hadoop framework

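Abstract (Chinese version): In virtual drug screening, discovering biologically active compounds involves computationally intensive similarity comparisons between small-molecule structures. To address this, a parallel computing method based on Hadoop and the MapReduce model is proposed and implemented with Hadoop tools. The method is further improved to address the uneven distribution produced by Hadoop's partitioning algorithm and the repeated computation in its fault-tolerance mechanism. Experimental results show that the method achieves an average speedup efficiency of 0.91 and offers good reliability and scalability.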
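To make the idea of a MapReduce-style structural similarity comparison concrete, here is a minimal single-process sketch. It is not the authors' implementation. It assumes that compounds are represented as fingerprints (sets of on-bit positions) and that similarity is measured with the Tanimoto coefficient; the abstract specifies neither. All function names and the toy data are hypothetical.

```python
from collections import defaultdict


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def map_similarity(record, queries, threshold):
    """Map step: compare one library compound against every query compound."""
    compound_id, fingerprint = record
    for query_id, query_fp in queries.items():
        score = tanimoto(fingerprint, query_fp)
        if score >= threshold:
            yield query_id, (compound_id, score)


def reduce_hits(query_id, hits):
    """Reduce step: rank the hits for one query by descending similarity."""
    return query_id, sorted(hits, key=lambda hit: hit[1], reverse=True)


def run_job(library, queries, threshold=0.5):
    """Run map, shuffle (group by key), and reduce in a single process."""
    grouped = defaultdict(list)
    for record in library:
        for key, value in map_similarity(record, queries, threshold):
            grouped[key].append(value)
    return dict(reduce_hits(key, values) for key, values in grouped.items())


# Hypothetical toy data: fingerprints as sets of on-bit positions.
library = [
    ("c1", frozenset({1, 2, 3, 4})),
    ("c2", frozenset({1, 2, 9})),
    ("c3", frozenset({7, 8})),
]
queries = {"q1": frozenset({1, 2, 3})}
results = run_job(library, queries, threshold=0.5)
```

Each library record is scored independently in the map step, which is why this kind of workload can be split across nodes. On a real cluster the grouping by key is done by the framework's shuffle phase, and uneven key distribution there is one place where the load imbalance mentioned in the abstract could arise.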

Key words (Chinese version): biological activity, bioinformatics, parallel computing, scalability, database, Hadoop framework
