Abstract: Predicting biologically active compounds is increasingly important in drug discovery and chemical genomics, but virtual drug screening shows that the search is computationally heavy, in particular the comparison of small-molecule structural similarity. This paper proposes a Hadoop-based parallel method, built on the MapReduce model, for searching and predicting drug-like compounds, and implements it with Hadoop tools. The method is then improved to address two problems in Hadoop: the uneven distribution produced by its partition algorithm, and the repeated computation caused by its fault-tolerance mechanism. Experimental results show that the method offers an effective, simple, reliable and scalable way to handle the heavy computation, with an average speedup efficiency of 0.91.
Key words: biological activity, bioinformatics, parallel computing, scalability, database, Hadoop framework
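The abstract does not define the speedup efficiency it reports. Read against the usual definition for parallel programs, it would look roughly like the sketch below; the symbols $T_1$, $T_p$, $S_p$, $E_p$ and the node count $p = 8$ are assumptions introduced here for illustration and do not come from the paper.

```latex
% Assumed notation (not from the paper):
%   T_1 : running time on a single node
%   T_p : running time on p nodes
\[
  S_p = \frac{T_1}{T_p}, \qquad
  E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}.
\]
% Illustration with a hypothetical cluster of p = 8 nodes and E_p = 0.91:
\[
  S_8 = E_8 \cdot 8 = 0.91 \times 8 = 7.28 .
\]
```

Under this reading, an efficiency of 0.91 means each added node contributes about 91% of the ideal linear speedup. Because the reported value is an average, the speedup at any single cluster size may differ.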
LI Jie-Hui, ZHANG Liang, CHEN Jian, NAN Peng. Compounds Biological Active Analysis System Based on Hadoop[J]. Computer Engineering, 2012, 38(13): 48-50.