Abstract: This paper studies the application of the Basic Local Alignment Search Tool (BLAST) in the Phylogenetic Analysis of Land Plant Platform (PALPP). For data cleaning, annotation-based extraction is combined with BLAST-based similarity extraction to extract and filter the relevant sequence information, control sequence quality, and remove sequences whose original gene annotations are wrong. For quality control of self-sequenced data, blastn-based alignment scoring is combined with blastp-based template matching to report overall sequence quality and to keep contaminated sequences and pseudogenes out of the database.
Key words:
sequence alignment,
data cleaning,
Basic Local Alignment Search Tool (BLAST),
Phylogenetic Analysis of Land Plant Platform (PALPP)
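The similarity-based extraction step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the BLAST results are available in the standard 12-column tabular format (`-outfmt 6`), and the function names, the identity and E-value thresholds, and the sample rows are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float    # percent identity (column 3)
    evalue: float    # expect value (column 11)
    bitscore: float  # bit score (column 12)


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse rows of BLAST tabular output (12 tab-separated columns)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        yield Hit(fields[0], fields[1], float(fields[2]),
                  float(fields[10]), float(fields[11]))


def best_hits(lines: Iterable[str],
              min_identity: float = 90.0,     # hypothetical threshold
              max_evalue: float = 1e-10) -> Dict[str, Hit]:  # hypothetical
    """Keep, per query, the highest-scoring hit that passes both thresholds."""
    best: Dict[str, Hit] = {}
    for hit in parse_tabular(lines):
        if hit.pident < min_identity or hit.evalue > max_evalue:
            continue  # too dissimilar: treat as unrelated or mis-annotated
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


# Made-up sample rows, for illustration only.
SAMPLE = [
    "q1\tref_rbcL\t98.5\t500\t7\t0\t1\t500\t1\t500\t1e-150\t900",
    "q1\tref_matK\t91.0\t300\t27\t0\t1\t300\t1\t300\t1e-40\t300",
    "q2\tref_rbcL\t75.0\t120\t30\t0\t1\t120\t1\t120\t1e-5\t60",
]

if __name__ == "__main__":
    for query, hit in best_hits(SAMPLE).items():
        print(query, hit.subject, hit.bitscore)
```

In this sketch a query with no hit above the thresholds is simply absent from the result, which is one possible way to flag a sequence for exclusion or manual review.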
刘奇, 孟珍, 刘勇, 董慧, 林小光, 杲艳平, 周园春, 黎建辉. 基于BLAST的数据清洗与质量控制方案[J]. 计算机工程, 2011, 37(4): 73-75.
LIU Qi, MENG Zhen, LIU Yong, DONG Hui, LIN Xiao-Guang, GAO Yan-Ping, ZHOU Yuan-Chun, LI Jian-Hui. Data Cleaning and Quality Control Scheme Based on BLAST[J]. Computer Engineering, 2011, 37(4): 73-75.