
Computer Engineering (计算机工程), 2007, Vol. 33, Issue 07: 28-29. doi: 10.3969/j.issn.1000-3428.2007.07.010

• Doctoral Thesis •

基于复杂性K近邻规则的蛋白质亚细胞位点预测

李 斌1,李义兵2,何红波2   

  1. 中南大学信息科学与工程学院,长沙 410083;2. 中南大学物理科学与技术学院,长沙 410083
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2007-04-05 发布日期:2007-04-05

Complexity KNN Rules Based Prediction of Protein Subcellular Locations

LI Bin1, LI Yibing2, HE Hongbo2   

  1. School of Information Science and Engineering, Central South University, Changsha 410083;
  2. School of Physics Science and Technology, Central South University, Changsha 410083
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-04-05 Published:2007-04-05

摘要: 提出了一个基于符号序列LZ复杂性相似度和K近邻规则的蛋白质亚细胞位点类型预测的方法。相比许多其他特征参数,蛋白质序列的LZ复杂性相似度计算无需深入的生物学领域知识和除序列数据以外的其他辅助数据。同时,K近邻规则的延迟学习特性适合于亚细胞位点类型已知的蛋白质数据的动态增加。在标准的RH数据集上对该预测方法进行10重交叉验证,其总体的预测准确率优于4种对照预测方法。

关键词: 生物信息学, LZ复杂性相似度, K近邻, 蛋白质, 亚细胞位点

Abstract: A method for predicting the subcellular location of proteins is proposed, based on the LZ complexity similarity of symbolic sequences and the K nearest neighbor (KNN) rule. Unlike many other features, the LZ complexity similarity between protein sequences can be computed without in-depth biological domain knowledge and without auxiliary data beyond the sequences themselves. In addition, the lazy learning characteristic of the KNN rule suits a setting in which the set of proteins with known subcellular locations grows dynamically. The proposed method is evaluated on the standard RH dataset using 10-fold cross-validation, and its overall prediction accuracy is better than that of four comparison methods.

Key words: Bioinformatics, LZ complexity similarity, K nearest neighbor (KNN), Protein, Subcellular location
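
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact similarity formula, so the sketch assumes the LZ76 phrase-count complexity and one commonly used normalized LZ-based distance. The function names (`lz_complexity`, `lz_distance`, `knn_predict`), the toy sequences, and the location labels are all hypothetical.

```python
from collections import Counter


def lz_complexity(s):
    """Number of phrases in the LZ76 (exhaustive history) parsing of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the current phrase while it can still be copied from
        # the text that precedes its last symbol.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def lz_distance(x, y):
    """Assumed normalized LZ-based distance between two sequences.

    Small values mean that parsing one sequence after the other adds
    few new phrases, i.e. the sequences are similar.
    """
    cx, cy = lz_complexity(x), lz_complexity(y)
    return max(lz_complexity(x + y) - cx, lz_complexity(y + x) - cy) / max(cx, cy)


def knn_predict(query, labeled, k=3):
    """Majority vote among the k labeled sequences closest to query.

    labeled is a list of (sequence, location) pairs. KNN is a lazy
    learner, so newly annotated proteins are simply appended to this
    list and no retraining step is needed.
    """
    nearest = sorted(labeled, key=lambda item: lz_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data only: these strings are not real proteins.
toy_set = [("AAAAAA", "cytoplasm"), ("KLMN", "nucleus")]
prediction = knn_predict("AAAA", toy_set, k=1)
```

The distance needs only the raw sequences, which matches the abstract's point that no auxiliary biological data is required. A real evaluation would replace the toy set with an annotated dataset and estimate accuracy by cross-validation.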