Computer Engineering (计算机工程) ›› 2007, Vol. 33 ›› Issue (19): 204-206. doi: 10.3969/j.issn.1000-3428.2007.19.072

• Artificial Intelligence and Recognition Technology •

Feature Selection and Classification of Microarray Gene Expression Data Using Genetic Algorithm and Support Vector Machine

YU Wei-feng (余伟峰), WANG Guang-lun (王广伦), QIAN Xi-yuan (钱夕元)

  1. (School of Science, East China University of Science and Technology, Shanghai 200237)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-10-05 Published: 2007-10-05

Abstract: Microarray data combine high dimensionality with small sample sizes, which makes analysis difficult and makes the selection of informative genes especially important. This paper presents a method for selecting informative genes based on a genetic algorithm (GA), in which k nearest neighbors (KNN) is employed as the pattern recognition method. A support vector machine (SVM) is used to construct a tumor classification system, and different kernel functions are tested for their predictive classification performance. Applied to the classic leukemia data set, the method achieved 100% classification accuracy on the 34-sample test set.

Key words: microarray data, gene expression, genetic algorithm (GA), k nearest neighbors (KNN), support vector machine (SVM)
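
The GA/KNN selection step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the chromosome encoding, the genetic operators, the value of k, or the kernels tested, so everything below is an assumption made for illustration: fixed-size gene subsets as chromosomes, leave-one-out k-NN accuracy as fitness, truncation selection, union-based crossover, and a synthetic data set in place of the leukemia data. The names `knn_loo_accuracy`, `select_genes`, and `make_toy_data` are hypothetical. The SVM stage is left out; in practice the selected genes would then be passed to an SVM implementation and evaluated with several kernels.

```python
import random

def knn_loo_accuracy(X, y, genes, k=3):
    """Leave-one-out accuracy of a k-NN majority vote using only the columns in `genes`."""
    correct = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample, restricted to the chosen genes.
        dists = sorted(
            (sum((xi[g] - xj[g]) ** 2 for g in genes), y[j])
            for j, xj in enumerate(X) if j != i
        )
        votes = [label for _, label in dists[:k]]
        pred = max(sorted(set(votes)), key=votes.count)
        correct += pred == y[i]
    return correct / len(X)

def select_genes(X, y, subset_size=3, pop_size=20, generations=15,
                 mutation_rate=0.2, seed=0):
    """Assumed GA: each chromosome is a list of `subset_size` distinct gene indices."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    fitness = lambda chrom: knn_loo_accuracy(X, y, chrom)
    population = [rng.sample(range(n_genes), subset_size) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection; the best survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(set(a) | set(b))  # crossover: draw the child from both parents' genes
            child = rng.sample(pool, subset_size)
            if rng.random() < mutation_rate:  # mutation: swap one gene for an unused one
                unused = [g for g in range(n_genes) if g not in child]
                child[rng.randrange(subset_size)] = rng.choice(unused)
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return sorted(best), fitness(best)

def make_toy_data(n_samples=30, n_genes=40, informative=(0, 1, 2), seed=1):
    """Synthetic stand-in for microarray data: few samples, many mostly uninformative genes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        for g in informative:
            row[g] += 3.0 * label  # shift the informative genes for class 1
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy_data()
genes, acc = select_genes(X, y)
print("selected genes:", genes, "leave-one-out k-NN accuracy:", acc)
```

Because the fitness is measured on the same samples used for selection, the reported accuracy is optimistic; a held-out test set, like the 34-sample set mentioned in the abstract, is what gives an honest estimate.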
