In microarray-based cancer classification, the number of variables (gene expression levels) is large while the number of experimental conditions is small, so the choice of feature selection and classification method is critical. For disease diagnosis, the performance of the classifier directly affects the accuracy of the final result. This paper proposes a new gene selection and classification method that uses a nonlinear-kernel support vector machine (SVM) based on recursive feature elimination (RFE). Experiments show that the proposed method achieves better overall performance than linear classification methods, such as the linear-kernel SVM and Fisher linear discriminant analysis; it also outperforms some nonlinear classification methods, such as the least squares support vector machine (LS-SVM) with a nonlinear kernel. In addition to a test set, leave-one-out cross-validation is used to assess the generalization performance of the classifier. The experiments use the AML/ALL dataset and the hereditary breast cancer dataset, both of which are available on the Internet.
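The combination of a nonlinear-kernel SVM with RFE and leave-one-out evaluation can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: it assumes an RBF kernel, ranks each gene by the magnitude of the change in the SVM dual cost when that gene is removed with the multipliers held fixed (a commonly used criterion for kernel SVM-RFE), eliminates one gene per iteration, and runs on synthetic data. The function names `kernel_svm_rfe` and `loo_accuracy`, the hyperparameters `gamma` and `C`, and the toy data are all hypothetical. It relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC


def kernel_svm_rfe(X, y, n_keep, gamma=0.1, C=1.0):
    """Return the column indices of the n_keep features that survive RFE."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        Xa = X[:, active]
        clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(Xa, y)
        sv = Xa[clf.support_]          # support vectors
        a = clf.dual_coef_.ravel()     # y_i * alpha_i for each support vector
        full = a @ rbf_kernel(sv, gamma=gamma) @ a
        scores = []
        for j in range(len(active)):
            # Recompute the kernel without feature j, multipliers held fixed.
            sv_j = np.delete(sv, j, axis=1)
            reduced = a @ rbf_kernel(sv_j, gamma=gamma) @ a
            scores.append(abs(0.5 * (full - reduced)))
        # Eliminate the feature whose removal changes the cost the least.
        active.pop(int(np.argmin(scores)))
    return active


def loo_accuracy(X, y, n_keep, gamma=0.1, C=1.0):
    """Leave-one-out accuracy, with gene selection repeated inside each fold."""
    hits = 0
    for train, test in LeaveOneOut().split(X):
        keep = kernel_svm_rfe(X[train], y[train], n_keep, gamma, C)
        clf = SVC(kernel="rbf", gamma=gamma, C=C)
        clf.fit(X[train][:, keep], y[train])
        hits += int(clf.predict(X[test][:, keep])[0] == y[test][0])
    return hits / len(y)


# Synthetic stand-in for a microarray: 20 samples, 10 "genes",
# of which only the first two carry class information.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))
y = np.array([0, 1] * 10)
X[y == 1, :2] += 2.0

print("selected genes:", kernel_svm_rfe(X, y, n_keep=3))
print("leave-one-out accuracy:", loo_accuracy(X, y, n_keep=3))
```

Repeating the selection inside each leave-one-out fold keeps the held-out sample from influencing which genes are chosen, so the accuracy estimate is not biased by the selection step. On real microarray data with thousands of genes, eliminating one gene at a time is slow, and removing a fraction of the remaining genes per iteration is a common shortcut.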