Objective To propose a method, based on microarray data, for extracting gene markers associated with tumor diagnosis outcomes. Methods The approach is a hybrid of filter and wrapper methods. Using singular value decomposition (SVD), gene markers are identified with the strength of the correlation between each gene and the tumor diagnosis outcome as the main criterion, and a random forest based on information gain provides an auxiliary correction with respect to the classification rate. The method was tested on three public datasets with commonly used classifiers. Results Monte Carlo statistical experiments show that on the Colon dataset the proposed method significantly outperforms the t-test method with the NN and RF classifiers; on the Prostate dataset it significantly outperforms the competing method with the NB classifier; on the remaining datasets and classifiers it performs better than the competing method, but not significantly. On the gene stability index, the proposed method is generally better than the competing method. Conclusion A quantitative, visualization-based method for analyzing the correlation between genes and diagnosis outcomes is proposed. Compared with classical methods, the genes it extracts show strong classification performance and adapt well to different classification algorithms, and overall they also have better gene stability.
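The abstract does not spell out how SVD is combined with the gene-outcome correlation, so the following is only a minimal sketch of one plausible filter step, not the authors' algorithm. It assumes genes are scored by a rank-k SVD approximation of the covariance between each gene's expression and a binary diagnosis label. The function names (`svd_label_scores`, `top_genes`, `make_toy_data`), the choice of `k`, and the synthetic data are all hypothetical. The random forest correction step is not shown.

```python
import numpy as np


def svd_label_scores(X, y, k):
    """Score genes by a rank-k SVD approximation of gene-label covariance.

    X: (n_samples, n_genes) expression matrix.
    y: (n_samples,) binary diagnosis labels (0/1).
    Assumption: this scoring rule is illustrative, not the paper's own.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)   # center each gene across samples
    yc = y - y.mean()         # center the label
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(k, s.size)
    # Project the label onto the top-k sample-space singular vectors,
    # then map back to gene space: V_k S_k U_k^T y_c.
    proj = U[:, :k].T @ yc
    cov = Vt[:k].T @ (s[:k] * proj)
    return np.abs(cov) / (len(y) - 1)


def top_genes(X, y, k=3, n_select=5):
    """Return indices of the n_select highest-scoring genes."""
    scores = svd_label_scores(X, y, k)
    return np.argsort(scores)[::-1][:n_select]


def make_toy_data(seed=0, n_samples=60, n_genes=200, n_informative=5):
    """Synthetic data: the first n_informative genes are shifted in class 1."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_samples // 2)
    X = rng.normal(size=(n_samples, n_genes))
    X[y == 1, :n_informative] += 4.0
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("selected genes:", sorted(top_genes(X, y).tolist()))
```

When `k` equals the full rank, the score reduces to the ordinary absolute covariance between each gene and the label; a smaller `k` keeps only the dominant expression patterns. In a hybrid scheme, a wrapper step such as a random forest could then re-rank the shortlisted genes by classification performance.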