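Partial Least-Squares Discriminant Analysis Combined with F-score for Feature Selection in Proteomics Mass Spectrometry Data (paper in English)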

Source: Computers and Applied Chemistry (计算机与应用化学)
A feature selection method based on partial least-squares discriminant analysis (PLS-DA) and the F-score is proposed and applied to the analysis of proteomics mass spectrometry data. The method consists of three steps: (1) preprocessing the raw data with the LIMPIC algorithm; (2) computing the F-score of each variable and ranking all variables in descending order of F-score; (3) choosing the best variable subset by forward selection, guided by PLS-DA cross-validation. Applied to a colon cancer data set, the method selected 8 characteristic mass-to-charge ratios (m/z) from the original 16,331 m/z variables as potential biomarkers. When the selected features were used to classify the samples of an independent test set, the sensitivity and specificity reached 95.24% and 100%, respectively. The results show that the proposed method can be used for feature selection and sample classification of proteomics mass spectrometry data.
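Steps (2) and (3) can be illustrated with a minimal sketch. The abstract does not give the formulas or settings, so the following are assumptions: the F-score uses the common two-class definition (squared deviations of the class means from the overall mean, divided by the sum of the within-class sample variances); PLS-DA is approximated by a small NIPALS PLS1 regression on 0/1 class labels with a 0.5 decision threshold; and a ranked variable is kept only if it improves 5-fold cross-validated accuracy. All function names, the number of components, the candidate limit, and the synthetic data are hypothetical, and the LIMPIC preprocessing of step (1) is not shown.

```python
import numpy as np

def f_score(X, y):
    """Two-class F-score per column (assumed definition, see lead-in)."""
    pos, neg = X[y == 1], X[y == 0]
    mean_all = X.mean(axis=0)
    num = (pos.mean(axis=0) - mean_all) ** 2 + (neg.mean(axis=0) - mean_all) ** 2
    den = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    return num / (den + 1e-12)  # small constant avoids division by zero

def pls1_fit(X, y, n_components):
    """NIPALS PLS1 on centered data; returns centering terms and coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:          # nothing left to explain
            break
        w = w / norm
        t = Xr @ w                # scores
        tt = t @ t
        p = Xr.T @ t / tt         # X loadings
        c = yr @ t / tt           # y loading
        Xr = Xr - np.outer(t, p)  # deflate X and y
        yr = yr - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef

def cv_accuracy(X, y, n_components=2, n_folds=5, seed=0):
    """K-fold cross-validated accuracy of the PLS-based classifier."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        k = min(n_components, X.shape[1])
        x_mean, y_mean, coef = pls1_fit(X[train], y[train].astype(float), k)
        pred = (X[test_idx] - x_mean) @ coef + y_mean
        correct += np.sum((pred > 0.5) == (y[test_idx] == 1))
    return correct / len(y)

def forward_select(X, y, max_candidates=20):
    """Try variables in descending F-score order; keep those that raise CV accuracy."""
    order = np.argsort(f_score(X, y))[::-1][:max_candidates]
    selected, best = [], 0.0
    for j in order:
        trial = selected + [int(j)]
        acc = cv_accuracy(X[:, trial], y)
        if acc > best:
            selected, best = trial, acc
    return selected, best

# Synthetic stand-in for preprocessed spectra: 60 samples, 200 variables,
# of which only the first three differ between the two classes.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 200))
X[y == 1, :3] += 3.0
selected, acc = forward_select(X, y)
print("selected variables:", selected, "CV accuracy:", round(acc, 3))
```

In this sketch the F-scores are computed on all samples before cross-validation, which can bias the cross-validated accuracy upward. An independent test set, like the one the abstract reports, is still needed to estimate sensitivity and specificity.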