A PAUC-based estimation technique for disease classification and biomarker selection.

IF 0.4 4区数学 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY

Statistical Applications in Genetics and Molecular Biology Pub Date : 2012-10-01 DOI:10.1515/1544-6115.1792

Matthias Schmid, Torsten Hothorn, Friedemann Krause, Christina Rabe

{"title":"A PAUC-based estimation technique for disease classification and biomarker selection.","authors":"Matthias Schmid, Torsten Hothorn, Friedemann Krause, Christina Rabe","doi":"10.1515/1544-6115.1792","DOIUrl":null,"url":null,"abstract":"<p><p>The partial area under the receiver operating characteristic curve (PAUC) is a well-established performance measure to evaluate biomarker combinations for disease classification. Because the PAUC is defined as the area under the ROC curve within a restricted interval of false positive rates, it enables practitioners to quantify sensitivity rates within pre-specified specificity ranges. This issue is of considerable importance for the development of medical screening tests. Although many authors have highlighted the importance of PAUC, there exist only few methods that use the PAUC as an objective function for finding optimal combinations of biomarkers. In this paper, we introduce a boosting method for deriving marker combinations that is explicitly based on the PAUC criterion. The proposed method can be applied in high-dimensional settings where the number of biomarkers exceeds the number of observations. Additionally, the proposed method incorporates a recently proposed variable selection technique (stability selection) that results in sparse prediction rules incorporating only those biomarkers that make relevant contributions to predicting the outcome of interest. Using both simulated data and real data, we demonstrate that our method performs well with respect to both variable selection and prediction accuracy. Specifically, if the focus is on a limited range of specificity values, the new method results in better predictions than other established techniques for disease classification.</p>","PeriodicalId":48980,"journal":{"name":"Statistical Applications in Genetics and Molecular Biology","volume":"11 5","pages":""},"PeriodicalIF":0.4000,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1515/1544-6115.1792","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Applications in Genetics and Molecular Biology","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1515/1544-6115.1792","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}

引用次数: 8

Abstract

The partial area under the receiver operating characteristic curve (PAUC) is a well-established performance measure to evaluate biomarker combinations for disease classification. Because the PAUC is defined as the area under the ROC curve within a restricted interval of false positive rates, it enables practitioners to quantify sensitivity rates within pre-specified specificity ranges. This issue is of considerable importance for the development of medical screening tests. Although many authors have highlighted the importance of PAUC, there exist only few methods that use the PAUC as an objective function for finding optimal combinations of biomarkers. In this paper, we introduce a boosting method for deriving marker combinations that is explicitly based on the PAUC criterion. The proposed method can be applied in high-dimensional settings where the number of biomarkers exceeds the number of observations. Additionally, the proposed method incorporates a recently proposed variable selection technique (stability selection) that results in sparse prediction rules incorporating only those biomarkers that make relevant contributions to predicting the outcome of interest. Using both simulated data and real data, we demonstrate that our method performs well with respect to both variable selection and prediction accuracy. Specifically, if the focus is on a limited range of specificity values, the new method results in better predictions than other established techniques for disease classification.

查看原文本刊更多论文

基于pauc的疾病分类和生物标志物选择估计技术。

受试者工作特征曲线下的部分面积(paoc)是一种公认的评估生物标志物组合用于疾病分类的性能指标。由于pac被定义为在假阳性率的限定区间内ROC曲线下的面积，因此从业人员可以在预先规定的特异性范围内量化敏感性。这一问题对发展医疗筛查试验具有相当重要的意义。尽管许多作者都强调了pac的重要性，但使用pac作为寻找生物标志物最佳组合的目标函数的方法很少。在本文中，我们介绍了一种基于paoc准则的标记组合的增强方法。所提出的方法可以应用于高维设置，其中生物标志物的数量超过了观测的数量。此外，所提出的方法结合了最近提出的变量选择技术(稳定性选择)，该技术导致稀疏预测规则仅包含那些对预测感兴趣的结果做出相关贡献的生物标志物。通过模拟数据和实际数据，我们证明了我们的方法在变量选择和预测精度方面都有很好的效果。具体来说，如果重点是在一个有限范围的特异性值上，新方法的预测结果比其他已建立的疾病分类技术更好。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Statistical Applications in Genetics and Molecular Biology BIOCHEMISTRY & MOLECULAR BIOLOGY-MATHEMATICAL & COMPUTATIONAL BIOLOGY

自引率

11.10%

发文量

期刊介绍： Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. The focus of the papers should be on the relevant statistical issues but should contain a succinct description of the relevant biological problem being considered. The range of topics is wide and will include topics such as linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and data base search strategies. Both original research and review articles will be warmly received.