Methodological approach from the Best Overall Team in the sbv IMPROVER Diagnostic Signature Challenge
A. Tarca, N. Than, R. Romero
Systems biomedicine (Austin, Tex.), vol. 1, pp. 217-227. Published 2013-09-12.
DOI: 10.4161/sysb.25980 (https://doi.org/10.4161/sysb.25980)
Citation count: 24
Abstract
The sbv IMPROVER Diagnostic Signature Challenge used crowdsourcing to identify the best methods for classifying clinical samples from transcriptomics data. Participating teams used public microarray data sets to develop prediction models in four disease areas, and then made predictions on blinded test data generated by the organizers. Here we describe the approach of the Perinatology Research Branch team (Team PRB; AL Tarca, R Romero), which was awarded the prize for best-performing entrant among 54 entrants. The key elements of our approach were: (1) selection of training data sets by trial and error; (2) removal of batch effects by pre-processing the test and training data together; (3) use of both statistical significance and magnitude of change to select biomarkers; and (4) optimization of the number of biomarkers via the cross-validated performance of a simple linear discriminant analysis (LDA) model. Our models not only ranked consistently high but also yielded parsimonious signatures of as few as two genes, whereas most other top-ranked teams used hundreds of genes for prediction.
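Elements (3) and (4) of the approach can be illustrated with a minimal Python sketch. It is not the team's actual code. It assumes a samples-by-genes matrix `X` of log2 expression values and binary labels `y`, and it stands in a plain Welch t-statistic for whatever significance test the team used. The function names, the fold-change cutoff `min_abs_log2fc`, the `ridge` term, and the choice to re-rank genes inside every fold are all assumptions made for illustration.

```python
import numpy as np

def rank_genes(X, y, min_abs_log2fc=0.5):
    """Order genes by |Welch t|, keeping only those that pass a fold-change cutoff.

    Genes below the (hypothetical) cutoff get score 0 and sort last.
    """
    a, b = X[y == 1], X[y == 0]
    diff = a.mean(axis=0) - b.mean(axis=0)          # log2 fold change
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = diff / (se + 1e-12)
    score = np.where(np.abs(diff) >= min_abs_log2fc, np.abs(t), 0.0)
    return np.argsort(-score)

def lda_fit(X, y, ridge=1e-3):
    """Two-class LDA with pooled covariance; `ridge` is an assumed stabilizer."""
    a, b = X[y == 1], X[y == 0]
    mu1, mu0 = a.mean(axis=0), b.mean(axis=0)
    pooled = ((a - mu1).T @ (a - mu1) + (b - mu0).T @ (b - mu0)) / (len(X) - 2)
    pooled = pooled + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(pooled, mu1 - mu0)
    # Threshold on the discriminant score, shifted by the log prior odds.
    c = w @ (mu1 + mu0) / 2 - np.log(len(a) / len(b))
    return w, c

def lda_predict(model, X):
    w, c = model
    return (X @ w > c).astype(int)

def cv_accuracy(X, y, n_genes, n_folds=5, seed=0):
    """Cross-validated accuracy of an LDA model built on the top `n_genes` genes.

    Genes are re-ranked on the training part of every fold so that the
    held-out samples never influence feature selection. Folds are random,
    not stratified.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        top = rank_genes(X[train_idx], y[train_idx])[:n_genes]
        model = lda_fit(X[train_idx][:, top], y[train_idx])
        correct += np.sum(lda_predict(model, X[test_idx][:, top]) == y[test_idx])
    return correct / len(y)

def choose_signature_size(X, y, max_genes=10):
    """Pick the gene count with the best CV accuracy; ties favor fewer genes."""
    acc = {k: cv_accuracy(X, y, k) for k in range(1, max_genes + 1)}
    best_k = max(acc, key=lambda k: (acc[k], -k))
    return best_k, acc
```

Calling `choose_signature_size(X, y)` returns the selected number of genes together with the accuracy for each candidate size. Breaking ties toward the smaller size reflects the abstract's emphasis on parsimonious signatures.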