Biomarker Discovery: Introduction to Statistical Learning and Integrative Bioinformatics Approaches

General, Applied and Systems Toxicology. Pub Date: 2011-09-15. DOI: 10.1002/9780470744307.GAT223

D. Repsilber, M. Jacobsen



Abstract

In toxicology, biomarkers are needed for screening, time-series and dilution-series exposure studies in safety evaluation and risk assessment. They need to be easily and reproducibly measurable, and are therefore sought among molecular features using high-throughput OMICs technologies in assays of blood and other easily accessible tissues. This chapter presents methods for screening OMICs datasets for candidate biomarkers for classification. We begin by focusing on single-biomarker detection, and survey improvements to the t-test as well as multiplicity corrections for this objective. Biomarker panels (biosignatures) are patterns of several combined single features. We describe their detection using three different methods of statistical learning, with a special focus on avoiding overfitting through appropriate use of cross-validation. More sophisticated approaches using gene-set enrichment algorithms, and steps towards integrated bioinformatics analyses, are then explained. Making use of a priori knowledge about regulatory structures (gene groups, correlation structures) may further improve the classification efficiency of the detected biosignatures. As a common thread, we illustrate the analysis options using the well-known Golub gene expression dataset and accompanying R scripts, enabling readers to reproduce every step on their own desktop.

Keywords: biomarker; feature selection; multivariate signature; cross-validation; diagnosis; prediction; statistical learning; integrative bioinformatics
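To make the idea of a multiplicity correction concrete, here is a minimal sketch in Python of the Benjamini-Hochberg adjustment applied to per-feature p-values. The abstract does not say which corrections the chapter surveys, so the choice of Benjamini-Hochberg is an assumption, and the function name `benjamini_hochberg` and the example p-values are hypothetical. The chapter's own examples use R scripts and the Golub dataset; this is only an illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, e.g. one t-test per candidate feature (assumption).
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(p_values)

# Keep features whose adjusted p-value is at most 0.05.
selected = [i for i, q in enumerate(adjusted) if q <= 0.05]
print(selected)
```

With these made-up values, two features with raw p-values below 0.05 (0.039 and 0.041) no longer pass after adjustment, which is the effect a multiplicity correction is meant to have when many features are tested at once.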

