Biomarker Discovery: Introduction to Statistical Learning and Integrative Bioinformatics Approaches
D. Repsilber, M. Jacobsen
General, Applied and Systems Toxicology, published 2011-09-15
DOI: 10.1002/9780470744307.GAT223 (https://doi.org/10.1002/9780470744307.GAT223)
Abstract
In toxicology, biomarkers are needed in screenings and in time-series and dilution-series exposure studies for safety evaluation and risk assessment. They must be easily and reproducibly measurable, and are therefore sought among molecular features measured with high-throughput OMICs technologies in assays of blood and other easily accessible tissues. This chapter presents methods for screening OMICs datasets for candidate biomarkers for classification. We begin with single-biomarker detection and survey improvements to the t-test as well as multiplicity corrections for this objective. Biomarker panels (biosignatures) are patterns of several combined single features. We describe their detection using three different methods of statistical learning, with a special focus on avoiding overfitting through appropriate use of cross-validation. We then explain more sophisticated approaches that use gene-set enrichment algorithms, along with steps towards integrated bioinformatics analyses. Making use of a priori knowledge about regulatory structures (gene groups, correlation structures) may further improve the classification efficiency of the detected biosignatures. As a common thread, we illustrate the analyses using the well-known Golub gene expression dataset and accompanying R scripts, so that readers can reproduce every step on their own desktop.
Keywords:
biomarker;
feature selection;
multivariate signature;
cross-validation;
diagnosis;
prediction;
statistical learning;
integrative bioinformatics
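
Illustrative sketch (not taken from the chapter): the abstract mentions multiplicity corrections for single-biomarker screening without naming a specific procedure. As an assumption, the example below shows two widely used corrections, Bonferroni and Benjamini-Hochberg, applied to a small list of made-up per-feature p-values. It is written in Python for illustration only, whereas the chapter itself works with R scripts. The function names and the example p-values are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate).

    The p-value with ascending rank k (1 = smallest) is scaled by m / k.
    A running minimum taken from the largest rank downwards keeps the
    adjusted values monotone in the original p-values.
    """
    m = len(p_values)
    # Indices of the features, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values from four per-feature tests (e.g. t-tests per gene).
example_p = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(example_p)
bh = benjamini_hochberg(example_p)

# Features that would be kept as candidates at a 5% threshold.
selected_bonf = [i for i, q in enumerate(bonf) if q <= 0.05]
selected_bh = [i for i, q in enumerate(bh) if q <= 0.05]
```

With these made-up inputs, the Bonferroni correction retains fewer candidate features than Benjamini-Hochberg at the same threshold, which reflects the general trade-off between strict family-wise error control and the less conservative false discovery rate control.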