Biomarker Discovery: Introduction to Statistical Learning and Integrative Bioinformatics Approaches

D. Repsilber, M. Jacobsen
Published in: General, Applied and Systems Toxicology, 2011-09-15
DOI: 10.1002/9780470744307.GAT223 (https://doi.org/10.1002/9780470744307.GAT223)
Citations: 0

Abstract

In toxicology, biomarkers are needed for screening, time-series, and dilution-series exposure studies that support safety evaluation and risk assessment. They must be easy to measure reproducibly, and are therefore sought among molecular features using OMICs high-throughput technologies in assays of blood and other easily accessible tissue. This chapter presents methods for screening OMICs datasets for candidate biomarkers for classification. We begin with single-biomarker detection and survey improvements to the t-test as well as multiplicity corrections for this objective. Biomarker panels (biosignatures) are patterns of several combined single features. We describe their detection using three different methods of statistical learning, with a special focus on avoiding overfitting through appropriate use of cross-validation. More sophisticated approaches using gene-set enrichment algorithms, and steps towards integrated bioinformatics analyses, are explained. Making use of a priori knowledge about regulatory structures (gene groups, correlation structures) may further improve the classification efficiency of the detected biosignatures. As a common thread, we illustrate the analysis options using the well-known Golub gene expression dataset and the corresponding R scripts, enabling readers to reproduce every step on their own desktop.

Keywords: biomarker; feature selection; multivariate signature; cross-validation; diagnosis; prediction; statistical learning; integrative bioinformatics
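To illustrate the multiplicity-correction step mentioned in the abstract, here is a minimal sketch in Python. It is not taken from the chapter, whose own scripts are in R. The abstract does not say which corrections are surveyed, so the Benjamini-Hochberg false discovery rate procedure is used here as an assumption, being one common choice. The function names and the p-values are hypothetical; the p-values stand in for the output of per-feature t-tests.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_candidates(p_values, fdr=0.05):
    """Indices of features whose adjusted p-value is at or below `fdr`."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q <= fdr]


# Hypothetical p-values for five features (illustrative numbers only).
p_values = [0.001, 0.06, 0.02, 0.5, 0.008]
adjusted = benjamini_hochberg(p_values)
selected = select_candidates(p_values, fdr=0.05)
print(adjusted)
print(selected)
```

In R, the language the chapter itself uses, the same adjustment is available as `p.adjust(p, method = "BH")`. Features that pass this filter are only candidate biomarkers. If the selection feeds into a classifier, it has to be repeated inside every cross-validation fold to avoid the overfitting the abstract warns about.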