A spectrum of explainable and interpretable machine learning approaches for genomic studies

IF 4.4 · Region 2 (Mathematics) · Q1 Statistics & Probability
A. M. Conard, Alan DenAdel, Lorin Crawford
{"title":"A spectrum of explainable and interpretable machine learning approaches for genomic studies","authors":"A. M. Conard, Alan DenAdel, Lorin Crawford","doi":"10.1002/wics.1617","DOIUrl":null,"url":null,"abstract":"The advancement of high‐throughput genomic assays has led to enormous growth in the availability of large‐scale biological datasets. Over the last two decades, these increasingly complex data have required statistical approaches that are more sophisticated than traditional linear models. Machine learning methodologies such as neural networks have yielded state‐of‐the‐art performance for prediction‐based tasks in many biomedical applications. However, a notable downside of these machine learning models is that they typically do not reveal how or why accurate predictions are made. In many areas of biomedicine, this “black box” property can be less than desirable—particularly when there is a need to perform in silico hypothesis testing about a biological system, in addition to justifying model findings for downstream decision‐making, such as determining the best next experiment or treatment strategy. Explainable and interpretable machine learning approaches have emerged to overcome this issue. While explainable methods attempt to derive post hoc understanding of what a model has learned, interpretable models are designed to inherently provide an intelligible definition of their parameters and architecture. Here, we review the model transparency spectrum moving from black box and explainable, to interpretable machine learning methodology. Motivated by applications in genomics, we provide background on the advances across this spectrum, detailing specific approaches in both supervised and unsupervised learning. Importantly, we focus on the promise of incorporating existing biological knowledge when constructing interpretable machine learning methods for biomedical applications. We then close with considerations and opportunities for new development in this space.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":null,"pages":null},"PeriodicalIF":4.4000,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Wiley Interdisciplinary Reviews-Computational Statistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1002/wics.1617","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 4

Abstract

The advancement of high‐throughput genomic assays has led to enormous growth in the availability of large‐scale biological datasets. Over the last two decades, these increasingly complex data have required statistical approaches that are more sophisticated than traditional linear models. Machine learning methodologies such as neural networks have yielded state‐of‐the‐art performance for prediction‐based tasks in many biomedical applications. However, a notable downside of these machine learning models is that they typically do not reveal how or why accurate predictions are made. In many areas of biomedicine, this “black box” property can be less than desirable—particularly when there is a need to perform in silico hypothesis testing about a biological system, in addition to justifying model findings for downstream decision‐making, such as determining the best next experiment or treatment strategy. Explainable and interpretable machine learning approaches have emerged to overcome this issue. While explainable methods attempt to derive post hoc understanding of what a model has learned, interpretable models are designed to inherently provide an intelligible definition of their parameters and architecture. Here, we review the model transparency spectrum moving from black box and explainable, to interpretable machine learning methodology. Motivated by applications in genomics, we provide background on the advances across this spectrum, detailing specific approaches in both supervised and unsupervised learning. Importantly, we focus on the promise of incorporating existing biological knowledge when constructing interpretable machine learning methods for biomedical applications. We then close with considerations and opportunities for new development in this space.
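As a brief illustration of the distinction drawn in the abstract (not taken from the review itself), the minimal Python sketch below contrasts the two ends of the spectrum: a post hoc, "explainable" analysis of a black-box model versus an inherently interpretable sparse linear model whose coefficients are directly readable as effect sizes. It assumes scikit-learn is available; the synthetic genotype matrix, trait, and all variable names are illustrative placeholders.

# Minimal sketch (not from the review): post hoc explanation of a black-box model
# versus an inherently interpretable sparse linear model.
# Assumes scikit-learn; X is a synthetic genotype-like matrix (samples x variants)
# and y a continuous trait. All names are illustrative placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.binomial(2, 0.3, size=(200, 50)).astype(float)   # synthetic genotypes coded 0/1/2
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)  # trait driven by two variants

# "Explainable" route: fit a black-box predictor, then ask post hoc which inputs mattered.
black_box = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
post_hoc = permutation_importance(black_box, X, y, n_repeats=10, random_state=0)
top_variants = np.argsort(post_hoc.importances_mean)[::-1][:5]

# "Interpretable" route: a sparse linear model whose nonzero coefficients can be read
# directly as per-variant effect estimates.
sparse_model = Lasso(alpha=0.1).fit(X, y)
nonzero = np.flatnonzero(sparse_model.coef_)

print("post hoc top variants:", top_variants)
print("nonzero Lasso effects:", dict(zip(nonzero, sparse_model.coef_[nonzero].round(2))))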
Source journal: CiteScore 6.20 · Self-citation rate 0.00% · Articles published: 31