{"title":"A spectrum of explainable and interpretable machine learning approaches for genomic studies","authors":"A. M. Conard, Alan DenAdel, Lorin Crawford","doi":"10.1002/wics.1617","DOIUrl":null,"url":null,"abstract":"The advancement of high‐throughput genomic assays has led to enormous growth in the availability of large‐scale biological datasets. Over the last two decades, these increasingly complex data have required statistical approaches that are more sophisticated than traditional linear models. Machine learning methodologies such as neural networks have yielded state‐of‐the‐art performance for prediction‐based tasks in many biomedical applications. However, a notable downside of these machine learning models is that they typically do not reveal how or why accurate predictions are made. In many areas of biomedicine, this “black box” property can be less than desirable—particularly when there is a need to perform in silico hypothesis testing about a biological system, in addition to justifying model findings for downstream decision‐making, such as determining the best next experiment or treatment strategy. Explainable and interpretable machine learning approaches have emerged to overcome this issue. While explainable methods attempt to derive post hoc understanding of what a model has learned, interpretable models are designed to inherently provide an intelligible definition of their parameters and architecture. Here, we review the model transparency spectrum moving from black box and explainable, to interpretable machine learning methodology. Motivated by applications in genomics, we provide background on the advances across this spectrum, detailing specific approaches in both supervised and unsupervised learning. Importantly, we focus on the promise of incorporating existing biological knowledge when constructing interpretable machine learning methods for biomedical applications. We then close with considerations and opportunities for new development in this space.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":null,"pages":null},"PeriodicalIF":4.4000,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Wiley Interdisciplinary Reviews-Computational Statistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1002/wics.1617","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
Cited by: 4
Abstract
The advancement of high-throughput genomic assays has led to enormous growth in the availability of large-scale biological datasets. Over the last two decades, these increasingly complex data have required statistical approaches that are more sophisticated than traditional linear models. Machine learning methodologies such as neural networks have yielded state-of-the-art performance for prediction-based tasks in many biomedical applications. However, a notable downside of these machine learning models is that they typically do not reveal how or why accurate predictions are made. In many areas of biomedicine, this "black box" property can be less than desirable, particularly when there is a need to perform in silico hypothesis testing about a biological system, or to justify model findings for downstream decision-making, such as determining the best next experiment or treatment strategy. Explainable and interpretable machine learning approaches have emerged to overcome this issue. While explainable methods attempt to derive post hoc understanding of what a model has learned, interpretable models are designed to inherently provide an intelligible definition of their parameters and architecture. Here, we review the model transparency spectrum, moving from black-box and explainable methods to interpretable machine learning methodology. Motivated by applications in genomics, we provide background on advances across this spectrum, detailing specific approaches in both supervised and unsupervised learning. Importantly, we focus on the promise of incorporating existing biological knowledge when constructing interpretable machine learning methods for biomedical applications. We then close with considerations and opportunities for new development in this space.
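To make the explainable-versus-interpretable distinction concrete, below is a minimal Python sketch (not from the paper itself): it contrasts a post hoc explanation of a black-box model (permutation feature importance applied to a random forest) with an inherently interpretable sparse linear model (the Lasso), whose nonzero coefficients are directly readable as variant effects. The simulated genotype data, the choice of estimators, and the scikit-learn calls are illustrative assumptions, not the specific methods surveyed in the review.

```python
# Sketch: post hoc explanation vs. an interpretable-by-design model.
# Data are simulated; estimators are illustrative choices only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Simulated genotypes: 500 individuals x 50 SNPs coded 0/1/2,
# with only the first 3 variants truly affecting the trait.
X = rng.integers(0, 3, size=(500, 50)).astype(float)
beta = np.zeros(50)
beta[:3] = [1.0, -0.8, 0.5]
y = X @ beta + rng.normal(scale=0.5, size=500)

# Explainable (post hoc): fit a black-box model, then probe what it
# learned by permuting each feature and measuring the drop in fit.
black_box = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
result = permutation_importance(black_box, X, y, n_repeats=10, random_state=0)
top = np.argsort(result.importances_mean)[::-1][:3]
print("Post hoc top SNPs:", top)

# Interpretable (by design): a sparse linear model whose fitted
# coefficients themselves constitute the explanation.
lasso = Lasso(alpha=0.05).fit(X, y)
nonzero = np.flatnonzero(lasso.coef_)
print("Lasso-selected SNPs:", nonzero, "effects:", lasso.coef_[nonzero].round(2))
```

In this toy setting both routes should recover the three causal variants, but they do so differently: the permutation importances are an after-the-fact probe of an opaque predictor, whereas the Lasso coefficients are model parameters with a direct interpretation, mirroring the spectrum the review describes.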