Interpretable Machine Learning for Discovery: Statistical Challenges and Opportunities

IF 7.4, Zone 1 (Mathematics), JCR Q1: MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Annual Review of Statistics and Its Application, Pub Date: 2023-11-17, DOI: 10.1146/annurev-statistics-040120-030919

Genevera I. Allen, Luqin Gan, Lili Zheng

Citations: 0

Abstract

New technologies have led to vast troves of large and complex data sets across many scientific domains and industries. People routinely use machine learning techniques not only to process, visualize, and make predictions from these big data, but also to make data-driven discoveries. These discoveries are often made using interpretable machine learning, or machine learning models and techniques that yield human-understandable insights. In this article, we discuss and review the field of interpretable machine learning, focusing especially on the techniques, as they are often employed to generate new knowledge or make discoveries from large data sets. We outline the types of discoveries that can be made using interpretable machine learning in both supervised and unsupervised settings. Additionally, we focus on the grand challenge of how to validate these discoveries in a data-driven manner, which promotes trust in machine learning systems and reproducibility in science. We discuss validation both from a practical perspective, reviewing approaches based on data-splitting and stability, as well as from a theoretical perspective, reviewing statistical results on model selection consistency and uncertainty quantification via statistical inference. Finally, we conclude by highlighting open challenges in using interpretable machine learning techniques to make discoveries, including gaps between theory and practice for validating data-driven discoveries.

Expected final online publication date for the Annual Review of Statistics and Its Application, Volume 11 is March 2024. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Source journal

Annual Review of Statistics and Its Application (MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; STATISTICS & PROBABILITY)

CiteScore: 13.40

Self-citation rate: 1.30%

About the journal: The Annual Review of Statistics and Its Application publishes comprehensive review articles focusing on methodological advancements in statistics and the utilization of computational tools facilitating these advancements. It is abstracted and indexed in Scopus, Science Citation Index Expanded, and Inspec.