Explainable AI reveals Clever Hans effects in unsupervised learning models
Jacob Kauffmann, Jonas Dippel, Lukas Ruff, Wojciech Samek, Klaus-Robert Müller, Grégoire Montavon
Nature Machine Intelligence 7(3), 412-422 (17 March 2025). DOI: 10.1038/s42256-025-01000-2. PDF: https://www.nature.com/articles/s42256-025-01000-2.pdf
Unsupervised learning has become an essential building block of artificial intelligence systems. The representations it produces, for example, in foundation models, are critical to a wide variety of downstream applications. It is therefore important to carefully examine unsupervised models to ensure not only that they produce accurate predictions on the available data but also that these accurate predictions do not arise from a Clever Hans (CH) effect. Here, using specially developed explainable artificial intelligence techniques and applying them to popular representation learning and anomaly detection models for image data, we show that CH effects are widespread in unsupervised learning. In particular, through use cases on medical and industrial inspection data, we demonstrate that CH effects systematically lead to significant performance loss of downstream models under plausible dataset shifts or reweighting of different data subgroups. Our empirical findings are enriched by theoretical insights, which point to inductive biases in the unsupervised learning machine as a primary source of CH effects. Overall, our work sheds light on unexplored risks associated with practical applications of unsupervised learning and suggests ways to systematically mitigate CH effects, thereby making unsupervised learning more robust.

Building on recent explainable AI techniques, this Article highlights the pervasiveness of Clever Hans effects in unsupervised learning and the substantial risks these effects pose to prediction accuracy on new data.
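As a reading aid, the failure mode the abstract describes, accurate in-distribution predictions that collapse once a spurious cue decouples from the label, can be illustrated with a minimal synthetic sketch. This is not the paper's explainable-AI method: it merely simulates a frozen two-dimensional "representation" whose second feature is a hypothetical spurious cue (for instance, a scanner artefact in medical images) and compares a downstream linear probe's accuracy before and after a plausible dataset shift. All data and names here are illustrative assumptions.

```python
# Minimal sketch (synthetic data, not the authors' method): a downstream
# probe on frozen features that exploits a spurious cue looks accurate
# in distribution but degrades once the cue decouples from the label.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 4000

# Synthetic "representation": one genuine signal dimension and one
# spurious-cue dimension correlated with the label only at training time.
y = rng.integers(0, 2, size=n)
signal = y + 0.8 * rng.standard_normal(n)                  # weak true signal
cue_present = rng.random(n) < np.where(y == 1, 0.9, 0.1)   # correlated cue
cue = cue_present.astype(float) + 0.1 * rng.standard_normal(n)
X = np.column_stack([signal, cue])

# Downstream linear probe on the frozen features.
clf = LogisticRegression().fit(X[:3000], y[:3000])
acc_iid = clf.score(X[3000:], y[3000:])    # held out, original subgroup mix

# Plausible dataset shift: the cue now appears in half of each class,
# so it carries no label information any more.
m = 1000
y_shift = rng.integers(0, 2, size=m)
signal_s = y_shift + 0.8 * rng.standard_normal(m)
cue_s = (rng.random(m) < 0.5).astype(float) + 0.1 * rng.standard_normal(m)
X_shift = np.column_stack([signal_s, cue_s])
acc_shift = clf.score(X_shift, y_shift)

print(f"accuracy, original mix:  {acc_iid:.3f}")
print(f"accuracy, cue decoupled: {acc_shift:.3f}")  # drop hints at CH reliance
```

A large gap between the two accuracies is the kind of symptom that, in the paper's setting, is then traced back to specific input features and to the unsupervised model's inductive biases using explainable AI attributions.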
About the journal:
Nature Machine Intelligence is a distinguished publication that presents original research and reviews in machine learning, robotics, and AI. Our focus extends beyond these fields to their profound impact on other scientific disciplines, as well as on societal and industrial aspects. We see limitless possibilities for machine intelligence to augment human capabilities and knowledge in domains such as scientific exploration, healthcare, medical diagnostics, and the creation of safe and sustainable cities, transportation, and agriculture. At the same time, we acknowledge the ethical, social, and legal concerns that emerge from the rapid pace of these advances.
To foster interdisciplinary discussions on these far-reaching implications, Nature Machine Intelligence serves as a platform for dialogue facilitated through Comments, News Features, News & Views articles, and Correspondence. Our goal is to encourage a comprehensive examination of these subjects.
Like all Nature-branded journals, Nature Machine Intelligence is run by a team of skilled editors. We adhere to a fair and rigorous peer-review process and maintain high standards of copy-editing and production, swift publication, and editorial independence.