Winston Chen, Yifan Jiang, William Stafford Noble, Yang Young Lu
Nature Machine Intelligence, vol. 7, no. 9, pp. 1541-1554. Published 2025-09-22. DOI: 10.1038/s42256-025-01086-8. URL: https://www.nature.com/articles/s42256-025-01086-8
Error-controlled non-additive interaction discovery in machine learning models
Machine learning (ML) models are powerful tools for detecting complex patterns, yet their ‘black-box’ nature limits their interpretability, hindering their use in critical domains like healthcare and finance. Interpretable ML methods aim to explain how features influence model predictions but often focus on univariate feature importance, overlooking complex feature interactions. Although recent efforts extend interpretability to feature interactions, existing approaches struggle with robustness and error control, especially under data perturbations. In this study, we introduce Diamond, a method for trustworthy feature interaction discovery. Diamond uniquely integrates the model-X knockoffs framework to control the false discovery rate, ensuring a low proportion of falsely detected interactions. Diamond includes a non-additivity distillation procedure that refines existing interaction importance measures to isolate non-additive interaction effects and preserve false discovery rate control. This approach addresses the limitations of off-the-shelf interaction measures, which, when used naively, can lead to inaccurate discoveries. Diamond’s applicability spans a broad class of ML models, including deep neural networks, transformers, tree-based models and factorization-based models. Empirical evaluations on both simulated and real datasets across various biomedical studies demonstrate its utility in enabling reliable data-driven scientific discoveries. Diamond represents a significant step forward in leveraging ML for scientific innovation and hypothesis generation.

Diamond, a statistically rigorous method, is capable of finding meaningful feature interactions within machine learning models, making black-box models more interpretable for science and medicine.
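
The abstract states that Diamond builds on the model-X knockoffs framework to control the false discovery rate. As a rough illustration of how knockoff-based selection works in general, the sketch below implements the standard knockoff+ thresholding rule in plain Python. It is not Diamond's implementation: the abstract does not give the details of how interaction statistics are computed or distilled, so the statistics `w_example`, the function names and the target level `q` are all hypothetical. Each statistic is assumed to be large and positive when a candidate looks more important than its knockoff copy, and symmetric around zero for null candidates.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Return the knockoff+ threshold for statistics w at target FDR level q.

    Returns float('inf') when no candidate threshold satisfies the bound,
    in which case nothing is selected.
    """
    # Candidate thresholds are the distinct nonzero magnitudes of w.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy count of false discoveries
        positives = sum(1 for x in w if x >= t)   # candidate discoveries at this t
        # Estimated false discovery proportion, with the "+1" of knockoff+.
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w: Sequence[float], q: float) -> List[int]:
    """Indices whose statistic meets or exceeds the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [i for i, x in enumerate(w) if x >= t]


# Hypothetical statistics for eight candidate interactions (illustrative only).
w_example = [5.0, 4.0, 3.5, 3.0, 2.5, -0.2, 0.1, -0.1]
selected = knockoff_select(w_example, q=0.25)
print(selected)
```

The smallest threshold that meets the bound is chosen so that as many candidates as possible are selected while the estimated proportion of false discoveries stays at or below `q`. In an interaction setting, each entry of `w` would correspond to a feature pair rather than a single feature.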
About the journal:
Nature Machine Intelligence is a distinguished publication that presents original research and reviews on various topics in machine learning, robotics, and AI. Our focus extends beyond these fields, exploring their profound impact on other scientific disciplines, as well as societal and industrial aspects. We recognize limitless possibilities wherein machine intelligence can augment human capabilities and knowledge in domains like scientific exploration, healthcare, medical diagnostics, and the creation of safe and sustainable cities, transportation, and agriculture. Simultaneously, we acknowledge the emergence of ethical, social, and legal concerns due to the rapid pace of advancements.
To foster interdisciplinary discussions on these far-reaching implications, Nature Machine Intelligence serves as a platform for dialogue facilitated through Comments, News Features, News & Views articles, and Correspondence. Our goal is to encourage a comprehensive examination of these subjects.
Similar to all Nature-branded journals, Nature Machine Intelligence operates under the guidance of a team of skilled editors. We adhere to a fair and rigorous peer-review process, ensuring high standards of copy-editing and production, swift publication, and editorial independence.