{"title":"Comparing Explanations of Molecular Machine Learning Models Generated with Different Methods for the Calculation of Shapley Values.","authors":"Alec Lamens, Jürgen Bajorath","doi":"10.1002/minf.202500067","DOIUrl":null,"url":null,"abstract":"<p><p>Feature attribution methods from explainable artificial intelligence (XAI) provide explanations of machine learning models by quantifying feature importance for predictions of test instances. While features determining individual predictions have frequently been identified in machine learning applications, the consistency of feature importance-based explanations of machine learning models using different attribution methods has not been thoroughly investigated. We have systematically compared model explanations in molecular machine learning. Therefore, a test system of highly accurate compound activity predictions for different targets using different machine learning methods was generated. For these predictions, explanations were computed using methodological variants of the Shapley value formalism, a popular feature attribution approach in machine learning adapted from game theory. Predictions of each model were assessed using a model-agnostic and model-specific Shapley value-based method. The resulting feature importance distributions were characterized and compared by a global statistical analysis using diverse measures. Unexpectedly, methodological variants for Shapley value calculations yielded distinct feature importance distributions for highly accurate predictions. There was only little agreement between alternative model explanations. Our findings suggest that feature importance-based explanations of machine learning predictions should include an assessment of consistency using alternative methods.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 3","pages":"e202500067"},"PeriodicalIF":2.8000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11925390/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Molecular Informatics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1002/minf.202500067","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"CHEMISTRY, MEDICINAL","Score":null,"Total":0}
Comparing Explanations of Molecular Machine Learning Models Generated with Different Methods for the Calculation of Shapley Values.
Feature attribution methods from explainable artificial intelligence (XAI) provide explanations of machine learning models by quantifying feature importance for predictions of test instances. While features determining individual predictions have frequently been identified in machine learning applications, the consistency of feature importance-based explanations of machine learning models using different attribution methods has not been thoroughly investigated. We have systematically compared model explanations in molecular machine learning. Therefore, a test system of highly accurate compound activity predictions for different targets using different machine learning methods was generated. For these predictions, explanations were computed using methodological variants of the Shapley value formalism, a popular feature attribution approach in machine learning adapted from game theory. Predictions of each model were assessed using a model-agnostic and model-specific Shapley value-based method. The resulting feature importance distributions were characterized and compared by a global statistical analysis using diverse measures. Unexpectedly, methodological variants for Shapley value calculations yielded distinct feature importance distributions for highly accurate predictions. There was only little agreement between alternative model explanations. Our findings suggest that feature importance-based explanations of machine learning predictions should include an assessment of consistency using alternative methods.
期刊介绍:
Molecular Informatics is a peer-reviewed, international forum for publication of high-quality, interdisciplinary research on all molecular aspects of bio/cheminformatics and computer-assisted molecular design. Molecular Informatics succeeded QSAR & Combinatorial Science in 2010.
Molecular Informatics presents methodological innovations that will lead to a deeper understanding of ligand-receptor interactions, macromolecular complexes, molecular networks, design concepts and processes that demonstrate how ideas and design concepts lead to molecules with a desired structure or function, preferably including experimental validation.
The journal''s scope includes but is not limited to the fields of drug discovery and chemical biology, protein and nucleic acid engineering and design, the design of nanomolecular structures, strategies for modeling of macromolecular assemblies, molecular networks and systems, pharmaco- and chemogenomics, computer-assisted screening strategies, as well as novel technologies for the de novo design of biologically active molecules. As a unique feature Molecular Informatics publishes so-called "Methods Corner" review-type articles which feature important technological concepts and advances within the scope of the journal.