Systematic generation and analysis of counterfactuals for compound activity predictions using multi-task models

Alec Lamens and Jürgen Bajorath
MedChemComm. Publication date: 2024-04-08. DOI: 10.1039/D4MD00128A
Impact factor: 3.597 (JCR Q2, Pharmacology, Toxicology and Pharmaceutics)
Pages 1547-1555. Article page: https://pubs.rsc.org/en/content/articlelanding/2024/md/d4md00128a
Citations: 0

Abstract

Most machine learning (ML) methods produce predictions that are hard or impossible to understand. The black box nature of predictive models obscures potential learning bias and makes it difficult to recognize and trace problems. Moreover, the inability to rationalize model decisions causes reluctance to accept predictions for experimental design. For ML, limited trust in predictions presents a substantial problem and continues to limit its impact in interdisciplinary research, including early-phase drug discovery. As a desirable remedy, approaches from explainable artificial intelligence (XAI) are increasingly applied to shed light on the ML black box and help to rationalize predictions. Among these is the concept of counterfactuals (CFs), which are best understood as test cases with small modifications yielding opposing prediction outcomes (such as different class labels in object classification). For ML applications in medicinal chemistry, for example, compound activity predictions, CFs are particularly intuitive because these hypothetical molecules enable immediate comparisons with actual test compounds that do not require expert ML knowledge and are accessible to practicing chemists. Such comparisons often reveal structural moieties in compounds that determine their predictions and can be further investigated. Herein, we adapt and extend a recently introduced concept for the systematic generation of molecular CFs to multi-task predictions of different classes of protein kinase inhibitors, analyze CFs in detail, rationalize the origins of CF formation in multi-task modeling, and present exemplary explanations of predictions.
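
To make the notion of a counterfactual concrete, the following is a minimal sketch in Python. It is not the authors' method: the fragment vocabulary `VOCAB`, the rule-based `toy_model`, and the single-fragment edit scheme are all hypothetical stand-ins chosen for illustration. A test compound is represented as a set of fragment names, every variant that differs by one added or removed fragment is enumerated, and those variants whose predicted class differs from that of the test compound are reported as CFs.

```python
from typing import Callable, FrozenSet, List, Tuple

Features = FrozenSet[str]

# Hypothetical fragment vocabulary; a real study would work on molecular structures.
VOCAB = ("hinge_binder", "amide", "sulfonamide", "fluoro", "morpholine")


def toy_model(x: Features) -> str:
    """Stand-in multi-class predictor (an assumption, not a trained model)."""
    if "hinge_binder" in x and "sulfonamide" in x:
        return "kinase_A"
    if "hinge_binder" in x:
        return "kinase_B"
    return "inactive"


def single_edits(x: Features) -> List[Tuple[str, Features]]:
    """Enumerate variants that differ from x by one added or removed fragment."""
    variants = []
    for frag in VOCAB:
        if frag in x:
            variants.append((f"-{frag}", x - {frag}))
        else:
            variants.append((f"+{frag}", x | {frag}))
    return variants


def counterfactuals(
    x: Features, model: Callable[[Features], str]
) -> List[Tuple[str, str]]:
    """Return (edit, new_label) for each one-step variant whose label differs."""
    base = model(x)
    found = []
    for edit, variant in single_edits(x):
        label = model(variant)
        if label != base:
            found.append((edit, label))
    return found


test_compound = frozenset({"hinge_binder", "amide", "fluoro"})
cfs = counterfactuals(test_compound, toy_model)
for edit, label in cfs:
    print(f"{edit}: {toy_model(test_compound)} -> {label}")
```

In this toy setting, the edits that flip the prediction point directly at the fragments the model relies on, which mirrors the kind of comparison between a test compound and its CFs that the abstract describes.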

Source journal
MedChemComm (Biochemistry & Molecular Biology; Chemistry, Medicinal)
CiteScore: 4.70
Self-citation rate: 0.00%
Articles published: 0
Review time: 2.2 months
Journal description: Research and review articles in medicinal chemistry and related drug discovery science; the official journal of the European Federation for Medicinal Chemistry. In 2020, MedChemComm will change its name to RSC Medicinal Chemistry. Issue 12, 2019 will be the last issue as MedChemComm.