Investigating the reliability and interpretability of machine learning frameworks for chemical retrosynthesis†

Impact factor: 6.2; JCR Q1 (Chemistry, Multidisciplinary)
Friedrich Hastedt, Rowan M. Bailey, Klaus Hellgardt, Sophia N. Yaliraki, Ehecatl Antonio del Rio Chanona and Dongda Zhang
Digital Discovery, published 2024-05-23
DOI: 10.1039/D4DD00007B
https://pubs.rsc.org/en/content/articlelanding/2024/dd/d4dd00007b

Abstract

Machine learning models for chemical retrosynthesis have attracted substantial interest in recent years. Unaddressed challenges obscure model limitations and impede progress in the field, particularly the absence of robust evaluation metrics for performance comparison and the lack of interpretability of black-box models. We present an automated benchmarking pipeline designed for effective comparison of model performance. The pipeline emphasises user-friendly design so that it is accessible and easy to use within the research community. Additionally, we propose and perform a new interpretability study to uncover the degree of chemical understanding acquired by retrosynthesis models. Our results reveal that frameworks based on chemical reaction rules yield the most diverse, chemically valid, and feasible reactions, whereas purely data-driven frameworks suffer from infeasible and invalid predictions. The interpretability study emphasises that incorporating reaction rules not only enhances model performance but also improves interpretability. For simple molecules, we show that Graph Neural Networks identify relevant functional groups in the product molecule, offering model interpretability. Sequence-to-sequence Transformers are not found to provide such an explanation. As the molecule and reaction mechanism grow more complex, both data-driven models propose infeasible disconnections without offering a chemical rationale. We stress the importance of incorporating chemically meaningful descriptors within deep-learning models. Our study provides guidance for the future development of retrosynthesis frameworks.

