An explainable deep learning platform for molecular discovery.

IF 13.1 1区 生物学 Q1 BIOCHEMICAL RESEARCH METHODS
Felix Wong, Satotaka Omori, Alicia Li, Aarti Krishnan, Ryan S Lach, Joseph Rufo, Maxwell Z Wilson, James J Collins
{"title":"An explainable deep learning platform for molecular discovery.","authors":"Felix Wong, Satotaka Omori, Alicia Li, Aarti Krishnan, Ryan S Lach, Joseph Rufo, Maxwell Z Wilson, James J Collins","doi":"10.1038/s41596-024-01084-x","DOIUrl":null,"url":null,"abstract":"<p><p>Deep learning approaches have been increasingly applied to the discovery of novel chemical compounds. These predictive approaches can accurately model compounds and increase true discovery rates, but they are typically black box in nature and do not generate specific chemical insights. Explainable deep learning aims to 'open up' the black box by providing generalizable and human-understandable reasoning for model predictions. These explanations can augment molecular discovery by identifying structural classes of compounds with desired activity in lieu of lone compounds. Additionally, these explanations can guide hypothesis generation and make searching large chemical spaces more efficient. Here we present an explainable deep learning platform that enables vast chemical spaces to be mined and the chemical substructures underlying predicted activity to be identified. The platform relies on Chemprop, a software package implementing graph neural networks as a deep learning model architecture. In contrast to similar approaches, graph neural networks have been shown to be state of the art for molecular property prediction. Focusing on discovering structural classes of antibiotics, this protocol provides guidelines for experimental data generation, model implementation and model explainability and evaluation. This protocol does not require coding proficiency or specialized hardware, and it can be executed in as little as 1-2 weeks, starting from data generation and ending in the testing of model predictions. The platform can be broadly applied to discover structural classes of other small molecules, including anticancer, antiviral and senolytic drugs, as well as to discover structural classes of inorganic molecules with desired physical and chemical properties.</p>","PeriodicalId":18901,"journal":{"name":"Nature Protocols","volume":" ","pages":""},"PeriodicalIF":13.1000,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature Protocols","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1038/s41596-024-01084-x","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Deep learning approaches have been increasingly applied to the discovery of novel chemical compounds. These predictive approaches can accurately model compounds and increase true discovery rates, but they are typically black box in nature and do not generate specific chemical insights. Explainable deep learning aims to 'open up' the black box by providing generalizable and human-understandable reasoning for model predictions. These explanations can augment molecular discovery by identifying structural classes of compounds with desired activity in lieu of lone compounds. Additionally, these explanations can guide hypothesis generation and make searching large chemical spaces more efficient. Here we present an explainable deep learning platform that enables vast chemical spaces to be mined and the chemical substructures underlying predicted activity to be identified. The platform relies on Chemprop, a software package implementing graph neural networks as a deep learning model architecture. In contrast to similar approaches, graph neural networks have been shown to be state of the art for molecular property prediction. Focusing on discovering structural classes of antibiotics, this protocol provides guidelines for experimental data generation, model implementation and model explainability and evaluation. This protocol does not require coding proficiency or specialized hardware, and it can be executed in as little as 1-2 weeks, starting from data generation and ending in the testing of model predictions. The platform can be broadly applied to discover structural classes of other small molecules, including anticancer, antiviral and senolytic drugs, as well as to discover structural classes of inorganic molecules with desired physical and chemical properties.

一个可解释的分子发现深度学习平台。
深度学习方法越来越多地应用于新化合物的发现。这些预测方法可以准确地模拟化合物,提高真正的发现率,但它们本质上是典型的黑匣子,不能产生具体的化学见解。可解释的深度学习旨在通过为模型预测提供可概括和人类可理解的推理来“打开”黑箱。这些解释可以通过识别具有所需活性的化合物的结构类别来代替单个化合物来增加分子发现。此外,这些解释可以指导假设的产生,并使搜索大型化学空间更有效。在这里,我们提出了一个可解释的深度学习平台,可以挖掘大量的化学空间,并识别预测活动背后的化学子结构。该平台依赖于Chemprop,这是一个将图神经网络实现为深度学习模型架构的软件包。与类似的方法相比,图神经网络已被证明是分子性质预测的最新技术。该方案以发现抗生素的结构类别为重点,为实验数据的生成、模型的实现以及模型的可解释性和评估提供指导。该协议不需要编码熟练程度或专门的硬件,它可以在短短的1-2周内执行,从数据生成开始,结束于模型预测的测试。该平台可广泛应用于发现其他小分子的结构类别,包括抗癌、抗病毒和抗衰老药物,以及发现具有所需物理和化学性质的无机分子的结构类别。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Nature Protocols
Nature Protocols 生物-生化研究方法
CiteScore
29.10
自引率
0.70%
发文量
128
审稿时长
4 months
期刊介绍: Nature Protocols focuses on publishing protocols used to address significant biological and biomedical science research questions, including methods grounded in physics and chemistry with practical applications to biological problems. The journal caters to a primary audience of research scientists and, as such, exclusively publishes protocols with research applications. Protocols primarily aimed at influencing patient management and treatment decisions are not featured. The specific techniques covered encompass a wide range, including but not limited to: Biochemistry, Cell biology, Cell culture, Chemical modification, Computational biology, Developmental biology, Epigenomics, Genetic analysis, Genetic modification, Genomics, Imaging, Immunology, Isolation, purification, and separation, Lipidomics, Metabolomics, Microbiology, Model organisms, Nanotechnology, Neuroscience, Nucleic-acid-based molecular biology, Pharmacology, Plant biology, Protein analysis, Proteomics, Spectroscopy, Structural biology, Synthetic chemistry, Tissue culture, Toxicology, and Virology.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信