Deep Molecular Dreaming: Inverse machine learning for de-novo molecular design and interpretability with surjective representations

Cynthia X Shen, M. Krenn, S. Eppel, Alán Aspuru-Guzik
Mach. Learn. Sci. Technol. · Published 2020-12-17 · DOI: https://doi.org/10.1088/2632-2153/ac09d6
Citations: 28

Abstract

Computer-based de-novo design of functional molecules is one of the most prominent challenges in cheminformatics today. As a result, generative and evolutionary inverse-design methods from the field of artificial intelligence have emerged at a rapid pace, aiming to optimize molecules for a particular chemical property. These models explore the chemical space 'indirectly': by learning latent spaces, policies or distributions, or by applying mutations to populations of molecules. However, the recent development of the SELFIES string representation of molecules, a surjective alternative to SMILES, has made other techniques possible. Based on SELFIES, we therefore propose PASITHEA, a direct gradient-based molecule optimization that applies inceptionism techniques from computer vision. PASITHEA exploits gradients by directly reversing the learning process of a neural network that is trained to predict real-valued chemical properties. Effectively, this forms an inverse regression model that is capable of generating molecular variants optimized for a certain property. Although our results are preliminary, we observe a shift in the distribution of a chosen property during inverse-training, a clear indication of PASITHEA's viability. A striking property of inceptionism is that we can directly probe the model's understanding of the chemical space it was trained on. We expect that extending PASITHEA to larger datasets, larger molecules and more complex properties will lead to advances in the design of new functional molecules as well as in the interpretation and explanation of machine learning models.
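
The inverse-training idea in the abstract can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the PASITHEA implementation. The four-symbol alphabet, the linear property model with hand-picked weights (`SYMBOL_WEIGHT`), the noise level and the target value are all hypothetical. A real setup would use the SELFIES grammar and a trained neural network. The only part taken from the abstract is the principle: the model weights stay frozen and the gradient of the prediction error is applied to the input representation.

```python
import random

# Hypothetical toy alphabet standing in for SELFIES symbols (not the real grammar).
ALPHABET = ["[C]", "[N]", "[O]", "[F]"]
# Hypothetical frozen "trained" weights: one contribution per symbol, shared by all positions.
SYMBOL_WEIGHT = [0.5, -0.3, -0.2, 0.4]
BIAS = 0.0


def encode(tokens, noise=0.5, seed=0):
    """One-hot encode tokens, then replace the zeros with small random values
    so that the input is continuous (an assumption of this sketch)."""
    rng = random.Random(seed)
    return [[1.0 if sym == tok else rng.uniform(0.0, noise) for sym in ALPHABET]
            for tok in tokens]


def predict(x):
    """Linear surrogate property model over the relaxed one-hot matrix."""
    return BIAS + sum(w * v for row in x for w, v in zip(SYMBOL_WEIGHT, row))


def dream(x, target, lr=0.05, steps=200):
    """Minimise (predict(x) - target)**2 over the input x; the weights never change."""
    x = [row[:] for row in x]  # work on a copy, leave the caller's matrix intact
    for _ in range(steps):
        err = predict(x) - target
        for row in x:
            for j, w in enumerate(SYMBOL_WEIGHT):
                # For this linear model the gradient of err**2 w.r.t. the entry is 2 * err * w.
                # Each updated entry is clamped back into [0, 1].
                row[j] = min(1.0, max(0.0, row[j] - lr * 2.0 * err * w))
    return x


def decode(x):
    """Read the relaxed matrix back as tokens by taking the arg-max of each row."""
    return [ALPHABET[max(range(len(row)), key=row.__getitem__)] for row in x]


start = encode(["[N]", "[N]", "[O]", "[O]", "[N]"])
dreamed = dream(start, target=2.0)
print("before:", "".join(decode(start)), round(predict(start), 3))
print("after: ", "".join(decode(dreamed)), round(predict(dreamed), 3))
```

Because every symbol sequence over a surjective alphabet decodes to a valid molecule, the arg-max step always yields a usable string; with SMILES, the same step could produce syntactically invalid output.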