SimSon: simple contrastive learning of SMILES for molecular property prediction

Chae Eun Lee, Jin Sob Kim, Jin Hong Min, Sung Won Han
Journal: Bioinformatics (Oxford, England)
Publication date: 2025-05-06
DOI: 10.1093/bioinformatics/btaf275 (https://doi.org/10.1093/bioinformatics/btaf275)
Citations: 0

Abstract


Motivation: Molecular property prediction with deep learning has accelerated drug discovery and retrosynthesis. However, the shortage of labeled molecular data and the challenge of generalizing across the vast chemical spaces pose significant hurdles for leveraging deep learning in molecular property prediction. This study proposes a self-supervised framework designed to acquire a Simplified Molecular Input Line Entry System (SMILES) representation, which we have dubbed Simple SMILES contrastive learning (SimSon). SimSon was pre-trained using unlabeled SMILES data through contrastive learning to grasp the SMILES representations.

Results: Our findings demonstrate that contrastive learning with randomized SMILES enriches the ability of the model to generalize and its robustness as it captures the global semantic context at the molecular level. In downstream tasks, SimSon performs competitively when compared to graph-based methods and even outperforms them on certain benchmark datasets. These results indicate that SimSon effectively captures structural information from SMILES, exhibiting remarkable generalization and robustness. The potential applications of SimSon extend to bioinformatics and cheminformatics, encompassing areas such as drug discovery and drug-drug interaction prediction.

Availability and implementation: The source code is available at https://github.com/lee00206/SimSon.

Supplementary information: Supplementary information is available at Bioinformatics online.
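
As a rough illustration of the contrastive pre-training idea described above, the sketch below computes an NT-Xent style loss in which two embeddings of the same molecule (for example, from two randomized SMILES strings) form a positive pair and all other embeddings in the batch act as negatives. The abstract does not state which loss SimSon uses, so the NT-Xent form, the temperature value, the function names `cosine` and `nt_xent_loss`, and the toy two-dimensional embeddings are all assumptions made for illustration. This is not code from the SimSon repository, and the encoder that would produce the embeddings is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.1):
    """NT-Xent loss over a batch (assumed objective, not confirmed by the paper).

    view_a[i] and view_b[i] are embeddings of two different SMILES strings
    that encode the same molecule i; they form the positive pair.
    """
    n = len(view_a)
    embeddings = list(view_a) + list(view_b)  # 2n embeddings in total
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of molecule i
        # Similarities to every other embedding, scaled by the temperature.
        logits = [
            cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(2 * n)
            if j != i
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        positive_logit = cosine(embeddings[i], embeddings[positive]) / temperature
        total += log_denominator - positive_logit
    return total / (2 * n)


# Toy embeddings (hypothetical): two molecules, two views each.
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_view_b = [[0.9, 0.1], [0.1, 0.9]]   # each view matches its own molecule
shuffled_view_b = [[0.1, 0.9], [0.9, 0.1]]  # views resemble the wrong molecule

aligned_loss = nt_xent_loss(view_a, aligned_view_b)
shuffled_loss = nt_xent_loss(view_a, shuffled_view_b)
print(aligned_loss < shuffled_loss)
```

The loss is low when the two views of each molecule sit close together and far from other molecules, which is the behavior a contrastive objective rewards. In practice the embeddings would come from a trained sequence encoder, and the randomized SMILES strings would be produced by a cheminformatics toolkit.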
