基于上下文丰富训练的分子构象增强基准测试:基于图的变压器与GNN模型

IF 5.7 2区 化学 Q1 CHEMISTRY, MULTIDISCIPLINARY
Cecile Valsecchi, Jose A. Arjona-Medina, Natalia Dyubankova, Ramil Nugmanov
{"title":"基于上下文丰富训练的分子构象增强基准测试:基于图的变压器与GNN模型","authors":"Cecile Valsecchi,&nbsp;Jose A. Arjona-Medina,&nbsp;Natalia Dyubankova,&nbsp;Ramil Nugmanov","doi":"10.1186/s13321-025-01004-5","DOIUrl":null,"url":null,"abstract":"<div><p>The field of molecular representation has witnessed a shift towards models trained on molecular structures represented by strings or graphs, with chemical information encoded in nodes and bonds. Graph-based representations offer a more realistic depiction and support 3D geometry and conformer-based augmentation. Graph Neural Networks (GNNs) and Graph-based Transformer models (GTs) represent two paradigms in this field, with GT models emerging as a flexible alternative. In this study, we compare the performance of GT models against GNN models on three datasets. We explore the impact of training procedures, including context-enriched training through pretraining on quantum mechanical atomic-level properties and auxiliary task training. Our analysis focuses on sterimol parameters estimation, binding energy estimation, and generalization performance for transition metal complexes. We find that GT models with context-enriched training provide on par results compared to GNN models, with the added advantages of speed and flexibility. Our findings highlight the potential of GT models as a valid alternative for molecular representation learning tasks.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":5.7000,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01004-5","citationCount":"0","resultStr":"{\"title\":\"Benchmarking molecular conformer augmentation with context-enriched training: graph-based transformer versus GNN models\",\"authors\":\"Cecile Valsecchi,&nbsp;Jose A. Arjona-Medina,&nbsp;Natalia Dyubankova,&nbsp;Ramil Nugmanov\",\"doi\":\"10.1186/s13321-025-01004-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>The field of molecular representation has witnessed a shift towards models trained on molecular structures represented by strings or graphs, with chemical information encoded in nodes and bonds. Graph-based representations offer a more realistic depiction and support 3D geometry and conformer-based augmentation. Graph Neural Networks (GNNs) and Graph-based Transformer models (GTs) represent two paradigms in this field, with GT models emerging as a flexible alternative. In this study, we compare the performance of GT models against GNN models on three datasets. We explore the impact of training procedures, including context-enriched training through pretraining on quantum mechanical atomic-level properties and auxiliary task training. Our analysis focuses on sterimol parameters estimation, binding energy estimation, and generalization performance for transition metal complexes. We find that GT models with context-enriched training provide on par results compared to GNN models, with the added advantages of speed and flexibility. Our findings highlight the potential of GT models as a valid alternative for molecular representation learning tasks.</p></div>\",\"PeriodicalId\":617,\"journal\":{\"name\":\"Journal of Cheminformatics\",\"volume\":\"17 1\",\"pages\":\"\"},\"PeriodicalIF\":5.7000,\"publicationDate\":\"2025-05-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01004-5\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Cheminformatics\",\"FirstCategoryId\":\"92\",\"ListUrlMain\":\"https://link.springer.com/article/10.1186/s13321-025-01004-5\",\"RegionNum\":2,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cheminformatics","FirstCategoryId":"92","ListUrlMain":"https://link.springer.com/article/10.1186/s13321-025-01004-5","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

摘要

分子表示领域已经见证了一种转变,即转向训练由字符串或图形表示的分子结构模型,并在节点和键中编码化学信息。基于图形的表示提供了更真实的描述,并支持3D几何和基于一致性的增强。图神经网络(gnn)和基于图的变压器模型(GT)代表了该领域的两种范式,GT模型作为一种灵活的替代方案出现。在本研究中,我们比较了GT模型和GNN模型在三个数据集上的性能。我们探讨了训练过程的影响,包括通过预训练对量子力学原子级性质进行上下文丰富训练和辅助任务训练。我们的分析集中在sterimol参数估计,结合能估计和过渡金属配合物的泛化性能。我们发现,与GNN模型相比,具有上下文丰富训练的GT模型提供了相同的结果,并且具有速度和灵活性的优势。我们的发现突出了GT模型作为分子表征学习任务的有效替代方案的潜力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Benchmarking molecular conformer augmentation with context-enriched training: graph-based transformer versus GNN models

The field of molecular representation has witnessed a shift towards models trained on molecular structures represented by strings or graphs, with chemical information encoded in nodes and bonds. Graph-based representations offer a more realistic depiction and support 3D geometry and conformer-based augmentation. Graph Neural Networks (GNNs) and Graph-based Transformer models (GTs) represent two paradigms in this field, with GT models emerging as a flexible alternative. In this study, we compare the performance of GT models against GNN models on three datasets. We explore the impact of training procedures, including context-enriched training through pretraining on quantum mechanical atomic-level properties and auxiliary task training. Our analysis focuses on sterimol parameters estimation, binding energy estimation, and generalization performance for transition metal complexes. We find that GT models with context-enriched training provide on par results compared to GNN models, with the added advantages of speed and flexibility. Our findings highlight the potential of GT models as a valid alternative for molecular representation learning tasks.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Cheminformatics
Journal of Cheminformatics CHEMISTRY, MULTIDISCIPLINARY-COMPUTER SCIENCE, INFORMATION SYSTEMS
CiteScore
14.10
自引率
7.00%
发文量
82
审稿时长
3 months
期刊介绍: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信