RIGR: Resonance-Invariant Graph Representation for Molecular Property Prediction.

IF 5.3; Zone 2 (Chemistry); JCR Q1 (CHEMISTRY, MEDICINAL)
Akshat Shirish Zalte, Hao-Wei Pang, Anna C Doner, William H Green
Journal: Journal of Chemical Information and Modeling
Published: 2025-10-08
Publication type: Journal Article
DOI: 10.1021/acs.jcim.5c00495
URL: https://doi.org/10.1021/acs.jcim.5c00495
Citations: 0

Abstract


Many successful machine learning models for molecular property prediction rely on Lewis structure representations, commonly encoded as SMILES strings. A key limitation arises for molecules that exhibit resonance, where multiple valid Lewis structures represent the same species. In common property prediction frameworks such as Chemprop, which implements a directed message-passing neural network (D-MPNN) architecture on the input molecular graph, this causes inconsistent predictions for the same molecule depending on the chosen resonance form. To address this resonance variance, we introduce the resonance-invariant graph representation (RIGR) of molecules, which ensures, by construction, that all resonance structures are mapped to a single representation, eliminating the need to choose from or generate multiple resonance structures. Implemented with the D-MPNN architecture, RIGR is evaluated against the Chemprop featurizer on a large data set with resonance-exhibiting radicals and closed-shell molecules. Using 60% fewer features, RIGR demonstrates comparable or superior prediction performance. Alternative approaches, such as data augmentation with resonance forms, are assessed, and their limitations are explored. Available open-source as an optional featurization scheme in Chemprop, RIGR is also benchmarked across a wide range of property prediction tasks, showcasing its potential as a general graph featurizer beyond resonance handling.
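
The resonance problem described in the abstract can be illustrated with a small, self-contained sketch. It is not the RIGR featurizer and does not use Chemprop; the dictionary encoding, the function names, and the choice of which features to drop are assumptions made only for illustration. The example encodes two Lewis structures of the same radical (CH3-CH=CH-CH2 with the radical on the terminal carbon, and CH3-CH-CH=CH2 with the radical on the second carbon) and compares a featurization that keeps bond orders and radical sites with one that discards them.

```python
from collections import Counter

# Two Lewis structures of one radical species (hypothetical toy encoding).
#   form A: CH3-CH=CH-CH2(.)    form B: CH3-CH(.)-CH=CH2
# atoms: (element, attached hydrogens, radical electrons)
# bonds: (atom index i, atom index j, bond order)
FORM_A = {
    "atoms": [("C", 3, 0), ("C", 1, 0), ("C", 1, 0), ("C", 2, 1)],
    "bonds": [(0, 1, 1), (1, 2, 2), (2, 3, 1)],
}
FORM_B = {
    "atoms": [("C", 3, 0), ("C", 1, 1), ("C", 1, 0), ("C", 2, 0)],
    "bonds": [(0, 1, 1), (1, 2, 1), (2, 3, 2)],
}


def lewis_features(mol):
    """Order-independent features that keep bond orders and radical sites."""
    atoms = Counter(mol["atoms"])
    bonds = Counter(
        (tuple(sorted((mol["atoms"][i], mol["atoms"][j]))), order)
        for i, j, order in mol["bonds"]
    )
    return atoms, bonds


def resonance_invariant_features(mol):
    """Toy invariant features (an assumption, not the RIGR feature set):
    drop bond orders and radical positions, keep connectivity, hydrogen
    counts, and the molecule-level number of radical electrons."""
    core = [(element, n_h) for element, n_h, _ in mol["atoms"]]
    atoms = Counter(core)
    bonds = Counter(tuple(sorted((core[i], core[j]))) for i, j, _ in mol["bonds"])
    total_radicals = sum(radical for _, _, radical in mol["atoms"])
    return atoms, bonds, total_radicals


# Lewis-structure features differ between the two resonance forms ...
print(lewis_features(FORM_A) == lewis_features(FORM_B))  # False
# ... while the toy invariant features coincide.
print(resonance_invariant_features(FORM_A) == resonance_invariant_features(FORM_B))  # True
```

A model fed the first featurization sees two different inputs for one species and can return two different predictions; a model fed the second sees a single input. Which features a practical invariant representation should retain is a design question the sketch does not answer.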

Source journal
CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months
About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.