Moment of inertia as a simple shape descriptor for diffusion-based shape-constrained molecular generation

Impact factor: 6.2; JCR Q1 (Chemistry, Multidisciplinary)
Denis Sapegin, Fedor Bakharev, Dmitriy Krupenya, Azamat Gafurov, Konstantin Pildish and Joseph C. Bear
Digital Discovery, volume 10, pages 2927-2941. Published 2025-08-21. DOI: 10.1039/D5DD00318K
https://pubs.rsc.org/en/content/articlelanding/2025/dd/d5dd00318k
Citations: 0

Abstract

This article introduces MLConformerGenerator, a machine-learning framework for shape-constrained molecular generation. It combines an Equivariant Diffusion Model (EDM), guided by a compact shape descriptor based on the principal components of the moment of inertia tensor, with a Graph Convolutional Network (GCN) model for bond prediction. The descriptor gives a concise yet informative representation of molecular shape, enabling scalable learning from large datasets and from synthetic conformers generated from 2D molecular inputs. GCN-based bond prediction is evaluated against deterministic methods. The model can be fine-tuned to generate datasets whose chemical-feature distributions closely match those of target datasets of real conformers. It supports generation conditioned on both explicit conformers and arbitrary shapes, offering flexibility for applications such as dataset augmentation and structure-based molecule design. Trained on over 1.6 million molecules, the model generates chemically valid, structurally diverse molecules that conform to target shape constraints. It achieves an average shape similarity of 0.53 to a reference conformer, with peak similarity exceeding 0.9, a performance comparable to that of analogous models relying on more complex descriptors. The results show that integrating physically grounded descriptors with modern generative architectures provides a robust and effective strategy for shape-constrained molecular design.
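To make the descriptor concrete, the following is a minimal sketch of how the principal moments of inertia of a conformer can be computed with NumPy. It is not the authors' implementation: the function name `principal_moments`, the mass weighting, and the absence of any normalization are assumptions made here for illustration, since the abstract does not specify the exact form of the descriptor.

```python
import numpy as np


def principal_moments(coords, masses):
    """Return the three principal moments of inertia, in ascending order.

    Hypothetical helper for illustration; the paper's exact descriptor
    (weighting, scaling, ordering) may differ.
    """
    coords = np.asarray(coords, dtype=float)   # shape (n_atoms, 3)
    masses = np.asarray(masses, dtype=float)   # shape (n_atoms,)

    # Shift to the center of mass so the tensor is translation-invariant.
    center = np.average(coords, axis=0, weights=masses)
    r = coords - center

    # Inertia tensor: I = sum_i m_i (|r_i|^2 * E - r_i r_i^T)
    r_squared = np.einsum("ij,ij->i", r, r)
    inertia = (masses * r_squared).sum() * np.eye(3) - np.einsum(
        "i,ij,ik->jk", masses, r, r
    )

    # The tensor is symmetric, so eigvalsh applies; its eigenvalues are
    # the principal moments and do not depend on molecular orientation.
    return np.linalg.eigvalsh(inertia)


# Two unit masses at x = -1 and x = +1: no inertia about the bond axis,
# and 2.0 about each perpendicular axis.
diatomic = principal_moments([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0])
print(diatomic)
```

Because the three eigenvalues are unchanged by rotation and translation of the conformer, they give a three-number summary of overall shape (rod-like, disc-like, or sphere-like), which is what makes such a descriptor compact enough to use as a conditioning signal.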

