NMRMind: A Transformer-Based Model Enabling the Elucidation from Multidimensional NMR to Structures

Impact factor 6.7 · Zone 1, Chemistry · JCR Q1 (Chemistry, Analytical)
Xi Xue, Hanyu Sun, Jingying Sun, Luc Patiny, Xiangying Liu, Kai Chen, Jingjie Yan, Liangning Li, Xue Liu, Shu Xu, Dongming Zhang, Yafeng Deng, Yingda Zang, Yaling Gong, Jie Ma, Xiaojian Wang
Published in Analytical Chemistry, 2025-10-10 (journal article)
DOI: https://doi.org/10.1021/acs.analchem.5c03783

Abstract

Nuclear magnetic resonance (NMR) data provides rich quantum information on molecular structure, which is closely related to chemical structure and widely used for structural characterization in chemical discovery. Despite substantial advances in spectral analysis techniques, few existing models have demonstrated satisfactory performance in accurate NMR interpretation. Herein, we introduce NMRMind, a Transformer-based generative framework that directly elucidates molecular structures from NMR spectral data. NMRMind was pretrained on a data set comprising 45 million 1D NMR spectra and subsequently fine-tuned on a self-curated benchmark consisting of 2.2 million 1D and 2D NMR spectra. Using a mixed-modality dropout strategy during training, NMRMind achieved excellent performance, attaining a Top-1 accuracy of 92.07% across all input conditions on the structure elucidation task with a speed of <0.05 s per elucidation. Additionally, NMRMind maintained a Top-1 accuracy of 85.10% when only one-dimensional and two-dimensional NMR data were used as input, without considering molecular formulas or fragments. Moreover, the application of NMRMind facilitated the discovery of six previously uncharacterized natural products from Magnolia officinalis and successfully elucidated the structures of six unexpected products resulting from synthetic reactions, thereby expanding the accessible chemical space and providing novel insights into chemical mechanisms. These results demonstrate that NMRMind is a powerful and generalizable platform for chemistry research.
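
The abstract names a mixed-modality dropout strategy but gives no implementation details. The following is a minimal sketch of the general idea only, not the authors' method: during training, optional input modalities are randomly withheld so that the model learns to work with whatever subset is available at inference time. The modality names, the `p_drop` value, the choice to always keep the 1D spectrum, and the sample values are all assumptions made for illustration.

```python
import random

# Hypothetical modality names. The abstract only states that 1D NMR, 2D NMR,
# molecular formulas, and fragments can serve as inputs.
MODALITIES = ("nmr_1d", "nmr_2d", "formula", "fragments")


def drop_modalities(sample, p_drop=0.5, always_keep=("nmr_1d",), rng=random):
    """Return a copy of `sample` with optional modalities randomly removed.

    Each modality not listed in `always_keep` is dropped independently with
    probability `p_drop`. Both parameters are illustrative assumptions.
    """
    kept = {}
    for name in MODALITIES:
        if name not in sample:
            continue  # modality was never provided for this sample
        if name in always_keep or rng.random() >= p_drop:
            kept[name] = sample[name]
    return kept


# Made-up placeholder values, not data from the paper.
sample = {
    "nmr_1d": [7.26, 3.71, 1.25],
    "nmr_2d": [(7.26, 128.4)],
    "formula": "C8H10O",
    "fragments": ["c1ccccc1"],
}

rng = random.Random(0)  # seeded so the sketch is reproducible
views = [drop_modalities(sample, rng=rng) for _ in range(4)]
for view in views:
    print(sorted(view))
```

Each call yields a different training view of the same molecule. A model trained on such views sees both the full input and reduced inputs, which is one plausible way to obtain the kind of robustness the abstract reports when formulas and fragments are absent.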


Source journal: Analytical Chemistry (Chemistry, Analytical Chemistry)
CiteScore: 12.10
Self-citation rate: 12.20%
Articles published: 1949
Review time: 1.4 months
Journal description: Analytical Chemistry, a peer-reviewed research journal, focuses on disseminating new and original knowledge across all branches of analytical chemistry. Fundamental articles may explore general principles of chemical measurement science and need not directly address existing or potential analytical methodology. They can be entirely theoretical or report experimental results. Contributions may cover various phases of analytical operations, including sampling, bioanalysis, electrochemistry, mass spectrometry, microscale and nanoscale systems, environmental analysis, separations, spectroscopy, chemical reactions and selectivity, instrumentation, imaging, surface analysis, and data processing. Papers discussing known analytical methods should present a significant, original application of the method, a notable improvement, or results on an important analyte.