Xi Xue, Hanyu Sun, Jingying Sun, Luc Patiny, Xiangying Liu, Kai Chen, Jingjie Yan, Liangning Li, Xue Liu, Shu Xu, Dongming Zhang, Yafeng Deng, Yingda Zang, Yaling Gong, Jie Ma, Xiaojian Wang
NMRMind: A Transformer-Based Model Enabling the Elucidation from Multidimensional NMR to Structures
Analytical Chemistry, published 2025-10-10. DOI: 10.1021/acs.analchem.5c03783 (https://doi.org/10.1021/acs.analchem.5c03783)
Abstract
Nuclear magnetic resonance (NMR) data provides rich quantum information on molecular structure, which is closely related to chemical structure and widely used for structural characterization in chemical discovery. Despite substantial advances in spectral analysis techniques, few existing models have demonstrated satisfactory performance in accurate NMR interpretation. Herein, we introduce NMRMind, a Transformer-based generative framework that directly elucidates molecular structures from NMR spectral data. NMRMind was pretrained on a data set comprising 45 million 1D NMR spectra and subsequently fine-tuned on a self-curated benchmark consisting of 2.2 million 1D and 2D NMR spectra. Using a mixed-modality dropout strategy during training, NMRMind achieved excellent performance, attaining a Top-1 accuracy of 92.07% across all input conditions on the structure elucidation task with a speed of <0.05 s per elucidation. Additionally, NMRMind maintained a Top-1 accuracy of 85.10% when only one-dimensional and two-dimensional NMR data were used as input, without considering molecular formulas or fragments. Moreover, the application of NMRMind facilitated the discovery of six previously uncharacterized natural products from Magnolia officinalis and successfully elucidated the structures of six unexpected products resulting from synthetic reactions, thereby expanding the accessible chemical space and providing novel insights into chemical mechanisms. These results demonstrate that NMRMind is a powerful and generalizable platform for chemistry research.
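The abstract names a mixed-modality dropout strategy but does not describe how it works. The following is only a minimal sketch of one common way such a scheme can be set up, not the authors' implementation: during training, each input modality is randomly withheld so the model learns to cope with any subset of inputs. The modality keys (`nmr_1d`, `nmr_2d`, `formula`, `fragments`), the `smiles` target key, the drop probability, and the rule that at least one spectrum is always kept are all assumptions made for illustration.

```python
import random

# Hypothetical modality names. The abstract only says that 1D/2D NMR data,
# molecular formulas, and fragments can serve as inputs.
SPECTRAL = ("nmr_1d", "nmr_2d")
OPTIONAL = ("formula", "fragments")
MODALITIES = SPECTRAL + OPTIONAL


def modality_dropout(sample, p_drop=0.5, rng=random):
    """Return a copy of `sample` with some input modalities removed.

    Each modality present in `sample` is dropped independently with
    probability `p_drop`. If every spectral modality was dropped, one of
    them is restored at random so the input is never empty of spectra.
    Keys that are not modalities (e.g. the target structure) are kept.
    """
    present = [m for m in MODALITIES if m in sample]
    kept = {m for m in present if rng.random() >= p_drop}

    spectral_present = [m for m in present if m in SPECTRAL]
    if spectral_present and not kept.intersection(SPECTRAL):
        kept.add(rng.choice(spectral_present))

    return {
        key: value
        for key, value in sample.items()
        if key not in MODALITIES or key in kept
    }


if __name__ == "__main__":
    # Toy training example; the values are placeholders, not real spectra.
    example = {
        "nmr_1d": [1.18, 3.65],
        "nmr_2d": [(1.18, 18.1), (3.65, 57.8)],
        "formula": "C2H6O",
        "fragments": ["CO"],
        "smiles": "CCO",
    }
    print(modality_dropout(example, p_drop=0.5, rng=random.Random(0)))
```

Under these assumptions, applying the function afresh to every training example exposes the model to spectra-only inputs as well as spectra combined with a formula or fragments, which corresponds to the different input conditions the abstract reports accuracy for.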
About the journal:
Analytical Chemistry, a peer-reviewed research journal, focuses on disseminating new and original knowledge across all branches of analytical chemistry. Fundamental articles may explore general principles of chemical measurement science and need not directly address existing or potential analytical methodology. They can be entirely theoretical or report experimental results. Contributions may cover various phases of analytical operations, including sampling, bioanalysis, electrochemistry, mass spectrometry, microscale and nanoscale systems, environmental analysis, separations, spectroscopy, chemical reactions and selectivity, instrumentation, imaging, surface analysis, and data processing. Papers discussing known analytical methods should present a significant, original application of the method, a notable improvement, or results on an important analyte.