Carlos A Padilla, Luis M Díaz-Sánchez, Cristian Blanco-Tirado, Aldo F Combariza, Marianny Y Combariza
Journal of the American Society for Mass Spectrometry (IF 3.1, JCR Q2, Biochemical Research Methods). Published 2024-10-14. DOI: 10.1021/jasms.4c00186 (https://doi.org/10.1021/jasms.4c00186)
Citations: 0
AI-Guided Design of MALDI Matrices: Exploring the Electron Transfer Chemical Space for Mass Spectrometric Analysis of Low-Molecular-Weight Compounds
Abstract
The development of matrices for Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry (MALDI MS) has traditionally relied on experimental efforts. Here, we propose a goal-directed artificial intelligence generative model, fueled by data from computational chemistry calculations, to construct a chemical space optimized for Electron Transfer (ET) processes in MALDI analysis. We started from a group of 30 reported ET matrices, subjected to structural enumeration and molecular property prediction using semiempirical and ab initio calculations, to establish a database of structural and property data. Subsequently, employing a structural enumeration protocol with 68 canonical SMILES of Bemis-Murcko (BM) fragments, we expanded the structural complexity of the initial library. This process generated 82,753 compounds organized into 10 scaffold levels, with a p50 index from the Cyclic System Retrieval (CSR) curve of scaffolds of 50%. From the resulting enumerated library, a diverse subset of structures was selected using the Jarvis-Patrick clustering method. These structures, along with their associated properties obtained from quantum mechanical calculations and experimental data, were used to train a Machine Learning (ML) model to predict ionization energy (Ei) values. We then trained a Scoring Neural Network (SNN) coupled with our goal-directed generative model, which uses a Recurrent Neural Network (RNN) with Deep Learning (DL) architectures. The generative model was guided using a prior network within a Reinforcement/Transfer Learning environment. The final AI-generative model learned that structures with high unsaturation, H/C ratios under 1, and molecular weights between 100 and 300 u, as well as those with few aromatic rings and no aliphatic rings, are favorable for ET MALDI matrices. Other molecular features were also favored.
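As a rough illustration of two of the descriptor trends reported above (H/C ratio under 1 and molecular weight between 100 and 300 u), the following minimal sketch screens a flat molecular formula. It is not the authors' scoring network: the function names, the small atomic-weight table, and the example formulas are assumptions made here for illustration, and the ring-count and unsaturation criteria are not covered.

```python
import re

# Rounded standard atomic weights in u; only a few elements, for illustration.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_formula(formula: str) -> dict:
    """Parse a flat formula such as 'C14H10' into element counts (no brackets)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def descriptor_screen(formula: str) -> dict:
    """Return mass, H/C ratio, and whether both abstract-level thresholds hold."""
    counts = parse_formula(formula)
    carbons = counts.get("C", 0)
    mass = sum(ATOMIC_WEIGHT[element] * n for element, n in counts.items())
    # A formula without carbon has no meaningful H/C ratio; treat it as failing.
    hc_ratio = counts.get("H", 0) / carbons if carbons else float("inf")
    passes = hc_ratio < 1.0 and 100.0 <= mass <= 300.0
    return {"mass": mass, "hc_ratio": hc_ratio, "passes": passes}


# Hypothetical inputs chosen for this sketch: a polycyclic aromatic formula
# and a saturated ring formula.
aromatic = descriptor_screen("C14H10")
saturated = descriptor_screen("C6H12")
```

Here `C14H10` satisfies both thresholds, while `C6H12` fails on both the H/C ratio and the mass window. A real workflow would compute these descriptors from structures rather than from formulas.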
The resulting AI-generated library exhibits Ei values over 8.0 eV, akin to those of reported "good" ET MALDI matrices, indicating successful design with high synthetic accessibility scores. In conclusion, our generative model provided valuable insights into the molecular features ideal for ET MALDI compounds while generating a wide range of structurally diverse molecules within a similar molecular property space. The next critical step is to synthesize a selection of these generated compounds for experimental validation and further characterization.
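The Jarvis-Patrick clustering step mentioned in the abstract can be sketched in a few lines. This is a generic textbook variant, not the authors' implementation: two items are joined when each appears in the other's k-nearest-neighbor list and the two lists share at least `min_shared` members. The function name, the parameters, and the toy one-dimensional data are assumptions for illustration; in a cheminformatics setting the distance matrix would typically come from fingerprint dissimilarities.

```python
from itertools import combinations


def jarvis_patrick(dist, k, min_shared):
    """Cluster items given a square distance matrix; returns one label per item."""
    n = len(dist)
    # k nearest neighbors of each item, excluding the item itself.
    neighbors = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(set(order[:k]))

    parent = list(range(n))  # union-find forest used to merge linked items

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        mutual = j in neighbors[i] and i in neighbors[j]
        if mutual and len(neighbors[i] & neighbors[j]) >= min_shared:
            parent[find(i)] = find(j)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Toy data: two well-separated groups on a line (hypothetical values).
points = [0, 1, 2, 3, 50, 51, 52, 53]
dist = [[abs(a - b) for b in points] for a in points]
cluster_labels = jarvis_patrick(dist, k=3, min_shared=2)
```

With these toy points the first four items end up in one cluster and the last four in another. Selecting a diverse subset would then amount to picking one or a few representatives per cluster.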
About the journal:
The Journal of the American Society for Mass Spectrometry presents research papers covering all aspects of mass spectrometry, incorporating coverage of fields of scientific inquiry in which mass spectrometry can play a role.
Comprehensive in scope, the journal publishes papers on both fundamentals and applications of mass spectrometry. Fundamental subjects include instrumentation principles, design, and demonstration, structures and chemical properties of gas-phase ions, studies of thermodynamic properties, ion spectroscopy, chemical kinetics, mechanisms of ionization, theories of ion fragmentation, cluster ions, and potential energy surfaces. In addition to full papers, the journal offers Communications, Application Notes, and Accounts and Perspectives.