Towards Generalizable In Silico Predictions of Differential Ion Mobility Using Machine Learning and Customized Fingerprint Engineering

Impact factor 6.7 | CAS Zone 1 (Chemistry) | JCR Q1, CHEMISTRY, ANALYTICAL
Cailum M. K. Stienstra, Christopher R. M. Ryan, Daniel Demczuk, Justine R. Bissonnette, Anish Arjuna, J. Larry Campbell, W. Scott Hopkins
Analytical Chemistry, published 2025-04-10. DOI: 10.1021/acs.analchem.5c00737 (https://doi.org/10.1021/acs.analchem.5c00737)
Citations: 0

Abstract

Differential mobility spectrometry (DMS), a technique for separating chemically similar species (including isomers), is readily coupled to mass spectrometry to improve selectivity in analytical workflows. DMS dispersion curves describe the dynamic mobility of an ion in a gaseous environment: they trace the compensation voltage (CV) at which an analyte's transmission through the DMS cell is maximal as a function of the applied separation voltage (SV). To date, no fast, general tool exists for predicting the dispersion behavior of ions. Here, we demonstrate a machine learning (ML) model that achieves generalized dispersion prediction using an in silico feature addition pipeline. We employ a data set of 1141 dispersion curve measurements for anions and cations recorded in pure N₂ and in N₂ doped with 1.5% methanol (MeOH). Our feature addition pipeline computes 1591 RDKit and Mordred descriptors from SMILES strings alone and then normalizes each descriptor against its distribution in a sample of 100 000 molecules using cumulative distribution functions (CDFs). The pipeline can thus be thought of as a “learned” molecular fingerprint generator, applicable to almost any molecular (bio)cheminformatics task. Our best-performing model, the first to consider solvent-modified environments, predicts dispersion curves with a mean absolute error (MAE) of 2.1 ± 0.2 V, a significant improvement over the previous state of the art. Using explainability techniques (e.g., SHAP analysis), we show that this feature addition pipeline is a semideterministic process for feature sets, and we discuss “best practices” for understanding feature sets and maximizing model performance.
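The CDF normalization step maps every raw descriptor onto a common [0, 1] scale by asking where a molecule's value falls within a reference population. The abstract does not say whether the authors use empirical or fitted parametric CDFs, so the following is only a minimal sketch, assuming an empirical CDF built from a reference sample. The class `EmpiricalCDFNormalizer` is hypothetical; the keys mirror real RDKit descriptor names (`MolWt`, `TPSA`), but the values are made-up toy numbers, whereas the paper's reference sample contains 100 000 molecules and the raw values would come from RDKit and Mordred computed on SMILES strings.

```python
from bisect import bisect_right
from typing import Dict, Mapping, Sequence


class EmpiricalCDFNormalizer:
    """Map raw descriptor values onto [0, 1] using empirical CDFs.

    Hypothetical helper: each descriptor is scored by the fraction of a
    reference sample of molecules whose value is <= the query value.
    """

    def __init__(self, reference: Mapping[str, Sequence[float]]) -> None:
        self._sorted: Dict[str, list] = {}
        for name, values in reference.items():
            if not values:
                raise ValueError(f"empty reference sample for {name!r}")
            # Sort once so each lookup is a binary search (O(log n)).
            self._sorted[name] = sorted(values)

    def transform(self, descriptors: Mapping[str, float]) -> Dict[str, float]:
        normalized = {}
        for name, value in descriptors.items():
            ref = self._sorted[name]  # KeyError if no reference sample exists
            # Empirical CDF: share of reference values less than or equal to `value`.
            normalized[name] = bisect_right(ref, value) / len(ref)
        return normalized


# Toy reference sample (illustrative numbers, not data from the paper).
reference = {
    "MolWt": [100.0, 150.0, 200.0, 250.0],
    "TPSA": [10.0, 20.0, 30.0, 40.0],
}
normalizer = EmpiricalCDFNormalizer(reference)
print(normalizer.transform({"MolWt": 180.0, "TPSA": 40.0}))
# → {'MolWt': 0.5, 'TPSA': 1.0}
```

Because every descriptor ends up on the same scale regardless of its raw units, descriptors as different as molecular weight and polar surface area can be fed side by side to a model without per-data-set scaling; fitting a smooth parametric distribution instead of the empirical step function would be a reasonable alternative.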
We expect that our model could be used for prescreening to accelerate, or even automate, the use of DMS in complex analytical workflows (e.g., 2D LC×DMS separations), to identify transmission windows automatically, and to increase the “self-driving” potential of the instrument. We make our models available as a free, accessible tool at https://github.com/HopkinsLaboratory/DispersionCurveGUI.
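The abstract proposes using predicted dispersion curves to identify transmission windows automatically, but does not describe the procedure. The following is therefore only a hypothetical sketch: given predicted CV values for an analyte and an interfering ion on a shared SV grid, it flags the (SV, CV) settings where the two predicted CVs differ by at least a chosen margin. The function name `find_transmission_windows`, the `min_delta_cv` threshold, and all numbers are illustrative assumptions, not values from the paper.

```python
from typing import List, Sequence, Tuple


def find_transmission_windows(
    sv: Sequence[float],
    cv_analyte: Sequence[float],
    cv_interferent: Sequence[float],
    min_delta_cv: float,
) -> List[Tuple[float, float]]:
    """Return (SV, analyte CV) settings where the analyte's predicted CV is
    separated from the interferent's by at least `min_delta_cv` volts."""
    if not (len(sv) == len(cv_analyte) == len(cv_interferent)):
        raise ValueError("all curves must share the same SV grid")
    windows = []
    for s, ca, ci in zip(sv, cv_analyte, cv_interferent):
        # Keep this SV point only if the predicted CVs are far enough apart.
        if abs(ca - ci) >= min_delta_cv:
            windows.append((s, ca))
    return windows


# Made-up predicted dispersion curves (volts), for illustration only.
sv = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
cv_analyte = [0.0, -0.5, -2.0, -6.0, -12.0]
cv_interferent = [0.0, -0.3, -1.0, -2.5, -5.0]
print(find_transmission_windows(sv, cv_analyte, cv_interferent, min_delta_cv=3.0))
# → [(3000.0, -6.0), (4000.0, -12.0)]
```

Because the predictions themselves carry an error (the reported MAE is 2.1 ± 0.2 V), a practical margin would likely sit well above that value, and a real implementation would also need to account for the widths of the CV peaks rather than comparing peak positions alone.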

Source journal: Analytical Chemistry (Chemistry: Analytical Chemistry)
CiteScore: 12.10
Self-citation rate: 12.20%
Articles published: 1949
Review time: 1.4 months
Journal description: Analytical Chemistry, a peer-reviewed research journal, focuses on disseminating new and original knowledge across all branches of analytical chemistry. Fundamental articles may explore general principles of chemical measurement science and need not directly address existing or potential analytical methodology. They can be entirely theoretical or report experimental results. Contributions may cover any phase of analytical operations, including sampling, bioanalysis, electrochemistry, mass spectrometry, microscale and nanoscale systems, environmental analysis, separations, spectroscopy, chemical reactions and selectivity, instrumentation, imaging, surface analysis, and data processing. Papers discussing known analytical methods should present a significant, original application of the method, a notable improvement, or results on an important analyte.