Towards Generalizable In Silico Predictions of Differential Ion Mobility Using Machine Learning and Customized Fingerprint Engineering

IF 6.7, Zone 1 (Chemistry), Q1 (Chemistry, Analytical)

Analytical Chemistry, Pub Date: 2025-04-10, DOI: 10.1021/acs.analchem.5c00737

Cailum M. K. Stienstra, Christopher R. M. Ryan, Daniel Demczuk, Justine R. Bissonnette, Anish Arjuna, J. Larry Campbell, W. Scott Hopkins



Abstract

Differential mobility spectrometry (DMS), a tool for separating chemically similar species (including isomers), is readily coupled to mass spectrometry to improve selectivity in analytical workflows. DMS dispersion curves, which describe the dynamic mobility experienced by an ion in a gaseous environment, show the maximum ion transmission for an analyte through the DMS instrument as a function of the separation voltage (SV) and compensation voltage (CV) conditions. To date, there exists no fast, general prediction tool for the dispersion behavior of ions. Here, we demonstrate a machine learning (ML) model that achieves generalized dispersion prediction using an in silico feature addition pipeline. We employ a data set containing 1141 dispersion curve measurements of anions and cations recorded in pure N₂ environments and in N₂ environments doped with 1.5% methanol (MeOH). Our feature addition pipeline can compute 1591 RDKit and Mordred descriptors using only SMILES codes, which are then normalized to sampled molecular distributions (n = 100 000) using cumulative density functions (CDFs). This tool can be thought of as a “learned” feature fingerprint generation pipeline, which could be applied to almost any molecular (bio)cheminformatics task. Our best performing model, which for the first time considers solvent-modified environments, has a mean absolute error (MAE) of 2.1 ± 0.2 V for dispersion curve prediction, a significant improvement over the previous state-of-the-art work. We use explainability techniques (e.g., SHAP analysis) to show that this feature addition pipeline is a semideterministic process for feature sets, and we discuss “best practices” to understand feature sets and maximize model performance. We expect that this tool could be used for prescreening to accelerate or even automate the use of DMS in complex analytical workflows (e.g., 2D LC×DMS separation), to identify transmission windows automatically, and to increase the “self-driving” potential of the instrument. We make our models available as a free and accessible tool at https://github.com/HopkinsLaboratory/DispersionCurveGUI.
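
The CDF normalization step mentioned in the abstract can be illustrated with a minimal sketch. It assumes an empirical CDF fitted per descriptor on a reference sample of molecules; the abstract does not say how the authors construct their CDFs, so this is an illustration of the general idea rather than their implementation. The function names (`fit_ecdf`, `normalize_descriptors`), the descriptor names, and all numeric values are hypothetical, and the raw descriptor values are supplied by hand instead of being computed from SMILES with RDKit or Mordred.

```python
from bisect import bisect_right


def fit_ecdf(reference_values):
    """Return a function mapping a raw value to its empirical CDF in [0, 1]."""
    ordered = sorted(reference_values)
    n = len(ordered)
    if n == 0:
        raise ValueError("reference sample must not be empty")

    def ecdf(x):
        # Fraction of reference values that are <= x.
        return bisect_right(ordered, x) / n

    return ecdf


def normalize_descriptors(raw, ecdfs):
    """Apply the per-descriptor ECDF to one molecule's raw descriptor dict."""
    return {name: ecdfs[name](value) for name, value in raw.items()}


# Hypothetical reference sample; the abstract reports n = 100 000 molecules.
reference = {
    "mol_weight": [46.07, 78.11, 151.16, 180.16, 194.19],
    "logp": [-0.3, 0.2, 1.2, 1.9, 2.1],
}
ecdfs = {name: fit_ecdf(values) for name, values in reference.items()}

# Made-up raw descriptor values for one molecule.
fingerprint = normalize_descriptors({"mol_weight": 151.16, "logp": 5.0}, ecdfs)
print(fingerprint)  # {'mol_weight': 0.6, 'logp': 1.0}
```

Mapping each descriptor through a CDF puts every feature on a common [0, 1] scale and bounds outliers (the out-of-range `logp` above saturates at 1.0), which is one plausible reason to normalize against a large sampled distribution before model training.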




Source journal: Analytical Chemistry (Chemistry, Analytical)

CiteScore: 12.10
Self-citation rate: 12.20%
Articles published: 1949
Review time: 1.4 months

Journal description: Analytical Chemistry, a peer-reviewed research journal, focuses on disseminating new and original knowledge across all branches of analytical chemistry. Fundamental articles may explore general principles of chemical measurement science and need not directly address existing or potential analytical methodology. They can be entirely theoretical or report experimental results. Contributions may cover various phases of analytical operations, including sampling, bioanalysis, electrochemistry, mass spectrometry, microscale and nanoscale systems, environmental analysis, separations, spectroscopy, chemical reactions and selectivity, instrumentation, imaging, surface analysis, and data processing. Papers discussing known analytical methods should present a significant, original application of the method, a notable improvement, or results on an important analyte.