Examining structure-based surrogate selection for quantitative non-targeted analysis.

IF 3.8 · Zone 2 (Chemistry) · JCR Q1, Biochemical Research Methods
Nathaniel Charest, Shirley Pu, James P McCord, Antony J Williams, Jon R Sobus
Analytical and Bioanalytical Chemistry. Journal Article. Published 2025-06-09.
DOI: 10.1007/s00216-025-05919-8 (https://doi.org/10.1007/s00216-025-05919-8)

Abstract

Quantitative non-targeted analysis (qNTA) is an important tool for characterizing emerging contaminants in environmental, biological, and product-based samples. While traditional non-targeted analysis (NTA) focuses on chemical identification, qNTA additionally produces chemical concentration estimates. These estimates can inform provisional risk-based decisions and prioritize targets for follow-up analysis. Many common qNTA and "semi-quantitative" approaches rely on surrogate chemicals for calibration and model predictions. Despite their importance, surrogates are often chosen based on a combination of intuition and/or availability rather than rational (i.e., structure-based) selection. The lack of rational selection limits the degree to which qNTA can be objectively, mathematically assessed and improved. In this work, we systematically assess the extent to which chemical structure should inform the selection of qNTA surrogates using a dataset from liquid chromatography high-resolution mass spectrometry (LC-HRMS) experiments. First, we calculate a chemical space embedding using available LC-HRMS training data (n=385 chemicals) and 2D molecular descriptors deemed important to electrospray ionization efficiency. Then, using data from EPA's Non-Targeted Analysis Collaborative Trial (ENTACT), we calculate the leverage of measured analytes (n=533 chemicals) within the embedded chemical space. Based on leverage calculations, we implement multiple structure-based surrogate selection strategies and compare those to random selection using qNTA metrics for accuracy, uncertainty, and reliability. Finally, we propose and examine the "leveraged averaged representative distance" (LARD) as a means to quantify the coverage of qNTA surrogates within a defined chemical space. Our results show that qNTA models can benefit from rational surrogate selection strategies. 
They further show that a large enough random surrogate sample can perform as well as a smaller, chemically informed surrogate sample. Researchers are advised to carefully consider these findings when selecting surrogates for future qNTA studies.
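
As a rough illustration of the leverage step described above, the sketch below computes the standard statistical leverage h = xᵀ(XᵀX)⁻¹x of query chemicals relative to a training descriptor matrix, then contrasts a structure-informed pick with a random one. It is a minimal sketch under stated assumptions, not the authors' implementation: the descriptor matrices are synthetic stand-ins (only the sample sizes 385 and 533 come from the abstract), the number of descriptors (5) and the surrogate count (20) are arbitrary, and `select_by_leverage` is a hypothetical strategy chosen for illustration. The abstract does not specify the actual selection strategies or the LARD formula, so neither is reproduced here.

```python
import numpy as np


def leverage(train_X, query_X):
    """Leverage of each query row relative to the training descriptors.

    Computes h_i = x_i^T (X^T X)^{-1} x_i after centering both matrices
    on the training mean (so the 1/n intercept term is not included).
    """
    mu = train_X.mean(axis=0)
    Xc = train_X - mu
    Qc = query_X - mu
    # Pseudo-inverse guards against collinear descriptor columns.
    G_inv = np.linalg.pinv(Xc.T @ Xc)
    # Row-wise quadratic form: sum_jk Q[i,j] * G_inv[j,k] * Q[i,k]
    return np.einsum("ij,jk,ik->i", Qc, G_inv, Qc)


def select_by_leverage(h, k):
    """Hypothetical strategy: k analytes spread evenly over the leverage ranking."""
    order = np.argsort(h)
    positions = np.linspace(0, len(h) - 1, k).round().astype(int)
    return order[positions]


def select_random(n, k, rng):
    """Baseline: k analytes drawn uniformly without replacement."""
    return rng.choice(n, size=k, replace=False)


rng = np.random.default_rng(0)
train_X = rng.normal(size=(385, 5))  # synthetic stand-in for training descriptors
query_X = rng.normal(size=(533, 5))  # synthetic stand-in for measured analytes

h = leverage(train_X, query_X)
structured = select_by_leverage(h, 20)
random_pick = select_random(len(h), 20, rng)
```

Spreading the picks across the leverage ranking is only one way to cover both the center and the periphery of a chemical space; selecting the highest-leverage chemicals, or stratifying by leverage bins, would be equally plausible variants to compare against the random baseline.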

Source journal metrics
CiteScore: 8.00
Self-citation rate: 4.70%
Articles published: 638
Review time: 2.1 months
About the journal: Analytical and Bioanalytical Chemistry’s mission is the rapid publication of excellent and high-impact research articles on fundamental and applied topics of analytical and bioanalytical measurement science. Its scope is broad, and ranges from novel measurement platforms and their characterization to multidisciplinary approaches that effectively address important scientific problems. The Editors encourage submissions presenting innovative analytical research in concept, instrumentation, methods, and/or applications, including: mass spectrometry, spectroscopy, and electroanalysis; advanced separations; analytical strategies in “-omics” and imaging, bioanalysis, and sampling; miniaturized devices, medical diagnostics, sensors; analytical characterization of nano- and biomaterials; chemometrics and advanced data analysis.