Nathaniel Charest, Shirley Pu, James P McCord, Antony J Williams, Jon R Sobus
Analytical and Bioanalytical Chemistry (published 2025-06-09), DOI: 10.1007/s00216-025-05919-8
Examining structure-based surrogate selection for quantitative non-targeted analysis.
Quantitative non-targeted analysis (qNTA) is an important tool for characterizing emerging contaminants in environmental, biological, and product-based samples. While traditional non-targeted analysis (NTA) focuses on chemical identification, qNTA additionally produces chemical concentration estimates. These estimates can inform provisional risk-based decisions and help prioritize targets for follow-up analysis. Many common qNTA and "semi-quantitative" approaches rely on surrogate chemicals for calibration and model predictions. Despite their importance, surrogates are often chosen on the basis of intuition, availability, or both, rather than through rational (i.e., structure-based) selection. This lack of rational selection limits the degree to which qNTA can be objectively and mathematically assessed and improved. In this work, we use a dataset from liquid chromatography high-resolution mass spectrometry (LC-HRMS) experiments to systematically assess the extent to which chemical structure should inform the selection of qNTA surrogates. First, we calculate a chemical space embedding using available LC-HRMS training data (n=385 chemicals) and 2D molecular descriptors deemed important to electrospray ionization efficiency. Then, using data from EPA's Non-Targeted Analysis Collaborative Trial (ENTACT), we calculate the leverage of measured analytes (n=533 chemicals) within the embedded chemical space. Based on these leverage calculations, we implement multiple structure-based surrogate selection strategies and compare them with random selection using qNTA metrics for accuracy, uncertainty, and reliability. Finally, we propose and examine the "leveraged averaged representative distance" (LARD) as a means to quantify the coverage of qNTA surrogates within a defined chemical space. Our results show that qNTA models can benefit from rational surrogate selection strategies. They further show that a sufficiently large random surrogate sample can perform as well as a smaller, chemically informed surrogate sample. Researchers are advised to carefully consider these findings when selecting surrogates for future qNTA studies.
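The abstract does not state how leverage is computed. A minimal sketch, assuming the standard hat-matrix leverage used in QSAR applicability-domain work, is h_i = x_iᵀ(XᵀX)⁻¹x_i, where X holds the training chemicals' descriptors (plus an intercept column) and x_i is one analyte's descriptor row. The function names `leverage` and `warning_leverage` are hypothetical, and the 3(p + 1)/n cutoff is a common QSAR convention rather than a threshold reported in this paper; the authors' embedding and any cutoff they used may differ.

```python
import numpy as np


def leverage(X_train, X_query):
    """Hat-matrix leverage of each query row relative to the training set.

    Assumption: standard QSAR leverage h = q^T (A^T A)^+ q, where A is the
    training descriptor matrix with a prepended intercept column.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Intercept column makes leverage measure distance from the training centroid.
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    # Pseudo-inverse tolerates collinear descriptors (singular A^T A).
    xtx_inv = np.linalg.pinv(A.T @ A)
    # Row-wise quadratic form q_i^T (A^T A)^+ q_i.
    return np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)


def warning_leverage(n_train, p):
    """Conventional QSAR warning cutoff h* = 3(p + 1) / n (an assumption here)."""
    return 3.0 * (p + 1) / n_train


# Illustrative random "descriptors" only; not data from the study.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
query = np.vstack([X.mean(axis=0), X.mean(axis=0) + 10.0])
print(leverage(X, query), warning_leverage(50, 3))
```

A query at the training centroid gets the minimum leverage 1/n, while analytes above h* sit in poorly covered regions of the descriptor space, which is where a structurally similar surrogate would be hardest to find.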
About the journal:
Analytical and Bioanalytical Chemistry’s mission is the rapid publication of excellent and high-impact research articles on fundamental and applied topics of analytical and bioanalytical measurement science. Its scope is broad, and ranges from novel measurement platforms and their characterization to multidisciplinary approaches that effectively address important scientific problems. The Editors encourage submissions presenting innovative analytical research in concept, instrumentation, methods, and/or applications, including: mass spectrometry, spectroscopy, and electroanalysis; advanced separations; analytical strategies in “-omics” and imaging, bioanalysis, and sampling; miniaturized devices, medical diagnostics, sensors; analytical characterization of nano- and biomaterials; chemometrics and advanced data analysis.