Machine Learning Applied to Electrochemical Data Processing for Improved Analyte Quantification in Complex Saliva

IF 2.3 · CAS Zone 3 (Chemistry) · JCR Q2 CHEMISTRY, ANALYTICAL

Electroanalysis 37(9) · Pub Date: 2025-09-04 · DOI: 10.1002/elan.70048

Sangam Man Buddhacharya, Adam Ramsey, Stephen A. Ramsey, Elain Fu


Citations: 0

Abstract

Biofluids that can be collected noninvasively and frequently, such as saliva, hold great promise for real-time analyte monitoring at the point of care to inform on patient health. However, analyte quantification in these fluids can be challenging because their complex composition can reduce the signal-to-noise ratio. In electrochemical sensing, the complexity of saliva can cause signal interference through a high and variable background, which makes accurate and reproducible analyte quantification difficult. Simple analysis algorithms that focus on a single peak feature may work well for analyte quantification in well-defined buffer backgrounds but may not be ideal in complex biofluids. Motivated by this, for the task of quantifying drug levels in saliva from electrochemical voltammogram measurements, we assessed the performance of five types of regression models: k-Nearest Neighbors (KNN), Random Forest (RF), Support Vector Machine (SVM), Gaussian Process (GP), and linear multivariate regression. We trained and tested the models on hundreds of voltammograms spanning five concentrations of the antiseizure drug carbamazepine spiked into whole human saliva. For each model type, we performed feature selection from nine voltammogram features coupled with hyperparameter tuning, using a performance metric that combined the coefficient of determination (R²) and the average percentage error. For unbiased model assessment, we applied each model to test-set data, using R² and percentage error as metrics, and statistically compared model performance using permutation testing.
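The abstract does not spell out how feature selection and hyperparameter tuning were combined. As a minimal sketch, assuming greedy forward selection, a KNN regressor tuned over k, and a hypothetical combined score of R² minus the fractional average percentage error, the procedure could look like the following. The score weighting, the feature names, and all function names are illustrative assumptions, not the authors' implementation. Features are assumed to be pre-scaled to comparable ranges, because Euclidean KNN is scale-sensitive, and the validation split must contain more than one concentration so that R² is defined.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: one voltammogram summarized as named scalar
# features, paired with its known spiked concentration.
Sample = Tuple[Dict[str, float], float]


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_percent_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute percentage error (assumes no zero concentrations)."""
    return 100.0 * sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def combined_score(y_true, y_pred, weight: float = 1.0) -> float:
    """Hypothetical selection score: rewards high R^2 and low percentage error."""
    return r_squared(y_true, y_pred) - weight * mean_percent_error(y_true, y_pred) / 100.0


def knn_predict(train_X: List[List[float]], train_y: List[float],
                x: List[float], k: int) -> float:
    """Average the targets of the k training points nearest to x (Euclidean)."""
    ranked = sorted((math.dist(row, x), y) for row, y in zip(train_X, train_y))
    return sum(y for _, y in ranked[:k]) / k


def evaluate(features: List[str], k: int, train: List[Sample], val: List[Sample]) -> float:
    """KNN needs no fitting step: predict each validation sample and score the subset."""
    train_X = [[feats[f] for f in features] for feats, _ in train]
    train_y = [y for _, y in train]
    y_true = [y for _, y in val]
    y_pred = [knn_predict(train_X, train_y, [feats[f] for f in features], k)
              for feats, _ in val]
    return combined_score(y_true, y_pred)


def forward_select(all_features: List[str], ks: List[int], train: List[Sample],
                   val: List[Sample], max_features: int = 3):
    """Greedy forward selection, tuning k jointly with each candidate subset."""
    selected: List[str] = []
    best_k, best_score = None, -math.inf
    while len(selected) < max_features:
        candidates = [
            (evaluate(selected + [f], k, train, val), f, k)
            for f in all_features if f not in selected
            for k in ks
        ]
        if not candidates:
            break
        score, feature, k = max(candidates)
        if score <= best_score:  # stop once no added feature improves the score
            break
        selected.append(feature)
        best_k, best_score = k, score
    return selected, best_k, best_score
```

Tuning k inside each selection step, rather than fixing it beforehand, mirrors the idea of coupling feature selection with hyperparameter tuning; the same loop could wrap any of the other regressor types.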
Our analysis (i) identified one critical voltammogram feature associated with the analyte peak that was common across models but is not commonly used in voltammogram analysis; (ii) showed that each model's performance improved with the addition of one or two further voltammogram features; and (iii) indicated that both voltage-based and background-current features can improve model accuracy. Test-set results showed that all models produced R² values above 0.84, but KNN and RF yielded the lowest percentage error (19%), significantly better than the linear model (26%). Finally, further assessment on saliva collected from the same individual on a different day (without any additional model training) showed that KNN performed best, with excellent generalizability (percentage error of 19%), while RF and the linear model showed substantially degraded performance (percentage errors of 25% and 39%, respectively). Overall, our results indicate that machine-learning models have high potential to substantially improve the accuracy of drug-level quantification in saliva compared with conventional linear regression models.
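The abstract reports that KNN and RF were significantly better than the linear model according to permutation testing, but it does not give the exact procedure. One common variant, sketched here as an assumption, is a paired sign-flip permutation test on the per-sample test-set errors of two models evaluated on the same voltammograms. The function name and its defaults are hypothetical.

```python
import random
from typing import Sequence


def paired_permutation_test(errors_a: Sequence[float], errors_b: Sequence[float],
                            n_permutations: int = 10000, seed: int = 0) -> float:
    """Two-sided paired (sign-flip) permutation test on per-sample errors.

    Under the null hypothesis that models A and B perform equally well, the sign
    of each per-sample difference is exchangeable, so signs are flipped at random
    and we count how often the permuted mean difference is at least as extreme
    as the observed one.
    """
    if len(errors_a) != len(errors_b) or not errors_a:
        raise ValueError("need two equal-length, non-empty error lists")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)  # fixed seed for reproducible p-values
    extreme = 0
    for _ in range(n_permutations):
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(permuted) >= observed - 1e-12:  # small tolerance for float round-off
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)
```

For example, passing each model's per-voltammogram absolute percentage errors as `errors_a` and `errors_b` yields a p-value for the difference in their mean error; pairing on the same test samples removes sample-to-sample variation in difficulty from the comparison.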




Source journal

Electroanalysis (Chemistry-Electrochemistry)

CiteScore: 6.00

Self-citation rate: 3.30%

Articles published: 222

Review time: 2.4 months

Journal description: Electroanalysis is an international, peer-reviewed journal covering all branches of electroanalytical chemistry, including both fundamental and application papers as well as reviews dealing with new electrochemical sensors and biosensors, nanobioelectronic devices, analytical voltammetry, potentiometry, new electrochemical detection schemes based on novel nanomaterials, fuel cells and biofuel cells, and important practical applications. Serving as a vital communication link between research labs and the field, Electroanalysis helps you quickly adapt the latest innovations into practical clinical, environmental, food analysis, industrial and energy-related applications. Electroanalysis provides the most comprehensive coverage of the field and is the number one source for information on electroanalytical chemistry, electrochemical sensors and biosensors, and fuel/biofuel cells.