Machine Learning Applied to Electrochemical Data Processing for Improved Analyte Quantification in Complex Saliva
Sangam Man Buddhacharya, Adam Ramsey, Stephen A. Ramsey, Elain Fu
Electroanalysis, volume 37, issue 9. Published 2025-09-04. DOI: 10.1002/elan.70048
https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/10.1002/elan.70048
Abstract
Biofluids that can be collected noninvasively and frequently, such as saliva, hold great promise for real-time analyte monitoring at the point of care to inform on patient health. However, analyte quantification in these fluids can be challenging because their complex composition can reduce the signal-to-noise ratio. In electrochemical sensing in saliva, this complexity can cause signal interference through a high and variable background, making accurate and reproducible analyte quantification difficult. Simple analysis algorithms that focus on a single peak feature may work well for analyte quantification in well-defined buffer backgrounds but may not be ideal for complex biofluids. Motivated by this, for the task of quantifying drug levels in saliva from electrochemical voltammogram measurements, we assessed the performance of five types of regression models: k-Nearest Neighbors (KNN), Random Forest (RF), Support Vector Machine (SVM), Gaussian Process (GP), and linear multivariate. We trained and tested the models on hundreds of voltammograms spanning five analyte concentrations of the antiseizure drug carbamazepine spiked into whole human saliva. For each regression model type, we performed feature selection from nine voltammogram features coupled with hyperparameter tuning, using a performance metric that combined the coefficient of determination (R²) and an average percentage-based metric. For unbiased model assessment, we applied each model to test-set data, using R² and the percentage-based metric, and statistically compared model performance using permutation testing.
Our analysis (i) identified one critical voltammogram feature associated with the analyte peak that was common across models but is not commonly used in voltammogram analysis; (ii) demonstrated that each model's performance was improved by adding one or two additional voltammogram features; and (iii) indicated that both voltage-based and background-current features can improve model accuracy. Test-set results showed that all models produced R² values above 0.84, but KNN and RF yielded the lowest values of the accompanying percentage-based metric (19%), significantly better than the linear model (26%). Finally, further assessment on saliva data from the same individual collected on a different day (without any additional model training) showed that KNN performed best, with excellent generalizability (19%), while RF and the linear model showed substantially degraded performance (25% and 39%, respectively). Overall, our results indicate the high-impact potential of machine-learning models to substantially improve the accuracy of drug-level quantification in saliva over conventional linear regression models.
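The abstract names two evaluation steps, a coefficient of determination and a permutation test between models, without giving their details. The following is a minimal Python sketch of how such a comparison could look, not the authors' implementation. The paired sign-flip scheme, the function names `r_squared` and `paired_permutation_test`, and all numbers in the usage lines are assumptions made for illustration.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def paired_permutation_test(err_a, err_b, n_permutations=10000, seed=0):
    """Two-sided paired (sign-flip) permutation test on the mean error difference.

    err_a and err_b are per-sample errors of two models on the SAME test samples.
    This pairing scheme is an assumption; the abstract does not specify one.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(err_a, err_b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the two model labels are exchangeable within
        # each pair, so each difference keeps or flips its sign with probability 1/2.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(permuted) >= observed:
            hits += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Made-up numbers, purely illustrative (not data from the paper).
true_conc = [0.0, 5.0, 10.0, 20.0, 40.0]
pred_model_a = [0.5, 4.0, 11.0, 19.0, 41.0]
pred_model_b = [3.0, 9.0, 6.0, 26.0, 33.0]

err_a = [abs(t - p) for t, p in zip(true_conc, pred_model_a)]
err_b = [abs(t - p) for t, p in zip(true_conc, pred_model_b)]

print("R^2 model A:", r_squared(true_conc, pred_model_a))
print("R^2 model B:", r_squared(true_conc, pred_model_b))
print("p-value:", paired_permutation_test(err_a, err_b, n_permutations=2000))
```

A sign-flip test fits here only because both models are scored on the same voltammograms; if the models were evaluated on different samples, shuffling model labels across the pooled errors would be the more appropriate scheme.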
About the journal:
Electroanalysis is an international, peer-reviewed journal covering all branches of electroanalytical chemistry, including both fundamental and application papers as well as reviews dealing with new electrochemical sensors and biosensors, nanobioelectronics devices, analytical voltammetry, potentiometry, new electrochemical detection schemes based on novel nanomaterials, fuel cells and biofuel cells, and important practical applications.
Serving as a vital communication link between the research labs and the field, Electroanalysis helps you to quickly adapt the latest innovations into practical clinical, environmental, food analysis, industrial and energy-related applications. Electroanalysis provides the most comprehensive coverage of the field and is the number one source for information on electroanalytical chemistry, electrochemical sensors and biosensors and fuel/biofuel cells.