Automated post-run analysis of arrayed quantitative PCR amplification curves using machine learning.

Gates Open Research. Published: 2025-01-20. eCollection: 2025-01-01. DOI: 10.12688/gatesopenres.16313.1

Ben J Brintz, Darwin J Operario, David Garrett Brown, Shanrui Wu, Lan Wang, Eric R Houpt, Daniel T Leung, Jie Liu, James A Platts-Mills


Abstract

Background: The TaqMan Array Card (TAC) is an arrayed, high-throughput qPCR platform that can simultaneously detect multiple targets in a single reaction. However, manual post-run analysis of TAC data is time-consuming and subject to interpretation. We sought to automate the post-run analysis of TAC data using machine learning models.

Methods: We used 165,214 qPCR amplification curves from two studies to train and test two eXtreme Gradient Boosting (XGBoost) models. Previous manual analyses of the amplification curves by experts in qPCR analysis served as the gold standard. First, a classification model predicted whether amplification occurred; if it did, a second model predicted the cycle threshold (Ct) value. We used 5-fold cross-validation to tune the models and assessed performance using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and mean absolute error (MAE). For external validation, we used 1,472 reactions previously analyzed by 17 laboratory scientists as part of an external quality assessment for a multisite study.
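The two-stage design described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `TwoStageAnalyzer` class, the callable interfaces, the 0.5 decision cutoff, and the toy stand-in models are all assumptions introduced here. In practice the two callables would wrap the trained XGBoost classifier and regressor.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Curve = Sequence[float]  # one fluorescence reading per PCR cycle


@dataclass
class TwoStageAnalyzer:
    """Stage 1 decides whether a curve amplified; stage 2 estimates Ct for positives."""

    classify: Callable[[Curve], float]    # returns P(amplification); hypothetical interface
    regress_ct: Callable[[Curve], float]  # returns a Ct estimate; hypothetical interface
    threshold: float = 0.5                # assumed decision cutoff, not taken from the paper

    def analyze(self, curve: Curve) -> Optional[float]:
        """Return a predicted Ct, or None when no amplification is called."""
        if self.classify(curve) < self.threshold:
            return None  # negative reaction: the Ct model is never consulted
        return self.regress_ct(curve)


# Toy stand-ins for the trained models (placeholders, NOT XGBoost).
def toy_classifier(curve: Curve) -> float:
    # Call amplification when the signal rises well above its starting level.
    return 1.0 if max(curve) - curve[0] > 1.0 else 0.0


def toy_ct_regressor(curve: Curve) -> float:
    # First cycle (1-indexed) at which the signal exceeds the baseline by 0.5.
    baseline = curve[0]
    for cycle, value in enumerate(curve, start=1):
        if value - baseline > 0.5:
            return float(cycle)
    return float(len(curve))


analyzer = TwoStageAnalyzer(toy_classifier, toy_ct_regressor)
flat = [0.1] * 40                                             # no amplification
rising = [0.1] * 20 + [0.1 + 0.2 * i for i in range(1, 21)]   # signal rises from cycle 21

print(analyzer.analyze(flat))    # None
print(analyzer.analyze(rising))  # 23.0
```

Gating the Ct model behind the classifier means a Ct value is only ever reported for reactions called positive, which mirrors how a manual analyst would record a result.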

Results: In internal validation, the classification model achieved an accuracy of 0.996, sensitivity of 0.997, specificity of 0.993, PPV of 0.998, and NPV of 0.991. The Ct prediction model achieved an MAE of 0.590. In external validation, the automated analysis achieved an accuracy of 0.997 and an MAE of 0.611, and it was more accurate than the manual analyses of 14 of the 17 laboratory scientists.
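For readers less familiar with the reported metrics, the sketch below shows how each is computed from paired gold-standard and predicted calls. The function names and the toy data are hypothetical, and the abstract does not state which reactions enter the MAE; this example assumes it is computed over paired Ct values for reactions with a Ct in both analyses.

```python
from typing import Dict, Sequence


def classification_metrics(truth: Sequence[bool], pred: Sequence[bool]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, PPV, and NPV from binary calls.

    Assumes both classes appear in truth and in pred, so no denominator is zero.
    """
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def mean_absolute_error(true_ct: Sequence[float], pred_ct: Sequence[float]) -> float:
    """Average absolute difference between paired Ct values."""
    return sum(abs(a - b) for a, b in zip(true_ct, pred_ct)) / len(true_ct)


# Toy data for illustration only (not from the study).
truth = [True, True, True, False, False]
pred = [True, True, False, False, True]
metrics = classification_metrics(truth, pred)
mae = mean_absolute_error([20.0, 25.0], [20.5, 24.0])

print(metrics["accuracy"])  # 0.6
print(mae)                  # 0.75
```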

Conclusions: Using machine learning models, we automated the post-run analysis of highly arrayed qPCR data and achieved high accuracy relative to a manual gold standard. This approach has the potential to save time and improve reproducibility in laboratories using the TAC platform and other high-throughput qPCR approaches.

