Ashley Ramos-Lopez, Amanda Garcia Negron, Guie Beeu Guerrero Hunt, Adhi Guerrero-Thillet, Carolina Zambrano Rabanal, Paola Quiñonez Mendez, Andrea Lopez-Marrero, Alvaro Gutierrez, Fernando Zamuner, Bruce J. Trock, Wayne Koch, Mariana Brait, David Sidransky, Rafael Guerrero-Preston
Cancer Research. Published 2025-04-25. DOI: https://doi.org/10.1158/1538-7445.am2025-lb410
Citation count: 0
Abstract
Abstract LB410: Precision oral cancer screening and diagnostic solution using DNA methylation and machine learning to stratify high-risk lesions in saliva from patients of mixed ancestry
Analysis of quantitative methylation-specific PCR (qMSP) data for the diagnosis and early detection of cancer has consisted of summarizing singleplex or multiplex DNA data in a cumulative methylation index, followed by threshold analyses. Recently, a novel clustering algorithm was used to examine digital PCR data prior to downstream analysis. However, clustering of real-time qMSP data has not been used by laboratories developing DNA methylation biomarkers for oral cancer screening and diagnosis. We created a precision DNA methylation algorithm to quantify differentially methylated promoters (DMPs) with real-time PCR instruments, combined with machine learning, for the discovery and validation of targets for early detection, diagnosis, and prognostication of head and neck squamous cell carcinoma (HNSCC). Analytic validation of PAX1, PAX5, ZIC4, PLCB1, and HHIP was performed to develop a qMSP protocol for clinical samples. The performance of the six singleplex reactions was tested in 307 oral cancer tissue samples and 55 normal uvulopalatopharyngoplasty (UPPP) samples from a mixed-ancestry cohort (40% Black) obtained from the Johns Hopkins School of Medicine Head and Neck Cancer Tumor Bank. An R script for automated analysis of qMSP data was developed to import, process, and analyze multiple qMSP raw data files exported from the Applied Biosystems SDS or DA2 software packages. The workflow includes data preprocessing; filtering by quality control metrics, such as cycle threshold (Ct) and PCR efficiency; normalizing against a control gene (β-actin, Bactin); and visualizing results through boxplots. A precision DNA methylation algorithm was then developed to perform unsupervised hierarchical clustering of singleplex qMSP observations for the five genes, center the data, calculate the distance between all samples, determine the variance explained by each principal component (PC), set a cutoff DNA methylation value that maximizes performance for each gene, and identify the best model fit.
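As a rough illustration of the filtering, normalization, and per-gene cutoff steps described above, the following Python sketch may help. It is not the R script from the abstract. The `Well` record, the QC thresholds, the 2^(Ct_ref − Ct_target) × 100 normalization, and the use of Youden's J as the "performance" being maximized are all assumptions made for the example; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Well:
    """One qMSP reaction for one sample (hypothetical record layout)."""
    sample: str
    is_tumor: bool
    ct_target: float   # Ct of the methylated target promoter
    ct_ref: float      # Ct of the reference (control) gene
    efficiency: float  # PCR efficiency as a fraction (1.0 = 100%)


def passes_qc(well, max_ref_ct=40.0, eff_range=(0.9, 1.1)):
    """Keep wells with a usable reference Ct and in-range efficiency.

    Both thresholds are illustrative assumptions, not values from the study.
    """
    low, high = eff_range
    return well.ct_ref <= max_ref_ct and low <= well.efficiency <= high


def relative_methylation(well):
    """Target signal relative to the reference gene, scaled by 100.

    Assumes the common 2^(delta Ct) form with ideal doubling per cycle.
    """
    return 100.0 * 2.0 ** (well.ct_ref - well.ct_target)


def best_cutoff(values, labels):
    """Scan observed values and return (cutoff, J) maximizing Youden's J.

    A sample is called positive when value >= cutoff. Both classes must be
    present in `labels`.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    best = (None, -1.0)
    for c in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y and v >= c)
        tn = sum(1 for v, y in zip(values, labels) if not y and v < c)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best[1]:
            best = (c, j)
    return best


# Synthetic wells for a single gene; these are NOT data from the study.
wells = [
    Well("T1", True, 28.0, 25.0, 1.00),
    Well("T2", True, 27.0, 25.0, 0.98),
    Well("T3", True, 30.0, 26.0, 1.35),   # dropped: efficiency out of range
    Well("N1", False, 35.0, 25.0, 1.02),
    Well("N2", False, 33.0, 25.0, 0.95),
]
kept = [w for w in wells if passes_qc(w)]
values = [relative_methylation(w) for w in kept]
labels = [w.is_tumor for w in kept]
cutoff, youden_j = best_cutoff(values, labels)
print(f"kept {len(kept)} wells, cutoff={cutoff}, J={youden_j}")
```

In practice the scan would be repeated per gene, and the resulting normalized values would feed the clustering and modeling steps.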
Logistic regression, linear discriminant analysis, LOESS, k-nearest neighbor, and random forest models, as well as an ensemble of all five models, were then trained to model the relationship between test samples and the PAX1, PAX5, ZIC4, PLCB1, and HHIP DMPs. Model performance was compared based on accuracy, and logistic regression was used for downstream analyses. The discriminatory ability of the five genes was evaluated using receiver operating characteristic (ROC) curve and area under the curve (AUC) analyses. The best performance was obtained when using all five genes (PAX1, PAX5, ZIC4, PLCB1, HHIP): 94% sensitivity, 96% specificity, 97% positive predictive value, and 91% negative predictive value, correctly classifying 95% of samples with an AUC of 0.99. We also found that HHIP fully discriminated between normal and tumor samples in a smaller subset of saliva samples (n=73). The normalized tissue-saliva interquartile range (IQR) ratio of HHIP DNA methylation was 98%. These results warrant validation in a larger cohort. Real-time PCR-based tests have been shown to be cost-effective and scalable. The exceptional discriminatory power between normal and cancer samples, taken in the context of the excess real-time PCR instruments installed after COVID, holds promise for improving oral cancer early detection and diagnostic pipelines worldwide.
Citation Format: Ashley Ramos-Lopez, Amanda Garcia Negron, Guie Beeu Guerrero Hunt, Adhi Guerrero-Thillet, Carolina Zambrano Rabanal, Paola Quiñonez Mendez, Andrea Lopez-Marrero, Alvaro Gutierrez, Fernando Zamuner, Bruce J. Trock, Wayne Koch, Mariana Brait, David Sidransky, Rafael Guerrero-Preston. Precision oral cancer screening and diagnostic solution using DNA methylation and machine learning to stratify high-risk lesions in saliva from patients of mixed ancestry [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 2 (Late-Breaking, Clinical Trial, and Invited Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_2):Abstract nr LB410.
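For readers who want to see how the performance measures named in the abstract (sensitivity, specificity, predictive values, accuracy, and AUC) relate to a classifier's output, here is a minimal Python sketch. The function names, the 0.5 decision threshold, and the scores are hypothetical and synthetic; they do not reproduce the study's data or its confusion matrix. The AUC is computed as the fraction of tumor/normal pairs ranked correctly, which is equivalent to the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 means tumor."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics; assumes every denominator is nonzero."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auc(scores, y_true):
    """AUC as the probability that a tumor outscores a normal (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t]
    neg = [s for s, t in zip(scores, y_true) if not t]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Synthetic model scores (e.g. predicted probabilities); NOT study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = summary_metrics(*confusion_counts(y_true, y_pred))
print(metrics)
print("AUC:", auc(scores, y_true))
```

Note that predictive values depend on the tumor-to-normal ratio in the evaluated set, so they are not directly transferable to a screening population with a different prevalence.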
About the journal:
Cancer Research, published by the American Association for Cancer Research (AACR), is a journal that focuses on impactful original studies, reviews, and opinion pieces relevant to the broad cancer research community. Manuscripts that present conceptual or technological advances leading to insights into cancer biology are particularly sought after. The journal also places emphasis on convergence science, which involves bridging multiple distinct areas of cancer research.
With primary subsections including Cancer Biology, Cancer Immunology, Cancer Metabolism and Molecular Mechanisms, Translational Cancer Biology, Cancer Landscapes, and Convergence Science, Cancer Research has a comprehensive scope. It is published twice a month and has one volume per year, with a print ISSN of 0008-5472 and an online ISSN of 1538-7445.
Cancer Research is abstracted and/or indexed in various databases and platforms, including BIOSIS Previews (R) Database, MEDLINE, Current Contents/Life Sciences, Current Contents/Clinical Medicine, Science Citation Index, Scopus, and Web of Science.