基于机器学习的蛋白质基因组数据建模确定了用于早期检测肺癌的循环血浆生物标记物

Marcela A Johnson, Liping Hou, Bevan Emma Huang, Assieh Saadatpour, Abolfazl Doostparast Torshizi
{"title":"基于机器学习的蛋白质基因组数据建模确定了用于早期检测肺癌的循环血浆生物标记物","authors":"Marcela A Johnson, Liping Hou, Bevan Emma Huang, Assieh Saadatpour, Abolfazl Doostparast Torshizi","doi":"10.1101/2024.07.30.24311241","DOIUrl":null,"url":null,"abstract":"Identifying genetic variants associated with lung cancer (LC) risk and their impact on plasma protein levels is crucial for understanding LC predisposition. The discovery of risk biomarkers can enhance early LC screening protocols and improve prognostic interventions. In this study, we performed a genome-wide association analysis using the UK Biobank and FinnGen. We identified genetic variants associated with LC and protein levels leveraging the UK Biobank Pharma Proteomics Project. The dysregulated proteins were then analyzed in pre-symptomatic LC cases compared to healthy controls followed by training machine learning models to predict future LC diagnosis. We achieved median AUCs ranging from 0.79 to 0.88 (0-4 years before diagnosis/YBD), 0.73 to 0.83 (5-9YBD), and 0.78 to 0.84 (0-9YBD) based on 5-fold cross-validation. Conducting survival analysis using the 5-9YBD cohort, we identified eight proteins, including CALCB, PLAUR/uPAR, and CD74 whose higher levels were associated with worse overall survival. We also identified potential plasma biomarkers, including previously reported candidates such as CEACAM5, CXCL17, GDF15, and WFDC2, which have shown associations with future LC diagnosis. These proteins are enriched in various pathways, including cytokine signaling, interleukin regulation, neutrophil degranulation, and lung fibrosis. In conclusion, this study generates novel insights into our understanding of the genome-proteome dynamics in LC. Furthermore, our findings present a promising panel of non-invasive plasma biomarkers that hold potential to support early LC screening initiatives and enhance future diagnostic interventions.","PeriodicalId":501375,"journal":{"name":"medRxiv - Genetic and Genomic Medicine","volume":"77 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Machine learning-based proteogenomic data modeling identifies circulating plasma biomarkers for early detection of lung cancer\",\"authors\":\"Marcela A Johnson, Liping Hou, Bevan Emma Huang, Assieh Saadatpour, Abolfazl Doostparast Torshizi\",\"doi\":\"10.1101/2024.07.30.24311241\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Identifying genetic variants associated with lung cancer (LC) risk and their impact on plasma protein levels is crucial for understanding LC predisposition. The discovery of risk biomarkers can enhance early LC screening protocols and improve prognostic interventions. In this study, we performed a genome-wide association analysis using the UK Biobank and FinnGen. We identified genetic variants associated with LC and protein levels leveraging the UK Biobank Pharma Proteomics Project. The dysregulated proteins were then analyzed in pre-symptomatic LC cases compared to healthy controls followed by training machine learning models to predict future LC diagnosis. We achieved median AUCs ranging from 0.79 to 0.88 (0-4 years before diagnosis/YBD), 0.73 to 0.83 (5-9YBD), and 0.78 to 0.84 (0-9YBD) based on 5-fold cross-validation. Conducting survival analysis using the 5-9YBD cohort, we identified eight proteins, including CALCB, PLAUR/uPAR, and CD74 whose higher levels were associated with worse overall survival. We also identified potential plasma biomarkers, including previously reported candidates such as CEACAM5, CXCL17, GDF15, and WFDC2, which have shown associations with future LC diagnosis. These proteins are enriched in various pathways, including cytokine signaling, interleukin regulation, neutrophil degranulation, and lung fibrosis. In conclusion, this study generates novel insights into our understanding of the genome-proteome dynamics in LC. Furthermore, our findings present a promising panel of non-invasive plasma biomarkers that hold potential to support early LC screening initiatives and enhance future diagnostic interventions.\",\"PeriodicalId\":501375,\"journal\":{\"name\":\"medRxiv - Genetic and Genomic Medicine\",\"volume\":\"77 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"medRxiv - Genetic and Genomic Medicine\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1101/2024.07.30.24311241\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"medRxiv - Genetic and Genomic Medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2024.07.30.24311241","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

确定与肺癌(LC)风险相关的基因变异及其对血浆蛋白水平的影响对于了解肺癌易感性至关重要。发现风险生物标志物可以加强早期肺癌筛查方案并改善预后干预措施。在这项研究中,我们利用英国生物库和 FinnGen 进行了全基因组关联分析。 我们利用英国生物库医药蛋白质组学项目确定了与 LC 和蛋白质水平相关的基因变异。然后,与健康对照组相比,分析了症状前 LC 病例中的失调蛋白,并训练了机器学习模型来预测未来的 LC 诊断。基于5倍交叉验证,我们获得了0.79至0.88(诊断前0-4年/YBD)、0.73至0.83(诊断前5-9年/YBD)和0.78至0.84(诊断前0-9年/YBD)的中位数AUC。在对 5-9YBD 组群进行生存分析时,我们发现包括 CALCB、PLAUR/uPAR 和 CD74 在内的 8 种蛋白质的水平越高,总生存期越短。我们还发现了潜在的血浆生物标志物,包括之前报道过的候选标志物,如 CEACAM5、CXCL17、GDF15 和 WFDC2,这些标志物与未来的 LC 诊断有关联。这些蛋白富集在各种通路中,包括细胞因子信号转导、白细胞介素调节、中性粒细胞脱颗粒和肺纤维化。总之,这项研究为我们了解 LC 的基因组-蛋白质组动态提供了新的视角。此外,我们的研究结果还提出了一组前景广阔的非侵入性血浆生物标志物,这些标志物有望支持早期肺癌筛查计划并增强未来的诊断干预措施。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Machine learning-based proteogenomic data modeling identifies circulating plasma biomarkers for early detection of lung cancer
Identifying genetic variants associated with lung cancer (LC) risk and their impact on plasma protein levels is crucial for understanding LC predisposition. The discovery of risk biomarkers can enhance early LC screening protocols and improve prognostic interventions. In this study, we performed a genome-wide association analysis using the UK Biobank and FinnGen. We identified genetic variants associated with LC and protein levels leveraging the UK Biobank Pharma Proteomics Project. The dysregulated proteins were then analyzed in pre-symptomatic LC cases compared to healthy controls followed by training machine learning models to predict future LC diagnosis. We achieved median AUCs ranging from 0.79 to 0.88 (0-4 years before diagnosis/YBD), 0.73 to 0.83 (5-9YBD), and 0.78 to 0.84 (0-9YBD) based on 5-fold cross-validation. Conducting survival analysis using the 5-9YBD cohort, we identified eight proteins, including CALCB, PLAUR/uPAR, and CD74 whose higher levels were associated with worse overall survival. We also identified potential plasma biomarkers, including previously reported candidates such as CEACAM5, CXCL17, GDF15, and WFDC2, which have shown associations with future LC diagnosis. These proteins are enriched in various pathways, including cytokine signaling, interleukin regulation, neutrophil degranulation, and lung fibrosis. In conclusion, this study generates novel insights into our understanding of the genome-proteome dynamics in LC. Furthermore, our findings present a promising panel of non-invasive plasma biomarkers that hold potential to support early LC screening initiatives and enhance future diagnostic interventions.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信