cfMethylPre: deep transfer learning enhances cancer detection based on circulating cell-free DNA methylation profiling.

Impact factor 6.8 · Zone 2 (Biology) · JCR Q1 (Biochemical Research Methods)
Xuchao Zhang, Jing Chen, Yongtian Wang, Xiaofeng Wang, Jialu Hu, Jiajie Peng, Xuequn Shang, Yanpu Wang, Tao Wang
Briefings in Bioinformatics, vol. 26, no. 3. Published 2025-05-01. Journal Article. DOI: 10.1093/bib/bbaf303 (https://doi.org/10.1093/bib/bbaf303). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12206449/pdf/

Abstract

Cancer remains a significant global health burden, underscoring the need for innovative diagnostic tools to enable early detection and improve patient outcomes. While circulating cell-free DNA (cfDNA) methylation has emerged as a promising biomarker for noninvasive cancer diagnostics, existing methods often face limitations in handling the high dimensionality of methylation data, small sample sizes, and a lack of biological interpretability. To address these challenges, we propose cfMethylPre, a novel deep transfer learning framework tailored for cancer detection using cfDNA methylation data. cfMethylPre leverages large language model pretrained embeddings from DNA sequence information and integrates them with methylation profiles to enhance feature representation. The deep transfer learning process involves pretraining on bulk DNA methylation data encompassing 2801 samples across 82 cancer types and normal controls, followed by fine-tuning with cfDNA methylation data. This approach ensures robust adaptation to cfDNA's unique characteristics while improving predictive accuracy. Our model achieved superior predictive accuracy compared with state-of-the-art methods, with a weighted Matthews Correlation Coefficient of 0.926 and a weighted F1-score of 0.942. Through model interpretation and biological experimental validation, we identified three novel breast cancer genes (PCDHA10, PRICKLE2, and PRTG) and demonstrated their inhibitory effects on cell proliferation and migration in breast cancer cell lines. These findings establish cfMethylPre as a powerful and interpretable tool for cancer diagnostics and biological discovery, paving the way for its application in precision oncology.
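
The two reported metrics can be illustrated with a minimal sketch. The abstract does not define how the Matthews Correlation Coefficient is "weighted", so the code below assumes one common reading: compute each metric per class in a one-vs-rest fashion and average by class support. The function names and the toy labels are hypothetical and are not taken from the paper or its code.

```python
from collections import Counter
from math import sqrt


def one_vs_rest_counts(y_true, y_pred, label):
    """Return (tp, fp, fn, tn) when `label` is treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def f1_score(tp, fp, fn, tn):
    """Binary F1; defined as 0.0 when the class is never true or predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc_score(tp, fp, fn, tn):
    """Binary Matthews correlation; defined as 0.0 when a marginal is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def support_weighted(metric, y_true, y_pred):
    """Average a one-vs-rest metric over classes, weighted by class support.

    ASSUMPTION: this is one plausible meaning of "weighted"; the paper may
    use a different definition.
    """
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        support[c] / total * metric(*one_vs_rest_counts(y_true, y_pred, c))
        for c in support
    )


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    y_true = ["breast", "breast", "lung", "normal", "normal", "normal"]
    y_pred = ["breast", "lung", "lung", "normal", "normal", "breast"]
    print("weighted F1 :", support_weighted(f1_score, y_true, y_pred))
    print("weighted MCC:", support_weighted(mcc_score, y_true, y_pred))
```

Weighting by support keeps rare cancer types from dominating the score, but it also means a model can score well while performing poorly on small classes, so per-class values are worth inspecting alongside the weighted averages.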

Source journal: Briefings in Bioinformatics (Biology; Biochemical Research Methods)
CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months
About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.