PTL-PRS: an R package for transfer learning of polygenic risk scores with pseudovalidation.

IF 5.4
Bokeum Cho, Seunggeun Lee
Bioinformatics (Oxford, England). Journal Article. Published 2025-09-24. DOI: 10.1093/bioinformatics/btaf540 (https://doi.org/10.1093/bioinformatics/btaf540)

Abstract

Summary: Polygenic risk scores (PRSs) are essential tools for predicting individual phenotypic risk, but they are often less accurate in non-European ancestry groups. Transfer Learning for Polygenic Risk Scores (TL-PRS) addresses this challenge by leveraging European PRSs to improve prediction in underrepresented ancestries, but it requires privacy-sensitive individual-level data and is computationally inefficient. We therefore introduce PTL-PRS (Pseudovalidated Transfer Learning for PRS), an extension of TL-PRS that uses pseudovalidation to eliminate the need for individual-level data and adds further software optimization. For pseudovalidation, PTL-PRS generates pseudo-summary statistics for training and validation and evaluates model performance with the pseudo-R² metric. To improve computational efficiency, the PTL-PRS software was optimized with C++, blockwise early stopping, and direct genotype retrieval. Overall, PTL-PRS enhances usability while maintaining TL-PRS's predictive performance.
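The abstract does not give the formulas behind pseudovalidation, so the following is only an illustrative sketch of one common way such a scheme can be built from summary statistics. It assumes standardized genotypes and phenotype, marginal SNP-trait correlations `r_full` from `n_full` samples, and an LD (SNP correlation) matrix `ld` from a reference panel. The function names (`split_summary_stats`, `pseudo_r2`) and the toy numbers are hypothetical and are not the PTL.PRS package API; Python is used purely for illustration.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def split_summary_stats(r_full, ld, n_full, n_train, rng):
    """Split full-sample correlations into pseudo-training and pseudo-validation sets.

    Assumption: r_train | r_full ~ N(r_full, n_val / (n_train * n_full) * ld),
    which gives r_train the sampling variance expected from n_train samples.
    """
    n_val = n_full - n_train
    chol = cholesky(ld)
    z = [rng.gauss(0.0, 1.0) for _ in r_full]
    # Correlate the independent draws so that the noise has covariance ld.
    noise = [sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(len(z))]
    scale = math.sqrt(n_val / (n_train * n_full))
    r_train = [r + scale * e for r, e in zip(r_full, noise)]
    # The validation part is whatever remains of the full-sample statistic.
    r_val = [(n_full * r - n_train * rt) / n_val for r, rt in zip(r_full, r_train)]
    return r_train, r_val


def pseudo_r2(beta, r_val, ld):
    """Approximate squared PRS-trait correlation from summary-level inputs.

    Assumption: pseudo-R2 = (beta' r_val)^2 / (beta' ld beta).
    """
    p = len(beta)
    covariance = sum(b * r for b, r in zip(beta, r_val))
    variance = sum(beta[i] * ld[i][j] * beta[j] for i in range(p) for j in range(p))
    return covariance ** 2 / variance if variance > 0 else 0.0


# Toy example with two SNPs in moderate LD (all values made up).
ld = [[1.0, 0.3], [0.3, 1.0]]
r_full = [0.05, 0.02]
rng = random.Random(0)
r_train, r_val = split_summary_stats(r_full, ld, n_full=10000, n_train=8000, rng=rng)
candidate_weights = [0.04, 0.01]
score = pseudo_r2(candidate_weights, r_val, ld)
```

In a scheme like this, candidate PRS weights would be fitted against `r_train` and compared by their `pseudo_r2` on `r_val`, so model selection never touches individual-level genotypes or phenotypes.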

Availability and implementation: The PTL.PRS R package is publicly available on GitHub at https://github.com/bokeumcho/PTL.PRS. The summary statistics used in this paper are available in the public domain: UK Biobank (https://pheweb.org/UKB-TOPMed), PGS Catalog (https://www.pgscatalog.org), COVID-19 Host Genetics Initiative (https://www.covid19hg.org) and GenOMICC (https://genomicc.org/data).

Supplementary information: Supplementary data are available at Bioinformatics online.
