PTL-PRS: an R package for transfer learning of polygenic risk scores with pseudovalidation
Bokeum Cho, Seunggeun Lee
Bioinformatics (Oxford, England), published 2025-09-24
DOI: https://doi.org/10.1093/bioinformatics/btaf540
Abstract
Summary: Polygenic risk scores (PRSs) are essential tools for predicting individual phenotypic risk, but they are often less accurate in non-European ancestry groups. Transfer Learning for Polygenic Risk Scores (TL-PRS) addresses this challenge by leveraging European PRSs to improve prediction in underrepresented ancestries, but it requires privacy-sensitive individual-level data and is computationally inefficient. We therefore introduce PTL-PRS (Pseudovalidated Transfer Learning for PRS), an extension of TL-PRS that uses pseudovalidation to eliminate the need for individual-level data and adds further software optimization. For pseudovalidation, PTL-PRS generates pseudo-summary statistics for training and validation and evaluates model performance with the pseudo-R² metric. To improve computational efficiency, the PTL-PRS software was optimized with C++, blockwise early stopping, and direct genotype retrieval. Overall, PTL-PRS enhances usability while maintaining the predictive performance of TL-PRS.
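The pseudovalidation idea in the summary can be illustrated with a minimal Python sketch. It is not taken from the PTL.PRS package. It assumes standardized genotypes and phenotype, a small LD (SNP correlation) matrix held in memory, and the commonly used Gaussian approximation for splitting full-sample SNP-trait correlations into pseudo training and validation parts. The function names (`cholesky`, `split_summary_stats`, `pseudo_r2`) and the specific pseudo-R² form shown here are assumptions made for illustration and may differ from what PTL-PRS actually computes.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def split_summary_stats(r_full, ld, n_full, n_train, rng):
    """Split full-sample SNP-trait correlations into pseudo training/validation parts.

    Assumption: r_train ~ Normal(r_full, n_val / (n_full * n_train) * LD), and the
    validation part is whatever remains so that the sample-size-weighted sum is preserved.
    """
    n_val = n_full - n_train
    m = len(r_full)
    chol = cholesky(ld)
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    scale = math.sqrt(n_val / (n_full * n_train))
    # Correlated Gaussian noise: scale * (chol @ z)
    noise = [scale * sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
    r_train = [r + e for r, e in zip(r_full, noise)]
    r_val = [(n_full * r - n_train * rt) / n_val for r, rt in zip(r_full, r_train)]
    return r_train, r_val


def pseudo_r2(beta, r_val, ld):
    """Approximate validation R^2 of PRS weights from summary data only.

    Assumed form: (beta^T r_val)^2 / (beta^T LD beta).
    """
    m = len(beta)
    num = sum(b * r for b, r in zip(beta, r_val))
    var = sum(beta[i] * ld[i][j] * beta[j] for i in range(m) for j in range(m))
    return (num * num) / var if var > 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    ld = [[1.0, 0.5], [0.5, 1.0]]   # toy LD matrix for two SNPs
    r_full = [0.05, -0.02]          # toy full-sample SNP-trait correlations
    r_train, r_val = split_summary_stats(r_full, ld, n_full=10000, n_train=8000, rng=rng)
    # Score hypothetical candidate weights (here simply the training correlations).
    print(pseudo_r2(r_train, r_val, ld))
```

In a workflow of this kind, candidate PRS weights would be fitted against the pseudo training statistics and ranked by their pseudo-R² on the pseudo validation statistics, so no individual-level genotypes or phenotypes are needed for model selection.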
Availability and implementation: The PTL.PRS R package is publicly available on GitHub at https://github.com/bokeumcho/PTL.PRS. The summary statistics used in this paper are available in the public domain: UK Biobank (https://pheweb.org/UKB-TOPMed), PGS Catalog (https://www.pgscatalog.org), COVID-19 Host Genetics Initiative (https://www.covid19hg.org) and GenOMICC (https://genomicc.org/data).
Supplementary information: Supplementary data are available at Bioinformatics online.