{"title":"PTL-PRS:一个具有伪验证的多基因风险评分迁移学习的R包。","authors":"Bokeum Cho, Seunggeun Lee","doi":"10.1093/bioinformatics/btaf540","DOIUrl":null,"url":null,"abstract":"<p><strong>Summary: </strong>Polygenic risk scores (PRSs) are essential tools for predicting individual phenotypic risk but often lack accuracy in non-European ancestry groups. Transfer Learning for Polygenic Risk Scores (TL-PRS) addresses this challenge by leveraging European PRSs to improve prediction in underrepresented ancestries but requires privacy-sensitive individual-level data and has low computational efficiency. Therefore, we introduce PTL-PRS (Pseudovalidated Transfer Learning for PRS), an extension of TL-PRS that incorporates pseudovalidation to eliminate the need for individual-level data and includes further software optimization. For pseudovalidation, PTL-PRS generates pseudo-summary statistics for training and validation and evaluates model performance with the pseudo-R2 metric. To improve computational efficiency, PTL-PRS software was optimized with C ++, blockwise early stopping, and direct genotype retrieval. Overall, PTL-PRS enhances usability while maintaining TL-PRS's predictive performance.</p><p><strong>Availability and implementation: </strong>The PTL.PRS R package is publicly available on GitHub at https://github.com/bokeumcho/PTL.PRS. The summary statistics used in this paper are available in the public domain: UK Biobank (https://pheweb.org/UKB-TOPMed), PGS Catalog (https://www.pgscatalog.org), COVID-19 Host Genetics Initiative (https://www.covid19hg.org) and GenOMICC (https://genomicc.org/data).</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4000,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PTL-PRS: an R package for transfer learning of polygenic risk scores with pseudovalidation.\",\"authors\":\"Bokeum Cho, Seunggeun Lee\",\"doi\":\"10.1093/bioinformatics/btaf540\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Summary: </strong>Polygenic risk scores (PRSs) are essential tools for predicting individual phenotypic risk but often lack accuracy in non-European ancestry groups. Transfer Learning for Polygenic Risk Scores (TL-PRS) addresses this challenge by leveraging European PRSs to improve prediction in underrepresented ancestries but requires privacy-sensitive individual-level data and has low computational efficiency. Therefore, we introduce PTL-PRS (Pseudovalidated Transfer Learning for PRS), an extension of TL-PRS that incorporates pseudovalidation to eliminate the need for individual-level data and includes further software optimization. For pseudovalidation, PTL-PRS generates pseudo-summary statistics for training and validation and evaluates model performance with the pseudo-R2 metric. To improve computational efficiency, PTL-PRS software was optimized with C ++, blockwise early stopping, and direct genotype retrieval. Overall, PTL-PRS enhances usability while maintaining TL-PRS's predictive performance.</p><p><strong>Availability and implementation: </strong>The PTL.PRS R package is publicly available on GitHub at https://github.com/bokeumcho/PTL.PRS. The summary statistics used in this paper are available in the public domain: UK Biobank (https://pheweb.org/UKB-TOPMed), PGS Catalog (https://www.pgscatalog.org), COVID-19 Host Genetics Initiative (https://www.covid19hg.org) and GenOMICC (https://genomicc.org/data).</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>\",\"PeriodicalId\":93899,\"journal\":{\"name\":\"Bioinformatics (Oxford, England)\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":5.4000,\"publicationDate\":\"2025-09-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics (Oxford, England)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/bioinformatics/btaf540\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics (Oxford, England)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioinformatics/btaf540","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
摘要:多基因风险评分(PRSs)是预测个体表型风险的重要工具,但在非欧洲血统群体中往往缺乏准确性。多基因风险评分迁移学习(TL-PRS)通过利用欧洲prs来改善对代表性不足的祖先的预测,解决了这一挑战,但需要隐私敏感的个人层面数据,并且计算效率较低。因此,我们引入了PTL-PRS (Pseudovalidated Transfer Learning for PRS),这是TL-PRS的扩展,它包含了伪验证以消除对个人层面数据的需求,并包括进一步的软件优化。对于伪验证,PTL-PRS生成用于训练和验证的伪汇总统计数据,并使用伪r2度量来评估模型性能。为了提高计算效率,我们对PTL-PRS软件进行了c++优化,采用块式早期停止和直接基因型检索。总的来说,PTL-PRS增强了可用性,同时保持了TL-PRS的预测性能。可用性和实现:PTL。PRS R包在GitHub上公开提供https://github.com/bokeumcho/PTL.PRS。本文中使用的汇总统计数据可在公共领域获得:UK Biobank (https://pheweb.org/UKB-TOPMed)、PGS Catalog (https://www.pgscatalog.org)、COVID-19宿主遗传学倡议(https://www.covid19hg.org)和GenOMICC (https://genomicc.org/data).Supplementary)。
PTL-PRS: an R package for transfer learning of polygenic risk scores with pseudovalidation.
Summary: Polygenic risk scores (PRSs) are essential tools for predicting individual phenotypic risk but often lack accuracy in non-European ancestry groups. Transfer Learning for Polygenic Risk Scores (TL-PRS) addresses this challenge by leveraging European PRSs to improve prediction in underrepresented ancestries but requires privacy-sensitive individual-level data and has low computational efficiency. Therefore, we introduce PTL-PRS (Pseudovalidated Transfer Learning for PRS), an extension of TL-PRS that incorporates pseudovalidation to eliminate the need for individual-level data and includes further software optimization. For pseudovalidation, PTL-PRS generates pseudo-summary statistics for training and validation and evaluates model performance with the pseudo-R2 metric. To improve computational efficiency, PTL-PRS software was optimized with C ++, blockwise early stopping, and direct genotype retrieval. Overall, PTL-PRS enhances usability while maintaining TL-PRS's predictive performance.
Availability and implementation: The PTL.PRS R package is publicly available on GitHub at https://github.com/bokeumcho/PTL.PRS. The summary statistics used in this paper are available in the public domain: UK Biobank (https://pheweb.org/UKB-TOPMed), PGS Catalog (https://www.pgscatalog.org), COVID-19 Host Genetics Initiative (https://www.covid19hg.org) and GenOMICC (https://genomicc.org/data).
Supplementary information: Supplementary data are available at Bioinformatics online.