STREAM-PRS: a multi-tool pipeline for streamlining polygenic risk score computation.

IF 10.4 | Region 1, Biology | JCR Q1, GENETICS & HEREDITY
Sara Becelaere, Yasmina Abakkouy, Deborah Sarah Jans, Margaux David, Séverine Vermeire, Isabelle Cleynen
Genome Medicine 17(1):119. Published 2025-10-09. DOI: https://doi.org/10.1186/s13073-025-01539-0. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12512491/pdf/
Citations: 0

Abstract


STREAM-PRS: a multi-tool pipeline for streamlining polygenic risk score computation.


Background: Polygenic risk scores (PRS) offer an elegant approach to estimating an individual's genetic predisposition to a given disease or trait. Numerous tools are available for PRS calculation, each applying different strategies to account for linkage disequilibrium and effect size shrinkage. No single tool is inherently superior. Therefore, multiple tools should be tested to identify the one that best suits the research question. Additionally, challenges such as population stratification and PRS portability further complicate the field. Here, we developed STREAM-PRS, a PRS pipeline designed to calculate scores using five popular tools: PRSice-2, PRS-CS, LDpred2, lassosum, and lassosum2.

Methods: STREAM-PRS first computes scores under various settings in a training dataset. The selected variants are subsequently used for score calculation in the test dataset, followed by principal component (PC) correction and standardization to improve portability across different centers. Finally, the pipeline determines the best PRS tool and settings based on the variance explained (R²) in the test dataset. To demonstrate this PRS pipeline, we applied it to an in-house inflammatory bowel disease (IBD) cohort consisting of 3192 IBD cases and 822 controls. In total, 472 scores were created using the 1000 Genomes non-Finnish European subpopulation as training data and applied to UK Biobank data as the test dataset.
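
The correct-standardize-select steps described in the Methods can be illustrated with a minimal Python sketch. It is not the STREAM-PRS implementation: the function names, the toy data, and the candidate labels are hypothetical, a single covariate stands in for the PCs, and the squared Pearson correlation is used as a simple stand-in for the R² measure (the abstract does not say which R² definition the pipeline uses for a binary trait).

```python
from statistics import mean, pstdev


def residualize(score, covariate):
    """Remove the linear effect of one covariate (e.g. one PC) from a score."""
    mx, my = mean(covariate), mean(score)
    sxx = sum((x - mx) ** 2 for x in covariate)
    slope = sum((x - mx) * (y - my) for x, y in zip(covariate, score)) / sxx
    return [y - my - slope * (x - mx) for x, y in zip(covariate, score)]


def standardize(values):
    """Rescale to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def r_squared(score, phenotype):
    """Squared Pearson correlation (assumed stand-in for variance explained)."""
    ms, mp = mean(score), mean(phenotype)
    cov = sum((s - ms) * (p - mp) for s, p in zip(score, phenotype))
    var_s = sum((s - ms) ** 2 for s in score)
    var_p = sum((p - mp) ** 2 for p in phenotype)
    return cov * cov / (var_s * var_p)


def select_best(candidates, pcs, phenotype):
    """Correct each candidate score for the PCs, standardize it, rank by R²."""
    results = {}
    for name, score in candidates.items():
        corrected = score
        # Sequential residualization matches a joint regression only when the
        # PCs are uncorrelated with each other in this sample (assumption).
        for pc in pcs:
            corrected = residualize(corrected, pc)
        results[name] = r_squared(standardize(corrected), phenotype)
    best = max(results, key=results.get)
    return best, results


# Toy test-set data (hypothetical): 3 controls followed by 3 cases.
phenotype = [0, 0, 0, 1, 1, 1]
pcs = [[-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]]
candidates = {
    "tool_a_setting_1": [0.1, 0.4, 0.2, 0.9, 0.8, 1.1],
    "tool_b_setting_1": [0.5, 0.1, 0.9, 0.3, 0.7, 0.4],
}

best, results = select_best(candidates, pcs, phenotype)
print(best, results)
```

In this toy setup each dictionary entry plays the role of one tool-and-setting combination; the pipeline described above would rank 472 such entries in the same way.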

Results: Using STREAM-PRS for 472 scores across the five PRS tools, with 404 individuals in the training dataset and 1000 individuals in the test dataset, takes approximately 20 h to complete. For IBD, lassosum was identified as the best-performing tool with optimal settings as follows: a shrinkage value of 0.7 and a lambda value of 0.008859. Applying this optimized PRS to our in-house IBD dataset (validation) resulted in an R² of 0.203 and an AUC of 0.75. Further, the PRS showed a high positive predictive value of 0.905 but a low negative predictive value of 0.341. This suggests that the PRS is effective in identifying individuals at high risk but might be less reliable in excluding lower-risk individuals.
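
Positive and negative predictive values depend on the case fraction of the evaluated cohort as well as on the score itself. The sketch below applies Bayes' rule to show that dependence. The case and control counts come from the abstract; the sensitivity and specificity are hypothetical placeholders that the abstract does not report, so the output is illustrative and is not a reproduction of the published values.

```python
def predictive_values(sensitivity, specificity, case_fraction):
    """Return (PPV, NPV) for a binary high-risk call via Bayes' rule."""
    true_pos = sensitivity * case_fraction
    false_pos = (1 - specificity) * (1 - case_fraction)
    true_neg = specificity * (1 - case_fraction)
    false_neg = (1 - sensitivity) * case_fraction
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Case fraction of the validation cohort, from the counts in the abstract.
cases, controls = 3192, 822
case_fraction = cases / (cases + controls)

# Hypothetical operating point, NOT taken from the paper.
sensitivity, specificity = 0.7, 0.7

ppv_cohort, npv_cohort = predictive_values(sensitivity, specificity, case_fraction)
ppv_balanced, npv_balanced = predictive_values(sensitivity, specificity, 0.5)

print(f"case fraction: {case_fraction:.3f}")
print(f"case-enriched cohort: PPV={ppv_cohort:.3f}, NPV={npv_cohort:.3f}")
print(f"balanced cohort:      PPV={ppv_balanced:.3f}, NPV={npv_balanced:.3f}")
```

With the same hypothetical classifier, a case-enriched cohort yields a higher PPV and a lower NPV than a balanced one, which is one factor to keep in mind when reading the predictive values above.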

Conclusions: Overall, STREAM-PRS provides an efficient framework for selecting the best PRS calculation strategy and helps bridge the portability gap within the PRS field. STREAM-PRS is available at https://github.com/SaraBecelaere/STREAM-PRS.

Source journal
Genome Medicine (GENETICS & HEREDITY)
CiteScore: 20.80
Self-citation rate: 0.80%
Articles published: 128
Review time: 6-12 weeks
Journal description: Genome Medicine is an open access journal that publishes outstanding research applying genetics, genomics, and multi-omics to understand, diagnose, and treat disease. Bridging basic science and clinical research, it covers areas such as cancer genomics, immuno-oncology, immunogenomics, infectious disease, microbiome, neurogenomics, systems medicine, clinical genomics, gene therapies, precision medicine, and clinical trials. The journal publishes original research, methods, software, and reviews to serve authors and promote broad interest and importance in the field.