Efficient blockLASSO for polygenic scores with applications to All of Us and UK Biobank

IF 3.5 | CAS Zone 2, Biology | JCR Q2, BIOTECHNOLOGY & APPLIED MICROBIOLOGY
Timothy G Raben, Louis Lello, Erik Widen, Stephen D H Hsu
BMC Genomics 26(1): 302. Published 2025-03-27.
DOI: 10.1186/s12864-025-11505-0 (https://doi.org/10.1186/s12864-025-11505-0)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11948729/pdf/
Citations: 0

Abstract

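We develop a "block" LASSO (blockLASSO) approach for training polygenic scores (PGS) and demonstrate its use in All of Us (AoU) and the UK Biobank (UKB). blockLASSO exploits the approximately block-diagonal structure of linkage disequilibrium (LD), which arises from the partition of the genome into chromosomes. The new implementation is intended for exploratory and methods research, where PGS must be trained repeatedly and each training run is expensive. Across 11 phenotypes, two biobanks, and five ancestry groups (African, American, East Asian, European, and South Asian), we show that blockLASSO is generally as effective for training PGS as a (global) LASSO. Previous work has shown that penalized regression methods produce PGS that are competitive with alternative approaches, and that some phenotypes are more polygenic than others. Using sparse algorithms, an accurate PGS for type 1 diabetes (T1D) can be trained with ~100 single nucleotide variants (SNVs), whereas a PGS for body mass index (BMI) needs more than 10k SNVs. blockLASSO produces similar PGS while each block's training run uses only a fraction of the variants. Within AoU (using only genetic information), the block PGS reaches an AUC of 0.63 ± 0.02 for T1D and a correlation of 0.21 ± 0.01 for BMI, whereas a global LASSO reaches an AUC of 0.65 ± 0.03 for T1D and a correlation of 0.19 ± 0.03 for BMI. The block approach is more computationally efficient and scalable than naive global machine learning approaches, which makes it well suited to exploratory methods investigations based on penalized regression.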
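The abstract describes blockLASSO only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes standardized genotype columns, a centered phenotype, one shared penalty `lam` for every block, and that each block (e.g. one chromosome) is regressed independently on the same phenotype, with the block scores summed into the PGS. All names (`lasso_cd`, `block_lasso`, `pgs`, `auc`) are hypothetical. The motivation for splitting by block: if the columns of distinct blocks are exactly uncorrelated in the sample (X_kᵀX_l = 0 for k ≠ l), then ||y − Σ_k X_k b_k||² = ||y||² + Σ_k (||X_k b_k||² − 2yᵀX_k b_k), so the global LASSO objective equals the sum of the per-block objectives up to a constant, and the block fits reproduce the global fit. With the only approximately block-diagonal LD of real genotypes, this is an approximation.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """LASSO via cyclic coordinate descent, minimizing
    (1 / (2n)) * ||y - X @ b||^2 + lam * ||b||_1  (no intercept).
    Assumes the columns of X are standardized and y is centered."""
    n, p = X.shape
    b = np.zeros(p)
    r = np.array(y, dtype=float)        # residual y - X @ b (b starts at 0)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) * ||x_j||^2 per column
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                # all-zero column: leave b_j = 0
            # (1/n) * x_j . (partial residual that excludes coordinate j)
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                r -= delta * X[:, j]    # keep the residual in sync with b
                b[j] = new_bj
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return b


def block_lasso(X, y, blocks, lam):
    """Hypothetical blockLASSO sketch: fit an independent LASSO on each
    block of columns (e.g. one chromosome) against the same phenotype y
    and place the block coefficients side by side."""
    beta = np.zeros(X.shape[1])
    for idx in blocks:
        beta[idx] = lasso_cd(X[:, idx], y, lam)
    return beta


def pgs(X, beta):
    """Polygenic score: weighted genotype sum, i.e. the sum of block scores."""
    return X @ beta


def auc(scores, labels):
    """AUC as the Mann-Whitney probability that a random case scores
    higher than a random control (ties count one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 500, 40
    X = rng.standard_normal((n, p))
    X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize columns
    true_b = np.zeros(p)
    true_b[[3, 25]] = [0.5, -0.4]                    # one causal column per block
    y = X @ true_b + 0.5 * rng.standard_normal(n)
    y -= y.mean()
    blocks = [np.arange(0, 20), np.arange(20, 40)]   # two toy "chromosomes"
    b_block = block_lasso(X, y, blocks, lam=0.05)
    b_global = lasso_cd(X, y, lam=0.05)
    print("max |block - global| gap:", np.abs(b_block - b_global).max())
    print("corr(PGS, y):", np.corrcoef(pgs(X, b_block), y)[0, 1])
```

Running the script on simulated data prints the largest coefficient gap between the block and global fits and the in-sample correlation of the block PGS with the phenotype. Each `lasso_cd` call on a block touches only that block's columns, which is where a per-block approach saves memory and allows blocks to be fit in parallel; choosing a separate penalty per block (for example by cross-validation) would be a natural variation of this sketch.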

Source journal: BMC Genomics (Biology: Biotechnology & Applied Microbiology)
CiteScore: 7.40
Self-citation rate: 4.50%
Articles published: 769
Review time: 6.4 months
About the journal: BMC Genomics is an open access, peer-reviewed journal that considers articles on all aspects of genome-scale analysis, functional genomics, and proteomics. BMC Genomics is part of the BMC series, which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that the work presents some advance in knowledge.