BLUPmrMLM: A Fast mrMLM Algorithm in Genome-wide Association Studies

Genomics, Proteomics & Bioinformatics. Pub Date: 2024-02-29. DOI: 10.1093/gpbjnl/qzae020

Hong-Fu Li, Jing-Tian Wang, Qiong Zhao, Yuan-Ming Zhang



Abstract



Multilocus genome-wide association studies have become the state-of-the-art tool for dissecting the genetic architecture of complex and multiomic traits. However, most existing multilocus methods require relatively long computational time when analyzing large datasets. To address this issue, we propose a fast mrMLM method, namely the best linear unbiased prediction multilocus random-SNP-effect mixed linear model (BLUPmrMLM). First, the genome-wide single-marker scanning of mrMLM is replaced in BLUPmrMLM by vectorized Wald tests based on the best linear unbiased prediction (BLUP) values of marker effects and their variances. Then, adaptive best subset selection is used to identify potentially associated markers on each chromosome, which reduces the computational time needed to estimate marker effects via empirical Bayes. Finally, shared memory and parallel computing schemes are used to further reduce the computation time. In simulation studies, BLUPmrMLM outperformed GEMMA, EMMAX, mrMLM, FarmCPU, and the control method of BLUPmrMLM in terms of computational time, power, accuracy of estimated quantitative trait nucleotide positions and effects, false positive rate, false discovery rate, false negative rate, and F1 score. In a reanalysis of two large rice datasets, BLUPmrMLM significantly reduced the computation time and identified more previously reported genes than the above methods. This study provides an excellent multilocus method for the analysis of large-scale and multiomic datasets. The software mrMLM v5.1 is available at BioCode (https://ngdc.cncb.ac.cn/biocode/tools/BT007388) or GitHub (https://github.com/YuanmingZhang65/mrMLM).
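The abstract does not give the formulas behind the Wald tests, so the following is only a minimal sketch of the general idea, not the mrMLM v5.1 implementation. It assumes each marker already has an effect estimate (standing in for a BLUP value) and a variance for that estimate, and it applies the textbook Wald statistic W = b² / Var(b), compared against a chi-square distribution with one degree of freedom. The function name `wald_tests` and the input values are hypothetical, and plain Python lists are used in place of the array operations a vectorized implementation would rely on.

```python
import math


def wald_tests(effects, variances):
    """Return a (statistic, p_value) pair for each marker.

    effects   -- estimated marker effects (assumed here to be BLUP values)
    variances -- variance of each effect estimate (assumed given, must be > 0)
    """
    if len(effects) != len(variances):
        raise ValueError("effects and variances must have the same length")
    results = []
    for b, v in zip(effects, variances):
        if v <= 0:
            raise ValueError("variances must be positive")
        # Wald statistic; approximately chi-square(1) under the null b = 0.
        w = b * b / v
        # Upper tail of a chi-square with 1 df: P(X > w) = erfc(sqrt(w / 2)).
        p = math.erfc(math.sqrt(w / 2.0))
        results.append((w, p))
    return results


# Hypothetical inputs for three markers (not taken from the paper).
effects = [0.0, 0.196, 1.0]
variances = [0.01, 0.01, 0.01]
for w, p in wald_tests(effects, variances):
    print(f"W = {w:.3f}, p = {p:.3g}")
```

Because every marker is tested with the same closed-form expression, the loop can be replaced by a single array expression over all markers, which is presumably what makes a vectorized scan cheaper than refitting a model per marker.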
