Fast3VmrMLM: A fast algorithm that integrates genome-wide scanning with machine learning to accelerate gene mining and breeding by design for polygenic traits in large-scale GWAS datasets.

IF 9.4 | CAS Zone 1 (Biology) | JCR Q1, BIOCHEMISTRY & MOLECULAR BIOLOGY
Jingtian Wang, Ying Chen, Guoping Shu, Miaomiao Zhao, Ao Zheng, Xiaoyu Chang, Guiqi Li, Yibo Wang, Yuan-Ming Zhang
Plant Communications, article 101385. Published 2025-05-22. DOI: 10.1016/j.xplc.2025.101385
Citations: 0

Abstract

Genetic dissection and breeding by design for polygenic traits remain challenging. Meeting these challenges requires identifying both as many genes as possible and the key genes among them. Here, we developed a genome-wide scanning plus machine learning framework and integrated it with advanced computational techniques to propose a novel algorithm, Fast3VmrMLM, for mining more genes, including key genes, for polygenic traits in the era of big data and artificial intelligence. The algorithm was also extended to identify haplotype (Fast3VmrMLM-Hap) and molecular (Fast3VmrMLM-mQTL) variants. In simulation studies, Fast3VmrMLM outperformed existing methods in detecting dominant, small and rare variants, and it took 3.30 and 5.43 hours (20 threads) to analyze the 18K rice and UK Biobank-scale datasets, respectively. In the 18K rice dataset, Fast3VmrMLM identified more known genes (211) and candidate genes (384) for 14 traits than FarmCPU (100 known genes); in a maize NC II design, it identified 26 known and 24 candidate genes for 7 yield-related traits; and Fast3VmrMLM-mQTL identified two known soybean genes around structural variants. We demonstrated that the new two-step framework outperformed genome-wide scanning alone. In breeding by design, a genetic network constructed by machine learning from all known and candidate genes in this study identified 21 key genes for rice yield-related traits, and all associated markers yielded high prediction accuracies in rice (0.7443) and maize (0.8492) as well as excellent hybrid combinations. A new breeding-by-design strategy based on the identified key genes was also proposed. This study provides an excellent method for gene mining and breeding by design.
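The general shape of a two-step "scan, then refine" pipeline can be illustrated with a toy sketch. It is not Fast3VmrMLM: the first step here is a plain single-marker regression with a normal-approximation p-value and no kinship or population-structure correction, and the second step is a lasso fitted by coordinate descent, standing in for the paper's machine-learning step. The function names (`simulate`, `scan`, `lasso_cd`, `two_step`), the simulated data, the lenient screening threshold `p_thresh=0.01` and the penalty `lam=0.15` are all hypothetical choices made for illustration.

```python
import math

import numpy as np


def simulate(n=500, m=1000, causal=(10, 200, 750), effects=(1.0, -1.0, 0.8), seed=1):
    """Hypothetical toy data: additive 0/1/2 genotypes, three causal markers, unit-variance noise."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.2, 0.5, size=m)                   # illustrative allele frequencies
    X = rng.binomial(2, maf, size=(n, m)).astype(float)   # n individuals x m markers
    y = X[:, list(causal)] @ np.array(effects) + rng.normal(0.0, 1.0, size=n)
    return X, y


def scan(X, y):
    """Step 1 (assumed, simplified): regress y on each marker separately.

    Returns two-sided p-values from a normal approximation to the t statistic.
    No kinship or population-structure correction is applied.
    """
    n, m = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sxx = (Xc ** 2).sum(axis=0)
    ok = sxx > 0                                          # skip monomorphic markers (p = 1)
    t = np.zeros(m)
    beta = Xc[:, ok].T @ yc / sxx[ok]                     # simple-regression slopes
    rss = yc @ yc - beta ** 2 * sxx[ok]                   # residual SS of each simple regression
    t[ok] = beta / np.sqrt(rss / (n - 2) / sxx[ok])       # slope / standard error
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])


def lasso_cd(X, y, lam, n_iter=200):
    """Step 2 stand-in: lasso by cyclic coordinate descent on standardized columns.

    Minimizes (1/2n)||y - Xs b||^2 + lam * ||b||_1, where Xs has mean 0 and
    (1/n) * sum(x^2) = 1 per column; the intercept is handled by centering y.
    """
    n, k = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    r = y - y.mean()                                      # residual for b = 0
    b = np.zeros(k)
    for _ in range(n_iter):
        for j in range(k):
            rho = Xs[:, j] @ r / n + b[j]                 # partial-residual correlation
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) # soft-thresholding
            r += Xs[:, j] * (b[j] - new)                  # keep r = y - Xs b up to date
            b[j] = new
    return b


def two_step(X, y, p_thresh=0.01, lam=0.15):
    """Lenient scan, then a lasso on the surviving markers; returns selected marker indices."""
    candidates = np.flatnonzero(scan(X, y) < p_thresh)
    if candidates.size == 0:
        return candidates
    b = lasso_cd(X[:, candidates], y, lam)
    return candidates[b != 0.0]


if __name__ == "__main__":
    X, y = simulate()
    print("selected markers:", two_step(X, y))
```

The lenient first-step threshold keeps recall high and leaves the multi-marker second step to prune false positives from the screen. A real GWAS method would also model kinship and population structure, which this sketch deliberately ignores.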

Source journal: Plant Communications (Agricultural and Biological Sciences-Plant Science)
CiteScore: 15.70
Self-citation rate: 5.70%
Articles published: 105
Review time: 6 weeks
About the journal: Plant Communications is an open access publishing platform that supports the global plant science community. It publishes original research, review articles, technical advances, and research resources in various areas of plant sciences. The scope of topics includes evolution, ecology, physiology, biochemistry, development, reproduction, metabolism, molecular and cellular biology, genetics, genomics, environmental interactions, biotechnology, breeding of higher and lower plants, and their interactions with other organisms. The goal of Plant Communications is to provide a high-quality platform for the dissemination of plant science research.