HaploCatcher: An R package for prediction of haplotypes.

IF 3.9 | Zone 2, Biology | Q1, GENETICS & HEREDITY
Plant Genome | Pub Date: 2024-03-01 | Epub Date: 2023-11-15 | DOI: 10.1002/tpg2.20412
Zachary James Winn, Emily Hudson-Arns, Mikayla Hammers, Noah DeWitt, Jeanette Lyerly, Guihua Bai, Paul St Amand, Punya Nachappa, Scott Haley, Richard Esten Mason
Citations: 0

Abstract

Wheat (Triticum aestivum L.) is crucial to global food security but is often threatened by diseases, pests, and environmental stresses. Wheat-stem sawfly (Cephus cinctus Norton) poses a major threat to food security in the United States, and solid-stem varieties, which carry the stem-solidness locus (Sst1), are the main source of genetic resistance against sawfly. Marker-assisted selection uses molecular markers to identify lines possessing beneficial haplotypes, like that of the Sst1 locus. In this study, an R package titled "HaploCatcher" was developed to predict specific haplotypes of interest in genome-wide genotyped lines. A training population of 1056 lines genotyped for the Sst1 locus, known to confer stem solidness, and genome-wide markers was curated to make predictions of the Sst1 haplotypes for 292 lines from the Colorado State University wheat breeding program. Predicted Sst1 haplotypes were compared to marker-derived haplotypes. Our results indicated that the training set was substantially predictive, with kappa scores of 0.83 for k-nearest neighbors and 0.88 for random forest models. Forward validation on newly developed breeding lines demonstrated that a random forest model, trained on the total available training data, had comparable accuracy between forward and cross-validation. Estimated group means of lines classified by haplotypes from PCR-derived markers and predictive modeling did not significantly differ. The HaploCatcher package is freely available and may be utilized by breeding programs, using their own training populations, to predict haplotypes for whole-genome sequenced early generation material.
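
The kappa scores reported above measure how well predicted haplotype calls agree with marker-derived calls after correcting for agreement expected by chance. The following is a minimal sketch of Cohen's kappa in plain Python. It is not code from the HaploCatcher package, and the haplotype labels ("solid", "hollow", "het") and the toy call lists are hypothetical values chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(observed, predicted):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(observed)
    # Observed agreement: fraction of lines where the two calls match.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Chance agreement: product of the marginal class counts, summed over classes.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both sequences contain one and the same class throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy data: marker-derived calls versus model-predicted calls.
marker_calls = ["solid", "solid", "hollow", "hollow", "het", "solid", "hollow", "hollow"]
model_calls = ["solid", "solid", "hollow", "hollow", "solid", "solid", "hollow", "het"]

kappa = cohen_kappa(marker_calls, model_calls)
print(f"kappa = {kappa:.3f}")
```

Kappa is more informative than raw accuracy when haplotype classes are unbalanced, because a model that always predicts the most common class can score high accuracy while its kappa stays near zero.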

Source journal: Plant Genome (PLANT SCIENCES; GENETICS & HEREDITY)
CiteScore: 6.00
Self-citation rate: 4.80%
Articles published: 93
Review time: >12 weeks
About the journal: The Plant Genome publishes original research investigating all aspects of plant genomics. Technical breakthroughs reporting improvements in the efficiency and speed of acquiring and interpreting plant genomics data are welcome. The editorial board gives preference to novel reports that use innovative genomic applications to advance our understanding of plant biology and that may have applications to crop improvement. The journal also publishes invited review articles and perspectives that offer insight and commentary on recent advances in genomics and their potential for agronomic improvement.