Zachary James Winn, Emily Hudson-Arns, Mikayla Hammers, Noah DeWitt, Jeanette Lyerly, Guihua Bai, Paul St Amand, Punya Nachappa, Scott Haley, Richard Esten Mason
{"title":"HaploCatcher:一个预测单倍型的R包。","authors":"Zachary James Winn, Emily Hudson-Arns, Mikayla Hammers, Noah DeWitt, Jeanette Lyerly, Guihua Bai, Paul St Amand, Punya Nachappa, Scott Haley, Richard Esten Mason","doi":"10.1002/tpg2.20412","DOIUrl":null,"url":null,"abstract":"<p><p>Wheat (Triticum aestivum L.) is crucial to global food security but is often threatened by diseases, pests, and environmental stresses. Wheat-stem sawfly (Cephus cinctus Norton) poses a major threat to food security in the United States, and solid-stem varieties, which carry the stem-solidness locus (Sst1), are the main source of genetic resistance against sawfly. Marker-assisted selection uses molecular markers to identify lines possessing beneficial haplotypes, like that of the Sst1 locus. In this study, an R package titled \"HaploCatcher\" was developed to predict specific haplotypes of interest in genome-wide genotyped lines. A training population of 1056 lines genotyped for the Sst1 locus, known to confer stem solidness, and genome-wide markers was curated to make predictions of the Sst1 haplotypes for 292 lines from the Colorado State University wheat breeding program. Predicted Sst1 haplotypes were compared to marker-derived haplotypes. Our results indicated that the training set was substantially predictive, with kappa scores of 0.83 for k-nearest neighbors and 0.88 for random forest models. Forward validation on newly developed breeding lines demonstrated that a random forest model, trained on the total available training data, had comparable accuracy between forward and cross-validation. Estimated group means of lines classified by haplotypes from PCR-derived markers and predictive modeling did not significantly differ. The HaploCatcher package is freely available and may be utilized by breeding programs, using their own training populations, to predict haplotypes for whole-genome sequenced early generation material.</p>","PeriodicalId":49002,"journal":{"name":"Plant Genome","volume":null,"pages":null},"PeriodicalIF":3.9000,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"HaploCatcher: An R package for prediction of haplotypes.\",\"authors\":\"Zachary James Winn, Emily Hudson-Arns, Mikayla Hammers, Noah DeWitt, Jeanette Lyerly, Guihua Bai, Paul St Amand, Punya Nachappa, Scott Haley, Richard Esten Mason\",\"doi\":\"10.1002/tpg2.20412\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Wheat (Triticum aestivum L.) is crucial to global food security but is often threatened by diseases, pests, and environmental stresses. Wheat-stem sawfly (Cephus cinctus Norton) poses a major threat to food security in the United States, and solid-stem varieties, which carry the stem-solidness locus (Sst1), are the main source of genetic resistance against sawfly. Marker-assisted selection uses molecular markers to identify lines possessing beneficial haplotypes, like that of the Sst1 locus. In this study, an R package titled \\\"HaploCatcher\\\" was developed to predict specific haplotypes of interest in genome-wide genotyped lines. A training population of 1056 lines genotyped for the Sst1 locus, known to confer stem solidness, and genome-wide markers was curated to make predictions of the Sst1 haplotypes for 292 lines from the Colorado State University wheat breeding program. Predicted Sst1 haplotypes were compared to marker-derived haplotypes. Our results indicated that the training set was substantially predictive, with kappa scores of 0.83 for k-nearest neighbors and 0.88 for random forest models. Forward validation on newly developed breeding lines demonstrated that a random forest model, trained on the total available training data, had comparable accuracy between forward and cross-validation. Estimated group means of lines classified by haplotypes from PCR-derived markers and predictive modeling did not significantly differ. The HaploCatcher package is freely available and may be utilized by breeding programs, using their own training populations, to predict haplotypes for whole-genome sequenced early generation material.</p>\",\"PeriodicalId\":49002,\"journal\":{\"name\":\"Plant Genome\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2024-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Plant Genome\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1002/tpg2.20412\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2023/11/15 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"GENETICS & HEREDITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Plant Genome","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1002/tpg2.20412","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/11/15 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"GENETICS & HEREDITY","Score":null,"Total":0}
HaploCatcher: An R package for prediction of haplotypes.
Wheat (Triticum aestivum L.) is crucial to global food security but is often threatened by diseases, pests, and environmental stresses. Wheat-stem sawfly (Cephus cinctus Norton) poses a major threat to food security in the United States, and solid-stem varieties, which carry the stem-solidness locus (Sst1), are the main source of genetic resistance against sawfly. Marker-assisted selection uses molecular markers to identify lines possessing beneficial haplotypes, like that of the Sst1 locus. In this study, an R package titled "HaploCatcher" was developed to predict specific haplotypes of interest in genome-wide genotyped lines. A training population of 1056 lines genotyped for the Sst1 locus, known to confer stem solidness, and genome-wide markers was curated to make predictions of the Sst1 haplotypes for 292 lines from the Colorado State University wheat breeding program. Predicted Sst1 haplotypes were compared to marker-derived haplotypes. Our results indicated that the training set was substantially predictive, with kappa scores of 0.83 for k-nearest neighbors and 0.88 for random forest models. Forward validation on newly developed breeding lines demonstrated that a random forest model, trained on the total available training data, had comparable accuracy between forward and cross-validation. Estimated group means of lines classified by haplotypes from PCR-derived markers and predictive modeling did not significantly differ. The HaploCatcher package is freely available and may be utilized by breeding programs, using their own training populations, to predict haplotypes for whole-genome sequenced early generation material.
期刊介绍:
The Plant Genome publishes original research investigating all aspects of plant genomics. Technical breakthroughs reporting improvements in the efficiency and speed of acquiring and interpreting plant genomics data are welcome. The editorial board gives preference to novel reports that use innovative genomic applications that advance our understanding of plant biology that may have applications to crop improvement. The journal also publishes invited review articles and perspectives that offer insight and commentary on recent advances in genomics and their potential for agronomic improvement.