Benchmarking Parametric and Machine Learning Models for Genomic Prediction of Complex Traits

Christina B Azodi, Emily Bolger, Andrew McCarren, Mark Roantree, Gustavo de Los Campos, Shin-Han Shiu
DOI: 10.1534/g3.119.400498 (https://doi.org/10.1534/g3.119.400498)
Published: 2019-11-05. Pages 3691-3702. Journal Article.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6829122/pdf/

Abstract

The usefulness of genomic prediction in crop and livestock breeding programs has prompted efforts to develop new and improved genomic prediction algorithms, such as artificial neural networks and gradient tree boosting. However, the performance of these algorithms has not been compared systematically across a wide range of datasets and models. Using data for 18 traits across six plant species with different marker densities and training population sizes, we compared the performance of six linear and six non-linear algorithms. First, we found that hyperparameter selection was necessary for all non-linear algorithms and that feature selection prior to model training was critical for artificial neural networks when the number of markers greatly exceeded the number of training lines. Across all species and trait combinations, no single algorithm performed best; however, predictions based on a combination of results from multiple algorithms (i.e., ensemble predictions) performed consistently well. While linear and non-linear algorithms performed best for a similar number of traits, the performance of non-linear algorithms varied more between traits. Although artificial neural networks did not perform best for any trait, we identified strategies (i.e., feature selection, seeded starting weights) that boosted their performance to near the level of other algorithms. Our results highlight the importance of algorithm selection for the prediction of trait values.
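
The idea of an ensemble prediction mentioned above can be illustrated with a minimal sketch. The abstract does not say how the results of multiple algorithms were combined, so the unweighted mean used here is an assumption, as are the function names (`ensemble_mean`, `pearson_r`), the model labels, and the toy trait values, which are invented purely for illustration and are not data from the study.

```python
from statistics import mean


def ensemble_mean(predictions_by_model):
    """Combine per-line predictions from several models by an unweighted mean.

    Assumption: simple averaging; the abstract does not specify the
    combination rule actually used.
    """
    columns = list(predictions_by_model.values())
    n_lines = len(columns[0])
    if any(len(col) != n_lines for col in columns):
        raise ValueError("all models must predict the same set of lines")
    # zip(*columns) groups the predictions for each line across models.
    return [mean(values) for values in zip(*columns)]


def pearson_r(x, y):
    """Pearson correlation between observed and predicted trait values."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Toy, made-up trait values for four lines (hypothetical, not from the paper).
observed = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "linear_model": [1.5, 1.5, 3.5, 3.5],
    "nonlinear_model": [0.5, 2.5, 2.5, 4.5],
}

ensemble = ensemble_mean(predictions)
r_linear = pearson_r(observed, predictions["linear_model"])
r_nonlinear = pearson_r(observed, predictions["nonlinear_model"])
r_ensemble = pearson_r(observed, ensemble)
```

In this contrived example the two models err in opposite directions, so their errors cancel on averaging and the ensemble correlates more strongly with the observed values than either model alone. Real predictions will not cancel this cleanly; the sketch only shows the mechanics of combining and scoring predictions.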
