Research Article: A comparison of regression methods based on dimensional reduction for genomic prediction

IF 0.6 Q4 GENETICS & HEREDITY
J. A. D. Costa, Carolina Azevedo, M. Nascimento, F. F. Silva, M. Resende, A. C. Nascimento
DOI: 10.4238/GMR18877 (https://doi.org/10.4238/GMR18877)
Published: 2021-01-01

Abstract

Fitting a multiple linear regression model often runs into multicollinearity and high dimensionality, which make it impossible to obtain stable estimates with the traditional ordinary least squares method. To overcome these problems, dimensionality reduction methods have been proposed; they are simple in theory and easy to apply. We compared three dimensionality reduction methods: Principal Components Regression (PCR), Partial Least Squares (PLS), and Independent Components Regression (ICR). An important step in dimensionality reduction and prediction is selecting the number of components, because it determines the linear combinations of the explanatory variables. These linear combinations are entered into the model to predict the response from a reduced number of parameters. We examined the criteria for selecting the number of components. The dimensionality reduction methods were applied to genomic and phenotypic data. We evaluated 370 accessions of Asian rice, Oryza sativa, genotyped for 36,901 SNP markers, which were used to predict genomic values for the trait number of panicles per plant. This data set presented both multicollinearity and high dimensionality. The computational time of each method was also recorded. Among the methods, PCR and ICR gave the highest accuracy values, with ICR standing out for giving the least biased estimates of genomic values. However, ICR required more computational time than the other methods.
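As a rough illustration of how the number of components enters a PCR fit, the sketch below builds the components by singular value decomposition of the centered marker matrix and picks the component count by k-fold cross-validation. Everything in it is an assumption made for illustration: the abstract does not state which selection criteria or software the authors used, the data are simulated 0/1/2 genotype codes rather than the rice data set, and the names `pcr_fit`, `cv_mse`, and `candidates` are hypothetical.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Fit principal components regression and return a predict function."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centered marker matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T              # (p, k) loadings
    comps = Xc @ V                       # (n, k) linear combinations of markers
    # Ordinary least squares on k components instead of p markers.
    gamma, *_ = np.linalg.lstsq(comps, y - y_mean, rcond=None)
    beta = V @ gamma                     # marker effects implied by the components
    return lambda X_new: y_mean + (X_new - x_mean) @ beta

def cv_mse(X, y, n_components, n_folds=5, seed=0):
    """Mean squared prediction error over k folds (assumed selection criterion)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        predict = pcr_fit(X[train_idx], y[train_idx], n_components)
        errors.append(np.mean((y[test_idx] - predict(X[test_idx])) ** 2))
    return float(np.mean(errors))

# Simulated stand-in data: more markers than individuals, as in genomic prediction.
rng = np.random.default_rng(42)
n, p = 120, 500
X = rng.integers(0, 3, size=(n, p)).astype(float)   # genotype codes 0/1/2
true_effects = np.zeros(p)
true_effects[:20] = rng.normal(size=20)
y = X @ true_effects + rng.normal(scale=1.0, size=n)

candidates = [1, 5, 10, 20, 40, 80]
cv_scores = {k: cv_mse(X, y, k) for k in candidates}
best_k = min(cv_scores, key=cv_scores.get)
print("selected number of components:", best_k)
```

The regression step estimates only `n_components` coefficients, which is what keeps the fit stable when the markers far outnumber the individuals. PLS and ICR would replace the SVD step with a different construction of the components; they are not shown here.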
Source journal
Genetics and Molecular Research (Biology: Biochemistry and Molecular Biology)
CiteScore: 1.00
Self-citation rate: 25.00%
Articles published: 7
Review time: 3 months
Journal description: Genetics and Molecular Research (GMR), maintained by the Research Foundation of Ribeirão Preto (Fundação de Pesquisas Científicas de Ribeirão Preto), publishes high quality research in genetics and molecular biology. GMR reflects the full breadth and interdisciplinary nature of this research by publishing outstanding original contributions in all areas of biology. GMR publishes human studies, as well as research on model organisms, from mice and flies to plants and bacteria. Our emphasis is on studies of broad interest that provide significant insight into a biological process or processes. Topics include, but are not limited to, gene discovery and function, population genetics, evolution, genome projects, comparative and functional genomics, molecular analysis of simple and complex genetic traits, cancer genetics, medical genetics, disease biology, agricultural genomics, developmental genetics, regulatory variation in gene expression, pharmacological genomics, gene expression, chromosome biology, and epigenetics.