Use of multivariate analysis and machine learning methods to characterize traits contributing to wheat yield diversity
A. Behpouri, S. Farokhzadeh, Z. Zinati, Zobeir Khosravi
Spanish Journal of Agricultural Research, published 2023-02-01
DOI: https://doi.org/10.5424/sjar/2023211-19835
Abstract
Aim of study: Wheat is the third largest staple food crop in the world, so determining the factors that affect its yield is of great importance. This study aimed to identify useful subsets of agronomic traits and to rank those traits by their importance to grain yield.
Area of study: Fars province, Iran.
Material and methods: Data on 22 agronomic traits were collected from 90 farms in six regions (Darab, Kavar, Marvdasht, Fasa, Lar, and Khonj) of Fars province, Iran, regarded as the most important wheat-growing regions. Multivariate statistical analyses (correlation, stepwise regression, and principal component analysis (PCA)) and machine learning models, namely partial least squares regression (PLSR) and support vector regression (SVR), were applied to the agronomic traits.
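As a rough illustration of the correlation step only, the sketch below ranks candidate traits by the absolute value of their Pearson correlation with yield. It is a minimal example under stated assumptions, not the authors' pipeline: the trait names, the synthetic data, and the helper functions `pearson` and `rank_traits` are all hypothetical, and the study's stepwise regression, PCA, PLSR, and SVR steps are not reproduced.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_traits(traits, target):
    """Return (name, r) pairs sorted by |r| with the target, strongest first."""
    scores = [(name, pearson(values, target)) for name, values in traits.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Synthetic stand-in data (NOT the study's measurements), one value per farm.
rng = random.Random(0)
n_farms = 90
spikes = [rng.gauss(450, 60) for _ in range(n_farms)]
grains = [rng.gauss(35, 5) for _ in range(n_farms)]
unrelated = [rng.gauss(0, 1) for _ in range(n_farms)]

# Hypothetical yield: driven by the first two traits plus random noise.
grain_yield = [
    0.01 * s + 0.1 * g + rng.gauss(0, 0.3) for s, g in zip(spikes, grains)
]

ranking = rank_traits(
    {
        "spikes_per_m2": spikes,
        "grains_per_spike": grains,
        "unrelated_trait": unrelated,
    },
    grain_yield,
)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

A univariate ranking like this ignores correlations among the traits themselves, which is one reason to combine it with stepwise regression and PCA, as the abstract describes.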
Main results: Integrating correlation, stepwise regression, and PCA showed that number of spikes m⁻², grain number spike⁻¹, and thousand-grain weight had the greatest impact on yield, followed by awn length, spike length, narrow-leaf herbicide, broadleaf herbicide, time to plant maturity (month), and soil salinity. In addition, PLSR with the nine selected traits as inputs showed better prediction capability (R² = 85%, RMSE = 0.32, MSE = 0.10, and BIAS = -0.05) than PLSR with all 22 input traits.
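The four reported metrics can be computed from observed and predicted yields as sketched below. The abstract does not give the formulas, so the definitions here are assumptions: R² is taken as the coefficient of determination (1 - SS_res/SS_tot) and BIAS as the mean of predicted minus observed. The function name `regression_metrics` and the sample values are hypothetical. Note that the reported RMSE and MSE agree with each other, since 0.32² ≈ 0.10.

```python
import math


def regression_metrics(observed, predicted):
    """Compute MSE, RMSE, bias, and R^2 for paired observed/predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]

    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # Assumed definition: mean signed error (predicted minus observed).
    bias = sum(errors) / n

    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum(e * e for e in errors)
    # Assumed definition: coefficient of determination.
    r2 = 1.0 - ss_res / ss_tot

    return {"mse": mse, "rmse": rmse, "bias": bias, "r2": r2}


# Made-up yields for illustration only (not data from the study).
observed = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 4.5, 6.5, 6.5]

metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

A negative bias under this convention means the model underpredicts on average, which is how the reported BIAS of -0.05 would be read if the authors used the same sign convention.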
Research highlights: Integrating multivariate statistical analyses with machine learning regression methods could be a powerful tool for identifying traits that have a significant impact on yield. These findings can inform future breeding programs.
About the journal:
The Spanish Journal of Agricultural Research (SJAR) is a quarterly international journal that accepts research articles, reviews, and short communications on topics related to agriculture. Research articles and short communications must report original work not previously published in any language and not under consideration for publication elsewhere.
The main aim of SJAR is to publish papers that report research findings on the following topics: agricultural economics; agricultural engineering; agricultural environment and ecology; animal breeding, genetics and reproduction; animal health and welfare; animal production; plant breeding, genetics and genetic resources; plant physiology; plant production (field and horticultural crops); plant protection; soil science; and water management.