Machine learning for classification of soybean populations for industrial technological variables based on agronomic traits
Larissa Pereira Ribeiro Teodoro, Maik Oliveira Silva, Regimar Garcia dos Santos, Júlia Ferreira de Alcântara, Paulo Carteri Coradi, Bárbara Biduski, Carlos Antonio da Silva Junior, Francisco Eduardo Torres, Paulo Eduardo Teodoro
Euphytica, published 2024-02-22. DOI: 10.1007/s10681-024-03301-w (https://doi.org/10.1007/s10681-024-03301-w)
Abstract
A current challenge of genetic breeding programs is to increase grain yield and protein content while at least maintaining oil content. However, evaluating industrial traits is time-consuming and costly. Accurate models that classify genotypes with better industrial technological performance from traits that are easier and faster to measure, such as agronomic ones, are therefore of paramount importance for soybean breeding programs. The objective was to classify groups of soybean genotypes for industrial technological variables based on agronomic traits measured in the field, using machine learning (ML) techniques. Field experiments were carried out at two sites in a randomized block design with two replications and 206 F2 soybean populations. The agronomic traits evaluated were days to maturation (DM), first pod height (FPH), plant height (PH), number of branches (NB), main stem diameter (SD), mass of one hundred grains (MHG), and grain yield (GY). The industrial technological variables evaluated were oil yield, crude protein, crude fiber, and ash contents, determined by high-optical-accuracy near-infrared spectroscopy (NIRS). The models tested were support vector machine (SVM), artificial neural network (ANN), the decision tree models J48 and REPTree, random forest (RF), and logistic regression (LR, used as control). Genotypes were clustered using PCA and the k-means algorithm; the resulting clusters were then used as the output variables of the ML models, with the agronomic traits as input variables. ML techniques provided accurate models for classifying soybean genotypes for more complex (industrial technological) variables based on agronomic traits. RF outperformed the other models and can contribute to soybean breeding programs by classifying genotypes for industrial technological traits.
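The two-stage workflow described in the abstract (cluster genotypes on the industrial variables, then predict cluster membership from the agronomic traits) can be sketched roughly as follows. This is a minimal, standard-library illustration and not the authors' implementation: the data are synthetic, the number of clusters (k = 2), the 70/30 hold-out split, and all function names are assumptions, the PCA step is omitted for brevity, and logistic regression (the study's control model) stands in for random forest. The abstract does not state the software, cluster count, or validation scheme used.

```python
import math
import random

def standardize(rows):
    """Scale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def kmeans(rows, k, iters=50, seed=0):
    """Lloyd's k-means; returns one cluster label per row."""
    centers = random.Random(seed).sample(rows, k)
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(r, centers[j]))
                  for r in rows]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

def predict_proba(w, b, xi):
    """Probability that sample xi belongs to cluster 1."""
    z = b + sum(wj * xj for wj, xj in zip(w, xi))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Binary logistic regression trained by stochastic gradient ascent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, b, xi)
            w = [wj + lr * err * xj for wj, xj in zip(w, xi)]
            b += lr * err
    return w, b

# Synthetic stand-in data (NOT from the paper): 206 populations,
# 7 agronomic traits and 4 industrial variables sharing a hidden grouping.
rng = random.Random(42)
n = 206
group = [i % 2 for i in range(n)]  # hypothetical latent structure
agronomic = [[rng.gauss(2.0 * g, 1.0) for _ in range(7)] for g in group]
industrial = [[rng.gauss(4.0 * g, 1.0) for _ in range(4)] for g in group]

# Stage 1: clusters on the industrial variables become the class labels.
labels = kmeans(standardize(industrial), k=2)

# Stage 2: predict the cluster from agronomic traits (assumed 70/30 split;
# traits are standardized over all rows for brevity).
X = standardize(agronomic)
idx = list(range(n))
rng.shuffle(idx)
cut = int(0.7 * n)
train, test = idx[:cut], idx[cut:]
w, b = fit_logistic([X[i] for i in train], [labels[i] for i in train])
hits = sum((predict_proba(w, b, X[i]) >= 0.5) == (labels[i] == 1)
           for i in test)
accuracy = hits / len(test)
print(f"hold-out accuracy: {accuracy:.2f}")
```

In practice the second stage would be repeated for each candidate model (SVM, ANN, J48, REPTree, RF, LR) on the same labels so that their accuracies can be compared; the sketch shows only the data flow between the two stages.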
About the journal:
Euphytica is an international journal on theoretical and applied aspects of plant breeding. It publishes critical reviews and papers on the results of original research related to plant breeding.
The integration of modern and traditional plant breeding is a growing field of research using transgenic crop plants and/or marker-assisted breeding in combination with traditional breeding tools. The content should cover the interests of researchers directly or indirectly involved in plant breeding at universities, breeding institutes, seed industries, plant biotech companies, and industries using plant raw materials, and should promote stability, adaptability, and sustainability in agriculture and agro-industries.