Machine learning for classification of soybean populations for industrial technological variables based on agronomic traits

IF 1.7 | Zone 3, Agricultural and Forestry Sciences | Q2 AGRONOMY

Euphytica Pub Date : 2024-02-22 DOI:10.1007/s10681-024-03301-w

Larissa Pereira Ribeiro Teodoro, Maik Oliveira Silva, Regimar Garcia dos Santos, Júlia Ferreira de Alcântara, Paulo Carteri Coradi, Bárbara Biduski, Carlos Antonio da Silva Junior, Francisco Eduardo Torres, Paulo Eduardo Teodoro



Abstract

A current challenge of genetic breeding programs is to increase grain yield and protein content while at least maintaining oil content. However, evaluating industrial traits is time-consuming and costly. Accurate models that classify genotypes with better industrial technological performance from traits that are easier and faster to measure, such as agronomic ones, are therefore of great importance for soybean breeding programs. The objective was to use machine learning (ML) techniques to classify groups of soybean genotypes for industrial technological variables based on agronomic traits measured in the field. Field experiments were carried out at two sites in a randomized block design with two replications and 206 F₂ soybean populations. The agronomic traits evaluated were days to maturation (DM), first pod height (FPH), plant height (PH), number of branches (NB), main stem diameter (SD), mass of one hundred grains (MHG), and grain yield (GY). The industrial technological variables evaluated were oil yield, crude protein, crude fiber, and ash contents, determined by high-optical-accuracy near-infrared spectroscopy (NIRS). The models tested were support vector machine (SVM), artificial neural network (ANN), the decision tree models J48 and REPTree, random forest (RF), and logistic regression (LR, used as control). Genotypes were clustered using principal component analysis (PCA) and the k-means algorithm; the resulting clusters were used as the output variables of the ML models, and the agronomic traits as the input variables. ML techniques provided accurate models for classifying soybean genotypes for more complex (industrial technological) variables based on agronomic traits. RF outperformed the other models and can contribute to soybean breeding programs by classifying genotypes for industrial technological traits.
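The two-step workflow described in the abstract (cluster first, then classify) can be illustrated with a minimal Python sketch. It is not the authors' implementation: scikit-learn is our choice of library, the data are synthetic placeholders, and the number of principal components and clusters (both 2) are assumptions, since the abstract reports neither. We also assume that the clustering is run on the industrial technological variables. Only RF and the LR control are shown.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_populations = 206  # number of F2 populations reported in the abstract

# SYNTHETIC placeholder data, not the study's measurements.
# A hidden group effect links the two blocks so there is a signal to learn.
group = rng.integers(0, 2, size=n_populations)
# Columns stand in for DM, FPH, PH, NB, SD, MHG, GY.
agronomic = rng.normal(size=(n_populations, 7)) + 1.5 * group[:, None]
# Columns stand in for oil yield, crude protein, crude fiber, ash.
industrial = rng.normal(size=(n_populations, 4)) + 2.0 * group[:, None]

# Step 1: cluster populations on the industrial variables (PCA, then k-means).
# n_components=2 and n_clusters=2 are assumptions made for illustration.
scaled = StandardScaler().fit_transform(industrial)
scores = PCA(n_components=2).fit_transform(scaled)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)

# Step 2: use the cluster labels as the output variable and the agronomic
# traits as the input variables of the classifiers.
models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}
accuracy = {
    name: cross_val_score(model, agronomic, clusters, cv=5).mean()
    for name, model in models.items()
}
for name, acc in accuracy.items():
    print(f"{name}: mean 5-fold accuracy = {acc:.2f}")
```

Because the data are simulated, the printed accuracies say nothing about the study's results; the sketch only shows how cluster labels derived from hard-to-measure variables can serve as classification targets for easier-to-measure traits.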


Source journal

Euphytica (Agricultural and Forestry Sciences: Agronomy)

CiteScore: 3.80

Self-citation rate: 5.30%

Articles published: 157

Review time: 4.5 months

Journal description: Euphytica is an international journal on theoretical and applied aspects of plant breeding. It publishes critical reviews and papers on the results of original research related to plant breeding. The integration of modern and traditional plant breeding is a growing field of research using transgenic crop plants and/or marker-assisted breeding in combination with traditional breeding tools. The content should cover the interests of researchers directly or indirectly involved in plant breeding at universities, breeding institutes, seed industries, plant biotech companies, and industries using plant raw materials, and should promote stability, adaptability, and sustainability in agriculture and agro-industries.