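Machine learning for classification of soybean populations for industrial technological variables based on agronomic traits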

IF 1.6 | Zone 3, Agricultural and Forestry Sciences | Q2, Agronomy
Larissa Pereira Ribeiro Teodoro, Maik Oliveira Silva, Regimar Garcia dos Santos, Júlia Ferreira de Alcântara, Paulo Carteri Coradi, Bárbara Biduski, Carlos Antonio da Silva Junior, Francisco Eduardo Torres, Paulo Eduardo Teodoro
Journal: Euphytica. Published: 2024-02-22. DOI: 10.1007/s10681-024-03301-w (https://doi.org/10.1007/s10681-024-03301-w)
Citations: 0

Abstract

A current challenge for genetic breeding programs is to increase grain yield and protein content while at least maintaining oil content. However, evaluating industrial traits is time-consuming and costly. Accurate models that classify genotypes by industrial technological performance from traits that are easier and faster to measure, such as agronomic ones, are therefore of great importance to soybean breeding programs. The objective was to classify groups of soybean genotypes for industrial technological variables based on agronomic traits measured in the field, using machine learning (ML) techniques. Field experiments were carried out at two sites in a randomized block design with two replications and 206 F2 soybean populations. The agronomic traits evaluated were days to maturation (DM), first pod height (FPH), plant height (PH), number of branches (NB), main stem diameter (SD), mass of one hundred grains (MHG), and grain yield (GY). The industrial technological variables evaluated were oil yield and crude protein, crude fiber, and ash contents, determined by high-optical-accuracy near-infrared spectroscopy (NIRS). The models tested were support vector machine (SVM), artificial neural network (ANN), the decision tree models J48 and REPTree, random forest (RF), and logistic regression (LR, used as the control). Genotypes were clustered using PCA and the k-means algorithm; the resulting clusters were used as the output variables of the ML models, and the agronomic traits as the input variables. The ML techniques provided accurate models for classifying soybean genotypes for more complex (industrial technological) variables based on agronomic traits. RF outperformed the other models and can contribute to soybean breeding programs by classifying genotypes for industrial technological traits.
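The two-stage workflow in the abstract (cluster the genotypes, then predict cluster membership from agronomic traits) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the software used, so scikit-learn is swapped in here, with `RandomForestClassifier` and `LogisticRegression` standing in for RF and the LR control. The data are synthetic, and the number of clusters, the number of principal components, the choice to cluster on the industrial variables, the 5-fold cross-validation, and all hyperparameters are assumptions made for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_genotypes = 206  # number of F2 populations reported in the abstract

# Synthetic stand-ins, NOT the study's data.
# Agronomic columns: DM, FPH, PH, NB, SD, MHG, GY
agronomic = rng.normal(size=(n_genotypes, 7))
# Industrial columns: oil, crude protein, crude fiber, ash.
# They are made to depend loosely on the agronomic traits so there is signal to learn.
mixing = rng.normal(size=(7, 4))
industrial = agronomic @ mixing + 0.5 * rng.normal(size=(n_genotypes, 4))

# Stage 1: group genotypes (standardize -> PCA -> k-means).
# Assumption: clustering uses the industrial variables, 2 components, 3 clusters.
n_clusters = 3
scaled = StandardScaler().fit_transform(industrial)
scores = PCA(n_components=2).fit_transform(scaled)
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(scores)

# Stage 2: cluster labels are the outputs, agronomic traits are the inputs.
models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}
accuracy = {
    name: cross_val_score(model, agronomic, labels, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, acc in accuracy.items():
    print(f"{name}: mean 5-fold accuracy = {acc:.2f}")
```

Because the data are synthetic, the printed accuracies say nothing about the study's results; the sketch only shows how unsupervised cluster labels can become the target of a supervised classifier.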

Source journal: Euphytica (Agricultural and Forestry Sciences, Agronomy)
CiteScore: 3.80
Self-citation rate: 5.30%
Articles published: 157
Review time: 4.5 months
About the journal: Euphytica is an international journal on theoretical and applied aspects of plant breeding. It publishes critical reviews and papers on the results of original research related to plant breeding. The integration of modern and traditional plant breeding is a growing field of research using transgenic crop plants and/or marker assisted breeding in combination with traditional breeding tools. The content should cover the interests of researchers directly or indirectly involved in plant breeding, at universities, breeding institutes, seed industries, plant biotech companies and industries using plant raw materials, and promote stability, adaptability and sustainability in agriculture and agro-industries.