Machine Learning-Driven Identification of Key Environmental Factors Influencing Fiber Yield and Quality Traits in Upland Cotton

Impact factor: 4.0; Zone 2 (Biology); JCR Q1 (Plant Sciences)
Mohamadou Souaibou, Haoliang Yan, Panhong Dai, Jingtao Pan, Yang Li, Yuzhen Shi, Wankui Gong, Haihong Shang, Juwu Gong, Youlu Yuan
Journal: Plants-Basel, volume 14, issue 13. Published: 2025-07-04. Publication type: Journal Article.
DOI: 10.3390/plants14132053 (https://doi.org/10.3390/plants14132053)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12252131/pdf/
Citations: 0

Abstract

Understanding how environmental factors influence cotton performance is crucial for improving yield and fiber quality under climate change. This study investigates genotype-by-environment (G×E) interactions in cotton using data from 250 recombinant inbred lines (CCRI70 RILs) grown across 14 diverse environments in China's major cotton-growing areas. Environmental effects predominantly influenced the yield-related traits (boll weight, lint percentage, and seed index), accounting for 34.7% to 55.7% of their variance. In contrast, fiber quality traits showed lower environmental sensitivity (12.3-27.0%). Notable phenotypic plasticity was observed in boll weight, lint percentage, and fiber micronaire. Of the six machine learning models compared, Random Forest showed the best predictive ability (R² = 0.40-0.72; predictive Pearson correlation = 0.63-0.86). SHAP-based interpretation and sliding-window regression identified key environmental drivers that were primarily active during the mid-to-late growth stages. This approach reduced the number of influential input variables to just 0.1-2.4% of the original dataset, spanning 2-9 critical time windows per trait. Incorporating the identified drivers significantly improved cross-environment predictions, raising Random Forest accuracy by 0.02-0.15. These results underscore the strong potential of machine learning to uncover the critical temporal environmental factors underlying G×E interactions and to substantially improve predictive modeling in cotton breeding programs, ultimately contributing to more resilient and productive cotton cultivation.
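The abstract does not give the details of the sliding-window step, so the following is only a minimal sketch of the general idea, not the authors' pipeline. It assumes one daily environmental covariate per environment and one environment-mean trait value per environment, slides a fixed-width window over the season, and correlates each window's mean with the trait (for a single predictor, the Pearson r is the signed square root of the R² from simple linear regression). The function names, the window width, and the synthetic numbers are all hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no signal
    return sxy / sqrt(sxx * syy)


def sliding_window_scan(
    daily: Sequence[Sequence[float]],
    trait_means: Sequence[float],
    width: int,
) -> List[Tuple[int, float]]:
    """Correlate each window's mean covariate with the trait.

    daily[e][d] is the covariate in environment e on day d.
    Returns one (start_day, r) pair per window position.
    """
    n_days = len(daily[0])
    results = []
    for start in range(n_days - width + 1):
        window_means = [sum(env[start:start + width]) / width for env in daily]
        results.append((start, pearson(window_means, trait_means)))
    return results


def best_window(scan: Sequence[Tuple[int, float]]) -> Tuple[int, float]:
    """Pick the window with the largest absolute correlation."""
    return max(scan, key=lambda item: abs(item[1]))


# Synthetic example (made-up values): 5 environments, 10 days of a
# temperature-like covariate.
daily = [
    [20, 22, 21, 23, 25, 24, 30, 31, 32, 26],
    [24, 20, 23, 21, 22, 25, 26, 27, 25, 22],
    [21, 25, 20, 24, 23, 22, 33, 34, 35, 21],
    [23, 21, 24, 20, 21, 23, 28, 29, 27, 25],
    [22, 23, 22, 22, 24, 21, 24, 23, 25, 23],
]
# By construction, the trait depends only on the mean of days 6-8.
trait_means = [2.0 + 0.1 * sum(env[6:9]) / 3 for env in daily]

scan = sliding_window_scan(daily, trait_means, width=3)
start, r = best_window(scan)
print(f"most informative window starts on day {start} (r = {r:.3f})")
```

Because the synthetic trait is built from days 6-8, the scan recovers the window starting on day 6. With real data the same scan would be repeated per covariate, per trait, and over several window widths, and the candidate windows would need validation in held-out environments before being used as model inputs.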

Source journal
Plants-Basel (Agricultural and Biological Sciences: Ecology, Evolution, Behavior and Systematics)
CiteScore: 6.50
Self-citation rate: 11.10%
Articles published: 2923
Review time: 15.4 days
About the journal: Plants (ISSN 2223-7747) is an international, multidisciplinary, open access scientific journal that covers all key areas of plant science. It publishes review articles, regular research articles, communications, and short notes in the fields of structural, functional, and experimental botany. In addition to fundamental disciplines such as plant morphology, systematics, physiology, and ecology, the journal welcomes all types of articles in applied plant science.