Machine Learning-Driven Identification of Key Environmental Factors Influencing Fiber Yield and Quality Traits in Upland Cotton
Authors: Mohamadou Souaibou, Haoliang Yan, Panhong Dai, Jingtao Pan, Yang Li, Yuzhen Shi, Wankui Gong, Haihong Shang, Juwu Gong, Youlu Yuan
Journal: Plants-Basel, vol. 14, no. 13 (IF 4.0; JCR Q1, Plant Sciences)
Published: 2025-07-04
Publication type: Journal Article
DOI: 10.3390/plants14132053 (https://doi.org/10.3390/plants14132053)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12252131/pdf/
Citations: 0
Abstract
Understanding how environmental factors influence cotton performance is crucial for improving yield and fiber quality in the context of climate change. This study investigates genotype-by-environment (G×E) interactions in cotton using data from 250 recombinant inbred lines (CCRI70 RILs) grown across 14 diverse environments in China's major cotton-growing areas. Environmental effects dominated the yield-related traits (boll weight, lint percentage, and seed index), accounting for 34.7-55.7% of their variance. In contrast, fiber quality traits showed lower environmental sensitivity (12.3-27.0%). Notable phenotypic plasticity was observed in boll weight, lint percentage, and fiber micronaire. Of the six machine learning models evaluated, Random Forest showed the best predictive ability (R² = 0.40-0.72; predictive Pearson correlation = 0.63-0.86). SHAP-based interpretation combined with sliding-window regression identified key environmental drivers, most of them active during mid-to-late growth stages. This approach reduced the influential input variables to just 0.1-2.4% of the original dataset, spanning 2-9 critical time windows per trait. Incorporating the identified drivers significantly improved cross-environment predictions, raising Random Forest accuracy by 0.02-0.15. These results underscore the potential of machine learning to uncover the critical temporal environmental factors underlying G×E interactions and to improve predictive modeling in cotton breeding programs, ultimately contributing to more resilient and productive cotton cultivation.
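The abstract does not spell out how the sliding-window search was run, so the following is only a minimal sketch of the general idea, under stated assumptions: each environment has a daily series for one environmental variable, each environment has one mean trait value, and a window is scored by the Pearson correlation between its per-environment mean and the trait. The function names (`pearson`, `best_window`), the window-length bounds, and the synthetic data are all hypothetical and are not taken from the study.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no signal
    return sxy / sqrt(sxx * syy)


def best_window(daily_by_env, trait_by_env, min_len=5, max_len=30):
    """Scan every (start, length) window and return the (start, length, r)
    whose per-environment window mean has the largest |r| with the trait.

    daily_by_env: one list of daily values per environment (equal lengths)
    trait_by_env: environment mean of the trait, in the same order
    """
    n_days = len(daily_by_env[0])
    best = None
    for length in range(min_len, max_len + 1):
        for start in range(n_days - length + 1):
            means = [sum(s[start:start + length]) / length for s in daily_by_env]
            r = pearson(means, trait_by_env)
            if best is None or abs(r) > abs(best[2]):
                best = (start, length, r)
    return best


# Synthetic illustration (assumed data, not from the paper): 14 environments,
# 150 days of a temperature-like variable, with the trait driven only by
# the mean over days 60-69.
rng = random.Random(0)
n_env, n_days = 14, 150
daily = [
    [rng.gauss(25, 3) + (0.5 * e if 60 <= d < 70 else 0.0) for d in range(n_days)]
    for e in range(n_env)
]
trait = [2.0 + 0.1 * sum(s[60:70]) / 10 for s in daily]

start, length, r = best_window(daily, trait)
print(f"best window: days {start}-{start + length - 1}, r = {r:.3f}")
```

In a real analysis the search would be repeated for each environmental variable and trait. Because many windows are tested, the top-scoring ones would need validation on held-out environments before being treated as drivers.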
Plants-Basel | Agricultural and Biological Sciences: Ecology, Evolution, Behavior and Systematics
CiteScore: 6.50
Self-citation rate: 11.10%
Articles published: 2923
Review time: 15.4 days
About the journal:
Plants (ISSN 2223-7747) is an international, multidisciplinary, open access scientific journal that covers all key areas of plant science. It publishes review articles, regular research articles, communications, and short notes in the fields of structural, functional, and experimental botany. In addition to fundamental disciplines such as the morphology, systematics, physiology, and ecology of plants, the journal welcomes all types of articles in applied plant science.