Uncovering waterlogging-responsive genes in cucumber through machine learning and differential gene correlation analysis.

IF 3.4 | Zone 3 (Biology) | Q1 Agricultural and Biological Sciences
Zahra Zinati, Leyla Nazari, Ali Niazi
Journal: Botanical Studies
Published: 2024-08-14
DOI: 10.1186/s40529-024-00433-z (https://doi.org/10.1186/s40529-024-00433-z)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11324642/pdf/
Citations: 0

Abstract


As climate change intensifies, the frequency and severity of waterlogging are expected to increase, necessitating a deeper understanding of the cucumber response to this stress. In this study, three public RNA-seq datasets (PRJNA799460, PRJNA844418, and PRJNA678740) comprising 36 samples were analyzed. Several feature selection algorithms with different characteristics, namely Uncertainty, Relief, SVM (Support Vector Machine), Correlation, and logistic least absolute shrinkage and selection operator (LASSO), were applied to reduce the complexity of the data and thereby identify the most significant genes related to the waterlogging stress response. Uncertainty, Relief, SVM, Correlation, and LASSO identified 4, 4, 10, 21, and 13 genes, respectively. Differential gene correlation analysis (DGCA) focusing on the 36 selected genes identified changes in correlation patterns among these genes under waterlogged versus control conditions, providing deeper insight into their regulatory networks and interactions. DGCA revealed significant changes in the correlation of 13 genes between control and waterlogging conditions. Finally, we validated 13 genes using a Random Forest (RF) classifier, which achieved 100% accuracy and an area under the curve (AUC) of 1.0. The SHapley Additive exPlanations (SHAP) values showed that LOC101209599, LOC101217277, and LOC101216320 had a significant impact on the model's predictive power. In addition, we employed Boruta as a wrapper feature selection method to further validate our gene selection strategy. Eight of the 13 genes were common across the four feature weighting algorithms, LASSO, DGCA, and Boruta, underscoring the robustness and reliability of our gene selection strategy. Notably, LOC101209599, LOC101217277, and LOC101216320 were among the genes identified by multiple feature selection methods from different categories (filter, wrapper, and embedded). Pathways associated with these genes play a pivotal role in regulating stress tolerance, root development, nutrient absorption, sugar metabolism, gene expression, protein degradation, and calcium signaling. These intricate regulatory mechanisms are crucial for cucumbers to adapt effectively to waterlogging conditions. The findings provide valuable insights for uncovering targets in breeding new cucumber varieties with enhanced stress tolerance.
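
To make the differential correlation step more concrete, the sketch below compares the Pearson correlation of a single gene pair between control and waterlogged samples using a Fisher z-transformation. It illustrates the general idea behind differential correlation testing and is not the authors' pipeline or the DGCA package itself; the function names and the expression values are hypothetical and chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def differential_correlation(x_ctrl, y_ctrl, x_trt, y_trt):
    """Test whether the correlation of one gene pair differs between conditions.

    Returns (r_control, r_treatment, z_score, two_sided_p_value).
    Each condition needs more than 3 samples for the standard error to exist.
    """
    r_ctrl = pearson(x_ctrl, y_ctrl)
    r_trt = pearson(x_trt, y_trt)

    # Clip so that atanh stays finite when |r| is exactly 1.
    def clip(r):
        return max(min(r, 0.999999), -0.999999)

    # Fisher z-transformation of each correlation.
    z_ctrl = math.atanh(clip(r_ctrl))
    z_trt = math.atanh(clip(r_trt))

    # Standard error of the difference between two independent Fisher z values.
    se = math.sqrt(1.0 / (len(x_ctrl) - 3) + 1.0 / (len(x_trt) - 3))
    z_score = (z_trt - z_ctrl) / se

    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    return r_ctrl, r_trt, z_score, p_value


# Hypothetical expression values for two genes (not taken from the study).
gene_a_ctrl = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b_ctrl = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1]   # rises with gene A in control
gene_a_wl = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b_wl = [6.2, 4.9, 4.1, 3.2, 1.8, 1.1]     # falls with gene A under waterlogging

r_ctrl, r_wl, z, p = differential_correlation(
    gene_a_ctrl, gene_b_ctrl, gene_a_wl, gene_b_wl
)
print(f"r(control)={r_ctrl:.3f}  r(waterlogged)={r_wl:.3f}  z={z:.2f}  p={p:.3g}")
```

In a real analysis this test would be run over every pair among the selected genes, followed by a multiple-testing correction; both steps are omitted here to keep the sketch short.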

Source journal
Botanical Studies (Biology: Plant Sciences)
CiteScore: 5.50
Self-citation rate: 2.90%
Articles published: 32
Review time: 2.4 months
About the journal: Botanical Studies is an open access journal that encompasses all aspects of botany, including but not limited to taxonomy, morphology, development, genetics, evolution, reproduction, systematics, and biodiversity of all plant groups, algae, and fungi. The journal is affiliated with the Institute of Plant and Microbial Biology, Academia Sinica, Taiwan.