Gene regulatory network prediction using machine learning, deep learning, and hybrid approaches.

Impact factor (IF): 5.0

Forestry research. Pub Date: 2025-07-30. eCollection Date: 2025-01-01. DOI: 10.48130/forres-0025-0014

Sai Teja Mummadi, Md Khairul Islam, Victor Busov, Hairong Wei

Forestry research, volume 5, e014. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12441907/pdf/

Citations: 0

Abstract

Construction of gene regulatory networks (GRNs) is essential for elucidating the regulatory mechanisms underlying metabolic pathways, biological processes, and complex traits. In this study, we developed and evaluated machine learning, deep learning, and hybrid approaches for constructing GRNs by integrating prior knowledge and large-scale transcriptomic data from Arabidopsis thaliana, poplar, and maize. Among these, hybrid models that combined convolutional neural networks and machine learning consistently outperformed traditional machine learning and statistical methods, achieving over 95% accuracy on the holdout test datasets. These models not only identified a greater number of known transcription factors regulating the lignin biosynthesis pathway but also demonstrated higher precision in ranking key master regulators such as MYB46 and MYB83, as well as many upstream regulators, including members of the VND, NST, and SND families, at the top of candidate lists. To address the challenge of limited training data in non-model species, we implemented transfer learning, enabling cross-species GRN inference by applying models trained on well-characterized and data-rich species to another species with limited data. This strategy enhanced model performance and demonstrated the feasibility of knowledge transfer across species. Overall, our findings underscore the effectiveness of hybrid and transfer learning approaches in GRN prediction, offering a scalable framework for elucidating regulatory mechanisms in both model and non-model plant systems.
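The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of a hybrid model: a convolution-style feature extractor whose output feeds a classical classifier, which then scores and ranks candidate regulators. Everything in it is an assumption made for illustration: the expression data are synthetic, the filters are hard-coded rather than learned by a trained CNN, logistic regression stands in for the machine learning stage, and names such as `conv_features`, `TF_true`, and `TF_b` are hypothetical. It is not the authors' pipeline.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic example is reproducible

# Hypothetical fixed filters over the two channels (TF, target) at each sample.
# A trained CNN would learn such filters; they are hard-coded for illustration.
FILTERS = [(1.0, -1.0), (-1.0, 1.0), (1.0, 1.0), (-1.0, -1.0)]


def conv_features(tf_expr, target_expr):
    """Pointwise convolution over both channels, ReLU, global average pooling."""
    feats = []
    for w_tf, w_tg in FILTERS:
        acts = [max(0.0, w_tf * a + w_tg * b) for a, b in zip(tf_expr, target_expr)]
        feats.append(sum(acts) / len(acts))
    return feats


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=200):
    """Classical classifier stage: logistic regression fitted by SGD."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(model, feats):
    """Predicted probability that a (TF, target) pair is regulatory."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, feats)) + b)


def make_pair(regulatory, n_samples=50):
    """Synthetic expression profiles: a regulated target tracks its TF."""
    tf_expr = [random.gauss(0, 1) for _ in range(n_samples)]
    if regulatory:
        target = [v + random.gauss(0, 0.3) for v in tf_expr]
    else:
        target = [random.gauss(0, 1) for _ in range(n_samples)]
    return tf_expr, target


def make_dataset(n_pairs):
    """Balanced set of regulatory (1) and non-regulatory (0) pairs."""
    X, y = [], []
    for i in range(n_pairs):
        label = i % 2
        X.append(conv_features(*make_pair(bool(label))))
        y.append(label)
    return X, y


X_train, y_train = make_dataset(200)
X_test, y_test = make_dataset(100)  # holdout pairs never seen in training
model = train_logistic(X_train, y_train)
accuracy = sum(
    (predict(model, x) >= 0.5) == bool(t) for x, t in zip(X_test, y_test)
) / len(y_test)

# Rank hypothetical candidate TFs for one target by predicted probability.
tf_true, target = make_pair(True)
candidates = {"TF_true": tf_true}
for name in ("TF_b", "TF_c", "TF_d"):
    candidates[name] = [random.gauss(0, 1) for _ in range(len(target))]
ranking = sorted(
    candidates,
    key=lambda n: predict(model, conv_features(candidates[n], target)),
    reverse=True,
)
print(f"holdout accuracy: {accuracy:.2f}")
print("ranked candidates:", ranking)
```

The split into a feature extractor and a separate classifier is the point of the sketch: in principle the extractor could be trained on a data-rich species and reused for a species with little data, which is the intuition behind the transfer learning described in the abstract. How the authors actually did this is not stated here.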

