Exploiting Genomic Features to Improve the Prediction of Transcription Factor-Binding Sites in Plants.

Quentin Rivière, Massimiliano Corso, Madalina Ciortan, Grégoire Noël, Nathalie Verbruggen, Matthieu Defrance
Plant & Cell Physiology, pp. 1457-1473. Published 2022-10-31. DOI: 10.1093/pcp/pcac095 (https://doi.org/10.1093/pcp/pcac095)
Citations: 3

Abstract

The identification of transcription factor (TF) target genes is central to biology. A popular approach locates potential cis-regulatory elements (CREs) by pattern matching. In recent years, tools integrating next-generation sequencing data have been developed to improve the performance of pattern matching. However, such tools have not yet been comprehensively evaluated in plants. Hence, we developed a new streamlined method for predicting CREs and target genes of plant TFs in specific organs or conditions. Our approach implements a supervised machine learning strategy, which allows decision rule models to be learnt using TF ChIP-chip/seq experimental data. Different layers of genomic features were integrated in predictive models: the position on the gene, the DNA sequence conservation, the chromatin state and various CRE footprints. Among the tested features, the chromatin features were crucial for improving the accuracy of the method. Furthermore, we evaluated the transferability of predictive models across TFs, organs and species. Finally, we validated our method by correctly inferring the target genes of key TFs controlling metabolite biosynthesis at the organ level in Arabidopsis. We developed a tool, Wimtrap, to reproduce our approach in plant species and conditions/organs for which ChIP-chip/seq data are available. Wimtrap is a user-friendly R package that supports an R Shiny web interface and ships with pre-built models that can be used to quickly get predictions of CREs and TF gene targets in different organs or conditions in Arabidopsis thaliana, Solanum lycopersicum, Oryza sativa and Zea mays.
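
The general idea of pattern matching followed by filtering on a genomic feature can be illustrated with a minimal Python sketch. This is not Wimtrap's implementation or API: the motif counts, the sequence, the score threshold, the open-chromatin intervals and all function names are hypothetical, and the hand-written chromatin filter merely stands in for the decision rule models that the abstract describes as learnt from ChIP-chip/seq data. Only the forward strand is scanned.

```python
import math

# Hypothetical position frequency matrix (counts per base at each motif position).
# Illustrative only; not taken from the paper or from any motif database.
PFM = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 0],
    "T": [1, 1, 0, 8],
}


def build_pwm(pfm, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds scores against a uniform background."""
    width = len(pfm["A"])
    pwm = []
    for i in range(width):
        total = sum(pfm[base][i] for base in "ACGT") + 4 * pseudocount
        pwm.append({
            base: math.log2((pfm[base][i] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for each forward-strand window scoring >= threshold."""
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


def inside_open_chromatin(start, width, open_regions):
    """True if the site [start, start + width) lies entirely inside an open region."""
    end = start + width
    return any(lo <= start and end <= hi for lo, hi in open_regions)


SEQUENCE = "TTACGTGGACGTAA"   # hypothetical promoter fragment
OPEN_REGIONS = [(0, 7)]       # hypothetical half-open intervals of accessible chromatin

pwm = build_pwm(PFM)
hits = scan(SEQUENCE, pwm, threshold=4.0)
# Hand-written filter standing in for a model learnt from ChIP-chip/seq data.
kept = [h for h in hits if inside_open_chromatin(h[0], len(pwm), OPEN_REGIONS)]

print("motif hits:", [(start, window) for start, window, _ in hits])
print("hits in open chromatin:", [(start, window) for start, window, _ in kept])
```

In a supervised setting, the fixed filter would be replaced by a classifier trained on candidate sites labelled by overlap with ChIP-chip/seq peaks, with the motif score, gene position, conservation and chromatin state as input features.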
