GATA TF Class Classifier: AI-based functional prediction and taxonomic profiling in angiosperm GATA transcription factors

IF 1.9 | Zone 4, Biology | JCR Q2, BIOLOGY
Mangi Kim
Biosystems, vol. 257, Article 105589. Published 2025-09-09. DOI: 10.1016/j.biosystems.2025.105589
https://www.sciencedirect.com/science/article/pii/S0303264725001996
Citations: 0

GATA TF Class Classifier: AI-based functional prediction and taxonomic profiling in angiosperm GATA transcription factors
GATA transcription factors (TFs) are key regulators of diverse physiological and developmental processes in angiosperms. They are traditionally divided into four functional classes (A-D) on the basis of phylogenetic relationships, but tree-based approaches are labor-intensive, which limits large-scale classification across plant genomes. To overcome this limitation, this study presents the GATA TF Class Classifier, a scalable sequence-based tool for genome-wide functional classification of GATA TFs across angiosperm species. The model was trained on 700 curated full-length sequences from 23 species, encoded with ProtBERT, reduced via principal component analysis (PCA) with six additional features, and classified into functional classes using a support vector machine (SVM). In repeated stratified 5-fold cross-validation, the model achieved an average accuracy of 94.29%, with balanced performance across all classes. When applied to 4170 GATA TFs from 121 angiosperm genomes, the classifier showed that classes A and B were relatively abundant, whereas classes C and D were less represented, implying that each class may perform distinct biological functions. In addition, this study performed a taxonomic analysis of the predicted GATA TF classes to investigate their characteristics across major angiosperm lineages. Taken together, the classifier facilitates large-scale annotation and offers insights into the lineage-specific diversification and functional evolution of GATA TFs in angiosperms.
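The validation protocol named in the abstract, repeated stratified 5-fold cross-validation, can be illustrated with a dependency-free sketch. This is not the author's code: the per-class counts, the number of repeats, and the function name `repeated_stratified_kfold` are assumptions made for illustration, since the abstract reports only the total of 700 sequences and the four classes A-D. In practice a library routine such as scikit-learn's `RepeatedStratifiedKFold` would normally be used instead.

```python
import random
from collections import Counter, defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) with class ratios preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)  # a fresh shuffle per repeat gives new folds
            # Deal shuffled members round-robin so every fold receives an
            # (almost) equal share of each class.
            for pos, idx in enumerate(members):
                folds[pos % n_splits].append(idx)
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            yield repeat, fold, train_idx, sorted(test_idx)


# Hypothetical class sizes summing to the 700 training sequences; the
# abstract does not report the real per-class counts.
ASSUMED_COUNTS = {"A": 250, "B": 250, "C": 100, "D": 100}
labels = [cls for cls, n in ASSUMED_COUNTS.items() for _ in range(n)]

# n_repeats=3 is an assumption; the abstract does not state the repeat count.
splits = list(repeated_stratified_kfold(labels, n_splits=5, n_repeats=3))
for repeat, fold, train_idx, test_idx in splits[:5]:
    test_counts = Counter(labels[i] for i in test_idx)
    print(repeat, fold, len(train_idx), len(test_idx),
          dict(sorted(test_counts.items())))
```

Each test fold keeps the same class proportions as the full set, so a rare class such as C or D is never missing from a fold. A reported average accuracy is then the mean of the per-fold accuracies over all repeats.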
Source journal: Biosystems (Biology)
CiteScore: 3.70
Self-citation rate: 18.80%
Articles published: 129
Review time: 34 days
About the journal: BioSystems encourages experimental, computational, and theoretical articles that link biology, evolutionary thinking, and the information processing sciences. The link areas form a circle that encompasses the fundamental nature of biological information processing, computational modeling of complex biological systems, evolutionary models of computation, the application of biological principles to the design of novel computing systems, and the use of biomolecular materials to synthesize artificial systems that capture essential principles of natural biological information processing.