Expanding kinetoplastid genome annotation through protein structure comparison

J.M. Trinidad-Barnech, J.R. José Sotelo-Silveira, D. Fernandez Do Porto, P. Smircich

bioRxiv preprint, published 2024-08-09. DOI: 10.1101/2024.08.07.607044

Abstract
Kinetoplastids belong to the supergroup Discobids, an early-diverging eukaryotic clade. Although the amount of genomic information on these parasites has grown substantially, assigning gene functions with traditional sequence-based homology methods remains challenging. Recently, major advances have been made both in in silico protein structure prediction and in algorithms for rapid, precise, large-scale protein structure comparison. In this work, we developed a protein structure-based homology search pipeline (ASC, Annotation by Structural Comparisons) and applied it to annotate all kinetoplastid proteins available in TriTrypDB. Our pipeline assigned functional annotations to 23,000 hypothetical proteins across all 35 kinetoplastid species in the database. Among these, we identified ubiquitous eukaryotic proteins that had not previously been detected in kinetoplastid genomes. The resulting annotations (KASC, Kinetoplastid Annotation by Structural Comparison) are openly available to the community at kasc.fcien.edu.uy.

Author Summary

Kinetoplastids are a group of parasites that cause severe diseases in the poorest regions of the world. Despite the growing amount of genomic information available for these parasites, predicting the function of many of their genes with traditional methods has been difficult. Recently, there have been major advances in predicting protein structures and comparing them on a large scale. In this study, we created a new method called ASC (Annotation by Structural Comparisons) to find functions for all the kinetoplastid genes listed in the TriTrypDB database. Our strategy assigned functions to 23,000 kinetoplastid proteins. Among these, we discovered important proteins found in all eukaryotes that had not previously been identified in kinetoplastids. This information (KASC, Kinetoplastid Annotation by Structural Comparison) is freely available at kasc.fcien.edu.uy.
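The abstract describes ASC only at a high level. The following is a minimal sketch of the general idea behind structure-based annotation transfer. It is not the authors' implementation. It assumes that a structural similarity search has already produced hits, each carrying an E-value and a TM-score. For each hypothetical query protein, it copies the description of the best qualifying annotated hit. The `StructuralHit` record, the `transfer_annotations` function, the identifiers, and the thresholds are all hypothetical. The default thresholds are E-value ≤ 1e-3 and TM-score ≥ 0.5. A TM-score above about 0.5 is a common rule of thumb for a shared fold. Neither threshold is a parameter reported for ASC.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class StructuralHit:
    """One hit from a structural similarity search (hypothetical record layout)."""
    query: str       # ID of the unannotated (hypothetical) query protein
    target: str      # ID of an annotated reference structure
    evalue: float    # alignment E-value; lower is better
    tm_score: float  # TM-score in [0, 1]; higher is better


def transfer_annotations(
    hits: Iterable[StructuralHit],
    reference_annotations: Dict[str, str],
    max_evalue: float = 1e-3,   # hypothetical threshold, not taken from ASC
    min_tm_score: float = 0.5,  # hypothetical threshold ("same fold" rule of thumb)
) -> Dict[str, str]:
    """Give each query the description of its best qualifying, informative hit."""
    best: Dict[str, StructuralHit] = {}
    for hit in hits:
        # Discard weak hits that fail either threshold.
        if hit.evalue > max_evalue or hit.tm_score < min_tm_score:
            continue
        description = reference_annotations.get(hit.target)
        # Discard targets with no description or an uninformative one,
        # since copying "hypothetical protein" would add no information.
        if description is None or "hypothetical" in description.lower():
            continue
        current = best.get(hit.query)
        # Keep the hit with the lowest E-value; break ties by higher TM-score.
        if current is None or (hit.evalue, -hit.tm_score) < (current.evalue, -current.tm_score):
            best[hit.query] = hit
    return {query: reference_annotations[h.target] for query, h in best.items()}


# Toy example with made-up identifiers and scores.
hits = [
    StructuralHit("query_1", "ref_a", 1e-10, 0.82),
    StructuralHit("query_1", "ref_b", 1e-12, 0.75),
    StructuralHit("query_2", "ref_c", 0.5, 0.30),
    StructuralHit("query_3", "ref_d", 1e-8, 0.66),
]
references = {
    "ref_a": "Protein kinase",
    "ref_b": "Dynein light chain",
    "ref_c": "Ubiquitin-like protein",
    "ref_d": "hypothetical protein",
}
print(transfer_annotations(hits, references))  # {'query_1': 'Dynein light chain'}
```

In the toy data, `query_2` is dropped because its hit fails both thresholds. `query_3` is dropped because its only hit is itself labelled "hypothetical protein". Ranking by E-value first and TM-score second is only one simple choice. A real pipeline might also weigh alignment coverage or combine several scores. It would take its hits from a structure search tool rather than hard-coding them.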