Juan Manuel Trinidad-Barnech, José Sotelo-Silveira, Darío Fernández Do Porto, Pablo Smircich
PLoS Pathogens, vol. 21, no. 4, e1013120 (published 2025-04-21). DOI: 10.1371/journal.ppat.1013120. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12047770/pdf/
Expanding kinetoplastid genome annotation through protein structure comparison.
Kinetoplastids belong to the Discoba supergroup, an early-diverging eukaryotic clade. Although the amount of genomic information on these parasites has grown substantially, assigning gene functions through traditional sequence-based homology methods remains challenging. Recently, in silico protein structure prediction and algorithms for rapid, precise, large-scale protein structure comparison have advanced significantly. In this work, we developed a protein structure-based homology search pipeline (ASC, Annotation by Structural Comparisons) and applied it to transfer biological information to all kinetoplastid proteins available in TriTrypDB, the reference database for this lineage. Our pipeline detected structural similarity for a substantial portion of kinetoplastid proteins, extending current knowledge through annotation transfer. Additionally, we identified structural homologs for representatives of 6,700 uncharacterized proteins across 33 kinetoplastid species; these proteins could not be annotated using existing sequence-based tools and databases. As a result, this approach allowed us to infer potential biological information for a considerable number of kinetoplastid proteins. Among these, we identified structural homologs of ubiquitous eukaryotic proteins that are difficult to detect in kinetoplastid genomes with standard genome annotation pipelines. The results (KASC, Kinetoplastid Annotation by Structural Comparison) are openly accessible to the community at kasc.fcien.edu.uy through a user-friendly, gene-by-gene interface that enables visual inspection of the data.
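The abstract describes annotation transfer by structural homology only at a high level. As an illustration, the following is a minimal Python sketch of how best-hit transfer could work once a structure search has produced query-target hits. It is not the ASC pipeline: the record layout (`StructuralHit`), the helper names, the E-value cutoff of 1e-3, and the rule that skips "uncharacterized" targets are all hypothetical assumptions. The TM-score cutoff of 0.5 follows the common heuristic that scores above about 0.5 usually indicate the same fold.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class StructuralHit:
    """One query-target alignment from a structure search (hypothetical record layout)."""
    query: str              # ID of the kinetoplastid query protein
    target: str             # ID of the matched reference structure
    evalue: float           # alignment E-value (lower is better)
    tm_score: float         # TM-score in [0, 1] (higher is better)
    target_annotation: str  # functional annotation attached to the target


def is_informative(annotation: str) -> bool:
    """Hypothetical rule: ignore targets whose own annotation carries no function."""
    text = annotation.strip().lower()
    return bool(text) and "uncharacterized" not in text and "hypothetical protein" not in text


def transfer_annotations(
    hits: Iterable[StructuralHit],
    max_evalue: float = 1e-3,  # assumed cutoff, not taken from the paper
    min_tm: float = 0.5,       # common "same fold" heuristic for TM-score
) -> Dict[str, str]:
    """Map each query to the annotation of its best passing hit; queries without one are omitted."""
    best: Dict[str, StructuralHit] = {}
    for hit in hits:
        # Drop weak or unreliable structural matches.
        if hit.evalue > max_evalue or hit.tm_score < min_tm:
            continue
        # Transferring "uncharacterized" labels adds no information.
        if not is_informative(hit.target_annotation):
            continue
        current = best.get(hit.query)
        # Prefer the lower E-value; break ties with the higher TM-score.
        if current is None or (hit.evalue, -hit.tm_score) < (current.evalue, -current.tm_score):
            best[hit.query] = hit
    return {query: hit.target_annotation for query, hit in best.items()}


if __name__ == "__main__":
    # Hypothetical hits; IDs and annotations are placeholders, not real data.
    demo = [
        StructuralHit("query_1", "ref_1", 1e-10, 0.82, "DNA ligase (example)"),
        StructuralHit("query_1", "ref_2", 1e-4, 0.61, "Uncharacterized protein"),
        StructuralHit("query_2", "ref_3", 0.5, 0.30, "Kinase (example)"),
    ]
    print(transfer_annotations(demo))  # {'query_1': 'DNA ligase (example)'}
```

In the demo, `query_2` stays unannotated because its only hit fails both thresholds, which mirrors how proteins without a confident structural match would remain "uncharacterized" under this kind of scheme.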
Journal description:
Bacteria, fungi, parasites, prions and viruses cause a plethora of diseases that have important medical, agricultural, and economic consequences. Moreover, the study of microbes continues to provide novel insights into such fundamental processes as the molecular basis of cellular and organismal function.