Functional (re)annotation of Mycobacteroides abscessus proteome using integrative sequence and AI-based structural approaches
Authors: Pranavathiyani Gnanasekar, Simran Gambhir, Priyadarshan Kinatukara, Anshu Bhardwaj
Journal: Current Research in Structural Biology, volume 10, Article 100172 (Journal Article; impact factor 2.7; JCR Q3, Biochemistry & Molecular Biology)
Published: 2025-08-06
DOI: 10.1016/j.crstbi.2025.100172
URL: https://www.sciencedirect.com/science/article/pii/S2665928X25000091
Citation count: 0
Abstract
Functional annotation of proteins is crucial for understanding the basic biology of organisms. For pathogens, it can provide valuable insights into their functional landscape and so help explain the molecular mechanisms of pathogenesis and survival. In this study, we explored the application of sequence-based and AI-driven structure-based methods to functionally (re)annotate Mycobacteroides abscessus (MAB). MAB is an opportunistic pathogen that causes infections in immunocompromised patients and exhibits resistance to several antibiotics. The global rise in drug-resistant strains and the recently identified potential for indirect human-to-human transmission emphasize the importance of understanding MAB as a critical pathogen. However, there is a large gap in our understanding of the MAB proteome, which is vital not only for understanding the functions of individual proteins but also for prioritizing drug targets for therapeutic development. Presently, 28% of the MAB proteome, as available in UniProtKB, is poorly annotated, and more than a quarter of the MAB proteome lacks gene ontology (GO) terms, indicating a lack of standard functional descriptions. To this end, the present study aims to systematically (re)annotate the MAB proteome using a combination of sequence- and structure-based approaches. We performed a sequence-based similarity search against the NR database and an HMM-based search for functional domains with Pfam and CATH. Then, we used AlphaFold-predicted MAB structures in a structure-based similarity search with Foldseek to identify similar proteins and transfer their GO annotations to MAB proteins. We assigned new GO annotations (374 proteins) and refined the existing annotations (885 proteins) for previously unannotated essential genes of MAB. 
In addition, we used an integrated sequence- and structure-based approach to annotate the 29 proteins for which AlphaFold structures were not available. Finally, we explored structural comparisons of a few MAB proteins that were similar to those of Mycobacterium tuberculosis, revealing residue-level differences in MAB linked to drug resistance. Our study highlights a combined sequence-based and AI-driven structure-based approach for large-scale functional annotation of proteomes, which can be applied to any organism of interest.
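
The structure-based annotation transfer described in the abstract can be illustrated with a short sketch. This is not the authors' pipeline. It assumes Foldseek hits in the default 12-column, BLAST-like tabular format, and it assumes a best-hit rule with an E-value cutoff of 1e-3; the abstract specifies neither. The function names, protein identifiers, and the target-to-GO mapping are hypothetical toy values used only to show the logic.

```python
import csv
import io
from typing import Dict, Set, Tuple

# Assumed column order: Foldseek's default BLAST-tab-like output.
FIELDS = ["query", "target", "fident", "alnlen", "mismatch", "gapopen",
          "qstart", "qend", "tstart", "tend", "evalue", "bits"]


def best_hits(tsv_text: str, max_evalue: float = 1e-3) -> Dict[str, str]:
    """Keep, for each query, the highest-bit-score target that passes the cutoff.

    The best-hit rule and the cutoff value are assumptions for illustration.
    """
    best: Dict[str, Tuple[str, float]] = {}
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) > max_evalue:
            continue  # discard weak structural matches
        bits = float(row["bits"])
        query = row["query"]
        if query not in best or bits > best[query][1]:
            best[query] = (row["target"], bits)
    return {query: target for query, (target, _) in best.items()}


def transfer_go(hits: Dict[str, str],
                target_go: Dict[str, Set[str]],
                existing_go: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return GO terms gained by each query from its best structural hit."""
    gained: Dict[str, Set[str]] = {}
    for query, target in hits.items():
        new_terms = target_go.get(target, set()) - existing_go.get(query, set())
        if new_terms:
            gained[query] = new_terms
    return gained


# Toy input: three hits for two hypothetical query proteins.
toy_tsv = "\n".join([
    "query_1\ttarget_A\t0.35\t200\t0\t0\t1\t200\t1\t200\t1e-10\t300",
    "query_1\ttarget_B\t0.30\t180\t0\t0\t1\t180\t1\t180\t1e-5\t150",
    "query_2\ttarget_C\t0.20\t90\t0\t0\t1\t90\t1\t90\t0.5\t20",
])
toy_target_go = {"target_A": {"GO:0003824", "GO:0016787"},
                 "target_B": {"GO:0005524"}}
toy_existing_go = {"query_1": {"GO:0003824"}}

hits = best_hits(toy_tsv)
new_annotations = transfer_go(hits, toy_target_go, toy_existing_go)
print(new_annotations)
```

In this toy run, `query_2` receives nothing because its only hit fails the E-value cutoff, while `query_1` gains only the term it did not already carry. A real workflow would likely add coverage or confidence filters before transferring terms; those choices are left open here.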