Jamie D Dixson, Abhijay Azad, Pamela A Padilla, Rajeev K Azad
Inference of Cytochrome P450 Evolutionary History Using Structural and Physicochemical Metrics.
Genome Biology and Evolution, published 2025-09-30. DOI: 10.1093/gbe/evaf178 (https://doi.org/10.1093/gbe/evaf178). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12502919/pdf/
Abstract
Cytochrome P450s are a superfamily of heme-binding monooxygenases involved in the detoxification of intrinsic and extrinsic toxins. They are nearly ubiquitous and are found in all domains of life. Membership in families within the superfamily is defined by amino acid identity thresholds, which are as low as 40% for some families. Relationships among Cytochrome P450 families have proven elusive because interfamily identities fall below the Twilight Zone (<30%), which results in poor multiple sequence alignment quality and thus low support for downstream phylogenetic reconstructions. Despite these low identities, Cytochrome P450 structures are remarkably well conserved both within and among families. In such cases, structural phylogenetics has the potential to reveal elusive relationships, because the selectively favored physicochemical properties that give rise to protein structure and function persist despite sequence-level divergence. Recently, in two separate publications, we demonstrated that, using physicochemical vectors, dynamic time warping, and hierarchical clustering (PCDTW), large swaths of protein domain families and betacoronavirus receptor-binding domain clades were grouped congruently with validated functional/structural relationships. These findings were important because they resolved anomalous sequence alignment-based maximum likelihood phylogenetic results that were incongruent with the known functional relationships. They also validated the use of physicochemical vectors for inferring structural/functional homology and indicated that the same methods might be applied to other protein families whose relationships are difficult to resolve from sequence data alone.
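The identity thresholds above can be made concrete with a small sketch. Percent identity has several competing definitions; this one assumes a precomputed pairwise alignment and counts identical residues only over columns where neither sequence has a gap. The function names `percent_identity` and `identity_band` are hypothetical, and the cutoffs simply restate the 40% family threshold and the <30% sub-Twilight Zone bound quoted in the abstract; this is not a classifier used by the authors.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-').

    ASSUMPTION: gap columns are excluded from the denominator; other
    definitions (e.g. dividing by the shorter ungapped length) give other values.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def identity_band(identity: float) -> str:
    """Hypothetical helper: place an identity value relative to the abstract's thresholds."""
    if identity >= 40.0:
        return "meets 40% family threshold"
    if identity < 30.0:
        return "sub-Twilight Zone (<30%)"
    return "between 30% and 40%"


# Usage with toy aligned fragments (not real P450 sequences):
print(percent_identity("ACDE-G", "ACDFHG"))  # 80.0
print(identity_band(percent_identity("AAAA", "CCCA")))  # sub-Twilight Zone (<30%)
```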
Herein, we used Molecular Weight and Hydrophobicity Physicochemical Dynamic Time Warping (MWHP PCDTW), along with structural and sequence alignment-based phylogenetic methods, to analyze all of the Cytochrome P450s in the high-fidelity Structural Classification of Proteins (SCOP) database, as well as the reviewed sequences that have both experimentally resolved structures in the Protein Data Bank and de novo predicted structures in the AlphaFold (AF) Protein Structure Database. Comparing the resulting phylogenetic topologies, we found that in some cases structure-based methods may be less able than physicochemical and sequence-based methods to resolve random or convergent similarity. This agrees with previous findings demonstrating the usefulness of physicochemical properties in resolving both random structural similarity and potentially convergent relationships.
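The abstract names the method but not its implementation details, so the following is only a minimal Python sketch of how an MWHP PCDTW-style tree could be built, under assumptions made for illustration rather than taken from the paper: each residue becomes a point of (free amino acid molecular weight in Da, Kyte-Doolittle hydropathy), with both properties min-max scaled over the 20 standard residues; pairwise distances are classic DTW costs with a Euclidean local distance; and the tree is built by average-linkage (UPGMA) hierarchical clustering. The function names `profile`, `dtw`, `upgma`, and `mwhp_pcdtw_tree` are hypothetical.

```python
import math
from itertools import combinations

# ASSUMPTION: free amino acid molecular weights (Da) and Kyte-Doolittle hydropathy
# are used as the two physicochemical scales; the paper's exact scales may differ.
MW = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "E": 147.13, "Q": 146.15, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "E": -3.5, "Q": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _minmax(table):
    """ASSUMPTION: scale each property to [0, 1] so neither dominates the distance."""
    lo, hi = min(table.values()), max(table.values())
    return {aa: (v - lo) / (hi - lo) for aa, v in table.items()}


MW_N, KD_N = _minmax(MW), _minmax(KD)


def profile(seq):
    """Map a protein sequence to a list of (molecular weight, hydrophobicity) points."""
    return [(MW_N[aa], KD_N[aa]) for aa in seq.upper()]


def dtw(a, b):
    """Classic dynamic time warping cost with a Euclidean local distance."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns a Newick topology string."""
    clusters = [(name, [name]) for name in names]  # (newick label, member leaves)
    while len(clusters) > 1:
        best = None
        for (i, (_, li)), (j, (_, lj)) in combinations(enumerate(clusters), 2):
            # Average distance over all leaf pairs spanning the two clusters.
            avg = sum(dist[x, y] for x in li for y in lj) / (len(li) * len(lj))
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        (ni, li), (nj, lj) = clusters[i], clusters[j]
        merged = (f"({ni},{nj})", li + lj)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0] + ";"


def mwhp_pcdtw_tree(seqs):
    """Build a topology from named sequences via physicochemical profiles + DTW + UPGMA."""
    profiles = {name: profile(s) for name, s in seqs.items()}
    dist = {}
    for x, y in combinations(seqs, 2):
        dist[x, y] = dist[y, x] = dtw(profiles[x], profiles[y])
    return upgma(list(seqs), dist)


# Usage with toy sequences (not real P450s):
print(mwhp_pcdtw_tree({"s1": "ILVLIV", "s2": "ILVLIVV", "s3": "DEKRDE"}))  # (s3,(s1,s2));
```

Because DTW may stretch one profile against another, `ILVLIV` and `ILVLIVV` receive a distance of zero here, which is how profiles of different lengths can be compared without first building a multiple sequence alignment. The raw DTW cost also grows with sequence length, and unknown residue letters raise `KeyError`; a real analysis would need explicit choices for length normalization and non-standard residues, which this sketch leaves out.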
About the journal
Genome Biology and Evolution (GBE) publishes leading original research at the interface between evolutionary biology and genomics. Papers considered for publication report novel evolutionary findings that concern natural genome diversity, population genomics, the structure, function, organisation and expression of genomes, comparative genomics, proteomics, and environmental genomic interactions. Major evolutionary insights from the fields of computational biology, structural biology, developmental biology, and cell biology are also considered, as are theoretical advances in the field of genome evolution. GBE’s scope embraces genome-wide evolutionary investigations at all taxonomic levels and for all forms of life — within populations or across domains. Its aims are to further the understanding of genomes in their evolutionary context and further the understanding of evolution from a genome-wide perspective.