Structural Variability of Pfam Domains Based on Alphafold2 Predictions
Elly Poretsky, Carson M Andorf, Taner Z Sen
Proteins: Structure, Function, and Bioinformatics, published 2025-07-22. DOI: 10.1002/prot.70021
Understanding the biological functions of proteins is one of the main goals of functional genomics. Such understanding will help control and manipulate biological processes to enhance desirable traits, including improved abiotic and biotic stress resistance in humans, animals, plants, and microbes. Protein domains, regarded as the functional building blocks of proteins, have been used extensively to predict protein function. Sequence-based approaches to protein function prediction, including domain prediction with resources such as the Pfam database, remain popular because they are reliable, inexpensive, and easy to use. Although the sequence variability of Pfam domains has been reported in several studies, their structural variability has been understudied. Here, we extracted the regions corresponding to Pfam domains from the predicted structures of the 16 model organism proteomes in the AlphaFold2 database. Our analysis revealed that in many families, between 20% and 40% of members had no assigned regular secondary structure, indicating structural variability within families. To characterize this variability further, we used Foldseek and agglomerative clustering to identify structural variability within Pfam families, and we then examined specific cases to describe it in structural detail. By comparing the predicted structures of domain family members, this study uses two popular prediction resources, AlphaFold2 and Pfam, to demonstrate the inherent variability of protein domain predictions. Our results show that detecting structural variability in Pfam families can facilitate their curation and refinement, and they highlight the need for more accurate protein domain prediction workflows.
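The abstract names two computational steps without giving their details: counting, for each family, the members whose domain region has no regular secondary structure, and grouping family members by structural similarity. The Python sketch below illustrates both steps under assumptions that are not taken from the paper: DSSP-style per-residue codes, a hypothetical set of "regular" codes (H, G, I for helices; E, B for strands), pairwise similarity scores in [0, 1] such as the TM-score values Foldseek can report, and average linkage with a hypothetical distance cut-off of 0.5. The function names (`fraction_without_sse`, `average_linkage`) and the toy data are illustrative only; this is not the authors' pipeline.

```python
"""Hypothetical sketch, not the authors' pipeline.

Assumptions: DSSP-style 8-state codes per domain residue; "regular"
secondary structure = H, G, I (helices) or E, B (strands); pairwise
similarity in [0, 1] (e.g. a TM-score from Foldseek); average linkage
with a hypothetical distance cut-off.
"""
from itertools import combinations

# Assumed set of DSSP codes that count as regular secondary structure.
REGULAR_SSE = frozenset("HGIEB")


def has_regular_sse(dssp: str) -> bool:
    """Return True if any residue carries a regular secondary-structure code."""
    return any(code in REGULAR_SSE for code in dssp)


def fraction_without_sse(members: dict) -> float:
    """Fraction of family members (name -> DSSP string) lacking regular SSE."""
    if not members:
        return 0.0
    lacking = sum(1 for dssp in members.values() if not has_regular_sse(dssp))
    return lacking / len(members)


def average_linkage(names, similarity, cutoff=0.5):
    """Agglomerative clustering with average linkage on distance = 1 - similarity.

    `similarity` maps frozenset({a, b}) to a score in [0, 1]; pairs that are
    absent are treated as fully dissimilar (distance 1.0). Merging stops when
    the closest pair of clusters is farther apart than `cutoff`.
    """
    def dist(a, b):
        return 1.0 - similarity.get(frozenset((a, b)), 0.0)

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest mean pairwise distance.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pair_dists = [dist(a, b) for a in clusters[i] for b in clusters[j]]
            mean = sum(pair_dists) / len(pair_dists)
            if best is None or mean < best[0]:
                best = (mean, i, j)
        mean, i, j = best
        if mean > cutoff:
            break
        # i < j, so extending clusters[i] before deleting clusters[j] is safe.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    family = {
        "P1": "--HHHHHHHH--EEEE--",
        "P2": "---HHHHHHH---EEE--",
        "P3": "------------------",  # no assigned secondary structure
        "P4": "--SS--TT----------",  # bends and turns only
        "P5": "--HHHHHHHH--EEEE--",
    }
    print(fraction_without_sse(family))  # 0.4
    scores = {
        frozenset(("P1", "P2")): 0.90,
        frozenset(("P1", "P5")): 0.95,
        frozenset(("P2", "P5")): 0.85,
        frozenset(("P3", "P4")): 0.30,
    }
    print(average_linkage(list(family), scores))
    # [['P1', 'P5', 'P2'], ['P3'], ['P4']]
```

In the toy family, two of five members lack any helix or strand code, giving a fraction of 0.4, and the three helix-and-strand members merge into one cluster while the two unstructured members stay apart. The quadratic pure-Python loop keeps the sketch dependency-free; for real families, a library implementation such as SciPy's hierarchical clustering with average linkage would be the more practical choice.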
About the journal:
PROTEINS: Structure, Function, and Bioinformatics publishes original reports of significant experimental and analytic research in all areas of protein research: structure, function, computation, genetics, and design. The journal encourages reports that present new experimental or computational approaches for interpreting and understanding data from biophysical chemistry, structural studies of proteins and macromolecular assemblies, alterations of protein structure and function engineered through techniques of molecular biology and genetics, functional analyses under physiologic conditions, as well as the interactions of proteins with receptors, nucleic acids, or other specific ligands or substrates. Research in protein and peptide biochemistry directed toward synthesizing or characterizing molecules that simulate aspects of the activity of proteins, or that act as inhibitors of protein function, is also within the scope of PROTEINS. In addition to full-length reports, short communications (usually no more than 4 printed pages) and prediction reports are welcome. Reviews are typically by invitation; authors are encouraged to submit proposed topics for consideration.