Structural Variability of Pfam Domains Based on Alphafold2 Predictions

IF 2.8 | Zone 4, Biology | Q2, BIOCHEMISTRY & MOLECULAR BIOLOGY

Proteins-Structure Function and Bioinformatics | Pub Date: 2025-07-22 | DOI: 10.1002/prot.70021

Elly Poretsky, Carson M Andorf, Taner Z Sen


Citations: 0

Abstract



Understanding the biological functions of proteins is one of the main goals of functional genomics. Such understanding will help control and manipulate biological processes to enhance desirable traits, including improved abiotic and biotic stress resistance in humans, animals, plants, and microbes. Protein domains, regarded as the functional building blocks of proteins, have been used extensively to predict protein function. Sequence-based approaches for protein function prediction, including the use of protein domain prediction from resources like the Pfam database, remain popular due to their reliability, low cost, and ease of use. Although the sequence variability of Pfam domains has been reported in several studies, their structural variability has been understudied. Here, we have extracted the Pfam domain structural portion from the predicted structures of the 16 model organism proteomes in the AlphaFold2 database. Our analysis revealed that many families contained between 20% and 40% of members with no assigned regular secondary structures, demonstrating within-family structural variability. To better understand this structural variability, we used Foldseek and agglomerative clustering to identify structural variability in Pfam families. We then analyzed specific cases to provide structural details for this variability. In this study, we have used two popular prediction applications/resources, AlphaFold2 and Pfam, to demonstrate inherent variability in protein domain predictions by comparing their predicted structures. Our study shows that detection of structural variability in Pfam families can facilitate curation and refinement of Pfam families, while demonstrating the need to develop more accurate protein domain prediction workflows.
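The two analysis steps named in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the secondary-structure strings, the member names, the distance values, the average-linkage choice, and the 0.5 cutoff are all hypothetical, and the helper names `fraction_without_regular_ss` and `average_linkage_clusters` are made up for illustration. The sketch assumes DSSP-style one-letter codes for secondary structure and a precomputed pairwise structural distance (for example, 1 minus a structural similarity score such as a TM-score).

```python
from itertools import combinations

# DSSP-style codes counted as "regular" secondary structure (an assumption):
# H/G/I are helices, E/B are strands and bridges.
REGULAR_SS = set("HGIEB")


def fraction_without_regular_ss(ss_strings):
    """Fraction of family members whose secondary-structure string
    contains no helix or strand residue at all."""
    if not ss_strings:
        return 0.0
    irregular = sum(1 for ss in ss_strings if not (set(ss) & REGULAR_SS))
    return irregular / len(ss_strings)


def average_linkage_clusters(names, distance, threshold):
    """Naive average-linkage agglomerative clustering.

    names: list of member identifiers.
    distance: dict mapping frozenset({a, b}) to a pairwise distance.
    threshold: merging stops once the closest pair of clusters is
        farther apart than this value.
    """
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean of all pairwise distances between the two clusters.
        pairs = [distance[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical secondary-structure strings for five members of one family.
family_ss = ["CCHHHHHHCC", "CCEEEECCHH", "CCCCCCCCCC", "CCTTSSCCCC", "HHHHEEEECC"]
frac = fraction_without_regular_ss(family_ss)

# Hypothetical pairwise structural distances between four domain structures.
members = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 0.1,
    frozenset(("C", "D")): 0.2,
    frozenset(("A", "C")): 0.8,
    frozenset(("A", "D")): 0.9,
    frozenset(("B", "C")): 0.7,
    frozenset(("B", "D")): 0.8,
}
groups = average_linkage_clusters(members, dist, threshold=0.5)

print(f"members without regular secondary structure: {frac:.0%}")
print("structural clusters:", groups)
```

In this toy input, two of the five members have no helix or strand residues, and the four structures separate into two clusters. A family that splits into several well-separated clusters in this way would be a candidate for the kind of curation the abstract describes.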


Source journal

Proteins-Structure Function and Bioinformatics (Biology: Biochemistry & Molecular Biology)

CiteScore: 5.90
Self-citation rate: 3.40%
Articles published: 172
Review time: 3 months

Journal description: PROTEINS: Structure, Function, and Bioinformatics publishes original reports of significant experimental and analytic research in all areas of protein research: structure, function, computation, genetics, and design. The journal encourages reports that present new experimental or computational approaches for interpreting and understanding data from biophysical chemistry, structural studies of proteins and macromolecular assemblies, alterations of protein structure and function engineered through techniques of molecular biology and genetics, functional analyses under physiologic conditions, as well as the interactions of proteins with receptors, nucleic acids, or other specific ligands or substrates. Research in protein and peptide biochemistry directed toward synthesizing or characterizing molecules that simulate aspects of the activity of proteins, or that act as inhibitors of protein function, is also within the scope of PROTEINS. In addition to full-length reports, short communications (usually not more than 4 printed pages) and prediction reports are welcome. Reviews are typically by invitation; authors are encouraged to submit proposed topics for consideration.