Authors: Xiaogeng Wan, Xinying Tan
Journal: Molecular & Cellular Biomechanics
Publication date: 2024-07-11
Publication type: Journal Article
DOI: https://doi.org/10.62617/mcb.v21.199
Citations: 0
Abstract
Identifying sequential differences between protein structural classes using network and statistical approaches.
Protein sequences are believed to carry hints of the structures they adopt. In this study, we use network and statistical approaches to identify the sequential differences between protein structural classes and between structural motifs. By examining significant amino acid feature interactions and the statistical distributions of feature series, we identify both common and class-specific characteristics of the different protein structural types. The analyses suggest that, in all top-level structural classes of CATH and SCOP, Leu, Val, and Asn are sources of strong feature interactions, while Cys, His, Trp, and Met exhibit weak intra-type interactions with other features. There are also significant interactions between amino acid features such as Ala and α-helix and bend preference, Ala and side-chain size, Ala and Gly, and Met and Leu. Because these phenomena are observed in all structural classes, they are assumed to contribute little to distinguishing the different structures. In α structures, Glu, Pro, side-chain size, and hydrophobicity are highly important in feature interactions, while in β structures, Gly, Thr, and physical properties such as α-helix and bend preference, extended structural preference, pK-C value, and surrounding hydrophobicity for β structures are highly important. When the α and β structures are compared, both show Ser as a common source of feature interactions. The mixed α and β structures not only share feature interactions with the α and β structures, but also exhibit special interactions between Met, Lys, and the double-bend preference property, and between the sequence arrangements of Cys, His, Met, and Tyr and the amino acid composition features. Intrinsically disordered proteins (IDPs) tend to show repeats of a single amino acid type in their high-frequency K-mers, while the nine typical types of structural motifs also show distinct characteristics. Statistical tests also reveal different value ranges for the different structural types. The outcomes of this comparison study not only help illuminate the mechanism of amino acid feature interactions in different types of structures, but also deepen our understanding of how protein sequence influences structure.
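To make the K-mer observation concrete, the following is a minimal sketch of how high-frequency K-mers and single-residue repeats might be tallied for one sequence. It is not the authors' pipeline: the function names (`kmer_counts`, `is_homopolymer`, `top_kmers`), the choice of k = 3, and the toy sequence are all assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer (substring of length k) in a sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Slide a window of width k along the sequence, one residue at a time.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def is_homopolymer(kmer):
    """Return True when the k-mer repeats a single amino acid, e.g. 'QQQ'."""
    return len(kmer) > 0 and len(set(kmer)) == 1


def top_kmers(sequence, k, n):
    """Return the n most frequent k-mers as (k-mer, count) pairs."""
    return kmer_counts(sequence, k).most_common(n)


# Hypothetical toy sequence, invented for this example (not data from the study).
toy_sequence = "MSQQQQQQAPSSSSSGEEEEEK"

for kmer, count in top_kmers(toy_sequence, k=3, n=3):
    label = "single-residue repeat" if is_homopolymer(kmer) else "mixed"
    print(kmer, count, label)
```

On a real dataset, the same counts would be aggregated over all sequences of a structural type before comparing the fraction of single-residue K-mers among the most frequent ones.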