Graphical models for identifying pore-forming proteins
Authors: Nan Xu, Theodore W. Kahn, Theju Jacob, Yan Liu
Journal: Proteins: Structure, Function, and Bioinformatics
Published: 2024-04-15
DOI: 10.1002/prot.26687 (https://doi.org/10.1002/prot.26687)
Pore-forming toxins (PFTs) are proteins that form lesions in biological membranes. A better understanding of the structure and function of these proteins will benefit a number of biotechnological applications, including the development of new pest control methods in agriculture. When searching for new pore formers, existing sequence homology-based methods fail to discover truly novel proteins with low sequence identity to known proteins. Search methodologies based on protein structures would help move beyond this limitation. Because the number of known PFT structures is very limited, it is challenging to identify new proteins with similar structures using computational approaches such as deep learning. In this article, we therefore propose a sample-efficient graphical model, in which a protein structure graph is first constructed according to consensus secondary structures. A semi-Markov conditional random field model is then developed to perform protein sequence segmentation. We demonstrate that our method is able to distinguish structurally similar proteins even in the absence of sequence similarity (pairwise sequence identity < 0.4), a feat not achievable by traditional approaches such as HMMs. To extract proteins of interest from a genome-wide protein database for further study, we also develop an efficient framework for UniRef50, which contains 43 million proteins.
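The segmentation step mentioned in the abstract can be illustrated with a minimal sketch of semi-Markov Viterbi decoding, the dynamic program typically used to find the best-scoring segmentation under a semi-Markov model. This is not the authors' implementation: the labels (`H`, `E`, `C`), the hand-written `toy_segment_score` and `toy_transition_score` functions, and the input string are all hypothetical stand-ins for learned feature weights, and the sketch covers only a linear chain of segments, not the protein structure graph described in the abstract.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def semi_markov_viterbi(
    seq: str,
    labels: Sequence[str],
    max_len: int,
    segment_score: Callable[[str, int, int, str], float],
    transition_score: Callable[[str, str], float],
) -> Tuple[float, List[Segment]]:
    """Return the best total score and segmentation of seq into labeled segments."""
    n = len(seq)
    if n == 0:
        return 0.0, []
    neg_inf = float("-inf")
    # best[j][y]: best score of any segmentation of seq[:j] whose last segment has label y
    best = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    # back[j][y]: (start of last segment, label of the previous segment or None)
    back: List[dict] = [{y: None for y in labels} for _ in range(n + 1)]

    for j in range(1, n + 1):
        for y in labels:
            # Try every allowed length d for the segment ending at position j.
            for d in range(1, min(max_len, j) + 1):
                i = j - d
                seg = segment_score(seq, i, j, y)
                if i == 0:
                    # First segment: no transition term.
                    if seg > best[j][y]:
                        best[j][y] = seg
                        back[j][y] = (0, None)
                    continue
                for prev in labels:
                    if best[i][prev] == neg_inf:
                        continue
                    cand = best[i][prev] + transition_score(prev, y) + seg
                    if cand > best[j][y]:
                        best[j][y] = cand
                        back[j][y] = (i, prev)

    # Trace back from the best final label.
    label: Optional[str] = max(labels, key=lambda y: best[n][y])
    score = best[n][label]
    segments: List[Segment] = []
    j = n
    while j > 0:
        i, prev = back[j][label]
        segments.append((i, j, label))
        j, label = i, prev
    segments.reverse()
    return score, segments


# Hypothetical toy scores: +1 for each position whose character matches the
# segment label, -1 otherwise. A trained model would use learned features instead.
def toy_segment_score(seq: str, i: int, j: int, label: str) -> float:
    return sum(1 if c == label else -1 for c in seq[i:j])


# Hypothetical boundary penalty; adjacent segments with the same label cost more.
def toy_transition_score(prev: str, cur: str) -> float:
    return -2.0 if prev == cur else -0.5


score, segments = semi_markov_viterbi(
    "HHHHEEEECC", ["H", "E", "C"], 10, toy_segment_score, toy_transition_score
)
print(score, segments)
```

Unlike a per-position HMM or linear-chain CRF decoder, the inner loop scores whole candidate segments of length up to `max_len`, which is what lets segment-level properties (such as the length or composition of a secondary-structure element) enter the score.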
About the journal:
PROTEINS: Structure, Function, and Bioinformatics publishes original reports of significant experimental and analytic research in all areas of protein research: structure, function, computation, genetics, and design. The journal encourages reports that present new experimental or computational approaches for interpreting and understanding data from biophysical chemistry, structural studies of proteins and macromolecular assemblies, alterations of protein structure and function engineered through techniques of molecular biology and genetics, functional analyses under physiologic conditions, as well as the interactions of proteins with receptors, nucleic acids, or other specific ligands or substrates. Research in protein and peptide biochemistry directed toward synthesizing or characterizing molecules that simulate aspects of the activity of proteins, or that act as inhibitors of protein function, is also within the scope of PROTEINS. In addition to full-length reports, short communications (usually not more than 4 printed pages) and prediction reports are welcome. Reviews are typically by invitation; authors are encouraged to submit proposed topics for consideration.