Graphical models for identifying pore-forming proteins

IF 3.2; Zone 4 (Biology); JCR Q2, Biochemistry & Molecular Biology

Proteins-Structure Function and Bioinformatics. Pub Date: 2024-04-15. DOI: 10.1002/prot.26687

Nan Xu, Theodore W. Kahn, Theju Jacob, Yan Liu


Citations: 0


Graphical models for identifying pore‐forming proteins

Pore‐forming toxins (PFTs) are proteins that form lesions in biological membranes. Better understanding of the structure and function of these proteins will be beneficial in a number of biotechnological applications, including the development of new pest control methods in agriculture. When searching for new pore formers, existing sequence homology‐based methods fail to discover truly novel proteins with low sequence identity to known proteins. Search methodologies based on protein structures would help us move beyond this limitation. As the number of known structures for PFTs is very limited, it's quite challenging to identify new proteins having similar structures using computational approaches like deep learning. In this article, we therefore propose a sample‐efficient graphical model, where a protein structure graph is first constructed according to consensus secondary structures. A semi‐Markov conditional random fields model is then developed to perform protein sequence segmentation. We demonstrate that our method is able to distinguish structurally similar proteins even in the absence of sequence similarity (pairwise sequence identity < 0.4)—a feat not achievable by traditional approaches like HMMs. To extract proteins of interest from a genome‐wide protein database for further study, we also develop an efficient framework for UniRef50 with 43 million proteins.
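To make the segmentation step concrete, here is a minimal sketch of Viterbi-style decoding for a semi-Markov model, in which each label covers a whole variable-length segment rather than a single residue. This is not the authors' model. The function names, the `max_len` bound, and the toy scoring function are all hypothetical stand-ins for learned feature weights over a protein structure graph. The toy score assumes a per-residue secondary-structure string as input.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)
ScoreFn = Callable[[Optional[str], str, int, int], float]
NEG_INF = float("-inf")


def semi_markov_viterbi(
    n: int, labels: Sequence[str], max_len: int, score: ScoreFn
) -> Tuple[float, List[Segment]]:
    """Return the highest-scoring segmentation of positions [0, n)."""
    # best[j][y]: best score of any segmentation of [0, j) whose last segment is labeled y.
    best: List[Dict[str, float]] = [{y: NEG_INF for y in labels} for _ in range(n + 1)]
    back: List[Dict[str, Optional[Tuple[int, Optional[str]]]]] = [
        {y: None for y in labels} for _ in range(n + 1)
    ]
    for j in range(1, n + 1):
        for y in labels:
            # Try every allowed length d for the last segment [j - d, j).
            for d in range(1, min(max_len, j) + 1):
                i = j - d
                if i == 0:
                    candidates = [(None, 0.0)]  # first segment has no predecessor
                else:
                    candidates = [(p, best[i][p]) for p in labels]
                for prev, prev_score in candidates:
                    if prev_score == NEG_INF:
                        continue
                    s = prev_score + score(prev, y, i, j)
                    if s > best[j][y]:
                        best[j][y] = s
                        back[j][y] = (i, prev)
    end_label = max(labels, key=lambda y: best[n][y])
    total = best[n][end_label]
    if n > 0 and total == NEG_INF:
        raise ValueError("no feasible segmentation under the given constraints")
    # Follow back-pointers from the end of the sequence to recover the segments.
    segments: List[Segment] = []
    j, y = n, end_label
    while j > 0:
        i, prev = back[j][y]
        segments.append((i, j, y))
        j, y = i, prev
    segments.reverse()
    return total, segments


def make_toy_score(ss: str, seg_penalty: float = 0.5) -> ScoreFn:
    """Toy segment score (an assumption, not a trained model): +1 per residue whose
    secondary-structure symbol matches the label, -1 otherwise, minus a per-segment penalty."""
    def score(prev: Optional[str], label: str, start: int, end: int) -> float:
        if prev == label:
            return NEG_INF  # adjacent segments must carry different labels
        agreement = sum(1 if c == label else -1 for c in ss[start:end])
        return agreement - seg_penalty
    return score


ss = "CCHHHHCCEEEC"  # hypothetical per-residue secondary structure (H, E, C)
total, segments = semi_markov_viterbi(len(ss), "HEC", 6, make_toy_score(ss))
print(total, segments)
```

Because the score is defined per segment, it can depend on segment length or on the preceding segment's label, which a per-residue HMM cannot express directly. The cost is an extra factor of `max_len` in decoding time.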


Source journal

Proteins-Structure Function and Bioinformatics (Biology: Biochemistry & Molecular Biology)

CiteScore: 5.90

Self-citation rate: 3.40%

Articles published: 172

Review time: 3 months

About the journal: PROTEINS: Structure, Function, and Bioinformatics publishes original reports of significant experimental and analytic research in all areas of protein research: structure, function, computation, genetics, and design. The journal encourages reports that present new experimental or computational approaches for interpreting and understanding data from biophysical chemistry, structural studies of proteins and macromolecular assemblies, alterations of protein structure and function engineered through techniques of molecular biology and genetics, functional analyses under physiologic conditions, as well as the interactions of proteins with receptors, nucleic acids, or other specific ligands or substrates. Research in protein and peptide biochemistry directed toward synthesizing or characterizing molecules that simulate aspects of the activity of proteins, or that act as inhibitors of protein function, is also within the scope of PROTEINS. In addition to full-length reports, short communications (usually not more than 4 printed pages) and prediction reports are welcome. Reviews are typically by invitation; authors are encouraged to submit proposed topics for consideration.