A PDB-wide, evolution-based assessment of protein-protein interfaces

Q3 Biochemistry, Genetics and Molecular Biology
Kumaran Baskaran, Jose M Duarte, Nikhil Biyani, Spencer Bliven, Guido Capitani
BMC Structural Biology, vol. 14, no. 1. Published 2014-10-18. DOI: 10.1186/s12900-014-0022-0
https://link.springer.com/article/10.1186/s12900-014-0022-0
Citations: 46

Abstract

Thanks to the growth in sequence and structure databases, more than 50 million sequences are now available in UniProt and 100,000 structures in the PDB. Rich information about protein-protein interfaces can be obtained by a comprehensive study of protein contacts in the PDB, their sequence conservation and geometric features.

An automated computational pipeline was developed to run our Evolutionary protein-protein Interface Classifier (EPPIC) software on the entire PDB and store the results in a relational database, which currently contains more than 800,000 interfaces. This allows the analysis of interface data on a PDB-wide scale. Two large benchmark datasets, one of biological interfaces and one of crystal contacts, each containing about 3000 entries, were automatically generated based on criteria thought to be strong indicators of interface type. The BioMany set of biological interfaces includes NMR dimers solved as crystal structures and interfaces that are preserved across diverse crystal forms, as catalogued by the Protein Common Interface Database (ProtCID) from Xu and Dunbrack. The second dataset, XtalMany, is derived from interfaces that would lead to infinite assemblies and are therefore crystal contacts. BioMany and XtalMany were used to benchmark the EPPIC approach. The performance of EPPIC was also compared to classifications from the Protein Interfaces, Surfaces, and Assemblies (PISA) program on a PDB-wide scale; the two approaches give the same call for about 88% of PDB interfaces. By comparing our safest predictions to the PDB author annotations, we provide a lower-bound estimate of the error rate of biological unit annotations in the PDB. Additionally, we developed a PyMOL plugin for direct download and easy visualization of EPPIC interfaces for any PDB entry. Both the datasets and the PyMOL plugin are available at http://www.eppic-web.org/ewui/#downloads.
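
The two kinds of comparison described above (agreement between two classifiers, and benchmarking one classifier against a labelled set) can be illustrated with a minimal sketch. It is not the EPPIC or PISA code: the function names, the "bio"/"xtal" labels, the (entry, interface number) keys, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical conventions for this sketch only:
# "bio" = biological interface, "xtal" = crystal contact.
InterfaceId = Tuple[str, int]  # (entry name, interface number); illustrative key
Calls = Dict[InterfaceId, str]


def agreement_rate(calls_a: Calls, calls_b: Calls) -> float:
    """Fraction of interfaces scored by both methods that get the same call."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two methods have no interfaces in common")
    same = sum(1 for key in shared if calls_a[key] == calls_b[key])
    return same / len(shared)


def benchmark(predicted: Calls, truth: Calls) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy, with "bio" as the positive class."""
    tp = tn = fp = fn = 0
    for key, true_label in truth.items():
        pred = predicted[key]
        if true_label == "bio":
            if pred == "bio":
                tp += 1
            else:
                fn += 1
        else:
            if pred == "xtal":
                tn += 1
            else:
                fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("benchmark needs both bio and xtal reference entries")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Toy data, invented for this example; not taken from BioMany or XtalMany.
truth: Calls = {("toyA", 1): "bio", ("toyA", 2): "xtal",
                ("toyB", 1): "bio", ("toyC", 1): "xtal"}
method_a: Calls = dict(truth)                      # agrees with the reference
method_b: Calls = {**truth, ("toyB", 1): "xtal"}   # misses one biological interface

if __name__ == "__main__":
    print("agreement:", agreement_rate(method_a, method_b))
    print("benchmark:", benchmark(method_b, truth))
```

Restricting the agreement rate to interfaces scored by both methods is a design choice of this sketch; the abstract does not say how interfaces missing from one method were handled.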

Our computational pipeline allows us to analyze protein-protein contacts and their sequence conservation across the entire PDB. Two new benchmark datasets are provided, each over an order of magnitude larger than existing manually curated ones. These tools enable the comprehensive study of several aspects of protein-protein contacts in the PDB and provide a basis for future, even larger-scale studies of protein-protein interactions.


About the journal: BMC Structural Biology is an open access, peer-reviewed journal that considers articles on investigations into the structure of biological macromolecules, including solving structures, structural and functional analyses, and computational modeling.