Jacob Sumner, Grace Meng, Naomi Brandt, Alex T. Grigas, Andrés Córdoba, Mark D. Shattuck, Corey S. O'Hern
arXiv - QuanBio - Biomolecules, published 2024-07-23. DOI: arxiv-2407.16580 (https://doi.org/arxiv-2407.16580)
Citations: 0
Assessment of scoring functions for computational models of protein-protein interfaces
A goal of computational studies of protein-protein interfaces (PPIs) is to
predict the binding site between two monomers that form a heterodimer. The
simplest version of this problem is to rigidly re-dock the bound forms of the
monomers, which involves generating computational models of the heterodimer and
then scoring them to determine the most native-like models. Scoring functions
have been assessed previously using rank- and classification-based metrics;
however, these methods are sensitive to the number and quality of models in the
scoring function training set. We assess the accuracy of seven PPI scoring
functions by comparing their scores to a measure of structural similarity to
the X-ray crystal structure (i.e., the DockQ score) for a non-redundant set of
heterodimers from the Protein Data Bank. For each heterodimer, we generate
re-docked models uniformly sampled over DockQ and calculate the Spearman
correlation between the PPI scores and DockQ. For some targets, the scores and
DockQ are highly correlated; however, for many targets, there are weak
correlations. Several physical features can explain the difference between
difficult- and easy-to-score targets. For example, strong correlations exist
between the score and DockQ for targets with highly intertwined monomers and
many interface contacts. We also develop a new score based on only three
physical features that matches or exceeds the performance of current PPI
scoring functions. These results emphasize that PPI prediction can be improved
by focusing on correlations between the PPI score and DockQ and incorporating
more discriminating physical features into PPI scoring functions.
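
The evaluation protocol described in the abstract (sample re-docked models uniformly over DockQ for one target, then compute the Spearman correlation between the PPI score and DockQ) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the equal-width binning of DockQ on [0, 1], the bin count, and the per-bin sample size are all assumptions made for the example, and the sign convention of the score (whether lower or higher is better) depends on the scoring function being assessed.

```python
import random
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def sample_uniform_over_dockq(models, n_bins=10, per_bin=5, seed=0):
    """Pick up to per_bin models from each equal-width DockQ bin on [0, 1].

    models is a list of (dockq, score) pairs. The binning scheme and the
    default parameters are assumptions for illustration only.
    """
    rng = random.Random(seed)
    bins = [[] for _ in range(n_bins)]
    for dockq, score in models:
        index = min(int(dockq * n_bins), n_bins - 1)  # DockQ == 1.0 goes in the last bin
        bins[index].append((dockq, score))
    chosen = []
    for members in bins:
        chosen.extend(rng.sample(members, min(per_bin, len(members))))
    return chosen
```

For a single target, one would call sample_uniform_over_dockq on the full pool of (DockQ, score) pairs and then pass the sampled scores and DockQ values to spearman. Sampling per bin keeps the correlation from being dominated by the many low-DockQ models that docking runs typically produce, which is the sensitivity to model-set composition that the abstract attributes to rank- and classification-based metrics.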