Quantified Uncertainty of Flexible Protein-Protein Docking Algorithms

Workshop on Algorithms in Bioinformatics. Pub Date: 2019-06-24. DOI: 10.4230/LIPIcs.WABI.2019.3

Nathan L. Clement



Abstract

The strength or weakness of an algorithm is ultimately governed by the confidence of its result. When the problem domain is large (e.g., traversal of a high-dimensional space), a perfect solution cannot be obtained, so approximations must be made. These approximations often lead to a reported quantity of interest (QOI) that varies between runs, decreasing the confidence of any single run. When the algorithm further computes this final QOI from uncertain or noisy data, the variability (or lack of confidence) of the final QOI increases. Left unbounded, these two sources of uncertainty (algorithmic approximations and uncertainty in input data) can result in a reported statistic that has low correlation with ground truth.

In biological applications this concern is especially relevant: the search space is generally approximated at least to some degree (e.g., a high percentage of protein structures are invalid or energetically unfavorable), and the explicit conversion from continuous to discrete space for protein representation implies some uncertainty in the input data. This research applies uncertainty quantification techniques to the difficult protein-protein docking problem, first showing the variability present in existing software and then providing a method for computing probabilistic certificates in the form of Chernoff-like bounds. Finally, the paper leverages these probabilistic certificates to accurately bound the uncertainty in docking from two docking algorithms, providing a QOI that is both robust and statistically meaningful.
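To make the idea of a probabilistic certificate concrete, here is a minimal Python sketch. It is not the paper's method: it uses Hoeffding's inequality, one standard Chernoff-type bound, and it assumes the QOI from each run is an independent sample confined to a known interval. The function `toy_docking_score` is a hypothetical stand-in for a stochastic docking run, and all numeric settings are illustrative.

```python
import math
import random


def hoeffding_half_width(n_runs, lower, upper, delta):
    """Half-width t with P(|sample mean - true mean| >= t) <= delta.

    Follows from Hoeffding's inequality for n_runs independent samples
    bounded in [lower, upper]: P(...) <= 2 * exp(-2 n t^2 / (upper - lower)^2).
    """
    return (upper - lower) * math.sqrt(math.log(2.0 / delta) / (2.0 * n_runs))


def runs_needed(half_width, lower, upper, delta):
    """Smallest number of runs for which the bound guarantees the given half-width."""
    span_sq = (upper - lower) ** 2
    return math.ceil(span_sq * math.log(2.0 / delta) / (2.0 * half_width ** 2))


def toy_docking_score(rng):
    """Hypothetical noisy QOI in [0, 1]; NOT a real docking program."""
    return min(1.0, max(0.0, rng.gauss(0.6, 0.1)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
scores = [toy_docking_score(rng) for _ in range(200)]
mean_score = sum(scores) / len(scores)
half_width = hoeffding_half_width(len(scores), 0.0, 1.0, delta=0.05)

print(f"mean QOI over {len(scores)} runs: {mean_score:.3f}")
print(f"with probability >= 0.95 the true mean lies within +/- {half_width:.3f}")
print(f"runs needed for +/- 0.05: {runs_needed(0.05, 0.0, 1.0, delta=0.05)}")
```

The bound depends only on the number of runs, the range of the QOI, and the chosen failure probability `delta`, so it can be stated before any run is made. It is typically loose; tighter certificates need further assumptions, such as a bound on the variance.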
