Detecting experimental noises in protein-protein interactions with iterative sampling and model-based clustering

Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings. Pub Date: 2003-03-10. DOI: 10.1109/BIBE.2003.1188977

Hiroshi Mamitsuka


Citations: 1

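Abstract

One of the most important issues in current molecular biology is building accurate networks of protein-protein interactions. Recently developed high-throughput experimental techniques have accumulated a vast amount of protein-protein interaction data, but it is well known that the reliability of these data has not reached a satisfactory level. In this paper we attempt to computationally detect the experimental errors, or noise, presumably contained in protein-protein interaction data, using an iterative sampling method that calls the learning of a stochastic model as a subroutine. The method alternates between two steps: selecting examples that can be regarded as non-noise, and training the component algorithm on the selected examples. Noise candidates are the examples with the smallest average likelihoods under the previously obtained stochastic models. We empirically compared the method against two other methods on both synthetic and real data sets. We examined the effect of noise and data size using medium- and large-sized synthetic data sets to which noise was added intentionally. The results on the medium-sized synthetic data sets show that the performance difference between the method and the two other methods becomes more statistically significant at higher noise ratios. Further experiments show that this finding also holds for a large-scale data set. The performance advantage of the method was further confirmed by experiments on a real protein-protein interaction data set.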
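The abstract gives no implementation details, so the following is only a minimal Python sketch of the select-then-train loop it describes, under loudly stated assumptions. The names `train` and `detect_noise` are hypothetical. The stochastic model is replaced by a stand-in (a Laplace-smoothed frequency table over pairs of protein classes), the sampling step is replaced by a deterministic ranking, and the number of noise candidates `n_noise` is assumed to be given. None of these choices is taken from the paper.

```python
from collections import Counter


def train(pairs, n_outcomes, alpha=1.0):
    """Stand-in learner (an assumption, not the paper's model):
    a Laplace-smoothed frequency table over observed pairs."""
    counts = Counter(pairs)
    total = len(pairs) + alpha * n_outcomes
    # The returned function maps a pair to its smoothed likelihood.
    return lambda pair: (counts[pair] + alpha) / total


def detect_noise(pairs, n_noise, rounds=5):
    """Return the indices of the n_noise examples with the smallest
    likelihood averaged over all models trained so far."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    n_outcomes = len(set(pairs))
    selected = list(range(len(pairs)))  # first round: keep every example
    sums = [0.0] * len(pairs)           # running likelihood sum per example
    for r in range(1, rounds + 1):
        # Step 1: train the component learner on the current selection.
        model = train([pairs[i] for i in selected], n_outcomes)
        for i, pair in enumerate(pairs):
            sums[i] += model(pair)
        # Step 2: rank all examples by average likelihood (ascending) and
        # keep everything except the n_noise lowest-ranked as non-noise.
        ranked = sorted(range(len(pairs)), key=lambda i: sums[i] / r)
        selected = ranked[n_noise:]
    return sorted(ranked[:n_noise])


# Toy data: two frequent class pairs plus two rare, suspicious ones.
examples = [("A", "B")] * 10 + [("C", "D")] * 8 + [("A", "D"), ("C", "B")]
print(detect_noise(examples, n_noise=2))
```

On this toy input the two rare pairs at the end of the list are returned as noise candidates. A real application would swap `train` for an actual stochastic model and would have to choose how many examples to discard per round.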



