生物网络分析中并行自适应采样算法的发展

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI:10.1109/IPDPSW.2012.90

K. Dempsey, K. Duraisamy, S. Bhowmick, H. Ali

{"title":"生物网络分析中并行自适应采样算法的发展","authors":"K. Dempsey, K. Duraisamy, S. Bhowmick, H. Ali","doi":"10.1109/IPDPSW.2012.90","DOIUrl":null,"url":null,"abstract":"The availability of biological data in massive scales continues to represent unlimited opportunities as well as great challenges in bioinformatics research. Developing innovative data mining techniques and efficient parallel computational methods to implement them will be crucial in extracting useful knowledge from this raw unprocessed data, such as in discovering significant cellular subsystems from gene correlation networks. In this paper, we present a scalable combinatorial sampling technique, based on identifying maximum chordal sub graphs, that reduces noise from biological correlation networks, thereby making it possible to find biologically relevant clusters from the filtered network. We show how selecting the appropriate filter is crucial in maintaining the key structures from the original networks and uncovering new ones after removing noisy relationships. We also conduct one of the first comparisons in two important sensitivity criteria - the perturbation due to the vertex numbers of the network and perturbations due to data distribution. We demonstrate that our chordal-graph based filter is effective across many different vertex permutations, as is our parallel implementation of the sampling algorithm.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"The Development of Parallel Adaptive Sampling Algorithms for Analyzing Biological Networks\",\"authors\":\"K. Dempsey, K. Duraisamy, S. Bhowmick, H. Ali\",\"doi\":\"10.1109/IPDPSW.2012.90\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The availability of biological data in massive scales continues to represent unlimited opportunities as well as great challenges in bioinformatics research. Developing innovative data mining techniques and efficient parallel computational methods to implement them will be crucial in extracting useful knowledge from this raw unprocessed data, such as in discovering significant cellular subsystems from gene correlation networks. In this paper, we present a scalable combinatorial sampling technique, based on identifying maximum chordal sub graphs, that reduces noise from biological correlation networks, thereby making it possible to find biologically relevant clusters from the filtered network. We show how selecting the appropriate filter is crucial in maintaining the key structures from the original networks and uncovering new ones after removing noisy relationships. We also conduct one of the first comparisons in two important sensitivity criteria - the perturbation due to the vertex numbers of the network and perturbations due to data distribution. We demonstrate that our chordal-graph based filter is effective across many different vertex permutations, as is our parallel implementation of the sampling algorithm.\",\"PeriodicalId\":378335,\"journal\":{\"name\":\"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum\",\"volume\":\"49 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-05-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW.2012.90\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2012.90","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 14

摘要

在生物信息学研究中，大规模生物数据的可用性继续代表着无限的机会，同时也面临着巨大的挑战。开发创新的数据挖掘技术和有效的并行计算方法来实现这些技术，对于从这些原始的未处理数据中提取有用的知识至关重要，例如从基因相关网络中发现重要的细胞子系统。在本文中，我们提出了一种基于识别最大弦子图的可扩展组合采样技术，该技术减少了来自生物相关网络的噪声，从而使从过滤网络中找到生物相关簇成为可能。我们展示了如何选择合适的滤波器对于保持原始网络中的关键结构以及在去除噪声关系后发现新结构至关重要。我们还在两个重要的灵敏度标准中进行了第一次比较-由于网络顶点数的扰动和由于数据分布的扰动。我们证明了基于弦图的滤波器在许多不同的顶点排列上是有效的，我们的并行采样算法也是如此。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

The Development of Parallel Adaptive Sampling Algorithms for Analyzing Biological Networks

The availability of biological data in massive scales continues to represent unlimited opportunities as well as great challenges in bioinformatics research. Developing innovative data mining techniques and efficient parallel computational methods to implement them will be crucial in extracting useful knowledge from this raw unprocessed data, such as in discovering significant cellular subsystems from gene correlation networks. In this paper, we present a scalable combinatorial sampling technique, based on identifying maximum chordal sub graphs, that reduces noise from biological correlation networks, thereby making it possible to find biologically relevant clusters from the filtered network. We show how selecting the appropriate filter is crucial in maintaining the key structures from the original networks and uncovering new ones after removing noisy relationships. We also conduct one of the first comparisons in two important sensitivity criteria - the perturbation due to the vertex numbers of the network and perturbations due to data distribution. We demonstrate that our chordal-graph based filter is effective across many different vertex permutations, as is our parallel implementation of the sampling algorithm.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum

自引率

0.00%

发文量