Parallel, scalable, memory-efficient backtracking for combinatorial modeling of large-scale biological systems

2008 IEEE International Symposium on Parallel and Distributed Processing. Published 2008-04-14. DOI: 10.1109/IPDPS.2008.4536180

Byung-Hoon Park, Matthew C. Schmidt, K. Thomas, T. Karpinets, N. Samatova

Citations: 3

Abstract

Data-driven modeling of biological systems such as protein-protein interaction networks is data-intensive and combinatorially challenging. Backtracking can constrain a combinatorial search space. Yet, its recursive nature, exacerbated by data-intensity, limits its applicability for large-scale systems. Parallel, scalable, and memory-efficient backtracking is a promising approach. Parallel backtracking suffers from unbalanced loads. Load rebalancing via synchronization and data movement is prohibitively expensive. Balancing these discrepancies, while minimizing end-to-end execution time and memory requirements, is desirable. This paper introduces such a framework. Its scalability and efficiency, demonstrated on the maximal clique enumeration problem, are attributed to the proposed: (a) representation of search tree decomposition to enable parallelization; (b) depth-first parallel search to minimize memory requirement; (c) least stringent synchronization to minimize data movement; and (d) on-demand work stealing with stack splitting to minimize processors' idle time. The applications of this framework to real biological problems related to bioethanol production are discussed.
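The abstract names depth-first search and on-demand work stealing with stack splitting but does not spell them out. The sketch below is NOT the authors' framework. It uses the classic Bron–Kerbosch backtracking for maximal clique enumeration (without pivoting) and stores each subproblem as a self-contained (R, P, X) frame on an explicit stack. It then simulates work stealing deterministically in a single thread: an idle worker takes part of the deepest stack. The function names (`expand`, `split_stack`, `enumerate_with_stealing`) and the "donate the bottom half" splitting rule are hypothetical choices made for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A frame is a self-contained Bron-Kerbosch subproblem (R, P, X):
# R = current clique, P = candidates that can extend R, X = vertices already tried.
Frame = Tuple[FrozenSet[int], FrozenSet[int], FrozenSet[int]]


def expand(frame: Frame, adj: Dict[int, Set[int]], out: List[FrozenSet[int]]) -> List[Frame]:
    """Process one frame: report R if it is maximal, otherwise return its child frames."""
    r, p, x = frame
    if not p and not x:
        out.append(r)  # nothing extends R and nothing was excluded, so R is maximal
        return []
    children: List[Frame] = []
    for v in sorted(p):
        nv = adj[v]
        children.append((r | {v}, p & nv, x & nv))
        p = p - {v}  # later siblings must not branch on v again ...
        x = x | {v}  # ... but must remember that v was already tried
    return children


def enumerate_sequential(adj: Dict[int, Set[int]]) -> List[FrozenSet[int]]:
    """Depth-first enumeration with an explicit stack instead of recursion."""
    out: List[FrozenSet[int]] = []
    stack: List[Frame] = [(frozenset(), frozenset(adj), frozenset())]
    while stack:
        stack.extend(expand(stack.pop(), adj, out))
    return out


def split_stack(stack: List[Frame]) -> List[Frame]:
    """Hypothetical splitting rule: donate the bottom half (frames nearest the root)."""
    k = len(stack) // 2
    donated = stack[:k]
    del stack[:k]
    return donated


def enumerate_with_stealing(adj: Dict[int, Set[int]], n_workers: int = 2) -> List[FrozenSet[int]]:
    """Round-robin simulation of workers that steal only when idle (on demand)."""
    out: List[FrozenSet[int]] = []
    stacks: List[List[Frame]] = [[] for _ in range(n_workers)]
    stacks[0].append((frozenset(), frozenset(adj), frozenset()))
    while any(stacks):
        for stack in stacks:
            if not stack:
                victim = max(stacks, key=len)  # the busiest worker donates
                if len(victim) >= 2:
                    stack.extend(split_stack(victim))
                continue
            stack.extend(expand(stack.pop(), adj, out))  # one depth-first step
    return out


if __name__ == "__main__":
    # A triangle 1-2-3 plus the edge 3-4.
    graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    print(sorted(sorted(c) for c in enumerate_with_stealing(graph)))  # prints [[1, 2, 3], [3, 4]]
```

Frames are moved between stacks rather than copied, so every subproblem is expanded exactly once and the parallel result equals the sequential one. Donating frames near the root is a common heuristic because those frames usually head the largest remaining subtrees. Whether the paper uses this rule is not stated in the abstract.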
