2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW): Latest Papers

The Green500 List: Year two
Wu-chun Feng, Heshan Lin
DOI: https://doi.org/10.1109/IPDPSW.2010.5470905
Published: 2010-04-19
Abstract: The Green500 turned two years old this past November at the ACM/IEEE SC|09 Conference. As part of the grassroots movement of the Green500, this paper takes a look back and reflects on how the Green500 has evolved in its second year as well as since its inception. Specifically, it analyzes trends in the Green500 and reports on the implications of these trends. In addition, based on significant feedback from the high-end computing (HEC) community, the Green500 announced three exploratory sub-lists: the Little Green500, the Open Green500, and the HPCC Green500, which are each discussed in this paper.
Citations: 11
Statistical predictors of computing power in heterogeneous clusters
R. C. Chiang, A. A. Maciejewski, A. Rosenberg, H. Siegel
DOI: https://doi.org/10.1109/IPDPSW.2010.5470869
Published: 2010-04-19
Abstract: If cluster C₁ consists of computers with a faster mean speed than the computers in cluster C₂, does this imply that cluster C₁ is more productive than cluster C₂? What if the computers in cluster C₁ have the same mean speed as the computers in cluster C₂: is the one with computers that have a higher variance in speed more productive? Simulation experiments are performed to explore these questions within a formal framework for measuring the performance of a cluster. Simulation results show that both mean speed and variance in speed (when mean speeds are equal) are typically, but not always, correlated with the performance of a cluster; these statements are quantified statistically for our simulation environments. Simulation results further show that: (1) if the mean speed of computers in cluster C₁ is faster by at least a threshold amount than the mean speed of computers in cluster C₂, then C₁ is more productive than C₂; and (2) if the computers in clusters C₁ and C₂ have the same mean speed, then C₁ is more productive than C₂ when the variance in speed of computers in cluster C₁ is higher by at least a threshold amount than the variance in speed of computers in cluster C₂.
Citations: 5
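The two threshold rules stated in this abstract amount to a simple decision procedure. The following is a minimal Python sketch, not the authors' framework: it assumes each cluster is given as a list of computer speeds, uses population variance, and treats `mean_threshold` and `var_threshold` as hypothetical placeholders, because the simulation-derived threshold values are not given in the abstract. The function name is likewise illustrative.

```python
from statistics import mean, pvariance


def c1_predicted_more_productive(speeds1, speeds2, mean_threshold, var_threshold, tol=1e-9):
    """Apply the abstract's two rules to clusters C1 and C2 given as lists of speeds.

    Returns True if either rule predicts that C1 is more productive than C2.
    False means neither rule applies; it does NOT imply that C2 is more productive.
    mean_threshold and var_threshold are hypothetical placeholders for the
    simulation-derived thresholds reported in the paper.
    """
    m1, m2 = mean(speeds1), mean(speeds2)
    # Rule (1): C1's mean speed is faster by at least the threshold amount.
    if m1 - m2 >= mean_threshold:
        return True
    # Rule (2): equal mean speeds (within tol), and C1's speed variance is
    # higher by at least the threshold amount.
    if abs(m1 - m2) <= tol:
        return pvariance(speeds1) - pvariance(speeds2) >= var_threshold
    return False
```

For example, `c1_predicted_more_productive([1, 5], [3, 3], 0.5, 1.0)` returns `True` under rule (2): both means are 3, and the population variances are 4 and 0.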
An efficient GPU implementation of the revised simplex method
Jakob Bieling, Patrick Peschlow, P. Martini
DOI: https://doi.org/10.1109/IPDPSW.2010.5470831
Published: 2010-04-19
Abstract: The computational power provided by the massive parallelism of modern graphics processing units (GPUs) has moved increasingly into focus over the past few years. In particular, general purpose computing on GPUs (GPGPU) is attracting attention among researchers and practitioners alike. Yet GPGPU research is still in its infancy, and a major challenge is to rearrange existing algorithms so as to obtain a significant performance gain from the execution on a GPU. In this paper, we address this challenge by presenting an efficient GPU implementation of a very popular algorithm for linear programming, the revised simplex method. We describe how to carry out the steps of the revised simplex method to take full advantage of the parallel processing capabilities of a GPU. Our experiments demonstrate considerable speedup over a widely used CPU implementation, thus underlining the tremendous potential of GPGPU.
Citations: 43
An adaptive I/O load distribution scheme for distributed systems
Xin Chen, J. Langston, Xubin He, Fengjiang Mao
DOI: https://doi.org/10.1109/IPDPSW.2010.5470787
Published: 2010-04-19
Abstract: A fundamental issue in a large-scale distributed system consisting of heterogeneous machines that vary in both I/O and computing capabilities is distributing workloads according to the capabilities of each node to achieve optimal performance. However, node capabilities are often unstable due to various factors, so a static workload distribution scheme may not match the capability of each node well. To address this issue, we distribute the workload adaptively as node capabilities change. In this paper we present an adaptive I/O load distribution scheme that dynamically captures the I/O capabilities of system nodes and predictively determines a suitable load distribution pattern. A case study is conducted by applying our load distribution scheme to a popular distributed file system, PVFS2. Experimental results show that our adaptive load distribution scheme can dramatically improve performance: it yields up to a 70% performance gain for writes and 80% for reads, and avoids up to 63% overall performance loss in the presence of an unstable Object Storage Device (OSD).
Citations: 2
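The abstract does not specify how node capability is captured or predicted. As an illustration only, the sketch below substitutes a common simple technique: an exponentially weighted moving average (EWMA) of each node's observed throughput, with work units assigned in proportion to the prediction. The class `AdaptiveDistributor`, the smoothing factor `alpha`, and the node names are hypothetical and are not part of PVFS2 or the paper's scheme.

```python
class AdaptiveDistributor:
    """Hypothetical sketch: split I/O work across storage nodes in proportion
    to an EWMA of each node's observed throughput."""

    def __init__(self, nodes, alpha=0.5, initial=1.0):
        self.alpha = alpha  # weight of the newest measurement (0 < alpha <= 1)
        self.estimate = {node: initial for node in nodes}

    def observe(self, node, throughput):
        """Blend a new throughput measurement into the node's running estimate."""
        old = self.estimate[node]
        self.estimate[node] = self.alpha * throughput + (1 - self.alpha) * old

    def distribute(self, total_units):
        """Assign integer work units (e.g. stripes) proportionally to the estimates."""
        total = sum(self.estimate.values())
        shares = {n: int(total_units * e / total) for n, e in self.estimate.items()}
        # Hand units lost to rounding down to the nodes with the highest estimates.
        leftover = max(0, total_units - sum(shares.values()))
        ranked = sorted(self.estimate, key=self.estimate.get, reverse=True)
        for node in ranked[:leftover]:
            shares[node] += 1
        return shares
```

With this approach, an OSD whose measured throughput drops receives fewer units on the next call to `distribute`, which is the qualitative behaviour the abstract describes for an unstable OSD.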
An empirical study of a scalable Byzantine agreement algorithm
O. Oluwasanmi, Jared Saia, Valerie King
DOI: https://doi.org/10.1109/IPDPSW.2010.5470874
Published: 2010-04-19
Abstract: A recent theoretical result by King and Saia shows that it is possible to solve the Byzantine agreement, leader election, and universe reduction problems in the full information model with Õ(n^{3/2}) total bits sent. However, this result, while theoretically interesting, is not practical due to large hidden constants. In this paper, we design a new practical algorithm based on this theoretical result. For networks containing more than about 1,000 processors, our new algorithm sends significantly fewer bits than a well-known algorithm due to Cachin, Kursawe, and Shoup. To obtain our practical algorithm, we relax the fault model compared to that of King and Saia by (1) allowing the adversary to control only a 1/8 fraction of the processors, rather than a 1/3 fraction; and (2) assuming the existence of a cryptographic bit commitment primitive. Our algorithm assumes a partially synchronous communication model in which, for some fixed Δ, any message sent from one honest player to another honest player needs at most Δ time steps to be received and processed by the recipient, and we assume that the clock speeds of the honest players are roughly the same. However, the clocks do not have to be synchronized (i.e., show the same time).
Citations: 8
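The relaxed fault model assumes a cryptographic bit commitment primitive, but the abstract does not say which construction is used. One standard choice is a hash-based commitment: publish SHA-256 of a random nonce followed by the bit, then later reveal the nonce and the bit. The sketch below shows this construction under the usual assumption that SHA-256 behaves like a random oracle; the function names `commit` and `verify` are hypothetical.

```python
import hashlib
import hmac
import secrets


def commit(bit):
    """Commit to a bit. Returns (commitment, opening); publish only the commitment."""
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    nonce = secrets.token_bytes(32)  # random nonce hides the committed bit
    opening = nonce + bytes([bit])
    return hashlib.sha256(opening).digest(), opening


def verify(commitment, opening):
    """Check a revealed opening against a commitment; return the bit, or None if invalid."""
    if len(opening) != 33 or opening[-1] not in (0, 1):
        return None
    # Constant-time comparison of the recomputed hash with the published commitment.
    if not hmac.compare_digest(hashlib.sha256(opening).digest(), commitment):
        return None
    return opening[-1]
```

A player broadcasts the 32-byte commitment first and the 33-byte opening only in a later round, so its bit cannot depend on bits revealed by other players in between.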
Massive streaming data analytics: A case study with clustering coefficients
David Ediger, Karl Jiang, E. J. Riedy, David A. Bader
DOI: https://doi.org/10.1109/IPDPSW.2010.5470687
Published: 2010-04-19
Abstract: We present a new approach for parallel massive graph analysis of streaming, temporal data with a dynamic and extensible representation. Handling the constant stream of new data from health care, security, business, and social network applications requires new algorithms and data structures. We examine data structure and algorithm trade-offs that extract the parallelism necessary for high-performance updating analysis of massive graphs. Static analysis kernels often rely on storing input data in a specific structure. Maintaining these structures for each possible kernel with high data rates incurs a significant performance cost. A case study computing clustering coefficients on a general-purpose data structure demonstrates that incremental updates can be more efficient than global recomputation. Within this kernel, we compare three methods for dynamically updating local clustering coefficients: a brute-force local recalculation, a sorting algorithm, and our new approximation method using a Bloom filter. On 32 processors of a Cray XMT with a synthetic scale-free graph of 2^24 ≈ 16 million vertices and 2^29 ≈ 537 million edges, the brute-force method processes a mean of over 50,000 updates per second, and our Bloom filter approaches 200,000 updates per second.
Citations: 71
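The local clustering coefficient of a vertex v with degree d_v and t_v triangles through it is C_v = 2 t_v / (d_v (d_v - 1)). The sketch below maintains t_v exactly under a stream of edge insertions: inserting edge (u, v) creates one new triangle for every common neighbour of u and v. It is an exact set-intersection update in plain Python, not the paper's Bloom-filter approximation, and it does not model the Cray XMT implementation. The class name is hypothetical.

```python
from collections import defaultdict


class StreamingClustering:
    """Hypothetical sketch: exact incremental local clustering coefficients
    for an undirected graph receiving a stream of edge insertions."""

    def __init__(self):
        self.adj = defaultdict(set)        # vertex -> set of neighbours
        self.triangles = defaultdict(int)  # vertex -> number of triangles through it

    def insert_edge(self, u, v):
        if u == v or v in self.adj[u]:
            return  # ignore self-loops and duplicate edges
        # Every common neighbour w closes a new triangle (u, v, w).
        common = self.adj[u] & self.adj[v]
        self.triangles[u] += len(common)
        self.triangles[v] += len(common)
        for w in common:
            self.triangles[w] += 1
        self.adj[u].add(v)
        self.adj[v].add(u)

    def local_coefficient(self, v):
        """Fraction of pairs of v's neighbours that are themselves adjacent."""
        d = len(self.adj[v])
        if d < 2:
            return 0.0
        return 2 * self.triangles[v] / (d * (d - 1))
```

Each insertion touches only u, v, and their common neighbours, which is why an incremental update can be cheaper than recomputing every coefficient after each change.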
An architectural space exploration tool for domain specific reconfigurable computing
Gayatri Mehta, A. Jones
DOI: https://doi.org/10.1109/IPDPSW.2010.5470735
Published: 2010-04-19
Abstract: In this paper, we describe a design space exploration (DSE) tool for domain-specific reconfigurable computing, where the needs of the applications drive the construction of the device architecture. The tool automates the design space case studies, allowing application developers to explore architectural trade-offs efficiently and reach solutions quickly. For our case studies, we selected some of the core signal processing benchmarks from the MediaBench benchmark suite and some edge-detection benchmarks from the image processing domain. We compare the energy consumption of the architecture selected through manual design space case studies with that of the architecture selected by the DSE tool. The architecture selected by the DSE tool consumes approximately 9% less energy on average than the best candidate from the manual case studies. Both the fabric architecture selected manually and the one selected by the tool were synthesized using IBM's 130 nm cell-based ASIC fabrication process. We compare the energy of the benchmarks implemented on the fabric with other hardware and software implementations. Both fabric architectures (manual and tool) yield energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA, and 2016X better than an Intel XScale processor.
Citations: 5
Collaborative execution environment for heterogeneous parallel systems
A. Ilic, L. Sousa
DOI: https://doi.org/10.1109/IPDPSW.2010.5470835
Published: 2010-04-19
Abstract: Nowadays, commodity computers are complex heterogeneous systems that provide a huge amount of computational power. However, to take advantage of this power, we have to orchestrate the use of processing units with different characteristics. Such distributed-memory systems rely on relatively slow interconnection networks, such as system buses. Therefore, most of the time we exploit only the central processing unit (CPU) or the processing accelerators individually, each being a simpler homogeneous subsystem. In this paper we propose a collaborative execution environment for exploiting data parallelism in a heterogeneous system. We show that this environment can be used to program both the CPU and graphics processing units (GPUs) to collaboratively compute matrix multiplication and the fast Fourier transform (FFT). Experimental results show that significant performance benefits are achieved when both the CPU and the GPU are used.
Citations: 4
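The abstract describes splitting data-parallel work such as matrix multiplication between the CPU and GPUs. A minimal sketch of that partitioning idea, under stated assumptions: each "device" is a plain Python worker thread, and the `weights` tuple stands in for hypothetical relative device throughputs (3:1 by default). Rows of the result are split into contiguous chunks in proportion to those weights. In CPython the global interpreter lock prevents this pure-Python arithmetic from actually running in parallel, so the code illustrates only the work division, not the paper's environment or its performance. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_rows(a, b, rows):
    """Compute the selected rows of the product a x b (one device's share)."""
    inner, cols = range(len(b)), range(len(b[0]))
    return {i: [sum(a[i][k] * b[k][j] for k in inner) for j in cols] for i in rows}


def split_rows(n_rows, weights):
    """Split row indices into contiguous chunks proportional to device weights."""
    total = sum(weights)
    chunks, start, acc = [], 0, 0
    for w in weights:
        acc += w
        end = round(n_rows * acc / total)  # cumulative boundary; the last one is n_rows
        chunks.append(range(start, end))
        start = end
    return chunks


def collaborative_matmul(a, b, weights=(3, 1)):
    """Compute a x b with one worker thread per hypothetical device."""
    chunks = split_rows(len(a), weights)
    result = {}
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        for part in pool.map(lambda rows: matmul_rows(a, b, rows), chunks):
            result.update(part)
    return [result[i] for i in range(len(a))]
```

Using cumulative boundaries keeps the chunks contiguous and guarantees that together they cover every row exactly once, whatever the weights.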
Multicore-aware reuse distance analysis
Derek L. Schuff, Benjamin S. Parsons, Vijay S. Pai
DOI: https://doi.org/10.1109/IPDPSW.2010.5470780
Published: 2010-04-19
Abstract: This paper presents and validates methods to extend reuse distance analysis of application locality characteristics to shared-memory multicore platforms by accounting for invalidation-based cache-coherence and inter-core cache sharing. Existing reuse distance analysis methods track the number of distinct addresses referenced between reuses of the same address by a given thread, but do not model the effects of data references by other threads. This paper shows several methods to keep reuse stacks consistent so that they account for invalidations and cache sharing, either as references arise in a simulated execution or at synchronization points. These methods are evaluated against a Simics-based coherent cache simulator running several OpenMP and transaction-based benchmarks. The results show that adding multicore-awareness substantially improves the ability of reuse distance analysis to model cache behavior, reducing the error in miss ratio prediction (relative to cache simulation for a specific cache size) by an average of 70% for per-core caches and an average of 90% for shared caches.
Citations: 55
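Per the abstract, a thread's reuse distance counts the distinct addresses it referenced between two uses of the same address. The sketch below computes that with a naive per-thread LRU stack and adds one simple form of multicore awareness: a write removes the address from other threads' stacks, so their next access is treated as a miss (infinite distance), mimicking an invalidation-based protocol with private caches. This is a hypothetical simplification; it does not model shared caches or the synchronization-point methods the paper evaluates, and the class name is illustrative.

```python
import math


class ReuseStacks:
    """Hypothetical sketch: per-thread LRU reuse stacks with write invalidation."""

    def __init__(self, n_threads):
        # One stack per thread; the most recently used address is at the end.
        self.stacks = [[] for _ in range(n_threads)]

    def access(self, tid, addr, is_write=False):
        """Record an access by thread tid; return its reuse distance (math.inf if cold)."""
        stack = self.stacks[tid]
        if addr in stack:
            pos = stack.index(addr)
            # Distinct addresses this thread touched since it last used addr.
            distance = len(stack) - 1 - pos
            stack.pop(pos)
        else:
            distance = math.inf
        stack.append(addr)
        if is_write:
            # Invalidation-based coherence: other threads' copies become stale.
            for other, s in enumerate(self.stacks):
                if other != tid and addr in s:
                    s.remove(addr)
        return distance
```

The linear scan makes each access cost O(stack depth); practical reuse-distance tools often use tree-based structures to avoid that, but the list keeps the definition easy to follow.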
GridP2P: Resource usage in Grids and Peer-to-Peer systems
Sérgio Esteves, L. Veiga, P. Ferreira
DOI: https://doi.org/10.1109/IPDPSW.2010.5470917
Published: 2010-04-19
Abstract: The last few years have witnessed huge growth in computer technology and in the resources available throughout the Internet. These resources can be used to run CPU-intensive applications requiring long periods of processing time. Grid systems allow us to take advantage of available resources spread across a network. However, these systems impose several difficulties on their use (e.g., heavy authentication and configuration management); to overcome them, Peer-to-Peer systems provide open access, making the Grid available to any user. Our solution consists of a platform for distributed cycle sharing that attempts to combine the Grid and Peer-to-Peer models. A major goal is to allow any ordinary user to use remote idle cycles in order to speed up commodity applications. Conversely, users can also contribute spare cycles of their machines when they are not using them. Our solution encompasses the following functionalities: application management, job creation and scheduling, resource discovery, security policies, and overlay network management. The simple and modular organization of this system allows components to be changed at minimal cost. In addition, the use of history-based policies provides powerful usage semantics for resource management.
Citations: 4