19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07): Latest Articles

A Multigrid-Schwarz Method for the Solution of Hydrodynamics and Heat Transfer Problems in Unstructured Meshes

19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07) Pub Date : 2007-11-19 DOI: 10.1109/SBAC-PAD.2007.20

Guilherme Galante, Rogério Luís Rizzi, T. A. Diverio

Citations: 0

Performance Analysis and Linear Optimization Modeling of All-to-all Collective Communication Algorithms

19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07) Pub Date : 2007-11-19 DOI: 10.1109/SBAC-PAD.2007.25

Hyacinthe Nzigou Mamadou, T. Nanri, K. Murakami, Guilherme de Melo Baptista Domingues

Abstract: The performance of collective communication operations still represents a critical issue for high performance computing systems. Users of parallel machines need to have a good grasp of how different communication patterns and styles affect the performance of message-passing applications. This paper reports our contribution to the analysis of collective communication algorithms in the context of the MPI programming paradigm by extending a standard point-to-point communication model, P-LogP. We focus on MPI_Alltoall, since this function is one of the most communication-intensive collective operations known. To reduce the gap between the predicted and the measured run time, all the system parameters are also taken into account in the total performance estimation by applying linear regression modeling to the empirical data. Results on InfiniBand clusters show that the final performance prediction models can accurately capture the entire system communication behavior of all algorithms, even for large messages and large numbers of processors.

Citations: 2

A Scalable Parallel Deduplication Algorithm

19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07) Pub Date : 2007-11-19 DOI: 10.1109/SBAC-PAD.2007.32

W. Santos, Thiago Teixeira, Carla Machado, Wagner Meira Jr, R. Ferreira, Dorgival Olavo Guedes Neto, A. D. Silva

Abstract: The identification of replicas in a database is fundamental to improving the quality of the information. Deduplication is the task of identifying replicas in a database that refer to the same real-world entity. This process is not always trivial, because data may be corrupted during gathering, storage, or even manipulation. Problems such as misspelled names, data truncation, data input in a wrong format, lack of conventions (like how to abbreviate a name), missing data, or even fraud may lead to the insertion of replicas in a database. The deduplication process may be very hard, if not impossible, to perform manually, since actual databases may have hundreds of millions of records. In this paper, we present our parallel deduplication algorithm, called FER-APARDA. By using probabilistic record linkage, we were able to successfully detect replicas in synthetic datasets with more than 1 million records in about 7 minutes using a 20-computer cluster, achieving an almost linear speedup. We believe that our results have no equivalent in the literature when it comes to the size of the data set and the processing time.

Citations: 21

Register File Energy Optimization for Snooping Based Clustered VLIW Architectures

19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07) Pub Date : 2007-11-19 DOI: 10.1109/SBAC-PAD.2007.35

Rahul Nagpal, Y. Srikant

Abstract: Frequent accesses to the register file make it one of the major sources of energy consumption in ILP architectures. The large number of functional units connected to a large unified register file in VLIW architectures makes power dissipation in the register file even worse because of the need for a large number of ports. High power dissipation in the relatively small area occupied by a register file leads to a high power density in the register file and makes it one of the prime hot-spots. This makes it highly susceptible to the possibility of a catastrophic heatstroke. This in turn impacts performance and cost because of the need for periodic cool-down and for sophisticated packaging and cooling techniques, respectively. Clustered VLIW architectures partition the register file among clusters of functional units and reduce the number of ports required, thereby reducing the power dissipation. However, we observe that the aggregate accesses to register files in clustered VLIW architectures (and the associated energy consumption) become very high compared to centralized VLIW architectures, and this can be attributed to a large number of explicit inter-cluster communications. Snooping based clustered VLIW architectures provide a very limited but very fast way of inter-cluster communication by allowing some of the functional units to directly read some of the operands from the register files of some of the other clusters. In this paper, we propose instruction scheduling algorithms that exploit the limited snooping capability to reduce the register file energy consumption on average by 12% and 18% and improve the overall performance by 5% and 11% for a 2-clustered and a 4-clustered machine respectively, over an earlier state-of-the-art clustered scheduling algorithm when evaluated in the context of snooping based clustered VLIW architectures.

Citations: 5

Impacts of Multiprocessor Configurations on Workloads in Bioinformatics

19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07) Pub Date : 2007-11-19 DOI: 10.1109/SBAC-PAD.2007.30

Youfeng Wu, M. Breternitz, V. Ying

Abstract: Bioinformatics is among the most active research areas in computer science. In this study, we investigate a suite of workloads in bioinformatics on two multiprocessor systems with different configurations, and examine the effects of the configurations on the performance of the workloads. Our results indicate that the configurations of the multiprocessor systems have a significant impact on the performance and scalability of the workloads. For example, a number of workloads have significantly higher scalability on one of the systems, but poorer absolute performance than on the other system. However, traditional scalability failed to capture the impacts of the system configurations on the workloads. We present insights on what kinds of workloads will run faster on which systems and propose new metrics to capture the impacts of multiple processor configurations on the workloads. These findings not only provide an easy way to compare results running on different systems, but also enable re-configuration of the underlying systems to run specific workloads efficiently. We also show how processor mapping and loop spreading may help map the workloads to the underlying multiprocessor configuration and achieve consistent scalability for these workloads.

Citations: 0

Queue Register File Optimization Algorithm for QueueCore Processor

19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07) Pub Date : 2007-11-19 DOI: 10.1109/SBAC-PAD.2007.10

A. Canedo, B. Abderazek, M. Sowa

Citations: 13

Fault-tolerance in filter-labeled-stream applications

19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07) Pub Date : 2007-11-19 DOI: 10.1109/SBAC-PAD.2007.31

Bruno Coutinho, Dorgival Olavo Guedes Neto, Wagner Meira Jr, R. Ferreira

Abstract: Fault tolerance is a desirable feature in distributed high-performance systems, since applications tend to run for long periods of time and faults become more likely as the number of nodes in the system increases. However, most distributed environments lack any fault-tolerant features, since they tend to be hard to implement and use, and often hurt performance dramatically. In this paper we discuss how we successfully added fault tolerance to the Anthill distributed programming environment by using an application-level checkpoint/rollback solution. The programming model offers an abstraction where the programmer can easily identify points during the execution where the communication pattern is well defined, forming a consistent cut where checkpoints may be saved consistently without requiring extra communication, avoiding any domino effect during recovery from faults. We present the new abstractions for fault tolerance, describe how the solution was implemented, and present performance results that show the efficiency of the solution with both regular and irregular applications.

Citations: 0

Multi-level Parallelism in the Computational Modeling of the Heart

19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07) Pub Date : 2007-11-19 DOI: 10.1109/SBAC-PAD.2007.19

C. R. Xavier, R. S. Oliveira, V. D. F. Vieira, R. D. Santos, Wagner Meira Jr

Abstract: Computational modeling of the heart has proven to be a useful tool for the investigation and comprehension of the complex biophysical processes that underlie cardiac function. Unfortunately, large-scale simulations, such as those resulting from the discretization of an entire heart, remain a computational challenge. In order to reduce simulation execution times, parallel implementations have traditionally exploited data parallelism via numerical schemes based on domain decomposition. However, it has been verified that the parallel efficiency of these implementations severely degrades as the number of processors increases. In this work, we propose and implement a new parallel algorithm for the solution of cardiac models. By relaxing the coherence of the execution, a new level of parallelism could be identified and exploited: pipelining. A synchronous parallel algorithm that uses both pipelining and data decomposition techniques was implemented using the MPI library for communication. Numerical tests were performed on an 8-node Linux cluster. Our preliminary results indicate that the proposed algorithm is able to increase the parallel efficiency by up to 20% when compared to the traditional approach that uses pure data-level parallelism. In addition, the numerical precision was kept under control (relative errors under 4%) when the relaxed coherence execution was adopted.

Citations: 6

A Selector of Grid Resources based on the Semantic Integration of Multiple Ontologies

19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07) Pub Date : 2007-11-19 DOI: 10.1109/SBAC-PAD.2007.16

A. C. Silva, M. Dantas

Citations: 7

High-Level Service Connectors for Component-Based High Performance Computing

19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07) Pub Date : 2007-11-19 DOI: 10.1109/SBAC-PAD.2007.34

Francisco Heron de Carvalho Junior, Ricardo C. Corrêa, G. A. Araújo, Jefferson de Carvalho Silva, R. Lins

Citations: 3