2010 22nd International Symposium on Computer Architecture and High Performance Computing: Latest Publications

BatchQueue: Fast and Memory-Thrifty Core to Core Communication
Thomas Preud'homme, Julien Sopena, Gaël Thomas, B. Folliot
{"title":"BatchQueue: Fast and Memory-Thrifty Core to Core Communication","authors":"Thomas Preud'homme, Julien Sopena, Gaël Thomas, B. Folliot","doi":"10.1109/SBAC-PAD.2010.34","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.34","url":null,"abstract":"Sequential applications can take advantage of multi-core systems by way of pipeline parallelism to improve their performance. In such parallelism, core-to-core communication overhead is the main limit on speedup. This paper presents BatchQueue, a fast and memory-thrifty core-to-core communication system based on batch processing of whole cache lines. BatchQueue is able to send a 32-bit word of data in just 12.5 ns on a Xeon X5472 and only needs 2 full cache lines plus 3 byte-sized variables (each on a different cache line for optimal performance) to work. The characteristics of BatchQueue (high throughput and increased latency resulting from its batch processing) make it well suited for highly communicative tasks with no real-time requirements, such as monitoring.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115392317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
An Analytical Model on the Execution of Transactional Memory
Xiao Yu, Zhengyu He, Bo Hong
{"title":"An Analytical Model on the Execution of Transactional Memory","authors":"Xiao Yu, Zhengyu He, Bo Hong","doi":"10.1109/SBAC-PAD.2010.29","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.29","url":null,"abstract":"In this paper, we develop an analytical model of the execution of transactional memory (TM) systems. This model employs queuing theory to analyze the impact of an essential set of TM design parameters, such as the conflict rate, the number of checkpoints, and the implementation overhead. The model is validated via extensive experiments. To demonstrate the effectiveness of the model, we further study the performance impact of two factors. Our study shows that, for a given TM-based program, the checkpointing frequency can be carefully chosen to minimize the mean transaction completion time. Our study also demonstrates the importance of reducing implementation overhead.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115687811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Achieving Fault Tolerance on Grids with the CPPC Framework and the GridWay Metascheduler
Iván Cores, Gabriel Rodríguez, María J. Martín, P. González
{"title":"Achieving Fault Tolerance on Grids with the CPPC Framework and the GridWay Metascheduler","authors":"Iván Cores, Gabriel Rodríguez, María J. Martín, P. González","doi":"10.1109/SBAC-PAD.2010.22","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.22","url":null,"abstract":"Grids have brought a significant increase in the number of available resources that can be provided to applications. In the last decade, an important effort has been made to develop middleware that provides grids with functionalities related to application execution. However, support for fault-tolerant executions is either lacking or limited. This paper presents an experience in endowing parallel executions on grids with fault tolerance support through the integration of CPPC, a checkpointing tool for parallel applications, and GridWay, a well-known metascheduler provided with the Globus Toolkit. Since the two tools are not immediately compatible, a new architecture, called CPPC-GW, has been designed and implemented to allow for the transparent execution of CPPC applications through GridWay. The performance of the solution has been evaluated using the NAS Parallel Benchmarks. Detailed experimental results show the low overhead of the approach.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132169949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Analyzing Cache Coherence Protocols for Server Consolidation
Antonio García-Guirado, Ricardo Fernández Pascual, José M. García
{"title":"Analyzing Cache Coherence Protocols for Server Consolidation","authors":"Antonio García-Guirado, Ricardo Fernández Pascual, José M. García","doi":"10.1109/SBAC-PAD.2010.31","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.31","url":null,"abstract":"Server consolidation is commonly used today to make the most of all the cores of a chip multiprocessor by running several virtual machines (VMs) on it. Cache coherence protocols can be adapted to take advantage of such a scenario. Along these lines, Virtual Hierarchies (VHs) use two levels of cache coherence in a consolidated server. They isolate the coherence actions of each VM and improve performance by maximizing the number of memory accesses serviced by caches within the VM. In this paper we show how hierarchical protocols with no single ordering point for the requests, such as VHs in the form currently proposed, are prone to deadlocks. Besides, when memory deduplication is used, VHs cannot take advantage of it at the cache level, both because deduplicated data is reduplicated in cache, and because accesses to deduplicated data often require access, by means of broadcast, to the cache tiles used by a different VM. We analyze all these problems and propose solutions for them, showing the actual performance of these protocols and giving some insights for the future development of coherence protocols optimized for server consolidation.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130850618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Tree Projection-Based Frequent Itemset Mining on Multicore CPUs and GPUs
George Teodoro, Nathan Mariano, Wagner Meira Jr, R. Ferreira
{"title":"Tree Projection-Based Frequent Itemset Mining on Multicore CPUs and GPUs","authors":"George Teodoro, Nathan Mariano, Wagner Meira Jr, R. Ferreira","doi":"10.1109/SBAC-PAD.2010.15","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.15","url":null,"abstract":"Frequent itemset mining (FIM) is a core operation for several data mining applications, such as association rule computation, correlations, document classification, and many others, and has been extensively studied over the last decades. Moreover, databases are becoming increasingly larger, thus requiring more computing power to mine them in reasonable time. At the same time, the advances in high performance computing platforms are transforming them into hierarchical parallel environments equipped with multi-core processors and many-core accelerators, such as GPUs. Thus, fully exploiting these systems to perform FIM tasks poses a challenging and critical problem that we address in this paper. We present efficient multi-core and GPU-accelerated parallelizations of Tree Projection, one of the most competitive FIM algorithms. The experimental results show that our Tree Projection implementation scales almost linearly in a CPU shared-memory environment after careful optimizations, while the GPU versions are up to 173 times faster than the standard CPU version.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128217023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
Simultaneous Evaluation of Multiple I/O Strategies
Pilar González-Férez, J. Piernas, Toni Cortes
{"title":"Simultaneous Evaluation of Multiple I/O Strategies","authors":"Pilar González-Férez, J. Piernas, Toni Cortes","doi":"10.1109/SBAC-PAD.2010.30","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.30","url":null,"abstract":"We present a framework for simulating the performance obtained by different I/O system mechanisms and algorithms at the same time, and for dynamically turning them on and off to improve the overall system performance. A key element of this framework is the design and implementation of a virtual disk inside the Linux kernel. Our virtual disk creates a virtual block device which is able to simulate any hard drive with negligible overhead, without interfering with regular I/O requests. We describe the potential of our proposal in REDCAP, a RAM-based disk cache which is dynamically activated/deactivated according to the throughput achieved. The results show that, by using our virtual disk, REDCAP obtains its maximum possible improvements: up to 80% for workloads with some spatial locality, and the same performance as a 'normal system' for workloads with random or large sequential reads.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128938774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Towards a Peer-to-Peer Framework for Parallel and Distributed Computing
L. José, Senger Márcio Augusto de Souza, D. Foltran
{"title":"Towards a Peer-to-Peer Framework for Parallel and Distributed Computing","authors":"L. José, Senger Márcio Augusto de Souza, D. Foltran","doi":"10.1109/SBAC-PAD.2010.23","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.23","url":null,"abstract":"This paper presents a framework for developing and executing parallel and distributed applications using the peer-to-peer computing model. The framework, called P2PComp, follows the main philosophy of the pure peer-to-peer model: there is no hierarchy among the peers, all peers have the same functions, and there is no central authority server responsible for the system organization. SPMD parallel applications can be implemented by extending the framework's functionality, which includes functions for starting and monitoring processes, searching for resources, and communicating by message passing. This paper presents a detailed description of the framework and examples of its use for building and executing parallel applications. The results obtained show that the framework can be effectively used for executing computational programs in a flexible peer-to-peer environment.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128858012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Performance Issues for Parallel Implementations of Bootstrap Simulation Algorithm
R. Czekster, Paulo Fernandes, Afonso Sales, T. Webber
{"title":"Performance Issues for Parallel Implementations of Bootstrap Simulation Algorithm","authors":"R. Czekster, Paulo Fernandes, Afonso Sales, T. Webber","doi":"10.1109/SBAC-PAD.2010.28","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.28","url":null,"abstract":"The solution of state-based stochastic models is usually a demanding application, so it is a natural candidate for high-performance techniques. We are particularly interested in the speedup of Bootstrap Simulation of structured Markovian models. This approach is a quite recent development in the performance evaluation area, and it brings a considerable improvement in the accuracy of results, despite the intrinsic effect of randomness in simulation experiments. Unfortunately, Bootstrap Simulation has a higher computational cost than other alternatives. We present experiments with different options to optimize the parallel solution of Bootstrap Simulation applied to three practical examples described in the Stochastic Automata Networks (SAN) formalism. This paper's contribution lies in the discussion of theoretical implementation issues, the obtained speedup, and the actual processing and communication times for all experiments. Additionally, we suggest future work to further improve the proposed solution and discuss some interesting insights for the parallelization of similar applications.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130848021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Clock Synchronization Strategy for Minimizing Clock Variance at Runtime in High-End Computing Environments
T. Jones, G. Koenig
{"title":"A Clock Synchronization Strategy for Minimizing Clock Variance at Runtime in High-End Computing Environments","authors":"T. Jones, G. Koenig","doi":"10.1109/SBAC-PAD.2010.33","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.33","url":null,"abstract":"We present a new software-based clock synchronization scheme that provides high precision time agreement among distributed memory nodes. The technique is designed to minimize variance from a reference chimer during runtime and with minimal time-request latency. Our scheme permits initial unbounded variations in time and corrects both slow and fast chimers (clock skew). An implementation developed within the context of the MPI message passing interface is described and time coordination measurements are presented. Among our results, the mean time variance among a set of nodes improved from 20.0 milliseconds under standard Network Time Protocol (NTP) to 2.29 μsecs under our scheme.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115573538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14