2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing: Latest Papers

A Performance Isolation Analysis of Disk-Intensive Workloads on Container-Based Clouds
Miguel G. Xavier, Israel C. De Oliveira, F. Rossi, Robson D. Dos Passos, Kassiano J. Matteussi, C. Rose
DOI: https://doi.org/10.1109/PDP.2015.67
Published: 2015-03-04
Abstract: The growing number of Cloud computing customers has led Cloud providers to adopt resource-sharing solutions to meet the rising demand for infrastructure resources. As resource sharing and consolidation have arguably become well established in Cloud computing, the ability of the underlying virtualization systems to prevent performance interference between customers must also be understood. Container-based virtualization systems, such as LXC, are the basis of the next generation of Cloud computing and have become the most popular solution under PaaS/IaaS Cloud platforms with the rise of Docker, an open platform for developers and sysadmins to build, ship, and run distributed applications. Such platforms have attracted wide attention because they leverage container-based virtualization to offer high scalability with low performance overhead. However, performance may degrade if customers' workloads are consolidated onto the same hardware and the isolation layer does not properly isolate the shared resources. Performance isolation is an inherent concern of such systems because of the way they are designed, and it remains an open research topic; its consequences may influence adoption on shared Cloud computing platforms, where Quality of Service is a crucial factor that cannot be disregarded. In this paper we analyze the performance interference suffered by disk-intensive workloads within heavily perturbed (noisy) containers, in which different hardware components are stressed. Our results show workload combinations whose performance degradation reaches up to 38%; in contrast, we also expose a workload-balanced scenario in which performance suffers no interference.
Citations: 63
An Adaptive, Low Restrictive and Fault Resilient Routing Algorithm for 3D Network-on-Chip
R. Salamat, M. Ebrahimi, N. Bagherzadeh
DOI: https://doi.org/10.1109/PDP.2015.91
Published: 2015-03-04
Abstract: The cost and reliability issues of TSVs move 3D-NoCs toward heterogeneous designs with a limited number of TSVs. However, designing a deadlock-free routing algorithm for such heterogeneous architectures is extremely challenging because 3D designs increase the possibilities of forming cycles between and within layers. In this paper, we target the design of a routing algorithm for heterogeneous 3D-NoCs that is capable of working at the technical limit in which there is just one TSV in the network. This algorithm is lightweight and provides adaptivity by using only one virtual channel along the Y dimension.
Citations: 7
Reliability Analysis of Highly Redundant Distributed Storage Systems with Dynamic Refuging
Hiroaki Akutsu, K. Ueda, Takeru Chiba, Tomohiro Kawaguchi, Norio Shimozono
DOI: https://doi.org/10.1109/PDP.2015.32
Published: 2015-03-04
Abstract: In recent data centres, large-scale storage systems storing big data comprise thousands of large-capacity drives. Our goal is to establish a method for building highly reliable storage systems using more than a thousand low-cost large-capacity drives. Some large-scale storage systems protect data by erasure coding to prevent data loss. As the redundancy level of erasure coding is increased, the probability of data loss decreases, but the cost of normal data write operations and the additional storage required for coding increase. We therefore need to achieve high reliability at the lowest possible redundancy level. There are two concerns regarding reliability in large-scale storage systems: (i) as the number of drives increases, systems are more subject to multiple drive failures, and (ii) distributing stripes among many drives can shorten the rebuild time but increases the risk of data loss due to multiple drive failures. These concerns were not addressed in prior quantitative reliability studies based on realistic settings. In this work, we analyze the reliability of large-scale storage systems with distributed stripes, focusing on an effective rebuild method which we call Dynamic Refuging. Dynamic Refuging rebuilds failed storage areas starting from those with the lowest redundancy and strategically selects blocks to read for repairing lost data. We modeled the dynamically changing amount of storage at each redundancy level due to multiple drive failures, and performed reliability analysis with Monte Carlo simulation using realistic drive failure characteristics. When stripes with redundancy level 3 were sufficiently distributed and rebuilt by Dynamic Refuging, we found that the probability of data loss decreased by two orders of magnitude for systems with 384 or more drives compared to normal RAID. This technique turned out to scale well, and a system with 1536 inexpensive drives attained a lower data loss probability than RAID 6 with 16 enterprise-class drives.
Citations: 5
Performance Evaluation of Parallel HEVC Strategies
Georgios Georgakarakos, Leonidas Tsiopoulos, J. Lilius, Joakim Haldin, U. Falk
DOI: https://doi.org/10.1109/PDP.2015.61
Published: 2015-03-04
Abstract: Parallel video coding has emerged from the need to map video algorithms onto many/multi-core architectures and achieve ever-growing performance goals in video-based applications. Several parallelization methods have been proposed around the H.264 algorithm, but it was not until the new HEVC video standard that two parallelization strategies, Tiles and Wavefront Parallel Processing (WPP), became part of the specification. Effective selection and usage of Tiles or WPP is an open issue. In this paper we evaluate the performance of both strategies in terms of video decoding speed-up, including their correlation with additional optimization possibilities such as parallel filtering and low-level SIMD operations.
Citations: 6
MACRON: The NoC-Based Many-Core Parallel Processing Platform and Its Applications in 4G Communication Systems
X. Ling, Yiou Chen, Zhiliang Yu, Shih-Hsiang Chen, Xiaodong Wang, Gui Liang
DOI: https://doi.org/10.1109/PDP.2015.86
Published: 2015-03-04
Abstract: The increasing demand for computation capacity has made many-core parallel processing (MPP) a compelling choice for computation-intensive applications. The network-on-chip (NoC) architecture is an effective way to interconnect dozens of processing cores, but the logic circuits and the actual performance need to be verified on a specific platform. We proposed and implemented the MACRON platform to provide verification for complicated applications based on the NoC architecture by closely coordinating the software tool and the hardware devices. In MACRON, a virtual output queue with look-ahead routing is proposed to reduce the transmission delay through the NoC router. Heterogeneous processing elements (a vector processor core, a scalar processor core, and an accelerator core) are designed so that fully 'soft' signal processing can be approached. A real-time 4G wireless communication system based on NoC is demonstrated on the MACRON platform.
Citations: 3
Row Tables: Design Choices to Exploit Bank Locality in Multiprogram Workloads
Paula Navarro, Vicent Selfa, J. Sahuquillo, M. E. Gómez, Crispín Gómez Requena
DOI: https://doi.org/10.1109/PDP.2015.100
Published: 2015-03-04
Abstract: Main memory is a major performance bottleneck in current chip multiprocessors. Current DRAM banks latch the last accessed row in an internal buffer, the row buffer (RB), which allows fast subsequent accesses to that row. This throughput-oriented approach was originally designed for single-thread processors and aims to take advantage of the spatial locality that individual applications exhibit. This paper proposes row tables, a pool of row buffers shared among threads. Row buffers are dynamically allocated to threads depending on the needs of each thread. Two design approaches are devised that differ in the table location: BRT (Bank Row Table) places the table at the bank, as traditionally done in existing modules, and CRT (Controller Row Table) places it at the memory controller side. CRT performs better than BRT in applications (or mixes) with high RB locality but performs worse in applications with poor RB locality, since the increase in transfer times is not later amortized. A variant of CRT, referred to as CRT 1/x, has been devised to reduce this performance penalty. Results for a 4-core system show that, on average, the BRT and CRT 1/x mechanisms save energy by 23% and 7%-16% (depending on the value of x) and improve IPC by 10% and 9%-14%, respectively.
Citations: 2
Experiences of Using Cassandra for Molecular Dynamics Simulations
R. Hernandez, C. Cugnasco, Y. Becerra, J. Torres, E. Ayguadé
DOI: https://doi.org/10.1109/PDP.2015.43
Published: 2015-03-04
Abstract: In response to the requirements of applications that work with large amounts of data, various NoSQL databases have appeared to deal specifically with these challenges. These systems have become popular in environments such as data analytics and OLTP; however, these are not the only data-intensive applications that can benefit from these databases. In the life sciences domain, many applications still use flat files as a medium to store data, and they are very limited in terms of scalability and performance, as well as by code complexity. We present an analysis of the viability of using these databases for applications whose data demands differ in some characteristics from those these systems were originally designed for. When using these databases, we also observe that the design of the data model, the queries, and other configuration parameters can have a considerable impact on performance; we therefore present examples of different data and system configurations to analyse their effects on performance. The executions presented in this paper show performance gaps of a factor of up to almost 5 between different models, queries, and configuration parameters.
Citations: 5
Enhancing and Evaluating the Configuration Capability of a Skeleton for Irregular Computations
Carlos H. Gonzalez, B. Fraguela
DOI: https://doi.org/10.1109/PDP.2015.41
Published: 2015-03-04
Abstract: Although skeletons largely facilitate the parallelization of algorithms, they often provide little support for work decomposition. Also, while they have been widely applied to regular computations, this has not been the case for irregular algorithms that can exploit amorphous data-parallelism, whose parallelization in fact requires much more effort from programmers and thus benefits more from a structured approach. In this paper we improve and evaluate the configurability of a recently proposed skeleton that allows parallelizing this latter kind of algorithm. Namely, the skeleton makes it easy to change critical details such as the data structures, the work partitioning algorithm, or the task granularity to use. The simple procedures to choose among these possibilities and their influence on performance are described and evaluated. We conclude that the skeleton makes it convenient to explore different possibilities for the parallelization of irregular applications, which can result in substantial performance improvements.
Citations: 1
Revealing Potential Performance Improvements by Utilizing Hybrid Work-Sharing for Resource-Intensive Seismic Applications
P. Siegl, R. Buchty, Mladen Berekovic
DOI: https://doi.org/10.1109/PDP.2015.28
Published: 2015-03-04
Abstract: Heterogeneous system architectures are becoming more and more of a commodity in the scientific community. While it remains challenging to fully exploit such architectures, the benefits in performance and hybrid speed-up from using a host processor and accelerators in parallel in a non-monolithic manner are significant. At the same time, energy efficiency is becoming an increasingly critical challenge for future high-performance computing (HPC) systems that aim to exceed the Exascale barrier, with several competing architecture concepts ranging from high-performance CPUs combined with GPUs acting as floating-point accelerators to computationally weak CPUs paired with dedicated, highly performant FPGA-based accelerators. In this paper, we realize and evaluate a hybrid computing approach based on a two-dimensional seismic streaming algorithm on several heterogeneous system architectures, including conventional HPC approaches based on powerful CPUs and GPUs. Furthermore, we describe the effort on an embedded system platform claiming to be a "mini supercomputer" [1]. Several CPU and accelerator combinations are utilized in a manual work-sharing manner with the aim of achieving significant performance speed-ups and providing a detailed energy-efficiency study. Based on roofline models and experimental evaluations, the paper provides insight into the fact that hybrid computing is mostly unconditionally beneficial for balanced systems with regard to both performance and energy efficiency, aiding the programmer in deciding whether costly, manually tuned, homogeneous implementations are worthwhile.
Citations: 2
Selecting Points of Interest in Traces Using Patterns of Events
François Trahay, E. Brunet, Mohamed Said Mosli Bouksiaa, Jianwei Liao
DOI: https://doi.org/10.1109/PDP.2015.30
Published: 2015-03-04
Abstract: Over the past few years, the architecture of supercomputing platforms has evolved towards more complexity: multicore processors attached to multiple memory banks are now combined with accelerators. Exploiting such architectures often requires mixing programming models (MPI + CUDA, for instance). As a result, understanding the performance of an application has become tedious. The use of performance analysis tools, such as tracing tools, is now unavoidable when optimizing a parallel application. However, analyzing a trace file composed of millions of events requires a tremendous amount of work in order to spot the cause of an application's poor performance. In this paper, we propose mechanisms for assisting application developers in their exploration of trace files. We propose an algorithm for detecting repetitive patterns of events in trace files. Thanks to this algorithm, a trace can be viewed as loops and groups of events instead of the usual representation as a sequential list of events. We also propose a method to filter traces in order to eliminate duplicated information and to highlight points of interest. These mechanisms allow the performance analysis tool to pre-select the subsets of the trace that are more likely to contain useful information. We implemented the proposed mechanisms in the EZTrace performance analysis framework, and the experiments show that detecting patterns in various benchmarking applications is done in reasonable time, even when the trace contains millions of events. We also show that the filtering process can reduce the quantity of information in the trace that the user has to analyze by up to 99%.
Citations: 10