2016 45th International Conference on Parallel Processing (ICPP): Latest Papers, Page 4

Performance Maximization via Frequency Oscillation on Temperature Constrained Multi-core Processors

2016 45th International Conference on Parallel Processing (ICPP) Pub Date : 2016-08-01 DOI: 10.1109/ICPP.2016.67

Shi Sha, Wujie Wen, Ming Fan, Shaolei Ren, Gang Quan

Citations: 8

Sparse Matrix Format Selection with Multiclass SVM for SpMV on GPU

2016 45th International Conference on Parallel Processing (ICPP) Pub Date : 2016-08-01 DOI: 10.1109/ICPP.2016.64

Akrem Benatia, Weixing Ji, Yizhuo Wang, Feng Shi

Abstract: The Sparse Matrix-Vector Multiplication (SpMV) kernel dominates the computing cost in numerous scientific applications. Many implementations based on different sparse formats have recently been proposed for this kernel on the GPU. Since the performance of these sparse formats varies significantly with the sparsity characteristics of the input matrix and the hardware specifications, none of them can be considered the best choice for every sparse matrix. In this paper, we address the problem of selecting the best representation for a given sparse matrix on GPU by using a machine learning approach. First, we present some interesting and easy-to-compute features for characterizing sparse matrices on GPU. Second, we use a multiclass Support Vector Machine (SVM) classifier to select the best format for each input matrix. We consider four popular formats in this paper (COO, CSR, ELL, and HYB), but our work can be extended to support more sparse representations. Experimental results on two different GPUs (Fermi GTX 580 and Maxwell GTX 980 Ti) show that we achieved more than 98% of the performance possible with a perfect selection.

Citations: 53
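
To make the SpMV entry above more concrete, the following is a minimal pure-Python sketch of two of the four formats the abstract names (COO and CSR) and of SpMV on each. It is an illustration only, not the authors' GPU implementation: ELL and HYB are omitted, and the function names and the row-length statistics in `row_length_features` are hypothetical stand-ins, since the listing does not say which features the paper feeds to its SVM.

```python
from typing import Dict, List, Tuple


def dense_to_coo(a: List[List[float]]) -> Tuple[List[int], List[int], List[float]]:
    """Collect the nonzeros of a dense matrix as (row, col, value) triples, row by row."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(a):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


def coo_to_csr(n_rows: int, rows: List[int], cols: List[int],
               vals: List[float]) -> Tuple[List[int], List[int], List[float]]:
    """Build CSR arrays; assumes the COO entries are already sorted by row."""
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1          # count the nonzeros in each row
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]  # prefix sum gives the row start offsets
    return row_ptr, list(cols), list(vals)


def spmv_coo(n_rows: int, rows: List[int], cols: List[int],
             vals: List[float], x: List[float]) -> List[float]:
    """y = A x with A in COO form: one multiply-add per stored nonzero."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


def spmv_csr(row_ptr: List[int], cols: List[int],
             vals: List[float], x: List[float]) -> List[float]:
    """y = A x with A in CSR form: one dot product per row."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[cols[k]]
        y.append(acc)
    return y


def row_length_features(row_ptr: List[int]) -> Dict[str, float]:
    """Hypothetical, cheap sparsity statistics (not necessarily the paper's features)."""
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    return {
        "nnz": row_ptr[-1],
        "mean_row_nnz": sum(lengths) / len(lengths),
        "max_row_nnz": max(lengths),
    }


if __name__ == "__main__":
    a = [[4, 0, 0, 1],
         [0, 3, 0, 0],
         [0, 0, 0, 0],
         [2, 0, 5, 0]]
    x = [1, 2, 3, 4]
    rows, cols, vals = dense_to_coo(a)
    row_ptr, csr_cols, csr_vals = coo_to_csr(len(a), rows, cols, vals)
    print(spmv_coo(len(a), rows, cols, vals, x))
    print(spmv_csr(row_ptr, csr_cols, csr_vals, x))
    print(row_length_features(row_ptr))
```

Both kernels compute the same product; they differ only in how the nonzeros are stored and traversed, which is why the faster choice can depend on the matrix's sparsity pattern and on the hardware.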

AppBag: Application-Aware Bandwidth Allocation for Virtual Machines in Cloud Environment

2016 45th International Conference on Parallel Processing (ICPP) Pub Date : 2016-08-01 DOI: 10.1109/ICPP.2016.10

Dian Shen, Junzhou Luo, Fang Dong, Junxue Zhang

Abstract: It is challenging to allocate network bandwidth to virtual machines (VMs) hosting communication-intensive applications. Due to the temporal and spatial variability of the hosted applications, it is crucial to decide how much bandwidth to reserve for each VM and when to adjust it. Prior approaches typically resort to predicting the applications' network demands, according to which the VMs are placed once and for all or periodically migrated. However, recent works conceded that the network demands of applications can only be accurately derived right before each execution phase. In this paper, we propose AppBag, an Application-aware Bandwidth guarantee framework which allocates bandwidth to VMs using only one-step-ahead information. An efficient VM migration algorithm is then proposed to adjust the bandwidth allocation and the corresponding VM placement, subject to the variation of network demands in future execution phases. We further implement AppBag with OpenStack and deploy it on the testbed environment in our data center. Extensive evaluations using popular applications show that AppBag can handle bandwidth requests at run-time while improving applications' performance and reducing the global traffic in the data center fabric.

Citations: 11

RRect: A Novel Server-centric Data Center Network with High Availability

2016 45th International Conference on Parallel Processing (ICPP) Pub Date : 2016-08-01 DOI: 10.1109/ICPP.2016.12

Zhenhua Li, Yuanyuan Yang

Abstract: In this paper, we propose a novel server-centric network for data centers, called RRect. Compared to existing server-centric networks, RRect has a linear diameter to the network order and abundant parallel paths with near-equal lengths, so that traffic in RRect enjoys a short and predictable communication latency. We present an efficient routing algorithm to find paths between any pair of servers in RRect. A complete addressing scheme and a recursive RRect construction procedure are also provided in this paper. Meanwhile, to meet today's stringent high availability requirement, unlike existing server-centric network structures, RRect can be configured into a redundancy and failover scheme, in which the backup server can fully take the place of the corresponding malfunctioning server without losing topological advantages, such as multiple near-equal parallel paths. Our comprehensive simulations show that RRect gives better average path lengths and a more balanced path distribution among all pairs of servers. Meanwhile, RRect can maintain the same performance on many critical metrics as BCube, including short diameter and excellent aggregate throughput. All these features make RRect a very empirical structure for enterprise data center network products.

Citations: 2

In Situ Storage Layout Optimization for AMR Spatio-temporal Read Accesses

2016 45th International Conference on Parallel Processing (ICPP) Pub Date : 2016-08-01 DOI: 10.1109/ICPP.2016.53

Houjun Tang, S. Byna, Steve Harenberg, Wenzhao Zhang, Xiaocheng Zou, Daniel F. Martin, Bin Dong, D. Devendran, Kesheng Wu, D. Trebotich, S. Klasky, N. Samatova

Abstract: Analyses of large simulation data often concentrate on regions in space and in time that contain important information. As simulations adopt Adaptive Mesh Refinement (AMR), the data records from a region of interest could be widely scattered on storage devices, and accessing interesting regions results in significantly reduced I/O performance. In this work, we study the organization of block-structured AMR data on storage to improve the performance of spatio-temporal data accesses. AMR has a complex hierarchical multi-resolution data structure that does not fit easily with existing approaches that focus on uniform mesh data. To enable efficient AMR read accesses, we develop an in situ data layout optimization framework. Our framework automatically selects from a set of candidate layouts based on a performance model, and reorganizes the data before writing to storage. We evaluate this framework with three AMR datasets and access patterns derived from scientific applications. Our performance model is able to identify the best layout scheme and yields up to a 3x read performance improvement compared to the original layout. Though it is not possible to turn all read accesses into contiguous reads, we are able to achieve 90% of contiguous read throughput with the optimized layouts on average.

Citations: 4

Fast RFID Polling Protocols

2016 45th International Conference on Parallel Processing (ICPP) Pub Date : 2016-08-01 DOI: 10.1109/ICPP.2016.42

Jia Liu, Bin Xiao, Xuan Liu, Lijun Chen

Abstract: Polling is a widely used anti-collision protocol that interrogates RFID tags in a request-response way. In conventional polling, the reader needs to broadcast 96-bit tag IDs to separate each tag from the others, leading to long interrogation delay. This paper takes the first step to design fast polling protocols by shortening the polling vector. We first propose an efficient Hash Polling Protocol (HPP) that uses hash indices rather than tag IDs as the polling vector to query each tag. The length of the polling vector is dropped from 96 bits to no more than log(n) bits (n is the number of tags). We then enhance HPP (EHPP) to make it not only more efficient but also steadier with respect to the number of tags. To avoid redundant transmissions in both HPP and EHPP, we finally propose a Tree-based Polling Protocol (TPP) that reserves the invariant portion of the polling vector while updating only the discrepancy by constructing and broadcasting a polling tree. Theoretical analysis shows that the average length of the polling vector in TPP levels off at only 3.44 bits, about 28 times shorter than 96-bit tag IDs. We also apply our protocols to collect tag information, and simulation results demonstrate that our best protocol, TPP, outperforms the state-of-the-art information collection protocol.

Citations: 17
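
The polling-vector lengths quoted in the RFID entry above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope model under stated assumptions: it reads the abstract's log(n) as a base-2 logarithm rounded up to whole bits, it counts only the bits the reader broadcasts to address tags, and it ignores tag replies, hash seeds, and timing overheads. The function names are hypothetical and do not come from the paper.

```python
import math

TAG_ID_BITS = 96      # polling vector length in conventional polling (from the abstract)
TPP_AVG_BITS = 3.44   # average TPP polling vector length reported in the abstract


def hpp_index_bits_upper_bound(n_tags: int) -> int:
    """Upper bound on the HPP polling vector length, assuming ceil(log2(n)) bits."""
    if n_tags < 1:
        raise ValueError("n_tags must be positive")
    return max(1, math.ceil(math.log2(n_tags)))


def total_polling_bits(n_tags: int, bits_per_tag: float) -> float:
    """Bits broadcast to address every tag once, ignoring all other overheads."""
    return n_tags * bits_per_tag


if __name__ == "__main__":
    n = 1000
    print("conventional:", total_polling_bits(n, TAG_ID_BITS))
    print("HPP bound:   ", total_polling_bits(n, hpp_index_bits_upper_bound(n)))
    print("TPP average: ", total_polling_bits(n, TPP_AVG_BITS))
    print("ID bits / TPP bits:", round(TAG_ID_BITS / TPP_AVG_BITS, 1))
```

Under these assumptions, 1000 tags need at most 10 index bits each in HPP instead of 96, and the ratio 96 / 3.44 is roughly 27.9, which matches the "28 times" figure in the abstract.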

Declarative Tuning for Locality in Parallel Programs

2016 45th International Conference on Parallel Processing (ICPP) Pub Date : 2016-08-01 DOI: 10.1109/ICPP.2016.58

S. Chatterjee, Nick Vrvilo, Zoran Budimlic, K. Knobe, Vivek Sarkar

Abstract: Optimized placement of data and computation for locality is critical for improving performance and reducing energy consumption on modern computing systems. However, for most programming models, modifying data and computation placements typically requires rewriting large portions of the application, thereby posing a huge performance portability challenge in today's rapidly evolving architecture landscape. In this paper, we present TunedCnC, a novel, declarative and flexible CnC tuning framework for controlling the spatial and temporal placement of data and computation by specifying hierarchical affinity groups and distribution functions. TunedCnC emphasizes a separation of concerns: the domain expert specifies a parallel application by defining data and control dependences, while the tuning expert specifies how the application should be executed on a given architecture, defining when and where for data and computation placement. The application remains unchanged when tuned for a different platform or towards different performance goals. We evaluate the utility of TunedCnC on several applications, and demonstrate that varying the tuning specification can have a significant impact on an application's performance. Our evaluation is performed using an implementation of the Concurrent Collections (CnC) declarative parallel programming model, but our results should be applicable to tuning of other data-flow task-parallel programming models as well.

Citations: 7

On the Impact of Widening Vector Registers on Sequence Alignment

2016 45th International Conference on Parallel Processing (ICPP) Pub Date : 2016-08-01 DOI: 10.1109/ICPP.2016.65

J. Daily, A. Kalyanaraman, S. Krishnamoorthy, Bin Ren

Citations: 2

RCHC: A Holistic Runtime System for Concurrent Heterogeneous Computing

2016 45th International Conference on Parallel Processing (ICPP) Pub Date : 2016-08-01 DOI: 10.1109/ICPP.2016.31

Jinsu Park, Woongki Baek

Abstract: Concurrent heterogeneous computing (CHC) is rapidly emerging as a promising solution for high-performance and energy-efficient computing. The fundamental challenges for efficient CHC are how to partition the workload of the target application across the devices in the underlying CHC system and how to control the operating frequency of each device in order to maximize the overall efficiency. Despite the extensive prior work on system software techniques for CHC, efficient runtime support for CHC that robustly supports both functional and performance heterogeneity without the need for extensive offline profiling remains unexplored. To bridge this gap, we propose RCHC, a holistic runtime system for concurrent heterogeneous computing. RCHC dynamically profiles the target application and constructs the performance and power estimation models based on the runtime information. Guided by the estimation models, RCHC explores the system state space, determines the best system state that is expected to maximize the efficiency of the target application, and accordingly executes it. Our experimental results demonstrate that RCHC significantly outperforms the baseline version that employs the GPU (e.g., 61.0% higher energy efficiency on average) and achieves efficiency comparable with that of the static best version, which requires extensive offline profiling.

Citations: 3

A Scalability Comparison Study of Data Management Approaches for Smart Metering Systems

2016 45th International Conference on Parallel Processing (ICPP) Pub Date : 2016-08-01 DOI: 10.1109/ICPP.2016.61

Houssem-Eddine Chihoub, C. Collet

Abstract: Nowadays, more and more data are being generated and collected in electrical smart grids. Most of these data come from smart meters and sensors deployed massively throughout the power grid. As data are generated ever more frequently and in constantly increasing volumes, it is becoming harder and harder to manage and process these data at the scale of a smart grid within legacy systems. In this work, we focus on investigating the scalability and performance of different data management approaches for meter data processing. To this end, we conduct a thorough experimental study of various systems including a parallel relational database system, MapReduce-based systems including Hadoop and Spark, and a NoSQL datastore system. Our experiment sets were conducted on up to 140 nodes on Grid5000 and up to 1.4 TB of meter data. Our results demonstrate that parallel relational systems are better suited to most processing types on smart meter data in the smart grid, but at the cost of very slow data loading. In contrast, we show that with the appropriate distribution model, data partitioning and modeling choices, we achieve very fast and scalable bill computations, the main complex processing for utility providers.

Citations: 7