{"title":"EchoLoc: Accurate Device-Free Hand Localization Using COTS Devices","authors":"Huijie Chen, Fan Li, Yu Wang","doi":"10.1109/ICPP.2016.45","DOIUrl":"https://doi.org/10.1109/ICPP.2016.45","url":null,"abstract":"Hand tracking systems are becoming increasingly popular as a fundamental HCI approach. The trajectory of a moving hand can be estimated by smoothing the position coordinates collected from continuous localization. Therefore, hand localization is a key component of any hand tracking system. This paper presents EchoLoc, which locates the human hand by leveraging the speaker array in Commercial Off-The-Shelf (COTS) devices (i.e., a smart phone plugged into a stereo speaker). EchoLoc measures the distance from the hand to the speaker array via the Time Of Flight (TOF) of a chirp. The speaker array and the hand form a unique triangle; therefore, the hand can be localized with triangular geometry. We prototype EchoLoc as an iOS application, and find that it localizes the hand within five centimeters in 73% of cases and within three centimeters in 48% of cases.","PeriodicalId":409991,"journal":{"name":"2016 45th International Conference on Parallel Processing (ICPP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123267071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Think Global, Act Local: A Buffer Cache Design for Global Ordering and Parallel Processing in the WAFL File System","authors":"P. Denz, Matthew Curtis-Maury, V. Devadas","doi":"10.1109/ICPP.2016.51","DOIUrl":"https://doi.org/10.1109/ICPP.2016.51","url":null,"abstract":"Given the enormous disparity in access speeds between main memory and storage media, modern storage servers must leverage highly effective buffer cache policies to meet demanding performance requirements. At the same time, these page replacement policies need to scale efficiently with ever-increasing core counts and memory sizes, which necessitates parallel buffer cache management. However, these requirements of effectiveness and scalability are at odds, because centralized processing does not scale with more processors, and parallel policies are a challenge to implement with maximum effectiveness. We have overcome this difficulty in the NetApp Data ONTAP WAFL file system by using a sophisticated technique that simultaneously allows global buffer prioritization while providing parallel management operations. In addition, we have extended the buffer cache to provide soft isolation of different workloads' buffer cache usage, which is akin to buffer cache quality of service (QoS). This paper presents the design and implementation of these significant extensions in the buffer cache of a high-performance commercial file system.","PeriodicalId":409991,"journal":{"name":"2016 45th International Conference on Parallel Processing (ICPP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116750350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ROP: Alleviating Refresh Overheads via Reviving the Memory System in Frozen Cycles","authors":"Ping-Hsiu Huang, Wenjie Liu, Kun Tang, Xubin He, Ke Zhou","doi":"10.1109/ICPP.2016.26","DOIUrl":"https://doi.org/10.1109/ICPP.2016.26","url":null,"abstract":"DRAM memory performs periodic refreshes to prevent data loss due to charge leakage, but memory refreshes cause performance degradation and energy consumption, referred to as refresh overheads. In this paper, we propose Refresh-Oriented Prefetching (ROP) to alleviate memory refresh overheads. Before a refresh starts, ROP prefetches cache lines from the to-be-refreshed rank into an added SRAM buffer. In doing so, when a rank is undergoing refresh, memory requests can still be serviced rather than being blocked. At the core of ROP is a probabilistic prefetch model that determines which cache lines are prefetched for a refresh, based on the access patterns appearing in an observational window ahead of the refresh. A Pattern Profiler collects statistics about memory traffic occurring before and after the starting time of each refresh operation during a training period, and outputs two conditional probabilities which are used to control subsequent prefetch decisions. A Prefetcher maintains a prediction table that helps to ascertain access patterns appearing around refresh operations. The prediction table is updated every time an access occurs to the to-be-next-refreshed rank during the observational window, and is consulted to decide which cache lines are prefetched. Extensive evaluation results with benchmarks from SPEC CPU2006 on a DDR4 memory demonstrate that with ROP, memory performance can be improved by up to 9.2% (3.3% on average) for single-core simulations, while reducing overall memory energy by up to 6.7% (3.6% on average), relative to an auto-refresh baseline memory. Moreover, it increases the Weighted Speedup by up to 2.22X (1.32X on average) for 4-core multi-programmed simulations, while reducing energy by up to 48.8% (24.4% on average).","PeriodicalId":409991,"journal":{"name":"2016 45th International Conference on Parallel Processing (ICPP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128408657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Hierarchical Polyhedral Compilation","authors":"B. Pradelle, Benoît Meister, M. Baskaran, A. Konstantinidis, Thomas Henretty, R. Lethin","doi":"10.1109/ICPP.2016.56","DOIUrl":"https://doi.org/10.1109/ICPP.2016.56","url":null,"abstract":"Computers across the board, from embedded to future exascale machines, are consistently designed with deeper memory hierarchies. While this opens up exciting opportunities for improving software performance and energy efficiency, it also makes it increasingly difficult to exploit the hardware efficiently. Advanced compilation techniques are a possible solution to this difficult problem and, among them, polyhedral compilation technology provides a pathway for performing advanced automatic parallelization and code transformations. However, the polyhedral model is also known for its poor scalability with respect to the number of dimensions in the polyhedra used to represent programs. Although current compilers can cope with this limitation when targeting shallow hierarchies, polyhedral optimizations often become intractable as soon as deeper hardware hierarchies are considered. We address this problem by introducing two new operators for polyhedral compilers: focalisation and defocalisation. When applied in the compilation flow, the new operators reduce the dimensionality of polyhedra, which drastically simplifies the mathematical problems solved during compilation. We prove that the presented operators preserve the original program semantics, allowing them to be safely used in compilers. We implemented the operators in a production compiler, which drastically improved its performance and scalability when targeting deep hierarchies.","PeriodicalId":409991,"journal":{"name":"2016 45th International Conference on Parallel Processing (ICPP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131069850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Criticality-Aware Partitioning for Multicore Mixed-Criticality Systems","authors":"Jianjun Han, Xin Tao, Dakai Zhu, Hakan Aydin","doi":"10.1109/ICPP.2016.33","DOIUrl":"https://doi.org/10.1109/ICPP.2016.33","url":null,"abstract":"Scheduling for mixed-criticality (MC) systems, where multiple activities have different certification requirements and thus different criticality on a shared hardware platform, has recently become an important research focus. In this work, considering that multicore processors have emerged as the de-facto platform for modern embedded systems, we propose a novel and efficient criticality-aware task partitioning algorithm (CA-TPA) for a set of periodic MC tasks running on multicore systems. We employ the state-of-the-art EDF-VD scheduler on each core. Our work is based on the observation that the utilizations of MC tasks at different criticality levels can have quite large variations; hence, when a task is allocated, its utilization contribution on different processors may vary by large margins, and this can significantly affect the schedulability of tasks. During partitioning, CA-TPA sorts the tasks according to their utilization contributions on individual processors. Several heuristics are investigated to balance the workload on processors with the objective of improving the schedulability of tasks under CA-TPA. The simulation results show that our proposed CA-TPA scheme is effective, giving much higher schedulability ratios when compared to classical partitioning schemes.","PeriodicalId":409991,"journal":{"name":"2016 45th International Conference on Parallel Processing (ICPP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123542310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MPI Overlap: Benchmark and Analysis","authors":"Alexandre Denis, François Trahay","doi":"10.1109/ICPP.2016.37","DOIUrl":"https://doi.org/10.1109/ICPP.2016.37","url":null,"abstract":"In HPC applications, one of the major overheads relative to sequential code is communication cost. Application programmers often amortize this cost by overlapping communication with computation. To do so, they post a non-blocking MPI request, perform computation, and wait for communication completion, assuming the MPI communication will progress in the background. In this paper, we propose to measure what really happens when trying to overlap non-blocking point-to-point communication with computation. We explain how background progression works, describe relevant test cases, identify challenges for a benchmark, and then propose a benchmark suite to measure how much overlap happens in various cases. We present overlap benchmark results on a wide panel of MPI libraries and hardware platforms. Finally, we classify, analyze, and explain the results using low-level traces to reveal the internal behavior of the MPI library.","PeriodicalId":409991,"journal":{"name":"2016 45th International Conference on Parallel Processing (ICPP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122794222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Wireless Power Transfer System to Balance the State of Charge of Electric Vehicles","authors":"Ankur Sarker, Chenxi Qiu, Haiying Shen, A. Gil, J. Taiber, M. Chowdhury, Jim Martin, Mac Devine, A. J. Rindos","doi":"10.1109/ICPP.2016.44","DOIUrl":"https://doi.org/10.1109/ICPP.2016.44","url":null,"abstract":"As an alternative form of road transportation, electric vehicles (EVs) can help reduce fossil-fuel consumption. However, the usage of EVs is constrained by limited battery capacity. Wireless Power Transfer (WPT) can increase the driving range of EVs by charging EVs in motion as they drive through a wireless charging lane embedded in a road. The amount of power that a charging lane can supply at a time is limited. The problem here is: when a large number of EVs pass a charging lane, how can the power be efficiently distributed among different penetration levels of EVs? No previous research has been devoted to tackling this challenge. To handle it, we propose a system to Balance the State of Charge (called BSoC) among the EVs. It consists of three components: i) a fog-based power distribution architecture, ii) a power scheduling model, and iii) an efficient vehicle-to-fog communication protocol. The fog computing center collects information from EVs and schedules the power distribution. We use fog, which is closer to vehicles than the cloud, in order to reduce communication latency. The power scheduling model schedules the power allocated to each EV. In order to avoid network congestion between EVs and the fog, we let vehicles choose their own communication channel to communicate with local controllers. Finally, we evaluate our system using extensive simulation studies in Network Simulator-3, MATLAB, and Simulation of Urban MObility tools, and the experimental results confirm the efficiency of our system.","PeriodicalId":409991,"journal":{"name":"2016 45th International Conference on Parallel Processing (ICPP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133248891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Two-Dimensional Unstructured Anisotropic Delaunay Mesh Generation of Complex Domains for Aerospace Applications","authors":"Juliette Pardue, Andrey N. Chernikov","doi":"10.1109/ICPP.2016.76","DOIUrl":"https://doi.org/10.1109/ICPP.2016.76","url":null,"abstract":"In this paper, we present a bottom-up approach to parallel anisotropic mesh generation by building a mesh generator from first principles. Applications focusing on high-lift design or dynamic stall, as well as numerical methods and modeling test cases, still focus on two dimensions. Our push-button parallel mesh generation approach can generate high-fidelity unstructured meshes with anisotropic boundary layers for use in the computational fluid dynamics field. The anisotropy requirement adds a level of complexity to a parallel meshing algorithm by making computation depend on the local alignment of elements, which in turn is dictated by the geometric boundaries and the density functions. Our experimental results show 70% parallel efficiency over the fastest sequential isotropic mesh generator on 256 distributed memory nodes.","PeriodicalId":409991,"journal":{"name":"2016 45th International Conference on Parallel Processing (ICPP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131406071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Managing I/O Interference in a Shared Burst Buffer System","authors":"Sagar Thapaliya, P. Bangalore, J. Lofstead, K. Mohror, A. Moody","doi":"10.1109/ICPP.2016.54","DOIUrl":"https://doi.org/10.1109/ICPP.2016.54","url":null,"abstract":"In this work, we investigate the problem of inter-application interference in a shared Burst Buffer (BB) system. A BB is a new storage technology for HPC architectures that acts as an intermediate layer between performance-hungry HPC applications and the slow parallel file system. While the BB is meant to alleviate the problem of slow I/O in HPC systems, it is itself prone to performance degradation under interference. We observe that the magnitude of interference effects can reach a level that matters to the HPC system and the jobs that run on it. We investigate I/O scheduling techniques as a mechanism to mitigate BB I/O interference. With our results, we show that scheduling techniques tuned to BBs can control interference and significant performance benefits can be achieved.","PeriodicalId":409991,"journal":{"name":"2016 45th International Conference on Parallel Processing (ICPP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123194990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Performance Parallel Algorithms for the Tucker Decomposition of Sparse Tensors","authors":"O. Kaya, B. Uçar","doi":"10.1109/ICPP.2016.19","DOIUrl":"https://doi.org/10.1109/ICPP.2016.19","url":null,"abstract":"We investigate an efficient parallelization of a class of algorithms for the well-known Tucker decomposition of general N-dimensional sparse tensors. The targeted algorithms are iterative and use the alternating least squares method. At each iteration, for each dimension of an N-dimensional input tensor, the following operations are performed: (i) the tensor is multiplied with (N - 1) matrices (TTMc step), (ii) the product is then converted to a matrix, and (iii) a few leading left singular vectors of the resulting matrix are computed (TRSVD step) to update one of the matrices for the next TTMc step. We propose an efficient parallelization of these algorithms for current parallel platforms with multicore nodes. We discuss a set of preprocessing steps that takes all computational decisions out of the main iteration of the algorithm and provides an intuitive shared-memory parallelism for the TTMc and TRSVD steps. We propose a coarse-grain and a fine-grain parallel algorithm in a distributed memory environment, investigate data dependencies, and identify efficient communication schemes. We demonstrate how the computation of singular vectors in the TRSVD step can be carried out efficiently following the TTMc step. Finally, we develop a hybrid MPI-OpenMP implementation of the overall algorithm and report scalability results on up to 4096 cores on 256 nodes of an IBM BlueGene/Q supercomputer.","PeriodicalId":409991,"journal":{"name":"2016 45th International Conference on Parallel Processing (ICPP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123619515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}