2016 45th International Conference on Parallel Processing (ICPP): Latest Papers

EchoLoc: Accurate Device-Free Hand Localization Using COTS Devices
2016 45th International Conference on Parallel Processing (ICPP) Pub Date: 2016-08-01 DOI: 10.1109/ICPP.2016.45
Huijie Chen, Fan Li, Yu Wang
Abstract: Hand tracking systems are becoming increasingly popular as a fundamental HCI approach. The trajectory of a moving hand can be estimated by smoothing the position coordinates collected from continuous localization, so hand localization is a key component of any hand tracking system. This paper presents EchoLoc, which locates the human hand by leveraging the speaker array in Commercial Off-The-Shelf (COTS) devices (i.e., a smartphone plugged into a stereo speaker). EchoLoc measures the distance from the hand to the speaker array via the Time of Flight (TOF) of a chirp. The speaker array and the hand form a unique triangle; therefore, the hand can be localized with triangular geometry. We prototype EchoLoc as an iOS application and find that it localizes the hand to within five centimeters in 73% of cases and within three centimeters in 48% of cases.
Citations: 10
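The triangulation step described in the EchoLoc abstract can be sketched as follows. This is a minimal illustration, not EchoLoc's implementation: it assumes the two speakers of the stereo pair sit on the x-axis at -baseline/2 and +baseline/2, that the chirp's echo returns to a microphone co-located with the speakers (so the time of flight covers the path twice), and a speed of sound of 343 m/s. The helper names `tof_to_distance` and `locate_hand` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed constant)


def tof_to_distance(tof_s, round_trip=True):
    """Convert a chirp time of flight (seconds) to a hand-to-speaker distance (metres).

    Hypothetical assumption: the microphone sits next to the speaker, so the
    echo travels there and back and the one-way distance is half the path.
    """
    path = SPEED_OF_SOUND * tof_s
    return path / 2.0 if round_trip else path


def locate_hand(r_left, r_right, baseline):
    """Intersect two circles centred on the speakers at (-baseline/2, 0) and (+baseline/2, 0).

    Subtracting the two circle equations gives x directly; y follows from either
    circle. Of the two mirror-image solutions, the one in front of the array
    (y >= 0) is returned.
    """
    x = (r_left ** 2 - r_right ** 2) / (2.0 * baseline)
    y_sq = r_left ** 2 - (x + baseline / 2.0) ** 2
    if y_sq < 0:
        raise ValueError("distances are inconsistent with the speaker baseline")
    return x, math.sqrt(y_sq)
```

For a hand at (3, 4) with speakers 2 units apart, the distances are sqrt(32) and sqrt(20), and `locate_hand(math.sqrt(32), math.sqrt(20), 2.0)` recovers approximately (3.0, 4.0).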
Think Global, Act Local: A Buffer Cache Design for Global Ordering and Parallel Processing in the WAFL File System
2016 45th International Conference on Parallel Processing (ICPP) Pub Date: 2016-08-01 DOI: 10.1109/ICPP.2016.51
P. Denz, Matthew Curtis-Maury, V. Devadas
Abstract: Given the enormous disparity in access speeds between main memory and storage media, modern storage servers must use highly effective buffer cache policies to meet demanding performance requirements. At the same time, these page replacement policies need to scale efficiently with ever-increasing core counts and memory sizes, which necessitates parallel buffer cache management. However, effectiveness and scalability are at odds: centralized processing does not scale with more processors, and parallel policies are difficult to implement with maximum effectiveness. We have overcome this difficulty in the NetApp Data ONTAP WAFL file system with a technique that allows global buffer prioritization while providing parallel management operations. In addition, we have extended the buffer cache to provide soft isolation of different workloads' buffer cache usage, akin to buffer cache quality of service (QoS). This paper presents the design and implementation of these extensions in the buffer cache of a high-performance commercial file system.
Citations: 7
ROP: Alleviating Refresh Overheads via Reviving the Memory System in Frozen Cycles
2016 45th International Conference on Parallel Processing (ICPP) Pub Date: 2016-08-01 DOI: 10.1109/ICPP.2016.26
Ping-Hsiu Huang, Wenjie Liu, Kun Tang, Xubin He, Ke Zhou
Abstract: DRAM performs periodic refreshes to prevent data loss due to charge leakage, but these refreshes degrade performance and consume energy; together these costs are referred to as refresh overheads. In this paper, we propose Refresh-Oriented Prefetching (ROP) to alleviate memory refresh overheads. Before a refresh starts, ROP prefetches cache lines from the to-be-refreshed rank into an added SRAM buffer, so that memory requests to a rank undergoing refresh can still be serviced rather than blocked. At the core of ROP is a probabilistic prefetch model that determines which cache lines to prefetch for a refresh based on the access patterns seen in an observational window ahead of the refresh. A Pattern Profiler collects statistics about memory traffic before and after the start of each refresh operation during a training period, and outputs two conditional probabilities that control subsequent prefetch decisions. A Prefetcher maintains a prediction table that helps identify access patterns around refresh operations; the table is updated on every access to the next-to-be-refreshed rank during the observational window and is consulted to decide which cache lines to prefetch. Extensive evaluation with SPEC CPU2006 benchmarks on DDR4 memory shows that, relative to an auto-refresh baseline, ROP improves memory performance by up to 9.2% (3.3% on average) in single-core simulations while reducing overall memory energy by up to 6.7% (3.6% on average). For 4-core multiprogrammed simulations, it increases Weighted Speedup by up to 2.22X (1.32X on average) while reducing energy by up to 48.8% (24.4% on average).
Citations: 4
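A toy version of the profiler/prefetcher split described in the ROP abstract might look like the sketch below. It is a hypothetical simplification, not ROP's model: the profiler estimates a single conditional probability (that a line touched in the observational window is touched again while its rank refreshes) instead of ROP's two, and the prediction table is a plain access counter. The class names, `threshold`, and the most-frequent-lines selection rule are our assumptions.

```python
from collections import Counter


class PatternProfiler:
    """Training phase: estimates P(line re-accessed during refresh | line accessed in window).

    Hypothetical simplification: ROP's profiler outputs two conditional probabilities.
    """

    def __init__(self):
        self.window_lines = 0  # distinct lines seen in observational windows
        self.reaccessed = 0    # of those, lines accessed again while the rank refreshed

    def observe(self, window_accesses, refresh_accesses):
        seen = set(window_accesses)
        self.window_lines += len(seen)
        self.reaccessed += len(seen & set(refresh_accesses))

    def probability(self):
        return self.reaccessed / self.window_lines if self.window_lines else 0.0


class Prefetcher:
    """Prediction table of lines touched in the to-be-refreshed rank during the window."""

    def __init__(self, buffer_lines, threshold):
        self.buffer_lines = buffer_lines  # capacity of the added SRAM buffer, in cache lines
        self.threshold = threshold        # hypothetical cut-off on the profiled probability
        self.table = Counter()

    def record(self, line):
        # The caller feeds only accesses to the rank that will be refreshed next.
        self.table[line] += 1

    def select(self, probability):
        """Called when the refresh starts: choose lines to copy into the SRAM buffer."""
        chosen = []
        if probability >= self.threshold:
            chosen = [line for line, _ in self.table.most_common(self.buffer_lines)]
        self.table.clear()  # start a fresh observational window for the next refresh
        return chosen
```

Gating on the profiled probability means prefetch bandwidth is spent only when training suggests that window accesses predict accesses during the refresh; the buffer capacity caps how many lines are copied.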
Scalable Hierarchical Polyhedral Compilation
2016 45th International Conference on Parallel Processing (ICPP) Pub Date: 2016-08-01 DOI: 10.1109/ICPP.2016.56
B. Pradelle, Benoît Meister, M. Baskaran, A. Konstantinidis, Thomas Henretty, R. Lethin
Abstract: Computers across the board, from embedded systems to future exascale machines, are increasingly designed with deeper memory hierarchies. While this opens up opportunities for improving software performance and energy efficiency, it also makes it increasingly difficult to exploit the hardware efficiently. Advanced compilation techniques are a possible solution, and among them, polyhedral compilation provides a pathway to advanced automatic parallelization and code transformations. However, the polyhedral model is known for poor scalability with respect to the number of dimensions of the polyhedra used to represent programs. Although current compilers can cope with this limitation when targeting shallow hierarchies, polyhedral optimizations often become intractable once deeper hardware hierarchies are considered. We address this problem by introducing two new operators for polyhedral compilers: focalisation and defocalisation. Applied in the compilation flow, these operators reduce the dimensionality of the polyhedra, which drastically simplifies the mathematical problems solved during compilation. We prove that the operators preserve the original program semantics, so they can be safely used in compilers. We implemented the operators in a production compiler, drastically improving its performance and scalability when targeting deep hierarchies.
Citations: 6
Criticality-Aware Partitioning for Multicore Mixed-Criticality Systems
2016 45th International Conference on Parallel Processing (ICPP) Pub Date: 2016-08-01 DOI: 10.1109/ICPP.2016.33
Jianjun Han, Xin Tao, Dakai Zhu, Hakan Aydin
Abstract: Scheduling for mixed-criticality (MC) systems, in which multiple activities with different certification requirements, and thus different criticality levels, share a hardware platform, has recently become an important research focus. Since multicore processors have emerged as the de facto platform for modern embedded systems, we propose a novel and efficient criticality-aware task partitioning algorithm (CA-TPA) for a set of periodic MC tasks running on multicore systems, using the state-of-the-art EDF-VD scheduler on each core. Our work is based on the observation that the utilizations of MC tasks at different criticality levels can vary widely; hence, when a task is allocated, its utilization contribution on different processors may differ by large margins, which can significantly affect schedulability. During partitioning, CA-TPA sorts the tasks according to their utilization contributions on individual processors. Several heuristics for balancing the workload across processors are investigated, with the objective of improving schedulability under CA-TPA. Simulation results show that CA-TPA is effective, achieving much higher schedulability ratios than classical partitioning schemes.
Citations: 8
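Partitioning with a per-core EDF-VD check, as in the setting of the CA-TPA abstract, can be sketched for dual-criticality (LO/HI) implicit-deadline tasks. Two assumptions are ours: the per-core check is the widely used sufficient EDF-VD utilization test (plain EDF if LO-task and HI-task worst-case utilizations fit, otherwise scale HI deadlines by x = U_HI^LO / (1 - U_LO^LO) and require x * U_LO^LO + U_HI^HI <= 1), and the placement rule (largest task first, onto the feasible core with the smallest resulting max(LO-mode, HI-mode) load) is a generic worst-fit stand-in, not CA-TPA's contribution-based ordering. `Task`, `edf_vd_schedulable`, and `partition` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    u_lo: float   # utilization with the LO-criticality WCET
    u_hi: float   # utilization with the HI-criticality WCET (equal to u_lo for LO tasks)
    is_hi: bool   # criticality level of the task


def edf_vd_schedulable(tasks):
    """Sufficient EDF-VD test for one core (assumed dual-criticality formulation)."""
    u_lo_lo = sum(t.u_lo for t in tasks if not t.is_hi)  # LO tasks, LO budgets
    u_hi_lo = sum(t.u_lo for t in tasks if t.is_hi)      # HI tasks, LO budgets
    u_hi_hi = sum(t.u_hi for t in tasks if t.is_hi)      # HI tasks, HI budgets
    if u_lo_lo + u_hi_hi <= 1.0:
        return True  # plain EDF with each task's own-criticality budget suffices
    if u_lo_lo >= 1.0:
        return False
    x = u_hi_lo / (1.0 - u_lo_lo)  # virtual-deadline scaling factor for HI tasks
    return x <= 1.0 and x * u_lo_lo + u_hi_hi <= 1.0


def partition(tasks, m):
    """Generic worst-fit-decreasing placement; returns per-core task lists or None."""
    cores = [[] for _ in range(m)]

    def load(core):
        lo_mode = sum(t.u_lo for t in core)
        hi_mode = sum(t.u_hi for t in core if t.is_hi)
        return max(lo_mode, hi_mode)

    for task in sorted(tasks, key=lambda t: max(t.u_lo, t.u_hi), reverse=True):
        candidates = [c for c in cores if edf_vd_schedulable(c + [task])]
        if not candidates:
            return None  # this heuristic found no feasible core for the task
        min(candidates, key=lambda c: load(c + [task])).append(task)
    return cores
```

With HI tasks A (0.3/0.6) and B (0.2/0.5) and LO tasks C (0.4) and D (0.3) on two cores, the sketch places A with D and B with C; A and B together on one core fail the test because their HI-mode utilization alone exceeds 1.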
MPI Overlap: Benchmark and Analysis
2016 45th International Conference on Parallel Processing (ICPP) Pub Date: 2016-08-01 DOI: 10.1109/ICPP.2016.37
Alexandre Denis, François Trahay
Abstract: In HPC applications, one of the major overheads compared to sequential code is communication cost. Application programmers often amortize this cost by overlapping communication with computation: they post a non-blocking MPI request, perform computation, and wait for communication completion, assuming that MPI communication will progress in the background. In this paper, we measure what really happens when trying to overlap non-blocking point-to-point communication with computation. We explain how background progression works, describe relevant test cases, identify challenges for a benchmark, and propose a benchmark suite that measures how much overlap happens in various cases. We present overlap benchmark results on a wide range of MPI libraries and hardware platforms. Finally, we classify, analyze, and explain the results using low-level traces that reveal the internal behavior of the MPI library.
Citations: 22
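The overlap pattern in the MPI Overlap abstract (post a non-blocking request such as `MPI_Isend` or `MPI_Irecv`, compute, then `MPI_Wait`) is usually scored by comparing three timings. The sketch below assumes one common scoring rule, which may differ from the metric used in the paper's benchmark suite: the fraction of the shorter phase that is hidden behind the longer one. `overlap_ratio` is a hypothetical helper; its inputs would come from timing the pattern with and without the computation.

```python
def overlap_ratio(t_comm, t_comp, t_total):
    """Share of the shorter phase hidden behind the other (assumed metric).

    t_comm:  time of the non-blocking transfer measured alone (post + wait, no compute)
    t_comp:  time of the computation measured alone
    t_total: time of post -> compute -> wait measured together
    Returns 1.0 for perfect overlap and 0.0 when the phases are fully serialized.
    """
    shorter = min(t_comm, t_comp)
    if shorter <= 0:
        raise ValueError("both phases must take positive time")
    hidden = t_comm + t_comp - t_total
    # Clamp: noise can push the combined run slightly above the serialized sum.
    return max(0.0, min(1.0, hidden / shorter))
```

A library that progresses communication only inside `MPI_Wait` would show t_total close to t_comm + t_comp and a ratio near 0, which is the kind of behavior the paper's traces are meant to expose.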
An Efficient Wireless Power Transfer System to Balance the State of Charge of Electric Vehicles
2016 45th International Conference on Parallel Processing (ICPP) Pub Date: 2016-08-01 DOI: 10.1109/ICPP.2016.44
Ankur Sarker, Chenxi Qiu, Haiying Shen, A. Gil, J. Taiber, M. Chowdhury, Jim Martin, Mac Devine, A. J. Rindos
Abstract: As an alternative form of road transportation, electric vehicles (EVs) can help reduce fossil-fuel consumption. However, EV usage is constrained by limited battery capacity. Wireless Power Transfer (WPT) can extend the driving range of EVs by charging them in motion as they drive through a wireless charging lane embedded in the road. Because the power a charging lane can supply at any one time is limited, a key problem is how to distribute that power efficiently among EVs at different penetration levels when a large number of them pass through the lane. No previous research has tackled this challenge. To address it, we propose a system that balances the State of Charge among EVs (BSoC). It consists of three components: (i) a fog-based power distribution architecture, (ii) a power scheduling model, and (iii) an efficient vehicle-to-fog communication protocol. The fog computing center collects information from EVs and schedules the power distribution; we use fog nodes close to the vehicles rather than the cloud to reduce communication latency. The power scheduling model determines the power allocated to each EV. To avoid network congestion between EVs and the fog, vehicles choose their own communication channels to communicate with local controllers. Finally, we evaluate the system through extensive simulation studies using Network Simulator-3, MATLAB, and Simulation of Urban MObility (SUMO), and the results confirm its efficiency.
Citations: 23
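One way to picture a power scheduling model that balances State of Charge, as the BSoC abstract describes, is water-filling: the lane's power budget lifts the lowest-charged vehicles first until the funded vehicles reach a common SoC level. This is our hypothetical stand-in, not the BSoC model from the paper; it ignores the fog architecture and the communication protocol, and assumes each EV reports its SoC (0 to 1), battery capacity in kWh, and maximum charging rate in kW for a slot of `slot_h` hours. `balance_soc` is a hypothetical name.

```python
def balance_soc(soc, capacity_kwh, max_kw, budget_kw, slot_h=1.0, iters=60):
    """Water-filling split of a charging lane's power budget (hypothetical model).

    Finds a target SoC `level` such that lifting every EV towards it, subject to
    each EV's charging-rate cap, uses the whole budget. Returns kW per EV.
    """
    def alloc(level):
        # Power needed to lift each EV to `level` within one slot, capped by its rate limit.
        return [min(m, max(0.0, (level - s) * c / slot_h))
                for s, c, m in zip(soc, capacity_kwh, max_kw)]

    lo, hi = min(soc), 1.0
    if sum(alloc(hi)) <= budget_kw:
        return alloc(hi)  # enough power to fill every battery or hit every rate cap
    for _ in range(iters):  # bisection on the common target SoC level
        mid = (lo + hi) / 2.0
        if sum(alloc(mid)) > budget_kw:
            hi = mid
        else:
            lo = mid
    return alloc(lo)
```

For two 50 kWh EVs at 20% and 40% SoC sharing 20 kW for one hour, the sketch gives roughly 15 kW and 5 kW, bringing both to 50%; capping the first EV at 5 kW shifts the remaining 15 kW to the second.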
Parallel Two-Dimensional Unstructured Anisotropic Delaunay Mesh Generation of Complex Domains for Aerospace Applications
2016 45th International Conference on Parallel Processing (ICPP) Pub Date: 2016-08-01 DOI: 10.1109/ICPP.2016.76
Juliette Pardue, Andrey N. Chernikov
Abstract: In this paper, we present a bottom-up approach to parallel anisotropic mesh generation, building a mesh generator from first principles. Applications focusing on high-lift design or dynamic stall, as well as numerical methods and modeling test cases, still focus on two dimensions. Our push-button parallel mesh generation approach generates high-fidelity unstructured meshes with anisotropic boundary layers for use in computational fluid dynamics. The anisotropy requirement adds complexity to a parallel meshing algorithm because computation depends on the local alignment of elements, which in turn is dictated by the geometric boundaries and the density functions. Our experimental results show 70% parallel efficiency, relative to the fastest sequential isotropic mesh generator, on 256 distributed-memory nodes.
Citations: 2
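Under the standard definition of parallel efficiency, and assuming that p counts the 256 distributed-memory nodes and that T_seq is the run time of the fastest sequential isotropic generator, the reported 70% efficiency in the mesh generation abstract corresponds to roughly a 179-fold speedup:

```latex
E = \frac{T_{\text{seq}}}{p \, T_p}, \qquad
S = \frac{T_{\text{seq}}}{T_p} = E \, p = 0.70 \times 256 = 179.2
```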
Managing I/O Interference in a Shared Burst Buffer System
2016 45th International Conference on Parallel Processing (ICPP) Pub Date: 2016-08-01 DOI: 10.1109/ICPP.2016.54
Sagar Thapaliya, P. Bangalore, J. Lofstead, K. Mohror, A. Moody
Abstract: In this work, we investigate inter-application interference in a shared Burst Buffer (BB) system. A BB is a new storage technology for HPC architectures that acts as an intermediate layer between performance-hungry HPC applications and the slow parallel file system. While the BB is meant to alleviate slow I/O in HPC systems, it is itself prone to performance degradation under interference. We observe that interference effects can become large enough to matter to the HPC system and the jobs running on it. We investigate I/O scheduling techniques as a mechanism for mitigating BB I/O interference, and our results show that scheduling techniques tuned to BBs can control interference and deliver significant performance benefits.
Citations: 33
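Why scheduling can help against the burst buffer interference described in the abstract above is visible even in a toy model. The sketch below is an assumption-laden illustration, not the paper's scheduler: jobs share one burst buffer of bandwidth `bw`, concurrent streams split it equally, and a hypothetical `penalty` factor shrinks the aggregate bandwidth as more streams interfere. It compares that shared case with serializing the jobs shortest-first. Both function names are hypothetical.

```python
def shared_completion(sizes, bw, penalty=0.0):
    """Completion times when all jobs write concurrently and split the bandwidth.

    Hypothetical interference model: with k active streams the aggregate
    bandwidth drops to bw / (1 + penalty * (k - 1)).
    """
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    times = [0.0] * len(sizes)
    t, prev = 0.0, 0.0
    for rank, i in enumerate(order):
        active = len(sizes) - rank                  # jobs still writing in this phase
        eff_bw = bw / (1.0 + penalty * (active - 1))
        t += (sizes[i] - prev) * active / eff_bw    # every active job advances by the same amount
        prev = sizes[i]
        times[i] = t
    return times


def serialized_completion(sizes, bw):
    """Completion times when the scheduler runs jobs one at a time, shortest first."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    times = [0.0] * len(sizes)
    t = 0.0
    for i in order:
        t += sizes[i] / bw
        times[i] = t
    return times
```

For two equal 100-unit jobs on bandwidth 10, fair sharing finishes both at time 20 (30 with `penalty=0.5`), whereas serialization finishes them at 10 and 20, lowering the mean completion time from 20 to 15.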
High Performance Parallel Algorithms for the Tucker Decomposition of Sparse Tensors
2016 45th International Conference on Parallel Processing (ICPP) Pub Date: 2016-08-01 DOI: 10.1109/ICPP.2016.19
O. Kaya, B. Uçar
Abstract: We investigate efficient parallelization of a class of algorithms for the well-known Tucker decomposition of general N-dimensional sparse tensors. The targeted algorithms are iterative and use the alternating least squares method. At each iteration, for each dimension of an N-dimensional input tensor, the following operations are performed: (i) the tensor is multiplied by (N - 1) matrices (TTMc step); (ii) the product is converted to a matrix; and (iii) a few leading left singular vectors of the resulting matrix are computed (TRSVD step) to update one of the matrices for the next TTMc step. We propose an efficient parallelization of these algorithms for current parallel platforms with multicore nodes. We discuss a set of preprocessing steps that take all computational decisions out of the main iteration of the algorithm and provide intuitive shared-memory parallelism for the TTMc and TRSVD steps. We propose coarse-grain and fine-grain parallel algorithms for a distributed-memory environment, investigate data dependencies, and identify efficient communication schemes. We demonstrate how the computation of singular vectors in the TRSVD step can be carried out efficiently following the TTMc step. Finally, we develop a hybrid MPI-OpenMP implementation of the overall algorithm and report scalability results on up to 4096 cores on 256 nodes of an IBM BlueGene/Q supercomputer.
Citations: 55
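The TTMc and TRSVD steps listed in the Tucker decomposition abstract can be illustrated on a small sparse tensor in coordinate (COO) format. This is a sequential, pure-Python sketch under our own assumptions, not the authors' MPI-OpenMP code: `ttmc` multiplies the tensor by every factor matrix except the one for `mode` and returns the nonzero rows of the matricized result, and `leading_left_singular_vector` uses plain power iteration to obtain only the first left singular vector (TRSVD computes a few). Both function names are hypothetical.

```python
import math


def ttmc(nonzeros, factors, mode):
    """Sparse TTMc in COO form.

    nonzeros: list of (index_tuple, value); factors[m]: list of rows (I_m x R_m).
    Returns {row index along `mode`: dense row of length prod(R_m for m != mode)}.
    """
    others = [m for m in range(len(factors)) if m != mode]
    rows = {}
    for idx, val in nonzeros:
        # Kronecker product of the factor rows selected by this nonzero's indices.
        kron = [val]
        for m in others:
            row = factors[m][idx[m]]
            kron = [a * b for a in kron for b in row]
        acc = rows.setdefault(idx[mode], [0.0] * len(kron))
        for c, x in enumerate(kron):
            acc[c] += x
    return rows


def leading_left_singular_vector(rows, n_rows, iters=100):
    """Power iteration on Y Y^T, where Y is given as its nonzero rows."""
    ncols = len(next(iter(rows.values())))
    u = [1.0] * n_rows
    for _ in range(iters):
        w = [0.0] * ncols                 # w = Y^T u
        for i, r in rows.items():
            for c, x in enumerate(r):
                w[c] += u[i] * x
        u = [0.0] * n_rows                # u = Y w (rows absent from `rows` stay zero)
        for i, r in rows.items():
            u[i] = sum(a * b for a, b in zip(r, w))
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u
```

Each nonzero contributes the Kronecker product of the factor rows selected by its indices, so the TTMc work grows with the number of nonzeros rather than with the full tensor size.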