2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS): Latest Publications

Using surrogate-based modeling to predict optimal I/O parameters of applications at the extreme scale

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS) Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097855

Michael Matheny, Stephen Herbein, N. Podhorszki, S. Klasky, M. Taufer

Citations: 8

Achieving cost effective cloud video services via fine grained multicore scheduling

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS) Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097843

Hao-Che Kao, Hao-Ping Kang, Che-Rung Lee, Kun-Hsien Lu, Shu-Hsin Chang

Citations: 0

Combine thread with memory scheduling for maximizing performance in multi-core systems

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS) Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097821

Gangyong Jia, Guangjie Han, Liang Shi, Jian Wan, Dong Dai

Abstract: The growing gap between microprocessor speed and DRAM speed is a major problem facing computer designers. To narrow the gap, it is necessary to improve DRAM speed and throughput. Moreover, on multi-core platforms, DRAM shared by all cores usually suffers from memory contention and interference, which can cause serious performance degradation and unfairness among threads running in parallel. To address these problems, this paper proposes techniques that combine two advantages: partitioning cores, threads, and memory banks into groups to reduce interference among different groups, and grouping memory accesses to the same row together to reduce the cache miss rate. The paper proposes CTMS, a memory optimization framework that combines thread scheduling with memory scheduling and simultaneously minimizes memory access schedule length and memory access time and reduces interference, to maximize performance for multi-core systems. Experimental results show that CTMS shortens memory access time by 12.6% while improving throughput by 11.8% on average. Moreover, CTMS also saves 5.8% of the energy consumption.
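
The idea of grouping memory accesses to the same row, mentioned in the abstract above, can be illustrated with a small sketch. This is not the CTMS algorithm from the paper. The `Request` type, the `group_by_row` reordering, and the `count_row_switches` metric are hypothetical names introduced here for illustration only, and the model assumes each bank keeps exactly one row open at a time.

```python
from collections import OrderedDict
from typing import NamedTuple


class Request(NamedTuple):
    """Hypothetical memory request; the field names are illustrative assumptions."""
    thread_id: int
    bank: int
    row: int
    column: int


def group_by_row(queue):
    """Reorder a request queue so requests to the same (bank, row) are adjacent.

    Groups are emitted in order of first arrival, and requests inside a
    group keep their original relative order.
    """
    groups = OrderedDict()
    for req in queue:
        groups.setdefault((req.bank, req.row), []).append(req)
    return [req for group in groups.values() for req in group]


def count_row_switches(queue):
    """Count how often a bank must switch from its open row to a different row."""
    open_row = {}
    switches = 0
    for req in queue:
        if req.bank in open_row and open_row[req.bank] != req.row:
            switches += 1
        open_row[req.bank] = req.row
    return switches


# Two threads interleave accesses to different rows of the same bank.
queue = [
    Request(thread_id=0, bank=0, row=1, column=0),
    Request(thread_id=1, bank=0, row=2, column=0),
    Request(thread_id=0, bank=0, row=1, column=1),
    Request(thread_id=1, bank=0, row=2, column=1),
]
print(count_row_switches(queue))                # interleaved order
print(count_row_switches(group_by_row(queue)))  # grouped order
```

In this toy queue the interleaved order forces a row switch on every request after the first, while the grouped order switches rows only once. A real scheduler would also have to weigh fairness and request age, which this sketch ignores.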

Citations: 4

Building a large-scale direct network with low-radix routers

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS) Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097830

Yong Su, Zheng Cao, Zhiguo Fan, Zhan Wang, Xiaoli Liu, Xiaobing Liu, Li Qiang, Xuejun An, Ninghui Sun

Citations: 1

Heterogeneous CPU-GPU computing for the finite volume method on 3D unstructured meshes

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS) Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097808

J. Langguth, Xing Cai

Abstract: A recent trend in modern high-performance computing environments is the introduction of accelerators such as GPUs and Xeon Phi, i.e., specialized computing devices that are optimized for highly parallel applications and coexist with CPUs. In regular compute-intensive applications with predictable data access patterns, these devices often outperform traditional CPUs by far and thus relegate them to pure control functions instead of computations. For irregular applications, however, the gap in relative performance can be much smaller, and is sometimes even reversed. Thus, maximizing overall performance in such systems requires making full use of all available computational resources. In this paper we study the attainable performance of the cell-centered finite volume method on 3D unstructured tetrahedral meshes using heterogeneous systems consisting of CPUs and multiple GPUs. Finite volume methods are widely used numerical strategies for solving partial differential equations. The advantages of using finite volumes include built-in support for conservation laws and suitability for unstructured meshes. Our focus lies in demonstrating how a workload distribution that maximizes overall performance can be derived from the actual performance attained by the different computing devices in the heterogeneous environment. We also highlight the dual role of partitioning software in reordering and partitioning the input mesh, thus giving rise to a new combined approach to partitioning.
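
One simple way to derive a workload distribution from measured device performance, as the abstract above describes in general terms, is to assign mesh cells in proportion to each device's throughput so that all devices finish at roughly the same time. The sketch below assumes exactly that proportional model; it is not the method from the paper, and `split_workload` and the device names are hypothetical.

```python
import math
from fractions import Fraction


def split_workload(total_cells, throughputs):
    """Split total_cells across devices in proportion to measured throughput.

    throughputs maps a device name to its measured rate (for example,
    cells processed per second). Exact rational arithmetic avoids
    floating-point rounding surprises when flooring the shares.
    """
    rates = {name: Fraction(rate) for name, rate in throughputs.items()}
    total_rate = sum(rates.values())
    if total_rate <= 0:
        raise ValueError("at least one device must have positive throughput")

    # Round every share down, then hand the few leftover cells
    # to the fastest devices so the shares sum to total_cells.
    shares = {
        name: math.floor(total_cells * rate / total_rate)
        for name, rate in rates.items()
    }
    leftover = total_cells - sum(shares.values())
    fastest_first = sorted(rates, key=rates.get, reverse=True)
    for name in fastest_first[:leftover]:
        shares[name] += 1
    return shares


# Hypothetical measurements: one CPU and two GPUs.
shares = split_workload(1000, {"cpu": 1.0, "gpu0": 4.0, "gpu1": 5.0})
print(shares)
```

With these assumed rates the CPU still receives a tenth of the cells instead of sitting idle, which matches the abstract's point that every available device should contribute.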

Citations: 8

Performance analysis of HPC applications with irregular tree data structures

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS) Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097837

A. Khawaja, Jiajun Wang, A. Gerstlauer, L. John, D. Malhotra, G. Biros

Abstract: Adaptive mesh refinement (AMR) numerical methods utilizing octree data structures are an important class of HPC applications, in particular for the solution of partial differential equations. Much effort goes into implementing efficient versions of these types of programs, where the emphasis is often on increasing multi-node performance when utilizing GPUs and coprocessors. By contrast, our analysis aims to characterize these workloads on traditional CPUs, as we believe that single-threaded intra-node performance of critical kernels is still a key factor for achieving performance at scale. Irregular workloads such as AMR methods in particular, however, exhibit severe underutilization on general-purpose processors. In this paper, we analyze the single-core performance of two state-of-the-art, highly scalable adaptive mesh refinement codes, one based on the Fast Multipole Method (FMM) and one based on the Finite Element Method (FEM), when running on an x86 CPU. We examined both scalar and vectorized implementations to identify performance bottlenecks. We demonstrate that vectorization can provide a significant benefit in achieving high performance. The greatest bottleneck to peak performance is the high fraction of non-floating-point instructions in the kernels.

Citations: 1

ArPat: Accurate RFID reader positioning with mere boundary tags

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS) Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097901

Guanglian Liu, Shigeng Zhang, Jianxin Wang, Xuan Liu

Abstract: Radio Frequency IDentification (RFID) technology provides a promising solution to location discovery in indoor environments. Existing RFID reader positioning algorithms usually use all the collected reference tags to determine the position of the target reader, and thus are time-consuming as well as susceptible to communication irregularity between the reader and the reference tags. In particular, they usually perform poorly when the target reader is near a wall or in a corner. In this paper, we propose ArPat, an Accurate RFID reader Positioning algorithm that uses mere boundary reference Tags to calculate the position of the reader. Using only boundary tags effectively mitigates the negative impact of communication irregularity on localization accuracy. ArPat achieves localization accuracy better than 0.2 ft when the spacing between reference tags is 1 ft. Compared with state-of-the-art solutions for RFID reader positioning, ArPat improves localization accuracy by up to 42 percent, and by 36 percent on average. Furthermore, it uses a geometric approach rather than the iterative optimization approaches employed by previous solutions, making it superior in time efficiency: the computational time of ArPat is nearly two orders of magnitude lower than that of previous solutions. This is critical for a localization system to provide real-time location discovery and tracking services.

Citations: 1

Atomic reduction based sparse matrix-transpose vector multiplication on GPUs

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS) Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097920

Yuan Tao, Yangdong Deng, Shuai Mu, Mingfa Zhu, Limin Xiao, Li Ruan, Zhibin Huang

Citations: 6

Design and analysis of software defined Vehicular Cyber Physical Systems

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS) Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097836

P. Duan, Chao Peng, Qin Zhu, Jingmin Shi, Haibin Cai

Abstract: VCPS (Vehicular Cyber Physical Systems) is a special kind of networked cyber physical system in which each vehicle is regarded as a communication unit. In VCPS, vehicle movement is restricted by roads and the environment, so the traditional random mobility model and waypoint mobility model cannot reflect realistic vehicle traces. Because vehicles move at high speed, the network topology changes drastically all the time, which greatly undermines the stability of communication between vehicles. The diversity and complexity of traffic scenarios in VCPS also increase the difficulty of designing an efficient and stable routing protocol. In this paper, we combine SDN (Software Defined Networking) with VCPS and propose a new VCPS communication architecture, SD-VCPS, which enables VCPS to be managed by a remote controller. SD-VCPS can flexibly change routing policies depending on the traffic scene or traffic period, adjusting the topology of VCPS to adapt to different network requirements. We further present a new location-based routing protocol for SD-VCPS, and corroborate the efficiency of our proposed framework by experiments using the network simulator NS3.

Citations: 7

Providing hybrid block storage for virtual machines using object-based storage

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS) Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097803

Sixiang Ma, Hao-peng Chen, Yuxi Shen, Heng Lu, Bin Wei, P. He

Abstract: This paper presents the design, implementation, and evaluation of a multi-tiered storage system called MOBBS, which provides hybrid block storage for Virtual Machines (VMs) on top of object-based storage infrastructure. MOBBS is mainly motivated by the gap between the lack of studies on hybrid block storage for VMs and the increasing prevalence of hybrid storage systems. By stripping disk images into partitions and intelligently storing them on different storage tiers according to real-time workload patterns, MOBBS achieves efficient use of multiple storage devices and relieves the burden of data placement. Leveraging the benefits of object-based storage, MOBBS is able to dynamically perform non-disruptive and fine-grained data migration between storage tiers and distribute the complexity of data migration across all storage nodes. Such designs enable our system to deliver storage for VMs with high scalability and availability while using SSDs efficiently. We evaluated a Ceph implementation of MOBBS using both block and file system workloads. The results demonstrate MOBBS's effectiveness in improving performance as well as its efficient utilization of different storage devices.

Citations: 8