{"title":"Video-Like Compression for High Efficiency Database Storage of Wireless Sensor Networks","authors":"Niang-Ying Huang, Chung-Yuan Su, Chi-Cheng Chuang, R.-I. Chang","doi":"10.1109/ICPP.2011.9","DOIUrl":"https://doi.org/10.1109/ICPP.2011.9","url":null,"abstract":"Wireless Sensor Networks (WSNs) consist of group sensor nodes which are placed in an area to monitor the changes of environment. Usually, sensing data are gathered and stored in a data server which maintains a database to organize and manage numerous of WSNs data. It allows researchers to retrieve these data for further study or analysis. Since the size of WSNs data is huge and the storage resource is limited, this database needs compression to lower the data size. In this paper, we propose a video-like compression method for high efficiency database storage of WSNs. First, the raw data are arranged according to the spatial correlation as an image frame. Then, several image frames with temporal correlation are maintained as a sequence of frames and lossless video compression is adopt for lowering the data size. Based on this idea, we also propose a data retrieve/query algorithm for parallel processing. The trade-off between space saving and query time is discussed after experiencing with real-world data. At last, we compare our proposed method to MySQL, a well-known database which compression is supported. The experimental results reveal that our method achieves over 96% of the space savings. It is over 13% more than that achieved by MySQL.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121781588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virtual Machine Provisioning Based on Analytical Performance and QoS in Cloud Computing Environments","authors":"R. Calheiros, R. Ranjan, R. Buyya","doi":"10.1109/ICPP.2011.17","DOIUrl":"https://doi.org/10.1109/ICPP.2011.17","url":null,"abstract":"Cloud computing is the latest computing paradigm that delivers IT resources as services in which users are free from the burden of worrying about the low-level implementation or system administration details. However, there are significant problems that exist with regard to efficient provisioning and delivery of applications using Cloud-based IT resources. These barriers concern various levels such as workload modeling, virtualization, performance modeling, deployment, and monitoring of applications on virtualized IT resources. If these problems can be solved, then applications can operate more efficiently, with reduced financial and environmental costs, reduced under-utilization of resources, and better performance at times of peak load. In this paper, we present a provisioning technique that automatically adapts to workload changes related to applications for facilitating the adaptive management of system and offering end-users guaranteed Quality of Services (QoS) in large, autonomous, and highly dynamic environments. We model the behavior and performance of applications and Cloud-based IT resources to adaptively serve end-user requests. To improve the efficiency of the system, we use analytical performance (queueing network system model) and workload information to supply intelligent input about system requirements to an application provisioner with limited information about the physical infrastructure. Our simulation-based experimental results using production workload models indicate that the proposed provisioning technique detects changes in workload intensity (arrival pattern, resource demands) that occur over time and allocates multiple virtualized IT resources accordingly to achieve application QoS targets.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127709147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart","authors":"Xiangyong Ouyang, R. Rajachandrasekar, Xavier Besseron, Hao Wang, Jian Huang, D. Panda","doi":"10.1109/ICPP.2011.85","DOIUrl":"https://doi.org/10.1109/ICPP.2011.85","url":null,"abstract":"Checkpoint/Restart (C/R) mechanisms have been widely adopted by many MPI libraries [1 -- 3] to achieve fault-tolerance. However, a major limitation of such mechanisms is the intensive IO bottleneck caused by the need to dump the snapshots of all processes into persistent storage. Several studies have been conducted to minimize this overhead [4, 5], but most of these proposed optimizations are performed inside specific MPI stack or check pointing library or applications, hence they are not portable enough to be applied to other MPI stacks and applications. In this paper, we propose a filesystem based approach to alleviate this checkpoint IO bottleneck. We propose a new filesystem, named Checkpoint-Restart File system (CRFS), which is a lightweight user-level filesystem based on FUSE (File system in User space). CRFS is designed with Checkpoint/Restart I/O traffic in mind to efficiently handle the concurrent write requests. Any software component using standard filesystem interfaces can transparently benefit from CRFS's capabilities. CRFS intercepts the checkpoint file write system calls and aggregates them into fewer bigger chunks which are asynchronously written to the underlying filesystem for more efficient IO. CRFS manages a ?exible internal IO thread pool to throttle concurrent IO to alleviate IO contention for better IO performance. CRFS can be mounted over any standard filesystem like ext3, NFS and Lustre. We have implemented CRFS and evaluated its performance using three popular C/R capable MPI stacks: MVAPICH2, MPICH2 and OpenMPI. Experimental results show significant performance gains for all three MPI stacks. CRFS achieves up to 5.5X speedup in checkpoint writing performance to Lustre filesystem. Similar level of improvements are also obtained with ext3 and NFS filesystems. To the best of our knowledge, this is the first such portable and light-weight filesystem designed for generic Checkpoint/Restart data.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131872227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Data Allocation for Scratch-Pad Memory on Embedded Multi-core Systems","authors":"Yibo Guo, Qingfeng Zhuge, J. Hu, Meikang Qiu, E. Sha","doi":"10.1109/ICPP.2011.79","DOIUrl":"https://doi.org/10.1109/ICPP.2011.79","url":null,"abstract":"Multi-core systems have been a popular design for high-performance embedded systems. Scratch Pad Memory (SPM), a software-controlled on-chip memory, has been widely adopted in many embedded systems due to its small area and low energy consumption. Existing data allocation algorithms either cannot achieve optimal results or take exponential time to complete. In this paper, we propose one polynomial-time algorithms to solve the data allocation problem on multi-core system with exclusive data copy. According to the experimental results, the proposed optimal data allocation method alone reduces time cost of memory accesses by 16.45% on average compared with greedy algorithm. The proposed data allocation algorithm also can reduce the energy cost significantly.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117146167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Scalable Tridiagonal Solver for GPUs","authors":"Heehoon Kim, Shengzhao Wu, Li-Wen Chang, Wen-mei W. Hwu","doi":"10.1109/ICPP.2011.41","DOIUrl":"https://doi.org/10.1109/ICPP.2011.41","url":null,"abstract":"We present the design and evaluation of a scalable tridiagonal solver targeted for GPU architectures. We observed that two distinct steps are required to solve a large tridiagonal system in parallel: 1) breaking down a problem into multiple sub problems each of which is independent of other, and 2) solving the sub problems using an efficient algorithm. We propose a hybrid method of tiled parallel cyclic reduction(tiled PCR) and thread-level parallel Thomas algorithm(p-Thomas). Algorithm transition from tiled PCR to p-Thomas is determined by input system size and hardware capability in order to achieve optimal performance. The proposed method is scalable as it can cope with various input system sizes by properly adjusting algorithmtrasition point. Our method on a NVidia GTX480 shows up to 8.3x and 49x speedups over multithreaded and sequential MKL implementations on a 3.33GHz Intel i7 975 in double precision, respectively.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"55 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129438464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kernel Assisted Collective Intra-node MPI Communication among Multi-Core and Many-Core CPUs","authors":"Teng Ma, G. Bosilca, Aurélien Bouteiller, Brice Goglin, J. Squyres, J. Dongarra","doi":"10.1109/ICPP.2011.29","DOIUrl":"https://doi.org/10.1109/ICPP.2011.29","url":null,"abstract":"Shared memory is among the most common approaches to implementing message passing within multicorenodes. However, current shared memory techniques donot scale with increasing numbers of cores and expanding memory hierarchies -- most notably when handling large data transfers and collective communication. Neglecting the underlying hardware topology, using copy-in/copy-out memory transfer operations, and overloading the memory subsystem using one-to-many types of operations are some of the most common mistakes in today's shared memory implementations. Unfortunately, they all negatively impact the performance and scalability of MPI libraries -- and therefore applications. In this paper, we present several kernel-assisted intra-node collective communication techniques that address these three issues on many-core systems. We also present a new OpenMPI collective communication component that uses the KNEMLinux module for direct inter-process memory copying. Our Open MPI component implements several novel strategies to decrease the number of intermediate memory copies and improve data locality in order to diminish both cache pollution and memory pressure. Experimental results show that our KNEM-enabled Open MPI collective component can outperform state-of-art MPI libraries (Open MPI and MPICH2) on synthetic benchmarks, resulting in a significant improvement for a typical graph application.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128275489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OCL-BodyScan: A Case Study for Application-centric Programming of Many-Core Processors","authors":"M. Raskovic, A. Varbanescu, W. Vlothuizen, M. Ditzel, H. Sips","doi":"10.1109/ICPP.2011.89","DOIUrl":"https://doi.org/10.1109/ICPP.2011.89","url":null,"abstract":"Application development for many-core processors is predominately hardware-centric: programmers design, implement, and optimize applications for a pre-chosen target platform. While this approach may deliver very good performance, it lacks portability, being inefficient for applications that aim to use multiple architectures or large-scale parallel platforms with heterogeneous many-core nodes. In this work, we focus on application portability. Therefore, we propose an application-centric approach for developing parallel workloads for many-cores, and we make use of OpenCL to preserve portability until the very last optimization stages. We validate our application-centric approach using 3D body scan, a data intensive application with soft real-time constraints. Thus, we design and implement OCL-body scan (the portable OpenCL-based version of 3D Body scan), and we evaluate its performance on three families of platforms - general purpose multi-cores, graphical processing units, and the Cell/B.E.. Our experiments show that our application-centric strategy enables portability and leads to good performance results. Additionally, typical platform-specific optimizations can be applied in the final implementation stages, leading to performance results similar to those obtained using the native tool-chains.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122351313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GSNP: A DNA Single-Nucleotide Polymorphism Detection System with GPU Acceleration","authors":"Mian Lu, Jiuxin Zhao, Qiong Luo, Bingqiang Wang, Shaohua Fu, Zhe Lin","doi":"10.1109/ICPP.2011.51","DOIUrl":"https://doi.org/10.1109/ICPP.2011.51","url":null,"abstract":"We have developed GSNP, a software package with GPU acceleration, for single-nucleotide polymorphism detection on DNA sequences generated from second-generation sequencing equipment. Compared with SOAPsnp, a popular, high-performance CPU-based SNP detection tool, GSNP has several distinguishing features: First, we design a sparse data representation format to reduce memory access as well as branch divergence. Second, we develop a multipass sorting network to efficiently sort a large number of small arrays on the GPU. Third, we compute a table of frequently used scores once to avoid repeated, expensive computation and to reduce random memory access. Fourth, we apply customized compression schemes to the output data to improve the I/O performance. As a result, on a server equipped with an Intel Xeon E5630 2.53 GHZ CPU and an NVIDIA Tesla M2050 GPU, it took GSNP about two hours to analyze a whole human genome dataset whereas the CPU-based, single-threaded SOAPsnp took three days for the same task on the same machine.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134145190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implications of Merging Phases on Scalability of Multi-core Architectures","authors":"M. Manivannan, B. Juurlink, P. Stenström","doi":"10.1109/ICPP.2011.74","DOIUrl":"https://doi.org/10.1109/ICPP.2011.74","url":null,"abstract":"Amdahl's Law dictates that in parallel applications serial sections establish an upper limit on the scalability. Asymmetric chip multiprocessors with a large core in addition to several small cores have been advocated for recently as a promising design paradigm because the large core can accelerate the execution of serial sections and hence mitigate the scalability bottlenecks due to large serial sections. This paper studies the scalability of a set of data mining workloads that have negligible serial sections. The formulation of Amdahl's Law, that optimistically assumes constant serial sections, estimates these workloads to scale to hundreds of cores in a chip multiprocessor (CMP). However the overhead in carrying out merging (or reduction) operations makes scalability to peak at lesser number. We establish this by extending theAmdahl's speedup model to factor in the impact of reduction operations on the speedup of applications on symmetric as well as asymmetric CMP designs. Our analytical model estimates that asymmetric CMPs with one large and many tiny cores are only optimal for applications with a low reduction overhead. However, as the overhead starts to increase, the balance is shifted towards using fewer but more capable cores. This eventually limits the performance advantage of asymmetric over symmetric CMPs.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133431028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Eager Meets Lazy: The Impact of Write-Buffering on Hardware Transactional Memory","authors":"A. Negi, J. Gil, M. Acacio, José M. García, P. Stenström","doi":"10.1109/ICPP.2011.63","DOIUrl":"https://doi.org/10.1109/ICPP.2011.63","url":null,"abstract":"Hardware transactional memory (HTM) systems have been studied extensively along the dimensions of speculative versioning and contention management policies. The relative performance of several designs policies has been discussed at length in prior work within the framework of scalable chip-multiprocessing systems. Yet, the impact of simple structural optimizations like write-buffering has not been investigated and performance deviations due to the presence or absence of these optimizations remains unclear. This lack of insight into the effective use and impact of these interfacial structures between the processor core and the coherent memory hierarchy forms the crux of the problem we study in this paper. Through detailed modeling of various write-buffering configurations we show that they play a major role in determining the overall performance of a practical HTM system. Our study of both eager and lazy conflict resolution mechanisms in a scalable parallel architecture notes a remarkable convergence of the performance of these two diametrically opposite design points when write buffers are introduced and used well to support the common case. Mitigation of redundant actions, fewer invalidations on abort, latency-hiding and prefetch effects contribute towards reducing execution times for transactions. Shorter transaction durations also imply a lower contention probability, thereby amplifying gains even further. The insights, related to the interplay between buffering mechanisms, system policies and workload characteristics, contained in this paper clearly distinguish gains in performance to be had from write-buffering from those that can be ascribed to HTM policy. We believe that this information would facilitate sound design decisions when incorporating HTMs into parallel architectures.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115609581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}