2014 43rd International Conference on Parallel Processing: Latest Literature, Page 2

NetMaster: Taming Energy Devourers on Smartphones

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.39

Yi Zhang, Yuan He, Xiaopei Wu, Yunhao Liu, Wenbo He

Citations: 4

A Case for Resource Efficient Prefetching in Multicores

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.19

Muneeb Khan, Andreas Sandberg, Erik Hagersten

Abstract: Modern processors typically employ sophisticated prefetching techniques to hide memory latency. Hardware prefetching has proven very effective and can speed up some SPEC CPU 2006 benchmarks by more than 40% when running in isolation. However, this speedup often comes at the cost of prefetching a significant volume of useless data (sometimes more than twice the data required), which wastes shared last-level cache space and off-chip bandwidth. This paper explores how an accurate, resource-efficient prefetching scheme can benefit performance by conserving shared resources in multicores. We present a framework that uses low-overhead runtime sampling and fast cache modeling to accurately identify memory instructions that frequently miss in the cache. We then use this information to automatically insert software prefetches in the application. Our prefetching scheme has good accuracy and employs cache bypassing whenever possible. These properties help reduce off-chip bandwidth consumption and last-level cache pollution. While single-thread performance remains comparable to hardware prefetching, the full advantage of the scheme is realized when several cores are used and demand for shared resources grows. We evaluate our method on two modern commodity multicores. Across 180 mixed workloads that fully utilize a multicore, the proposed software prefetching mechanism achieves up to 24% better throughput than hardware prefetching, and performs 10% better on average.

Citations: 13

Fast Parallel Algorithms for Edge-Switching to Achieve a Target Visit Rate in Heterogeneous Graphs

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.15

Md Hasanuzzaman Bhuiyan, Jiangzhuo Chen, Maleq Khan, M. Marathe

Abstract: An edge switch is an operation on a network (graph) in which two edges are selected at random and one end vertex of each is swapped with the other's. Usually, a sequence of these operations is performed to generate network perturbations that have the same degree sequence as the original network. Edge switch operations have important applications in graph theory and network analysis, such as generating random networks with a given degree sequence, modeling and analyzing dynamic networks (e.g., peer-to-peer networks), and studying various dynamic phenomena over a network (e.g., disease dynamics over a social contact network). The growth of real-world networks motivates the need for efficient parallel algorithms that perform a large sequence of edge switch operations. The dependencies among successive edge switch operations, and the requirement of keeping the graph simple (i.e., no self-loops or parallel edges) as the edges are switched, pose significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors. In this paper, we present a distributed-memory parallel algorithm for switching edges in massive networks (networks with billions of edges) and achieve a speedup factor of 85 with 1024 processors. One of the steps in our edge switch algorithm requires the computation of multinomial random variables in parallel. The paper presents the first non-trivial parallel algorithm for this problem. The algorithm achieves a speedup of 925 using 1024 processors.

Citations: 15
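The edge switch operation described in the edge-switching abstract above can be illustrated with a minimal sequential sketch. This is not the paper's distributed-memory algorithm: the function names `edge_switch` and `degrees`, the set-of-tuples graph representation, and the example graph are all assumptions made for illustration. It only shows one switch that preserves every vertex degree and rejects any result that would create a self-loop or a parallel edge.

```python
import random


def degrees(edges):
    """Return a dict mapping each vertex to its degree."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def edge_switch(edges, rng):
    """Attempt one edge switch on a simple undirected graph.

    `edges` is a set of (u, v) tuples with u < v (hypothetical representation).
    Returns True if the switch was applied, False if it was rejected because
    it would create a self-loop or a parallel edge.
    """
    e1, e2 = rng.sample(sorted(edges), 2)
    (a, b), (c, d) = e1, e2
    # Pick one of the two possible ways to pair up the end vertices.
    if rng.random() < 0.5:
        c, d = d, c
    # Replace (a, b), (c, d) with (a, d), (c, b).
    if a == d or c == b:
        return False  # would create a self-loop
    new1 = (a, d) if a < d else (d, a)
    new2 = (c, b) if c < b else (b, c)
    if new1 in edges or new2 in edges or new1 == new2:
        return False  # would create a parallel edge
    edges.difference_update((e1, e2))
    edges.update((new1, new2))
    return True


rng = random.Random(0)
graph = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)}  # a 6-cycle
before = degrees(graph)
applied = sum(edge_switch(graph, rng) for _ in range(100))
```

Rejecting an invalid switch, rather than repairing it, keeps the sketch short. The dependency between successive switches that the abstract mentions is visible here: each call reads and mutates the same edge set.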

Energy-Aware Scheduling for Aperiodic Tasks on Multi-core Processors

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.45

Dawei Li, Jie Wu

Abstract: As the performance of modern multi-core processors increases, the energy consumption of these systems also increases significantly. Dynamic Voltage and Frequency Scaling (DVFS) is considered an efficient scheme for saving energy. In this paper, we consider scheduling a set of independent aperiodic tasks, whose release times, deadlines and execution requirements are arbitrarily given, on DVFS-enabled multi-core processors. Our goal is to meet the execution requirements of all the tasks and to minimize the overall energy consumption of the processor. Instead of seeking optimal solutions with high complexity, we aim to design lightweight algorithms that are suitable for real-time systems and perform well. By applying a subinterval-based method, we derive a simple algorithm that allocates tasks' available execution times during a heavily overlapped subinterval based on their desired execution requirements during that subinterval. Based on the allocated available execution times, we further consider the final frequency setting and task scheduling, which guarantee that all tasks meet their execution requirements and try to minimize the overall energy consumption. Extensive simulations for various platform and task characteristics, and evaluations using a practical processor's power configuration, indicate that our proposed algorithm performs well in terms of saving processor energy despite its low complexity. In addition, the proposed algorithm is easy to implement in practical systems.

Citations: 8
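The subinterval-based method mentioned in the energy-aware scheduling abstract above can be pictured with a small sketch. The abstract does not define how subintervals are formed or what "heavily overlapped" means, so this sketch assumes two things: subintervals are delimited by the sorted release times and deadlines, and a subinterval is heavily overlapped when more tasks are active in it than there are cores. The function names and the `(release, deadline, work)` task tuples are hypothetical.

```python
def subintervals(tasks):
    """Split the timeline at every release time and deadline.

    `tasks` is a list of (release, deadline, work) tuples (assumed format).
    Returns a list of (start, end, active) where `active` holds the indices
    of the tasks whose [release, deadline] window covers the subinterval.
    """
    points = sorted({t for release, deadline, _ in tasks for t in (release, deadline)})
    result = []
    for start, end in zip(points, points[1:]):
        active = [
            i
            for i, (release, deadline, _) in enumerate(tasks)
            if release <= start and end <= deadline
        ]
        result.append((start, end, active))
    return result


def heavily_overlapped(tasks, cores):
    """Return the subintervals with more active tasks than cores (assumed definition)."""
    return [(s, e, active) for s, e, active in subintervals(tasks) if len(active) > cores]


tasks = [(0, 10, 4), (2, 6, 3), (4, 8, 2)]
crowded = heavily_overlapped(tasks, cores=2)
# crowded == [(4, 6, [0, 1, 2])]: three tasks compete for two cores in [4, 6].
```

Only in such crowded subintervals do tasks have to share core time, which is where an allocation rule like the one the abstract describes would apply.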

Scaling the ISAM Land Surface Model through Parallelization of Inter-component Data Transfer

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.51

P. Miller, Michael P. Robson, B. El-Masri, R. Barman, G. Zheng, Atul K. Jain, L. Kalé

Citations: 2

High-Performance Inverse Modeling with Reverse Monte Carlo Simulations

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.29

Abhinav Sarje, X. Li, A. Hexemer

Abstract: In the field of nanoparticle material science, X-ray scattering techniques are widely used for characterization of macromolecules and particle systems (ordered, partially-ordered or custom) based on their structural properties at the micro- and nano-scales. Numerous applications utilize these techniques, including design and fabrication of energy-relevant nanodevices such as photovoltaic and energy storage devices. Because of its size, the raw data obtained through present ultra-fast light beamlines and X-ray scattering detectors has made analysis a primary bottleneck in such characterization processes. To address this hurdle, we are developing high-performance parallel algorithms and codes for analysis of X-ray scattering data for several scattering methods, such as Small Angle X-ray Scattering (SAXS), which we discuss in this paper. As an inverse modeling problem, structural fitting of the raw data obtained through SAXS experiments is a method for extracting meaningful information on the structural properties of materials. Such fitting processes involve a large number of variable parameters and hence require a large amount of computational power. In this paper, we focus on this problem and present a high-performance and scalable parallel solution based on the Reverse Monte Carlo simulation algorithm, on highly parallel systems such as clusters of multicore CPUs and graphics processors. We have implemented and optimized our algorithm on generic multi-core CPUs as well as Nvidia GPU architectures with C++ and CUDA. We also present detailed performance results and computational analysis of our code.

Citations: 4
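The Reverse Monte Carlo idea named in the inverse-modeling abstract above can be sketched on a toy problem. This is not the paper's SAXS code, which is written in C++ and CUDA: here the "measurement" is a histogram of pairwise distances between particles on a one-dimensional lattice, and every name (`pair_histogram`, `chi2`, `reverse_monte_carlo`), the error function, and the acceptance rule `exp(-delta / 2)` are assumptions chosen for illustration. The loop perturbs one particle at a time and keeps the move if the fit improves, or occasionally if it gets worse.

```python
import math
import random


def pair_histogram(positions, size):
    """Histogram of pairwise distances; stands in for a measured pattern."""
    hist = [0] * size
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            hist[abs(positions[i] - positions[j])] += 1
    return hist


def chi2(a, b):
    """Sum of squared differences between two histograms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reverse_monte_carlo(target, n_particles, size, steps, rng):
    """Fit particle positions to `target`; returns (config, initial, final, best)."""
    config = [rng.randrange(size) for _ in range(n_particles)]
    error = chi2(pair_histogram(config, size), target)
    initial = best = error
    for _ in range(steps):
        i = rng.randrange(n_particles)
        old = config[i]
        config[i] = rng.randrange(size)  # propose a random move
        new_error = chi2(pair_histogram(config, size), target)
        delta = new_error - error
        if delta <= 0 or rng.random() < math.exp(-delta / 2):
            error = new_error  # accept the move
            best = min(best, error)
        else:
            config[i] = old  # reject and restore
    return config, initial, error, best


size = 16
target = pair_histogram([0, 3, 4, 9, 12, 15], size)  # hidden "true" structure
config, initial, final, best = reverse_monte_carlo(
    target, n_particles=6, size=size, steps=2000, rng=random.Random(1)
)
```

Each step recomputes the full histogram, which is the expensive part; that cost per trial move is the kind of work a parallel implementation would target.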

A Sampling-Based Hybrid Approximate Query Processing System in the Cloud

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.38

Yuxiang Wang, Junzhou Luo, Aibo Song, Fang Dong

Abstract: Sampling-based approximate query processing lets users of 'Big Data' analytical applications save time and resources when an estimated result meets their accuracy expectation well before the final exact result would be available. Online aggregation (OLA) is one such technology: it answers aggregation queries by computing approximate results with a confidence interval that tightens over time. It has been built into MapReduce-based cloud systems for big data analytics, allowing users to monitor query progress and save money by killing the computation early once sufficient accuracy has been obtained. Unfortunately, a major obstacle is that OLA estimation failure, which results from a biased sample set that violates the unbiased-sampling assumption of OLA, degrades OLA performance. To handle this problem, we first propose a hybrid approximate query processing model to improve overall OLA performance, in which a dynamic scheme-switching mechanism switches unpromising OLA queries to the bootstrap scheme for further processing, avoiding the full dataset scan that results from OLA estimation failure. In addition, we present a progressive estimation method to reduce the false positive ratio of our dynamic scheme-switching mechanism. Moreover, we have implemented our hybrid approximate query processing system in Hadoop and conducted extensive experiments on the TPC-H benchmark with skewed data distribution. Our results demonstrate that our hybrid system can produce acceptable approximate results in a time period one order of magnitude shorter than the original OLA over Hadoop.

Citations: 8
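The online aggregation behaviour described in the approximate-query abstract above, an estimate whose confidence interval tightens as more samples arrive, can be sketched for a simple AVG query. This sketch assumes unbiased, independent samples and a normal-approximation 95% interval; it does not implement the paper's hybrid switching to a bootstrap scheme or anything specific to Hadoop. The names `online_mean`, `target`, and `min_samples` are hypothetical.

```python
import math
import random


def online_mean(stream, z=1.96):
    """Yield (n, estimate, half_width) after each value, starting at n = 2.

    Uses Welford's running mean/variance so no samples need to be stored.
    `half_width` is the half-width of a normal-approximation confidence
    interval (z = 1.96 corresponds to roughly 95%).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1:
            std_err = math.sqrt(m2 / (n - 1) / n)
            yield n, mean, z * std_err


rng = random.Random(42)
data = [rng.uniform(0, 100) for _ in range(100_000)]
exact = sum(data) / len(data)

target = 1.0       # stop once the interval is within +/- 1.0
min_samples = 30   # guard against a spuriously tight interval on tiny samples
for n, estimate, half_width in online_mean(data):
    if n >= min_samples and half_width <= target:
        break  # "kill the computation early"
```

The loop stops after reading only a small prefix of the 100,000 values. If the prefix were not an unbiased sample (for example, sorted or skewed input order), the interval would be misleading, which is the estimation-failure case the abstract refers to.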

Crystal: A Design-Time Resource Partitioning Method for Hybrid Main Memory

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.18

Dmitry Knyaginin, G. Gaydadjiev, P. Stenström

Abstract: Non-Volatile Memory (NVM) technologies can be used to reduce system-level execution time, energy, or cost, but they add a new design dimension. Finding the best amounts of DRAM and NVM in hybrid main memory systems is a nontrivial design-time issue, and the best solution depends on many factors. Such resource partitioning between DRAM and NVM can be framed as an optimization problem in which the minimum of a target metric is sought and trends matter more than absolute values, so the precision of detailed modeling is overkill. Here we present Crystal, an analytic approach to early and rapid design-time resource partitioning of hybrid main memories. Crystal provides first-order estimates of system-level execution time and energy, sufficient to enable an exhaustive search for the best amount and type of NVM for given workloads and partitioning goals. Crystal thus helps system designers quickly find the most promising hybrid configurations for detailed evaluation. For example, Crystal shows how, for specific workloads, higher system-level performance and energy efficiency can be achieved by employing an NVM with the speed and energy consumption of NAND Flash instead of a much faster and more energy-efficient NVM such as phase-change memory.

Citations: 5

An Infrastructure-less Vehicle Counting without Disruption

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.61

Jie Wu, Paul Sabatino, J. Tsan, Zhen Jiang

Citations: 0

An Energy-Efficient Task Scheduler for Multi-core Platforms with Per-core DVFS Based on Task Characteristics

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.47

Ching-Chi Lin, Chao-Jui Chang, You-Cheng Syu, Jan-Jan Wu, Pangfeng Liu, Po-Wen Cheng, W. Hsu

Abstract: Energy-efficient task scheduling is a fundamental issue in many application domains, such as energy conservation for mobile devices and the operation of green computing data centers. Modern processors support dynamic voltage and frequency scaling (DVFS) on a per-core basis, i.e., the CPU can adjust the voltage or frequency of each core. As a result, the cores in a processor may have different computing power and energy consumption. To conserve energy in multi-core platforms, we propose task scheduling algorithms that leverage per-core DVFS and achieve a balance between performance and energy consumption. We consider two task execution modes: the batch mode, which runs jobs in batches, and the online mode, in which jobs with different time constraints, arrival times, and computation workloads co-exist in the system. For tasks executed in the batch mode, we propose an algorithm that finds the optimal scheduling policy, and for the online mode, we present a heuristic algorithm that determines the execution order and processing speed of tasks in an online fashion. The heuristic ensures that the total cost is minimal for every time interval during a task's execution.

Citations: 10
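The balance between performance and energy that the per-core DVFS abstract above describes can be made concrete with a small worked sketch. The abstract does not give its cost function, so this sketch assumes a common textbook model in normalized units: dynamic power grows with the cube of frequency, a task of `cycles` work takes `cycles / f` time, and so consumes `cycles * f**2` energy. The weighted-sum cost, the function name `best_frequency`, and the optional `deadline` parameter are all hypothetical.

```python
def best_frequency(cycles, frequencies, energy_weight=1.0, time_weight=1.0, deadline=None):
    """Pick the frequency that minimizes a weighted energy/time cost for one core.

    Assumed model (not from the abstract): power ~ f**3, duration = cycles / f,
    hence energy = cycles * f**2. Returns (frequency, cost), or None if no
    frequency meets the deadline.
    """
    best = None
    for f in frequencies:
        duration = cycles / f
        if deadline is not None and duration > deadline:
            continue  # too slow to finish in time
        energy = cycles * f ** 2
        cost = energy_weight * energy + time_weight * duration
        if best is None or cost < best[1]:
            best = (f, cost)
    return best


levels = [1.0, 2.0, 4.0]  # normalized frequency levels available on a core
choice = best_frequency(8.0, levels, time_weight=16.0)
# f=1.0: energy 8, time 8, cost 8 + 16*8 = 136
# f=2.0: energy 32, time 4, cost 32 + 16*4 = 96   <- minimum
# f=4.0: energy 128, time 2, cost 128 + 16*2 = 160
```

With per-core DVFS, a selection like this can be made independently for the task on each core. Raising `time_weight` pushes the choice toward higher frequencies, and a tight `deadline` removes the slower levels from consideration.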