2014 43rd International Conference on Parallel Processing: Latest Publications

NetMaster: Taming Energy Devourers on Smartphones
2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.39
Yi Zhang, Yuan He, Xiaopei Wu, Yunhao Liu, Wenbo He
{"title":"NetMaster: Taming Energy Devourers on Smartphones","authors":"Yi Zhang, Yuan He, Xiaopei Wu, Yunhao Liu, Wenbo He","doi":"10.1109/ICPP.2014.39","DOIUrl":"https://doi.org/10.1109/ICPP.2014.39","url":null,"abstract":"Smartphones nowadays are installed with diverse applications, each of which consumes energy and bandwidth. As more and more applications are crowded into a smartphone, they cause serious problems with regard to battery life and bandwidth utilization. Existing proposals to tackle such challenges usually resort to two ways: avoiding energy-consuming network activities or improving communication efficiency in terms of power consumption. Those approaches either affect the smartphone users' experience, or offer little benefit in prolonging the battery life. Motivated by insightful understanding of users' habit, we in this paper propose a novel approach to orchestrate network activities of smartphone applications, based on user's habit. We implement our approach on smartphones as a middleware service called NetMaster. The performance evaluation with real traces shows that NetMaster reduces energy consumption of network activities by 77.8% in average and increases network bandwidth utilization by over 200%. The user experience is surprisingly well preserved. The chance of undesired interrupt during normal usage is less than 1%.","PeriodicalId":441115,"journal":{"name":"2014 43rd International Conference on Parallel Processing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126519179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
A Case for Resource Efficient Prefetching in Multicores
2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.19
Muneeb Khan, Andreas Sandberg, Erik Hagersten
{"title":"A Case for Resource Efficient Prefetching in Multicores","authors":"Muneeb Khan, Andreas Sandberg, Erik Hagersten","doi":"10.1109/ICPP.2014.19","DOIUrl":"https://doi.org/10.1109/ICPP.2014.19","url":null,"abstract":"Modern processors typically employ sophisticated prefetching techniques for hiding memory latency. Hardware prefetching has proven very effective and can speed up some SPEC CPU 2006 benchmarks by more than 40% when running in isolation. However, this speedup often comes at the cost of prefetching a significant volume of useless data (sometimes more than twice the data required) which wastes shared last level cache space and off-chip bandwidth. This paper explores how an accurate resource-efficient prefetching scheme can benefit performance by conserving shared resources in multicores. We present a framework that uses low-overhead runtime sampling and fast cache modeling to accurately identify memory instructions that frequently miss in the cache. We then use this information to automatically insert software prefetches in the application. Our prefetching scheme has good accuracy and employs cache bypassing whenever possible. These properties help reduce off-chip bandwidth consumption and last-level cache pollution. While single-thread performance remains comparable to hardware prefetching, the full advantage of the scheme is realized when several cores are used and demand for shared resources grows. We evaluate our method on two modern commodity multicores. Across 180 mixed workloads that fully utilize a multicore, the proposed software prefetching mechanism achieves up to 24% better throughput than hardware prefetching, and performs 10% better on average.","PeriodicalId":441115,"journal":{"name":"2014 43rd International Conference on Parallel Processing","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122270620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Fast Parallel Algorithms for Edge-Switching to Achieve a Target Visit Rate in Heterogeneous Graphs
2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.15
Md Hasanuzzaman Bhuiyan, Jiangzhuo Chen, Maleq Khan, M. Marathe
{"title":"Fast Parallel Algorithms for Edge-Switching to Achieve a Target Visit Rate in Heterogeneous Graphs","authors":"Md Hasanuzzaman Bhuiyan, Jiangzhuo Chen, Maleq Khan, M. Marathe","doi":"10.1109/ICPP.2014.15","DOIUrl":"https://doi.org/10.1109/ICPP.2014.15","url":null,"abstract":"An edge switch is an operation on a network (graph) where two edges are selected randomly and one of their end vertices are swapped with each other. Usually, a sequence of these operations are performed to generate network perturbations having the same degree sequence of the original network. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks (e.g., peer-to-peer networks), studying various dynamic phenomena over a network (e.g., disease dynamics over a social contact network). The growth of real-world networks motivates the need to develop efficient parallel algorithms for performing a large sequence of edge switch operations. The dependencies among successive edge switch operations and the requirement of keeping the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors. In this paper, we present a distributed memory parallel algorithm for switching edges in massive networks (networks with billions of edges) and achieve a speedup factor of 85 with 1024 processors. One of the steps in our edge switch algorithm requires the computation of multinomial random variables in parallel. The paper presents the first non-trivial parallel algorithm for the problem. The algorithm achieves a speedup of 925 using 1024 processors.","PeriodicalId":441115,"journal":{"name":"2014 43rd International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131166646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
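The edge switch operation defined in the abstract of the entry above can be illustrated with a small sequential sketch. This is not the authors' distributed-memory algorithm. It is a minimal single-process version, assuming an undirected simple graph stored as a set of sorted vertex pairs. The names `norm`, `try_edge_switch`, and `degree_sequence` are hypothetical helpers introduced here for illustration only.

```python
import random
from collections import Counter


def norm(u, v):
    """Store an undirected edge as a sorted tuple (hypothetical helper)."""
    return (u, v) if u < v else (v, u)


def degree_sequence(edges):
    """Return a Counter mapping each vertex to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def try_edge_switch(edges, rng):
    """Attempt one edge switch in place; return True if it was applied.

    Two distinct edges (u1, v1) and (u2, v2) are picked at random and
    replaced by (u1, v2) and (u2, v1). The switch is rejected if it would
    create a self-loop or a parallel edge, so the graph stays simple.
    """
    e1, e2 = rng.sample(sorted(edges), 2)
    (u1, v1), (u2, v2) = e1, e2
    # Randomly flip the second edge so either of its end vertices can be swapped.
    if rng.random() < 0.5:
        u2, v2 = v2, u2
    if u1 == v2 or u2 == v1:
        return False  # would create a self-loop
    new1, new2 = norm(u1, v2), norm(u2, v1)
    if new1 in edges or new2 in edges:
        return False  # would create a parallel edge
    edges.difference_update({e1, e2})
    edges.update({new1, new2})
    return True
```

Each vertex keeps the same number of incident edges after a successful switch, which is why a sequence of switches preserves the degree sequence. The rejection checks read and modify shared edge data, which hints at why successive switches are hard to run in parallel.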
Energy-Aware Scheduling for Aperiodic Tasks on Multi-core Processors
2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.45
Dawei Li, Jie Wu
{"title":"Energy-Aware Scheduling for Aperiodic Tasks on Multi-core Processors","authors":"Dawei Li, Jie Wu","doi":"10.1109/ICPP.2014.45","DOIUrl":"https://doi.org/10.1109/ICPP.2014.45","url":null,"abstract":"As the performance of modern multi-core processors increases, the energy consumption in these systems also increases significantly. Dynamic Voltage and Frequency Scaling (DVFS) is considered an efficient scheme for achieving the goal of saving energy. In this paper, we consider scheduling a set of independent aperiodic tasks, whose release times, deadlines and execution requirements are arbitrarily given, on DVFS-enabled multi-core processors. Our goal is to meet the execution requirements of all the tasks, and to minimize the overall energy consumption on the processor. Instead of seeking optimal solutions with high complexity, we aim to design lightweight algorithms suitable for real-time systems, with good performances. By applying a subinterval-based method, we come up with a simple algorithm to allocate tasks' available execution times during a heavily overlapped subinterval based on their desired execution requirement during that subinterval. Based on the allocated available execution times, we further consider the final frequency setting and task scheduling, which guarantee that all tasks meet their execution requirements, and tries to minimize the overall energy consumption. Extensive simulations for various platform and task characteristics and evaluations using a practical processor's power configuration indicate that our proposed algorithm has a good performance in terms of saving processor energy, though it has low complexity. Besides, the proposed algorithm is easy to be implemented in practical systems.","PeriodicalId":441115,"journal":{"name":"2014 43rd International Conference on Parallel Processing","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132824159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Scaling the ISAM Land Surface Model through Parallelization of Inter-component Data Transfer
2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.51
P. Miller, Michael P. Robson, B. El-Masri, R. Barman, G. Zheng, Atul K. Jain, L. Kalé
{"title":"Scaling the ISAM Land Surface Model through Parallelization of Inter-component Data Transfer","authors":"P. Miller, Michael P. Robson, B. El-Masri, R. Barman, G. Zheng, Atul K. Jain, L. Kalé","doi":"10.1109/ICPP.2014.51","DOIUrl":"https://doi.org/10.1109/ICPP.2014.51","url":null,"abstract":"We present the progression of developments necessary to scale the ISAM land surface model from single nodes and small clusters with unusually large per-node memory to much larger systems with more common configurations. These efforts include load balancing, conventional library-based output parallelization to reduce memory load, and parallel-in-time data input. On Hopper, a Cray XE6 machine, the result was strong scaling from 256 cores to 16k cores with an efficiency of 32.9%. On Edison, a Cray XC30 machine, the code strong scales from 256 cores to 16k cores with an efficiency of 51.4%. These large-scale gains, and the associated performance increases at smaller scale, enable greater scientific productivity for the users of ISAM and open the possibilities of increased resolution in time and space and greater physical fidelity for the simulated processes while remaining computationally feasible.","PeriodicalId":441115,"journal":{"name":"2014 43rd International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129451561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations
2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.29
Abhinav Sarje, X. Li, A. Hexemer
{"title":"High-Performance Inverse Modeling with Reverse Monte Carlo Simulations","authors":"Abhinav Sarje, X. Li, A. Hexemer","doi":"10.1109/ICPP.2014.29","DOIUrl":"https://doi.org/10.1109/ICPP.2014.29","url":null,"abstract":"In the field of nanoparticle material science, X-ray scattering techniques are widely used for characterization of macromolecules and particle systems (ordered, partially-ordered or custom) based on their structural properties at the micro- and nano-scales. Numerous applications utilize these, including design and fabrication of energy-relevant nanodevices such as photovoltaic and energy storage devices. Due to its size, analysis of raw data obtained through present ultra-fast light beamlines and X-ray scattering detectors has been a primary bottleneck in such characterization processes. To address this hurdle, we are developing high-performance parallel algorithms and codes for analysis of X-ray scattering data for several of the scattering methods, such as the Small Angle X-ray Scattering (SAXS), which we talk about in this paper. As an inverse modeling problem, structural fitting of the raw data obtained through SAXS experiments is a method used for extracting meaningful information on the structural properties of materials. Such fitting processes involve a large number of variable parameters and, hence, require a large amount of computational power. In this paper, we focus on this problem and present a high-performance and scalable parallel solution based on the Reverse Monte Carlo simulation algorithm, on highly-parallel systems such as clusters of multicore CPUs and graphics processors. We have implemented and optimized our algorithm on generic multi-core CPUs as well as the Nvidia GPU architectures with C++ and CUDA. We also present detailed performance results and computational analysis of our code.","PeriodicalId":441115,"journal":{"name":"2014 43rd International Conference on Parallel Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128006790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
A Sampling-Based Hybrid Approximate Query Processing System in the Cloud
2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.38
Yuxiang Wang, Junzhou Luo, Aibo Song, Fang Dong
{"title":"A Sampling-Based Hybrid Approximate Query Processing System in the Cloud","authors":"Yuxiang Wang, Junzhou Luo, Aibo Song, Fang Dong","doi":"10.1109/ICPP.2014.38","DOIUrl":"https://doi.org/10.1109/ICPP.2014.38","url":null,"abstract":"Sampling-based approximate query processing method provides the way, in which the users can save their time and resources for 'Big Data' analytical applications, if the estimated results can satisfy the accuracy expectation earlier before a long wait for the final accurate results. Online aggregation (OLA) is such an attractive technology to respond aggregation queries by calculating approximate results with the confidence interval getting tighter over time. It has been built into the MapReduce-based cloud system for big data analytics, which allows users to monitor the query progress and save money by killing the computation earlier once sufficient accuracy has been obtained. Unfortunately, there exists a major obstacle that is the estimation failure of OLA affects the OLA performance, which is resulted from the biased sample set that violates the unbiased assumption of OLA sampling. To handle this problem, we first propose a hybrid approximate query processing model to improve the overall OLA performance, where a dynamic scheme switching mechanism is deliberately designed to switch unpromising OLA queries into the bootstrap scheme for further processing, avoiding the whole dataset scanning resulted from the OLA estimation failure. In addition, we also present a progressive estimation method to reduce the false positive ratio of our dynamic scheme switching mechanism. Moreover, we have implemented our hybrid approximate query processing system in Hadoop, and conducted extensive experiments on the TPC-H benchmark for skewed data distribution. Our results demonstrate that our hybrid system can produce acceptable approximate results within a time period one order of magnitude shorter compared to the original OLA over Hadoop.","PeriodicalId":441115,"journal":{"name":"2014 43rd International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128522059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Crystal: A Design-Time Resource Partitioning Method for Hybrid Main Memory
2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.18
Dmitry Knyaginin, G. Gaydadjiev, P. Stenström
{"title":"Crystal: A Design-Time Resource Partitioning Method for Hybrid Main Memory","authors":"Dmitry Knyaginin, G. Gaydadjiev, P. Stenström","doi":"10.1109/ICPP.2014.18","DOIUrl":"https://doi.org/10.1109/ICPP.2014.18","url":null,"abstract":"Non-Volatile Memory (NVM) technologies can be used to reduce system-level execution time, energy, or cost but they add a new design dimension. Finding the best amounts of DRAM and NVM in hybrid main memory systems is a nontrivial design-time issue, the best solution to which depends on many factors. Such resource partitioning between DRAM and NVM can be framed as an optimization problem where the minimum of a target metric is sought, trends matter more than absolute values, and thus the precision of detailed modeling is overkill. Here we present Crystal, an analytic approach to early and rapid design-time resource partitioning of hybrid main memories. Crystal provides first-order estimates of system-level execution time and energy, sufficient to enable exhaustive search of the best amount and type of NVM for given workloads and partitioning goals. Crystal thus helps system designers to quickly find the most promising hybrid configurations for detailed evaluation. E.g., Crystal shows how for specific workloads higher system-level performance and energy efficiency can be achieved by employing an NVM with the speed and energy consumption of NAND Flash instead of a much faster and more energy efficient NVM like phase-change memory.","PeriodicalId":441115,"journal":{"name":"2014 43rd International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128831332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
An Infrastructure-less Vehicle Counting without Disruption
2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.61
Jie Wu, Paul Sabatino, J. Tsan, Zhen Jiang
{"title":"An Infrastructure-less Vehicle Counting without Disruption","authors":"Jie Wu, Paul Sabatino, J. Tsan, Zhen Jiang","doi":"10.1109/ICPP.2014.61","DOIUrl":"https://doi.org/10.1109/ICPP.2014.61","url":null,"abstract":"This paper presents a solution to count all moving vehicles in a target region. This is a large-scale counting that cannot be easily solved without a global view. However, there is no single force that can provide such a global view. To achieve an accurate result without either double- or miscounting, the local counting at each checkpoint is synchronized in our wireless communication by using the information carried by vehicles along the traffic flow. Our analytical and experimental results illustrate the correctness of the proposed scheme in both closed and open road systems - even when the wireless signal is affected by many factors. In this way, we provide an essential support for the resource management in VANETs.","PeriodicalId":441115,"journal":{"name":"2014 43rd International Conference on Parallel Processing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125688951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Energy-Efficient Task Scheduler for Multi-core Platforms with Per-core DVFS Based on Task Characteristics
2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.47
Ching-Chi Lin, Chao-Jui Chang, You-Cheng Syu, Jan-Jan Wu, Pangfeng Liu, Po-Wen Cheng, W. Hsu
{"title":"An Energy-Efficient Task Scheduler for Multi-core Platforms with Per-core DVFS Based on Task Characteristics","authors":"Ching-Chi Lin, Chao-Jui Chang, You-Cheng Syu, Jan-Jan Wu, Pangfeng Liu, Po-Wen Cheng, W. Hsu","doi":"10.1109/ICPP.2014.47","DOIUrl":"https://doi.org/10.1109/ICPP.2014.47","url":null,"abstract":"Energy-efficient task scheduling is a fundamental issue in many application domains, such as energy conservation for mobile devices and the operation of green computing data centers. Modern processors support dynamic voltage and frequency scaling (DVFS) on a per-core basis, i.e., the CPU can adjust the voltage or frequency of each core. As a result, the core in a processor may have different computing power and energy consumption. To conserve energy in multi-core platforms, we propose task scheduling algorithms that leverage per-core DVFS and achieve a balance between performance and energy consumption. We consider two task execution modes: the batch mode, which runs jobs in batches, and the online mode in which jobs with different time constraints, arrival times, and computation workloads co-exist in the system. For tasks executed in the batch mode, we propose an algorithm that finds the optimal scheduling policy, and for the online mode, we present a heuristic algorithm that determines the execution order and processing speed of tasks in an online fashion. The heuristic ensures that the total cost is minimal for every time interval during a task's execution.","PeriodicalId":441115,"journal":{"name":"2014 43rd International Conference on Parallel Processing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127951838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10