2010 39th International Conference on Parallel Processing: Latest Papers

Parallel Exact Inference on a CPU-GPGPU Heterogenous System
2010 39th International Conference on Parallel Processing Pub Date: 2010-09-13 DOI: 10.1109/ICPP.2010.15
Hyeran Jeon, Yinglong Xia, V. Prasanna
Abstract: Exact inference is a key problem in exploring probabilistic graphical models. The computational complexity of inference increases dramatically with the parameters of the graphical model. Achieving scalability over hundreds of threads remains a fundamental challenge. In this paper, we use a lightweight scheduler hosted by the CPU to allocate cliques in junction trees to the GPGPU at run time. The scheduler merges multiple small cliques or splits large cliques dynamically so as to maximize the utilization of the GPGPU resources. We implement node-level primitives on the GPGPU to process the cliques assigned by the CPU. We propose a conflict-free potential table organization and an efficient data layout for coalescing memory accesses. In addition, we develop a double-buffering-based asynchronous data transfer between CPU and GPGPU to overlap clique processing on the GPGPU with data transfer and scheduling activities. Our implementation achieved a 30x speedup compared with state-of-the-art multicore processors.
Citations: 22
System-Level, Unified In-band and Out-of-band Dynamic Thermal Control
2010 39th International Conference on Parallel Processing Pub Date: 2010-09-13 DOI: 10.1109/ICPP.2010.22
Dong Li, Rong Ge, K. Cameron
Abstract: High-density computer racks are becoming increasingly commonplace in supercomputing centers and data centers. With tight integration of high-powered computing components in the racks, hot spots, or pockets of elevated temperature on the chips and system, can easily form when room air circulation is not effective. Hot spots reduce the reliability of high-density systems and increase the chances of thermal emergencies, which further trigger system slowdowns or shutdowns. Techniques such as dynamically scaling down the voltage of the CPUs and fan control are available on today's systems to reduce heat generation and dissipate heat. Unfortunately, these techniques work independently, without cooperation. As a result, to prevent thermal emergencies, systems may work at reduced capacity when full capacity is required. We propose a combined in-band and out-of-band approach to reduce the likelihood of thermal emergency slowdowns and improve the reliability of systems. Our thermal control framework unifies temperature control mechanisms in systems to balance temperature, power consumption, and performance. More precisely, we balance the use of in-band dynamic voltage and frequency scaling (DVFS) with out-of-band proactive fan control. Our results on a power-aware cluster indicate the coordinated use of fan control and DVFS is more effective than either technique in isolation at reducing average system operating temperatures with expected performance.
Citations: 3
Model-Driven Traffic Data Acquisition in Vehicular Sensor Networks
2010 39th International Conference on Parallel Processing Pub Date: 2010-09-13 DOI: 10.1109/ICPP.2010.50
Chih-Chieh Hung, Wen-Chih Peng
Abstract: In recent years, the Global Positioning System (GPS) has been widely used in technical products, such as navigation devices, GPS loggers, PDAs and mobile phones. Hence, traffic data collection platforms have been proposed to collect GPS data points for traffic monitoring. In traffic data collection platforms, each vehicle is equipped with a GPS module and a wireless communication interface, such as 3G or WiFi, and the sensed GPS data (e.g., the speed and the position) are sent to the server. One challenge is that if a significant number of vehicles upload their GPS data points at the same time, the wireless network may not be able to offer enough network resources for simultaneous network connections. This paper proposes a framework, MDC (standing for Model-based Data Collection), to reduce the amount of data transmission and the number of vehicles reporting their GPS data points. The MDC framework is executed at the server and vehicle side collaboratively. On the vehicle side, given a series of GPS data points, model functions are derived to represent the raw GPS data points. Hence, each vehicle can report a few coefficients that describe its movements instead of reporting all position information. Since vehicles move along road segments that are usually a set of line segments, algorithm LR (standing for Linear Regression) is proposed to determine a set of line functions to represent movements of vehicles. By observing the spatial-temporal locality in traffic data, algorithm KR (standing for Kernel Regression) is developed to derive a set of kernel functions to model a series of sensed speed readings. Moreover, with the spatial-temporal locality feature in traffic data, an in-network aggregation mechanism is proposed to determine a set of groups and, for each group, only one vehicle needs to report traffic data, thereby further reducing the number of simultaneous connections. Experimental results show that MDC can collect traffic data effectively and efficiently.
Citations: 14
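The idea in the MDC abstract of reporting a few line coefficients instead of every GPS point can be illustrated with an ordinary least-squares fit. This is only a minimal sketch under stated assumptions: the function names `fit_line` and `max_residual`, the flat (x, y) coordinates, and the single-segment setting are hypothetical choices made for illustration, and the abstract does not say that algorithm LR works this way in detail.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept to the given points.

    Hypothetical helper: a vehicle could upload (slope, intercept) for a
    road segment instead of every raw point.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("x is constant; a vertical segment needs another parameterization")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def max_residual(points: List[Point], slope: float, intercept: float) -> float:
    """Largest vertical distance between a point and the fitted line."""
    return max(abs(y - (slope * x + intercept)) for x, y in points)
```

A caller might fit one segment, check `max_residual` against an error bound of its own choosing, and start a new segment when the bound is exceeded; that segmentation policy is likewise an assumption, not something the abstract specifies.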
Optimal Overlay Construction on Heterogeneous Live Peer-to-Peer Streaming Systems
2010 39th International Conference on Parallel Processing Pub Date: 2010-09-13 DOI: 10.1109/ICPP.2010.77
Min Yang, Yuanyuan Yang
Abstract: Media streaming is an important Internet application and has received more and more attention in recent years. Traditional media streaming systems are deployed in a server-client mode, which scales poorly with the increasing population of clients. Peer-to-peer media streaming can greatly enhance the scalability of the system by employing the clients to help forward the media content. In this paper, we consider optimizing the overlay construction for peer-to-peer streaming systems with heterogeneous access link bandwidths. Our goal is to maximize the total downloading rate and satisfy the heterogeneous downloading requirements when the uplink bandwidth is limited. We first formalize it as a problem of finding the maximum number of edge-disjoint trees in a graph which models the peers and their access link bandwidths. Then we give a centralized heuristic algorithm to solve the problem. Based on the centralized algorithm, we further propose a distributed algorithm which constructs an adaptive overlay topology that can adapt itself to the changing peers such that the end-to-end delay and link stress are minimized. We compare our scheme with another recently proposed scheme called MDM through simulations. Our simulation results show that the proposed scheme outperforms MDM by about 30% with respect to the average peer satisfaction. In addition, the proposed scheme achieves less link stress than MDM.
Citations: 4
Extending the Monte Carlo Processor Modeling Technique: Statistical Performance Models of the Niagara 2 Processor
2010 39th International Conference on Parallel Processing Pub Date: 2010-09-13 DOI: 10.1109/ICPP.2010.44
Waleed Alkohlani, Jeanine E. Cook, R. Srinivasan
Abstract: With the complexity of contemporary single- and multi-core, multi-threaded processors comes a greater need for faster methods of performance analysis and design. It is no longer practical to use only cycle-accurate processor simulators for design space analysis of modern processors and systems. Therefore, we propose a statistical processor modeling method that is based on Monte Carlo techniques. In this paper, we present new details of the methodology and the recent extensions that we have made to it, including the capability to model multi-core processors. We detail the steps to develop a new model and then present statistical performance models of the Sun Niagara 2 processor micro-architecture that, together with a previously published Itanium 2 Monte Carlo model, demonstrate the validity of the technique and its new capabilities. We show that we can accurately predict single- and multi-core performance within 7% of actual on average, and we can use the models to quickly pinpoint performance problems at various components.
Citations: 4
A Machine Learning Approach for Optimizing Parallel Logic Simulation
2010 39th International Conference on Parallel Processing Pub Date: 2010-09-13 DOI: 10.1109/ICPP.2010.62
S. Meraji, C. Tropper
Abstract: Parallel discrete event simulation can be applied as a fast and cost-effective approach for the gate-level simulation of current VLSI circuits. In this paper we combine a dynamic load balancing algorithm and a bounded window algorithm for optimistic gate-level simulation. The bounded time window prevents the simulation from being too optimistic and from excessive rollbacks. We utilize a machine learning algorithm (Q-learning) to effect this combination. We introduce two dynamic load-balancing algorithms for balancing the communication and computational load and use two learning agents to combine these algorithms. One learning agent combines the two learning algorithms and learns their corresponding parameters, while the second optimizes the value of the time window. Experimental results show up to a 46% improvement in the simulation time using this combined algorithm for several open source circuits. To the best of our knowledge, this is the first time that Q-learning has been used to optimize an optimistic gate-level simulation.
Citations: 6
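For readers unfamiliar with Q-learning, which the Meraji and Tropper abstract names as its learning algorithm, the standard tabular update rule can be sketched as follows. The state and action labels, the reward values, and the parameter defaults below are hypothetical; the abstract does not describe how the paper's agents encode states, actions, or rewards.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,   # learning rate (assumed value)
    gamma: float = 0.9,   # discount factor (assumed value)
) -> float:
    """Apply one textbook Q-learning step and return the new Q(state, action).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


def new_q_table() -> QTable:
    """Q-table whose unseen entries default to 0.0."""
    return defaultdict(float)
```

As a usage illustration, an agent tuning a time window might use actions such as "shrink" and "grow" and a reward derived from measured simulation speed; those choices are assumptions made here, not details taken from the paper.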
Efficient Work Stealing for Fine Grained Parallelism
2010 39th International Conference on Parallel Processing Pub Date: 2010-09-13 DOI: 10.1109/ICPP.2010.39
Karl-Filip Faxén
Abstract: This paper deals with improving the performance of fine-grained task parallelism. It is often either cumbersome or impossible to increase the grain size of such programs. Increasing core counts exacerbates the problem; a program that appears coarse-grained on eight cores may well look a lot more fine-grained on sixty-four. In this paper we present the direct task stack, a novel work stealing algorithm with unusually low overheads, both for creating tasks and for stealing. We compare the performance of our scheduler to Cilk++, the icc implementation of OpenMP 3.0 and the Intel TBB library on an eight-core, dual-socket Opteron machine. We also analyze the reasons why our techniques achieve consistent speedups over the other systems, ranging from 2-3x on many fine-grained workloads to over 50x in extreme cases, and show quantitatively how each of the techniques we use contributes to the improved performance.
Citations: 48
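As background for the Faxén entry, the general work-stealing pattern (not the paper's direct task stack, whose design the abstract does not describe) can be sketched with a double-ended queue per worker: the owner pushes and pops at one end, and an idle worker steals from the opposite end. The `Worker` class and its method names are hypothetical, and the sketch is single-threaded, so it shows only the queue discipline and none of the synchronization a real scheduler needs.

```python
from collections import deque
from typing import Deque, Optional


class Worker:
    """Toy worker holding a deque of task ids (illustrative only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tasks: Deque[int] = deque()

    def push(self, task: int) -> None:
        # The owner adds newly created tasks at the right end.
        self.tasks.append(task)

    def pop_local(self) -> Optional[int]:
        # The owner takes its newest task (LIFO), which tends to be cache-warm.
        return self.tasks.pop() if self.tasks else None

    def steal_from(self, victim: "Worker") -> Optional[int]:
        # A thief takes the victim's oldest task from the opposite (left) end.
        return victim.tasks.popleft() if victim.tasks else None
```

Taking the oldest task when stealing is the conventional choice because older tasks in a recursive computation usually represent larger pieces of work, so one steal tends to keep the thief busy longer.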
Energy Modeling of Wireless Sensor Nodes Based on Petri Nets
2010 39th International Conference on Parallel Processing Pub Date: 2010-09-13 DOI: 10.1109/ICPP.2010.19
Ali Shareef, Yifeng Zhu
Abstract: Energy minimization is of great importance in wireless sensor networks in extending the battery lifetime. Accurately understanding the energy consumption characteristics of each sensor node is a critical step for the design of energy-saving strategies. This paper develops a detailed probabilistic model based on Petri nets to evaluate the energy consumption of a wireless sensor node. The model factors in critical components of a sensor node, including processors with emerging energy-saving features, wireless communication components, and an open or closed workload generator. Experimental results show that this model is more flexible and accurate than Markov models. The model provides a useful simulation platform to study energy-saving strategies in wireless sensor networks.
Citations: 33
Distributing a Metric-Space Search Index onto Processors
2010 39th International Conference on Parallel Processing Pub Date: 2010-09-13 DOI: 10.1109/ICPP.2010.51
Mauricio Marín, Flavio Ferrarotti, V. Gil-Costa
Abstract: This paper studies the problem of distributing a metric-space search index based on compact clustering onto a set of distributed memory processors. The aim is to enable efficient similarity search in large-scale Web search engines. The index data structure is composed of a set of clusters enclosing the database objects, and we propose distribution methods based on two different solution approaches. The first one makes use of specific knowledge about the workload generated by user queries. Here the challenge is how to represent and use such knowledge in a method capable of producing a cluster distribution leading to high performance. The second one follows a novel direction by completely disregarding user behavior to look instead at the relationships among the index clusters themselves to decide their placement onto processors. Both methods perform efficiently depending on the context and they are generic enough to be applied to different distributed index data structures for metric-space databases.
Citations: 16
Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters
2010 39th International Conference on Parallel Processing Pub Date: 2010-09-13 DOI: 10.1109/ICPP.2010.78
K. Kandalla, E. Mancini, S. Sur, D. Panda
Abstract: Modern supercomputing systems have witnessed phenomenal growth in recent history owing to the advent of multi-core architectures and high-speed networks. However, the operational and maintenance costs of these systems have also grown rapidly. Several concepts such as Dynamic Voltage and Frequency Scaling (DVFS) and CPU Throttling have been proposed to conserve the power consumed by the compute nodes during idle periods. However, it is necessary to design software stacks in a power-aware manner to minimize the amount of power drawn by the system during the execution of applications. It is also critical to minimize the performance overheads associated with power-aware algorithms, as the benefits of saving power could be lost if the application runs for a longer time. Modern multi-core architectures such as the Intel "Nehalem" allow for DVFS and CPU throttling operations to be performed with little overhead. In this paper, we explore how these features can be leveraged to design algorithms to deliver fine-grained power savings during the communication phases of parallel applications. We also propose a theoretical model to analyze the power consumption characteristics of communication operations. We use microbenchmarks and application benchmarks such as NAS and CPMD to measure the performance of our proposed algorithms and to demonstrate the potential for saving power with 32 and 64 processes. We observe about 8% improvement in the overall energy consumed by these applications with little performance overhead.
Citations: 42