2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum: Latest Articles

Real-Time Monitoring of Multicore SoCs through Specialized Hardware Agents on NoC Network Interfaces
Georgios Kornaros, D. Pnevmatikatos
DOI: https://doi.org/10.1109/IPDPSW.2012.27
Published: 2012-05-21
Abstract: Network-on-chip based multicore systems need efficient management of a multitude of processing resources, so that hardware and system software avoid making inefficient time and power decisions at runtime. Hardware event management is a necessary path to assist in high-speed management of captured events and enable efficient reaction mechanisms. This paper proposes different microarchitecture alternatives and describes an infrastructure for real-time monitoring and management of network-on-chip based systems. High-speed and energy-efficient circuit techniques are deployed for monitoring agents that reside at the network interfaces in order to be configured dynamically and communicate computed statistics to centralized hardware monitor managers of different functionality and complexity. An implementation of a pipelined centralized monitor manager is shown, with the capacity to maintain event ordering and process different types of concurrent events. A single event is served with a latency of seven clock cycles. The presented results of a quantitative evaluation provide guidelines for system-level designers, proving the need for flexible and at the same time efficient filters for real-time monitors inside complex NoC-based SoCs.
Citations: 5
An Optimized Reconfigurable System for Computing the Phylogenetic Likelihood Function on DNA Data
S. Berger, Nikolaos S. Alachiotis, A. Stamatakis
DOI: https://doi.org/10.1109/IPDPSW.2012.43
Published: 2012-05-21
Abstract: The Phylogenetic Likelihood Function (PLF) is an important statistical function for evaluating phylogenetic trees. To this end, the PLF is the computational kernel of all state-of-the-art likelihood-based phylogenetic inference programs. Typically, it accounts for more than 85% of total execution time in such programs. We present a substantially improved hardware architecture for computing the PLF based on previous experiences with implementing the PLF on reconfigurable logic. Our new design is optimized for computing the PLF on four-state (DNA) input data. It is also adapted to the computational requirements of real-world tree inference programs and completely independent of the specific tree search algorithm at hand. Furthermore, we describe how our architecture can be modified and adapted to handle general n-state data, such as protein (20 states) or RNA secondary structure data (6, 7, or 16 states, depending on the model). Finally, we designed an interface mechanism such that our PLF hardware architecture can interact with the widely used phylogenetic inference tool RAxML. We deploy FPGA technology to verify the correctness of the architecture and to evaluate performance.
Citations: 5
A Fault-Tolerant Target-Tracking Strategy Based on Unreliable Sensing in Wireless Sensor Networks
Yi Xie, Guoming Tang, Daifei Wang, W. Xiao, Daquan Tang, Jiuyang Tang
DOI: https://doi.org/10.1109/IPDPSW.2012.261
Published: 2012-05-21
Abstract: Focusing on the unreliable sensing phenomenon in wireless sensor networks and its impact on target-tracking accuracy, this paper first analyzes the uncertain area and its boundaries. The monitored area can then be divided into faces by these uncertain boundaries, and each face has an identical signature vector. For each target localization, whether the RSS of any pair of nodes is ordinal or flipped can be determined by multiple grouping samplings, and the sampling vector is built. Hence, the Fault-Tolerant Target-Tracking (FTTT) strategy is proposed, which transforms the tracking problem into a vector matching process in order to improve the tracking flexibility, increase the tracking accuracy and reduce the influence of in-the-field factors. In addition, a heuristic matching algorithm is introduced to reduce the computational complexity. Results have shown that FTTT is more flexible and has higher tracking accuracy than similar methods.
Citations: 4
Coverage-aware Geocast Routing in Urban Vehicular Networks
Ruobing Jiang, Yanmin Zhu
DOI: https://doi.org/10.1109/IPDPSW.2012.317
Published: 2012-05-21
Abstract: Geocast routing in vehicular ad hoc networks plays an important role as the basis of applications such as traffic information sharing, emergency alarming, and geographic advertisement. It is quite challenging, however, to geocast packets through multi-hop relay vehicles because of the highly dynamic network topology, large-scale city road systems and fast-moving vehicles. Our idea is to measure vehicles' coverage capability and forward packets to those vehicles with a higher probability of successfully delivering the packets. The idea is rooted in the widely accepted concept that vehicular trajectories improve packet routing and the fact that vehicular trajectories are nowadays available through widely used navigation systems. To accomplish the idea, the difficulty is to measure the coverage capability of a vehicle over a specific region with only partially available vehicular trajectories and without accurate timing information. We propose a novel coverage graph to maintain the collected trajectories of all the encountered vehicles and their most up-to-date timing information so that the extended coverage capability of each vehicle can be estimated. The coverage graph is constructed in a distributed way based on locally shared information, and the packet forwarding decisions can be adaptively made to meet different routing objectives.
Citations: 12
Simulating the Spread of Infectious Disease over Large Realistic Social Networks Using Charm++
K. Bisset, Ashwin M. Aji, Eric J. Bohm, L. Kalé, Tariq Kamal, M. Marathe, Jae-Seung Yeom
DOI: https://doi.org/10.1109/IPDPSW.2012.65
Published: 2012-05-21
Abstract: Preventing and controlling outbreaks of infectious diseases such as pandemic influenza is a top public health priority. EpiSimdemics is an implementation of a scalable parallel algorithm to simulate the spread of contagion, including disease, fear and information, in large (10^8 individuals), realistic social contact networks using individual-based models. It also has a rich language for describing public policy and agent behavior. We describe CharmSimdemics and evaluate its performance on national-scale populations. Charm++ is a machine-independent parallel programming system, providing high-level mechanisms and strategies to facilitate the task of developing highly complex parallel applications. Our design includes mapping of application entities to tasks, leveraging the efficient and scalable communication, synchronization and load balancing strategies of Charm++. Our experimental results on a 768-core system show that the Charm++ version achieves up to a 4-fold increase in performance when compared to the MPI version.
Citations: 21
LDPLFS: Improving I/O Performance without Application Modification
Steven A. Wright, S. Hammond, S. Pennycook, I. Miller, J. Herdman, S. Jarvis
DOI: https://doi.org/10.1109/IPDPSW.2012.172
Published: 2012-05-21
Abstract: Input/Output (I/O) operations can represent a significant proportion of run-time when large scientific applications are run in parallel and at scale. In order to address the growing divergence between processing speeds and I/O performance, the Parallel Log-structured File System (PLFS) has been developed by EMC Corporation and the Los Alamos National Laboratory (LANL) to improve the performance of parallel file activities. Currently, PLFS requires the use of either (i) the FUSE Linux kernel module, (ii) a modified MPI library with a customised ROMIO MPI-IO library, or (iii) an application rewrite to utilise the PLFS API directly. In this paper we present an alternative method of utilising PLFS in applications. This method employs a dynamic library to intercept the low-level POSIX operations and retarget them to use the equivalents offered by PLFS. We demonstrate our implementation of this approach, named LDPLFS, on a set of standard UNIX tools, as well as on a set of standard parallel I/O intensive mini-applications. The results demonstrate almost equivalent performance to a modified build of ROMIO and improvements over the FUSE-based approach. Furthermore, through our experiments we demonstrate decreased performance in PLFS when run at scale on the Lustre file system.
Citations: 10
QoS-Oriented Data Dissemination in VANETs
Lifeng Zhang, Beihong Jin
DOI: https://doi.org/10.1109/IPDPSW.2012.316
Published: 2012-05-21
Abstract: Data dissemination over long distances in urban scenarios is the foundation of many VANET applications, but rapid shifts in network topology, unstable quality of wireless communication and channel capacity constraints of VANETs pose many challenges to data dissemination. In response, we propose a connectivity-aware data delivery mechanism on the basis of an improved greedy broadcasting. Moreover, we present an in-network and hierarchical data aggregation mechanism to reduce the transfer of redundant data resulting from multi-source data collection and multi-path data transmission. Both mechanisms are intended to improve the quality of data dissemination in VANETs, either by enhancing the adaptability to varying traffic flows or by aggregating data in a hierarchical way.
Citations: 7
SWAPP: A Framework for Performance Projections of HPC Applications Using Benchmarks
S. Sharkawi, Don DeSota, R. Panda, Stephen Stevens, V. Taylor, Xingfu Wu
DOI: https://doi.org/10.1109/IPDPSW.2012.214
Published: 2012-05-21
Abstract: Surrogate-based Workload Application Performance Projection (SWAPP) is a framework for performance projections of High Performance Computing (HPC) applications using benchmark data. Performance projections of HPC applications onto various hardware platforms are important for hardware vendors and HPC users. The projections aid hardware vendors in the design of future systems and help HPC users with system procurement. SWAPP assumes that one has access to a base system and only benchmark data for a target system; the target system is not available for running the HPC application. Projections are developed using the performance profiles of the benchmarks and application on the base system and the benchmark data for the target system. SWAPP projects the performance of the compute and communication components separately, then combines the two projections to get the full application projection. In this paper SWAPP was used to project the performance of three NAS Multi-Zone benchmarks onto three systems: an IBM POWER6 575 cluster and an IBM Intel Westmere X5670, both using an InfiniBand interconnect, and an IBM Blue Gene/P with 3D Torus and Collective Tree interconnects. The base system is an IBM POWER5+ 575 cluster. The projected performance of the three benchmarks was within 11.44% average error magnitude and standard deviation of 2.64% for the three systems.
Citations: 14
Automatic Offloading C++ Expression Templates to CUDA Enabled GPUs
Jie Chen, B. Joó, W. Watson, R. Edwards
DOI: https://doi.org/10.1109/IPDPSW.2012.293
Published: 2012-05-21
Abstract: In the last few years, many scientific applications have been developed for powerful graphics processing units (GPUs) and have achieved remarkable speedups. This success can be partially attributed to high-performance host-callable GPU library routines that are offloaded to GPUs at runtime. These library routines are based on C/C++-like programming toolkits such as CUDA from NVIDIA and have the same calling signatures as their CPU counterparts. Recently, with sufficient support for C++ templates from CUDA, the emergence of template libraries has enabled further advancement in code reusability and rapid software development for GPUs. However, Expression Templates (ET), which have been very popular for implementing data parallel scientific software for host CPUs because of their intuitive and mathematics-like syntax, have been underutilized by GPU development libraries. The lack of ET usage is caused by the difficulty of offloading expression templates from hosts to GPUs, due to the inability to pass instantiated expressions to GPU kernels as well as the absence of the exact form of the expressions for the templates at the time of coding. This paper presents a general approach that enables automatic offloading of C++ expression templates to CUDA-enabled GPUs by using the C++ metaprogramming technique and Just-In-Time (JIT) compilation methodology to generate and compile CUDA kernels for corresponding expression templates, followed by executing the kernels with appropriate arguments. This approach allows developers to port applications to run on GPUs with virtually no code modifications. More specifically, this paper uses a large ET-based data parallel physics library called QDP++ as an example to illustrate many aspects of the approach to offload expression templates automatically and to demonstrate very good speedups for typical QDP++ applications running on GPUs against running on CPUs using this method of offloading. In addition, this approach of automatically offloading expression templates could be applied to other many-core accelerators that provide C++ programming toolkits with support for C++ templates.
Citations: 21
Scalable and Efficient Associative Processor Solution to Guarantee Real-Time Requirements for Air Traffic Control Systems
M. Yuan, J. Baker, W. Meilander, K. Schaffer
DOI: https://doi.org/10.1109/IPDPSW.2012.210
Published: 2012-05-21
Abstract: This paper proposes a solution to air traffic control (ATC) using an enhanced SIMD machine model called an Associative Processor (AP). Our solution differs from previous ATC systems that are designed for MIMD computers and have a great deal of difficulty meeting the predictability requirements for ATC, which are critical for meeting the strict certification standards required for safety-critical software components. The proposed AP solution supports accurate predictions of worst case execution times and guarantees all deadlines are met. Furthermore, the software developed based on the AP model is much simpler and smaller in size than the current corresponding ATC software. As the associative processor is built from SIMD hardware, it is considerably cheaper and simpler than the MIMD hardware currently used to support ATC. We have designed a prototype for eight ATC real-time tasks on a ClearSpeed CSX600 accelerator that is used to emulate the AP. Performance is evaluated in terms of execution time and predictability and is compared to the fastest host-only version implemented using OpenMP on an 8-core multiprocessor (MIMD). Our extensive experiments show that the AP implementation meets all deadlines that can be statically scheduled. In contrast, some tasks miss their deadlines when implemented on MIMD. It is shown that the proposed AP solution will support accurate and meaningful predictions of worst case execution times and will guarantee that all deadlines are met.
Citations: 4